Atom AI Labs - AI-Powered Multi-Tenant Platform

Redis Monitoring & Alerting Guide

Overview

This guide shows how to monitor your Upstash Redis command usage so you can catch runaway usage before the database is suspended for exceeding its plan limit.

---

Quick Setup

1. Test the Monitor Script

cd backend-saas
python3 scripts/monitor_redis_usage.py

**Expected Output:**

Redis Usage Monitor Starting...
Command Limit: 10,000/day
Warning Threshold: 70.0%
Critical Threshold: 90.0%
Redis Usage Summary:
  Total Commands: 1,234
  Current Rate: 2.5 ops/sec
  Predicted Daily: 5,400
  Keys: 45
  Hit Rate: 85.3%

2. Set Up Alerts (Optional)

Send alerts to Slack or Discord when usage spikes:

# For Slack
export ALERT_WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

# For Discord
export ALERT_WEBHOOK_URL="https://discord.com/api/webhooks/YOUR/WEBHOOK/URL"

# Run monitor
python3 scripts/monitor_redis_usage.py
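
The alerting code inside `monitor_redis_usage.py` is not shown in this guide. As a rough sketch of what a webhook alert involves, the snippet below builds the JSON body each service expects (Slack incoming webhooks read a `text` field, Discord webhooks read `content`) and posts it with the standard library. `build_payload` and `send_alert` are hypothetical names used for illustration, not functions from the script.

```python
import json
import os
import urllib.request


def build_payload(webhook_url: str, message: str) -> dict:
    """Return the JSON body for the webhook's service (hypothetical helper)."""
    # Discord webhooks expect "content"; Slack incoming webhooks expect "text".
    if "discord.com/api/webhooks" in webhook_url:
        return {"content": message}
    return {"text": message}


def send_alert(message: str) -> bool:
    """POST the alert if ALERT_WEBHOOK_URL is set; return True if it was accepted."""
    webhook_url = os.environ.get("ALERT_WEBHOOK_URL")
    if not webhook_url:
        return False  # alerts are optional, so a missing URL is not an error
    body = json.dumps(build_payload(webhook_url, message)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # An explicit User-Agent, since some services reject urllib's default one.
            "User-Agent": "redis-usage-monitor/1.0",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return 200 <= response.status < 300
```

With `ALERT_WEBHOOK_URL` unset, `send_alert(...)` returns `False` without touching the network, which matches the "optional" nature of this step.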

3. Schedule Automatic Checks

Option A: Cron (Recommended)

# Edit crontab
crontab -e

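# Add these lines:
# (cron does not inherit variables exported in your shell; if you use alerts,
#  define ALERT_WEBHOOK_URL in the crontab or in a file the script loads)
# Check Redis usage every 15 minutes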
*/15 * * * * cd /app/backend-saas && /usr/bin/python3 scripts/monitor_redis_usage.py >> /var/log/redis_monitor.log 2>&1

Option B: Systemd Timer

Create /etc/systemd/system/redis-monitor.service:

[Unit]
Description=Redis Usage Monitor
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/app/backend-saas
ExecStart=/usr/bin/python3 scripts/monitor_redis_usage.py
EnvironmentFile=/etc/environment

Create /etc/systemd/system/redis-monitor.timer:

[Unit]
Description=Run Redis monitor every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target

Enable the timer:

sudo systemctl daemon-reload
sudo systemctl enable redis-monitor.timer
sudo systemctl start redis-monitor.timer

Option C: Fly.io Cron Job

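Add to your fly.toml. This fragment only sets the command the Machine runs and the VM size; it does not define a schedule by itself, so the run still has to be triggered periodically (for example with a scheduled Machine):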

[experimental]
  cmd = ["python3", "scripts/monitor_redis_usage.py"]

[[vm]]
  cpu_kind = "shared"
  cpus = 1
  memory_mb = 256

---

Understanding the Metrics

Total Commands

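**What:** Total Redis operations since database creation
**Good:** Increasing slowly (10-100 per second in short bursts)
**Bad:** Increasing rapidly (1000+ per second)
**Critical:** Approaching daily limit (10K for free tier)
**Note:** A 10K daily limit averages out to about 0.12 commands per second (10,000 / 86,400), so a sustained 10 commands per second would use the whole free-tier quota in under 17 minutes.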

Ops Per Sec

**What:** Current operation rate
**Normal:** 1-10 ops/sec for an idle system
**High:** 100+ ops/sec during heavy usage
**Dangerous:** 1000+ ops/sec (cache bug or infinite loop)
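
How the monitor script derives its rate is not shown here. One common approach, sketched below as an assumption, is to read a cumulative command counter twice (Redis `INFO` reports one as `total_commands_processed`) and divide the difference by the elapsed time. `ops_per_sec` and `classify_rate` are hypothetical helpers; the bands mirror the list above, and the 10-100 ops/sec range, which the list leaves unlabelled, is treated as normal.

```python
def ops_per_sec(prev_total: int, curr_total: int, elapsed_seconds: float) -> float:
    """Average command rate between two readings of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # A counter that went backwards (for example after a reset) gives no usable rate.
    return max(curr_total - prev_total, 0) / elapsed_seconds


def classify_rate(rate: float) -> str:
    """Map a rate onto the bands listed above (assumption: below 100 is normal)."""
    if rate >= 1000:
        return "dangerous"
    if rate >= 100:
        return "high"
    return "normal"


# Two readings taken 60 seconds apart: 150 commands in between.
rate = ops_per_sec(prev_total=1_000, curr_total=1_150, elapsed_seconds=60)
print(rate, classify_rate(rate))
```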

Predicted Daily Usage

**What:** Expected total commands by end of day
**Formula:** current + (rate × seconds_remaining)
**Action:** If predicted > limit, investigate immediately
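
The formula above can be sketched in a few lines. `predict_daily` is a hypothetical helper, and `seconds_until_midnight_utc` assumes the daily quota resets at midnight UTC, which this guide does not state; check your provider's reset time before relying on it.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_midnight_utc(now: datetime) -> float:
    """Seconds left in the current UTC day (assumes the quota resets at 00:00 UTC)."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


def predict_daily(current: int, rate: float, seconds_remaining: float) -> float:
    """current + (rate x seconds_remaining), as in the formula above."""
    return current + rate * seconds_remaining


# One hour before the reset, at 2.5 ops/sec with 1,234 commands already used:
now = datetime(2026, 4, 9, 23, 0, tzinfo=timezone.utc)
remaining = seconds_until_midnight_utc(now)      # 3600.0
predicted = predict_daily(1_234, 2.5, remaining)  # 1,234 + 9,000
print(predicted, predicted > 10_000)
```

In this example the prediction (10,234) exceeds the 10K limit even though current usage is only about 12% of it, which is why the predicted value is worth watching alongside the raw count.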

Hit Rate

**What:** Percentage of cache hits vs misses
**Good:** 80%+ (caching working well)
**Bad:** <50% (caching ineffective)
**Terrible:** <20% (no caching or wrong cache keys)
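
As a small illustration of the calculation, hit rate is hits divided by total lookups. Redis `INFO` exposes these counters as `keyspace_hits` and `keyspace_misses`; whether the monitor script reads exactly those fields is an assumption. `hit_rate_percent` is a hypothetical helper.

```python
def hit_rate_percent(hits: int, misses: int) -> float:
    """Percentage of lookups that were served from the cache."""
    lookups = hits + misses
    if lookups == 0:
        return 0.0  # no lookups yet, so there is nothing to measure
    return 100.0 * hits / lookups


# 853 hits and 147 misses out of 1,000 lookups
print(hit_rate_percent(853, 147))
```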

---

Alert Thresholds

Warning (70% of limit)

**Triggers:** At 7,000 commands (for 10K limit)
**Action:** Investigate what's causing high usage
**Check:** Are you testing? Load testing? Background jobs?

Critical (90% of limit)

**Triggers:** At 9,000 commands (for 10K limit)
**Action:** Immediate investigation required
**Consequence:** The database will be suspended if usage reaches the limit
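
The two thresholds reduce to a simple comparison. The sketch below is one way to express it; `alert_level` is a hypothetical function, with defaults taken from the 10K limit and the 70% / 90% thresholds described above.

```python
def alert_level(
    commands_used: int,
    daily_limit: int = 10_000,
    warning: float = 0.70,
    critical: float = 0.90,
) -> str:
    """Return "ok", "warning", or "critical" for the current usage."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    fraction = commands_used / daily_limit
    if fraction >= critical:
        return "critical"
    if fraction >= warning:
        return "warning"
    return "ok"


for used in (1_234, 7_000, 9_000):
    print(used, alert_level(used))
```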

---

Troubleshooting High Usage

Step 1: Check Current Rate

python3 scripts/monitor_redis_usage.py

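If the reported rate is above 100 ops/sec, something is issuing commands far faster than normal traffic would; continue to Step 2 to find it.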

Step 2: Find the Source

Check for Tight Loops

grep -rn "asyncio.sleep(0" backend-saas/core --include="*.py"

Look for short sleep intervals, which translate directly into polling rates:

sleep(0.01) = 100 polls/sec
sleep(0.1) = 10 polls/sec
sleep(0.5) = 2 polls/sec
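
To put those rates in terms of the daily limit, the sketch below converts a sleep interval into polls per day and computes the shortest interval a single loop can use within a given budget. It assumes each poll issues one Redis command and takes negligible time; both function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def polls_per_day(sleep_seconds: float, commands_per_poll: int = 1) -> int:
    """Commands issued per day by one loop that sleeps `sleep_seconds` between polls."""
    if sleep_seconds <= 0:
        raise ValueError("sleep_seconds must be positive")
    return round(SECONDS_PER_DAY / sleep_seconds) * commands_per_poll


def min_sleep_for_budget(daily_budget: int) -> float:
    """Shortest sleep (seconds) that keeps a one-command loop within the budget."""
    if daily_budget <= 0:
        raise ValueError("daily_budget must be positive")
    return SECONDS_PER_DAY / daily_budget


print(polls_per_day(0.5))            # 2 polls/sec over a full day
print(polls_per_day(60))             # one poll per minute
print(min_sleep_for_budget(10_000))  # seconds between polls for a 10K/day budget
```

Under these assumptions, even `sleep(0.5)` produces 172,800 commands per day, and a single loop would need to sleep at least 8.64 seconds to stay inside a 10K daily limit on its own.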

Check Cache Usage

grep -rn "cache.get\|redis_cache.get" backend-saas/api --include="*.py" | wc -l

If the count is above 50, you might be over-caching: every cache read is itself a Redis command that counts toward the daily limit.

Check Background Workers

ps aux | grep -E "worker|scheduler|monitor"

Look for multiple instances of the same worker.
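
Spotting duplicates by eye gets harder as the process list grows. As a sketch, the hypothetical helper below takes process command lines (for example the lines printed by `ps -eo args`) and reports any worker-like command that appears more than once, using the same `worker|scheduler|monitor` pattern as the command above.

```python
import re
from collections import Counter

# Same keywords as the grep pattern above.
WORKER_PATTERN = re.compile(r"worker|scheduler|monitor")


def duplicate_workers(command_lines: list[str]) -> dict[str, int]:
    """Return {command: count} for worker-like commands running more than once."""
    counts = Counter(
        line.strip() for line in command_lines if WORKER_PATTERN.search(line)
    )
    return {command: n for command, n in counts.items() if n > 1}


sample = [
    "python3 worker.py",
    "python3 worker.py",
    "python3 scheduler.py",
    "bash",
]
print(duplicate_workers(sample))
```

The file names in `sample` are made up for the example; feed it real `ps` output to check a host.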

Step 3: Fix the Issue

Common Fixes

**Add Caching**

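```python
# ✅ GOOD: Uses cache
def get_data(key):
    cached = cache.get(f"data:{key}")
    # Compare against None (assumes cache.get returns None on a miss)
    # so that cached falsy values such as 0 or [] still count as hits.
    if cached is not None:
        return cached

    # Cache miss: query the database once, then keep the result for an hour.
    data = db.query("SELECT * FROM data WHERE id = ?", key)
    cache.set(f"data:{key}", data, ttl=3600)
    return data
```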
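
The `cache` object used in this pattern is not defined in this guide. To experiment with the pattern locally, a minimal in-process stand-in with the same `get` / `set(..., ttl=...)` shape might look like the sketch below. `TTLCache` is a hypothetical class, not a Redis client, and it assumes a miss is reported as `None`.

```python
import time


class TTLCache:
    """In-memory stand-in for the `cache` object (hypothetical, not a Redis client)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key, value, ttl):
        """Store `value` under `key` for `ttl` seconds."""
        self._entries[key] = (self._clock() + ttl, value)


cache = TTLCache()
cache.set("data:42", {"id": 42}, ttl=3600)
print(cache.get("data:42"))  # served from memory until the TTL expires
```

Requiring `ttl` on every `set` call is deliberate: it makes it impossible to cache a value without an expiry.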

**Reduce Polling Frequency**

```python
import asyncio

# ✅ GOOD: Polls every minute
async def poll_status():
    while True:
        check_status()
        await asyncio.sleep(60)  # one poll per minute = 1,440 polls/day
```

**Batch Operations**

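```python
# ✅ GOOD: 1 query
# One "?" per ID (a single "?" cannot bind a list); assumes db.query takes positional params
placeholders = ", ".join("?" for _ in item_ids)
items = db.query(f"SELECT * FROM items WHERE id IN ({placeholders})", *item_ids)
```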

---

Upstash Pricing Tiers

| Tier | Price | Commands/Day | Commands/Month | Use Case |
|------|-------|--------------|----------------|----------|
| Free | $0 | 10,000 | 300,000 | Development |
| Basic | $0.20/day | 10,000 | 300,000 | Production (low traffic) |
| Pro | $0.50/day | 100,000 | 3,000,000 | Production (medium traffic) |
| Scale | $2.00/day | 1,000,000 | 30,000,000 | Production (high traffic) |

**Recommendation:** Start with **Pro tier** ($0.50/day) for production to avoid suspensions.
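
To match a tier to your own traffic, one option is to pick the smallest tier whose daily limit leaves some headroom above your predicted usage. The sketch below uses the daily limits from the table above; `smallest_tier` is a hypothetical helper, and the 70% headroom is an assumption borrowed from the warning threshold, not an Upstash rule.

```python
# (tier name, commands per day), copied from the table above
TIERS = [
    ("Free", 10_000),
    ("Basic", 10_000),
    ("Pro", 100_000),
    ("Scale", 1_000_000),
]


def smallest_tier(predicted_daily: int, headroom_percent: int = 70):
    """Return the first tier where predicted usage stays within the headroom.

    Returns None if even the largest tier is too small.
    """
    for name, daily_limit in TIERS:
        # Integer arithmetic avoids float rounding at the boundary.
        if predicted_daily * 100 <= daily_limit * headroom_percent:
            return name
    return None


for predicted in (5_400, 50_000, 80_000):
    print(predicted, smallest_tier(predicted))
```

Note that 80,000 commands per day already lands on Scale under this rule, because it is above 70% of the Pro limit.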

---

Emergency Procedures

If Redis Gets Suspended

**Check Status**

**Contact Upstash Support**

Email: support@upstash.com
Subject: "Database Suspension - {Your Database ID}"
Message: "My Redis database was suspended for exceeding limits. I've fixed the bug causing excessive reads. Can you unsuspend it?"

**Upgrade Your Plan**

Log in to Upstash console
Navigate to your database
Upgrade to higher tier

**Verify Fix is Deployed**

**Monitor Closely**

Run monitor script every 5 minutes
Watch for usage spikes
Check logs for errors

---

Best Practices

✅ DO

**Cache Frequently Accessed Data**

Tenant lookups (TTL: 1 hour)
User sessions (TTL: 30 minutes)
Configuration (TTL: 15 minutes)

**Use Appropriate TTLs**

Static data: 1 hour+
Dynamic data: 5-15 minutes
Session data: 30 minutes

**Monitor Usage**

Check stats daily
Set up alerts
Review trends weekly

**Test in Staging**

Load test in staging environment
Monitor Redis usage during tests
Catch issues before production

❌ DON'T

**Cache Everything**

Don't cache rapidly changing data
Don't cache large objects (>1MB)
Don't cache without TTL

**Poll Aggressively**

Avoid sub-second polling
Use webhooks instead of polling
Batch operations when possible

**Ignore Limits**

Know your provider's limits
Set alerts at 70% of limit
Upgrade before hitting limits

**Deploy Without Testing**

Test in staging first
Monitor Redis usage during deployment
Have rollback plan ready

---

Additional Resources

**Upstash Documentation:** https://upstash.com/docs
**Upstash Pricing:** https://upstash.com/pricing
**Redis Best Practices:** https://redis.io/topics/lru-cache
**Redis Monitoring:** https://redis.io/topics/admin

---

**Last Updated:** 2026-04-09

**Version:** 1.0

**Maintainer:** DevOps Team