ATOM Documentation

Redis Monitoring & Alerting Guide

Overview

This guide shows how to monitor your Upstash Redis command usage and set up alerts, so you can react before the database is suspended for exceeding its daily command limit.

---

Quick Setup

1. Test the Monitor Script

cd backend-saas
python3 scripts/monitor_redis_usage.py

**Expected Output:**

Redis Usage Monitor Starting...
Command Limit: 10,000/day
Warning Threshold: 70.0%
Critical Threshold: 90.0%
Redis Usage Summary:
  Total Commands: 1,234
  Current Rate: 2.5 ops/sec
  Predicted Daily: 5,400
  Keys: 45
  Hit Rate: 85.3%

2. Set Up Alerts (Optional)

Send alerts to Slack or Discord when usage spikes:

# For Slack
export ALERT_WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

# For Discord
export ALERT_WEBHOOK_URL="https://discord.com/api/webhooks/YOUR/WEBHOOK/URL"

# Run monitor
python3 scripts/monitor_redis_usage.py
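
The monitor script's own alert code is not shown in this guide. As a rough sketch of what posting to these webhooks involves: Slack incoming webhooks accept a JSON body with a `text` field, while Discord webhooks expect `content`. The helper names `build_payload` and `send_alert` and the `User-Agent` value are hypothetical, not taken from the script.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse


def build_payload(webhook_url: str, message: str) -> dict:
    """Pick the JSON field each service expects for a plain-text message."""
    host = urlparse(webhook_url).hostname or ""
    if host.endswith("discord.com"):
        return {"content": message}  # Discord webhooks read "content"
    return {"text": message}  # Slack incoming webhooks read "text"


def send_alert(message: str) -> None:
    """POST the message to ALERT_WEBHOOK_URL; do nothing if it is unset."""
    url = os.environ.get("ALERT_WEBHOOK_URL")
    if not url:
        return
    body = json.dumps(build_payload(url, message)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "User-Agent": "redis-monitor/1.0",  # hypothetical identifier
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Calling `send_alert("Redis usage at 72% of daily limit")` with the variable exported would post one message; with the variable unset it returns without doing anything, which matches the "Optional" nature of this step.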

3. Schedule Automatic Checks

Pick one of the options below.

Option A: Cron

# Edit crontab
crontab -e

# Add these lines:
# Check Redis usage every 15 minutes
*/15 * * * * cd /app/backend-saas && /usr/bin/python3 scripts/monitor_redis_usage.py >> /var/log/redis_monitor.log 2>&1

Option B: Systemd Timer

Create /etc/systemd/system/redis-monitor.service:

[Unit]
Description=Redis Usage Monitor
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/app/backend-saas
ExecStart=/usr/bin/python3 scripts/monitor_redis_usage.py
EnvironmentFile=/etc/environment

Create /etc/systemd/system/redis-monitor.timer:

[Unit]
Description=Run Redis monitor every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target

Enable the timer:

sudo systemctl daemon-reload
sudo systemctl enable redis-monitor.timer
sudo systemctl start redis-monitor.timer

Option C: Fly.io Cron Job

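Add to the fly.toml of a dedicated monitor app (not your main app, because `cmd` replaces the container's default command). This fragment only sets the command and the machine size; it does not define a schedule by itself, so the run interval has to come from Fly's Machine scheduling or an external trigger: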

[experimental]
  cmd = ["python3", "scripts/monitor_redis_usage.py"]

[[vm]]
  cpu_kind = "shared"
  cpus = 1
  memory_mb = 256

---

Understanding the Metrics

Total Commands

  • **What:** Total Redis operations since database creation
  • **Good:** Increasing slowly (10-100 per second)
  • **Bad:** Increasing rapidly (1000+ per second)
  • **Critical:** Approaching daily limit (10K for free tier)

Ops Per Sec

  • **What:** Current operation rate
  • **Normal:** 1-10 ops/sec for inactive system
  • **High:** 100+ ops/sec during heavy usage
  • **Dangerous:** 1000+ ops/sec (cache bug or infinite loop)
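
To put these rates in context, it can help to convert a daily command limit into a sustained per-second budget. The sketch below uses the 10,000/day free-tier figure quoted in this guide; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustainable_rate(daily_limit: int) -> float:
    """Average ops/sec that uses exactly the daily limit over 24 hours."""
    return daily_limit / SECONDS_PER_DAY


def seconds_until_limit(daily_limit: int, ops_per_sec: float) -> float:
    """How long a constant rate takes to use up the whole daily limit."""
    return daily_limit / ops_per_sec


print(f"{sustainable_rate(10_000):.3f} ops/sec")  # 0.116 ops/sec
print(f"{seconds_until_limit(10_000, 10) / 60:.1f} minutes")  # 16.7 minutes
```

At a steady 10 ops/sec a 10K/day limit lasts under 17 minutes, so the rate bands above are best read relative to the limit of the plan you are on.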

Predicted Daily Usage

  • **What:** Expected total commands by end of day
  • **Formula:** current + (rate × seconds_remaining)
  • **Action:** If predicted > limit, investigate immediately
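
The formula can be sketched as follows. This assumes the daily counter resets at midnight in the timezone of the timestamp passed in (UTC in the example); the actual reset time used by the monitor script or by Upstash is not stated in this guide.

```python
from datetime import datetime, timedelta, timezone


def seconds_remaining_today(now: datetime) -> float:
    """Seconds from `now` until the next midnight in the same timezone."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


def predict_daily(current: int, ops_per_sec: float, now: datetime) -> float:
    """current + (rate x seconds_remaining), as in the formula above."""
    return current + ops_per_sec * seconds_remaining_today(now)


# 1,234 commands so far, 2.5 ops/sec, 30 minutes before midnight UTC
now = datetime(2026, 4, 9, 23, 30, tzinfo=timezone.utc)
print(predict_daily(1_234, 2.5, now))  # 5734.0
```

Because the prediction extrapolates the current rate linearly, a short burst early in the day can produce a large predicted total; it is a trigger to investigate, not a precise forecast.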

Hit Rate

  • **What:** Percentage of cache hits vs misses
  • **Good:** 80%+ (caching working well)
  • **Bad:** <50% (caching ineffective)
  • **Terrible:** <20% (no caching or wrong cache keys)
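
A hit rate is simply hits divided by total lookups. Redis reports `keyspace_hits` and `keyspace_misses` in its `INFO stats` output; whether the monitor script reads exactly those counters is an assumption. The function names and the "fair" label for the unnamed 50-80% range are placeholders.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Percentage of lookups served from cache; 0.0 when there were none."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total


def classify_hit_rate(rate: float) -> str:
    """Map a hit rate (in percent) onto the bands listed above."""
    if rate >= 80:
        return "good"
    if rate < 20:
        return "terrible"
    if rate < 50:
        return "bad"
    return "fair"  # 50-80% is not named in this guide; placeholder label


rate = hit_rate(hits=853, misses=147)
print(f"{rate:.1f}% -> {classify_hit_rate(rate)}")  # 85.3% -> good
```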

---

Alert Thresholds

Warning (70% of limit)

  • **Triggers:** At 7,000 commands (for 10K limit)
  • **Action:** Investigate what's causing high usage
  • **Check:** Are you testing? Load testing? Background jobs?

Critical (90% of limit)

  • **Triggers:** At 9,000 commands (for 10K limit)
  • **Action:** Immediate investigation required
  • **Consequence:** Database will be suspended soon
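
The two thresholds amount to a simple comparison against the limit. A minimal sketch, with the 70% and 90% defaults taken from this section and a hypothetical function name:

```python
def alert_level(
    commands_used: int,
    daily_limit: int = 10_000,
    warning: float = 0.70,
    critical: float = 0.90,
) -> str:
    """Return "ok", "warning", or "critical" for the current usage."""
    usage = commands_used / daily_limit
    if usage >= critical:
        return "critical"
    if usage >= warning:
        return "warning"
    return "ok"


print(alert_level(6_999))  # ok
print(alert_level(7_000))  # warning
print(alert_level(9_000))  # critical
```

Checking the critical threshold first keeps the function from reporting "warning" for usage that is already past 90%.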

---

Troubleshooting High Usage

Step 1: Check Current Rate

python3 scripts/monitor_redis_usage.py

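If the reported rate (Current Rate in the output) stays above 100 ops/sec, something is issuing commands far faster than normal traffic would.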

Step 2: Find the Source

Check for Tight Loops

grep -rn "asyncio.sleep(0" backend-saas/core --include="*.py"

Look for sub-second sleep intervals in loops that touch Redis:

  • sleep(0.01) = 100 polls/sec
  • sleep(0.1) = 10 polls/sec
  • sleep(0.5) = 2 polls/sec
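
To see what these intervals mean against a daily limit, the polling rate can be scaled to a full day. This sketch assumes each poll issues one Redis command (the `commands_per_poll` parameter is hypothetical) and ignores the time the poll itself takes.

```python
SECONDS_PER_DAY = 86_400


def polls_per_day(sleep_seconds: float, commands_per_poll: int = 1) -> float:
    """Commands issued per day by a loop that sleeps between polls."""
    return SECONDS_PER_DAY / sleep_seconds * commands_per_poll


for interval in (0.01, 0.1, 0.5, 60):
    print(f"sleep({interval}) -> {polls_per_day(interval):,.0f} commands/day")
```

Even `sleep(0.5)` works out to 172,800 commands per day for a single loop, far above a 10K/day limit, while `sleep(60)` stays at 1,440.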

Check Cache Usage

grep -rn "cache.get\|redis_cache.get" backend-saas/api --include="*.py" | wc -l

If count > 50, you might be over-caching.

Check Background Workers

ps aux | grep -E "worker|scheduler|monitor"

Look for multiple instances of the same worker.

Step 3: Fix the Issue

Common Fixes

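  1. **Add Caching**

```python
# ✅ GOOD: Uses cache
# `cache` and `db` are the application's cache client and database handle.
def get_data(key):
    # Try the cache first; compare with None so cached falsy values still count as hits
    cached = cache.get(f"data:{key}")
    if cached is not None:
        return cached

    # Cache miss: load from the database and keep the result for 1 hour
    data = db.query("SELECT * FROM data WHERE id = ?", key)
    cache.set(f"data:{key}", data, ttl=3600)
    return data
```

  2. **Reduce Polling Frequency**

```python
import asyncio

# ✅ GOOD: Polls every minute
async def poll_status():
    while True:
        check_status()
        await asyncio.sleep(60)  # one check per minute
```

  3. **Batch Operations**

```python
# ✅ GOOD: 1 query
# One "?" per ID (a single "?" cannot bind a whole list); assumes db.query takes parameters positionally.
placeholders = ", ".join("?" for _ in item_ids)
items = db.query(f"SELECT * FROM items WHERE id IN ({placeholders})", *item_ids)
```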

---

Upstash Pricing Tiers

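| Tier  | Price     | Commands/Day | Commands/Month | Use Case                    |
|-------|-----------|--------------|----------------|-----------------------------|
| Free  | $0        | 10,000       | 300,000        | Development                 |
| Basic | $0.20/day | 10,000       | 300,000        | Production (low traffic)    |
| Pro   | $0.50/day | 100,000      | 3,000,000      | Production (medium traffic) |
| Scale | $2.00/day | 1,000,000    | 30,000,000     | Production (high traffic)   |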

**Recommendation:** Start with **Pro tier** ($0.50/day) for production to avoid suspensions.

---

Emergency Procedures

If Redis Gets Suspended

  1. **Check Status**
  2. **Contact Upstash Support**
     • Email: support@upstash.com
     • Subject: "Database Suspension - {Your Database ID}"
     • Message: "My Redis database was suspended for exceeding limits. I've fixed the bug causing excessive reads. Can you unsuspend it?"
  3. **Upgrade Your Plan**
     • Log in to Upstash console
     • Navigate to your database
     • Upgrade to higher tier
  4. **Verify Fix is Deployed**
  5. **Monitor Closely**
     • Run monitor script every 5 minutes
     • Watch for usage spikes
     • Check logs for errors

---

Best Practices

✅ DO

  1. **Cache Frequently Accessed Data**
     • Tenant lookups (TTL: 1 hour)
     • User sessions (TTL: 30 minutes)
     • Configuration (TTL: 15 minutes)
  2. **Use Appropriate TTLs**
     • Static data: 1 hour+
     • Dynamic data: 5-15 minutes
     • Session data: 30 minutes
  3. **Monitor Usage**
     • Check stats daily
     • Set up alerts
     • Review trends weekly
  4. **Test in Staging**
     • Load test in staging environment
     • Monitor Redis usage during tests
     • Catch issues before production
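
One way to keep TTLs consistent is to define them in a single table and refuse to cache anything that has no entry. The sketch below uses the TTL values listed above; the names `TTL_SECONDS`, `ttl_for`, and `cache_key` are hypothetical and not part of the codebase described here.

```python
# TTLs in seconds, taken from the list above
TTL_SECONDS = {
    "tenant": 60 * 60,   # tenant lookups: 1 hour
    "session": 30 * 60,  # user sessions: 30 minutes
    "config": 15 * 60,   # configuration: 15 minutes
}


def ttl_for(kind: str) -> int:
    """Return the TTL for a data kind; unknown kinds fail loudly."""
    try:
        return TTL_SECONDS[kind]
    except KeyError:
        raise ValueError(f"no TTL configured for {kind!r}") from None


def cache_key(kind: str, identifier: str) -> str:
    """Build a namespaced key such as "tenant:42"."""
    return f"{kind}:{identifier}"


print(cache_key("tenant", "42"), ttl_for("tenant"))  # tenant:42 3600
```

With a cache client that accepts a `ttl` argument, as in the caching example in this guide, a write would look like `cache.set(cache_key("tenant", tenant_id), data, ttl=ttl_for("tenant"))`.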

❌ DON'T

  1. **Cache Everything**
     • Don't cache rapidly changing data
     • Don't cache large objects (>1MB)
     • Don't cache without TTL
  2. **Poll Aggressively**
     • Avoid sub-second polling
     • Use webhooks instead of polling
     • Batch operations when possible
  3. **Ignore Limits**
     • Know your provider's limits
     • Set alerts at 70% of limit
     • Upgrade before hitting limits
  4. **Deploy Without Testing**
     • Test in staging first
     • Monitor Redis usage during deployment
     • Have rollback plan ready

---

Additional Resources

  • **Upstash Documentation:** https://upstash.com/docs
  • **Upstash Pricing:** https://upstash.com/pricing
  • **Redis Best Practices:** https://redis.io/topics/lru-cache
  • **Redis Monitoring:** https://redis.io/topics/admin

---

**Last Updated:** 2026-04-09

**Version:** 1.0

**Maintainer:** DevOps Team