Redis Monitoring & Alerting Guide
Overview
This guide shows how to set up monitoring for your Upstash Redis usage to prevent future suspensions.
---
Quick Setup
1. Test the Monitor Script
```bash
cd backend-saas
python3 scripts/monitor_redis_usage.py
```
**Expected Output:**
```
Redis Usage Monitor Starting...
Command Limit: 10,000/day
Warning Threshold: 70.0%
Critical Threshold: 90.0%
Redis Usage Summary:
Total Commands: 1,234
Current Rate: 2.5 ops/sec
Predicted Daily: 5,400
Keys: 45
Hit Rate: 85.3%
```
2. Set Up Alerts (Optional)
Send alerts to Slack or Discord when usage spikes:
```bash
# For Slack
export ALERT_WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
# For Discord
export ALERT_WEBHOOK_URL="https://discord.com/api/webhooks/YOUR/WEBHOOK/URL"
# Run monitor
python3 scripts/monitor_redis_usage.py
```
3. Schedule Automatic Checks
Option A: Cron (Recommended)
```bash
# Edit crontab
crontab -e
# Add these lines:
# Check Redis usage every 15 minutes
*/15 * * * * cd /app/backend-saas && /usr/bin/python3 scripts/monitor_redis_usage.py >> /var/log/redis_monitor.log 2>&1
```
Option B: Systemd Timer
Create /etc/systemd/system/redis-monitor.service:
```ini
[Unit]
Description=Redis Usage Monitor
After=network.target
[Service]
Type=oneshot
WorkingDirectory=/app/backend-saas
ExecStart=/usr/bin/python3 scripts/monitor_redis_usage.py
EnvironmentFile=/etc/environment
```
Create /etc/systemd/system/redis-monitor.timer:
```ini
[Unit]
Description=Run Redis monitor every 15 minutes
[Timer]
OnCalendar=*:0/15
Persistent=true
[Install]
WantedBy=timers.target
```
Enable the timer:
```bash
sudo systemctl daemon-reload
sudo systemctl enable redis-monitor.timer
sudo systemctl start redis-monitor.timer
```
Option C: Fly.io Cron Job
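Add to your fly.toml. This only sets the command the Machine runs when it starts; the 15-minute schedule itself has to be configured separately on Fly.io: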
```toml
[experimental]
cmd = ["python3", "scripts/monitor_redis_usage.py"]
[[vm]]
cpu_kind = "shared"
cpus = 1
memory_mb = 256
```
---
Understanding the Metrics
Total Commands
- **What:** Total Redis operations since database creation
- **Good:** Increasing slowly (10-100 per second)
- **Bad:** Increasing rapidly (1000+ per second)
- **Critical:** Approaching daily limit (10K for free tier)
Ops Per Sec
- **What:** Current operation rate
- **Normal:** 1-10 ops/sec for inactive system
- **High:** 100+ ops/sec during heavy usage
- **Dangerous:** 1000+ ops/sec (cache bug or infinite loop)
Predicted Daily Usage
- **What:** Expected total commands by end of day
- **Formula:**
```
current + (rate × seconds_remaining)
```
- **Action:** If predicted > limit, investigate immediately
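The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the monitor script's actual code: the function names are hypothetical, `current` is taken to mean the commands used so far today, and the daily quota is assumed to reset at 00:00 UTC.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_midnight_utc(now: datetime) -> float:
    """Seconds left in the current UTC day (assumption: quota resets at 00:00 UTC)."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


def predict_daily_usage(current: int, rate_per_sec: float, now: datetime) -> float:
    """Apply the guide's formula: current + (rate × seconds_remaining)."""
    return current + rate_per_sec * seconds_until_midnight_utc(now)


# Example: 1,234 commands used so far, 0.1 ops/sec, 6 hours left in the day.
now = datetime(2026, 4, 9, 18, 0, tzinfo=timezone.utc)
predicted = predict_daily_usage(current=1_234, rate_per_sec=0.1, now=now)
print(f"Predicted daily: {predicted:,.0f}")
```

With six hours (21,600 seconds) remaining, a rate of 0.1 ops/sec adds about 2,160 commands to the current count, which stays well under a 10,000/day limit.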
Hit Rate
- **What:** Percentage of cache hits vs misses
- **Good:** 80%+ (caching working well)
- **Bad:** <50% (caching ineffective)
- **Terrible:** <20% (no caching or wrong cache keys)
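As a rough sketch of how the hit rate can be derived, the snippet below assumes the counters come from the `keyspace_hits` and `keyspace_misses` fields of Redis `INFO stats`; whether the monitor script reads them this way is not stated here. The function names and the "moderate" label for the 50-80% band are illustrative assumptions.

```python
def hit_rate(keyspace_hits: int, keyspace_misses: int) -> float:
    """Cache hit rate as a percentage; 0.0 when no lookups were recorded."""
    total = keyspace_hits + keyspace_misses
    if total == 0:
        return 0.0
    return 100.0 * keyspace_hits / total


def classify_hit_rate(pct: float) -> str:
    """Map a percentage onto the bands listed above."""
    if pct >= 80:
        return "good"
    if pct < 20:
        return "terrible"
    if pct < 50:
        return "bad"
    return "moderate"  # assumption: the guide does not name the 50-80% band


pct = hit_rate(keyspace_hits=853, keyspace_misses=147)
print(f"Hit Rate: {pct:.1f}% ({classify_hit_rate(pct)})")
```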
---
Alert Thresholds
Warning (70% of limit)
- **Triggers:** At 7,000 commands (for 10K limit)
- **Action:** Investigate what's causing high usage
- **Check:** Are you testing? Load testing? Background jobs?
Critical (90% of limit)
- **Triggers:** At 9,000 commands (for 10K limit)
- **Action:** Immediate investigation required
- **Consequence:** Database will be suspended soon
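The threshold logic and the webhook call could look roughly like the sketch below. It is an illustration under assumptions rather than the monitor script's implementation: the function names are hypothetical, and it relies on Slack incoming webhooks reading a `text` field while Discord webhooks read `content`.

```python
import json
import os
import urllib.request

WARNING_PCT = 70.0
CRITICAL_PCT = 90.0


def classify_usage(commands_used: int, daily_limit: int) -> str:
    """Return "ok", "warning", or "critical" for the current usage."""
    pct = 100.0 * commands_used / daily_limit
    if pct >= CRITICAL_PCT:
        return "critical"
    if pct >= WARNING_PCT:
        return "warning"
    return "ok"


def build_payload(webhook_url: str, message: str) -> dict:
    """Slack expects {"text": ...}; Discord expects {"content": ...}."""
    key = "content" if "discord.com" in webhook_url else "text"
    return {key: message}


def send_alert(webhook_url: str, message: str) -> int:
    """POST the alert as JSON and return the HTTP status code."""
    data = json.dumps(build_payload(webhook_url, message)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=data,
        # Explicit User-Agent as a precaution; some endpoints reject the default one.
        headers={"Content-Type": "application/json", "User-Agent": "redis-monitor-sketch"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


level = classify_usage(commands_used=7_000, daily_limit=10_000)
print(level)
url = os.environ.get("ALERT_WEBHOOK_URL")
if url and level != "ok":
    send_alert(url, f"Redis usage is at {level} level: 7,000 of 10,000 commands")
```

Nothing is sent unless `ALERT_WEBHOOK_URL` is set, so the snippet is safe to run locally.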
---
Troubleshooting High Usage
Step 1: Check Current Rate
```bash
python3 scripts/monitor_redis_usage.py
```
If ops_per_sec > 100, you have a problem.
Step 2: Find the Source
Check for Tight Loops
```bash
grep -rn "asyncio.sleep(0" backend-saas/core --include="*.py"
```
Look for:
- sleep(0.01) = 100 polls/sec
- sleep(0.1) = 10 polls/sec
- sleep(0.5) = 2 polls/sec
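To see why short sleeps matter, the arithmetic below converts a sleep interval into polls per day and compares it with a daily command budget. It is a back-of-the-envelope sketch that assumes each poll issues exactly one Redis command and that the loop body itself takes negligible time; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def polls_per_day(sleep_seconds: float) -> float:
    """Polls per day for a loop that sleeps `sleep_seconds` between polls."""
    return SECONDS_PER_DAY / sleep_seconds


def min_sleep_for_budget(daily_budget: int) -> float:
    """Shortest sleep that keeps a single polling loop within the daily budget."""
    return SECONDS_PER_DAY / daily_budget


for interval in (0.01, 0.1, 0.5, 60):
    print(f"sleep({interval}) -> {polls_per_day(interval):,.0f} polls/day")

print(f"Minimum sleep for 10,000/day: {min_sleep_for_budget(10_000):.2f} s")
```

Under these assumptions, `sleep(0.01)` alone produces about 8.64 million polls per day, while a single loop has to sleep at least 8.64 seconds per poll to stay within 10,000 commands per day.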
Check Cache Usage
```bash
grep -rn "cache.get\|redis_cache.get" backend-saas/api --include="*.py" | wc -l
```
If count > 50, you might be over-caching.
Check Background Workers
```bash
ps aux | grep -E "worker|scheduler|monitor"
```
Look for multiple instances of the same worker.
Step 3: Fix the Issue
Common Fixes
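- **Add Caching**
```python
# ✅ GOOD: Uses cache
def get_data(key):
    # Try the cache first so repeated lookups skip the database
    cached = cache.get(f"data:{key}")
    if cached is not None:  # assumes cache.get returns None on a miss
        return cached
    # Cache miss: query once, then keep the result for an hour
    data = db.query("SELECT * FROM data WHERE id = ?", key)
    cache.set(f"data:{key}", data, ttl=3600)
    return data
```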
- **Reduce Polling Frequency**
```python
import asyncio

# ✅ GOOD: Polls every minute
async def poll_status():
    while True:
        check_status()
        await asyncio.sleep(60)
```
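- **Batch Operations**
```python
# ✅ GOOD: 1 query for all IDs (one placeholder per ID, passed positionally)
placeholders = ", ".join("?" for _ in item_ids)
items = db.query(f"SELECT * FROM items WHERE id IN ({placeholders})", *item_ids)
```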
---
Upstash Pricing Tiers
| Tier | Price | Commands/Day | Commands/Month | Use Case |
|---|---|---|---|---|
| Free | $0 | 10,000 | 300,000 | Development |
| Basic | $0.20/day | 10,000 | 300,000 | Production (low traffic) |
| Pro | $0.50/day | 100,000 | 3,000,000 | Production (medium traffic) |
| Scale | $2.00/day | 1,000,000 | 30,000,000 | Production (high traffic) |
**Recommendation:** Start with **Pro tier** ($0.50/day) for production to avoid suspensions.
---
Emergency Procedures
If Redis Gets Suspended
- **Check Status**
- **Contact Upstash Support**
- Email: support@upstash.com
- Subject: "Database Suspension - {Your Database ID}"
- Message: "My Redis database was suspended for exceeding limits. I've fixed the bug causing excessive reads. Can you unsuspend it?"
- **Upgrade Your Plan**
- Log in to Upstash console
- Navigate to your database
- Upgrade to higher tier
- **Verify Fix is Deployed**
- **Monitor Closely**
- Run monitor script every 5 minutes
- Watch for usage spikes
- Check logs for errors
---
Best Practices
✅ DO
- **Cache Frequently Accessed Data**
- Tenant lookups (TTL: 1 hour)
- User sessions (TTL: 30 minutes)
- Configuration (TTL: 15 minutes)
- **Use Appropriate TTLs**
- Static data: 1 hour+
- Dynamic data: 5-15 minutes
- Session data: 30 minutes
- **Monitor Usage**
- Check stats daily
- Set up alerts
- Review trends weekly
- **Test in Staging**
- Load test in staging environment
- Monitor Redis usage during tests
- Catch issues before production
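The TTL guidance above can be kept in one place so that every cache write picks a consistent expiry. The sketch below is only one possible layout: the `prefix:id` key convention, the prefix names, and the `ttl_for` helper are assumptions for illustration.

```python
# TTLs in seconds, taken from the recommendations listed above.
TTL_SECONDS = {
    "tenant": 60 * 60,   # tenant lookups: 1 hour
    "session": 30 * 60,  # user sessions: 30 minutes
    "config": 15 * 60,   # configuration: 15 minutes
}

# Fallback for unlisted prefixes: lower end of the 5-15 minute range for dynamic data.
DEFAULT_TTL_SECONDS = 5 * 60


def ttl_for(key: str) -> int:
    """Pick a TTL from the key's prefix, e.g. "tenant:42" -> 3600."""
    prefix = key.split(":", 1)[0]
    return TTL_SECONDS.get(prefix, DEFAULT_TTL_SECONDS)


for key in ("tenant:42", "session:abc", "config:flags", "report:2026-04"):
    print(f"{key}: {ttl_for(key)} s")
```

Because the fallback is never "no TTL", a key with an unknown prefix still expires, which matches the advice not to cache without a TTL.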
❌ DON'T
- **Cache Everything**
- Don't cache rapidly changing data
- Don't cache large objects (>1MB)
- Don't cache without TTL
- **Poll Aggressively**
- Avoid sub-second polling
- Use webhooks instead of polling
- Batch operations when possible
- **Ignore Limits**
- Know your provider's limits
- Set alerts at 70% of limit
- Upgrade before hitting limits
- **Deploy Without Testing**
- Test in staging first
- Monitor Redis usage during deployment
- Have rollback plan ready
---
Additional Resources
- **Upstash Documentation:** https://upstash.com/docs
- **Upstash Pricing:** https://upstash.com/pricing
- **Redis Best Practices:** https://redis.io/topics/lru-cache
- **Redis Monitoring:** https://redis.io/topics/admin
---
**Last Updated:** 2026-04-09
**Version:** 1.0
**Maintainer:** DevOps Team