Hybrid HASH + Local Cache Quota Manager - Implementation Guide
**Date**: 2026-04-09
**Goal**: Replace quota manager with hybrid approach (HASH storage + local caching)
**Scale**: Up to 10M tenants
**Redis GET Reduction**: 95% (assuming an 80% local cache hit rate)
---
What We're Implementing
Architecture
┌─────────────────────────────────────────────────┐
│  Application Layer                              │
│  (10,000+ requests per second)                  │
└────────────────────────┬────────────────────────┘
                         │
                         │ check_quota()
                         ▼
┌─────────────────────────────────────────────────┐
│  Hybrid Quota Manager                           │
│                                                 │
│  1. Check Local Cache (60s TTL)                 │
│     ┌─────────────────────────────┐             │
│     │ Cache Hit (80% of requests) │             │
│     │ Return: allowed=True/False  │             │
│     └─────────────────────────────┘             │
│                                                 │
│  2. Cache Miss (20% of requests)                │
│     ┌─────────────────────────────┐             │
│     │ Check Redis HASH            │             │
│     │ HGET quota:hash:{date}      │             │
│     │ Return: allowed=True/False  │             │
│     └─────────────────────────────┘             │
│                                                 │
│  3. Update Local Cache                          │
│     Store result for 60 seconds                 │
│                                                 │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│  Redis HASH Storage                             │
│                                                 │
│  Key: quota:hash:2026-04-09                     │
│  Fields:                                        │
│    - tenant:abc123...: "150"                    │
│    - tenant:def456...: "75"                     │
│    - tenant:ghi789...: "exceeded"               │
│                                                 │
│  Operations:                                    │
│    - HGET (O(1)) - Check single tenant          │
│    - HINCRBY (O(1)) - Atomic increment          │
│    - HGETALL (O(N)) - Get all tenants           │
│                                                 │
└─────────────────────────────────────────────────┘
Benefits
- **Scalability**: Handles 10M+ tenants
- **Performance**: 80% cache hit rate (sub-millisecond)
- **Efficiency**: Single Redis HASH instead of N keys
- **Real-time**: No 30-second delay
- **Monitoring**: Built-in stats and health checks
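The three steps in the diagram can be sketched roughly as follows. This is a minimal, synchronous illustration, not the code in `core/hybrid_quota_manager.py`: the class name `HybridQuotaSketch`, the injected `hget` callable (a stand-in for a Redis HGET that returns decoded strings), and the `limit` argument are all assumptions made for the example.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class HybridQuotaSketch:
    """Local TTL cache in front of a Redis HASH lookup (illustrative only)."""

    def __init__(
        self,
        hget: Callable[[str, str], Optional[str]],
        ttl_seconds: float = 60.0,
        max_size: int = 10_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._hget = hget  # assumed stand-in for Redis HGET (str or None)
        self._ttl = ttl_seconds
        self._max_size = max_size
        self._clock = clock
        # (date, tenant_id) -> (expires_at, allowed), oldest entry first
        self._cache: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def check_quota(self, tenant_id: str, date: str, limit: int) -> bool:
        key = (date, tenant_id)
        now = self._clock()

        # 1. Local cache: serve the stored decision while it is still fresh.
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]

        # 2. Cache miss: read the tenant's field from the daily HASH.
        self.misses += 1
        raw = self._hget(f"quota:hash:{date}", f"tenant:{tenant_id}")
        allowed = raw != "exceeded" and int(raw or 0) < limit

        # 3. Store the decision locally, evicting the oldest entry when full.
        self._cache[key] = (now + self._ttl, allowed)
        self._cache.move_to_end(key)
        if len(self._cache) > self._max_size:
            self._cache.popitem(last=False)
        return allowed
```

One consequence of this design is that a cached decision can be up to one TTL old, so a tenant may briefly run past its limit before the next Redis read; a shorter TTL trades Redis calls for fresher decisions.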
---
Implementation Checklist
Phase 1: Code Implementation (DONE ✅)
- [x] Create `core/hybrid_quota_manager.py`
- [x] Update `core/cache.py` to use `HybridQuotaManager`
- [x] Create `api/routes/admin_quota.py` for monitoring
- [x] Add cache statistics tracking
- [x] Implement cache warming and eviction
Phase 2: Testing
# 1. Test locally
cd backend-saas
python3 -c "
import asyncio
from core.cache import redis_cache

async def test():
    # Test quota checking
    result = await redis_cache.quota_manager.check_quota('test-tenant-123', 'free')
    print(f'✅ Quota check: {result}')
    # Test quota recording
    await redis_cache.quota_manager.record_command('test-tenant-123', 'free')
    print('✅ Quota recorded')
    # Test cache stats
    stats = await redis_cache.quota_manager.get_cache_stats()
    print(f'✅ Cache stats: {stats}')
    # Test get all usage
    all_usage = await redis_cache.quota_manager.get_all_usage('2026-04-09', limit=10)
    print(f'✅ All usage: {all_usage}')

asyncio.run(test())
"
Phase 3: Deploy
# 1. Deploy to Fly.io
fly deploy -a atom-saas
# 2. Monitor logs (fly logs streams continuously; press Ctrl+C to stop)
fly logs -a atom-saas | grep -i "quota\|hybrid"
# Expected output:
# ✅ Using HybridQuotaManager (HASH storage + local cache)
# ✅ Created direct Redis client for quota manager
Phase 4: Verify
# 1. Check health endpoint
curl https://app.atomagentos.com/api/admin/quota/health
# Expected response:
{
  "status": "healthy",
  "checks": {
    "redis_connection": "ok",
    "cache_performance": {
      "hit_rate_percent": 85.2,
      "status": "good"
    },
    "hash_storage": {
      "tenant_count": 123,
      "estimated_size_mb": 0.01
    }
  }
}
# 2. Check cache stats
curl https://app.atomagentos.com/api/admin/quota/stats
# 3. Check quota usage (quote the URL so the shell does not interpret "?")
curl "https://app.atomagentos.com/api/admin/quota/usage?limit=100"
---
Configuration Options
Environment Variables
Add to .env or Fly.io secrets:
# Quota cache configuration
QUOTA_CACHE_TTL=60 # Local cache TTL in seconds (default: 60)
QUOTA_CACHE_SIZE=10000 # Max tenants to cache locally (default: 10K)
# Enable/disable quota system
ENABLE_REDIS_QUOTA=true # Enable quota enforcement (default: true)
Tuning Guidelines
**For High-Volume Applications** (1000+ ops/sec):
QUOTA_CACHE_TTL=120 # 2 minutes (reduces Redis calls)
QUOTA_CACHE_SIZE=50000 # 50K tenants in local cache
**For Memory-Constrained Environments** (2GB RAM):
QUOTA_CACHE_TTL=30 # 30 seconds (fresher data)
QUOTA_CACHE_SIZE=1000 # 1K tenants in local cache
---
Monitoring
Key Metrics to Track
- **Cache Hit Rate** (should be >80%)
curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.cache_performance.hit_rate_percent'
- **Redis HASH Size** (warn if >100MB)
curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.hash_storage.estimated_size_mb'
- **Total Tenants** (monitor growth)
curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.hash_storage.tenant_count'
Dashboard Queries
**Cache Performance Over Time**:
# Redis keeps no history of these stats, so log them and search the logs instead
# Check logs for "Cache warming complete" messages
fly logs -a atom-saas --json | grep "hit_rate_percent"
**Tenant Growth**:
# Daily tenant count
curl https://app.atomagentos.com/api/admin/quota/usage | jq '.statistics.total_tenants'
---
Deployment Steps
Step 1: Verify Implementation
# Check files exist
ls -la backend-saas/core/hybrid_quota_manager.py
ls -la backend-saas/api/routes/admin_quota.py
# Verify cache.py imports it
grep -n "HybridQuotaManager" backend-saas/core/cache.py
Step 2: Test Locally
cd backend-saas
# Run quota test
python3 -c "
import asyncio
from core.cache import redis_cache

async def test():
    # Warm cache with 10 tenants
    tenants = [f'tenant-{i}' for i in range(10)]
    await redis_cache.quota_manager.warm_cache(tenants, 'free')
    # Get stats
    stats = await redis_cache.quota_manager.get_cache_stats()
    print(f'Cache hits: {stats[\"cache_hits\"]}')
    print(f'Cache misses: {stats[\"cache_misses\"]}')
    print(f'Hit rate: {stats[\"hit_rate_percent\"]}%')

asyncio.run(test())
"
Step 3: Deploy to Staging (if available)
# Deploy to staging environment
fly deploy -a atom-saas-staging
Step 4: Deploy to Production
# Deploy to production
fly deploy -a atom-saas
# Monitor deployment (fly logs streams continuously; press Ctrl+C to stop)
fly logs -a atom-saas | grep -i "hybrid\|quota"
Step 5: Verify Deployment
# 1. Check health
curl https://app.atomagentos.com/api/admin/quota/health
# 2. Check stats
curl https://app.atomagentos.com/api/admin/quota/stats
# 3. Check quota usage
curl https://app.atomagentos.com/api/admin/quota/usage
---
Expected Results
Performance Metrics
| Metric | Target | Acceptable | Poor |
|---|---|---|---|
| **Cache hit rate** | >80% | 60-80% | <60% |
| **Quota latency** | <1ms | <5ms | >10ms |
| **HASH size** | <10MB | 10-100MB | >100MB |
| **Redis GETs** | ~110K/day | ~500K/day | >1M/day |
Scalability Milestones
| Tenants | HASH Size | Local Cache | Recommended |
|---|---|---|---|
| **1K** | 0.1MB | 1K (100%) | Single HASH |
| **10K** | 1MB | 10K (100%) | Single HASH |
| **100K** | 10MB | 10K (10%) | Hybrid (cache helps) |
| **1M** | 100MB | 10K (1%) | Hybrid (still works) |
| **10M** | 1GB | 10K (0.1%) | Hybrid + consider sharding |
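The HASH sizes in this table are consistent with roughly 100 bytes per tenant field, and the coverage column with a fixed 10K-entry local cache. The sketch below reproduces both columns under those assumptions; the 100-byte figure is inferred from the table rather than measured, and the function names are made up for the example.

```python
def estimate_hash_size_mb(tenants: int, bytes_per_field: int = 100) -> float:
    """Rough HASH size in MB, assuming a fixed cost per tenant field."""
    return tenants * bytes_per_field / 1_000_000


def local_cache_coverage_percent(tenants: int, cache_size: int = 10_000) -> float:
    """Share of all tenants that fit in the local cache at once."""
    if tenants <= 0:
        return 0.0
    return 100 * min(cache_size, tenants) / tenants


for tenants in (1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    size = estimate_hash_size_mb(tenants)
    coverage = local_cache_coverage_percent(tenants)
    print(f"{tenants:>10,} tenants: ~{size:g} MB HASH, {coverage:g}% cacheable")
```

Low coverage at 1M or more tenants does not by itself imply a low hit rate: the cache only needs to hold the tenants that are active within one TTL window.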
---
Troubleshooting
Issue: Low cache hit rate (<60%)
**Diagnose**:
curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.cache_performance'
**Possible Causes**:
- Cache TTL too short (increase `QUOTA_CACHE_TTL`)
- Cache size too small (increase `QUOTA_CACHE_SIZE`)
- High tenant churn (normal; usually acceptable)
**Fix**:
# Update environment variables
fly secrets set QUOTA_CACHE_TTL=120 -a atom-saas
fly secrets set QUOTA_CACHE_SIZE=50000 -a atom-saas
fly deploy -a atom-saas
Issue: HASH size growing too fast (>100MB)
**Diagnose**:
curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.hash_storage'
**Possible Causes**:
- Not expiring old HASH keys (check TTL logic)
- Too many tenants (consider scaling strategy)
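For the first cause, one way the expiry could be attached at write time is sketched below. It is written against a small hypothetical `HashClient` protocol whose two methods mirror redis-py's `hincrby(name, key, amount)` and `expire(name, time)`; the function name `record_command_sketch` and the key layout are assumptions taken from the architecture diagram, not the actual code in `hybrid_quota_manager.py`.

```python
from typing import Protocol


class HashClient(Protocol):
    """The two Redis commands this sketch relies on (hypothetical interface)."""

    def hincrby(self, name: str, key: str, amount: int = 1) -> int: ...

    def expire(self, name: str, time: int) -> bool: ...


DAY_SECONDS = 86_400


def record_command_sketch(client: HashClient, tenant_id: str, date: str) -> int:
    """Increment a tenant's daily counter and make sure the HASH has a TTL."""
    name = f"quota:hash:{date}"
    count = client.hincrby(name, f"tenant:{tenant_id}", 1)
    # Refresh the TTL on every write, so the key lives for 24 hours after
    # the last write of its day and is then removed by Redis.
    client.expire(name, DAY_SECONDS)
    return count
```

Because the key name includes the date, writes to a given HASH stop once the day rolls over, so refreshing the TTL on each write still bounds the key's lifetime to about two days.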
**Fix**:
# Check if old HASH keys are being cleaned up
# In hybrid_quota_manager.py, ensure expire() is called
# Each daily HASH should carry a TTL of 86400 seconds (24 hours) so it expires after its day ends
Issue: Redis connection errors
**Diagnose**:
curl https://app.atomagentos.com/api/admin/quota/health | jq '.checks.redis_connection'
**Fix**:
# Check Redis URL
fly secrets list -a atom-saas | grep REDIS
# Verify Redis is accessible
redis-cli -u <REDIS_URL> ping
---
Best Practices
1. Monitor Cache Hit Rate
**Why**: Ensures local cache is working effectively
**How**: Check the `/api/admin/quota/stats` endpoint
**Target**: >80% hit rate
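As a quick illustration, the hit rate can be recomputed from the `cache_hits` and `cache_misses` counters and compared against the bands in the Performance Metrics table. The function names and the band labels returned here are assumptions made for this sketch.

```python
def hit_rate_percent(cache_hits: int, cache_misses: int) -> float:
    """Hit rate as a percentage, or 0.0 before any request has been seen."""
    total = cache_hits + cache_misses
    if total == 0:
        return 0.0
    return round(100 * cache_hits / total, 1)


def classify_hit_rate(rate: float) -> str:
    """Map a hit rate onto the target / acceptable / poor bands."""
    if rate > 80:
        return "target"
    if rate >= 60:
        return "acceptable"
    return "poor"


rate = hit_rate_percent(cache_hits=8500, cache_misses=1500)
print(rate, classify_hit_rate(rate))
```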
2. Warm Cache on Startup
**Why**: Prevents cold start for active tenants
**How**:
# In application startup
active_tenants = get_active_tenants() # Your logic here
await redis_cache.quota_manager.warm_cache(active_tenants, "free")
3. Use Pagination for Admin Dashboards
**Why**: HGETALL is slow with millions of tenants
**How**:
# Get first 1000 tenants
usage = await quota_manager.get_all_usage(limit=1000)
# Get next 1000 tenants (offset support can be added)
# usage = await quota_manager.get_all_usage(limit=1000, offset=1000)
4. Set Alert Thresholds
**Why**: Catch issues before they impact users
**Alert on**:
- Cache hit rate < 70% for 5 minutes
- HASH size > 100MB
- Redis connection errors > 1% rate
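The "for 5 minutes" condition means the alert should fire only when the hit rate stays low across the whole window, not on a single bad sample. A minimal sketch of that logic, assuming the caller polls the stats endpoint periodically and passes in its own timestamps (the class name `SustainedLowHitRate` is hypothetical):

```python
from typing import Optional


class SustainedLowHitRate:
    """Fires once the hit rate has stayed below a threshold for a full window."""

    def __init__(self, threshold_percent: float = 70.0, hold_seconds: float = 300.0) -> None:
        self.threshold_percent = threshold_percent
        self.hold_seconds = hold_seconds
        self._below_since: Optional[float] = None  # when the current dip began

    def update(self, hit_rate_percent: float, now: float) -> bool:
        """Record one sample; return True if the alert condition holds."""
        if hit_rate_percent >= self.threshold_percent:
            self._below_since = None  # recovered, so reset the window
            return False
        if self._below_since is None:
            self._below_since = now
        return now - self._below_since >= self.hold_seconds
```

A brief recovery above the threshold resets the window, which keeps short dips after a deploy or cache clear from paging anyone.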
---
API Endpoints Reference
`/api/admin/quota/stats`
**Method**: GET
**Description**: Get quota system performance statistics
**Response**:
{
  "success": true,
  "stats": {
    "cache_hits": 8500,
    "cache_misses": 1500,
    "total_requests": 10000,
    "hit_rate_percent": 85.0,
    "cache_size": 1000,
    "cache_max_size": 10000,
    "redis_calls": 1500
  }
}
`/api/admin/quota/usage`
**Method**: GET
**Query Params**: `limit` (default: 100), `date_str` (default: today)
**Description**: Get quota usage for all tenants
**Response**:
{
  "success": true,
  "date": "2026-04-09",
  "tenants": {
    "tenant123": 150,
    "tenant456": 75,
    "tenant789": "EXCEEDED"
  },
  "statistics": {
    "total_tenants": 3,
    "exceeded_tenants": 1,
    "active_tenants": 2,
    "limit": 100,
    "truncated": false
  }
}
`/api/admin/quota/cache/warm`
**Method**: POST
**Body**:
{
  "tenant_ids": ["tenant1", "tenant2", "tenant3"],
  "plan_type": "free"
}
**Description**: Pre-warm local cache for multiple tenants
`/api/admin/quota/cache/clear`
**Method**: POST
**Description**: Clear local quota cache (for testing/troubleshooting)
`/api/admin/quota/health`
**Method**: GET
**Description**: Get quota system health status
**Response**:
{
  "status": "healthy",
  "checks": {
    "redis_connection": "ok",
    "cache_performance": {
      "hit_rate_percent": 85.2,
      "status": "good"
    },
    "hash_storage": {
      "tenant_count": 123,
      "estimated_size_mb": 0.01
    }
  }
}
---
Success Criteria
Deployment is successful when ALL of these are true:
- [ ] Health endpoint returns `"status": "healthy"`
- [ ] Cache hit rate > 80% (check `/api/admin/quota/stats`)
- [ ] HASH size < 10MB (check `/api/admin/quota/stats`)
- [ ] No errors in logs related to quota checking
- [ ] Application performance unchanged (no latency increase)
- [ ] Redis GETs reduced by 95% (monitor Upstash dashboard)
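The first three criteria can be checked mechanically from a parsed `/api/admin/quota/health` response, as in the sketch below. It assumes the payload has the shape shown in the API reference; the function name and result keys are invented for the example. Log errors, latency, and the Upstash GET count still need their own sources.

```python
from typing import Any, Dict


def evaluate_health(health: Dict[str, Any]) -> Dict[str, bool]:
    """Check the criteria that are visible in the health payload."""
    checks = health.get("checks", {})
    hit_rate = checks.get("cache_performance", {}).get("hit_rate_percent", 0.0)
    size_mb = checks.get("hash_storage", {}).get("estimated_size_mb", float("inf"))
    return {
        "status_healthy": health.get("status") == "healthy",
        "hit_rate_above_80": hit_rate > 80,
        "hash_under_10mb": size_mb < 10,
    }


# Sample payload copied from the health endpoint example in this guide.
sample = {
    "status": "healthy",
    "checks": {
        "redis_connection": "ok",
        "cache_performance": {"hit_rate_percent": 85.2, "status": "good"},
        "hash_storage": {"tenant_count": 123, "estimated_size_mb": 0.01},
    },
}
results = evaluate_health(sample)
print(results, "all passed:", all(results.values()))
```

Missing fields are treated as failures (a hit rate of 0 and an unbounded size), so an incomplete response does not pass by accident.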
---
Bonus Features
1. Admin Dashboard
# Get all tenant quotas for admin dashboard
async def get_admin_quota_data():
    # Get quota usage
    usage = await redis_cache.quota_manager.get_all_usage(limit=1000)
    # Get cache stats
    stats = await redis_cache.quota_manager.get_cache_stats()
    return {
        "usage": usage,
        "stats": stats,
    }
2. Automated Alerts
# Check cache health and alert if needed
# send_alert() stands for your own notification hook; it is not defined here
async def check_quota_health():
    stats = await redis_cache.quota_manager.get_cache_stats()
    if stats["hit_rate_percent"] < 70:
        # Alert: Cache performance degraded
        send_alert(f"Low cache hit rate: {stats['hit_rate_percent']}%")
    if stats["redis_calls"] > 10000:
        # Alert: Too many Redis calls
        send_alert(f"High Redis call volume: {stats['redis_calls']}")
3. Cache Warming Strategy
# On application startup, warm cache with active tenants
import logging

logger = logging.getLogger(__name__)

async def startup_cache_warming():
    # Get recently active tenants (last 24 hours); get_active_tenants() is your own logic
    active_tenants = await get_active_tenants(hours=24)
    # Warm cache
    await redis_cache.quota_manager.warm_cache(active_tenants, "free")
    logger.info(f"Warmed cache for {len(active_tenants)} active tenants")
---
Summary
**What Changed**:
- Replaced `RedisQuotaManager` with `HybridQuotaManager`
- Uses Redis HASH for storage (1 key instead of N keys)
- Uses local cache for hot tenants (80% hit rate)
- Scales to 10M+ tenants
- 95% reduction in Redis GETs
**Expected Impact**:
- **Before**: 2.2M Redis GETs/day (individual keys + circular dependency)
- **After**: ~110K Redis GETs/day (HASH + local cache)
- **Scalability**: Up to 10M tenants (vs 1K before)
- **Performance**: <1ms quota checks (80% from cache)
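The 95% figure follows directly from the before and after counts above. The sketch below checks that arithmetic and also shows what an 80% hit rate would leave on its own; the function names are made up for the example.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage drop from `before` to `after`."""
    return 100 * (before - after) / before


def remaining_after_cache(before: int, hit_rate_percent: float) -> int:
    """Lookups that still reach Redis if only the local cache is added."""
    return round(before * (100 - hit_rate_percent) / 100)


print(reduction_percent(2_200_000, 110_000))
print(remaining_after_cache(2_200_000, 80))
```

An 80% hit rate alone would leave roughly 440K lookups a day, so the remaining drop to ~110K presumably comes from the other changes: a single HASH instead of individual keys, and removing the circular dependency noted in the "Before" figure.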
**Files Modified**:
- `backend-saas/core/cache.py` - Use HybridQuotaManager
- `backend-saas/core/hybrid_quota_manager.py` - NEW (HASH + local cache)
- `backend-saas/api/routes/admin_quota.py` - NEW (monitoring endpoints)
**Next Steps**:
1. Code implementation (DONE)
2. Deploy and verify (DO THIS NOW)
3. Monitor for 24 hours
4. Adjust tuning if needed (cache TTL, cache size)
---
**Ready to deploy?** Run `fly deploy -a atom-saas` and monitor the `/api/admin/quota/health` endpoint!