Atom AI Labs - AI-Powered Multi-Tenant Platform

Hybrid HASH + Local Cache Quota Manager - Implementation Guide

**Date**: 2026-04-09

**Goal**: Replace quota manager with hybrid approach (HASH storage + local caching)

**Scale**: Up to 10M tenants

**Redis GET Reduction**: 95% (with local cache 80% hit rate)

---

🎯 What We're Implementing

Architecture

┌─────────────────────────────────────────────────┐
│              Application Layer                  │
│  (10,000+ requests per second)                  │
└────────────────┬────────────────────────────────┘
                 │
                 │ check_quota()
                 ▼
┌─────────────────────────────────────────────────┐
│         Hybrid Quota Manager                     │
│                                                  │
│  1. Check Local Cache (60s TTL)                 │
│     ┌─────────────────────────────┐              │
│     │ Cache Hit (80% of requests) │              │
│     │ Return: allowed=True/False   │              │
│     └─────────────────────────────┘              │
│                                                  │
│  2. Cache Miss (20% of requests)                 │
│     ┌─────────────────────────────┐              │
│     │ Check Redis HASH             │              │
│     │ HGET quota:hash:{date}       │              │
│     │ Return: allowed=True/False   │              │
│     └─────────────────────────────┘              │
│                                                  │
│  3. Update Local Cache                          │
│     Store result for 60 seconds                   │
│                                                  │
└─────────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│              Redis HASH Storage                  │
│                                                  │
│  Key: quota:hash:2026-04-09                     │
│  Fields:                                         │
│    - tenant:abc123...: "150"                    │
│    - tenant:def456...: "75"                     │
│    - tenant:ghi789...: "exceeded"               │
│                                                  │
│  Operations:                                      │
│    - HGET (O(1)) - Check single tenant            │
│    - HINCRBY (O(1)) - Atomic increment           │
│    - HGETALL (O(N)) - Get all tenants            │
│                                                  │
└─────────────────────────────────────────────────┘

Benefits

**Scalability**: Handles 10M+ tenants
**Performance**: 80% cache hit rate (sub-millisecond)
**Efficiency**: Single Redis HASH instead of N keys
**Real-time**: No 30-second delay
**Monitoring**: Built-in stats and health checks

---

📋 Implementation Checklist

Phase 1: Code Implementation (DONE ✅)

[x] Create core/hybrid_quota_manager.py
[x] Update core/cache.py to use HybridQuotaManager
[x] Create api/routes/admin_quota.py for monitoring
[x] Add cache statistics tracking
[x] Implement cache warming and eviction

Phase 2: Testing

# 1. Test locally
cd backend-saas
python3 -c "
import asyncio
from core.cache import redis_cache

async def test():
    # Test quota checking
    result = await redis_cache.quota_manager.check_quota('test-tenant-123', 'free')
    print(f'✅ Quota check: {result}')

    # Test quota recording
    await redis_cache.quota_manager.record_command('test-tenant-123', 'free')
    print('✅ Quota recorded')

    # Test cache stats
    stats = await redis_cache.quota_manager.get_cache_stats()
    print(f'✅ Cache stats: {stats}')

    # Test get all usage
    all_usage = await redis_cache.quota_manager.get_all_usage('2026-04-09', limit=10)
    print(f'✅ All usage: {all_usage}')

asyncio.run(test())
"

Phase 3: Deploy

# 1. Deploy to Fly.io
fly deploy -a atom-saas

# 2. Monitor logs
fly logs -a atom-saas --tail 100 | grep -i "quota\|hybrid"

# Expected output:
# ✅ Using HybridQuotaManager (HASH storage + local cache)
# ✅ Created direct Redis client for quota manager

Phase 4: Verify

# 1. Check health endpoint
curl https://app.atomagentos.com/api/admin/quota/health

# Expected response:
{
  "status": "healthy",
  "checks": {
    "redis_connection": "ok",
    "cache_performance": {
      "hit_rate_percent": 85.2,
      "status": "good"
    },
    "hash_storage": {
      "tenant_count": 123,
      "estimated_size_mb": 0.01
    }
  }
}

# 2. Check cache stats
curl https://app.atomagentos.com/api/admin/quota/stats

# 3. Check quota usage
curl https://app.atomagentos.com/api/admin/quota/usage?limit=100

---

🔧 Configuration Options

Environment Variables

Add to .env or Fly.io secrets:

# Quota cache configuration
QUOTA_CACHE_TTL=60          # Local cache TTL in seconds (default: 60)
QUOTA_CACHE_SIZE=10000       # Max tenants to cache locally (default: 10K)

# Enable/disable quota system
ENABLE_REDIS_QUOTA=true       # Enable quota enforcement (default: true)

Tuning Guidelines

**For High-Volume Applications** (1000+ ops/sec):

QUOTA_CACHE_TTL=120         # 2 minutes (reduces Redis calls)
QUOTA_CACHE_SIZE=50000      # 50K tenants in local cache

**For Memory-Constrained Environments** (2GB RAM):

QUOTA_CACHE_TTL=30          # 30 seconds (fresher data)
QUOTA_CACHE_SIZE=1000       # 1K tenants in local cache

---

📊 Monitoring

Key Metrics to Track

**Cache Hit Rate** (should be >80%)

curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.cache_performance.hit_rate_percent'

**Redis HASH Size** (warn if >100MB)

curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.hash_storage.estimated_size_mb'

**Total Tenants** (monitor growth)

curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.hash_storage.tenant_count'

Dashboard Queries

**Cache Performance Over Time**:

-- Not applicable for Redis, but you can log stats
-- Check logs for "Cache warming complete" messages
fly logs -a atom-saas --json | grep "hit_rate_percent"

**Tenant Growth**:

# Daily tenant count
curl https://app.atomagentos.com/api/admin/quota/usage | jq '.statistics.total_tenants'

---

🚀 Deployment Steps

Step 1: Verify Implementation

# Check files exist
ls -la backend-saas/core/hybrid_quota_manager.py
ls -la backend-saas/api/routes/admin_quota.py

# Verify cache.py imports it
grep -n "HybridQuotaManager" backend-saas/core/cache.py

Step 2: Test Locally

cd backend-saas

# Run quota test
python3 -c "
import asyncio
from core.cache import redis_cache

async def test():
    # Warm cache with 10 tenants
    tenants = [f'tenant-{i}' for i in range(10)]
    await redis_cache.quota_manager.warm_cache(tenants, 'free')

    # Get stats
    stats = await redis_cache.quota_manager.get_cache_stats()
    print(f'Cache hits: {stats[\"cache_hits\"]}')
    print(f'Cache misses: {stats[\"cache_misses\"]}')
    print(f'Hit rate: {stats[\"hit_rate_percent\"]}%')

asyncio.run(test())
"

Step 3: Deploy to Staging (if available)

# Deploy to staging environment
fly deploy -a atom-saas-staging

Step 4: Deploy to Production

# Deploy to production
fly deploy -a atom-saas

# Monitor deployment
fly logs -a atom-saas --tail 100 | grep -i "hybrid\|quota"

Step 5: Verify Deployment

# 1. Check health
curl https://app.atomagentos.com/api/admin/quota/health

# 2. Check stats
curl https://app.atomagentos.com/api/admin/quota/stats

# 3. Check quota usage
curl https://app.atomagentos.com/api/admin/quota/usage

---

📈 Expected Results

Performance Metrics

Metric	Target	Acceptable	Poor
Cache hit rate	>80%	60-80%	<60%
Quota latency	<1ms	<5ms	>10ms
HASH size	<10MB	10-100MB	>100MB
Redis GETs	~110K/day	~500K/day	>1M/day

Scalability Milestones

Tenants	HASH Size	Local Cache	Recommended
1K	0.1MB	1K (100%)	✅ Single HASH
10K	1MB	10K (100%)	✅ Single HASH
100K	10MB	10K (10%)	✅ Hybrid (cache helps)
1M	100MB	10K (1%)	✅ Hybrid (still works)
10M	1GB	10K (0.1%)	✅ Hybrid + consider sharding

---

🛠 Troubleshooting

Issue: Low cache hit rate (<60%)

**Diagnose**:

curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.cache_performance'

**Possible Causes**:

Cache TTL too short (increase QUOTA_CACHE_TTL)
Cache size too small (increase QUOTA_CACHE_SIZE)
High tenant churn (normal, consider acceptable)

**Fix**:

# Update environment variables
fly secrets set QUOTA_CACHE_TTL=120 -a atom-saas
fly secrets set QUOTA_CACHE_SIZE=50000 -a atom-saas
fly deploy -a atom-saas

Issue: HASH size growing too fast (>100MB)

**Diagnose**:

curl https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.hash_storage'

**Possible Causes**:

Not expiring old HASH keys (check TTL logic)
Too many tenants (consider scaling strategy)

**Fix**:

# Check if old HASH keys are being cleaned up
# In hybrid_quota_manager.py, ensure expire() is called
# The HASH should expire at end of day (86400 seconds TTL)

Issue: Redis connection errors

**Diagnose**:

curl https://app.atomagentos.com/api/admin/quota/health | jq '.checks.redis_connection'

**Fix**:

# Check Redis URL
fly secrets list -a atom-saas | grep REDIS

# Verify Redis is accessible
redis-cli -u <REDIS_URL> ping

---

🎓 Best Practices

1. Monitor Cache Hit Rate

**Why**: Ensures local cache is working effectively

**How**: Check /api/admin/quota/stats endpoint

**Target**: >80% hit rate

2. Warm Cache on Startup

**Why**: Prevents cold start for active tenants

**How**:

# In application startup
active_tenants = get_active_tenants()  # Your logic here
await redis_cache.quota_manager.warm_cache(active_tenants, "free")

3. Use Pagination for Admin Dashboards

**Why**: HGETALL is slow with millions of tenants

**How**:

# Get first 1000 tenants
usage = await quota_manager.get_all_usage(limit=1000)

# Get next 1000 tenants (offset support can be added)
# usage = await quota_manager.get_all_usage(limit=1000, offset=1000)

4. Set Alert Thresholds

**Why**: Catch issues before they impact users

**Alert on**:

Cache hit rate < 70% for 5 minutes
HASH size > 100MB
Redis connection errors > 1% rate

---

📚 API Endpoints Reference

`/api/admin/quota/stats`

**Method**: GET

**Description**: Get quota system performance statistics

**Response**:

{
  "success": true,
  "stats": {
    "cache_hits": 8500,
    "cache_misses": 1500,
    "total_requests": 10000,
    "hit_rate_percent": 85.0,
    "cache_size": 1000,
    "cache_max_size": 10000,
    "redis_calls": 1500
  }
}

`/api/admin/quota/usage`

**Method**: GET

**Query Params**: limit (default: 100), date_str (default: today)

**Description**: Get quota usage for all tenants

**Response**:

{
  "success": true,
  "date": "2026-04-09",
  "tenants": {
    "tenant123": 150,
    "tenant456": 75,
    "tenant789": "EXCEEDED"
  },
  "statistics": {
    "total_tenants": 3,
    "exceeded_tenants": 1,
    "active_tenants": 2,
    "limit": 100,
    "truncated": false
  }
}

`/api/admin/quota/cache/warm`

**Method**: POST

**Body**:

{
  "tenant_ids": ["tenant1", "tenant2", "tenant3"],
  "plan_type": "free"
}

**Description**: Pre-warm local cache for multiple tenants

`/api/admin/quota/cache/clear`

**Method**: POST

**Description**: Clear local quota cache (for testing/troubleshooting)

`/api/admin/quota/health`

**Method**: GET

**Description**: Get quota system health status

**Response**:

{
  "status": "healthy",
  "checks": {
    "redis_connection": "ok",
    "cache_performance": {
      "hit_rate_percent": 85.2,
      "status": "good"
    },
    "hash_storage": {
      "tenant_count": 123,
      "estimated_size_mb": 0.01
    }
  }
}

---

✅ Success Criteria

Deployment is successful when ALL of these are true:

[ ] Health endpoint returns "status": "healthy"
[ ] Cache hit rate > 80% (check /api/admin/quota/stats)
[ ] HASH size < 10MB (check /api/admin/quota/stats)
[ ] No errors in logs related to quota checking
[ ] Application performance unchanged (no latency increase)
[ ] Redis GETs reduced by 95% (monitor Upstash dashboard)

---

🎁 Bonus Features

1. Admin Dashboard

# Get all tenant quotas for admin dashboard
async def get_admin_quota_data():
    # Get quota usage
    usage = await redis_cache.quota_manager.get_all_usage(limit=1000)

    # Get cache stats
    stats = await redis_cache.quota_manager.get_cache_stats()

    return {
        "usage": usage,
        "stats": stats,
    }

2. Automated Alerts

# Check cache health and alert if needed
async def check_quota_health():
    stats = await redis_cache.quota_manager.get_cache_stats()

    if stats["hit_rate_percent"] < 70:
        # Alert: Cache performance degraded
        send_alert(f"Low cache hit rate: {stats['hit_rate_percent']}%")

    if stats["redis_calls"] > 10000:
        # Alert: Too many Redis calls
        send_alert(f"High Redis call volume: {stats['redis_calls']}")

3. Cache Warming Strategy

# On application startup, warm cache with active tenants
async def startup_cache_warming():
    # Get recently active tenants (last 24 hours)
    active_tenants = await get_active_tenants(hours=24)

    # Warm cache
    await redis_cache.quota_manager.warm_cache(active_tenants, "free")

    logger.info(f"Warmed cache for {len(active_tenants)} active tenants")

---

📝 Summary

**What Changed**:

Replaced RedisQuotaManager with HybridQuotaManager
Uses Redis HASH for storage (1 key instead of N keys)
Uses local cache for hot tenants (80% hit rate)
Scales to 10M+ tenants
95% reduction in Redis GETs

**Expected Impact**:

**Before**: 2.2M Redis GETs/day (individual keys + circular dependency)
**After**: ~110K Redis GETs/day (HASH + local cache)
**Scalability**: Up to 10M tenants (vs 1K before)
**Performance**: <1ms quota checks (80% from cache)

**Files Modified**:

backend-saas/core/cache.py - Use HybridQuotaManager
backend-saas/core/hybrid_quota_manager.py - NEW (HASH + local cache)
backend-saas/api/routes/admin_quota.py - NEW (monitoring endpoints)

**Next Steps**:

✅ Code implementation (DONE)
✅ Deploy and verify (DO THIS NOW)
✅ Monitor for 24 hours
✅ Adjust tuning if needed (cache TTL, cache size)

---

**Ready to deploy?** Run fly deploy -a atom-saas and monitor the /api/admin/quota/health endpoint! 🚀