ATOM Documentation

Hybrid HASH + Local Cache Quota Manager - Implementation Guide

**Date**: 2026-04-09

**Goal**: Replace the existing quota manager with a hybrid approach (Redis HASH storage + local caching)

**Scale**: Up to 10M tenants

**Redis GET Reduction**: ~95% (assuming an 80% local cache hit rate)

---

🎯 What We're Implementing

Architecture

┌──────────────────────────────────────────────────┐
│                Application Layer                 │
│  (10,000+ requests per second)                   │
└────────────────┬─────────────────────────────────┘
                 │
                 │ check_quota()
                 ▼
┌──────────────────────────────────────────────────┐
│         Hybrid Quota Manager                     │
│                                                  │
│  1. Check Local Cache (60s TTL)                  │
│     ┌─────────────────────────────┐              │
│     │ Cache Hit (80% of requests) │              │
│     │ Return: allowed=True/False  │              │
│     └─────────────────────────────┘              │
│                                                  │
│  2. Cache Miss (20% of requests)                 │
│     ┌─────────────────────────────┐              │
│     │ Check Redis HASH            │              │
│     │ HGET quota:hash:{date}      │              │
│     │ Return: allowed=True/False  │              │
│     └─────────────────────────────┘              │
│                                                  │
│  3. Update Local Cache                           │
│     Store result for 60 seconds                  │
│                                                  │
└────────────────┬─────────────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────────────────────┐
│              Redis HASH Storage                  │
│                                                  │
│  Key: quota:hash:2026-04-09                      │
│  Fields:                                         │
│    - tenant:abc123...: "150"                     │
│    - tenant:def456...: "75"                      │
│    - tenant:ghi789...: "exceeded"                │
│                                                  │
│  Operations:                                     │
│    - HGET (O(1)) - Check single tenant           │
│    - HINCRBY (O(1)) - Atomic increment           │
│    - HGETALL (O(N)) - Get all tenants            │
│                                                  │
└──────────────────────────────────────────────────┘
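The three steps in the diagram can be sketched as a small TTL cache placed in front of a slower lookup. This is a minimal illustration, not the code in hybrid_quota_manager.py: the class name `LocalQuotaCache`, the `fetch_allowed` callback (standing in for the Redis HGET path) and the oldest-first eviction policy are all assumptions.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class LocalQuotaCache:
    """Hypothetical TTL cache with a size cap and hit/miss counters."""

    def __init__(self, ttl_seconds: float = 60.0, max_size: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.max_size = max_size
        self._clock = clock
        self._entries = OrderedDict()  # tenant_id -> (expires_at, allowed)
        self.hits = 0
        self.misses = 0

    def get(self, tenant_id: str) -> Optional[bool]:
        entry = self._entries.get(tenant_id)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Missing or expired: drop any stale entry and count a miss.
        self._entries.pop(tenant_id, None)
        self.misses += 1
        return None

    def put(self, tenant_id: str, allowed: bool) -> None:
        self._entries[tenant_id] = (self._clock() + self.ttl_seconds, allowed)
        self._entries.move_to_end(tenant_id)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the oldest entry


def check_quota(cache: LocalQuotaCache, tenant_id: str,
                fetch_allowed: Callable[[str], bool]) -> bool:
    cached = cache.get(tenant_id)          # 1. check local cache
    if cached is not None:
        return cached
    allowed = fetch_allowed(tenant_id)     # 2. cache miss: ask the backing store
    cache.put(tenant_id, allowed)          # 3. update local cache
    return allowed
```

One consequence of this design is that a cached decision can be up to one TTL old, so a tenant that crosses its limit may still be allowed until the entry expires.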

Benefits

  1. **Scalability**: Handles 10M+ tenants
  2. **Performance**: 80% cache hit rate (sub-millisecond)
  3. **Efficiency**: Single Redis HASH instead of N keys
  4. **Real-time**: No 30-second delay
  5. **Monitoring**: Built-in stats and health checks

---

📋 Implementation Checklist

Phase 1: Code Implementation (DONE ✅)

  • [x] Create core/hybrid_quota_manager.py
  • [x] Update core/cache.py to use HybridQuotaManager
  • [x] Create api/routes/admin_quota.py for monitoring
  • [x] Add cache statistics tracking
  • [x] Implement cache warming and eviction

Phase 2: Testing

# 1. Test locally
cd backend-saas
python3 -c "
import asyncio
from core.cache import redis_cache

async def test():
    # Test quota checking
    result = await redis_cache.quota_manager.check_quota('test-tenant-123', 'free')
    print(f'✅ Quota check: {result}')

    # Test quota recording
    await redis_cache.quota_manager.record_command('test-tenant-123', 'free')
    print('✅ Quota recorded')

    # Test cache stats
    stats = await redis_cache.quota_manager.get_cache_stats()
    print(f'✅ Cache stats: {stats}')

    # Test get all usage
    all_usage = await redis_cache.quota_manager.get_all_usage('2026-04-09', limit=10)
    print(f'✅ All usage: {all_usage}')

asyncio.run(test())
"

Phase 3: Deploy

# 1. Deploy to Fly.io
fly deploy -a atom-saas

# 2. Monitor logs
fly logs -a atom-saas | grep -i "quota\|hybrid"

# Expected output:
# ✅ Using HybridQuotaManager (HASH storage + local cache)
# ✅ Created direct Redis client for quota manager

Phase 4: Verify

# 1. Check health endpoint
curl https://app.atomagentos.com/api/admin/quota/health

# Expected response:
{
  "status": "healthy",
  "checks": {
    "redis_connection": "ok",
    "cache_performance": {
      "hit_rate_percent": 85.2,
      "status": "good"
    },
    "hash_storage": {
      "tenant_count": 123,
      "estimated_size_mb": 0.01
    }
  }
}

# 2. Check cache stats
curl https://app.atomagentos.com/api/admin/quota/stats

# 3. Check quota usage
curl "https://app.atomagentos.com/api/admin/quota/usage?limit=100"

---

🔧 Configuration Options

Environment Variables

Add to .env or Fly.io secrets:

# Quota cache configuration
QUOTA_CACHE_TTL=60          # Local cache TTL in seconds (default: 60)
QUOTA_CACHE_SIZE=10000       # Max tenants to cache locally (default: 10K)

# Enable/disable quota system
ENABLE_REDIS_QUOTA=true       # Enable quota enforcement (default: true)
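As an illustration of how these variables might be read at startup, here is a minimal sketch. The `QuotaConfig` class and the set of strings accepted as "true" are assumptions, not taken from core/cache.py; only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class QuotaConfig:
    """Hypothetical container for the quota settings documented above."""
    cache_ttl_seconds: int = 60
    cache_max_size: int = 10_000
    enabled: bool = True

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "QuotaConfig":
        # Assumption: "1", "true" and "yes" (any case) enable enforcement.
        enabled_raw = env.get("ENABLE_REDIS_QUOTA", "true").strip().lower()
        return cls(
            cache_ttl_seconds=int(env.get("QUOTA_CACHE_TTL", "60")),
            cache_max_size=int(env.get("QUOTA_CACHE_SIZE", "10000")),
            enabled=enabled_raw in {"1", "true", "yes"},
        )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check, for example `QuotaConfig.from_env({"QUOTA_CACHE_TTL": "120"})`.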

Tuning Guidelines

**For High-Volume Applications** (1000+ ops/sec):

QUOTA_CACHE_TTL=120         # 2 minutes (reduces Redis calls)
QUOTA_CACHE_SIZE=50000      # 50K tenants in local cache

**For Memory-Constrained Environments** (2GB RAM):

QUOTA_CACHE_TTL=30          # 30 seconds (fresher data)
QUOTA_CACHE_SIZE=1000       # 1K tenants in local cache

---

📊 Monitoring

Key Metrics to Track

  1. **Cache Hit Rate** (should be >80%)
curl -s https://app.atomagentos.com/api/admin/quota/stats | jq '.stats.hit_rate_percent'
  2. **Redis HASH Size** (warn if >100MB)
curl -s https://app.atomagentos.com/api/admin/quota/health | jq '.checks.hash_storage.estimated_size_mb'
  3. **Total Tenants** (monitor growth)
curl -s https://app.atomagentos.com/api/admin/quota/health | jq '.checks.hash_storage.tenant_count'

Dashboard Queries

**Cache Performance Over Time**:

# No SQL query applies here (the stats are not stored in a database); log them and search the logs instead
# Also check the logs for "Cache warming complete" messages
fly logs -a atom-saas --json | grep "hit_rate_percent"

**Tenant Growth**:

# Daily tenant count
curl https://app.atomagentos.com/api/admin/quota/usage | jq '.statistics.total_tenants'

---

🚀 Deployment Steps

Step 1: Verify Implementation

# Check files exist
ls -la backend-saas/core/hybrid_quota_manager.py
ls -la backend-saas/api/routes/admin_quota.py

# Verify cache.py imports it
grep -n "HybridQuotaManager" backend-saas/core/cache.py

Step 2: Test Locally

cd backend-saas

# Run quota test
python3 -c "
import asyncio
from core.cache import redis_cache

async def test():
    # Warm cache with 10 tenants
    tenants = [f'tenant-{i}' for i in range(10)]
    await redis_cache.quota_manager.warm_cache(tenants, 'free')

    # Get stats
    stats = await redis_cache.quota_manager.get_cache_stats()
    print(f'Cache hits: {stats[\"cache_hits\"]}')
    print(f'Cache misses: {stats[\"cache_misses\"]}')
    print(f'Hit rate: {stats[\"hit_rate_percent\"]}%')

asyncio.run(test())
"

Step 3: Deploy to Staging (if available)

# Deploy to staging environment
fly deploy -a atom-saas-staging

Step 4: Deploy to Production

# Deploy to production
fly deploy -a atom-saas

# Monitor deployment
fly logs -a atom-saas | grep -i "hybrid\|quota"

Step 5: Verify Deployment

# 1. Check health
curl https://app.atomagentos.com/api/admin/quota/health

# 2. Check stats
curl https://app.atomagentos.com/api/admin/quota/stats

# 3. Check quota usage
curl https://app.atomagentos.com/api/admin/quota/usage

---

📈 Expected Results

Performance Metrics

| Metric | Target | Acceptable | Poor |
|---|---|---|---|
| **Cache hit rate** | >80% | 60-80% | <60% |
| **Quota latency** | <1ms | <5ms | >10ms |
| **HASH size** | <10MB | 10-100MB | >100MB |
| **Redis GETs** | ~110K/day | ~500K/day | >1M/day |
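The bands above can be turned into a small classifier for dashboards or scripts. This is a sketch under stated assumptions: the function names are hypothetical, and the boundary values (exactly 80% or 60%, exactly 10MB or 100MB) are assigned to the "acceptable" band, which the table itself does not specify.

```python
def classify_hit_rate(hit_rate_percent: float) -> str:
    """Map a cache hit rate to the Target / Acceptable / Poor bands."""
    if hit_rate_percent > 80:
        return "target"
    if hit_rate_percent >= 60:
        return "acceptable"
    return "poor"


def classify_hash_size(size_mb: float) -> str:
    """Map the Redis HASH size in MB to the same three bands."""
    if size_mb < 10:
        return "target"
    if size_mb <= 100:
        return "acceptable"
    return "poor"


print(classify_hit_rate(85.2), classify_hash_size(0.01))
```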

Scalability Milestones

| Tenants | HASH Size | Local Cache | Recommended |
|---|---|---|---|
| **1K** | 0.1MB | 1K (100%) | ✅ Single HASH |
| **10K** | 1MB | 10K (100%) | ✅ Single HASH |
| **100K** | 10MB | 10K (10%) | ✅ Hybrid (cache helps) |
| **1M** | 100MB | 10K (1%) | ✅ Hybrid (still works) |
| **10M** | 1GB | 10K (0.1%) | ✅ Hybrid + consider sharding |
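The figures in this table follow from simple arithmetic, sketched below. The 100 bytes per tenant is an assumption inferred from the first row (1K tenants at roughly 0.1MB); the real per-field overhead in Redis depends on field length and encoding. The function names are hypothetical.

```python
BYTES_PER_TENANT = 100  # assumption inferred from the table: 1K tenants ~ 0.1MB


def estimate_hash_size_mb(tenant_count: int,
                          bytes_per_tenant: int = BYTES_PER_TENANT) -> float:
    """Rough HASH size in MB (1 MB = 1,000,000 bytes)."""
    return tenant_count * bytes_per_tenant / 1_000_000


def local_cache_coverage_percent(tenant_count: int, cache_size: int = 10_000) -> float:
    """Share of tenants that can sit in the local cache at the same time."""
    if tenant_count <= 0:
        return 0.0
    return min(cache_size, tenant_count) * 100 / tenant_count


for tenants in (1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    size = estimate_hash_size_mb(tenants)
    coverage = local_cache_coverage_percent(tenants)
    print(f"{tenants:>10,} tenants: {size:g} MB, {coverage:g}% cacheable")
```

Low coverage does not by itself mean a low hit rate: if traffic is concentrated on a small set of hot tenants, a 10K-entry cache can still serve most requests.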

---

🛠 Troubleshooting

Issue: Low cache hit rate (<60%)

**Diagnose**:

curl -s https://app.atomagentos.com/api/admin/quota/stats | jq '.stats'

**Possible Causes**:

  1. Cache TTL too short (increase QUOTA_CACHE_TTL)
  2. Cache size too small (increase QUOTA_CACHE_SIZE)
  3. High tenant churn (normal, consider acceptable)

**Fix**:

# Update environment variables
fly secrets set QUOTA_CACHE_TTL=120 -a atom-saas
fly secrets set QUOTA_CACHE_SIZE=50000 -a atom-saas
fly deploy -a atom-saas

Issue: HASH size growing too fast (>100MB)

**Diagnose**:

curl -s https://app.atomagentos.com/api/admin/quota/health | jq '.checks.hash_storage'

**Possible Causes**:

  1. Not expiring old HASH keys (check TTL logic)
  2. Too many tenants (consider scaling strategy)

**Fix**:

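# Check that old HASH keys are being cleaned up:
# in hybrid_quota_manager.py, ensure expire() is called on each day's HASH.
# A TTL of 86400 seconds removes the key 24 hours after the most recent expire() call,
# which is after the day ends but not exactly at midnight.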
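If the key should disappear close to the end of its day rather than 24 hours after the last write, the TTL can be computed from the current time. The sketch below assumes the date in `quota:hash:{date}` is a UTC date, which the guide does not state; both function names are hypothetical, and the returned number is what would be passed to Redis EXPIRE.

```python
from datetime import datetime, timedelta, timezone


def daily_hash_key(now: datetime) -> str:
    """Key for the HASH holding the given day's counters (UTC date assumed)."""
    return f"quota:hash:{now.astimezone(timezone.utc):%Y-%m-%d}"


def seconds_until_next_utc_midnight(now: datetime) -> int:
    """TTL in seconds that makes a key expire at the next UTC midnight."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Never return 0: a non-positive TTL would delete the key immediately.
    return max(1, int((next_midnight - now_utc).total_seconds()))


moment = datetime(2026, 4, 9, 23, 0, 0, tzinfo=timezone.utc)
print(daily_hash_key(moment), seconds_until_next_utc_midnight(moment))
```

In practice a small grace period on top of this value may be useful so that late reads of yesterday's usage still succeed.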

Issue: Redis connection errors

**Diagnose**:

curl https://app.atomagentos.com/api/admin/quota/health | jq '.checks.redis_connection'

**Fix**:

# Check Redis URL
fly secrets list -a atom-saas | grep REDIS

# Verify Redis is accessible
redis-cli -u <REDIS_URL> ping

---

🎓 Best Practices

1. Monitor Cache Hit Rate

**Why**: Ensures local cache is working effectively

**How**: Check /api/admin/quota/stats endpoint

**Target**: >80% hit rate

2. Warm Cache on Startup

**Why**: Prevents cold start for active tenants

**How**:

# In application startup
active_tenants = get_active_tenants()  # Your logic here
await redis_cache.quota_manager.warm_cache(active_tenants, "free")

3. Use Pagination for Admin Dashboards

**Why**: HGETALL is slow with millions of tenants

**How**:

# Get first 1000 tenants
usage = await quota_manager.get_all_usage(limit=1000)

# Get next 1000 tenants (offset support can be added)
# usage = await quota_manager.get_all_usage(limit=1000, offset=1000)
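One way offset support could work is to slice an iterator of (tenant, usage) pairs instead of loading the whole HASH. The helper below is a hypothetical sketch, not part of `get_all_usage`; it accepts any iterable of pairs, such as the one produced by the synchronous redis-py client's `hscan_iter`, and is shown here with an in-memory list.

```python
from itertools import islice
from typing import Dict, Iterable, Tuple


def page_of_usage(pairs: Iterable[Tuple[str, str]], limit: int,
                  offset: int = 0) -> Dict[str, object]:
    """Return one page of tenants plus a flag telling whether more follow."""
    # Read one extra item so we can tell whether more data follows this page.
    window = list(islice(pairs, offset, offset + limit + 1))
    return {"tenants": dict(window[:limit]), "truncated": len(window) > limit}


sample = [(f"tenant-{i}", str(i * 10)) for i in range(5)]
first_page = page_of_usage(sample, limit=2)
last_page = page_of_usage(sample, limit=2, offset=4)
print(first_page)
print(last_page)
```

Offset paging still walks past the skipped entries on every call, and a HASH that changes between calls can shift entries across pages, so this suits admin dashboards better than exact exports.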

4. Set Alert Thresholds

**Why**: Catch issues before they impact users

**Alert on**:

  • Cache hit rate < 70% for 5 minutes
  • HASH size > 100MB
  • Redis connection errors > 1% rate
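The first threshold is a sustained condition, so a single low reading should not fire it. A minimal sketch of that logic follows; the class name and the idea of feeding it periodic samples from /api/admin/quota/stats are assumptions.

```python
from typing import Optional


class SustainedLowHitRate:
    """Fires only after the hit rate has stayed below the threshold long enough."""

    def __init__(self, threshold_percent: float = 70.0,
                 window_seconds: float = 300.0) -> None:
        self.threshold_percent = threshold_percent
        self.window_seconds = window_seconds
        self._low_since: Optional[float] = None

    def observe(self, timestamp: float, hit_rate_percent: float) -> bool:
        """Record one sample; return True when an alert should be raised."""
        if hit_rate_percent >= self.threshold_percent:
            self._low_since = None  # recovered: reset the timer
            return False
        if self._low_since is None:
            self._low_since = timestamp  # first low sample starts the timer
        return timestamp - self._low_since >= self.window_seconds


monitor = SustainedLowHitRate()
for t, rate in [(0, 65.0), (150, 60.0), (300, 69.9)]:
    should_alert = monitor.observe(t, rate)
print(should_alert)
```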

---

📚 API Endpoints Reference

`/api/admin/quota/stats`

**Method**: GET

**Description**: Get quota system performance statistics

**Response**:

{
  "success": true,
  "stats": {
    "cache_hits": 8500,
    "cache_misses": 1500,
    "total_requests": 10000,
    "hit_rate_percent": 85.0,
    "cache_size": 1000,
    "cache_max_size": 10000,
    "redis_calls": 1500
  }
}

`/api/admin/quota/usage`

**Method**: GET

**Query Params**: limit (default: 100), date_str (default: today)

**Description**: Get quota usage for all tenants

**Response**:

{
  "success": true,
  "date": "2026-04-09",
  "tenants": {
    "tenant123": 150,
    "tenant456": 75,
    "tenant789": "EXCEEDED"
  },
  "statistics": {
    "total_tenants": 3,
    "exceeded_tenants": 1,
    "active_tenants": 2,
    "limit": 100,
    "truncated": false
  }
}
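The `statistics` block in this response can be derived from the `tenants` mapping. The sketch below reproduces the sample numbers under two assumptions that the response does not spell out: "active" means "not exceeded", and `truncated` reports whether more tenants exist than were returned. The function name is hypothetical.

```python
from typing import Dict, Union

UsageValue = Union[int, str]


def summarize_usage(tenants: Dict[str, UsageValue], limit: int,
                    has_more: bool = False) -> dict:
    """Build a statistics block from a tenant -> usage mapping."""
    exceeded = sum(
        1 for value in tenants.values() if str(value).upper() == "EXCEEDED"
    )
    return {
        "total_tenants": len(tenants),
        "exceeded_tenants": exceeded,
        "active_tenants": len(tenants) - exceeded,  # assumption: active = not exceeded
        "limit": limit,
        "truncated": has_more,
    }


sample = {"tenant123": 150, "tenant456": 75, "tenant789": "EXCEEDED"}
print(summarize_usage(sample, limit=100))
```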

`/api/admin/quota/cache/warm`

**Method**: POST

**Body**:

{
  "tenant_ids": ["tenant1", "tenant2", "tenant3"],
  "plan_type": "free"
}

**Description**: Pre-warm local cache for multiple tenants
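For scripted warm-ups, the request can be built with the Python standard library. This is a sketch only: the helper name is hypothetical, and it sends no credentials because the guide does not describe how the admin routes are authenticated, so a real call would likely need an auth header added.

```python
import json
import urllib.request
from typing import Iterable

BASE_URL = "https://app.atomagentos.com"


def build_warm_request(tenant_ids: Iterable[str], plan_type: str = "free",
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the cache warm endpoint."""
    body = json.dumps(
        {"tenant_ids": list(tenant_ids), "plan_type": plan_type}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/admin/quota/cache/warm",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_warm_request(["tenant1", "tenant2", "tenant3"])
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen(request)` would send it.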

`/api/admin/quota/cache/clear`

**Method**: POST

**Description**: Clear local quota cache (for testing/troubleshooting)

`/api/admin/quota/health`

**Method**: GET

**Description**: Get quota system health status

**Response**:

{
  "status": "healthy",
  "checks": {
    "redis_connection": "ok",
    "cache_performance": {
      "hit_rate_percent": 85.2,
      "status": "good"
    },
    "hash_storage": {
      "tenant_count": 123,
      "estimated_size_mb": 0.01
    }
  }
}

---

✅ Success Criteria

Deployment is successful when ALL of these are true:

  • [ ] Health endpoint returns "status": "healthy"
  • [ ] Cache hit rate > 80% (check /api/admin/quota/stats)
  • [ ] HASH size < 10MB (check /api/admin/quota/health)
  • [ ] No errors in logs related to quota checking
  • [ ] Application performance unchanged (no latency increase)
  • [ ] Redis GETs reduced by 95% (monitor Upstash dashboard)

---

🎁 Bonus Features

1. Admin Dashboard

# Get all tenant quotas for admin dashboard
async def get_admin_quota_data():
    # Get quota usage
    usage = await redis_cache.quota_manager.get_all_usage(limit=1000)

    # Get cache stats
    stats = await redis_cache.quota_manager.get_cache_stats()

    return {
        "usage": usage,
        "stats": stats,
    }

2. Automated Alerts

# Check cache health and alert if needed
async def check_quota_health():
    stats = await redis_cache.quota_manager.get_cache_stats()

    if stats["hit_rate_percent"] < 70:
        # Alert: Cache performance degraded
        send_alert(f"Low cache hit rate: {stats['hit_rate_percent']}%")

    if stats["redis_calls"] > 10000:
        # Alert: Too many Redis calls
        send_alert(f"High Redis call volume: {stats['redis_calls']}")

3. Cache Warming Strategy

import logging

logger = logging.getLogger(__name__)

# On application startup, warm cache with active tenants
async def startup_cache_warming():
    # Get recently active tenants (last 24 hours); get_active_tenants is your own lookup
    active_tenants = await get_active_tenants(hours=24)

    # Warm cache
    await redis_cache.quota_manager.warm_cache(active_tenants, "free")

    logger.info("Warmed cache for %d active tenants", len(active_tenants))

---

📝 Summary

**What Changed**:

  • Replaced RedisQuotaManager with HybridQuotaManager
  • Uses Redis HASH for storage (1 key instead of N keys)
  • Uses local cache for hot tenants (80% hit rate)
  • Scales to 10M+ tenants
  • 95% reduction in Redis GETs

**Expected Impact**:

  • **Before**: 2.2M Redis GETs/day (individual keys + circular dependency)
  • **After**: ~110K Redis GETs/day (HASH + local cache)
  • **Scalability**: Up to 10M tenants (vs 1K before)
  • **Performance**: <1ms quota checks (80% from cache)
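The 95% figure follows directly from the before and after counts. A short check of the arithmetic, with a hypothetical helper name:

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage drop from `before` to `after`, rounded to one decimal."""
    return round((before - after) / before * 100, 1)


before_gets_per_day = 2_200_000
after_gets_per_day = 110_000

print(reduction_percent(before_gets_per_day, after_gets_per_day))
# Average rate per second, for comparison with a Redis plan's limits
print(round(before_gets_per_day / 86_400, 1), round(after_gets_per_day / 86_400, 1))
```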

**Files Modified**:

  • backend-saas/core/cache.py - Use HybridQuotaManager
  • backend-saas/core/hybrid_quota_manager.py - NEW (HASH + local cache)
  • backend-saas/api/routes/admin_quota.py - NEW (monitoring endpoints)

**Next Steps**:

  1. ✅ Code implementation (DONE)
  2. ✅ Deploy and verify (DO THIS NOW)
  3. ✅ Monitor for 24 hours
  4. ✅ Adjust tuning if needed (cache TTL, cache size)

---

**Ready to deploy?** Run fly deploy -a atom-saas and monitor the /api/admin/quota/health endpoint! 🚀