ATOM Documentation

Governance Architecture - Unified Frontend-Backend System

**Version:** 2.0

**Last Updated:** 2026-04-12

**Status:** Production-Ready

Overview

The governance system enforces agent maturity-based access control, ensuring AI agents only perform actions appropriate to their experience level. The architecture follows a **unified frontend-backend model** where the Python backend is the single source of truth for all governance decisions.

Key Principles

  1. **Backend Authority**: All governance decisions made by Python backend
  2. **Fail-Closed**: Deny actions if backend unavailable (never allow by default)
  3. **Client-Side Caching**: 30-second TTL to reduce API load
  4. **Audit Trail**: All decisions logged with latency tracking
  5. **Tenant Isolation**: Multi-tenant safety via tenant_id filtering

Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (Next.js)                      │
│                                                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  AgentGovernanceService                                  │  │
│  │  - canPerformAction()                                    │  │
│  │  - Client-side cache (30s TTL)                           │  │
│  │  - Fail-closed on errors                                 │  │
│  │  - Telemetry (hits/misses/errors)                        │  │
│  └───────────────┬──────────────────────────────────────────┘  │
│                  │                                               │
│                  │ fetch('/api/v1/agent-governance/evaluate')   │
│                  ▼                                               │
└──────────────────┼───────────────────────────────────────────────┘
                   │
                   │ HTTP POST
                   │
┌──────────────────┼───────────────────────────────────────────────┐
│                  ▼                                               │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  Next.js API Route                                       │  │
│  │  /api/v1/agent-governance/evaluate                       │  │
│  │  - Validates request (tenant_id, agent_id, action_type)  │  │
│  │  - Proxies to Python backend                             │  │
│  │  - Returns fail-closed response on errors                │  │
│  └───────────────┬──────────────────────────────────────────┘  │
│                  │                                               │
│                  │ HTTP POST (internal)                          │
│                  ▼                                               │
└──────────────────┼───────────────────────────────────────────────┘
                   │
                   │
┌──────────────────┼───────────────────────────────────────────────┐
│                  ▼                                               │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  Python Backend (FastAPI)                               │  │
│  │  /api/v1/agent-governance/enforce-action                 │  │
│  │  - AgentGovernanceService                                │  │
│  │  - ACTION_COMPLEXITY mapping                             │  │
│  │  - MATURITY_REQUIREMENTS                                 │  │
│  │  - Database queries (agent_registry)                     │  │
│  │  - Governance decision logic                             │  │
│  └───────────────┬──────────────────────────────────────────┘  │
│                  │                                               │
│                  │                                               │
│  ┌───────────────┴──────────────────────────────────────────┐  │
│  │  PostgreSQL Database                                     │  │
│  │  - agent_registry (status, confidence_score)             │  │
│  │  - tenant_id filtering (security)                        │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

API Contract

Frontend → Next.js API

**Endpoint:** POST /api/v1/agent-governance/evaluate

**Request:**

{
  tenant_id: string      // Required: Tenant ID for multi-tenancy
  agent_id: string       // Required: Agent to evaluate
  action_type: string    // Required: Action to perform
  context?: object       // Optional: Additional context
}

**Response (200 OK):**

{
  allowed: boolean           // Whether action is permitted
  reason: string             // Human-readable explanation
  requires_approval: boolean // Whether human approval needed
  maturity_level: 'student' | 'intern' | 'supervised' | 'autonomous'
  complexity: number         // Action complexity (1-4; 0 in fail-closed denials)
  confidence_score?: number  // Agent confidence (0.0-1.0)
  budget_remaining?: number  // Remaining budget (optional)
}

**Error Responses:**

All errors return **200 OK with allowed: false** (fail-closed):

// Backend unavailable
{
  allowed: false,
  reason: 'Backend governance service unavailable',
  requires_approval: true,
  maturity_level: 'student',
  complexity: 0
}

// Invalid agent ID
{
  allowed: false,
  reason: 'Reserved or invalid agent ID',
  requires_approval: true,
  maturity_level: 'student',
  complexity: 0
}

// Agent not found
{
  allowed: false,
  reason: 'Agent not found',
  requires_approval: true,
  maturity_level: 'student',
  complexity: 0
}
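
The request validation that the Next.js route performs before proxying could look roughly like the sketch below. The function names `validateEvaluateRequest` and `denial`, and the reason string used for missing fields, are assumptions made for illustration; the reserved names are the three listed in the threat model, and the real list may be longer.

```typescript
type MaturityLevel = 'student' | 'intern' | 'supervised' | 'autonomous'

interface GovernanceEvaluationResult {
  allowed: boolean
  reason: string
  requires_approval: boolean
  maturity_level: MaturityLevel
  complexity: number
}

// Reserved names taken from the threat model (assumed to be matched case-insensitively).
const RESERVED_AGENT_NAMES = new Set(['admin', 'root', 'system'])

// Builds the fail-closed denial shape used by every error response.
function denial(reason: string): GovernanceEvaluationResult {
  return {
    allowed: false,
    reason,
    requires_approval: true,
    maturity_level: 'student',
    complexity: 0,
  }
}

// Returns a denial when the payload is unusable, or null when it may be proxied.
function validateEvaluateRequest(body: unknown): GovernanceEvaluationResult | null {
  if (typeof body !== 'object' || body === null) {
    return denial('Missing required fields') // hypothetical reason string
  }
  const fields = body as Record<string, unknown>
  for (const key of ['tenant_id', 'agent_id', 'action_type']) {
    const value = fields[key]
    if (typeof value !== 'string' || value.trim() === '') {
      return denial('Missing required fields') // hypothetical reason string
    }
  }
  const agentId = (fields['agent_id'] as string).trim().toLowerCase()
  if (RESERVED_AGENT_NAMES.has(agentId)) {
    return denial('Reserved or invalid agent ID')
  }
  return null
}
```

Returning a denial object instead of throwing keeps the route on a single response path, which matches the "200 OK with allowed: false" convention.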

Next.js API → Python Backend

**Endpoint:** POST /api/v1/agent-governance/enforce-action

**Request:**

{
  agent_id: string
  action_type: string
  action_details?: object
}

**Headers:**

X-Tenant-ID: <tenant_id>
X-User-ID: <user_id>
Authorization: Bearer <jwt_token>

**Response:** Same as frontend response format
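
To make the translation between the two contracts concrete, here is a minimal sketch of how the proxy might build the backend request. It assumes that `context` is forwarded as `action_details` (the source does not state this mapping) and that the user ID, JWT, and backend base URL are supplied by the caller; `buildBackendRequest` and its parameter names are hypothetical.

```typescript
interface EvaluateRequest {
  tenant_id: string
  agent_id: string
  action_type: string
  context?: Record<string, unknown>
}

interface BackendRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// Maps the frontend payload onto the enforce-action contract.
// The tenant ID moves from the body into the X-Tenant-ID header.
function buildBackendRequest(
  req: EvaluateRequest,
  userId: string,
  jwtToken: string,
  backendBaseUrl: string
): BackendRequest {
  const payload: {
    agent_id: string
    action_type: string
    action_details?: Record<string, unknown>
  } = {
    agent_id: req.agent_id,
    action_type: req.action_type,
  }
  if (req.context !== undefined) {
    payload.action_details = req.context // assumption: context maps to action_details
  }
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${backendBaseUrl.replace(/\/+$/, '')}/api/v1/agent-governance/enforce-action`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Tenant-ID': req.tenant_id,
      'X-User-ID': userId,
      Authorization: `Bearer ${jwtToken}`,
    },
    body: JSON.stringify(payload),
  }
}
```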

Client-Side Caching

Cache Implementation

// Cache key format
const cacheKey = `${tenantId}:${agentId}:${actionType}`

// Cache entry structure
interface CacheEntry {
  data: GovernanceEvaluationResult
  expiry: number  // Unix timestamp in milliseconds
}

// TTL configuration
private static CACHE_TTL_MS = 30 * 1000  // 30 seconds

Cache Behavior

| Scenario | Behavior | TTL |
|---|---|---|
| **Cache Hit** | Return cached data immediately | N/A |
| **Cache Miss** | Call backend API, cache response | 30s |
| **Backend Error** | Cache denied response | 10s (shorter) |
| **Agent Maturity Change** | Invalidate all agent cache entries | Immediate |
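
The behaviors in the table can be sketched as a small in-memory cache. This is an illustration under assumptions, not the service's actual code: the class name `GovernanceCache`, the `invalidateAgent` method, and the injectable clock are hypothetical, and the result type is trimmed to two fields for brevity.

```typescript
// Trimmed stand-in for the full GovernanceEvaluationResult type.
interface GovernanceEvaluationResult {
  allowed: boolean
  reason: string
}

interface CacheEntry {
  data: GovernanceEvaluationResult
  expiry: number // Unix timestamp in milliseconds
}

const CACHE_TTL_MS = 30 * 1000 // normal responses
const ERROR_CACHE_TTL_MS = 10 * 1000 // denials cached after a backend error

class GovernanceCache {
  private entries = new Map<string, CacheEntry>()
  private now: () => number

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now
  }

  static key(tenantId: string, agentId: string, actionType: string): string {
    return `${tenantId}:${agentId}:${actionType}`
  }

  // Returns the cached result, or undefined on a miss or an expired entry.
  get(key: string): GovernanceEvaluationResult | undefined {
    const entry = this.entries.get(key)
    if (entry === undefined) return undefined
    if (entry.expiry <= this.now()) {
      this.entries.delete(key)
      return undefined
    }
    return entry.data
  }

  // Stores a result; denials produced by backend errors get the shorter TTL.
  set(key: string, data: GovernanceEvaluationResult, isBackendError = false): void {
    const ttl = isBackendError ? ERROR_CACHE_TTL_MS : CACHE_TTL_MS
    this.entries.set(key, { data, expiry: this.now() + ttl })
  }

  // Drops every entry for one agent, e.g. after a maturity change.
  invalidateAgent(tenantId: string, agentId: string): void {
    const prefix = `${tenantId}:${agentId}:`
    for (const key of Array.from(this.entries.keys())) {
      if (key.startsWith(prefix)) this.entries.delete(key)
    }
  }
}
```

Including the tenant ID in the key means a cached decision for one tenant is never served to another, even when agent IDs collide.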

Cache Telemetry

The service tracks cache performance:

interface CacheTelemetry {
  hits: number      // Cache hits
  misses: number    // Cache misses
  errors: number    // Backend errors
  lastReset: number // Last telemetry reset timestamp
}

Telemetry is logged every 5 minutes:

[Governance] Cache telemetry (last 300s): hits=450, misses=50, errors=5, hit_rate=0.90
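
As a rough sketch of how such a log line could be produced, the function below derives the hit rate from the counters. It assumes the hit rate is hits divided by total lookups (hits plus misses) and that backend errors are not part of the denominator; the source does not define the formula, and `formatTelemetry` is a hypothetical name.

```typescript
interface CacheTelemetry {
  hits: number
  misses: number
  errors: number
  lastReset: number // timestamp in milliseconds
}

// Formats one telemetry line for the window between lastReset and now.
function formatTelemetry(t: CacheTelemetry, now: number): string {
  const lookups = t.hits + t.misses
  // Guard against division by zero when nothing was looked up in the window.
  const hitRate = lookups === 0 ? 0 : t.hits / lookups
  const windowSeconds = Math.round((now - t.lastReset) / 1000)
  return (
    `[Governance] Cache telemetry (last ${windowSeconds}s): ` +
    `hits=${t.hits}, misses=${t.misses}, errors=${t.errors}, ` +
    `hit_rate=${hitRate.toFixed(2)}`
  )
}
```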

Fail-Closed Behavior

Security Principle

**Never allow an action without explicit backend approval.**

If the backend is unavailable, the frontend MUST deny the action rather than allow it.

Implementation

try {
  const response = await fetch('/api/v1/agent-governance/evaluate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ tenant_id, agent_id, action_type })
  })

  if (!response.ok) {
    // Backend error -> DENY
    return {
      allowed: false,
      reason: 'Backend governance service unavailable',
      requires_approval: true,
      maturity_level: 'student',
      complexity: 0
    }
  }

  const result = await response.json()
  return result

} catch (error) {
  // Network error -> DENY
  return {
    allowed: false,
    reason: 'Backend governance service unavailable',
    requires_approval: true,
    maturity_level: 'student',
    complexity: 0
  }
}

Error Scenarios

| Error Type | Frontend Response | Cache TTL |
|---|---|---|
| **Backend 500** | allowed: false | 10s |
| **Network Timeout** | allowed: false | 10s |
| **Invalid Response** | allowed: false | 10s |
| **Tenant Not Found** | allowed: false | N/A (404) |
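
One way to route the server-error, timeout, and invalid-response rows through a single denial path is sketched below. The transport is injected as a function so that the example does not depend on a particular `fetch` implementation. The names `evaluateFailClosed`, `withTimeout`, and `isGovernanceResult`, as well as the 5-second default timeout, are assumptions for illustration.

```typescript
interface GovernanceEvaluationResult {
  allowed: boolean
  reason: string
  requires_approval: boolean
  maturity_level: 'student' | 'intern' | 'supervised' | 'autonomous'
  complexity: number
}

// Minimal view of an HTTP response; a fetch Response satisfies it.
interface HttpResponseLike {
  ok: boolean
  json(): Promise<unknown>
}

function denyUnavailable(): GovernanceEvaluationResult {
  return {
    allowed: false,
    reason: 'Backend governance service unavailable',
    requires_approval: true,
    maturity_level: 'student',
    complexity: 0,
  }
}

// Rejects if the work does not settle within timeoutMs.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('timeout')), timeoutMs)
    work.then(
      (value) => { clearTimeout(timer); resolve(value) },
      (error) => { clearTimeout(timer); reject(error) }
    )
  })
}

// Checks only the fields the caller relies on before trusting the body.
function isGovernanceResult(value: unknown): value is GovernanceEvaluationResult {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    typeof v['allowed'] === 'boolean' &&
    typeof v['reason'] === 'string' &&
    typeof v['requires_approval'] === 'boolean'
  )
}

// Any failure (non-2xx, timeout, network error, malformed body) yields a denial.
async function evaluateFailClosed(
  post: () => Promise<HttpResponseLike>,
  timeoutMs = 5000 // assumed default, not taken from the source
): Promise<GovernanceEvaluationResult> {
  try {
    const response = await withTimeout(post(), timeoutMs)
    if (!response.ok) return denyUnavailable()
    const body = await response.json()
    return isGovernanceResult(body) ? body : denyUnavailable()
  } catch {
    return denyUnavailable()
  }
}
```

A caller would pass something like `() => fetch('/api/v1/agent-governance/evaluate', { method: 'POST', ... })` as `post`.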

Usage Examples

Frontend Usage

import { AgentGovernanceService } from '@/lib/ai/agent-governance'

// Initialize service
const db = new DatabaseService()
const governance = new AgentGovernanceService(db)

// Check if agent can perform action
const result = await governance.canPerformAction(
  'tenant_123',
  'agent_sales_bot',
  'delete'
)

if (result.allowed) {
  // Execute action
  await executeDeleteOperation()
} else {
  // Show error or request approval
  showMessage(result.reason)
  if (result.requires_approval) {
    await requestApproval(result)
  }
}

With Context

const result = await governance.canPerformAction(
  'tenant_123',
  'agent_sales_bot',
  'send_email',
  { recipient: 'user@example.com', subject: 'Important' }
)

Backend Usage (Python)

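import logging

from core.agent_governance_service import AgentGovernanceService

logger = logging.getLogger(__name__)

# Initialize service (db is the application's database handle, created elsewhere)
governance = AgentGovernanceService(db)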

# Check if agent can perform action
result = await governance.enforce_action(
    agent_id='agent_sales_bot',
    action_type='delete',
    action_details={'resource': 'document_789'}
)

if result['proceed']:
    # Execute action
    await execute_delete()
else:
    # Log denial
    logger.info(f"Action denied: {result['reason']}")

Maturity Levels

| Level | Confidence | Max Complexity | Auto-Approve | Description |
|---|---|---|---|---|
| **student** | 0.0-0.5 | 1 (Read-only) | No | Learning from examples |
| **intern** | 0.5-0.7 | 2 (Analysis) | No | Can suggest, needs approval |
| **supervised** | 0.7-0.9 | 3 (Mutation) | Partial | Live monitoring required |
| **autonomous** | 0.9-1.0 | 4 (Critical) | Yes | Full autonomy |
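
The confidence ranges share their endpoints (0.5, 0.7, 0.9), so any implementation has to pick which level a boundary value belongs to. The sketch below assumes the higher level wins at each boundary and that out-of-range or non-numeric scores fall back to `student`. Neither rule is stated in the source, and the backend may instead store the level directly in `agent_registry.status`. The function name `maturityFromConfidence` is hypothetical.

```typescript
type MaturityLevel = 'student' | 'intern' | 'supervised' | 'autonomous'

// Maps a confidence score to a maturity level using the table's thresholds.
function maturityFromConfidence(score: number): MaturityLevel {
  // NaN and out-of-range values fail toward the most restricted level.
  if (!(score >= 0 && score <= 1)) return 'student'
  if (score >= 0.9) return 'autonomous'
  if (score >= 0.7) return 'supervised'
  if (score >= 0.5) return 'intern'
  return 'student'
}
```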

Action Complexity

| Complexity | Actions | Maturity Required |
|---|---|---|
| **1 (Low)** | search, read, list, get, fetch, summarize | Student+ |
| **2 (Medium-Low)** | analyze, suggest, draft, generate, recommend | Intern+ |
| **3 (Medium)** | create, update, send_email, post_message, schedule | Supervised+ |
| **4 (High)** | delete, execute, deploy, transfer, payment, approve | Autonomous only |
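
Taken together, the maturity and complexity tables amount to a lookup plus a comparison. The backend implements this in Python; the TypeScript sketch below only illustrates the rule and is not the backend's code. Treating unknown actions as complexity 4 is an assumption chosen to match the fail-closed principle, and `evaluateAction` and `requiredMaturity` are hypothetical names.

```typescript
type MaturityLevel = 'student' | 'intern' | 'supervised' | 'autonomous'

// Action-to-complexity mapping copied from the table above.
const ACTION_COMPLEXITY = new Map<string, number>([
  ['search', 1], ['read', 1], ['list', 1], ['get', 1], ['fetch', 1], ['summarize', 1],
  ['analyze', 2], ['suggest', 2], ['draft', 2], ['generate', 2], ['recommend', 2],
  ['create', 3], ['update', 3], ['send_email', 3], ['post_message', 3], ['schedule', 3],
  ['delete', 4], ['execute', 4], ['deploy', 4], ['transfer', 4], ['payment', 4], ['approve', 4],
])

// Highest complexity each level may perform, ordered from least to most mature.
const LEVELS: MaturityLevel[] = ['student', 'intern', 'supervised', 'autonomous']
const MAX_COMPLEXITY: Record<MaturityLevel, number> = {
  student: 1,
  intern: 2,
  supervised: 3,
  autonomous: 4,
}

// Assumption: actions missing from the mapping are treated as the highest complexity.
const UNKNOWN_ACTION_COMPLEXITY = 4

function complexityOf(actionType: string): number {
  const value = ACTION_COMPLEXITY.get(actionType.trim().toLowerCase())
  return value === undefined ? UNKNOWN_ACTION_COMPLEXITY : value
}

// Lowest maturity level that is allowed to perform the given complexity.
function requiredMaturity(complexity: number): MaturityLevel {
  for (const level of LEVELS) {
    if (MAX_COMPLEXITY[level] >= complexity) return level
  }
  return 'autonomous'
}

function evaluateAction(
  maturity: MaturityLevel,
  actionType: string
): { allowed: boolean; complexity: number; required: MaturityLevel } {
  const complexity = complexityOf(actionType)
  return {
    allowed: MAX_COMPLEXITY[maturity] >= complexity,
    complexity,
    required: requiredMaturity(complexity),
  }
}
```

For example, `evaluateAction('intern', 'delete')` denies the action and reports `autonomous` as the required level.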

Troubleshooting

Backend Unavailable

**Symptom:** All actions denied with "Backend governance service unavailable"

**Debug Steps:**

  1. Check Python backend health: curl http://localhost:8000/health/live
  2. Check Next.js API logs for proxy errors
  3. Verify network connectivity between Next.js and Python

**Solution:**

  • Start Python backend: cd backend-saas && uvicorn main:app --reload
  • Check backend logs for errors
  • Verify internal networking configuration

Cache Issues

**Symptom:** Stale governance decisions (old maturity level used)

**Debug Steps:**

  1. Check cache telemetry logs
  2. Verify cache TTL (should be 30s)
  3. Check if agent maturity changed recently

**Solution:**

  • Wait 30s for cache expiry
  • Manually invalidate cache by updating agent score
  • Restart frontend to clear cache

Permission Denied

**Symptom:** Action denied despite correct maturity level

**Debug Steps:**

  1. Check agent status in database: SELECT status FROM agent_registry WHERE id = 'agent_abc'
  2. Check action complexity mapping
  3. Review backend governance logs

**Solution:**

  • Promote agent to higher maturity level
  • Reduce action complexity (use alternative approach)
  • Request manual approval

Performance

Metrics

| Metric | Target | Current |
|---|---|---|
| **Cache Hit Rate** | >50% | ~80% (measured in production) |
| **API Latency (p50)** | <100ms | ~50ms |
| **API Latency (p99)** | <500ms | ~200ms |
| **Cache Lookup** | <1ms | <0.1ms |

Optimization Tips

  1. **Batch Governance Checks:** Check multiple actions in parallel
  2. **Prefetch:** Check governance before user interaction
  3. **Cache Warming:** Populate cache with common actions on startup
  4. **Monitoring:** Track cache hit rate to optimize TTL
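
The first tip, checking several actions in parallel, could be sketched as follows. The helper takes the check function as a parameter, so it can wrap `canPerformAction()` without depending on the service class. `checkActions` is a hypothetical name, and the reason string used when a single check throws is an assumption.

```typescript
interface Decision {
  allowed: boolean
  reason: string
}

type CheckFn = (tenantId: string, agentId: string, actionType: string) => Promise<Decision>

// Runs all checks concurrently and returns one decision per distinct action.
async function checkActions(
  check: CheckFn,
  tenantId: string,
  agentId: string,
  actionTypes: string[]
): Promise<Map<string, Decision>> {
  // De-duplicate so a repeated action costs only one lookup.
  const unique = Array.from(new Set(actionTypes))
  const pairs = await Promise.all(
    unique.map(async (actionType): Promise<[string, Decision]> => {
      try {
        return [actionType, await check(tenantId, agentId, actionType)]
      } catch {
        // A failed check denies only that action (fail-closed), not the whole batch.
        return [actionType, { allowed: false, reason: 'Governance check failed' }]
      }
    })
  )
  return new Map(pairs)
}
```

Usage might look like `checkActions((t, a, x) => governance.canPerformAction(t, a, x), 'tenant_123', 'agent_sales_bot', ['read', 'update', 'delete'])`, which also warms the cache for those three actions.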

Security

Threat Model

| Threat | Mitigation |
|---|---|
| **Client Bypass** | Backend is authoritative, frontend cannot override |
| **Cache Poisoning** | Cache is read-only, short TTL (30s) |
| **Tenant Leakage** | All queries filtered by tenant_id |
| **Agent Spoofing** | Reserved names blocked (admin, root, system) |
| **Backend Down** | Fail-closed denies all actions |

Audit Trail

All governance decisions are logged:

{
  "event": "governance_check",
  "timestamp": "2026-04-12T20:00:00.000Z",
  "tenant_id": "tenant_123",
  "agent_id": "agent_sales_bot",
  "action_type": "delete",
  "decision": "DENIED",
  "reason": "Agent intern lacks maturity for delete (Req: autonomous)",
  "maturity": "intern",
  "complexity": 4,
  "latency_ms": 45
}
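
A record of this shape could be assembled as in the sketch below, with the latency taken as the difference between two timestamps captured around the governance check. The function name `buildAuditEntry` and its input fields are hypothetical, and the `'ALLOWED'` value is an assumption, since the source only shows `'DENIED'`.

```typescript
interface AuditEntry {
  event: 'governance_check'
  timestamp: string
  tenant_id: string
  agent_id: string
  action_type: string
  decision: 'ALLOWED' | 'DENIED'
  reason: string
  maturity: string
  complexity: number
  latency_ms: number
}

interface AuditInput {
  tenantId: string
  agentId: string
  actionType: string
  allowed: boolean
  reason: string
  maturity: string
  complexity: number
  startedAtMs: number // captured just before the check
  finishedAtMs: number // captured just after the check
}

// Builds one audit record; the caller decides where to write it.
function buildAuditEntry(input: AuditInput): AuditEntry {
  return {
    event: 'governance_check',
    timestamp: new Date(input.finishedAtMs).toISOString(),
    tenant_id: input.tenantId,
    agent_id: input.agentId,
    action_type: input.actionType,
    decision: input.allowed ? 'ALLOWED' : 'DENIED', // 'ALLOWED' is an assumed value
    reason: input.reason,
    maturity: input.maturity,
    complexity: input.complexity,
    // Clamp at zero in case the clock moved backwards between the two readings.
    latency_ms: Math.max(0, Math.round(input.finishedAtMs - input.startedAtMs)),
  }
}
```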

Migration Notes

From Client-Side Governance (v1.0)

**Changes:**

  • ✅ Removed ACTION_COMPLEXITY constant from frontend
  • ✅ Removed MATURITY_REQUIREMENTS constant from frontend
  • ✅ Removed RESERVED_AGENT_NAMES validation from frontend
  • ✅ Refactored canPerformAction() to call backend API
  • ✅ Added client-side caching (30s TTL)
  • ✅ Implemented fail-closed behavior

**Benefits:**

  • Single source of truth (backend)
  • Real-time governance updates
  • No code duplication
  • Enhanced security (server-side validation)

Rollback Plan

If issues arise, revert to client-side governance:

  1. Restore ACTION_COMPLEXITY and MATURITY_REQUIREMENTS constants
  2. Revert canPerformAction() to client-side logic
  3. Remove backend API calls
  4. Clear client-side cache

Related Documentation

  • **Backend API Contracts:** backend-saas/docs/GOVERNANCE_API_CONTRACTS.md
  • **Agent Governance Service:** backend-saas/core/agent_governance_service.py
  • **Frontend Service:** src/lib/ai/agent-governance.ts
  • **API Route:** src/app/api/v1/agent-governance/evaluate/route.ts

Changelog

v2.0 (2026-04-12)

**Breaking Changes:**

  • Frontend now requires backend API for governance decisions
  • Client-side constants removed, use backend API instead

**New Features:**

  • Unified frontend-backend governance architecture
  • Client-side caching with 30s TTL
  • Fail-closed behavior on backend errors
  • Cache telemetry tracking
  • Audit logging with latency

**Bug Fixes:**

  • Fixed code duplication between frontend and backend
  • Fixed stale governance decisions in frontend cache
  • Fixed security risk of client-side bypass

v1.0 (Legacy)

  • Client-side governance with hardcoded constants
  • No backend API integration
  • Vulnerable to client manipulation