Atom AI Labs - AI-Powered Multi-Tenant Platform

Governance Architecture - Unified Frontend-Backend System

**Version:** 2.0

**Last Updated:** 2026-04-12

**Status:** Production-Ready

Overview

The governance system enforces agent maturity-based access control, ensuring AI agents only perform actions appropriate to their experience level. The architecture follows a **unified frontend-backend model** where the Python backend is the single source of truth for all governance decisions.

Key Principles

**Backend Authority**: All governance decisions made by Python backend
**Fail-Closed**: Deny actions if backend unavailable (never allow by default)
**Client-Side Caching**: 30-second TTL to reduce API load
**Audit Trail**: All decisions logged with latency tracking
**Tenant Isolation**: Multi-tenant safety via tenant_id filtering

Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (Next.js)                      │
│                                                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  AgentGovernanceService                                  │  │
│  │  - canPerformAction()                                    │  │
│  │  - Client-side cache (30s TTL)                           │  │
│  │  - Fail-closed on errors                                 │  │
│  │  - Telemetry (hits/misses/errors)                        │  │
│  └───────────────┬──────────────────────────────────────────┘  │
│                  │                                               │
│                  │ fetch('/api/v1/agent-governance/evaluate')   │
│                  ▼                                               │
└──────────────────┼───────────────────────────────────────────────┘
                   │
                   │ HTTP POST
                   │
┌──────────────────┼───────────────────────────────────────────────┐
│                  ▼                                               │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  Next.js API Route                                       │  │
│  │  /api/v1/agent-governance/evaluate                       │  │
│  │  - Validates request (tenant_id, agent_id, action_type)  │  │
│  │  - Proxies to Python backend                             │  │
│  │  - Returns fail-closed response on errors                │  │
│  └───────────────┬──────────────────────────────────────────┘  │
│                  │                                               │
│                  │ HTTP POST (internal)                          │
│                  ▼                                               │
└──────────────────┼───────────────────────────────────────────────┘
                   │
                   │
┌──────────────────┼───────────────────────────────────────────────┐
│                  ▼                                               │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  Python Backend (FastAPI)                               │  │
│  │  /api/v1/agent-governance/enforce-action                 │  │
│  │  - AgentGovernanceService                                │  │
│  │  - ACTION_COMPLEXITY mapping                             │  │
│  │  - MATURITY_REQUIREMENTS                                 │  │
│  │  - Database queries (agent_registry)                     │  │
│  │  - Governance decision logic                             │  │
│  └───────────────┬──────────────────────────────────────────┘  │
│                  │                                               │
│                  │                                               │
│  ┌───────────────┴──────────────────────────────────────────┐  │
│  │  PostgreSQL Database                                     │  │
│  │  - agent_registry (status, confidence_score)             │  │
│  │  - tenant_id filtering (security)                        │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

API Contract

Frontend → Next.js API

**Endpoint:** POST /api/v1/agent-governance/evaluate

**Request:**

{
  tenant_id: string      // Required: Tenant ID for multi-tenancy
  agent_id: string       // Required: Agent to evaluate
  action_type: string    // Required: Action to perform
  context?: object       // Optional: Additional context
}
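
The diagram notes that the API route validates tenant_id, agent_id, and action_type before proxying. A minimal sketch of such a check follows; the names `validateEvaluateRequest`, `ValidationResult`, and `REQUIRED_FIELDS` are hypothetical, and how the route turns a validation failure into a response is left out.

```typescript
// Body accepted by POST /api/v1/agent-governance/evaluate (from the contract above).
interface EvaluateRequest {
  tenant_id: string
  agent_id: string
  action_type: string
  context?: Record<string, unknown>
}

// Hypothetical result type: either the typed request or an error message.
type ValidationResult =
  | { ok: true; value: EvaluateRequest }
  | { ok: false; error: string }

const REQUIRED_FIELDS = ['tenant_id', 'agent_id', 'action_type'] as const

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Checks the three required string fields and the optional context object.
function validateEvaluateRequest(body: unknown): ValidationResult {
  if (!isPlainObject(body)) {
    return { ok: false, error: 'Request body must be a JSON object' }
  }
  for (const field of REQUIRED_FIELDS) {
    const raw = body[field]
    if (typeof raw !== 'string' || raw.trim().length === 0) {
      return { ok: false, error: `Missing or empty field: ${field}` }
    }
  }
  const context = body['context']
  if (context !== undefined && !isPlainObject(context)) {
    return { ok: false, error: 'context must be an object when provided' }
  }
  const value: EvaluateRequest = {
    tenant_id: body['tenant_id'] as string,
    agent_id: body['agent_id'] as string,
    action_type: body['action_type'] as string,
  }
  if (context !== undefined) value.context = context
  return { ok: true, value }
}
```

Returning a tagged result instead of throwing keeps the caller's fail-closed branch explicit.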

**Response (200 OK):**

{
  allowed: boolean           // Whether action is permitted
  reason: string             // Human-readable explanation
  requires_approval: boolean // Whether human approval needed
  maturity_level: 'student' | 'intern' | 'supervised' | 'autonomous'
  complexity: number         // Action complexity (1-4; 0 in fail-closed error responses)
  confidence_score?: number  // Agent confidence (0.0-1.0)
  budget_remaining?: number  // Remaining budget (optional)
}

**Error Responses:**

Governance errors return **200 OK with allowed: false** (fail-closed). The one exception listed under Error Scenarios is an unknown tenant, which returns 404:

// Backend unavailable
{
  allowed: false,
  reason: 'Backend governance service unavailable',
  requires_approval: true,
  maturity_level: 'student',
  complexity: 0
}

// Invalid agent ID
{
  allowed: false,
  reason: 'Reserved or invalid agent ID',
  requires_approval: true,
  maturity_level: 'student',
  complexity: 0
}

// Agent not found
{
  allowed: false,
  reason: 'Agent not found',
  requires_approval: true,
  maturity_level: 'student',
  complexity: 0
}
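
Because every error uses the same shape, a caller can treat anything that does not match the response contract as a denial. The sketch below shows one way to do that with a runtime type guard; `failClosed`, `isGovernanceResult`, `parseGovernanceResponse`, and the reason string 'Invalid governance response' are hypothetical and not taken from the codebase.

```typescript
type MaturityLevel = 'student' | 'intern' | 'supervised' | 'autonomous'

interface GovernanceEvaluationResult {
  allowed: boolean
  reason: string
  requires_approval: boolean
  maturity_level: MaturityLevel
  complexity: number
  confidence_score?: number
  budget_remaining?: number
}

const MATURITY_LEVELS: readonly string[] = ['student', 'intern', 'supervised', 'autonomous']

// Builds the fail-closed payload used in the error examples above.
function failClosed(reason: string): GovernanceEvaluationResult {
  return {
    allowed: false,
    reason,
    requires_approval: true,
    maturity_level: 'student',
    complexity: 0,
  }
}

// Runtime check that an unknown JSON value matches the response contract.
function isGovernanceResult(value: unknown): value is GovernanceEvaluationResult {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  const level = v['maturity_level']
  const confidence = v['confidence_score']
  const budget = v['budget_remaining']
  return (
    typeof v['allowed'] === 'boolean' &&
    typeof v['reason'] === 'string' &&
    typeof v['requires_approval'] === 'boolean' &&
    typeof level === 'string' &&
    MATURITY_LEVELS.indexOf(level) !== -1 &&
    typeof v['complexity'] === 'number' &&
    (confidence === undefined || typeof confidence === 'number') &&
    (budget === undefined || typeof budget === 'number')
  )
}

// Anything that is not a well-formed result becomes a denial.
function parseGovernanceResponse(raw: unknown): GovernanceEvaluationResult {
  return isGovernanceResult(raw) ? raw : failClosed('Invalid governance response')
}
```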

Next.js API → Python Backend

**Endpoint:** POST /api/v1/agent-governance/enforce-action

**Request:**

{
  agent_id: string
  action_type: string
  action_details?: object
}

**Headers:**

X-Tenant-ID: <tenant_id>
X-User-ID: <user_id>
Authorization: Bearer <jwt_token>

**Response:** Same as frontend response format
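
The two contracts differ in where the tenant travels: the frontend sends tenant_id in the body, while the backend expects it in the X-Tenant-ID header. The sketch below shows how a proxy might translate one into the other. `buildEnforceActionCall` and `BackendCall` are hypothetical names, the base URL is supplied by the caller, and forwarding `context` as `action_details` is an assumption that the contract above does not confirm.

```typescript
interface EvaluateRequest {
  tenant_id: string
  agent_id: string
  action_type: string
  context?: Record<string, unknown>
}

// Plain description of the outgoing HTTP call (hypothetical shape).
interface BackendCall {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

function buildEnforceActionCall(
  req: EvaluateRequest,
  userId: string,
  jwtToken: string,
  backendBaseUrl: string
): BackendCall {
  const payload: Record<string, unknown> = {
    agent_id: req.agent_id,
    action_type: req.action_type,
  }
  // Assumption: the frontend's optional context is forwarded as action_details.
  if (req.context !== undefined) payload['action_details'] = req.context

  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${backendBaseUrl.replace(/\/+$/, '')}/api/v1/agent-governance/enforce-action`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Tenant-ID': req.tenant_id, // tenant moves from body to header
      'X-User-ID': userId,
      Authorization: `Bearer ${jwtToken}`,
    },
    body: JSON.stringify(payload),
  }
}
```

Keeping the builder free of network calls makes the header mapping easy to unit test; the result can be handed to `fetch(call.url, call)` in the route.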

Client-Side Caching

Cache Implementation

// Cache key format
const cacheKey = `${tenantId}:${agentId}:${actionType}`

// Cache entry structure
interface CacheEntry {
  data: GovernanceEvaluationResult
  expiry: number  // Unix timestamp in milliseconds
}

// TTL configuration (static member of AgentGovernanceService)
private static CACHE_TTL_MS = 30 * 1000  // 30 seconds

Cache Behavior

Scenario	Behavior	TTL
Cache Hit	Return cached data immediately	N/A
Cache Miss	Call backend API, cache response	30s
Backend Error	Cache denied response	10s (shorter)
Agent Maturity Change	Invalidate all agent cache entries	Immediate
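
The four rows above can be sketched as a small in-memory cache. This is an illustration under stated assumptions, not the service's actual code: `GovernanceCache`, `ERROR_TTL_MS`, `invalidateAgent`, and the injectable clock are hypothetical, and prefix-based invalidation assumes tenant and agent IDs never contain a colon.

```typescript
interface GovernanceEvaluationResult {
  allowed: boolean
  reason: string
  requires_approval: boolean
  maturity_level: 'student' | 'intern' | 'supervised' | 'autonomous'
  complexity: number
}

interface CacheEntry {
  data: GovernanceEvaluationResult
  expiry: number // Unix timestamp in milliseconds
}

class GovernanceCache {
  private static readonly CACHE_TTL_MS = 30 * 1000 // normal responses
  private static readonly ERROR_TTL_MS = 10 * 1000 // denials caused by backend errors
  private readonly entries = new Map<string, CacheEntry>()
  private readonly now: () => number

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now
  }

  private static key(tenantId: string, agentId: string, actionType: string): string {
    return `${tenantId}:${agentId}:${actionType}`
  }

  // Returns the cached decision, or undefined on a miss or an expired entry.
  get(tenantId: string, agentId: string, actionType: string): GovernanceEvaluationResult | undefined {
    const key = GovernanceCache.key(tenantId, agentId, actionType)
    const entry = this.entries.get(key)
    if (entry === undefined) return undefined
    if (entry.expiry <= this.now()) {
      this.entries.delete(key)
      return undefined
    }
    return entry.data
  }

  // Stores a decision; backend-error denials get the shorter TTL.
  set(tenantId: string, agentId: string, actionType: string,
      data: GovernanceEvaluationResult, isBackendError = false): void {
    const ttl = isBackendError ? GovernanceCache.ERROR_TTL_MS : GovernanceCache.CACHE_TTL_MS
    this.entries.set(GovernanceCache.key(tenantId, agentId, actionType), { data, expiry: this.now() + ttl })
  }

  // Drops every entry for one agent (e.g. after a maturity change); returns the count removed.
  invalidateAgent(tenantId: string, agentId: string): number {
    const prefix = `${tenantId}:${agentId}:`
    let removed = 0
    for (const key of Array.from(this.entries.keys())) {
      if (key.startsWith(prefix)) {
        this.entries.delete(key)
        removed++
      }
    }
    return removed
  }
}
```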

Cache Telemetry

The service tracks cache performance:

interface CacheTelemetry {
  hits: number      // Cache hits
  misses: number    // Cache misses
  errors: number    // Backend errors
  lastReset: number // Last telemetry reset timestamp
}

Telemetry is logged every 5 minutes:

[Governance] Cache telemetry (last 300s): hits=450, misses=50, errors=5, hit_rate=0.90
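
As a rough sketch of how such a line could be produced, the helper below computes the hit rate and formats the counters. It assumes hit rate means hits / (hits + misses), with backend errors counted separately; the source does not define the formula, and `hitRate` and `formatTelemetry` are hypothetical names.

```typescript
interface CacheTelemetry {
  hits: number
  misses: number
  errors: number
  lastReset: number // Unix timestamp in milliseconds
}

// Assumed definition: share of lookups served from cache; 0 when there were no lookups.
function hitRate(t: CacheTelemetry): number {
  const lookups = t.hits + t.misses
  return lookups === 0 ? 0 : t.hits / lookups
}

// Formats the counters in the same layout as the log line above.
function formatTelemetry(t: CacheTelemetry, nowMs: number): string {
  const windowSeconds = Math.round((nowMs - t.lastReset) / 1000)
  return (
    `[Governance] Cache telemetry (last ${windowSeconds}s): ` +
    `hits=${t.hits}, misses=${t.misses}, errors=${t.errors}, ` +
    `hit_rate=${hitRate(t).toFixed(2)}`
  )
}
```

For example, 80 hits and 20 misses over a 300-second window format as `hit_rate=0.80`.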

Fail-Closed Behavior

Security Principle

**Never allow an action without explicit backend approval.**

If the backend is unavailable, the frontend MUST deny the action rather than allow it.

Implementation

// Shared fail-closed payload for every failure path
const BACKEND_UNAVAILABLE: GovernanceEvaluationResult = {
  allowed: false,
  reason: 'Backend governance service unavailable',
  requires_approval: true,
  maturity_level: 'student',
  complexity: 0
}

async function evaluateGovernance(
  tenant_id: string,
  agent_id: string,
  action_type: string
): Promise<GovernanceEvaluationResult> {
  try {
    const response = await fetch('/api/v1/agent-governance/evaluate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ tenant_id, agent_id, action_type })
    })

    if (!response.ok) {
      // Backend error -> DENY
      return BACKEND_UNAVAILABLE
    }

    // A malformed body makes json() throw; the catch below turns that into a denial
    return await response.json()
  } catch (error) {
    // Network error or unparseable response -> DENY
    return BACKEND_UNAVAILABLE
  }
}

Error Scenarios

Error Type	Frontend Response	Cache TTL
Backend 500	`allowed: false`	10s
Network Timeout	`allowed: false`	10s
Invalid Response	`allowed: false`	10s
Tenant Not Found	`allowed: false`	N/A (404)

Usage Examples

Frontend Usage

import { AgentGovernanceService } from '@/lib/ai/agent-governance'

// Initialize service
const db = new DatabaseService()
const governance = new AgentGovernanceService(db)

// Check if agent can perform action
const result = await governance.canPerformAction(
  'tenant_123',
  'agent_sales_bot',
  'delete'
)

if (result.allowed) {
  // Execute action
  await executeDeleteOperation()
} else {
  // Show error or request approval
  showMessage(result.reason)
  if (result.requires_approval) {
    await requestApproval(result)
  }
}

With Context

const result = await governance.canPerformAction(
  'tenant_123',
  'agent_sales_bot',
  'send_email',
  { recipient: 'user@example.com', subject: 'Important' }
)

Backend Usage (Python)

import logging

from core.agent_governance_service import AgentGovernanceService

logger = logging.getLogger(__name__)

# Initialize service (db is your existing database handle)
governance = AgentGovernanceService(db)

# Check if agent can perform action
result = await governance.enforce_action(
    agent_id='agent_sales_bot',
    action_type='delete',
    action_details={'resource': 'document_789'}
)

if result['proceed']:
    # Execute action
    await execute_delete()
else:
    # Log denial
    logger.info(f"Action denied: {result['reason']}")

Maturity Levels

Level	Confidence	Max Complexity	Auto-Approve	Description
student	0.0-0.5	1 (Read-only)	No	Learning from examples
intern	0.5-0.7	2 (Analysis)	No	Can suggest, needs approval
supervised	0.7-0.9	3 (Mutation)	Partial	Live monitoring required
autonomous	0.9-1.0	4 (Critical)	Yes	Full autonomy
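
The confidence ranges above share their boundary values (0.5, 0.7, 0.9), so a mapping has to pick a side. The sketch below assumes each lower bound is inclusive and that out-of-range or non-numeric scores fall back to student; both choices are assumptions, `maturityForConfidence` is a hypothetical name, and the backend may instead read the level from the status column of agent_registry.

```typescript
type MaturityLevel = 'student' | 'intern' | 'supervised' | 'autonomous'

// Maps a confidence score to a maturity level.
// Assumption: lower bounds are inclusive, so exactly 0.5 is 'intern'.
function maturityForConfidence(score: number): MaturityLevel {
  // Fail closed: anything outside [0, 1] or not a finite number gets the lowest level.
  if (!Number.isFinite(score) || score < 0 || score > 1) return 'student'
  if (score >= 0.9) return 'autonomous'
  if (score >= 0.7) return 'supervised'
  if (score >= 0.5) return 'intern'
  return 'student'
}
```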

Action Complexity

Complexity	Actions	Maturity Required
1 (Low)	search, read, list, get, fetch, summarize	Student+
2 (Medium-Low)	analyze, suggest, draft, generate, recommend	Intern+
3 (Medium)	create, update, send_email, post_message, schedule	Supervised+
4 (High)	delete, execute, deploy, transfer, payment, approve	Autonomous only
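
Taken together, the two tables reduce to a single comparison: an action is allowed when its complexity does not exceed the agent's maximum. The sketch below illustrates that rule in TypeScript for consistency with the other examples; the authoritative logic lives in the Python backend. `decide`, `Decision`, and `ACTIONS_BY_COMPLEXITY` are hypothetical names, treating unknown actions as complexity 4 is an assumption, and approval handling is omitted.

```typescript
type MaturityLevel = 'student' | 'intern' | 'supervised' | 'autonomous'

// Ordered from least to most trusted; index + 1 is the maximum complexity allowed.
const LEVELS: readonly MaturityLevel[] = ['student', 'intern', 'supervised', 'autonomous']

// Action lists copied from the complexity table above.
const ACTIONS_BY_COMPLEXITY: ReadonlyArray<readonly [number, readonly string[]]> = [
  [1, ['search', 'read', 'list', 'get', 'fetch', 'summarize']],
  [2, ['analyze', 'suggest', 'draft', 'generate', 'recommend']],
  [3, ['create', 'update', 'send_email', 'post_message', 'schedule']],
  [4, ['delete', 'execute', 'deploy', 'transfer', 'payment', 'approve']],
]

// A Map avoids accidental matches on inherited object keys such as 'constructor'.
const ACTION_COMPLEXITY = new Map<string, number>()
for (const [complexity, actions] of ACTIONS_BY_COMPLEXITY) {
  for (const action of actions) ACTION_COMPLEXITY.set(action, complexity)
}

interface Decision {
  allowed: boolean
  complexity: number
  reason: string
}

function decide(maturity: MaturityLevel, actionType: string): Decision {
  // Assumption: actions missing from the table are treated as the highest complexity.
  const complexity = ACTION_COMPLEXITY.get(actionType) ?? 4
  const maxComplexity = LEVELS.indexOf(maturity) + 1
  const required = LEVELS[complexity - 1] ?? 'autonomous'
  if (complexity <= maxComplexity) {
    return { allowed: true, complexity, reason: `Agent ${maturity} meets maturity for ${actionType}` }
  }
  return {
    allowed: false,
    complexity,
    reason: `Agent ${maturity} lacks maturity for ${actionType} (Req: ${required})`,
  }
}
```

With these tables, `decide('intern', 'delete')` is denied at complexity 4, while `decide('supervised', 'send_email')` is allowed at complexity 3.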

Troubleshooting

Backend Unavailable

**Symptom:** All actions denied with "Backend governance service unavailable"

**Debug Steps:**

Check Python backend health: curl http://localhost:8000/health/live
Check Next.js API logs for proxy errors
Verify network connectivity between Next.js and Python

**Solution:**

Start Python backend: cd backend-saas && uvicorn main:app --reload
Check backend logs for errors
Verify internal networking configuration

Cache Issues

**Symptom:** Stale governance decisions (old maturity level used)

**Debug Steps:**

Check cache telemetry logs
Verify cache TTL (should be 30s)
Check if agent maturity changed recently

**Solution:**

Wait 30s for cache expiry
Manually invalidate cache by updating agent score
Restart frontend to clear cache

Permission Denied

**Symptom:** Action denied despite correct maturity level

**Debug Steps:**

Check agent status in database, scoped to the tenant: SELECT status FROM agent_registry WHERE id = 'agent_abc' AND tenant_id = 'tenant_123';
Check action complexity mapping
Review backend governance logs

**Solution:**

Promote agent to higher maturity level
Reduce action complexity (use alternative approach)
Request manual approval

Performance

Metrics

Metric	Target	Current
Cache Hit Rate	>50%	~80% (measured in production)
API Latency (p50)	<100ms	~50ms
API Latency (p99)	<500ms	~200ms
Cache Lookup	<1ms	<0.1ms

Optimization Tips

**Batch Governance Checks:** Check multiple actions in parallel
**Prefetch:** Check governance before user interaction
**Cache Warming:** Populate cache with common actions on startup
**Monitoring:** Track cache hit rate to optimize TTL
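
The batching tip can be sketched as a helper that runs several checks concurrently and denies any that fail. The checker is passed in as a function so the example stays independent of the real service; `checkActionsInParallel`, `CheckAction`, and the denial reason used here are hypothetical.

```typescript
interface GovernanceEvaluationResult {
  allowed: boolean
  reason: string
  requires_approval: boolean
  maturity_level: 'student' | 'intern' | 'supervised' | 'autonomous'
  complexity: number
}

// Signature modeled on canPerformAction(tenantId, agentId, actionType).
type CheckAction = (
  tenantId: string,
  agentId: string,
  actionType: string
) => Promise<GovernanceEvaluationResult>

const CHECK_FAILED: GovernanceEvaluationResult = {
  allowed: false,
  reason: 'Governance check failed',
  requires_approval: true,
  maturity_level: 'student',
  complexity: 0,
}

// Checks several actions at once; duplicate action types are checked only once.
async function checkActionsInParallel(
  check: CheckAction,
  tenantId: string,
  agentId: string,
  actionTypes: readonly string[]
): Promise<Map<string, GovernanceEvaluationResult>> {
  const unique = Array.from(new Set(actionTypes))
  const results = await Promise.all(
    unique.map((actionType) =>
      // Starting from a resolved promise also catches synchronous throws in check().
      Promise.resolve()
        .then(() => check(tenantId, agentId, actionType))
        .catch(() => CHECK_FAILED) // fail closed per action, not for the whole batch
    )
  )
  const byAction = new Map<string, GovernanceEvaluationResult>()
  unique.forEach((actionType, i) => {
    const result = results[i]
    if (result !== undefined) byAction.set(actionType, result)
  })
  return byAction
}
```

Catching per action means one failed lookup denies only that action instead of rejecting the whole batch.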

Security

Threat Model

Threat	Mitigation
Client Bypass	Backend is authoritative, frontend cannot override
Cache Poisoning	Cache is read-only, short TTL (30s)
Tenant Leakage	All queries filtered by tenant_id
Agent Spoofing	Reserved names blocked (admin, root, system)
Backend Down	Fail-closed denies all actions
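
As an illustration of the agent-spoofing mitigation, a reserved-name check might look like the sketch below. The three names come from the table above; the case-insensitive comparison, the whitespace trimming, and the function name `isReservedOrInvalidAgentId` are assumptions. In this architecture the check belongs on the server side, and it is written in TypeScript here only for consistency with the other examples.

```typescript
// Names listed in the threat model; the real list may be longer.
const RESERVED_AGENT_NAMES: ReadonlySet<string> = new Set(['admin', 'root', 'system'])

// True when the ID is empty or matches a reserved name.
// Assumption: comparison ignores case and surrounding whitespace.
function isReservedOrInvalidAgentId(agentId: string): boolean {
  const normalized = agentId.trim().toLowerCase()
  return normalized.length === 0 || RESERVED_AGENT_NAMES.has(normalized)
}
```

A request that fails this check would receive the 'Reserved or invalid agent ID' denial from the API contract.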

Audit Trail

All governance decisions are logged:

{
  "event": "governance_check",
  "timestamp": "2026-04-12T20:00:00.000Z",
  "tenant_id": "tenant_123",
  "agent_id": "agent_sales_bot",
  "action_type": "delete",
  "decision": "DENIED",
  "reason": "Agent intern lacks maturity for delete (Req: autonomous)",
  "maturity": "intern",
  "complexity": 4,
  "latency_ms": 45
}
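
A record like the one above could be assembled from the decision plus two timestamps taken around the check. The sketch below is one way to do it; `buildAuditRecord` and `GovernanceCheckOutcome` are hypothetical names, and 'ALLOWED' as the counterpart of 'DENIED' is an assumption, since the source shows only the denied case.

```typescript
type MaturityLevel = 'student' | 'intern' | 'supervised' | 'autonomous'

// Field names follow the sample audit entry above.
interface GovernanceAuditRecord {
  event: 'governance_check'
  timestamp: string // ISO 8601, UTC
  tenant_id: string
  agent_id: string
  action_type: string
  decision: 'ALLOWED' | 'DENIED' // 'ALLOWED' is assumed
  reason: string
  maturity: MaturityLevel
  complexity: number
  latency_ms: number
}

// Hypothetical input: the decision plus when the check started and finished.
interface GovernanceCheckOutcome {
  tenantId: string
  agentId: string
  actionType: string
  allowed: boolean
  reason: string
  maturity: MaturityLevel
  complexity: number
  startedAtMs: number
  finishedAtMs: number
}

function buildAuditRecord(outcome: GovernanceCheckOutcome): GovernanceAuditRecord {
  return {
    event: 'governance_check',
    timestamp: new Date(outcome.finishedAtMs).toISOString(),
    tenant_id: outcome.tenantId,
    agent_id: outcome.agentId,
    action_type: outcome.actionType,
    decision: outcome.allowed ? 'ALLOWED' : 'DENIED',
    reason: outcome.reason,
    maturity: outcome.maturity,
    complexity: outcome.complexity,
    // Clamp at zero in case the clock moved backwards between the two readings.
    latency_ms: Math.max(0, Math.round(outcome.finishedAtMs - outcome.startedAtMs)),
  }
}
```

Serializing the result with `JSON.stringify` yields one log line per decision.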

Migration Notes

From Client-Side Governance (v1.0)

**Changes:**

✅ Removed ACTION_COMPLEXITY constant from frontend
✅ Removed MATURITY_REQUIREMENTS constant from frontend
✅ Removed RESERVED_AGENT_NAMES validation from frontend
✅ Refactored canPerformAction() to call backend API
✅ Added client-side caching (30s TTL)
✅ Implemented fail-closed behavior

**Benefits:**

Single source of truth (backend)
Real-time governance updates
No code duplication
Enhanced security (server-side validation)

Rollback Plan

If issues arise, revert to client-side governance:

Restore ACTION_COMPLEXITY and MATURITY_REQUIREMENTS constants
Revert canPerformAction() to client-side logic
Remove backend API calls
Clear client-side cache

Related Documentation

**Backend API Contracts:** backend-saas/docs/GOVERNANCE_API_CONTRACTS.md
**Agent Governance Service:** backend-saas/core/agent_governance_service.py
**Frontend Service:** src/lib/ai/agent-governance.ts
**API Route:** src/app/api/v1/agent-governance/evaluate/route.ts

Changelog

v2.0 (2026-04-12)

**Breaking Changes:**

Frontend now requires backend API for governance decisions
Client-side constants removed, use backend API instead

**New Features:**

Unified frontend-backend governance architecture
Client-side caching with 30s TTL
Fail-closed behavior on backend errors
Cache telemetry tracking
Audit logging with latency

**Bug Fixes:**

Fixed code duplication between frontend and backend
Fixed stale governance decisions in frontend cache
Fixed security risk of client-side bypass

v1.0 (Legacy)

Client-side governance with hardcoded constants
No backend API integration
Vulnerable to client manipulation
