ATOM Documentation

Architecture Overview

This page describes the complete system architecture of ATOM SaaS, a multi-tenant AI agent platform with cognitive architectures, learning engines, and enterprise-grade governance.


High-Level Architecture

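ATOM SaaS follows a layered architecture with clear separation of concerns: a presentation layer (frontend), an API layer (backend), a data layer, and the infrastructure that hosts them.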


Technology Stack

Frontend (Presentation Layer)

Web Application:

  • Framework: Next.js 14 (App Router)
  • Language: TypeScript 5.x
  • UI Library: React 18
  • Styling: Tailwind CSS
  • Components: Radix UI primitives
  • Editor: Monaco (VS Code editor)
  • State: React Context + Server Components

Desktop Application:

  • Framework: Tauri 2.0
  • Language: Rust (backend), JavaScript (frontend)
  • Features: Terminal access, Docker integration, local execution
  • Security: Sandboxed execution with permission prompts

Backend (API Layer)

Unified Backend:

  • Runtime: Managed compute node running two processes (Next.js and FastAPI) under supervisord
  • Frontend Port: 3000 (Next.js)
  • Backend Port: 8000 (FastAPI)
  • Internal Comm: Next.js proxies /api/v1 requests to local FastAPI instance
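
The routing rule above can be pictured with a small sketch. It only illustrates the decision "API path goes to FastAPI, everything else stays with Next.js". The function name `resolve_upstream` is hypothetical, and it is an assumption that the /api/v1 prefix is kept when the request is forwarded.

```python
FRONTEND_PORT = 3000  # Next.js, as listed above
BACKEND_PORT = 8000   # FastAPI, as listed above
API_PREFIX = "/api/v1"


def resolve_upstream(path: str) -> str:
    """Return the local URL expected to serve the given request path.

    Hypothetical helper: assumes the API prefix is preserved when proxying.
    """
    # Match the prefix on a path-segment boundary so /api/v10 is not proxied.
    if path == API_PREFIX or path.startswith(API_PREFIX + "/"):
        return f"http://127.0.0.1:{BACKEND_PORT}{path}"
    return f"http://127.0.0.1:{FRONTEND_PORT}{path}"
```

In a real deployment this rule would normally live in the Next.js configuration rather than in Python; the sketch is only meant to make the split between the two ports concrete.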

Data Layer

Primary Database:

  • Database: PostgreSQL 15+
  • Extension: pgvector (vector similarity)
  • Security: Row-Level Security (RLS) for tenant isolation
  • Hosting: Neon PostgreSQL (serverless)

Vector Database:

  • Database: LanceDB
  • Purpose: Semantic search for World Model
  • Storage: Local file system (persistent volumes)

Caching:

  • Cache: Redis
  • Purpose: Rate limiting, session caching, pub/sub
  • Hosting: Upstash Redis

File Storage:

  • Storage: AWS S3
  • Purpose: User uploads, agent artifacts, canvas exports
  • Isolation: Tenant-specific prefixes (s3://atom-saas/{tenant_id}/)

Infrastructure

Hosting:

  • Platform: ATOM Cloud Platform
  • Regions: Multiple regions for low latency (Anycast network)
  • Features: Auto-scaling, health checks, rolling deployments

CI/CD:

  • Pipeline: GitHub Actions
  • Testing: 212 E2E tests (100% compliance)
  • Deployment: Automated on merge to main

Brain Systems Architecture

The brain systems form the core intelligence layer that enables human-like agent behavior.

Brain System Responsibilities

1. Cognitive Architecture

  • Human-like reasoning process
  • Attention allocation
  • Memory recall coordination
  • Language processing
  • Problem-solving strategies

2. Learning Engine

  • Experience recording (RLHF)
  • Pattern recognition
  • Adaptation generation
  • Behavior modification
  • Performance optimization

3. World Model

  • Long-term memory storage
  • Semantic similarity search
  • Experience recall by relevance
  • Canvas context tracking
  • Feedback-aware retrieval
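
As a rough illustration of "experience recall by relevance", the sketch below ranks stored experiences by cosine similarity to a query embedding. It is a pure-Python stand-in, not the LanceDB-backed implementation; the function names and the `(text, vector)` data shape are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_experiences(
    query_vec: Sequence[float],
    experiences: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[str]:
    """Return the texts of the top_k experiences most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in experiences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

A vector database performs the same ranking with an index instead of a full scan, which is what makes it practical for large memories.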

4. Reasoning Engine

  • Proactive intelligence
  • Intervention generation
  • Opportunity identification
  • Automation suggestions
  • Trend analysis

5. Cross-System Reasoning

  • Multi-agent coordination
  • Cross-system data correlation
  • Complex problem decomposition
  • Knowledge synthesis

6. Alpha Evolver

  • Autonomous code mutation
  • Sandbox-based variant testing
  • Workflow performance optimization
  • Self-improving toolsets

7. Agent Governance

  • Permission validation
  • Maturity Calibration (AI-driven)
  • Safety checks
  • Audit logging
  • Rate limiting

Detailed Brain Systems →


Multi-Tenancy Architecture

Tenant isolation is implemented at multiple layers for enterprise-grade security:

Tenant Isolation Layers

1. Subdomain Routing

  • Each tenant gets unique subdomain: tenant.atomagentos.com
  • Custom domains supported
  • Subdomain mapped to tenant_id in database
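
A minimal sketch of the first step of subdomain routing: extracting the tenant label from the Host header. The helper name is hypothetical. Mapping the label to a tenant_id, and resolving custom domains, would still require the database lookup described above.

```python
from typing import Optional

BASE_DOMAIN = "atomagentos.com"


def tenant_slug_from_host(host: str) -> Optional[str]:
    """Return the tenant subdomain label, or None if the host is not a tenant subdomain."""
    # Drop an optional port and normalise case and a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None  # custom domain or unknown host: resolve via the database instead
    slug = hostname[: -len(suffix)]
    # Reject empty labels and nested subdomains such as a.b.atomagentos.com.
    if not slug or "." in slug:
        return None
    return slug
```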

2. Row-Level Security (RLS)

-- RLS Policy Example
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON agents
  FOR ALL
  USING (tenant_id = current_setting('app.current_tenant_id')::UUID);
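
The policy above reads app.current_tenant_id, so something has to set that value before each query runs. The sketch below shows one way this could be done from Python, assuming a psycopg-style DB-API cursor (hence the %s placeholder). The helper name `set_tenant_context` is hypothetical; `set_config` is the standard PostgreSQL function, and passing true as its third argument limits the setting to the current transaction.

```python
import uuid


def set_tenant_context(cursor, tenant_id: str) -> None:
    """Bind the current transaction to one tenant so the RLS policy can filter rows.

    Assumes a DB-API cursor that uses %s placeholders (for example psycopg).
    """
    # Validate early: the policy casts the setting to UUID, so reject anything else.
    tenant_uuid = str(uuid.UUID(tenant_id))
    # is_local = true scopes the setting to the current transaction only.
    cursor.execute(
        "SELECT set_config('app.current_tenant_id', %s, true)",
        (tenant_uuid,),
    )
```

Note that PostgreSQL does not apply row-level security to the table owner or to superusers unless FORCE ROW LEVEL SECURITY is set, so the application would need to connect as a role that is subject to the policy.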

3. S3 Prefix Isolation

  • Each tenant gets dedicated S3 prefix
  • Path format: s3://atom-saas/{tenant_id}/uploads/
  • Bucket policies enforce prefix access
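
On the application side, prefix isolation amounts to never building an object key outside the tenant's own prefix. The following sketch assumes tenant IDs are UUIDs (as the RLS example suggests); the helper name is hypothetical, and the bucket policies remain the enforcement point on the storage side.

```python
import posixpath
import uuid

BUCKET = "atom-saas"


def tenant_object_key(tenant_id: str, filename: str) -> str:
    """Build an upload key under {tenant_id}/uploads/ and reject path traversal."""
    # Canonicalise the tenant ID; raises ValueError for anything that is not a UUID.
    tenant = str(uuid.UUID(tenant_id))
    prefix = f"{tenant}/uploads/"
    # Collapse "..", "." and duplicate slashes, then confirm we are still inside the prefix.
    key = posixpath.normpath(prefix + filename)
    if not key.startswith(prefix):
        raise ValueError("filename escapes the tenant prefix")
    return key


def tenant_object_uri(tenant_id: str, filename: str) -> str:
    """Full s3:// URI for the object, matching the path format above."""
    return f"s3://{BUCKET}/{tenant_object_key(tenant_id, filename)}"
```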

4. Redis Namespace

  • Keys namespaced: tenant:{tenant_id}:rate_limit
  • Pub/sub channels scoped: tenant:{tenant_id}:events
  • Session isolation guaranteed
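
The namespacing scheme is easiest to keep consistent when every key goes through one helper. A minimal sketch, with a hypothetical function name, that produces the key and channel formats shown above:

```python
def tenant_key(tenant_id: str, *parts: str) -> str:
    """Build a tenant-scoped Redis key or channel name, e.g. tenant:{tenant_id}:rate_limit."""
    segments = (tenant_id, *parts)
    for segment in segments:
        # A ":" inside a segment could make one tenant's key collide with another's.
        if not segment or ":" in segment:
            raise ValueError(f"invalid key segment: {segment!r}")
    return ":".join(("tenant", *segments))
```

For example, `tenant_key("t1", "rate_limit")` yields `tenant:t1:rate_limit` and `tenant_key("t1", "events")` yields the pub/sub channel name `tenant:t1:events`.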

5. Application-Level Filtering

  • All queries include WHERE tenant_id = ?
  • API responses filter tenant data
  • Background jobs scoped to tenant
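
Application-level filtering means the tenant ID is a mandatory, parameterised part of every query. The sketch below uses an in-memory SQLite database purely as a stand-in for PostgreSQL so that it runs anywhere; the column layout and sample rows are invented for illustration.

```python
import sqlite3
from typing import List


def list_agents(conn: sqlite3.Connection, tenant_id: str) -> List[str]:
    """Return agent names for one tenant; the tenant filter is never optional."""
    rows = conn.execute(
        "SELECT name FROM agents WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),  # bound parameter, never string-formatted into the SQL
    ).fetchall()
    return [row[0] for row in rows]


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO agents VALUES (?, ?)",
    [("t1", "billing-bot"), ("t1", "support-bot"), ("t2", "sales-bot")],
)
```

This filter and the RLS policy are complementary: if a query ever omits the WHERE clause by mistake, the database policy still prevents cross-tenant reads.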

Detailed Multi-Tenancy →


Agent Execution Flow

Complete request lifecycle from user input to agent response:

Execution Stages

1. Request Validation

  • Authenticate user session
  • Extract tenant context
  • Validate request schema

2. Governance Checks

  • Rate limit validation (per-tenant)
  • Permission check (agent maturity)
  • Safety guardrails

3. Context Resolution

  • Load agent configuration
  • Resolve task context
  • Fetch relevant settings

4. Cognitive Processing

  • Recall relevant experiences (World Model)
  • Generate reasoning chain
  • Determine optimal approach

5. Skill Execution

  • Load required skills
  • Execute actions
  • Handle integration calls

6. Learning & Recording

  • Record experience to World Model
  • Extract learnings
  • Update patterns

7. Response Generation

  • Format response
  • Include metadata
  • Return to user
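
The seven stages can be read as a pipeline in which each stage receives a context, enriches it, and either passes it on or aborts the request. The sketch below is a simplified model under that assumption; the stage functions are placeholders, and names such as `GovernanceError` and the `rate_limited` flag are hypothetical.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


class GovernanceError(Exception):
    """Raised when a request fails a governance check."""


def traced(name: str, func: Stage) -> Stage:
    """Wrap a stage so that the context records which stages have run."""
    def stage(ctx: Context) -> Context:
        ctx = func(ctx)
        ctx["trace"] = [*ctx.get("trace", []), name]
        return ctx
    return stage


def passthrough(ctx: Context) -> Context:
    """Placeholder for stages whose logic is out of scope for this sketch."""
    return ctx


def governance_checks(ctx: Context) -> Context:
    # Stops the pipeline before any cognitive processing or skill execution.
    if ctx.get("rate_limited"):
        raise GovernanceError("tenant is over its rate limit")
    return ctx


STAGES: List[Stage] = [
    traced("request_validation", passthrough),
    traced("governance_checks", governance_checks),
    traced("context_resolution", passthrough),
    traced("cognitive_processing", passthrough),
    traced("skill_execution", passthrough),
    traced("learning_and_recording", passthrough),
    traced("response_generation", passthrough),
]


def run_pipeline(request: Context, stages: List[Stage] = STAGES) -> Context:
    """Run every stage in order; an exception in any stage aborts the request."""
    ctx = dict(request)
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Placing governance second means a rejected request never reaches the World Model or any skill, which keeps rejected requests cheap and free of side effects.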

Data Flow Diagrams

[Diagrams not reproduced here: Agent Creation Flow, Graduation Exam Flow, Skill Execution Flow]


Security Architecture

Multiple security layers protect tenant data and ensure safe agent behavior:

Security Layers

1. Network Security

  • TLS 1.3 for all connections
  • DDoS protection (Global edge network)
  • IP whitelisting (enterprise)

2. Authentication

  • JWT-based sessions
  • OAuth 2.0 for integrations
  • API key support (BYOK)

3. Tenant Isolation

  • Subdomain-based routing
  • Row-Level Security (PostgreSQL)
  • Storage prefix isolation
  • Cache namespace separation

4. Agent Governance

  • Maturity-based permissions
  • Real-time permission validation
  • Constitutional guardrails
  • Comprehensive audit logging
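
Maturity-based permissions can be modelled as an ordered scale in which each action requires a minimum level. This page does not list the actual levels or actions, so every name in the sketch below is hypothetical and chosen only to show the shape of the check.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """Hypothetical maturity scale; the real levels are not specified on this page."""
    STUDENT = 1
    INTERN = 2
    SUPERVISED = 3
    AUTONOMOUS = 4


# Hypothetical mapping from action to the minimum maturity it requires.
REQUIRED_MATURITY = {
    "read_data": Maturity.STUDENT,
    "send_message": Maturity.SUPERVISED,
    "delete_record": Maturity.AUTONOMOUS,
}


def is_permitted(agent_maturity: Maturity, action: str) -> bool:
    """Allow an action only if the agent meets its minimum maturity; deny unknown actions."""
    required = REQUIRED_MATURITY.get(action)
    if required is None:
        return False  # default-deny keeps newly added actions safe until classified
    return agent_maturity >= required
```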

5. Abuse Protection

  • Per-tenant rate limits
  • Resource quotas (storage, API calls)
  • Anomaly detection
  • Automatic throttling

Scalability Architecture

Horizontal and vertical scaling strategies:

Horizontal Scaling

Auto-Scaling:

  • CPU-based scaling triggers
  • Memory-based scaling triggers
  • Request queue-based scaling
  • Regional distribution

Vertical Scaling

Database:

  • Connection pooling (PgBouncer)
  • Read replicas for analytics
  • Partitioned tables (by tenant)
  • Index optimization

Cache:

  • Redis cluster for high availability
  • Tiered caching (L1: memory, L2: Redis)
  • Intelligent cache invalidation

Monitoring & Observability

Detailed Monitoring →


Technology Rationale

Why Next.js?

  • React Server Components for performance
  • Built-in API routes for backend logic
  • Excellent developer experience
  • Strong TypeScript support
  • SEO optimization

Why FastAPI?

  • Native async support
  • Automatic OpenAPI documentation
  • High performance (comparable to Node.js)
  • Strong type validation (Pydantic)
  • Easy testing

Why PostgreSQL?

  • ACID compliance
  • Row-Level Security
  • pgvector for vector similarity
  • Excellent reliability
  • Strong ecosystem

Why Neon?

  • Serverless PostgreSQL
  • Auto-scaling storage
  • Branch-based development
  • Built-in connection pooling
  • Competitive pricing

Why LanceDB?

  • Embedded vector database
  • High-performance semantic search
  • Python-native
  • No separate infrastructure
  • Open source

Why Redis?

  • In-memory performance
  • Rich data structures
  • Pub/sub support
  • Rate limiting capabilities
  • Session management

Why ATOM Managed Infrastructure?

  • Simple deployment model
  • Built-in load balancing
  • Multi-region support
  • Integrated security
  • Optimized performance

Architecture Patterns Used

1. Layered Architecture

  • Clear separation of concerns
  • Each layer has specific responsibility
  • Easy to test and maintain

2. Event-Driven Architecture

  • Agent executions trigger events
  • Background jobs process asynchronously
  • Real-time updates via pub/sub

3. Multi-Tenancy Patterns

  • Subdomain-based routing
  • Row-Level Security
  • Tenant-scoped caching
  • Isolated storage

4. Plugin Architecture

  • Skill registry for dynamic loading
  • Integration adapters
  • Extensible brain systems
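
A skill registry for dynamic loading is commonly built as a name-to-callable map populated by a decorator. The sketch below shows that general pattern; it is not the ATOM registry itself, and `register_skill`, `run_skill`, and the `echo` skill are hypothetical names.

```python
from typing import Any, Callable, Dict

_SKILLS: Dict[str, Callable[..., Any]] = {}


def register_skill(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that makes a function discoverable by name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        if name in _SKILLS:
            raise ValueError(f"skill already registered: {name}")
        _SKILLS[name] = func
        return func
    return decorator


def run_skill(name: str, **kwargs: Any) -> Any:
    """Look up a skill by name and execute it with keyword arguments."""
    try:
        skill = _SKILLS[name]
    except KeyError:
        raise LookupError(f"unknown skill: {name}") from None
    return skill(**kwargs)


@register_skill("echo")
def echo(text: str) -> str:
    """Trivial example skill."""
    return text
```

Because skills are resolved by name at call time, new ones can be added by importing a module that registers them, without changing the code that executes them.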

5. CQRS (Command Query Responsibility Segregation)

  • Separate read and write models
  • Optimized for each use case
  • Complex queries use read replicas

Performance Considerations

Database Optimization

  • Connection pooling (max 20 connections)
  • Read replicas for analytics queries
  • Indexed foreign keys
  • Partitioned tables by tenant

Caching Strategy

  • L1 cache: In-memory (frequently accessed)
  • L2 cache: Redis (shared across instances)
  • Cache TTL: 5-60 minutes depending on data
  • Invalidation on updates
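
The two-tier strategy can be sketched as a read-through cache: check process memory first, fall back to the shared tier, and repopulate memory on a hit. In the sketch below a plain dict stands in for Redis, and the class name and injectable clock are assumptions made so the example is self-contained.

```python
import time
from typing import Any, Callable, Dict, MutableMapping, Optional, Tuple


class TieredCache:
    """L1: per-process dict with TTL. L2: shared store (a dict stands in for Redis)."""

    def __init__(
        self,
        l2: MutableMapping[str, Any],
        ttl_seconds: float = 300.0,  # 5 minutes, the low end of the range above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._l1: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._l2 = l2
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._l1.get(key)
        if entry is not None:
            if entry[0] > self._clock():
                return entry[1]  # fresh L1 hit
            del self._l1[key]    # expired: fall through to L2
        value = self._l2.get(key)
        if value is not None:
            self._l1[key] = (self._clock() + self._ttl, value)
        return value

    def set(self, key: str, value: Any) -> None:
        self._l2[key] = value
        self._l1[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key: str) -> None:
        """Drop the key from both tiers, as done on updates."""
        self._l1.pop(key, None)
        self._l2.pop(key, None)
```

One limitation of this sketch: `invalidate` only clears the local L1. Other instances keep their in-memory copy until the TTL expires unless the invalidation is also broadcast, for example over the tenant-scoped pub/sub channels.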

API Performance

  • Response time target: < 200ms (p95)
  • Rate limits: 50/day (free), 5000/day (team)
  • Pagination for large result sets
  • Compression enabled (gzip)
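
The daily limits above can be enforced with a per-tenant counter that resets each day. The following in-memory sketch shows the logic only; the class name is hypothetical, and a production version would presumably keep the counter in Redis (for instance with INCR and EXPIRE on a tenant-scoped key) so that all instances share it.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Tuple

# Requests per day by plan, taken from the limits listed above.
DAILY_LIMITS = {"free": 50, "team": 5000}


class DailyRateLimiter:
    """Counts requests per (tenant, day) and rejects once the plan limit is reached."""

    def __init__(self) -> None:
        self._counts: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def allow(self, tenant_id: str, plan: str, today: date) -> bool:
        limit = DAILY_LIMITS[plan]  # KeyError for unknown plans is intentional
        key = (tenant_id, today.isoformat())
        if self._counts[key] >= limit:
            return False
        self._counts[key] += 1
        return True
```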

Background Jobs

  • Async task processing
  • Job queues (Redis-based)
  • Automatic retries with exponential backoff
  • Dead letter queue for failed jobs
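
Retries with exponential backoff and a dead letter queue fit in a few lines. The sketch below assumes a base delay of 1 second doubling per attempt and capped at 60 seconds; those numbers, the function names, and the list used as a dead letter queue are illustrative choices, not documented values.

```python
import time
from typing import Any, Callable, List, Optional, Tuple


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Delays before each retry: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def run_with_retries(
    job: Callable[[], Any],
    dead_letters: List[Tuple[Callable[[], Any], Exception]],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Run a job, retrying on failure; after the last retry, park it in dead_letters."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == max_retries:
                # Out of retries: keep the job and its error for later inspection.
                dead_letters.append((job, exc))
                return None
            sleep(delays[attempt])
    return None
```

Real queues usually add random jitter to each delay so that many failed jobs do not retry at the same instant.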

Last Updated: 2025-02-06
Architecture Version: 8.0 (Production Ready)