ATOM Documentation

Federation Protocol

Overview

The ATOM Federation Protocol enables cross-instance communication between self-hosted ATOM deployments, allowing agents, skills, and domain templates to be shared across organizational boundaries while maintaining security, authentication, and multi-tenant isolation.

Architecture Diagram

Protocol Specification

Version

Current Version: 1.0.0

Compatibility:

  • Instances running v13.0+ can federate with each other
  • Backward compatible with v12.x for read operations
  • Forward compatible with schema versioning

Base URL

https://atom-saas.fly.dev/api/federation

For self-hosted instances:

https://your-instance.com/api/federation

Authentication

API Key Authentication

All federation requests require a shared secret API key:

Header:

X-Federation-Key: sk-federation-shared-key-...

Key Rotation:

# Set multiple keys (comma-separated)
FEDERATION_API_KEY=sk-federation-key-1,sk-federation-key-2,sk-federation-key-3
# First key is primary for validation
# Remaining keys support rotation
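As a minimal sketch of how a comma-separated value like the one above could be turned into a key list, assuming the variable name FEDERATION_API_KEY from this page (the helper name load_federation_keys is hypothetical and not part of ATOM):

```python
import os
from typing import Optional


def load_federation_keys(raw: Optional[str] = None) -> list:
    """Split a comma-separated FEDERATION_API_KEY value into a clean list.

    Hypothetical helper: reads the environment variable when `raw` is None.
    """
    if raw is None:
        raw = os.environ.get("FEDERATION_API_KEY", "")
    # Strip surrounding whitespace and drop empty entries (e.g. a trailing comma).
    return [key.strip() for key in raw.split(",") if key.strip()]


keys = load_federation_keys("sk-federation-key-1, sk-federation-key-2,")
primary_key = keys[0] if keys else None  # first entry is treated as primary
```

During a rotation, the new key would be added to the list first, peers switched over, and the old key removed afterwards.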

Security:

  • Keys should be exchanged via secure channels (e.g., password managers, secret management)
  • Keys should be rotated regularly (recommended: quarterly)
  • Never include keys in code repositories
  • Use environment variables or secret management systems

Key Validation:

from datetime import UTC, datetime

from fastapi import Header, HTTPException, Request

# _FEDERATION_API_KEYS and logger are module-level objects defined elsewhere

def verify_federation_key(request: Request, x_federation_key: str = Header(...)):
    """
    Verify federation API key from X-Federation-Key header.

    Supports key rotation - checks against all valid keys from environment.
    Logs all auth attempts for security auditing.
    """
    client_ip = request.client.host if request.client else "unknown"

    # Check against all valid keys (supports rotation)
    if x_federation_key not in _FEDERATION_API_KEYS:
        logger.warning(
            "federation_auth_failed",
            extra={
                "action": "auth_failed",
                "client_ip": client_ip,
                "timestamp": datetime.now(UTC).isoformat(),
                "reason": "invalid_api_key"
            }
        )
        raise HTTPException(status_code=403, detail="Invalid Federation Key")

    # Log successful auth
    logger.info(
        "federation_auth_success",
        extra={
            "action": "auth_success",
            "client_ip": client_ip,
            "timestamp": datetime.now(UTC).isoformat()
        }
    )
    return True
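The membership test in the validator above uses ordinary string comparison. One possible hardening, shown here only as a sketch and not as ATOM's actual implementation, is to compare each candidate with hmac.compare_digest so that comparison time does not depend on where the strings differ. The function name key_is_valid is hypothetical.

```python
import hmac


def key_is_valid(presented: str, valid_keys: list) -> bool:
    """Return True if `presented` equals any configured key (hypothetical helper)."""
    presented_bytes = presented.encode("utf-8")
    matched = False
    for key in valid_keys:
        # compare_digest runs in time independent of the position of the first mismatch.
        if hmac.compare_digest(presented_bytes, key.encode("utf-8")):
            matched = True  # keep looping so timing does not reveal which key matched
    return matched
```

The result could replace the `not in _FEDERATION_API_KEYS` check while leaving the logging and the 403 response unchanged.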

Rate Limiting

Limits:

  • 100 requests per minute per IP address
  • Fixed one-minute window, counted per IP in Redis
  • Graceful degradation if Redis unavailable

Headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1708412800

Exceeded Response:

{
  "detail": "Rate limit exceeded. Please try again later.",
  "code": "RATE_LIMIT_EXCEEDED"
}

Implementation:

from datetime import UTC, datetime

from fastapi import HTTPException, Request

# cache_service is the application's Redis wrapper, defined elsewhere

async def rate_limit_federation(request: Request):
    """Rate limiting for federation endpoints using Redis."""
    if not cache_service.enabled:
        return  # Graceful degradation

    client_ip = request.client.host if request.client else "unknown"
    rate_limit = 100
    window_seconds = 60

    # One counter per client IP per calendar minute
    current_minute = datetime.now(UTC).strftime("%Y-%m-%d:%H:%M")
    rate_limit_key = f"federation:rate_limit:{client_ip}:{current_minute}"

    current_count = cache_service.get(rate_limit_key) or 0
    if int(current_count) >= rate_limit:
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded",
            headers={"Retry-After": str(window_seconds)}
        )

    cache_service.set(rate_limit_key, int(current_count) + 1, ttl=window_seconds)
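The implementation above keeps one counter per client IP per calendar minute. The following in-memory sketch illustrates that counting scheme and how the X-RateLimit-* headers could be derived from it. It is an illustration under assumptions, not the service's code: the class name FixedWindowLimiter is hypothetical, a plain dict stands in for Redis, and X-RateLimit-Reset is assumed to be the epoch second at which the current window ends.

```python
class FixedWindowLimiter:
    """Per-IP fixed-window counter (hypothetical, in-memory stand-in for Redis)."""

    def __init__(self, limit: int = 100, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        # Maps (client_ip, window_index) to the request count in that window.
        # Old windows are never evicted here; Redis would expire them via TTL.
        self._counts = {}

    def check(self, client_ip: str, now: float):
        """Return (allowed, headers) for a request arriving at epoch time `now`."""
        window_index = int(now // self.window_seconds)
        bucket = (client_ip, window_index)
        count = self._counts.get(bucket, 0)
        allowed = count < self.limit
        if allowed:
            count += 1
            self._counts[bucket] = count
        reset_at = (window_index + 1) * self.window_seconds
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(self.limit - count, 0)),
            "X-RateLimit-Reset": str(reset_at),
        }
        return allowed, headers
```

A fixed window allows short bursts across a window boundary (up to twice the limit within a few seconds), which is the usual trade-off against the simplicity of a single counter.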

API Endpoints

1. List Federated Agents

Endpoint: GET /api/federation/agents

Description: Retrieve a paginated list of public, approved agent templates available for federation.

Authentication: Required (X-Federation-Key)

Rate Limited: Yes (100 req/min per IP)

Parameters:

Parameter   Type      Required   Description
limit       integer   No         Page size (1-100). Default: 50
offset      integer   No         Page offset. Default: 0

Example Request:

curl -X GET "https://atom-saas.fly.dev/api/federation/agents?limit=10&offset=0" \
  -H "X-Federation-Key: sk-federation-shared-key-..."

Response:

{
  "agents": [
    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "name": "Sales Assistant",
      "description": "Automated sales outreach and lead qualification",
      "category": "sales",
      "price": 0.0,
      "version": "1.2.0",
      "rating": 4.5
    }
  ],
  "total": 150,
  "limit": 10,
  "offset": 0,
  "has_more": true
}
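A consumer would typically walk the list page by page using limit, offset, and has_more. The sketch below shows one way to do that. The names iter_agents and fake_fetch are hypothetical; fake_fetch is a stand-in that mimics the response shape above so the example runs without a network call, and a real client would perform the HTTP GET instead.

```python
from typing import Callable, Iterator


def iter_agents(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every agent by following limit/offset pagination until has_more is false."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page["agents"]
        # Stop on the last page, or defensively if a page comes back empty.
        if not page.get("has_more") or not page["agents"]:
            break
        offset += limit


def fake_fetch(limit: int, offset: int) -> dict:
    """Stand-in for GET /api/federation/agents, serving a five-item catalog."""
    catalog = [{"id": f"agent-{i}"} for i in range(5)]
    chunk = catalog[offset:offset + limit]
    return {
        "agents": chunk,
        "total": len(catalog),
        "limit": limit,
        "offset": offset,
        "has_more": offset + limit < len(catalog),
    }


all_ids = [agent["id"] for agent in iter_agents(fake_fetch, limit=2)]
```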

Error Responses:

401 Unauthorized:

{ "detail": "Missing X-Federation-Key header" }

403 Forbidden:

{ "detail": "Invalid Federation Key" }

429 Too Many Requests:

{ "detail": "Rate limit exceeded. Please try again later." }

2. Get Agent Bundle

Endpoint: GET /api/federation/agents/{template_id}

Description: Download the complete agent template bundle including configuration, capabilities, canvas UI schemas, and anonymized memory for local installation.

Authentication: Required (X-Federation-Key)

Rate Limited: Yes (100 req/min per IP)

Parameters:

Parameter     Type     Required   Description
template_id   string   Yes        Agent template UUID

Example Request:

curl -X GET "https://atom-saas.fly.dev/api/federation/agents/123e4567-e89b-12d3-a456-426614174000" \
  -H "X-Federation-Key: sk-federation-shared-key-..."

Response:

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "name": "Sales Assistant",
  "description": "Automated sales outreach and lead qualification",
  "category": "sales",
  "price": 0.0,
  "version": "1.2.0",
  "configuration": {
    "model": "gpt-4",
    "temperature": 0.7,
    "max_tokens": 2000,
    "system_prompt": "You are a helpful sales assistant...",
    "capabilities": ["email", "calendar", "crm"]
  },
  "capabilities": [
    "skill-uuid-1",
    "skill-uuid-2",
    "skill-uuid-3"
  ],
  "canvas_ui_schemas": [
    {
      "type": "form",
      "fields": [
        {"name": "prospect_email", "type": "email", "required": true},
        {"name": "message", "type": "textarea", "required": true}
      ]
    }
  ],
  "anonymized_memory_bundle": {
    "heuristics": [
      {
        "error_type": "email_bounce",
        "error_code": "BOUNCE",
        "resolution": "Verify email format and domain validity before sending"
      },
      {
        "error_type": "calendar_conflict",
        "error_code": "CONFLICT",
        "resolution": "Check calendar availability and suggest alternative times"
      }
    ]
  }
}
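Before installing a downloaded bundle, a consumer may want to check its shape. The following is a rough sketch of such a check. The set of fields treated as required is an assumption taken from the example response above, not a published schema, and validate_bundle is a hypothetical name.

```python
# Fields and types assumed from the example bundle; not an official schema.
REQUIRED_BUNDLE_FIELDS = {
    "id": str,
    "name": str,
    "version": str,
    "configuration": dict,
    "capabilities": list,
    "canvas_ui_schemas": list,
    "anonymized_memory_bundle": dict,
}


def validate_bundle(bundle: dict) -> list:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for field, expected_type in REQUIRED_BUNDLE_FIELDS.items():
        if field not in bundle:
            problems.append(f"missing field: {field}")
        elif not isinstance(bundle[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems
```

Returning all problems at once, rather than raising on the first, makes it easier to report a malformed bundle back to the publishing instance.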

Error Responses:

404 Not Found:

{ "detail": "Agent Template not found" }

429 Too Many Requests:

{ "detail": "Rate limit exceeded. Please try again later." }

Security Model

Threat Model

Protected Assets:

  1. Agent configurations (may contain business logic)
  2. Anonymized memory bundles (heuristics, patterns)
  3. Canvas UI schemas (IP in UX design)
  4. Capability lists (skill dependencies)

Threats:

  1. Unauthorized access to private templates
  2. API key leakage
  3. Rate limit bypass
  4. Data injection
  5. Cross-tenant data leakage
  6. Replay attacks

Security Measures

1. Authentication & Authorization

Shared Secret API Keys:

  • High-entropy random strings (256-bit minimum)
  • Environment-based configuration
  • No hardcoding in source code
  • Regular rotation schedule

Key Exchange Process:

# Generate new key
openssl rand -base64 32
# Exchange via secure channel (password manager, secret management)
# Add to environment
FEDERATION_API_KEY=sk-federation-<generated-key>
# Restart services
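Where openssl is not available, an equivalent 256-bit key can be produced with Python's standard library. This is a sketch only: the function name generate_federation_key is hypothetical, and the sk-federation- prefix simply follows the examples on this page.

```python
import base64
import secrets


def generate_federation_key(prefix: str = "sk-federation-") -> str:
    """Generate a key with 32 random bytes (256 bits), base64-encoded like `openssl rand -base64 32`."""
    random_bytes = secrets.token_bytes(32)
    return prefix + base64.b64encode(random_bytes).decode("ascii")


new_key = generate_federation_key()
```

Standard base64 output never contains a comma, so the generated key is safe to place in the comma-separated FEDERATION_API_KEY list.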

2. Multi-Tenant Isolation

Query Filtering:

# All queries filter by is_public and is_approved
query = db.query(AgentTemplate).filter(
    and_(
        AgentTemplate.is_public == True,
        AgentTemplate.is_approved == True
    )
)

Tenant Context:

  • Published templates retain author tenant_id
  • Installed templates belong to installer's tenant_id
  • No cross-tenant data access

3. Data Anonymization

PII Redaction:

email_pattern = re.compile(r"[\w\.-]+@[\w\.-]+\.\w+")
uuid_pattern = re.compile(r"[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}")

for res in raw_resolutions:
    text = res.resolution_attempted or ""
    text = email_pattern.sub("[REDACTED_EMAIL]", text)
    text = uuid_pattern.sub("[REDACTED_ID]", text)
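The same two patterns can be wrapped in a standalone function to see the redaction on a concrete string. The patterns are copied from the snippet above; the function name redact_pii is hypothetical.

```python
import re

EMAIL_PATTERN = re.compile(r"[\w\.-]+@[\w\.-]+\.\w+")
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}"
)


def redact_pii(text: str) -> str:
    """Replace email addresses and UUIDs with fixed placeholders."""
    text = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)
    return UUID_PATTERN.sub("[REDACTED_ID]", text)


sample = "Mail jane.doe@example.com about 123e4567-e89b-12d3-a456-426614174000"
print(redact_pii(sample))  # Mail [REDACTED_EMAIL] about [REDACTED_ID]
```

Regex redaction of this kind only catches the two listed formats; names, phone numbers, and free-text identifiers pass through unchanged, which is where the optional semantic sanitization step would come in.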

Semantic Sanitization:

  • Optional LLM-based rewriting
  • Remove business-specific context
  • Preserve technical patterns

4. Audit Logging

Structured Logging:

# Log all federation requests
logger.info(
    "federation_request",
    extra={
        "action": "list_agents",
        "client_ip": client_ip,
        "limit": limit,
        "offset": offset,
        "timestamp": datetime.now(UTC).isoformat()
    }
)

# Log successful responses
logger.info(
    "federation_response",
    extra={
        "action": "list_agents_success",
        "client_ip": client_ip,
        "agent_count": len(results),
        "total_available": total,
        "timestamp": datetime.now(UTC).isoformat()
    }
)

# Log failures
logger.warning(
    "federation_request_failed",
    extra={
        "action": "get_bundle_not_found",
        "client_ip": client_ip,
        "template_id": template_id,
        "timestamp": datetime.now(UTC).isoformat()
    }
)

Log Fields:

  • Action type
  • Client IP
  • Timestamp
  • Request parameters
  • Response status
  • Error details

5. Rate Limiting

Implementation:

  • Redis-based fixed one-minute window
  • Per-IP tracking
  • Graceful degradation
  • Configurable limits

Bypass Prevention:

  • Client IP detection (via request.client.host)
  • Client IP is read from the TCP connection, not from spoofable request headers
  • No user-agent based bypass

6. Input Validation

Parameter Validation:

# Validate UUID format
if not is_valid_uuid(template_id):
    raise HTTPException(status_code=400, detail="Invalid template ID")

# Validate pagination limits
if limit < 1 or limit > 100:
    raise HTTPException(status_code=400, detail="Limit must be between 1 and 100")

# Validate offset
if offset < 0:
    raise HTTPException(status_code=400, detail="Offset must be non-negative")
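The validation snippet calls is_valid_uuid without showing it. A possible implementation using the standard uuid module is sketched below. The strictness chosen here (only the canonical hyphenated form is accepted) is an assumption, since the page does not define the helper.

```python
import uuid


def is_valid_uuid(value) -> bool:
    """Return True only for a UUID in canonical 8-4-4-4-12 hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes, and bare hex;
    # comparing against the canonical string rejects those variants.
    return str(parsed) == value.lower()
```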

SQL Injection Prevention:

  • SQLAlchemy ORM parameterized queries
  • No raw SQL with user input
  • Type-safe query building

Security Best Practices

For Instance Operators

  1. Key Management

    • Store keys in environment variables
    • Use secret management systems (Vault, AWS Secrets Manager)
    • Rotate keys quarterly
    • Never commit keys to git
    • Use different keys per environment
  2. Network Security

    • Use HTTPS in production
    • Configure firewall rules
    • Monitor access logs
    • Set up intrusion detection
  3. Access Control

    • Limit federation to trusted instances
    • IP whitelisting (optional)
    • VPN requirements (optional)
    • Regular audits
  4. Monitoring

    • Track federation request volume
    • Alert on unusual patterns
    • Monitor rate limit violations
    • Review authentication failures

For Developers

  1. API Key Handling

    import os

    # Good: Environment variable
    API_KEY = os.environ.get("FEDERATION_API_KEY")

    # Bad: Hardcoded
    API_KEY = "sk-federation-secret-key-123"
  2. Error Handling

    # Good: Generic error messages
    if not valid_key:
        raise HTTPException(status_code=403, detail="Invalid Federation Key")

    # Bad: Information leakage
    if not valid_key:
        raise HTTPException(status_code=403, detail=f"Key {key} not in {keys}")
  3. Logging

    # Good: Structured logging
    logger.info("federation_request", extra={"action": "list_agents"})

    # Bad: Sensitive data in logs
    logger.info(f"Request with key: {api_key}")

Error Handling

Error Response Format

{
  "detail": "Human-readable error message",
  "code": "ERROR_CODE",
  "timestamp": "2024-02-19T00:00:00Z"
}

Common Errors

Code                  Status   Description
MISSING_API_KEY       401      X-Federation-Key header missing
INVALID_API_KEY       403      Federation key invalid or revoked
RATE_LIMIT_EXCEEDED   429      Request rate limit exceeded
TEMPLATE_NOT_FOUND    404      Agent template not found
INVALID_PARAMETER     400      Request parameter invalid
SERVER_ERROR          500      Internal server error
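The code table and the error response format can be tied together in a small helper. The sketch below is illustrative only: build_error is a hypothetical name, and falling back to SERVER_ERROR for unknown codes is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional

# Status codes copied from the Common Errors table.
ERROR_STATUS = {
    "MISSING_API_KEY": 401,
    "INVALID_API_KEY": 403,
    "RATE_LIMIT_EXCEEDED": 429,
    "TEMPLATE_NOT_FOUND": 404,
    "INVALID_PARAMETER": 400,
    "SERVER_ERROR": 500,
}


def build_error(code: str, detail: str, now: Optional[datetime] = None):
    """Return (http_status, body) in the documented error response format."""
    if code not in ERROR_STATUS:
        code = "SERVER_ERROR"  # assumption: unknown codes are reported as 500
    now = now or datetime.now(timezone.utc)
    body = {
        "detail": detail,
        "code": code,
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return ERROR_STATUS[code], body
```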

Retry Strategy

429 Too Many Requests:

  • Exponential backoff: 1s, 2s, 4s, 8s, 16s
  • Maximum retries: 5
  • Use Retry-After header if present

5xx Server Errors:

  • Exponential backoff with jitter
  • Maximum retries: 3
  • Alert monitoring system

Other 4xx Client Errors (400, 401, 403, 404):

  • Do not retry
  • Fix request parameters
  • Verify authentication
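The retry rules above can be condensed into a single decision function. This is a sketch under assumptions: retry_delay is a hypothetical name, Retry-After is assumed to carry whole seconds, and "full jitter" (a uniform draw between zero and the exponential delay) is one common jitter variant chosen here because the text does not specify one.

```python
import random
from typing import Optional


def retry_delay(status: int, attempt: int, retry_after: Optional[str] = None,
                rng: Optional[random.Random] = None) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None to give up."""
    if status == 429:
        if attempt > 5:  # maximum retries: 5
            return None
        if retry_after is not None and retry_after.isdigit():
            return float(retry_after)  # server hint takes precedence
        return float(2 ** (attempt - 1))  # 1s, 2s, 4s, 8s, 16s
    if 500 <= status <= 599:
        if attempt > 3:  # maximum retries: 3
            return None
        base = float(2 ** (attempt - 1))
        source = rng if rng is not None else random
        return source.uniform(0, base)  # full jitter (assumed variant)
    return None  # other 4xx responses: do not retry
```

A caller would sleep for the returned number of seconds and stop as soon as the function returns None.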

Cross-Instance Communication

Discovery Pattern

Installation Flow

Federation Source Tracking

Installation Record:

installation = AgentInstallation(
    tenant_id=tenant_id,
    template_id=template.id,
    instantiated_agent_id=new_agent.id,
    installed_version=template.version,
    federation_source="remote-instance.com",  # Track source
    federation_timestamp=datetime.now(UTC)
)

Benefits:

  • Trace installation origins
  • Monitor federation usage
  • Identify popular templates
  • Detect abuse patterns

Performance Optimization

Caching Strategy

Redis Cache:

# Cache agent lists (5 minutes)
cache_key = f"federation:agents:{limit}:{offset}"
agents = cache_service.get(cache_key)
if agents is None:
    # Cache miss: load from the database and store for 5 minutes
    agents = fetch_from_db(limit, offset)
    cache_service.set(cache_key, agents, ttl=300)
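The cache-aside pattern used for the agent list can be tried out with a small in-memory stand-in for Redis. Everything named here (TTLCache, get_agents_cached, the injected clock) is hypothetical and exists only to make the sketch self-contained; the key format and the 300-second TTL come from the snippet above.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for the Redis-backed service)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop and report a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: int) -> None:
        self._store[key] = (self._clock() + ttl, value)


def get_agents_cached(cache: TTLCache, fetch_from_db: Callable[[int, int], list],
                      limit: int, offset: int, ttl: int = 300) -> list:
    """Return the agent page from cache, loading and caching it on a miss."""
    cache_key = f"federation:agents:{limit}:{offset}"
    agents = cache.get(cache_key)
    if agents is None:
        agents = fetch_from_db(limit, offset)
        cache.set(cache_key, agents, ttl=ttl)
    return agents
```

Checking `is None` rather than truthiness means an empty page is also cached instead of being refetched on every request.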

Cache Invalidation:

  • On template approval
  • On template update
  • On template deletion
  • Time-based expiry

Database Optimization

Indexes:

-- Federation queries
CREATE INDEX idx_agent_templates_federation
ON agent_templates(is_public, is_approved, created_at DESC)
WHERE is_public = true AND is_approved = true;

-- Covering index for list endpoint
CREATE INDEX idx_agent_templates_list
ON agent_templates(is_public, is_approved, rating, installs)
WHERE is_public = true AND is_approved = true;

Query Optimization:

# Efficient pagination
query = (
    db.query(
        AgentTemplate.id,
        AgentTemplate.name,
        AgentTemplate.description,
        AgentTemplate.category,
        AgentTemplate.price,
        AgentTemplate.version,
        AgentTemplate.rating
    )
    .filter(and_(AgentTemplate.is_public == True, AgentTemplate.is_approved == True))
    .order_by(AgentTemplate.created_at.desc())
    .limit(limit)
    .offset(offset)
)

Bandwidth Optimization

Response Compression:

  • Gzip compression enabled
  • Typical compression ratio: 5-10x
  • Reduces bandwidth usage significantly

Field Selection:

# List endpoint: Minimal fields
agents = query.with_entities(AgentTemplate.id, AgentTemplate.name, ...)

# Bundle endpoint: Full fields (select the whole entity, i.e. all columns)
bundle = query.with_entities(AgentTemplate)

Monitoring & Observability

Key Metrics

Request Metrics:

  • Request rate (per endpoint)
  • Response times (p50, p95, p99)
  • Error rate (by status code)
  • Rate limit violations

Business Metrics:

  • Templates downloaded
  • Unique instances federating
  • Popular templates
  • Federation sources

Security Metrics:

  • Authentication failures
  • Invalid API keys
  • Suspicious IP addresses
  • Rate limit violations

Alerting

Critical Alerts:

  • Error rate > 5%
  • Response time p95 > 1s
  • Authentication failures > 10/min
  • Rate limit violations > 100/min

Warning Alerts:

  • Error rate > 1%
  • Response time p95 > 500ms
  • Unusual request patterns
  • Cache hit rate < 80%
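The numeric thresholds above translate directly into a classification function. The sketch below is hypothetical (the function name, argument names, and returned labels are all assumptions) and leaves out "unusual request patterns", which has no numeric threshold on this page. Rates are expressed as fractions, so 5% is 0.05.

```python
def classify_alerts(error_rate: float, p95_ms: float, auth_failures_per_min: float,
                    rate_limit_violations_per_min: float, cache_hit_rate: float) -> dict:
    """Map current metric values onto the critical and warning thresholds listed above."""
    critical, warning = [], []

    if error_rate > 0.05:
        critical.append("error_rate")
    elif error_rate > 0.01:
        warning.append("error_rate")

    if p95_ms > 1000:
        critical.append("p95_latency")
    elif p95_ms > 500:
        warning.append("p95_latency")

    if auth_failures_per_min > 10:
        critical.append("auth_failures")
    if rate_limit_violations_per_min > 100:
        critical.append("rate_limit_violations")
    if cache_hit_rate < 0.80:
        warning.append("cache_hit_rate")

    return {"critical": critical, "warning": warning}
```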

Logging

Log Levels:

  • INFO: Successful requests
  • WARNING: Authentication failures, rate limits
  • ERROR: Server errors, database failures
  • CRITICAL: Security incidents

Log Retention:

  • 30 days for INFO/WARNING
  • 90 days for ERROR/CRITICAL
  • Archive for security incidents

Testing

Unit Tests

Authentication:

def test_valid_federation_key():
    """Test valid federation key is accepted."""
    response = client.get(
        "/api/federation/agents",
        headers={"X-Federation-Key": "valid-key"}
    )
    assert response.status_code == 200

Rate Limiting:

def test_rate_limit_exceeded():
    """Test rate limit enforcement."""
    for _ in range(101):
        response = client.get(
            "/api/federation/agents",
            headers={"X-Federation-Key": "valid-key"}
        )
    # The 101st request within the window should be rejected
    assert response.status_code == 429

Integration Tests

Cross-Instance:

def test_federation_workflow():
    """Test complete federation workflow."""
    # Instance A: Publish agent
    template_id = publish_agent(instance_a)

    # Instance B: Discover agent
    agents = list_agents(instance_b)
    assert template_id in [a["id"] for a in agents]

    # Instance B: Download and install
    bundle = get_bundle(instance_b, template_id)
    installed = install_bundle(bundle)
    assert installed["status"] == "installed"

Security Tests

Authentication Bypass:

def test_no_auth_key():
    """Test request without API key is rejected."""
    response = client.get("/api/federation/agents")
    assert response.status_code == 401

def test_invalid_auth_key():
    """Test invalid API key is rejected."""
    response = client.get(
        "/api/federation/agents",
        headers={"X-Federation-Key": "invalid-key"}
    )
    assert response.status_code == 403

SQL Injection:

def test_sql_injection_prevention():
    """Test SQL injection attempts are blocked."""
    response = client.get(
        "/api/federation/agents",
        params={"limit": "1; DROP TABLE agents; --"},
        headers={"X-Federation-Key": "valid-key"}
    )
    assert response.status_code == 400

Best Practices

For Federation Consumers

  1. Respect Rate Limits

    • Implement exponential backoff
    • Cache responses locally
    • Use pagination for large lists
  2. Secure API Keys

    • Store in environment variables
    • Never commit to git
    • Rotate regularly
  3. Monitor Usage

    • Track request volume
    • Alert on unusual patterns
    • Review error rates
  4. Handle Errors Gracefully

    • Implement retry logic
    • Log errors for debugging
    • Provide user feedback

For Federation Providers

  1. Validate All Input

    • Parameter type checking
    • Range validation
    • SQL injection prevention
  2. Log Everything

    • Structured logging
    • Include context
    • Redact sensitive data (never log API keys)
  3. Monitor Performance

    • Response times
    • Error rates
    • Resource usage
  4. Plan for Scale

    • Horizontal scaling
    • Database sharding
    • Cache optimization

Troubleshooting

Common Issues

1. 403 Forbidden: Invalid Federation Key

  • Cause: Incorrect or revoked API key (a missing header returns 401 instead)
  • Solution: Verify the X-Federation-Key header matches one of the keys in the provider's FEDERATION_API_KEY environment variable

2. 429 Too Many Requests

  • Cause: Rate limit exceeded
  • Solution: Implement exponential backoff, check Retry-After header

3. 404 Not Found: Agent Template

  • Cause: Template not public, not approved, or doesn't exist
  • Solution: Verify template status, check template_id

4. Slow Response Times

  • Cause: Database query performance, cache miss
  • Solution: Check database indexes, cache hit rates, query optimization

5. High Memory Usage

  • Cause: Large bundles, memory leaks
  • Solution: Implement pagination, monitor memory, profile code

Debug Mode

Enable Debug Logging:

import logging

logging.getLogger("atom").setLevel(logging.DEBUG)

Check Federation Status:

# Test federation endpoint
curl -v -X GET "https://atom-saas.fly.dev/api/federation/agents?limit=1" \
  -H "X-Federation-Key: your-key"

# Check rate limit
curl -I -X GET "https://atom-saas.fly.dev/api/federation/agents" \
  -H "X-Federation-Key: your-key"

Future Enhancements

Planned Features

  1. Webhook Federation

    • Push notifications for new templates
    • Real-time updates
    • Event-driven synchronization
  2. Peer Discovery

    • Automatic instance discovery
    • Peer-to-peer federation
    • Mesh networking
  3. Reputation Scoring

    • Instance reputation scores
    • Quality metrics
    • Trust networks
  4. Version Management

    • Template versioning
    • Migration guides
    • Backward compatibility
  5. Advanced Security

    • Mutual TLS
    • IP whitelisting
    • VPN requirements
    • Signature verification
