Federation Protocol

Overview

The ATOM Federation Protocol enables cross-instance communication between self-hosted ATOM deployments, allowing agents, skills, and domain templates to be shared across organizational boundaries while maintaining security, authentication, and multi-tenant isolation.

Architecture Diagram

Protocol Specification

Version

Current Version: 1.0.0

Compatibility:

Instances running v13.0+ can federate with each other
Backward compatible with v12.x for read operations
Forward compatibility is handled through schema versioning

Base URL

https://atom-saas.fly.dev/api/federation

For self-hosted instances:

https://your-instance.com/api/federation

Authentication

API Key Authentication

All federation requests require a shared secret API key:

Header:

X-Federation-Key: sk-federation-shared-key-...

Key Rotation:

# Set multiple keys (comma-separated)
FEDERATION_API_KEY=sk-federation-key-1,sk-federation-key-2,sk-federation-key-3

# The first key is the primary key
# Every listed key is accepted during validation, so old keys keep working while peers rotate
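
As a minimal sketch of how a comma-separated `FEDERATION_API_KEY` value might be parsed and checked, the snippet below splits the variable into a list and compares a candidate against each entry. The function names `load_federation_keys` and `is_valid_federation_key` are hypothetical and are not taken from the ATOM codebase; the constant-time comparison is an added precaution, not something the original configuration requires.

```python
import hmac
import os


def load_federation_keys(raw=None):
    """Split a comma-separated FEDERATION_API_KEY value into a list of keys."""
    if raw is None:
        raw = os.environ.get("FEDERATION_API_KEY", "")
    # Ignore surrounding whitespace and empty entries such as a trailing comma.
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_valid_federation_key(candidate, valid_keys):
    """Return True if candidate equals any configured key (constant-time compare)."""
    matched = False
    for key in valid_keys:
        # compare_digest avoids leaking how many leading characters matched.
        if hmac.compare_digest(candidate.encode(), key.encode()):
            matched = True
    return matched
```

During a rotation, the new key would be added to the list, peers would switch to it, and the old key would then be removed from the variable.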

Security:

Exchange keys only over secure channels (e.g., a password manager or a secret management system)
Rotate keys regularly (recommended: quarterly)
Never commit keys to code repositories
Load keys from environment variables or a secret management system

Key Validation:

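import hmac
from datetime import UTC, datetime
from typing import Optional

# Assumes FastAPI, as the Header/Request/HTTPException usage suggests
from fastapi import Header, HTTPException, Request

# `logger` and `_FEDERATION_API_KEYS` are defined at module level elsewhere


def verify_federation_key(
    request: Request,
    x_federation_key: Optional[str] = Header(None),
):
    """
    Verify federation API key from X-Federation-Key header.

    Supports key rotation - checks against all valid keys from environment.
    Logs all auth attempts for security auditing.
    """
    client_ip = request.client.host if request.client else "unknown"

    # Constant-time comparison against every valid key (supports rotation)
    key_is_valid = x_federation_key is not None and any(
        hmac.compare_digest(x_federation_key.encode(), key.encode())
        for key in _FEDERATION_API_KEYS
    )

    if not key_is_valid:
        missing = x_federation_key is None
        logger.warning(
            "federation_auth_failed",
            extra={
                "action": "auth_failed",
                "client_ip": client_ip,
                "timestamp": datetime.now(UTC).isoformat(),
                "reason": "missing_api_key" if missing else "invalid_api_key",
            },
        )
        # 401 for a missing header, 403 for a key that does not match
        if missing:
            raise HTTPException(status_code=401, detail="Missing X-Federation-Key header")
        raise HTTPException(status_code=403, detail="Invalid Federation Key")

    # Log successful auth
    logger.info(
        "federation_auth_success",
        extra={
            "action": "auth_success",
            "client_ip": client_ip,
            "timestamp": datetime.now(UTC).isoformat(),
        },
    )
    return True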

Rate Limiting

Limits:

100 requests per minute per IP address
Fixed one-minute window, counted in Redis
Graceful degradation: requests are not limited if Redis is unavailable

Headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1708412800
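
A client can use these headers to pace itself. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds (the example value above looks like one, but the format is not stated here), and the helper name `seconds_until_reset` is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, based on rate limit headers."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # Budget left in the current window; no need to wait.
    # Assumption: the reset value is a Unix timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

A caller would sleep for the returned number of seconds before sending the next request.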

Exceeded Response:

{
  "detail": "Rate limit exceeded. Please try again later.",
  "code": "RATE_LIMIT_EXCEEDED"
}

Implementation:

async def rate_limit_federation(request: Request):
    """Rate limiting for federation endpoints using a per-minute Redis counter."""
    if not cache_service.enabled:
        return  # Graceful degradation: skip limiting when Redis is unavailable

    client_ip = request.client.host if request.client else "unknown"
    rate_limit = 100
    window_seconds = 60

    # One counter per client IP per calendar minute (fixed window)
    current_minute = datetime.now(UTC).strftime("%Y-%m-%d:%H:%M")
    rate_limit_key = f"federation:rate_limit:{client_ip}:{current_minute}"

    current_count = int(cache_service.get(rate_limit_key) or 0)

    if current_count >= rate_limit:
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded. Please try again later.",
            headers={"Retry-After": str(window_seconds)}
        )

    # Note: this read-then-write is not atomic, so concurrent requests can undercount
    cache_service.set(rate_limit_key, current_count + 1, ttl=window_seconds)

API Endpoints

1. List Federated Agents

Endpoint: GET /api/federation/agents

Description: Retrieve a paginated list of public, approved agent templates available for federation.

Authentication: Required (X-Federation-Key)

Rate Limited: Yes (100 req/min per IP)

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `limit` | integer | No | Page size (1-100). Default: 50 |
| `offset` | integer | No | Page offset. Default: 0 |

Example Request:

curl -X GET "https://atom-saas.fly.dev/api/federation/agents?limit=10&offset=0" \
  -H "X-Federation-Key: sk-federation-shared-key-..."

Response:

{
  "agents": [
    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "name": "Sales Assistant",
      "description": "Automated sales outreach and lead qualification",
      "category": "sales",
      "price": 0.0,
      "version": "1.2.0",
      "rating": 4.5
    }
  ],
  "total": 150,
  "limit": 10,
  "offset": 0,
  "has_more": true
}
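
The `limit`, `offset`, and `has_more` fields are enough to walk the full list. The following is a sketch only: `fetch_page` is a hypothetical callable that performs the GET request for one page and returns the decoded JSON body, so the HTTP client and authentication header are left to the caller.

```python
from typing import Callable, Dict, Iterator


def iter_federated_agents(
    fetch_page: Callable[[int, int], Dict], limit: int = 50
) -> Iterator[Dict]:
    """Yield every agent by walking limit/offset pages until has_more is false."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        agents = page.get("agents", [])
        yield from agents
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("has_more") or not agents:
            break
        offset += limit
```

Because it is a generator, a consumer can stop early without fetching the remaining pages, which also helps stay within the rate limit.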

Error Responses:

401 Unauthorized:

{
  "detail": "Missing X-Federation-Key header"
}

403 Forbidden:

{
  "detail": "Invalid Federation Key"
}

429 Too Many Requests:

{
  "detail": "Rate limit exceeded. Please try again later."
}

2. Get Agent Bundle

Endpoint: GET /api/federation/agents/{template_id}

Description: Download the complete agent template bundle including configuration, capabilities, canvas UI schemas, and anonymized memory for local installation.

Authentication: Required (X-Federation-Key)

Rate Limited: Yes (100 req/min per IP)

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `template_id` | string | Yes | Agent template UUID |

Example Request:

curl -X GET "https://atom-saas.fly.dev/api/federation/agents/123e4567-e89b-12d3-a456-426614174000" \
  -H "X-Federation-Key: sk-federation-shared-key-..."

Response:

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "name": "Sales Assistant",
  "description": "Automated sales outreach and lead qualification",
  "category": "sales",
  "price": 0.0,
  "version": "1.2.0",
  "configuration": {
    "model": "gpt-4",
    "temperature": 0.7,
    "max_tokens": 2000,
    "system_prompt": "You are a helpful sales assistant...",
    "capabilities": ["email", "calendar", "crm"]
  },
  "capabilities": [
    "skill-uuid-1",
    "skill-uuid-2",
    "skill-uuid-3"
  ],
  "canvas_ui_schemas": [
    {
      "type": "form",
      "fields": [
        {"name": "prospect_email", "type": "email", "required": true},
        {"name": "message", "type": "textarea", "required": true}
      ]
    }
  ],
  "anonymized_memory_bundle": {
    "heuristics": [
      {
        "error_type": "email_bounce",
        "error_code": "BOUNCE",
        "resolution": "Verify email format and domain validity before sending"
      },
      {
        "error_type": "calendar_conflict",
        "error_code": "CONFLICT",
        "resolution": "Check calendar availability and suggest alternative times"
      }
    ]
  }
}
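
Before installing a downloaded bundle, a consumer may want to check its shape. The sketch below tests for the top-level fields shown in the example response; which fields are actually mandatory is not specified here, so the `REQUIRED_BUNDLE_FIELDS` table and the `validate_bundle` name are assumptions made for illustration.

```python
# Assumed required fields, taken from the example response above.
REQUIRED_BUNDLE_FIELDS = {
    "id": str,
    "name": str,
    "version": str,
    "configuration": dict,
    "capabilities": list,
    "canvas_ui_schemas": list,
    "anonymized_memory_bundle": dict,
}


def validate_bundle(bundle):
    """Return a list of problems found in a downloaded bundle (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_BUNDLE_FIELDS.items():
        if field not in bundle:
            problems.append(f"missing field: {field}")
        elif not isinstance(bundle[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems
```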

Error Responses:

404 Not Found:

{
  "detail": "Agent Template not found"
}

429 Too Many Requests:

{
  "detail": "Rate limit exceeded. Please try again later."
}

Security Model

Threat Model

Protected Assets:

Agent configurations (may contain business logic)
Anonymized memory bundles (heuristics, patterns)
Canvas UI schemas (intellectual property in the UX design)
Capability lists (skill dependencies)

Threats:

Unauthorized access to private templates
API key leakage
Rate limit bypass
Data injection
Cross-tenant data leakage
Replay attacks

Security Measures

1. Authentication & Authorization

Shared Secret API Keys:

High-entropy random strings (256-bit minimum)
Environment-based configuration
No hardcoding in source code
Regular rotation schedule

Key Exchange Process:

# Generate new key
openssl rand -base64 32

# Exchange via secure channel (password manager, secret management)
# Add to environment
FEDERATION_API_KEY=sk-federation-<generated-key>

# Restart services
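
The generation step can also be done from Python. This is a small sketch using the standard `secrets` module; the function name `generate_federation_key` is hypothetical, and the `sk-federation-` prefix simply follows the examples in this document.

```python
import secrets


def generate_federation_key(prefix="sk-federation-"):
    """Create a key with 256 bits of randomness, URL-safe encoded."""
    # token_urlsafe(32) draws 32 random bytes (256 bits) from the OS CSPRNG.
    return prefix + secrets.token_urlsafe(32)
```

The URL-safe alphabet avoids `+`, `/`, and `=` characters, which can be awkward in environment files and shell commands.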

2. Multi-Tenant Isolation

Query Filtering:

# All queries filter by is_public and is_approved
query = db.query(AgentTemplate).filter(
    and_(
        AgentTemplate.is_public == True,
        AgentTemplate.is_approved == True
    )
)

Tenant Context:

Published templates retain author tenant_id
Installed templates belong to installer's tenant_id
No cross-tenant data access

3. Data Anonymization

PII Redaction:

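import re

email_pattern = re.compile(r"[\w\.-]+@[\w\.-]+\.\w+")
uuid_pattern = re.compile(r"[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}")

sanitized_resolutions = []
for res in raw_resolutions:
    text = res.resolution_attempted or ""
    # Replace email addresses and UUIDs with fixed placeholders
    text = email_pattern.sub("[REDACTED_EMAIL]", text)
    text = uuid_pattern.sub("[REDACTED_ID]", text)
    sanitized_resolutions.append(text)  # keep the redacted text for the bundle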

Semantic Sanitization:

Optional LLM-based rewriting
Remove business-specific context
Preserve technical patterns

4. Audit Logging

Structured Logging:

# Log all federation requests
logger.info(
    "federation_request",
    extra={
        "action": "list_agents",
        "client_ip": client_ip,
        "limit": limit,
        "offset": offset,
        "timestamp": datetime.now(UTC).isoformat()
    }
)

# Log successful responses
logger.info(
    "federation_response",
    extra={
        "action": "list_agents_success",
        "client_ip": client_ip,
        "agent_count": len(results),
        "total_available": total,
        "timestamp": datetime.now(UTC).isoformat()
    }
)

# Log failures
logger.warning(
    "federation_request_failed",
    extra={
        "action": "get_bundle_not_found",
        "client_ip": client_ip,
        "template_id": template_id,
        "timestamp": datetime.now(UTC).isoformat()
    }
)

Log Fields:

Action type
Client IP
Timestamp
Request parameters
Response status
Error details

5. Rate Limiting

Implementation:

Redis-based per-minute window counter
Per-IP tracking
Graceful degradation
Configurable limits

Bypass Prevention:

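Client IP is taken from the TCP connection (via request.client.host), not from client-supplied headers
The source address of an established TCP connection cannot be practically spoofed
The User-Agent header has no effect on rate limiting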

6. Input Validation

Parameter Validation:

# Validate UUID format
if not is_valid_uuid(template_id):
    raise HTTPException(status_code=400, detail="Invalid template ID")

# Validate pagination limits
if limit < 1 or limit > 100:
    raise HTTPException(status_code=400, detail="Limit must be between 1 and 100")

# Validate offset
if offset < 0:
    raise HTTPException(status_code=400, detail="Offset must be non-negative")
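
The `is_valid_uuid` helper used above is not shown in this document. One possible implementation, offered only as a sketch, relies on the standard `uuid` module and accepts only the canonical hyphenated form.

```python
import uuid


def is_valid_uuid(value):
    """Return True if value is a canonical hyphenated UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and unhyphenated hex,
    # so compare against the canonical form to reject those variants.
    return str(parsed) == value.lower()
```

Rejecting the non-canonical variants keeps template IDs in one format in logs and cache keys.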

SQL Injection Prevention:

SQLAlchemy ORM parameterized queries
No raw SQL with user input
Type-safe query building

Security Best Practices

For Instance Operators

Key Management
- Store keys in environment variables
- Use secret management systems (Vault, AWS Secrets Manager)
- Rotate keys quarterly
- Never commit keys to git
- Use different keys per environment
Network Security
- Use HTTPS in production
- Configure firewall rules
- Monitor access logs
- Set up intrusion detection
Access Control
- Limit federation to trusted instances
- IP whitelisting (optional)
- VPN requirements (optional)
- Regular audits
Monitoring
- Track federation request volume
- Alert on unusual patterns
- Monitor rate limit violations
- Review authentication failures

For Developers

API Key Handling

# Good: Environment variable
API_KEY = os.environ.get("FEDERATION_API_KEY")

# Bad: Hardcoded
API_KEY = "sk-federation-secret-key-123"

Error Handling

# Good: Generic error messages
if not valid_key:
    raise HTTPException(status_code=403, detail="Invalid Federation Key")

# Bad: Information leakage
if not valid_key:
    raise HTTPException(status_code=403, detail=f"Key {key} not in {keys}")

Logging

# Good: Structured logging
logger.info("federation_request", extra={"action": "list_agents"})

# Bad: Sensitive data in logs
logger.info(f"Request with key: {api_key}")

Error Handling

Error Response Format

{
  "detail": "Human-readable error message",
  "code": "ERROR_CODE",
  "timestamp": "2024-02-19T00:00:00Z"
}
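
A provider could build this body with a small helper so that every error carries the same three fields. The sketch below is illustrative: `build_error` is a hypothetical name, and the timestamp format is inferred from the example above.

```python
from datetime import datetime, timezone


def build_error(detail, code, now=None):
    """Build an error body with detail, code and an ISO 8601 UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return {
        "detail": detail,
        "code": code,
        # Rendered as e.g. 2024-02-19T00:00:00Z, matching the format above.
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```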

Common Errors

| Code | Status | Description |
|------|--------|-------------|
| `MISSING_API_KEY` | 401 | X-Federation-Key header missing |
| `INVALID_API_KEY` | 403 | Federation key invalid or revoked |
| `RATE_LIMIT_EXCEEDED` | 429 | Request rate limit exceeded |
| `TEMPLATE_NOT_FOUND` | 404 | Agent template not found |
| `INVALID_PARAMETER` | 400 | Request parameter invalid |
| `SERVER_ERROR` | 500 | Internal server error |

Retry Strategy

429 Too Many Requests:

Exponential backoff: 1s, 2s, 4s, 8s, 16s
Maximum retries: 5
Use Retry-After header if present

5xx Server Errors:

Exponential backoff with jitter
Maximum retries: 3
Alert monitoring system

4xx Client Errors:

Do not retry
Fix request parameters
Verify authentication
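
The retry rules above can be sketched as three small functions. The names are hypothetical, the jitter formula (scaling the delay by a factor between 0.5 and 1.5) is one common choice rather than something this document prescribes, and `Retry-After` is assumed to be given in seconds.

```python
import random


def should_retry(status_code):
    """Retry only on 429 and 5xx; other 4xx responses need a corrected request."""
    return status_code == 429 or 500 <= status_code <= 599


def max_retries(status_code):
    """Retry budget from the strategy above: 5 for 429, 3 for 5xx, else 0."""
    if status_code == 429:
        return 5
    if 500 <= status_code <= 599:
        return 3
    return 0


def backoff_delay(attempt, status_code, retry_after=None, base=1.0, rng=random.random):
    """Delay in seconds before retry number `attempt` (0-based)."""
    if status_code == 429 and retry_after is not None:
        return float(retry_after)  # The server's hint takes precedence.
    delay = base * (2 ** attempt)  # 1s, 2s, 4s, 8s, 16s
    if 500 <= status_code <= 599:
        # Assumed jitter scheme: scale by a random factor in [0.5, 1.5).
        delay *= 0.5 + rng()
    return delay
```

A client loop would call `should_retry` on each failed response, stop once `max_retries` is reached, and sleep for `backoff_delay` seconds between attempts.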

Cross-Instance Communication

Discovery Pattern

Installation Flow

Federation Source Tracking

Installation Record:

installation = AgentInstallation(
    tenant_id=tenant_id,
    template_id=template.id,
    instantiated_agent_id=new_agent.id,
    installed_version=template.version,
    federation_source="remote-instance.com",  # Track source
    federation_timestamp=datetime.now(UTC)
)

Benefits:

Trace installation origins
Monitor federation usage
Identify popular templates
Detect abuse patterns

Performance Optimization

Caching Strategy

Redis Cache:

# Cache agent lists (5 minutes)
cache_key = f"federation:agents:{limit}:{offset}"
agents = cache_service.get(cache_key)

if agents is None:
    # Cache miss: load from the database and store for 300 seconds
    agents = fetch_from_db(limit, offset)
    cache_service.set(cache_key, agents, ttl=300)

Cache Invalidation:

On template approval
On template update
On template deletion
Time-based expiry
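
Because list pages are cached under keys that share the `federation:agents:` prefix, the event-based invalidations above amount to dropping every key under that prefix. The class below is an in-memory stand-in written for illustration; it is not the `cache_service` API, and `TTLCache` and `invalidate_prefix` are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for the Redis cache, for illustration only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # Time-based expiry
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (self._clock() + ttl, value)

    def invalidate_prefix(self, prefix):
        """Drop every key under a prefix, e.g. after a template changes."""
        stale = [k for k in self._items if k.startswith(prefix)]
        for k in stale:
            del self._items[k]
        return len(stale)
```

A template approval, update, or deletion handler would call `invalidate_prefix("federation:agents:")` so that the next list request reloads from the database.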

Database Optimization

Indexes:

-- Federation queries
CREATE INDEX idx_agent_templates_federation
ON agent_templates(is_public, is_approved, created_at DESC)
WHERE is_public = true AND is_approved = true;

-- Partial index for ordering the list endpoint by rating and install count
CREATE INDEX idx_agent_templates_list
ON agent_templates(is_public, is_approved, rating, installs)
WHERE is_public = true AND is_approved = true;

Query Optimization:

# Efficient pagination
query = (
    db.query(
        AgentTemplate.id,
        AgentTemplate.name,
        AgentTemplate.description,
        AgentTemplate.category,
        AgentTemplate.price,
        AgentTemplate.version,
        AgentTemplate.rating,
    )
    .filter(and_(AgentTemplate.is_public == True, AgentTemplate.is_approved == True))
    .order_by(AgentTemplate.created_at.desc())
    .limit(limit)
    .offset(offset)
)

Bandwidth Optimization

Response Compression:

Gzip compression enabled
Typical compression ratio: 5-10x
Reduces bandwidth usage significantly

Field Selection:

# List endpoint: Minimal fields
agents = query.with_entities(AgentTemplate.id, AgentTemplate.name, ...)

# Bundle endpoint: Full fields (without with_entities(), the whole AgentTemplate row is loaded)
bundle = query.one_or_none()

Monitoring & Observability

Key Metrics

Request Metrics:

Request rate (per endpoint)
Response times (p50, p95, p99)
Error rate (by status code)
Rate limit violations

Business Metrics:

Templates downloaded
Unique instances federating
Popular templates
Federation sources

Security Metrics:

Authentication failures
Invalid API keys
Suspicious IP addresses
Rate limit violations

Alerting

Critical Alerts:

Error rate > 5%
Response time p95 > 1s
Authentication failures > 10/min
Rate limit violations > 100/min

Warning Alerts:

Error rate > 1%
Response time p95 > 500ms
Unusual request patterns
Cache hit rate < 80%
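
The numeric thresholds above can be checked mechanically. The sketch below is illustrative: the metric names in the input dictionary are hypothetical, rates are expressed as fractions (0.05 for 5%), and the "unusual request patterns" alert is left out because it has no numeric threshold.

```python
def classify_alerts(m):
    """Return (critical, warning) lists of triggered alert names for a metrics dict."""
    critical, warning = [], []
    # Error rate is a fraction (0.05 == 5%).
    if m["error_rate"] > 0.05:
        critical.append("error_rate")
    elif m["error_rate"] > 0.01:
        warning.append("error_rate")
    # p95 latency in milliseconds: critical above 1s, warning above 500ms.
    if m["p95_ms"] > 1000:
        critical.append("p95_latency")
    elif m["p95_ms"] > 500:
        warning.append("p95_latency")
    if m["auth_failures_per_min"] > 10:
        critical.append("auth_failures")
    if m["rate_limit_violations_per_min"] > 100:
        critical.append("rate_limit_violations")
    if m["cache_hit_rate"] < 0.80:
        warning.append("cache_hit_rate")
    return critical, warning
```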

Logging

Log Levels:

INFO: Successful requests
WARNING: Authentication failures, rate limits
ERROR: Server errors, database failures
CRITICAL: Security incidents

Log Retention:

30 days for INFO/WARNING
90 days for ERROR/CRITICAL
Archive for security incidents

Testing

Unit Tests

Authentication:

def test_valid_federation_key():
    """Test valid federation key is accepted."""
    response = client.get(
        "/api/federation/agents",
        headers={"X-Federation-Key": "valid-key"}
    )
    assert response.status_code == 200

Rate Limiting:

def test_rate_limit_exceeded():
    """Test rate limit enforcement."""
    for _ in range(101):
        response = client.get(
            "/api/federation/agents",
            headers={"X-Federation-Key": "valid-key"}
        )
    assert response.status_code == 429

Integration Tests

Cross-Instance:

def test_federation_workflow():
    """Test complete federation workflow."""
    # Instance A: Publish agent
    template_id = publish_agent(instance_a)

    # Instance B: Discover agent
    agents = list_agents(instance_b)
    assert template_id in [a["id"] for a in agents]

    # Instance B: Download and install
    bundle = get_bundle(instance_b, template_id)
    installed = install_bundle(bundle)
    assert installed["status"] == "installed"

Security Tests

Authentication Bypass:

def test_no_auth_key():
    """Test request without API key is rejected."""
    response = client.get("/api/federation/agents")
    assert response.status_code == 401

def test_invalid_auth_key():
    """Test invalid API key is rejected."""
    response = client.get(
        "/api/federation/agents",
        headers={"X-Federation-Key": "invalid-key"}
    )
    assert response.status_code == 403

SQL Injection:

def test_sql_injection_prevention():
    """Test SQL injection attempts are blocked."""
    response = client.get(
        "/api/federation/agents",
        params={"limit": "1; DROP TABLE agents; --"},
        headers={"X-Federation-Key": "valid-key"}
    )
    assert response.status_code == 400

Best Practices

For Federation Consumers

Respect Rate Limits
- Implement exponential backoff
- Cache responses locally
- Use pagination for large lists
Secure API Keys
- Store in environment variables
- Never commit to git
- Rotate regularly
Monitor Usage
- Track request volume
- Alert on unusual patterns
- Review error rates
Handle Errors Gracefully
- Implement retry logic
- Log errors for debugging
- Provide user feedback

For Federation Providers

Validate All Input
- Parameter type checking
- Range validation
- SQL injection prevention
Log Everything
- Structured logging
- Include context
- Redact sensitive data
Monitor Performance
- Response times
- Error rates
- Resource usage
Plan for Scale
- Horizontal scaling
- Database sharding
- Cache optimization

Troubleshooting

Common Issues

1. 403 Forbidden: Invalid Federation Key

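Cause: Incorrect or revoked API key (a missing header returns 401 instead)
Solution: Verify that the X-Federation-Key header matches one of the keys in the provider's FEDERATION_API_KEY environment variable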

2. 429 Too Many Requests

Cause: Rate limit exceeded
Solution: Implement exponential backoff, check Retry-After header

3. 404 Not Found: Agent Template

Cause: Template not public, not approved, or doesn't exist
Solution: Verify template status, check template_id

4. Slow Response Times

Cause: Database query performance, cache miss
Solution: Check database indexes, cache hit rates, query optimization

5. High Memory Usage

Cause: Large bundles, memory leaks
Solution: Implement pagination, monitor memory, profile code

Debug Mode

Enable Debug Logging:

import logging
logging.getLogger("atom").setLevel(logging.DEBUG)

Check Federation Status:

# Test federation endpoint
curl -v -X GET "https://atom-saas.fly.dev/api/federation/agents?limit=1" \
  -H "X-Federation-Key: your-key"

# Inspect rate limit headers (-D - prints the response headers; the body is discarded)
curl -s -o /dev/null -D - "https://atom-saas.fly.dev/api/federation/agents" \
  -H "X-Federation-Key: your-key"

Future Enhancements

Planned Features

Webhook Federation
- Push notifications for new templates
- Real-time updates
- Event-driven synchronization
Peer Discovery
- Automatic instance discovery
- Peer-to-peer federation
- Mesh networking
Reputation Scoring
- Instance reputation scores
- Quality metrics
- Trust networks
Version Management
- Template versioning
- Migration guides
- Backward compatibility
Advanced Security
- Mutual TLS
- IP whitelisting
- VPN requirements
- Signature verification
