ATOM Documentation


BYOK Key Fix Deployment Summary

**Date:** 2026-05-01

**Deployment:** atom-saas version with pre-flight API key validation

**Image:** registry.fly.io/atom-saas:deployment-01KQHXM3XJ9JY595H6NG0GXADJ

Problem Solved

Backfill jobs were failing during LLM extraction because:

  1. **Wrong key being used**: The system fell back to the global DeepSeek key ending in 6100 instead of using the tenant's key ending in 4207
  2. **tenant_id not resolved**: LLMService was initialized with workspace_id passed as tenant_id, so the BYOK lookup searched under the wrong ID and failed
  3. **Silent failures**: Auth errors were caught, logged as warnings, and skipped, which made the root cause hard to find

Commits Deployed

1. `6303c13f57` - Resolve tenant_id from workspace_id

**File:** backend-saas/core/graphrag_engine.py

# Before: Used workspace_id as tenant_id (WRONG)
llm = LLMService(db=self.db, workspace_id=workspace_id, tenant_id=workspace_id)

# After: Resolve tenant_id from Workspace table (CORRECT)
w = session.query(Workspace).filter(Workspace.id == workspace_id).first()
resolved_tenant_id = str(w.tenant_id) if w and w.tenant_id else workspace_id
llm = LLMService(db=self.db, workspace_id=workspace_id, tenant_id=resolved_tenant_id)

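**Impact:** BYOK key lookup now uses the tenant_id stored on the workspace row. If the row is missing or has no tenant_id, the code still falls back to workspace_id.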
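The resolution step can be exercised in isolation. The following is a minimal sketch, not the code in graphrag_engine.py: it uses the standard-library sqlite3 module in place of the real session, and the function name `resolve_tenant_id`, the two-column table, and the sample IDs are all assumptions made for illustration. It mirrors the fallback to workspace_id shown in the snippet above.

```python
import sqlite3


def resolve_tenant_id(conn: sqlite3.Connection, workspace_id: str) -> str:
    """Return the tenant_id that owns workspace_id.

    Falls back to workspace_id when the row is missing or has no tenant_id,
    mirroring the behaviour of the snippet above.
    """
    row = conn.execute(
        "SELECT tenant_id FROM workspaces WHERE id = ?", (workspace_id,)
    ).fetchone()
    if row and row[0]:
        return str(row[0])
    return workspace_id


# Hypothetical in-memory table standing in for the real workspaces table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workspaces (id TEXT PRIMARY KEY, tenant_id TEXT)")
conn.execute("INSERT INTO workspaces VALUES ('ws-1', 'tenant-1')")
conn.execute("INSERT INTO workspaces VALUES ('ws-orphan', NULL)")

resolved = resolve_tenant_id(conn, "ws-1")          # row with a tenant_id
orphan = resolve_tenant_id(conn, "ws-orphan")       # row without a tenant_id
missing = resolve_tenant_id(conn, "ws-unknown")     # no row at all
```

A stricter variant could raise instead of falling back, so that a workspace without a tenant never reaches the BYOK lookup under the wrong ID; whether that is desirable depends on how untenanted workspaces are meant to behave.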

2. `077bbdc115` - Remove dangerous global key fallback

**File:** backend-saas/core/byok_endpoints.py

# REMOVED: Dangerous fallback to global keys
fallback_key_id = f"{provider_id}_{key_name}_{environment}"  # e.g., "deepseek_default_production"
if fallback_key_id in self.api_keys:
    return decrypt(self.api_keys[fallback_key_id])  # ← REMOVED THIS LINE

**Impact:** No longer returns global keys when tenant key lookup fails
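The effect of removing the fallback can be illustrated with a small lookup that only ever returns a tenant-scoped entry. This is a sketch under stated assumptions: the function name `get_tenant_api_key`, the `tenant_id:provider_id` storage format, and the plain-dict store are hypothetical, and decryption is left out.

```python
from typing import Dict, Optional


def get_tenant_api_key(
    api_keys: Dict[str, str], tenant_id: str, provider_id: str
) -> Optional[str]:
    """Return the key stored for (tenant_id, provider_id), or None.

    There is deliberately no second lookup against a global entry.
    """
    return api_keys.get(f"{tenant_id}:{provider_id}")


# Hypothetical store holding one tenant-scoped key and one global key.
api_keys = {
    "tenant-1:deepseek": "tenant-key",
    "deepseek_default_production": "global-key",
}

found = get_tenant_api_key(api_keys, "tenant-1", "deepseek")
not_found = get_tenant_api_key(api_keys, "tenant-2", "deepseek")
```

Returning None forces the caller to handle the missing-key case explicitly, instead of running the request on a key the tenant does not own.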

3. `a51e194459` - Raise AuthenticationError on auth failure

**File:** Multiple files in BYOKHandler

# Before: Silently skip on auth failure
except Exception as e:
    logger.warning(f"Provider {provider_id} failed: {e}")
    continue  # Try next provider

# After: Raise clear error
raise AuthenticationError(
    f"Failed to authenticate with {provider_id}: {auth_error}"
)

**Impact:** Clear error messages when API keys are invalid
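A compact way to see the change in behaviour is a provider loop that stops on an auth failure. The sketch below is illustrative only: `call_first_available`, `ProviderRejectedKey`, and `ProviderUnavailable` are hypothetical names, `AuthenticationError` is redefined locally as a stand-in, and continuing to the next provider on non-auth failures is an assumption, not something the commit is known to do.

```python
from typing import Callable, Dict


class AuthenticationError(Exception):
    """Local stand-in for the AuthenticationError raised by the fix."""


class ProviderRejectedKey(Exception):
    """Hypothetical: the provider answered 401 for the supplied key."""


class ProviderUnavailable(Exception):
    """Hypothetical: a non-auth failure such as a timeout."""


def call_first_available(providers: Dict[str, Callable[[], str]]) -> str:
    """Try providers in order; stop on auth failure, skip other failures."""
    for provider_id, call in providers.items():
        try:
            return call()
        except ProviderRejectedKey as auth_error:
            # Auth failures are surfaced instead of being skipped.
            raise AuthenticationError(
                f"Failed to authenticate with {provider_id}: {auth_error}"
            ) from auth_error
        except ProviderUnavailable:
            continue  # Non-auth failure: try the next provider
    raise RuntimeError("No provider produced a response")


def bad_key() -> str:
    raise ProviderRejectedKey("401 Unauthorized")


def timeout() -> str:
    raise ProviderUnavailable("timeout")


def ok() -> str:
    return "response"


result = call_first_available({"flaky": timeout, "good": ok})

try:
    call_first_available({"deepseek": bad_key, "good": ok})
    message = None
except AuthenticationError as exc:
    message = str(exc)
```

Chaining with `from auth_error` keeps the provider's original exception in the traceback, which helps when the job log is the only debugging evidence.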

4. `ab01dc1cb5` - Pre-flight API key check

**File:** backend-saas/core/historical_sync_service.py

# Check BEFORE fetching records (lines 624-648)
openai_key = db.query(TenantSetting).filter(
    TenantSetting.tenant_id == tenant_id,
    TenantSetting.setting_key == "OPENAI_API_KEY"
).first()

has_openai_key = (
    openai_key
    and openai_key.setting_value
    and not openai_key.setting_value.startswith("mock")
)

can_use_graphrag = has_graphrag_access and has_openai_key

if not can_use_graphrag:
    logger.warning(
        f"Skipping backfill: Tenant {tenant_id} has no valid API key. "
        f"Add key in Settings or skip GraphRAG extraction."
    )
    return  # Stop immediately, don't fetch records

**Impact:**

  • **Faster feedback:** Job stops immediately if no API key is present (instead of fetching all records and then failing)
  • **Clearer logs:** Explicit warning about missing API key
  • **No wasted resources:** Doesn't fetch emails that can't be processed
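The key test at the heart of the pre-flight check can be pulled out as a pure function, which makes its edge cases easy to see. This is a minimal sketch; the helper name `has_usable_key` and the sample values are assumptions, while the three conditions come from the snippet above.

```python
from typing import Dict, Optional


def has_usable_key(setting_value: Optional[str]) -> bool:
    """True when a value is present, non-empty, and not a 'mock' placeholder."""
    return bool(setting_value) and not setting_value.startswith("mock")


# Hypothetical sample values covering each branch of the check.
checks: Dict[str, bool] = {
    "missing row": has_usable_key(None),
    "empty value": has_usable_key(""),
    "mock placeholder": has_usable_key("mock-key-123"),
    "real-looking value": has_usable_key("sk-example"),
}
```

The check only establishes that a key is present. A key that is present but rejected by the provider still passes here and surfaces later as an AuthenticationError.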

Expected Behavior

Scenario 1: Tenant with Valid API Key (Brennan)

  • ✅ Pre-flight check finds OPENAI_API_KEY in tenant_settings
  • ✅ Resolves tenant_id from workspace_id
  • ✅ BYOK lookup finds the tenant's DeepSeek key ending in 4207
  • ✅ Uses tenant's key (not global fallback)
  • ✅ LLM extraction succeeds
  • ✅ Entities and relationships extracted

Scenario 2: Tenant Without API Key

  • ✅ Checks tenant_settings for OPENAI_API_KEY
  • ✅ Key not found or invalid
  • ✅ Logs clear warning: "Skipping backfill: Tenant has no valid API key"
  • ✅ Stops immediately (doesn't fetch records)
  • ✅ Job marked as failed with clear reason

Scenario 3: Tenant with Invalid API Key

  • ✅ Pre-flight check finds OPENAI_API_KEY in tenant_settings, so the job proceeds
  • ✅ Provider rejects the key at call time (401 error)
  • ✅ Raises AuthenticationError with clear message
  • ✅ Job marked as failed with auth error details

Database State

**Brennan's tenant (verified):**

-- Tenant
SELECT id, subdomain, plan_type FROM tenants WHERE subdomain = 'brennan';
-- Result: 31c06fc4-db22-4740-83ea-48ac14f25810 | brennan | team

-- Workspace
SELECT id, tenant_id FROM workspaces WHERE tenant_id = '31c06fc4-db22-4740-83ea-48ac14f25810';
-- Result: 795c2ec9-b794-47ea-9aae-12c1c3d48589 | 31c06fc4-db22-4740-83ea-48ac14f25810

-- API Keys
SELECT setting_key, LENGTH(setting_value), RIGHT(setting_value, 8)
FROM tenant_settings
WHERE tenant_id = '31c06fc4-db22-4740-83ea-48ac14f25810'
  AND setting_key LIKE '%API_KEY%';
-- Results:
--   DEEPSEEK_API_KEY    |  35 | ...4f474207  ✅ CORRECT KEY
--   OPENAI_API_KEY      | 164 | ...CQnbyPMA  ✅ CORRECT KEY
--   GOOGLE_API_KEY      |  39 | ...LLfokymk
--   MINIMAX_2_7_API_KEY | 126 | ...7onMexq8

Testing

To verify the fix:

  1. **Trigger a backfill** for the brennan tenant
  2. **Check logs** for: "Resolved tenant_id from workspace_id"
  3. **Check logs** for: "Using tenant's DEEPSEEK_API_KEY"
  4. **Verify** LLM extraction succeeds
  5. **Verify** entities and relationships are created

To test the pre-flight check:

  1. **Remove** the OPENAI_API_KEY from tenant_settings temporarily
  2. **Trigger** a backfill
  3. **Verify** job fails immediately with: "Skipping backfill: Tenant has no valid API key"
  4. **Verify** NO records are fetched (faster feedback)
  5. **Restore** the API key
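The manual steps above could also be captured as an automated check that counts fetch calls. The sketch below is not the real service: `run_backfill` is a hypothetical stand-in for the backfill entry point, and the stub fetcher and return strings are made up for illustration.

```python
from typing import Callable, List, Optional


def run_backfill(
    api_key: Optional[str], fetch_records: Callable[[], List[str]]
) -> str:
    """Hypothetical stand-in for the backfill entry point."""
    if not api_key or api_key.startswith("mock"):
        return "skipped"  # Pre-flight failed: return before fetching anything
    records = fetch_records()
    return f"processed {len(records)}"


fetch_log: List[str] = []


def counting_fetch() -> List[str]:
    """Stub fetcher that records every call so the test can count them."""
    fetch_log.append("called")
    return ["email-1", "email-2"]


skipped = run_backfill(None, counting_fetch)
calls_after_skip = len(fetch_log)

processed = run_backfill("sk-example", counting_fetch)
calls_after_run = len(fetch_log)
```

Asserting on the call count, not only on the return value, is what confirms that the job stopped before fetching and did not fail partway through.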

Future Work

For **managed AI tenants** (platform provides keys):

  • LLMService will handle key management transparently
  • No code change needed
  • System will use platform keys instead of BYOK keys

The current fix ensures BYOK tenants' keys are found correctly without falling back to global keys.