ATOM Documentation


World Model - Long-Term Memory System

The World Model is ATOM's long-term memory system that enables agents to remember, recall, and learn from every experience across their entire lifetime.


Overview

The World Model implements a sophisticated semantic memory system that:

  • Stores Experiences: Records every agent execution with full context
  • Semantic Recall: Retrieves experiences by meaning, not just keywords
  • Feedback-Aware: Prioritizes highly-rated experiences with RLHF
  • Context-Rich: Includes canvas context and artifacts
  • Hybrid Search: Combines vector similarity with metadata filters

Location: src/lib/ai/world-model.ts, backend-saas/core/agent_world_model.py


Architecture


Core Algorithm

Semantic Memory Recall

ALGORITHM: Semantic Memory Recall

INPUT: query, agent_role, agent_id, limit=5, filters={}
OUTPUT: relevant_experiences

1. QUERY EMBEDDING
   ============================================================================
   Convert natural language query into vector representation.

   INPUT: query (string)
   OUTPUT: query_vector (float[])

   STEPS:
   a) Preprocess query
      - Clean whitespace and special characters
      - Extract key entities (names, dates, IDs)
      - Normalize terminology

   b) Generate embedding
      - Use embedding model (OpenAI text-embedding-3-small)
      - Dimensions: 1536
      - query_vector = embed(query)

   c) Extract entities
      - Parse query for named entities
      - Extract canvas references (canvas-123, artifact-456)
      - Extract agent references
      - Extract date ranges

   RETURN query_vector, entities
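
A minimal TypeScript sketch of this step, assuming the official openai Node client; the entity pass is a deliberately naive regex over canvas/artifact references, not the production parser:

import OpenAI from 'openai';

const openai = new OpenAI();

// Sketch of step 1: embed the query and pull out obvious entities.
async function embedQuery(query: string): Promise<{ vector: number[]; entities: string[] }> {
  const cleaned = query.replace(/\s+/g, ' ').trim();

  // text-embedding-3-small returns 1536-dimensional vectors by default
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: cleaned,
  });

  // Naive entity pass: canvas/artifact references like "canvas-123"
  const entities = cleaned.match(/\b(?:canvas|artifact)-\d+\b/g) ?? [];

  return { vector: res.data[0].embedding, entities };
}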


2. HYBRID SEARCH
   ============================================================================
   Perform vector similarity search with metadata filtering.

   INPUT: query_vector, entities, filters
   OUTPUT: candidate_memories

   STEPS:
   a) Vector Search (LanceDB)
      - Query LanceDB with query_vector
      - Get top 100 candidates by semantic similarity
      - similarity_score = cosine_similarity(query, candidate)

   b) Metadata Filtering
      FOR each candidate IN candidates:
        - Check agent_role filter:
          IF filters.agent_role AND candidate.role != filters.agent_role:
            REMOVE candidate

        - Check canvas context filter:
          IF filters.canvas_ids AND candidate.canvas_id NOT IN filters.canvas_ids:
            REMOVE candidate

        - Check date range filter:
          IF filters.date_range AND candidate.timestamp NOT WITHIN filters.date_range:
            REMOVE candidate

        - Check success filter:
          IF filters.success_only AND candidate.outcome != 'success':
            REMOVE candidate

   c) Apply Feedback Filter
      FOR each candidate IN candidates:
        - Get feedback scores for candidate.episode_id
        - avg_feedback = average(feedback_scores)

        - IF filters.min_feedback_score AND avg_feedback < filters.min_feedback_score:
          REMOVE candidate

   RETURN candidates
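
A sketch of the hybrid search, assuming the @lancedb/lancedb Node client and an existing episodes table; LanceDB takes metadata filters as a SQL-like where clause, so most of the filtering can be pushed into the query itself:

import * as lancedb from '@lancedb/lancedb';

// Sketch of step 2: vector search narrowed by metadata filters.
async function hybridSearch(queryVector: number[], agentRole?: string, canvasIds?: string[]) {
  const db = await lancedb.connect('./data/lancedb');
  const table = await db.openTable('episodes');

  // Build a single SQL-like predicate from the active filters
  const predicates: string[] = [];
  if (agentRole) predicates.push(`agent_role = '${agentRole}'`);
  if (canvasIds?.length) {
    predicates.push(`canvas_id IN (${canvasIds.map((id) => `'${id}'`).join(', ')})`);
  }

  let query = table.search(queryVector).limit(100); // top-100 by similarity
  if (predicates.length) query = query.where(predicates.join(' AND '));

  return await query.toArray();
}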


3. RELEVANCE SCORING
   ============================================================================
   Calculate final relevance score for each candidate memory.

   INPUT: candidates, query_vector
   OUTPUT: scored_memories

   STEPS:
   FOR each candidate IN candidates:
     a) Base semantic similarity
        - semantic_score = cosine_similarity(query_vector, candidate.embedding)
        - Range: 0-1 in practice for text embeddings (1 = perfect match)

    b) Recency boost
       - days_since = (now - candidate.timestamp) / 86400
       - IF days_since < 7:
         - recency_boost = +0.1
       - ELSE IF days_since < 30:
         - recency_boost = +0.05
       - ELSE:
         - recency_boost = 0.0

     c) Feedback adjustment
        - avg_feedback = average(candidate.feedback_scores)
        - IF avg_feedback > 0.5 (positive):
          - feedback_adjustment = (avg_feedback - 0.5) * 0.4   # Max +0.2
        - ELSE IF avg_feedback < 0.5 (negative):
          - feedback_adjustment = -(0.5 - avg_feedback) * 0.6  # Max -0.3
        - ELSE:
          - feedback_adjustment = 0.0

    d) Confidence weighting
       - confidence_factor = candidate.confidence_score * 0.1
       - Range: 0-0.1

    e) Calculate final score
       - final_score = (
             semantic_score * 0.6 +
             recency_boost +
             feedback_adjustment +
             confidence_factor
           )

       - Clamp to [0, 1] range

   RETURN scored_memories
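
The scoring step is pure arithmetic and maps almost line-for-line into code; a sketch assuming each candidate carries its similarity, timestamp, average feedback, and confidence:

// Sketch of step 3: blend similarity, recency, feedback, and confidence.
function relevanceScore(c: {
  semanticScore: number; // cosine similarity, ~0-1
  timestamp: number;     // epoch milliseconds
  avgFeedback: number;   // 0-1, 0.5 = neutral
  confidence: number;    // 0-1
}): number {
  const daysSince = (Date.now() - c.timestamp) / 86_400_000;
  const recencyBoost = daysSince < 7 ? 0.1 : daysSince < 30 ? 0.05 : 0;

  // Positive feedback boosts up to +0.2; negative penalizes up to -0.3
  // (the neutral case falls out of the arithmetic as exactly 0)
  const feedbackAdjustment =
    c.avgFeedback > 0.5
      ? (c.avgFeedback - 0.5) * 0.4
      : -(0.5 - c.avgFeedback) * 0.6;

  const confidenceFactor = c.confidence * 0.1;

  const score =
    c.semanticScore * 0.6 + recencyBoost + feedbackAdjustment + confidenceFactor;

  return Math.min(1, Math.max(0, score)); // clamp to [0, 1]
}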


4. DIVERSITY PENALTY
   ============================================================================
   Reduce redundancy by penalizing very similar memories.

   INPUT: scored_memories
   OUTPUT: diverse_memories

   STEPS:
   a) Sort by final_score descending

   b) FOR each memory at index i:
     FOR each previous_memory at index j < i:
       - similarity = cosine_similarity(
           memory.embedding,
           previous_memory.embedding
         )

       - IF similarity > 0.9 (very similar):
         - diversity_penalty = (similarity - 0.9) * 0.5
         - memory.final_score -= diversity_penalty

   c) Re-sort by adjusted final_score

   RETURN diverse_memories
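
A sketch of the diversity pass; it assumes a cosineSimilarity helper and adjusts scores in place, mirroring the pseudocode's pairwise comparison:

// Sketch of step 4: penalize near-duplicates of higher-ranked memories.
function applyDiversityPenalty(
  memories: { embedding: number[]; finalScore: number }[],
  cosineSimilarity: (a: number[], b: number[]) => number
): void {
  memories.sort((a, b) => b.finalScore - a.finalScore);

  for (let i = 1; i < memories.length; i++) {
    for (let j = 0; j < i; j++) {
      const sim = cosineSimilarity(memories[i].embedding, memories[j].embedding);
      if (sim > 0.9) {
        // Penalty scales with redundancy (max -0.05 per pair at sim = 1.0)
        memories[i].finalScore -= (sim - 0.9) * 0.5;
      }
    }
  }

  memories.sort((a, b) => b.finalScore - a.finalScore); // re-rank after penalties
}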


5. RANKING & SELECTION
   ============================================================================
   Select top-K most relevant and diverse memories.

   INPUT: diverse_memories, limit
   OUTPUT: top_k_experiences

   STEPS:
   a) Sort by final_score descending
   b) Select top limit memories
   c) Format for consumption

   FOR each memory IN top_k:
     experience = {
       episode_id: memory.episode_id,
       task_type: memory.task_type,
       task_description: memory.task_description,
       input_summary: memory.input_summary,
       outcome: memory.outcome,
       learnings: memory.learnings,
       confidence: memory.confidence_score,
       timestamp: memory.timestamp,
       relevance_score: memory.final_score,

       # Canvas context
       canvas_id: memory.canvas_id,
       canvas_name: memory.canvas_name,
       canvas_actions: memory.canvas_action_ids,

       # Feedback metadata
       feedback_score: memory.avg_feedback,
       feedback_count: memory.feedback_count
     }

   RETURN top_k_experiences


RETURN relevant_experiences (the top_k_experiences from step 5)
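
Put together, the recall path reduces to a short pipeline. A sketch wiring up the hypothetical helpers above (embedQuery, hybridSearch, relevanceScore, applyDiversityPenalty), with cosineSimilarity also assumed:

// Sketch of the end-to-end recall pipeline (steps 1-5).
async function recallPipeline(query: string, agentRole: string, limit = 5) {
  const { vector } = await embedQuery(query);               // 1. embed
  const candidates = await hybridSearch(vector, agentRole); // 2. hybrid search

  const scored = candidates.map((c: any) => ({
    ...c,
    finalScore: relevanceScore(c),                          // 3. relevance score
  }));

  applyDiversityPenalty(scored, cosineSimilarity);          // 4. diversity penalty
  return scored.slice(0, limit);                            // 5. top-K selection
}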

Experience Recording

ALGORITHM: Record Experience

INPUT: tenant_id, agent_id, experience_data
OUTPUT: episode_id

1. VALIDATE INPUT
   - Required fields present
   - Data types correct
   - Tenant context valid

2. CREATE EPISODE
   episode = {
     id: generate_uuid(),
     tenant_id: tenant_id,
     agent_id: agent_id,

     # Task information
     task_type: experience_data.task_type,
     task_description: experience_data.task_description,
     input_summary: summarize(experience_data.input),

     # Execution details
     reasoning_chain: experience_data.reasoning_chain,
     approach_taken: experience_data.approach,
     actions_taken: experience_data.actions,

     # Outcome
     outcome: experience_data.outcome,  # success/failure
     success: experience_data.success,
     error_message: experience_data.error,

     # Confidence & learning
     confidence_score: experience_data.confidence,
     constitutional_violations: experience_data.violations || [],
     human_intervention_required: experience_data.intervention,

     # Learnings
     learnings: experience_data.learnings,
     metacognitive_insights: experience_data.metacognitive,

     # Metadata
     timestamp: now(),
     agent_role: experience_data.agent_role,
     maturity_level: experience_data.maturity_level,

     # Canvas context
     canvas_id: experience_data.canvas_id,
     canvas_action_ids: experience_data.canvas_action_ids || []
   }

3. STORE IN POSTGRESQL
   INSERT INTO episodes VALUES (episode)

4. GENERATE EMBEDDING
   # Create searchable text
   searchable_text = """
     {episode.task_type}
     {episode.task_description}
     {episode.input_summary}
     {episode.outcome}
     {episode.learnings}
   """.strip()

   # Generate embedding
   embedding = embed(searchable_text)

5. INDEX IN LANCEDB
   lance_record = {
     episode_id: episode.id,
     tenant_id: episode.tenant_id,
     agent_id: episode.agent_id,
     agent_role: episode.agent_role,

     # Vector
     embedding: embedding,

     # Metadata
     task_type: episode.task_type,
     outcome: episode.outcome,
     success: episode.success,
     confidence: episode.confidence_score,
     timestamp: episode.timestamp,

     # Canvas context
     canvas_id: episode.canvas_id,
     canvas_action_ids: episode.canvas_action_ids
   }

   INSERT INTO lancedb TABLE episodes VALUES (lance_record)

6. RETURN episode_id
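
The searchable-text construction in step 4 determines what the embedding can match on, so it is worth pinning down; a small sketch, with the field list taken from the pseudocode:

// Sketch of step 4: build the text that gets embedded for this episode.
function buildSearchableText(episode: {
  task_type: string;
  task_description: string;
  input_summary: string;
  outcome: string;
  learnings: string[];
}): string {
  return [
    episode.task_type,
    episode.task_description,
    episode.input_summary,
    episode.outcome,
    episode.learnings.join('\n'),
  ]
    .filter(Boolean) // skip empty fields rather than embedding blank lines
    .join('\n');
}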



Feedback-Aware Recall

The World Model integrates Reinforcement Learning from Human Feedback (RLHF) to prioritize successful experiences:

ALGORITHM: Feedback-Aware Recall

INPUT: query, agent_role, min_feedback_score=0.5, limit=5
OUTPUT: highly_rated_experiences

1. STANDARD RECALL
   - Execute standard recall algorithm
   - Get top 20 candidates by semantic similarity

2. FILTER BY FEEDBACK
   FOR each candidate:
     - Fetch all feedback for candidate.episode_id
     - Calculate average feedback score
     - IF avg_feedback < min_feedback_score:
       - REMOVE candidate

3. BOOST POSITIVE FEEDBACK
   FOR each candidate with avg_feedback > 0.7:
     - final_score += 0.2  # Significant boost

4. PENALIZE NEGATIVE FEEDBACK
   FOR each candidate with avg_feedback < 0.3:
     - final_score -= 0.3  # Significant penalty

5. RETURN TOP-K
   - Sort by adjusted score
   - Return top limit results

EXAMPLE USAGE:
  # Recall only successful experiences with positive feedback
  experiences = await world_model.recall_experiences(
    query="Reconcile inventory discrepancies",
    agent_role="Finance",
    min_feedback_score=0.7,  # Only highly-rated
    limit=5
  )

  # Returns experiences that:
  # - Semantically match inventory reconciliation
  # - Have avg feedback >= 0.7
  # - Are ranked by feedback-adjusted relevance

Canvas Context Integration

Episodes automatically capture Canvas workspace context for complete state awareness:

ALGORITHM: Canvas-Aware Episode Creation

INPUT: execution_data, canvas_context
OUTPUT: episode_with_canvas

1. EXTRACT CANVAS METADATA
   canvas_metadata = {
     canvas_id: canvas_context.id,
     canvas_name: canvas_context.name,
     canvas_snapshot: canvas_context.serialize(),

     # Track all canvas actions
     canvas_action_ids: [
       action.id
       FOR action IN canvas_context.actions_during_execution
     ]
   }

2. CREATE EPISODE
   episode = {
     # ... standard fields ...
     canvas_id: canvas_metadata.canvas_id,
     canvas_action_ids: canvas_metadata.canvas_action_ids
   }

3. STORE CANVAS AUDIT
   FOR each action IN canvas_context.actions_during_execution:
     canvas_audit = {
       id: generate_uuid(),
       canvas_id: canvas_context.id,
       action_type: action.type,
       component_id: action.component_id,
       agent_id: execution_data.agent_id,
       episode_id: episode.id,
       timestamp: action.timestamp,
       details: action.details
     }

     INSERT INTO canvas_audits VALUES (canvas_audit)

4. CREATE CANVAS-AWARE RECALL
   # When recalling episodes, include full canvas context
   experiences = await world_model.recall_experiences(
     query="Create data visualization",
     agent_role="Developer",
     include_canvas_context=true
   )

   # Each experience includes:
   # - Full canvas snapshot at time of execution
   # - List of canvas actions taken
   # - Component states and configurations
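
A sketch of the metadata extraction in step 1; the CanvasContext shape is assumed from the pseudocode, not taken from the actual source:

// Sketch of step 1: capture canvas state alongside the episode.
interface CanvasContext {
  id: string;
  name: string;
  serialize(): string;
  actions_during_execution: { id: string }[];
}

function extractCanvasMetadata(canvas: CanvasContext) {
  return {
    canvas_id: canvas.id,
    canvas_name: canvas.name,
    canvas_snapshot: canvas.serialize(), // full workspace state at execution time
    canvas_action_ids: canvas.actions_during_execution.map((a) => a.id),
  };
}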

Data Structures

AgentExperience

interface AgentExperience {
  // Identity
  episode_id: string;
  tenant_id: string;
  agent_id: string;
  agent_role: string;

  // Task
  task_type: string;
  task_description: string;
  input_summary: string;

  // Execution
  reasoning_chain: ReasoningChain;
  approach_taken: string;
  actions_taken: string[];

  // Outcome
  outcome: 'success' | 'failure';
  success: boolean;
  error_message?: string;

  // Learning
  confidence_score: number;
  constitutional_violations: string[];
  human_intervention_required: boolean;
  learnings: string[];
  metacognitive_insights: MetacognitiveInsights;

  // Metadata
  timestamp: Date;
  maturity_level: MaturityLevel;

  // Canvas Context
  canvas_id?: string;
  canvas_action_ids?: string[];
}

RecallOptions

interface RecallOptions {
  // Search parameters
  limit?: number;           // Max results (default: 5)
  min_similarity?: number;  // Min semantic score (default: 0.6)

  // Filters
  agent_role?: string;      // Filter by agent role
  canvas_ids?: string[];    // Filter by canvas
  date_range?: { start: Date; end: Date };
  success_only?: boolean;   // Only successful outcomes

  // Feedback
  min_feedback_score?: number; // Min avg feedback (default: none)
  feedback_category?: string;  // Filter by feedback type

  // Context
  include_canvas_context?: boolean; // Include full canvas state
}

EpisodeFeedback

interface EpisodeFeedback {
  id: string;
  episode_id: string;
  tenant_id: string;

  // Feedback data
  feedback_score: number;     // -1.0 to 1.0
  feedback_notes?: string;    // Optional detailed feedback
  feedback_category?: string; // accuracy, helpfulness, safety, etc.

  // Metadata
  created_by: string;
  created_at: Date;
}
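
Several algorithms above average an episode's feedback scores; a trivial helper, with the empty-feedback default left as an explicit parameter since the docs don't specify one:

// Average feedback for an episode; the no-feedback default is an assumption.
function averageFeedback(feedback: EpisodeFeedback[], emptyDefault = 0): number {
  if (feedback.length === 0) return emptyDefault;
  return feedback.reduce((sum, f) => sum + f.feedback_score, 0) / feedback.length;
}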

Integration Points

Cognitive Architecture Integration

// Cognitive Architecture recalls experiences for reasoning
const worldModel = new WorldModelService(db);

const experiences = await worldModel.recallExperiences(
  tenantId,
  'Finance',
  'Reconcile inventory discrepancies',
  5
);

// Experiences inform cognitive reasoning
const reasoning = await cognitive.reason(tenantId, agentId, task, {
  past_experiences: experiences
});

Learning Engine Integration

// Learning Engine records experiences to World Model
await learning.recordExperience(tenantId, {
  agent_id: agentId,
  task_type: 'finance_reconciliation',
  task_description: 'Reconcile SKU-123',
  input_summary: 'Inventory discrepancy found',
  outcome: 'success',
  success: true,
  confidence: 0.9,
  learnings: ['Use weighted average for costing'],
  metacognitive_insights: metacognition
});

// World Model stores for future recall

Graduation Exam Integration

// Graduation Exam queries episodes for readiness calculation
const episodes = await worldModel.getAgentEpisodes(agentId, { limit: 30 });

// Episodes used for:
// - Zero-intervention ratio calculation
// - Constitutional compliance tracking
// - Confidence score aggregation
// - Success rate calculation

Example Usage

Basic Recall

import { WorldModelService } from '@/lib/ai/world-model';

const worldModel = new WorldModelService(db);

// Recall relevant experiences
const experiences = await worldModel.recallExperiences(
  tenantId,
  'Finance',
  'Reconcile inventory discrepancies',
  5
);

console.log('Relevant experiences:', experiences);

// Each experience includes:
// - Task description and outcome
// - Approach taken and learnings
// - Confidence score
// - Canvas context (if applicable)
// - Feedback score

Feedback-Aware Recall

// Only recall highly-rated experiences
const positiveExperiences = await worldModel.recallExperiences(
  tenantId,
  'Finance',
  'Reconcile inventory',
  5,
  {
    min_feedback_score: 0.7 // Only positive feedback
  }
);

// Positive feedback gets +0.2 boost in relevance

Canvas-Context Recall

// Recall experiences from a specific canvas
const canvasExperiences = await worldModel.recallExperiences(
  tenantId,
  'Developer',
  'Create data visualization',
  5,
  {
    canvas_ids: ['canvas-123'],
    include_canvas_context: true
  }
);

// Each experience includes full canvas state

Record Experience

// Record new experience after agent execution
const episodeId = await worldModel.recordExperience(tenantId, {
  agent_id: agentId,
  agent_role: 'Finance',
  task_type: 'reconciliation',
  task_description: 'Reconcile SKU-123 inventory',
  input_summary: 'Discrepancy between physical and system count',
  reasoning_chain: reasoning,
  approach_taken: 'Weighted average costing',
  actions_taken: ['Query ERP', 'Compare counts', 'Adjust records'],
  outcome: 'success',
  success: true,
  confidence: 0.9,
  constitutional_violations: [],
  human_intervention_required: false,
  learnings: [
    'Weighted average minimizes variance',
    'Physical count accuracy critical'
  ],
  metacognitive_insights: metacognition,
  canvas_id: 'canvas-456',
  canvas_action_ids: ['action-1', 'action-2']
});

console.log('Episode recorded:', episodeId);

Submit Feedback

import { episodeFeedbackService } from '@/lib/ai/episodic-memory';

// Submit feedback for episode (RLHF)
const feedback = await episodeFeedbackService.submitFeedback(
  episodeId,
  0.8, // Strongly positive
  'Excellent reconciliation! Very accurate.',
  'accuracy'
);

// Future recalls prioritize this experience

Performance Characteristics

Storage

  • PostgreSQL: Episode records (immediate write)
  • LanceDB: Vector index (background indexing)
  • Latency: < 100ms per episode

Recall

  • Vector Search: O(log n) with LanceDB
  • Metadata Filtering: O(k) where k = candidates
  • Total Latency: < 500ms for top-5 results

Scalability

  • PostgreSQL: 10M+ episodes (with proper indexing)
  • LanceDB: 100M+ vectors (with partitioning)
  • Recall Performance: Near-constant as the index grows (ANN search scales sub-linearly)

Configuration

interface WorldModelConfig {
  // Embedding
  embedding_model: string;          // Default: text-embedding-3-small
  embedding_dimensions: number;     // Default: 1536

  // Recall
  default_recall_limit: number;     // Default: 5
  min_similarity_threshold: number; // Default: 0.6

  // Scoring
  recency_boost_days: number;       // Default: 7
  recency_boost_amount: number;     // Default: 0.1
  feedback_boost_max: number;       // Default: 0.2
  feedback_penalty_max: number;     // Default: -0.3

  // Diversity
  diversity_penalty_threshold: number; // Default: 0.9
  diversity_penalty_factor: number;    // Default: 0.5

  // Storage
  lancedb_path: string;             // Default: ./data/lancedb
  postgres_connection_pool: number; // Default: 20
}
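
For reference, the documented defaults collected into one object; whether WorldModelService accepts this shape at construction is an assumption:

// All documented defaults in one place (hypothetical construction usage).
const defaultConfig: WorldModelConfig = {
  embedding_model: 'text-embedding-3-small',
  embedding_dimensions: 1536,
  default_recall_limit: 5,
  min_similarity_threshold: 0.6,
  recency_boost_days: 7,
  recency_boost_amount: 0.1,
  feedback_boost_max: 0.2,
  feedback_penalty_max: -0.3,
  diversity_penalty_threshold: 0.9,
  diversity_penalty_factor: 0.5,
  lancedb_path: './data/lancedb',
  postgres_connection_pool: 20,
};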

Troubleshooting

Poor Recall Quality

Symptom: Retrieved experiences not relevant

Diagnosis:

  • Check query clarity and specificity
  • Verify embedding model working correctly
  • Review similarity scores

Fix:

  • Improve query specificity
  • Adjust min_similarity_threshold (see the sketch after this list)
  • Increase limit to get more candidates
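
One way to apply these fixes together, assuming recallExperiences accepts RecallOptions as a trailing argument:

// Loosen recall when results come back weak (hypothetical options argument).
const experiences = await worldModel.recallExperiences(
  tenantId,
  'Finance',
  'Reconcile inventory discrepancies by SKU and warehouse', // more specific query
  10,                      // widen the candidate pool
  { min_similarity: 0.4 }  // lower the default 0.6 threshold
);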

No Experiences Returned

Symptom: Recall returns empty results

Diagnosis:

  • Check if agent has any recorded episodes
  • Verify filters not too restrictive
  • Check feedback score threshold

Fix:

  • Remove filters gradually
  • Lower min_feedback_score
  • Remove date range restrictions

Slow Recall Performance

Symptom: Recall takes > 1 second

Diagnosis:

  • Check LanceDB index size
  • Verify PostgreSQL query performance
  • Review network latency

Fix:

  • Increase LanceDB cache size
  • Add PostgreSQL indexes
  • Use read replica for queries


Last Updated: 2025-02-06 · Version: 8.0 · Status: Production Ready