World Model - Long-Term Memory System
The World Model is ATOM's long-term memory system that enables agents to remember, recall, and learn from every experience across their entire lifetime.
Overview
The World Model implements a sophisticated semantic memory system that:
- Stores Experiences: Records every agent execution with full context
- Semantic Recall: Retrieves experiences by meaning, not just keywords
- Feedback-Aware: Prioritizes highly-rated experiences with RLHF
- Context-Rich: Includes canvas context and artifacts
- Hybrid Search: Combines vector similarity with metadata filters
Location: src/lib/ai/world-model.ts, backend-saas/core/agent_world_model.py
Architecture
Core Algorithm
Semantic Memory Recall
ALGORITHM: Semantic Memory Recall
INPUT: query, agent_role, agent_id, limit=5, filters={}
OUTPUT: relevant_experiences
1. QUERY EMBEDDING
============================================================================
Convert natural language query into vector representation.
INPUT: query (string)
OUTPUT: query_vector (float[])
STEPS:
a) Preprocess query
- Clean whitespace and special characters
- Extract key entities (names, dates, IDs)
- Normalize terminology
b) Generate embedding
- Use embedding model (OpenAI text-embedding-3-small)
- Dimensions: 1536
- query_vector = embed(query)
c) Extract entities
- Parse query for named entities
- Extract canvas references (canvas-123, artifact-456)
- Extract agent references
- Extract date ranges
RETURN query_vector, entities
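The preprocessing and entity-extraction sub-steps above can be sketched in TypeScript. Helper names like `preprocessQuery` and `extractEntities` are illustrative, not the real API; the canvas/artifact ID patterns are assumed from the examples in this document:

```typescript
interface QueryEntities {
  canvasRefs: string[];
  artifactRefs: string[];
}

function preprocessQuery(query: string): string {
  // Drop characters with no semantic signal, then collapse whitespace.
  return query.replace(/[^\w\s-]/g, " ").replace(/\s+/g, " ").trim();
}

function extractEntities(query: string): QueryEntities {
  // Canvas and artifact references follow the "canvas-123" / "artifact-456" pattern.
  return {
    canvasRefs: query.match(/canvas-\d+/g) ?? [],
    artifactRefs: query.match(/artifact-\d+/g) ?? [],
  };
}
```

The cleaned query is what gets passed to the embedding model; the extracted IDs feed the metadata filters in step 2.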
2. HYBRID SEARCH
============================================================================
Perform vector similarity search with metadata filtering.
INPUT: query_vector, entities, filters
OUTPUT: candidate_memories
STEPS:
a) Vector Search (LanceDB)
- Query LanceDB with query_vector
- Get top 100 candidates by semantic similarity
- similarity_score = cosine_similarity(query, candidate)
b) Metadata Filtering
FOR each candidate IN candidates:
- Check agent_role filter:
IF filters.agent_role AND candidate.role != filters.agent_role:
REMOVE candidate
- Check canvas context filter:
IF filters.canvas_ids AND candidate.canvas_id NOT IN filters.canvas_ids:
REMOVE candidate
- Check date range filter:
IF filters.date_range AND candidate.date NOT IN filters.date_range:
REMOVE candidate
- Check success filter:
IF filters.success_only AND candidate.outcome != 'success':
REMOVE candidate
c) Apply Feedback Filter
FOR each candidate IN candidates:
- Get feedback scores for candidate.episode_id
- avg_feedback = average(feedback_scores)
- IF filters.min_feedback_score AND avg_feedback < filters.min_feedback_score:
REMOVE candidate
RETURN candidates
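The metadata and feedback filters in steps 2b–2c amount to a single predicate evaluated per candidate. A minimal TypeScript sketch, with field names assumed from the pseudocode rather than the real schema:

```typescript
interface Candidate {
  role: string;
  canvas_id?: string;
  date: Date;
  outcome: "success" | "failure";
  avg_feedback: number; // assumed precomputed from feedback_scores
}

interface Filters {
  agent_role?: string;
  canvas_ids?: string[];
  date_range?: { start: Date; end: Date };
  success_only?: boolean;
  min_feedback_score?: number;
}

function passesFilters(c: Candidate, f: Filters): boolean {
  if (f.agent_role && c.role !== f.agent_role) return false;
  if (f.canvas_ids && (!c.canvas_id || !f.canvas_ids.includes(c.canvas_id))) return false;
  if (f.date_range && (c.date < f.date_range.start || c.date > f.date_range.end)) return false;
  if (f.success_only && c.outcome !== "success") return false;
  if (f.min_feedback_score !== undefined && c.avg_feedback < f.min_feedback_score) return false;
  return true;
}
```

In practice the role/canvas/date filters can often be pushed down into the LanceDB query itself; the post-filter form here mirrors the pseudocode.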
3. RELEVANCE SCORING
============================================================================
Calculate final relevance score for each candidate memory.
INPUT: candidates, query_vector
OUTPUT: scored_memories
STEPS:
FOR each candidate IN candidates:
a) Base semantic similarity
- semantic_score = cosine_similarity(query_vector, candidate.embedding)
- Range: 0-1 (1 = perfect match)
b) Recency boost
- days_since = (now - candidate.timestamp) / 86400
- IF days_since < 7:
- recency_boost = +0.1
- ELSE IF days_since < 30:
- recency_boost = +0.05
- ELSE:
- recency_boost = 0.0
c) Feedback adjustment
- avg_feedback = average(candidate.feedback_scores)
- IF avg_feedback > 0.5 (positive):
- feedback_adjustment = (avg_feedback - 0.5) * 0.4 # Max +0.2
- ELSE IF avg_feedback < 0.5 (negative):
- feedback_adjustment = -(0.5 - avg_feedback) * 0.6 # Max -0.3
- ELSE:
- feedback_adjustment = 0.0
d) Confidence weighting
- confidence_factor = candidate.confidence_score * 0.1
- Range: 0-0.1
e) Calculate final score
- final_score = (
semantic_score * 0.6 +
recency_boost +
feedback_adjustment +
confidence_factor
)
- Clamp to [0, 1] range
RETURN scored_memories
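The step-3 formula can be written out directly. This sketch assumes feedback scores are normalized to [0, 1] with 0.5 as neutral, matching the boost/penalty maxima in the pseudocode:

```typescript
function relevanceScore(
  semantic: number,    // cosine similarity, 0..1
  daysSince: number,   // age of the episode in days
  avgFeedback: number, // mean feedback, 0..1 (0.5 = neutral)
  confidence: number,  // agent's confidence score, 0..1
): number {
  // Recency boost: +0.1 under a week, +0.05 under a month.
  const recency = daysSince < 7 ? 0.1 : daysSince < 30 ? 0.05 : 0;

  // Feedback adjustment: max +0.2 boost, max -0.3 penalty.
  let feedback = 0;
  if (avgFeedback > 0.5) feedback = (avgFeedback - 0.5) * 0.4;
  else if (avgFeedback < 0.5) feedback = -(0.5 - avgFeedback) * 0.6;

  const score = semantic * 0.6 + recency + feedback + confidence * 0.1;
  return Math.min(1, Math.max(0, score)); // clamp to [0, 1]
}
```

Note the asymmetry: negative feedback is weighted 1.5x harder than positive feedback, so a badly-rated memory falls out of the top-K faster than a well-rated one climbs in.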
4. DIVERSITY PENALTY
============================================================================
Reduce redundancy by penalizing very similar memories.
INPUT: scored_memories
OUTPUT: diverse_memories
STEPS:
a) Sort by final_score descending
b) FOR each memory at index i:
FOR each previous_memory at index j < i:
- similarity = cosine_similarity(
memory.embedding,
previous_memory.embedding
)
- IF similarity > 0.9 (very similar):
- diversity_penalty = (similarity - 0.9) * 0.5
- memory.final_score -= diversity_penalty
c) Re-sort by adjusted final_score
RETURN diverse_memories
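A minimal sketch of the diversity pass. The `Memory` shape and the `cosineSimilarity` helper are simplified assumptions; only the penalty logic mirrors the pseudocode:

```typescript
interface Memory {
  embedding: number[];
  final_score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function applyDiversityPenalty(memories: Memory[]): Memory[] {
  const sorted = [...memories].sort((a, b) => b.final_score - a.final_score);
  for (let i = 1; i < sorted.length; i++) {
    for (let j = 0; j < i; j++) {
      const sim = cosineSimilarity(sorted[i].embedding, sorted[j].embedding);
      // Near-duplicates (> 0.9) lose up to 0.05 per higher-ranked twin.
      if (sim > 0.9) sorted[i].final_score -= (sim - 0.9) * 0.5;
    }
  }
  return sorted.sort((a, b) => b.final_score - a.final_score);
}
```

Because only lower-ranked duplicates are penalized, the best copy of a repeated experience always survives.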
5. RANKING & SELECTION
============================================================================
Select top-K most relevant and diverse memories.
INPUT: diverse_memories, limit
OUTPUT: top_k_experiences
STEPS:
a) Sort by final_score descending
b) Select top limit memories
c) Format for consumption
FOR each memory IN top_k:
experience = {
episode_id: memory.episode_id,
task_type: memory.task_type,
task_description: memory.task_description,
input_summary: memory.input_summary,
outcome: memory.outcome,
learnings: memory.learnings,
confidence: memory.confidence_score,
timestamp: memory.timestamp,
relevance_score: memory.final_score,
# Canvas context
canvas_id: memory.canvas_id,
canvas_name: memory.canvas_name,
canvas_actions: memory.canvas_action_ids,
# Feedback metadata
feedback_score: memory.avg_feedback,
feedback_count: memory.feedback_count
}
RETURN top_k_experiences
MAIN RETURN relevant_experiences
Experience Recording
ALGORITHM: Record Experience
INPUT: tenant_id, agent_id, experience_data
OUTPUT: episode_id
1. VALIDATE INPUT
- Required fields present
- Data types correct
- Tenant context valid
2. CREATE EPISODE
episode = {
id: generate_uuid(),
tenant_id: tenant_id,
agent_id: agent_id,
# Task information
task_type: experience_data.task_type,
task_description: experience_data.task_description,
input_summary: summarize(experience_data.input),
# Execution details
reasoning_chain: experience_data.reasoning_chain,
approach_taken: experience_data.approach,
actions_taken: experience_data.actions,
# Outcome
outcome: experience_data.outcome, # success/failure
success: experience_data.success,
error_message: experience_data.error,
# Confidence & learning
confidence_score: experience_data.confidence,
constitutional_violations: experience_data.violations || [],
human_intervention_required: experience_data.intervention,
# Learnings
learnings: experience_data.learnings,
metacognitive_insights: experience_data.metacognitive,
# Metadata
timestamp: now(),
agent_role: experience_data.agent_role,
maturity_level: experience_data.maturity_level,
# Canvas context
canvas_id: experience_data.canvas_id,
canvas_action_ids: experience_data.canvas_action_ids || []
}
3. STORE IN POSTGRESQL
INSERT INTO episodes VALUES (episode)
4. GENERATE EMBEDDING
# Create searchable text
searchable_text = """
{episode.task_type}
{episode.task_description}
{episode.input_summary}
{episode.outcome}
{episode.learnings}
""".strip()
# Generate embedding
embedding = embed(searchable_text)
5. INDEX IN LANCEDB
lance_record = {
episode_id: episode.id,
tenant_id: episode.tenant_id,
agent_id: episode.agent_id,
agent_role: episode.agent_role,
# Vector
embedding: embedding,
# Metadata
task_type: episode.task_type,
outcome: episode.outcome,
success: episode.success,
confidence: episode.confidence_score,
timestamp: episode.timestamp,
# Canvas context
canvas_id: episode.canvas_id,
canvas_action_ids: episode.canvas_action_ids
}
INSERT INTO lancedb TABLE episodes VALUES (lance_record)
6. RETURN episode_id
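Steps 4–5 hinge on the searchable text that gets embedded and indexed. A sketch of that construction, with the `Episode` shape trimmed to the fields actually used:

```typescript
interface Episode {
  id: string;
  task_type: string;
  task_description: string;
  input_summary: string;
  outcome: string;
  learnings: string[];
}

// Concatenate the fields that carry semantic signal into one string
// for the embedding model; joining learnings with "; " is an assumption.
function buildSearchableText(e: Episode): string {
  return [
    e.task_type,
    e.task_description,
    e.input_summary,
    e.outcome,
    e.learnings.join("; "),
  ].join("\n").trim();
}
```

Keeping the searchable text to a summary (rather than the full reasoning chain) keeps embeddings cheap and focused on what future queries will actually ask about.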
Feedback-Aware Recall
The World Model integrates Reinforcement Learning from Human Feedback (RLHF) to prioritize successful experiences:
ALGORITHM: Feedback-Aware Recall
INPUT: query, agent_role, min_feedback_score=0.5, limit=5
OUTPUT: highly_rated_experiences
1. STANDARD RECALL
- Execute standard recall algorithm
- Get top 20 candidates by semantic similarity
2. FILTER BY FEEDBACK
FOR each candidate:
- Fetch all feedback for candidate.episode_id
- Calculate average feedback score
- IF avg_feedback < min_feedback_score:
- REMOVE candidate
3. BOOST POSITIVE FEEDBACK
FOR each candidate with avg_feedback > 0.7:
- final_score += 0.2 # Significant boost
4. PENALIZE NEGATIVE FEEDBACK
FOR each candidate with avg_feedback < 0.3:
- final_score -= 0.3 # Significant penalty
5. RETURN TOP-K
- Sort by adjusted score
- Return top limit results
EXAMPLE USAGE:
# Recall only successful experiences with positive feedback
experiences = await world_model.recall_experiences(
query="Reconcile inventory discrepancies",
agent_role="Finance",
min_feedback_score=0.7, # Only highly-rated
limit=5
)
# Returns experiences that:
# - Semantically match inventory reconciliation
# - Have avg feedback >= 0.7
# - Are ranked by feedback-adjusted relevance
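The filter/boost/penalize steps above can be sketched as one ranking function. Field names follow the pseudocode and are assumptions, not the real schema:

```typescript
interface RatedCandidate {
  episode_id: string;
  final_score: number;
  avg_feedback: number;
}

function feedbackAwareRank(
  candidates: RatedCandidate[],
  minFeedback: number,
  limit: number,
): RatedCandidate[] {
  return candidates
    .filter((c) => c.avg_feedback >= minFeedback)
    .map((c) => ({
      ...c,
      final_score:
        c.final_score +
        (c.avg_feedback > 0.7 ? 0.2 : 0) - // boost strong positives
        (c.avg_feedback < 0.3 ? 0.3 : 0),  // penalize strong negatives
    }))
    .sort((a, b) => b.final_score - a.final_score)
    .slice(0, limit);
}
```

With a `min_feedback_score` of 0.3 or higher the penalty branch never fires, since penalized candidates are filtered out first.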
Canvas Context Integration
Episodes automatically capture Canvas workspace context for complete state awareness:
ALGORITHM: Canvas-Aware Episode Creation
INPUT: execution_data, canvas_context
OUTPUT: episode_with_canvas
1. EXTRACT CANVAS METADATA
canvas_metadata = {
canvas_id: canvas_context.id,
canvas_name: canvas_context.name,
canvas_snapshot: canvas_context.serialize(),
# Track all canvas actions
canvas_action_ids: [
action.id
FOR action IN canvas_context.actions_during_execution
]
}
2. CREATE EPISODE
episode = {
# ... standard fields ...
canvas_id: canvas_metadata.canvas_id,
canvas_action_ids: canvas_metadata.canvas_action_ids
}
3. STORE CANVAS AUDIT
FOR each action IN canvas_context.actions_during_execution:
canvas_audit = {
id: generate_uuid(),
canvas_id: canvas_context.id,
action_type: action.type,
component_id: action.component_id,
agent_id: execution_data.agent_id,
episode_id: episode.id,
timestamp: action.timestamp,
details: action.details
}
INSERT INTO canvas_audits VALUES (canvas_audit)
4. CREATE CANVAS-AWARE RECALL
# When recalling episodes, include full canvas context
experiences = await world_model.recall_experiences(
query="Create data visualization",
agent_role="Developer",
include_canvas_context=true
)
# Each experience includes:
# - Full canvas snapshot at time of execution
# - List of canvas actions taken
# - Component states and configurations
Data Structures
AgentExperience
```typescript
interface AgentExperience {
  // Identity
  episode_id: string;
  tenant_id: string;
  agent_id: string;
  agent_role: string;

  // Task
  task_type: string;
  task_description: string;
  input_summary: string;

  // Execution
  reasoning_chain: ReasoningChain;
  approach_taken: string;
  actions_taken: string[];

  // Outcome
  outcome: 'success' | 'failure';
  success: boolean;
  error_message?: string;

  // Learning
  confidence_score: number;
  constitutional_violations: string[];
  human_intervention_required: boolean;
  learnings: string[];
  metacognitive_insights: MetacognitiveInsights;

  // Metadata
  timestamp: Date;
  maturity_level: MaturityLevel;

  // Canvas Context
  canvas_id?: string;
  canvas_action_ids?: string[];
}
```
RecallOptions
```typescript
interface RecallOptions {
  // Search parameters
  limit?: number;          // Max results (default: 5)
  min_similarity?: number; // Min semantic score (default: 0.6)

  // Filters
  agent_role?: string;   // Filter by agent role
  canvas_ids?: string[]; // Filter by canvas
  date_range?: { start: Date; end: Date };
  success_only?: boolean; // Only successful outcomes

  // Feedback
  min_feedback_score?: number; // Min avg feedback (default: none)
  feedback_category?: string;  // Filter by feedback type

  // Context
  include_canvas_context?: boolean; // Include full canvas state
}
```
EpisodeFeedback
```typescript
interface EpisodeFeedback {
  id: string;
  episode_id: string;
  tenant_id: string;

  // Feedback data
  feedback_score: number;     // -1.0 to 1.0
  feedback_notes?: string;    // Optional detailed feedback
  feedback_category?: string; // accuracy, helpfulness, safety, etc.

  // Metadata
  created_by: string;
  created_at: Date;
}
```
Integration Points
Cognitive Architecture Integration
```typescript
// Cognitive Architecture recalls experiences for reasoning
const worldModel = new WorldModelService(db);
const experiences = await worldModel.recallExperiences(
  tenantId,
  'Finance',
  'Reconcile inventory discrepancies',
  5
);

// Experiences inform cognitive reasoning
const reasoning = await cognitive.reason(tenantId, agentId, task, {
  past_experiences: experiences
});
```
Learning Engine Integration
```typescript
// Learning Engine records experiences to World Model
await learning.recordExperience(tenantId, {
  agent_id: agentId,
  task_type: 'finance_reconciliation',
  task_description: 'Reconcile SKU-123',
  input_summary: 'Inventory discrepancy found',
  outcome: 'success',
  success: true,
  confidence: 0.9,
  learnings: ['Use weighted average for costing'],
  metacognitive_insights: metacognition
});
// World Model stores for future recall
```
Graduation Exam Integration
```typescript
// Graduation Exam queries episodes for readiness calculation
const episodes = await worldModel.getAgentEpisodes(
  agentId,
  { limit: 30 }
);

// Episodes used for:
// - Zero-intervention ratio calculation
// - Constitutional compliance tracking
// - Confidence score aggregation
// - Success rate calculation
```
Example Usage
Basic Recall
```typescript
import { WorldModelService } from '@/lib/ai/world-model';

const worldModel = new WorldModelService(db);

// Recall relevant experiences
const experiences = await worldModel.recallExperiences(
  tenantId,
  'Finance',
  'Reconcile inventory discrepancies',
  5
);

console.log('Relevant experiences:', experiences);
// Each experience includes:
// - Task description and outcome
// - Approach taken and learnings
// - Confidence score
// - Canvas context (if applicable)
// - Feedback score
```
Feedback-Aware Recall
```typescript
// Only recall highly-rated experiences
const positiveExperiences = await worldModel.recallExperiences(
  tenantId,
  'Finance',
  'Reconcile inventory',
  5,
  {
    min_feedback_score: 0.7 // Only positive feedback
  }
);
// Positive feedback gets +0.2 boost in relevance
```
Canvas-Context Recall
```typescript
// Recall experiences from specific canvas
const canvasExperiences = await worldModel.recallExperiences(
  tenantId,
  'Developer',
  'Create data visualization',
  5,
  {
    canvas_ids: ['canvas-123'],
    include_canvas_context: true
  }
);
// Each experience includes full canvas state
```
Record Experience
```typescript
// Record new experience after agent execution
const episodeId = await worldModel.recordExperience(tenantId, {
  agent_id: agentId,
  agent_role: 'Finance',
  task_type: 'reconciliation',
  task_description: 'Reconcile SKU-123 inventory',
  input_summary: 'Discrepancy between physical and system count',
  reasoning_chain: reasoning,
  approach_taken: 'Weighted average costing',
  actions_taken: ['Query ERP', 'Compare counts', 'Adjust records'],
  outcome: 'success',
  success: true,
  confidence: 0.9,
  constitutional_violations: [],
  human_intervention_required: false,
  learnings: [
    'Weighted average minimizes variance',
    'Physical count accuracy critical'
  ],
  metacognitive_insights: metacognition,
  canvas_id: 'canvas-456',
  canvas_action_ids: ['action-1', 'action-2']
});

console.log('Episode recorded:', episodeId);
```
Submit Feedback
```typescript
import { episodeFeedbackService } from '@/lib/ai/episodic-memory';

// Submit feedback for episode (RLHF)
const feedback = await episodeFeedbackService.submitFeedback(
  episodeId,
  0.8, // Strongly positive
  'Excellent reconciliation! Very accurate.',
  'accuracy'
);
// Future recalls prioritize this experience
```
Performance Characteristics
Storage
- PostgreSQL: Episode records (immediate write)
- LanceDB: Vector index (background indexing)
- Latency: < 100ms per episode
Recall
- Vector Search: sub-linear via LanceDB's approximate nearest-neighbor index
- Metadata Filtering: O(k) where k = candidates
- Total Latency: < 500ms for top-5 results
Scalability
- PostgreSQL: 10M+ episodes (with proper indexing)
- LanceDB: 100M+ vectors (with partitioning)
- Recall Performance: near-constant as the index grows (ANN search)
Configuration
```typescript
interface WorldModelConfig {
  // Embedding
  embedding_model: string;      // Default: text-embedding-3-small
  embedding_dimensions: number; // Default: 1536

  // Recall
  default_recall_limit: number;     // Default: 5
  min_similarity_threshold: number; // Default: 0.6

  // Scoring
  recency_boost_days: number;   // Default: 7
  recency_boost_amount: number; // Default: 0.1
  feedback_boost_max: number;   // Default: 0.2
  feedback_penalty_max: number; // Default: -0.3

  // Diversity
  diversity_penalty_threshold: number; // Default: 0.9
  diversity_penalty_factor: number;    // Default: 0.5

  // Storage
  lancedb_path: string;             // Default: ./data/lancedb
  postgres_connection_pool: number; // Default: 20
}
```
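For reference, the documented defaults expressed as a concrete config object. This is a sketch; the authoritative defaults live in the implementation:

```typescript
// All values below restate the "Default:" annotations from the config interface.
const defaultConfig = {
  embedding_model: "text-embedding-3-small",
  embedding_dimensions: 1536,
  default_recall_limit: 5,
  min_similarity_threshold: 0.6,
  recency_boost_days: 7,
  recency_boost_amount: 0.1,
  feedback_boost_max: 0.2,
  feedback_penalty_max: -0.3,
  diversity_penalty_threshold: 0.9,
  diversity_penalty_factor: 0.5,
  lancedb_path: "./data/lancedb",
  postgres_connection_pool: 20,
};
```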
Troubleshooting
Poor Recall Quality
Symptom: Retrieved experiences not relevant
Diagnosis:
- Check query clarity and specificity
- Verify embedding model working correctly
- Review similarity scores
Fix:
- Improve query specificity
- Adjust min_similarity_threshold
- Increase limit to get more candidates
No Experiences Returned
Symptom: Recall returns empty results
Diagnosis:
- Check if agent has any recorded episodes
- Verify filters not too restrictive
- Check feedback score threshold
Fix:
- Remove filters gradually
- Lower min_feedback_score
- Remove date range restrictions
Slow Recall Performance
Symptom: Recall takes > 1 second
Diagnosis:
- Check LanceDB index size
- Verify PostgreSQL query performance
- Review network latency
Fix:
- Increase LanceDB cache size
- Add PostgreSQL indexes
- Use read replica for queries
References
- Implementation: src/lib/ai/world-model.ts, backend-saas/core/agent_world_model.py
- Tests: src/lib/ai/__tests__/world-model.test.ts
- Related: Cognitive Architecture, Learning Engine, Graduation Exam
Last Updated: 2025-02-06
Version: 8.0
Status: Production Ready