Web Research Capabilities Investigation - Auto Dev Agents
**Investigation Date:** 2026-04-12
**Investigator:** Claude Code Analysis
**Focus:** Do autonomous development agents have access to web research capabilities?
---
Executive Summary
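**YES, partially** - Auto dev agents in the platform can reach web research capabilities through several integrated services: browser automation, web search APIs, a shared HTTP client, and MCP tool servers. This infrastructure includes governance controls, BYOK (Bring Your Own Key) support, and maturity-based access controls. However, the dedicated coding and deployment agents do not use it directly, and there is no research agent, research workflow orchestration, or research result storage (see Sections 5 and 6).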
---
1. Existing Web Research Infrastructure
1.1 Browser Automation (Playwright)
**Location:** /backend-saas/browser_engine/driver.py, /backend-saas/tools/browser_tool.py
**Capabilities:**
- Full Playwright browser automation (Chromium, Firefox, WebKit)
- Headless and headed modes
- Multi-session management with automatic cleanup
- Chrome DevTools Protocol (CDP) support
- Remote browser connection support
**Key Features:**
```python
# Browser session management (method outline only; bodies omitted)
class BrowserSessionManager:
    def create_session(self, user_id, agent_id, headless, browser_type): ...
    def close_session(self, session_id): ...
    def cleanup_expired_sessions(self): ...
```
**Governance Integration:**
- Requires INTERN+ maturity level for browser_navigate actions
- Full audit trail via browser_audit table
- Agent execution tracking for all browser sessions
- Security: User/agent validation before session access
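The session bookkeeping behind create_session, close_session, and cleanup_expired_sessions, together with the rule that a session is validated against the requesting user and agent, can be illustrated without a real browser. This is a hypothetical sketch, not the code in driver.py: `SessionRegistrySketch`, the idle-TTL expiry policy, and the injectable clock are assumptions, and no Playwright context is created.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionRecord:
    user_id: str
    agent_id: str
    last_used: float


@dataclass
class SessionRegistrySketch:
    """Hypothetical stand-in for BrowserSessionManager bookkeeping (no browser)."""
    ttl_seconds: float = 30 * 60  # assumed idle timeout
    clock: Callable[[], float] = time.monotonic
    sessions: dict[str, SessionRecord] = field(default_factory=dict)

    def create_session(self, user_id: str, agent_id: str) -> str:
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = SessionRecord(user_id, agent_id, self.clock())
        return session_id

    def get_session(self, session_id: str, user_id: str, agent_id: str) -> SessionRecord:
        # Validate ownership before granting access to the session.
        record = self.sessions.get(session_id)
        if record is None or (record.user_id, record.agent_id) != (user_id, agent_id):
            raise PermissionError("session not found or not owned by caller")
        record.last_used = self.clock()
        return record

    def close_session(self, session_id: str) -> None:
        self.sessions.pop(session_id, None)

    def cleanup_expired_sessions(self) -> int:
        # Remove sessions idle for longer than the TTL; return how many were removed.
        now = self.clock()
        expired = [sid for sid, r in self.sessions.items() if now - r.last_used > self.ttl_seconds]
        for sid in expired:
            self.close_session(sid)
        return len(expired)
```

Injecting the clock keeps expiry deterministic in tests. A real manager would presumably also close the underlying browser context whenever a session is dropped.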
**Available Operations:**
- browser_create_session() - Create isolated browser context
- browser_navigate() - Navigate to URLs
- browser_screenshot() - Capture screenshots (base64 or file)
- browser_fill_form() - Fill and submit forms
- browser_click() - Click elements
- browser_extract_text() - Extract page content
- browser_execute_script() - Execute JavaScript
- browser_get_page_info() - Get page metadata (title, URL, cookies)
---
1.2 Web Search APIs
1.2.1 Tavily Search (Primary)
**Location:** /backend-saas/integrations/internal_tools.py
**Tool:** web_search
**Capabilities:**
- Real-time web search via Tavily API
- BYOK support (tenant-specific keys)
- Fallback to platform-wide API key
- AI-optimized search results with answer synthesis
**Configuration:**
```python
@tool_registry.tool(
    name="web_search",
    description="Search the web for real-time information using Tavily API.",
    category="discovery",
)
async def web_search(query: str, workspace_id: str | None = None):
    ...  # body omitted; the API key is resolved in the BYOK priority order below
```
**BYOK Priority:**
- Tenant-specific Tavily key (from Workspace settings)
- Platform-wide TAVILY_API_KEY environment variable
- Returns an error if no key is configured
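The BYOK lookup order can be sketched as a small resolver. The function name `resolve_tavily_key`, the `tavily_api_key` settings field, and the exception type are hypothetical; the real lookup lives in internal_tools.py and may return an error payload rather than raise.

```python
import os
from typing import Mapping, Optional


class MissingSearchKeyError(RuntimeError):
    """Raised when neither a tenant key nor a platform key is configured."""


def resolve_tavily_key(
    workspace_settings: Optional[Mapping[str, str]],
    env: Mapping[str, str] = os.environ,
) -> tuple[str, str]:
    """Return (key, source), preferring the tenant key over the platform key."""
    # 1. Tenant-specific key from workspace settings (hypothetical field name)
    tenant_key = (workspace_settings or {}).get("tavily_api_key")
    if tenant_key:
        return tenant_key, "tenant"
    # 2. Platform-wide fallback from the environment
    platform_key = env.get("TAVILY_API_KEY")
    if platform_key:
        return platform_key, "platform"
    # 3. No key available: fail instead of calling the API
    raise MissingSearchKeyError("No Tavily API key configured")
```

Returning the source alongside the key makes it easy to log whether a tenant's own quota or the platform's was used.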
1.2.2 BrightData Search (Enterprise)
**Location:** /backend-saas/integrations/brightdata_service.py
**Tools:**
- brightdata_search - Geo-targeted web search
- brightdata_crawl - Large-scale web crawling
- brightdata_access - Direct site access
- brightdata_navigate - Multi-page navigation
**Capabilities:**
- Country-specific search results
- High-volume crawling
- BYOK support for BrightData API keys
- Enterprise-grade data collection
---
1.3 HTTP Client Infrastructure
**Location:** /backend-saas/core/http_client.py
**Capabilities:**
- Shared httpx async/sync clients
- HTTP/2 support
- Connection pooling
- Configurable timeouts and limits
- SSL verification
**Usage Pattern:**
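```python
from core.http_client import get_async_client


async def fetch_data(url: str):
    # Reuse the shared, pooled client instead of creating one per request
    client = get_async_client()
    response = await client.get(url)
    response.raise_for_status()  # fail fast on 4xx/5xx before parsing the body
    return response.json()
```
**Configuration:**
- HTTP_TIMEOUT - Default request timeout (30s)
- HTTP_MAX_CONNECTIONS - Max connections (100)
- HTTP_MAX_KEEPALIVE - Max keepalive connections (20)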
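A sketch of how the three environment variables and their documented defaults could be read into one settings object. `HttpSettings` and `load_http_settings` are hypothetical names, and HTTP_TIMEOUT is assumed to hold a number of seconds. With httpx, such values would typically feed `httpx.Timeout` and `httpx.Limits(max_connections=..., max_keepalive_connections=...)`; that step is left out so the sketch runs without httpx installed.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HttpSettings:
    timeout_seconds: float = 30.0  # HTTP_TIMEOUT
    max_connections: int = 100     # HTTP_MAX_CONNECTIONS
    max_keepalive: int = 20        # HTTP_MAX_KEEPALIVE


def load_http_settings(env: Mapping[str, str] = os.environ) -> HttpSettings:
    """Read overrides from the environment, falling back to the documented defaults."""
    defaults = HttpSettings()
    settings = HttpSettings(
        timeout_seconds=float(env.get("HTTP_TIMEOUT", defaults.timeout_seconds)),
        max_connections=int(env.get("HTTP_MAX_CONNECTIONS", defaults.max_connections)),
        max_keepalive=int(env.get("HTTP_MAX_KEEPALIVE", defaults.max_keepalive)),
    )
    # Keepalive connections are a subset of the pool, so a larger keepalive cap
    # can never be reached; this sketch treats it as a misconfiguration.
    if settings.max_keepalive > settings.max_connections:
        raise ValueError("HTTP_MAX_KEEPALIVE cannot exceed HTTP_MAX_CONNECTIONS")
    return settings
```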
---
1.4 Knowledge Query System
**Location:** /backend-saas/core/knowledge_query_endpoints.py
**Capabilities:**
- GraphRAG-based knowledge graph queries
- Local and global search modes
- LLM-synthesized answers
- Entity and relationship extraction
**API Endpoint:**
```
POST /api/knowledge/query
{
  "query": "What are the key features of our platform?",
  "user_id": "user-123",
  "workspace_id": "workspace-456"
}
```
**Integration:**
- LanceDB vector storage
- Semantic search over ingested documents
- Knowledge graph traversal
- Community summarization
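A client-side sketch of calling the knowledge query endpoint with only the standard library. The base URL is a placeholder, `build_knowledge_query` is a hypothetical helper, and response handling is omitted because the response schema is not documented here.

```python
import json
import urllib.request


def build_knowledge_query(base_url: str, query: str, user_id: str,
                          workspace_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /api/knowledge/query."""
    payload = {"query": query, "user_id": user_id, "workspace_id": workspace_id}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/knowledge/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage: send it with urllib.request.urlopen(req) once a server is reachable.
req = build_knowledge_query(
    "http://localhost:8000",  # placeholder base URL
    "What are the key features of our platform?",
    "user-123",
    "workspace-456",
)
```

Any authentication headers the endpoint requires are not shown.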
---
2. Agent Tool Access
2.1 Core Tools (Available to All Agents)
**Location:** /backend-saas/core/generic_agent.py
```python
CORE_TOOLS_NAMES = [
    "mcp_tool_search",             # Discover new tools
    "save_business_fact",          # Store learned information
    "verify_citation",             # Fact-checking
    "ingest_knowledge_from_text",  # Knowledge base ingestion
    "request_human_intervention",  # HITL support
    "get_system_health",           # System status
]
```
2.2 Tool Registry System
**Location:** /backend-saas/tools/registry.py
**Categories:**
- **canvas** - UI and visualization tools
- **browser** - Web automation tools
- **device** - Device control tools
- **discovery** - Search and research tools
- **automation** - Workflow triggers
- **finance** - Financial operations
- **ecommerce** - B2B procurement
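To show how decorator-based registration and category/maturity lookups can fit together, here is a minimal registry. It only mirrors method names used in this report (`tool`, `list_by_category`, `list_by_maturity`). The `ToolRegistrySketch` class, the `min_maturity` field, the `trigger_workflow` tool, and the INTERN < SUPERVISED < AUTONOMOUS ordering are assumptions, not the contents of registry.py.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed ordering of the maturity levels named in this report (lowest first).
MATURITY_ORDER = ["INTERN", "SUPERVISED", "AUTONOMOUS"]


@dataclass
class ToolSpec:
    name: str
    description: str
    category: str
    min_maturity: str
    func: Callable


class ToolRegistrySketch:
    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def tool(self, name: str, description: str, category: str, min_maturity: str = "INTERN"):
        """Decorator that records a function's metadata and returns it unchanged."""
        def decorator(func: Callable) -> Callable:
            self._tools[name] = ToolSpec(name, description, category, min_maturity, func)
            return func
        return decorator

    def list_by_category(self, category: str) -> list[str]:
        return sorted(t.name for t in self._tools.values() if t.category == category)

    def list_by_maturity(self, maturity: str) -> list[str]:
        """Names of tools an agent at `maturity` may use (its level or below)."""
        level = MATURITY_ORDER.index(maturity)
        return sorted(
            t.name for t in self._tools.values()
            if MATURITY_ORDER.index(t.min_maturity) <= level
        )


registry = ToolRegistrySketch()


@registry.tool(name="web_search", description="Search the web", category="discovery")
async def web_search(query: str) -> list[str]:
    return []  # placeholder body


@registry.tool(name="trigger_workflow", description="Start a workflow",  # hypothetical tool
               category="automation", min_maturity="SUPERVISED")
async def trigger_workflow(workflow_id: str) -> None:
    return None  # placeholder body
```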
**Tool Discovery:**
```python
# Agents can dynamically discover tools
tool_registry.discover_tools(tool_modules)
tool_registry.list_by_category("browser")
tool_registry.list_by_maturity("INTERN")
```
2.3 MCP (Model Context Protocol) Integration
**Location:** /backend-saas/core/mcp_service.py
**Pre-configured Servers:**
- google-search - Web search capabilities
- brightdata - Enterprise scraping
- local-tools - Internal tool registry
**Dynamic Tool Loading:**
```python
# Agents can request tools from MCP servers (inside an async function)
tools = await mcp_service.get_available_tools(
    agent_id=agent_id,
    workspace_id=workspace_id,
)
```
---
3. Package Whitelist (Network Access)
**Location:** /backend-saas/config/package_whitelist_default.yaml
Intern Level (Web Scraping Allowed)
```yaml
intern:
  - package: requests
    version: ">=2.28.0,<3.0.0"
    network_access: true
    description: "HTTP library for making web requests"
  - package: beautifulsoup4
    version: ">=4.12.0,<5.0.0"
    network_access: false
    description: "HTML/XML parsing library"
  - package: lxml
    version: ">=4.9.0"
    network_access: false
    description: "XML processing library"
  - package: html5lib
    version: ">=1.1"
    network_access: false
    description: "HTML parser"
```
**Key Point:** INTERN+ maturity agents can install and use web scraping packages. Only requests is marked network_access: true; the three parsers operate on content that has already been fetched.
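The whitelist separates packages that may be installed from packages that may also open network connections. A minimal sketch of that check, with the intern entries above transcribed into a dict; `check_package` and its decision strings are hypothetical, and version-range checking is left out (the third-party `packaging` library's `SpecifierSet` is one common way to add it).

```python
# Intern-tier entries transcribed from package_whitelist_default.yaml above.
INTERN_WHITELIST = {
    "requests": {"version": ">=2.28.0,<3.0.0", "network_access": True},
    "beautifulsoup4": {"version": ">=4.12.0,<5.0.0", "network_access": False},
    "lxml": {"version": ">=4.9.0", "network_access": False},
    "html5lib": {"version": ">=1.1", "network_access": False},
}


def check_package(name: str, needs_network: bool, whitelist: dict = INTERN_WHITELIST) -> str:
    """Return 'allow' or a denial reason (hypothetical decision format)."""
    entry = whitelist.get(name.lower())  # package names compared case-insensitively
    if entry is None:
        return "deny: not whitelisted"
    if needs_network and not entry["network_access"]:
        return "deny: network access not permitted for this package"
    return "allow"
```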
---
4. Auto Dev Capability Gates
**Location:** /backend-saas/core/auto_dev_capability_gate.py
Maturity Requirements
| Capability | Required Maturity | Description |
|---|---|---|
| auto_dev.memento_skills | INTERN | User approval required |
| auto_dev.alpha_evolver | SUPERVISED | Results queued for review |
| auto_dev.background_evolution | AUTONOMOUS | Background loops |
Daily Limits
| Capability | Default Limit | Configurable Via |
|---|---|---|
| Alpha evolver mutations | 10/day | Tenant settings |
| Memento skill candidates | 5/day | Tenant settings |
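The maturity table and the daily limits can be combined into a single check. Below is an in-memory sketch that assumes the ordering INTERN < SUPERVISED < AUTONOMOUS and counts usage per agent, capability, and day. `CapabilityGateSketch` is hypothetical: unlike the real gate's `can_use(agent_id, capability, tenant_id)`, it takes the agent's maturity as an argument instead of looking it up, and it does not persist usage. Capabilities without a documented limit are treated as unlimited here.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

MATURITY_ORDER = ["INTERN", "SUPERVISED", "AUTONOMOUS"]  # assumed ordering

REQUIRED_MATURITY = {
    "auto_dev.memento_skills": "INTERN",
    "auto_dev.alpha_evolver": "SUPERVISED",
    "auto_dev.background_evolution": "AUTONOMOUS",
}

# Documented defaults; tenant settings may override them.
DAILY_LIMITS = {"auto_dev.alpha_evolver": 10, "auto_dev.memento_skills": 5}


class CapabilityGateSketch:
    def __init__(self, tenant_limits: Optional[dict] = None) -> None:
        self.limits = {**DAILY_LIMITS, **(tenant_limits or {})}
        self.usage = defaultdict(int)  # (agent_id, capability, day) -> count

    def can_use(self, agent_maturity: str, agent_id: str, capability: str,
                today: Optional[date] = None) -> bool:
        required = REQUIRED_MATURITY[capability]
        if MATURITY_ORDER.index(agent_maturity) < MATURITY_ORDER.index(required):
            return False  # maturity gate
        limit = self.limits.get(capability)
        day = today or date.today()
        return limit is None or self.usage[(agent_id, capability, day)] < limit

    def record_usage(self, agent_id: str, capability: str,
                     today: Optional[date] = None) -> None:
        self.usage[(agent_id, capability, today or date.today())] += 1
```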
**Usage Check:**
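```python
capability = "auto_dev.alpha_evolver"
if auto_dev_gate.can_use(agent_id, capability, tenant_id):
    # Agent is allowed to run autonomous evolution
    ...  # run the evolution step, then record the outcome
    auto_dev_gate.record_usage(agent_id, capability, success=True)
```
---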
5. Coding & Deployment Agents
5.1 Coding Agent
**Location:** /backend-saas/core/agents/coding_agent.py
**Capabilities:**
- Code generation via LLM
- WorldModel integration (pattern recall)
- Episode tracking for learning
- Governance checks (SDLC enabled)
- Approval workflow for high-risk code
**No Direct Web Research:**
- Focuses on code generation from requirements
- Relies on WorldModel for patterns (not web)
- Could be extended to research APIs/libraries
5.2 Deployment Agent
**Location:** /backend-saas/core/agents/deployment_agent.py
**Capabilities:**
- Blue-green deployments
- Health monitoring
- Rollback operations
- Fly.io integration
**No Direct Web Research:**
- Focuses on deployment operations
- Uses DeploymentService for Fly.io API calls
- Could be extended to research deployment issues
---
6. Identified Gaps
6.1 Missing Capabilities
- **No Dedicated Research Agent**
- No agent specifically designed for web research
- Research is scattered across tools and services
- **Limited Research Workflow Orchestration**
- No multi-step research pipelines
- No automatic citation verification
- No source credibility scoring
- **No Research Result Storage**
- Web search results not cached
- No research history for agents
- No knowledge base integration for research findings
- **No API Documentation Discovery**
- Agents can't automatically discover API docs
- No integration with Swagger/OpenAPI specs
- Can't research library usage patterns
6.2 Security Considerations
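- **CORS Does Not Restrict Server-Side Requests**
- CORS is enforced by browsers, so the server-side HTTP client is not subject to it at all
- Playwright page navigation is not CORS-limited; only in-page cross-origin fetch/XHR calls are
- No unified outbound-request policy covers both paths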
- **Rate Limiting**
- No built-in rate limiting for web requests
- Depends on external API limits
- No request throttling
- **Data Validation**
- No validation of scraped data
- No XSS protection in browser tools
- No sanitization of search results
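For the data-validation gap, one standard-library approach is to reduce scraped HTML to plain text before it reaches an agent or a UI, dropping `<script>` and `<style>` contents and escaping the remaining text. This is an illustrative sketch built on `html.parser.HTMLParser`, not the platform's code; `sanitize_scraped_html` and its 10,000-character cap are hypothetical, and it is not a complete XSS defense on its own.

```python
import html
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def sanitize_scraped_html(raw_html: str, max_chars: int = 10_000) -> str:
    """Return whitespace-normalized, HTML-escaped text, truncated to max_chars."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any buffered trailing text
    text = " ".join(" ".join(parser.parts).split())
    return html.escape(text[:max_chars])
```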
---
7. Recommendations
7.1 Add Web Research to Auto Dev
**Priority: HIGH**
**Implementation:**
- **Create Research Agent**
- **Add Research Tools to Coding Agent**
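```python
async def research_api(self, api_name: str):
    # Proposed steps:
    # 1. Find the official docs
    # 2. Get authentication details
    # 3. Find rate limits
    raise NotImplementedError  # proposal only; not yet implemented
```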
- **Research Workflow Orchestration**
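A multi-step research workflow could run several sub-queries, de-duplicate sources, and keep each finding tied to its source URL so citations can be verified later. In this proposal-level sketch the search step is an injected async callable, so it could wrap web_search, BrightData, or a test stub. `SearchHit`, `ResearchFinding`, and `run_research` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class SearchHit:
    url: str
    snippet: str


@dataclass
class ResearchFinding:
    question: str
    sources: list[str]
    notes: list[str]


SearchFn = Callable[[str], Awaitable[list[SearchHit]]]


async def run_research(question: str, sub_queries: list[str], search: SearchFn,
                       max_sources: int = 5) -> ResearchFinding:
    """Run sub-queries concurrently, de-duplicate by URL, keep citations attached."""
    batches = await asyncio.gather(*(search(q) for q in sub_queries))
    seen: set[str] = set()
    sources: list[str] = []
    notes: list[str] = []
    for hits in batches:
        for hit in hits:
            if hit.url in seen or len(sources) >= max_sources:
                continue  # first snippet per URL wins
            seen.add(hit.url)
            sources.append(hit.url)
            notes.append(f"{hit.snippet} [{len(sources)}]")  # numbered citation
    return ResearchFinding(question, sources, notes)
```

Keeping the search function injectable is what lets the same pipeline be unit-tested with a stub and run in production against a real provider.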
7.2 Enhance Existing Infrastructure
**Priority: MEDIUM**
- **Add Research to Package Whitelist**
```yaml
- package: google-search-results
- package: wikipedia
```
- **Create Research Storage**
- **Add Citation Verification**
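Because web search results are not currently cached, research storage could start with a TTL cache keyed by workspace and normalized query, so repeated questions do not re-spend API quota and results never cross tenant boundaries. This in-memory sketch is hypothetical (`ResearchCache`, the one-hour TTL, and the normalization rule are assumptions); a persistent version could write to the LanceDB store mentioned in the roadmap.

```python
import time
from typing import Any, Callable, Optional


class ResearchCache:
    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[tuple[str, str], tuple[float, Any]] = {}

    @staticmethod
    def _key(workspace_id: str, query: str) -> tuple[str, str]:
        # Normalize case and whitespace so trivially different queries share an entry.
        return workspace_id, " ".join(query.lower().split())

    def get(self, workspace_id: str, query: str) -> Optional[Any]:
        key = self._key(workspace_id, query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]  # expired
            return None
        return value

    def put(self, workspace_id: str, query: str, value: Any) -> None:
        self._entries[self._key(workspace_id, query)] = (self.clock(), value)
```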
7.3 Governance & Safety
**Priority: HIGH**
- **Research Capability Gate**
- **Rate Limiting**
```python
async def check_limit(self, agent_id: str) -> bool:
    # Proposed: check Redis for the agent's rate limit
    # Return True if under the limit
    raise NotImplementedError  # proposal only; not yet implemented
```
- **Content Validation**
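The proposed check_limit relies on Redis, but the underlying logic is a sliding-window counter. This sketch keeps the same async signature and stores timestamps in memory, so it only works within a single process. `SlidingWindowLimiter`, the default of 30 requests per 60 seconds, and the injectable clock are assumptions. A shared Redis version would typically keep one sorted set per agent (ZADD, ZREMRANGEBYSCORE, ZCARD) so that all workers see the same window.

```python
import asyncio
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """In-memory stand-in for the proposed Redis-backed limiter (single process only)."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    async def check_limit(self, agent_id: str) -> bool:
        """Return True and record the request if the agent is under its limit."""
        # No await occurs between the check and the append, so coroutines on
        # the same event loop cannot interleave inside this method.
        now = self.clock()
        hits = self._hits[agent_id]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # timestamp has left the window
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2)
    print([asyncio.run(limiter.check_limit("agent-1")) for _ in range(3)])
```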
---
8. Implementation Roadmap
Phase 1: Foundation (1-2 weeks)
- [ ] Add web_search tool to all auto dev agents
- [ ] Create research storage in LanceDB
- [ ] Add research capability gates
- [ ] Implement rate limiting
Phase 2: Research Agent (2-3 weeks)
- [ ] Create dedicated ResearchAgent class
- [ ] Implement multi-step research workflows
- [ ] Add citation verification
- [ ] Create research result storage
Phase 3: Integration (1-2 weeks)
- [ ] Integrate research into CodingAgent
- [ ] Integrate research into DeploymentAgent
- [ ] Add research to WorldModel
- [ ] Create research UI components
Phase 4: Enhancement (2-3 weeks)
- [ ] Add API documentation discovery
- [ ] Implement source credibility scoring
- [ ] Add research history and analytics
- [ ] Create research templates
---
9. Testing Strategy
9.1 Unit Tests
- Test web search tool with mocked API
- Test browser automation in isolation
- Test rate limiting logic
- Test citation verification
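For the first unit-test item, `unittest.mock.AsyncMock` can stand in for the search client so that no real Tavily call is made. The function under test, `summarize_search`, and the client's `search` method and response shape are hypothetical placeholders for the tool's internals; the mocking pattern is the point.

```python
import asyncio
import unittest
from unittest.mock import AsyncMock


async def summarize_search(query: str, client) -> list[str]:
    """Hypothetical tool body: call the client and return result titles."""
    response = await client.search(query)
    return [item["title"] for item in response.get("results", [])]


class WebSearchToolTest(unittest.TestCase):
    def test_returns_titles_from_mocked_api(self):
        client = AsyncMock()
        client.search.return_value = {"results": [{"title": "Tavily docs"}]}
        titles = asyncio.run(summarize_search("tavily api", client))
        self.assertEqual(titles, ["Tavily docs"])
        client.search.assert_awaited_once_with("tavily api")

    def test_handles_empty_results(self):
        client = AsyncMock()
        client.search.return_value = {}
        self.assertEqual(asyncio.run(summarize_search("nothing", client)), [])


if __name__ == "__main__":
    unittest.main()
```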
9.2 Integration Tests
- Test research agent with real APIs
- Test multi-step research workflows
- Test storage and retrieval
- Test governance enforcement
9.3 E2E Tests
- Test coding agent researching libraries
- Test deployment agent researching issues
- Test research result accuracy
- Test rate limiting in production
---
10. Conclusion
**Auto dev agents have partial web research capabilities** through:
- Browser automation (Playwright)
- Web search APIs (Tavily, BrightData)
- HTTP client infrastructure
- Tool registry with dynamic discovery
**Key gaps:**
- No dedicated research agent
- No research workflow orchestration
- No research result storage
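- No automatic citation verification (the verify_citation core tool exists but is not applied automatically to research results)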
**Recommendation:**
Add web research as a first-class capability to auto dev agents, with proper governance, rate limiting, and content validation. This would significantly enhance agents' ability to:
- Research API documentation
- Find library usage examples
- Investigate deployment issues
- Learn new technologies autonomously
**Estimated Effort:** 6-10 weeks for full implementation
---
Appendix A: File References
Core Infrastructure
- /backend-saas/browser_engine/driver.py - Browser automation
- /backend-saas/tools/browser_tool.py - Browser tool functions
- /backend-saas/core/http_client.py - HTTP client
- /backend-saas/tools/registry.py - Tool registry
Web Search
- /backend-saas/integrations/internal_tools.py - Tavily search
- /backend-saas/integrations/brightdata_service.py - BrightData integration
- /backend-saas/core/mcp_service.py - MCP tool servers
Agents
- /backend-saas/core/generic_agent.py - Generic agent runtime
- /backend-saas/core/agents/coding_agent.py - Code generation
- /backend-saas/core/agents/deployment_agent.py - Deployment
Governance
- /backend-saas/core/auto_dev_capability_gate.py - Capability gates
- /backend-saas/config/package_whitelist_default.yaml - Package whitelist
- /backend-saas/core/agent_governance_service.py - Governance enforcement
Knowledge
- /backend-saas/core/knowledge_query_endpoints.py - Knowledge queries
- /backend-saas/core/agent_world_model.py - WorldModel service
---
**Document Version:** 1.0
**Last Updated:** 2026-04-12
**Status:** Complete Investigation