Tonic committed on
Commit 40aa8de · unverified · 2 Parent(s): 898cd37 3ab54ea

Merge pull request #1 from Josephrp/feature/iterative-deep-research-workflows

Files changed (50)
  1. .cursorrules +240 -0
  2. .env.example +3 -3
  3. .github/README.md +203 -0
  4. .github/workflows/ci.yml +47 -14
  5. .gitignore +3 -0
  6. .pre-commit-config.yaml +44 -1
  7. .pre-commit-hooks/run_pytest.ps1 +14 -0
  8. .pre-commit-hooks/run_pytest.sh +15 -0
  9. AGENTS.md +0 -118
  10. AGENTS.txt +236 -0
  11. CLAUDE.md +0 -111
  12. CONTRIBUTING.md +1 -0
  13. GEMINI.md +0 -98
  14. Makefile +9 -3
  15. README.md +98 -18
  16. docs/CONFIGURATION.md +301 -0
  17. docs/architecture/graph_orchestration.md +151 -0
  18. docs/examples/writer_agents_usage.md +425 -0
  19. docs/implementation/02_phase_search.md +31 -19
  20. examples/rate_limiting_demo.py +1 -1
  21. main.py +0 -6
  22. pyproject.toml +30 -1
  23. requirements.txt +2 -0
  24. src/agent_factory/agents.py +339 -0
  25. src/agent_factory/graph_builder.py +608 -0
  26. src/agent_factory/judges.py +21 -6
  27. src/agents/code_executor_agent.py +6 -8
  28. src/agents/input_parser.py +178 -0
  29. src/agents/judge_agent.py +1 -1
  30. src/agents/knowledge_gap.py +156 -0
  31. src/agents/long_writer.py +431 -0
  32. src/agents/magentic_agents.py +19 -26
  33. src/agents/proofreader.py +205 -0
  34. src/agents/retrieval_agent.py +8 -9
  35. src/agents/search_agent.py +1 -1
  36. src/agents/state.py +27 -5
  37. src/agents/thinking.py +148 -0
  38. src/agents/tool_selector.py +168 -0
  39. src/agents/writer.py +209 -0
  40. src/app.py +28 -18
  41. src/{orchestrator.py → legacy_orchestrator.py} +0 -0
  42. src/middleware/__init__.py +30 -1
  43. src/middleware/budget_tracker.py +390 -0
  44. src/middleware/state_machine.py +129 -0
  45. src/middleware/sub_iteration.py +1 -2
  46. src/middleware/workflow_manager.py +322 -0
  47. src/orchestrator/__init__.py +48 -0
  48. src/orchestrator/graph_orchestrator.py +974 -0
  49. src/orchestrator/planner_agent.py +184 -0
  50. src/orchestrator/research_flow.py +999 -0
.cursorrules ADDED
@@ -0,0 +1,240 @@
+ # DeepCritical Project - Cursor Rules
+
+ ## Project-Wide Rules
+
+ **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
+
+ **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Maintain `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
+
+ **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
+
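A minimal sketch of the executor pattern above, assuming a toy `score_texts` function as the CPU-bound stand-in (all names here are illustrative, not from the codebase):

```python
import asyncio


def score_texts(texts: list[str]) -> list[float]:
    # Hypothetical CPU-bound stand-in; real code might do embedding math here.
    return [len(t) / 100.0 for t in texts]


async def gather_scores(batches: list[list[str]]) -> list[list[float]]:
    loop = asyncio.get_running_loop()
    # Push each CPU-bound batch onto the default executor so the event
    # loop stays free, then await all batches in parallel.
    tasks = [loop.run_in_executor(None, score_texts, batch) for batch in batches]
    return list(await asyncio.gather(*tasks))


print(asyncio.run(gather_scores([["alpha"], ["beta", "gamma"]])))
```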
+ **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
+
+ **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
+
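A short sketch combining the two rules above. `SearchError` is the project exception named in the rule, but it is redeclared here as a stand-in; the `fetch_page` helper and its httpx call are illustrative assumptions:

```python
import httpx
import structlog

logger = structlog.get_logger()


class SearchError(Exception):
    """Stand-in for the real class in src/utils/exceptions.py."""


async def fetch_page(url: str) -> str:
    try:
        async with httpx.AsyncClient() as client:
            response = await client.get(url)
            response.raise_for_status()
            return response.text
    except httpx.HTTPError as e:
        # Structured log first, then chain so the original traceback survives.
        logger.error("Search request failed", url=url, error=str(e))
        raise SearchError(f"Failed to fetch {url}") from e
```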
+ **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
+
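An illustration of the frozen-model convention; this `Citation` is a trimmed stand-in, not the real model from `src/utils/models.py`:

```python
from pydantic import BaseModel, Field


class Citation(BaseModel):
    """Trimmed, illustrative stand-in for a model in src/utils/models.py."""

    model_config = {"frozen": True}

    title: str = Field(min_length=1, description="Title of the cited source")
    url: str = Field(description="Canonical URL of the source")
    relevance: float = Field(ge=0.0, le=1.0, description="Relevance score")


c = Citation(title="Example", url="https://example.org", relevance=0.9)
# c.relevance = 1.0  # raises: frozen models reject mutation
```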
+ **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
+
+ **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
+
+ **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
+
+ **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
+
+ **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
+
+ ---
+
+ ## src/agents/ - Agent Implementation Rules
+
+ **Pattern**: All agents use the Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
+
+ **Agent Structure** (a sketch of this shape follows the list):
+ - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
+ - Agent class with `__init__(model: Any | None = None)`
+ - Main method (e.g., `async def evaluate()`, `async def write_report()`)
+ - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
+
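A skeleton following the structure above. `GapSummary`, the prompt, and the model string are illustrative, and the exact `Agent` keyword names (`output_type` vs. `result_type`, `result.output` vs. `result.data`) vary across pydantic-ai versions, so treat this as a sketch rather than the project's actual API:

```python
from typing import Any

from pydantic import BaseModel
from pydantic_ai import Agent

SYSTEM_PROMPT = "You summarize open questions in research findings."  # illustrative


class GapSummary(BaseModel):
    # Hypothetical output type; real agents use models from src/utils/models.py.
    gaps: list[str]


class GapSummaryAgent:
    def __init__(self, model: Any | None = None) -> None:
        # Real code would call get_model() from src/agent_factory/judges.py
        # when no model is supplied.
        self._agent = Agent(
            model or "openai:gpt-4o",
            output_type=GapSummary,
            system_prompt=SYSTEM_PROMPT,
            retries=3,
        )

    async def evaluate(self, findings: str) -> GapSummary:
        result = await self._agent.run(findings)
        return result.output


def create_gap_summary_agent(model: Any | None = None) -> GapSummaryAgent:
    return GapSummaryAgent(model)
```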
+ **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
+
+ **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
+
+ **Input Validation**: Validate that queries/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
+
+ **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
+
+ **Agent-Specific Rules**:
+ - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
+ - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
+ - `writer.py`: Returns markdown string. Includes citations in numbered format.
+ - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
+ - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
+ - `thinking.py`: Returns observation string from conversation history.
+ - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
+
+ ---
+
+ ## src/tools/ - Search Tool Rules
+
+ **Protocol**: All tools implement the `SearchTool` protocol from `src/tools/base.py`: a `name` property and `async def search(query, max_results) -> list[Evidence]`.
+
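A minimal sketch of that protocol shape, with `Evidence` reduced to a placeholder class:

```python
from typing import Protocol


class Evidence:
    """Stand-in for the real Pydantic model in src/utils/models.py."""


class SearchTool(Protocol):
    """The protocol shape described above (src/tools/base.py)."""

    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...
```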
+ **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
+
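A sketch of the tenacity decorator from the rule above on an async fetch helper; the wait parameters and the stub body are assumptions:

```python
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
async def _fetch(url: str) -> bytes:
    # Illustrative body: a real tool awaits its shared limiter from
    # src/tools/rate_limiter.py here, then calls the upstream API.
    return b""
```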
+ **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return an empty list on non-critical errors (log a warning).
+
+ **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
+
+ **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
+
+ **Tool-Specific Rules** (a thread-pool sketch for `clinicaltrials.py` follows the list):
+ - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
+ - `clinicaltrials.py`: Use the `requests` library (NOT httpx - the WAF blocks httpx). Run in a thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: only interventional studies, active/completed.
+ - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
+ - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
+ - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
+
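The thread-pool pattern for `clinicaltrials.py`, sketched with an assumed endpoint and query parameter (the URL and params are illustrative, not taken from the tool's code):

```python
import asyncio

import requests


async def fetch_trials(query: str) -> dict:
    # ClinicalTrials.gov is queried with `requests` (the WAF blocks httpx),
    # so the blocking call is pushed onto a worker thread.
    response = await asyncio.to_thread(
        requests.get,
        "https://clinicaltrials.gov/api/v2/studies",  # illustrative endpoint
        params={"query.term": query},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```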
+ ---
+
+ ## src/middleware/ - Middleware Rules
+
+ **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
+
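A simplified sketch of that `ContextVar` wiring; the real `WorkflowState` in `src/middleware/state_machine.py` carries more fields and methods:

```python
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WorkflowState:
    # Simplified stand-in for src/middleware/state_machine.WorkflowState.
    evidence: list[Any] = field(default_factory=list)
    embedding_service: Any = None


_state: ContextVar[WorkflowState | None] = ContextVar("workflow_state", default=None)


def init_workflow_state(embedding_service: Any = None) -> WorkflowState:
    state = WorkflowState(embedding_service=embedding_service)
    _state.set(state)
    return state


def get_workflow_state() -> WorkflowState:
    # Auto-initialize when accessed before init, per the rule above.
    return _state.get() or init_workflow_state()
```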
+ **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
+
+ **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
+
+ **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
+
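The token-estimation heuristic named above is simple enough to show directly; this is a sketch of the stated ~4-chars-per-token rule, not the tracker's exact code:

```python
def estimate_tokens(text: str) -> int:
    # Heuristic from the rule above: roughly 4 characters per token.
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    return estimate_tokens(prompt) + estimate_tokens(response)


assert estimate_tokens("abcdefgh") == 2  # 8 chars ≈ 2 tokens
```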
+ **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
+
+ ---
+
+ ## src/orchestrator/ - Orchestration Rules
+
+ **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
+
+ **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
+
+ **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
+
+ **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
+
+ **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
+
+ **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
+
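A sketch of what that streaming contract looks like from the caller's side, with `AgentEvent` reduced to a minimal stand-in (the real model lives in `src/utils/models.py`):

```python
from collections.abc import AsyncGenerator

from pydantic import BaseModel


class AgentEvent(BaseModel):
    # Minimal stand-in; the real model carries richer payloads.
    type: str
    iteration: int = 0
    data: dict = {}


async def run(query: str) -> AsyncGenerator[AgentEvent, None]:
    yield AgentEvent(type="started", data={"query": query})
    for i in range(1, 3):  # illustrative two-iteration loop
        yield AgentEvent(type="search_complete", iteration=i)
        yield AgentEvent(type="judge_complete", iteration=i)
    yield AgentEvent(type="complete")
```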
+ ---
+
+ ## src/services/ - Service Rules
+
+ **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
+
+ **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
+
+ **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
+
+ **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
+
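The inline one-liner above, expanded into proper form (the `EmbeddingService` name is reused for illustration):

```python
from functools import lru_cache


class EmbeddingService:
    def __init__(self) -> None:
        # Heavy setup (model load, DB client) happens once, on first call.
        ...


@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    return EmbeddingService()


assert get_embedding_service() is get_embedding_service()  # same instance
```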
+ ---
+
+ ## src/utils/ - Utility Rules
+
+ **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation is needed. Use `Field()` with descriptions. Validate with constraints.
+
+ **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use the `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
+
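A trimmed sketch of that settings shape; the field set here is an assumption, only the `has_openai_key` property and `.env` loading come from the rule:

```python
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Trimmed stand-in for src/utils/config.Settings; loads from .env.
    model_config = {"env_file": ".env"}

    openai_api_key: str = ""

    @property
    def has_openai_key(self) -> bool:
        return bool(self.openai_api_key)


settings = Settings()  # module-level singleton, as in the rule above
```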
+ **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
+
+ **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
+
+ **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
+
+ ---
+
+ ## src/orchestrator_factory.py Rules
+
+ **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
+
+ **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
+
+ **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
+
+ **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
+
+ **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
+
+ ---
+
+ ## src/orchestrator_hierarchical.py Rules
+
+ **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to the SubIterationTeam protocol.
+
+ **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
+
+ **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
+
+ **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
+
+ **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
+
+ ---
+
+ ## src/orchestrator_magentic.py Rules
+
+ **Purpose**: Magentic-based orchestrator using the ChatAgent pattern. Each agent has an internal LLM. A manager orchestrates the agents.
+
+ **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
+
+ **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
+
+ **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
+
+ **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
+
+ **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and an OpenAI API key.
+
+ **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
+
+ ---
+
+ ## src/agent_factory/ - Factory Rules
+
+ **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
+
+ **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
+
+ **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
+
+ **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
+
+ **Error Handling**: Raise `ConfigurationError` if required API keys are missing. Log agent creation. Handle import errors gracefully.
+
+ ---
+
+ ## src/prompts/ - Prompt Rules
+
+ **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
+
+ **Judge Prompts**: In `judge.py`. Handle the empty evidence case separately. Always request structured JSON output.
+
+ **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
+
+ **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
+
+ ---
+
+ ## Testing Rules
+
+ **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
+
+ **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
+
+ **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
+
+ **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
+
+ ---
+
+ ## File-Specific Agent Rules
+
+ **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
+
+ **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
+
+ **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
+
+ **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
+
+ **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
+
+ **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
+
+ **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines the query.
.env.example CHANGED
@@ -7,9 +7,9 @@ LLM_PROVIDER=openai
  OPENAI_API_KEY=sk-your-key-here
  ANTHROPIC_API_KEY=sk-ant-your-key-here

- # Model names (optional - sensible defaults)
- ANTHROPIC_MODEL=claude-3-5-sonnet-20240620
- OPENAI_MODEL=gpt-4-turbo
+ # Model names (optional - sensible defaults set in config.py)
+ # ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
+ # OPENAI_MODEL=gpt-5.1

  # ============== EMBEDDINGS ==============
.github/README.md ADDED
@@ -0,0 +1,203 @@
+ ---
+ title: DeepCritical
+ emoji: 🧬
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: "6.0.1"
+ python_version: "3.11"
+ app_file: src/app.py
+ pinned: false
+ license: mit
+ tags:
+   - mcp-in-action-track-enterprise
+   - mcp-hackathon
+   - drug-repurposing
+   - biomedical-ai
+   - pydantic-ai
+   - llamaindex
+   - modal
+ ---
+
+ # DeepCritical
+
+ ## Intro
+
+ ## Features
+
+ - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
+ - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
+ - **Modal Sandbox**: Secure execution of AI-generated statistical code
+ - **LlamaIndex RAG**: Semantic search and evidence synthesis
+ - **HuggingFace Inference**:
+ - **HuggingFace MCP Custom Config To Use Community Tools**:
+ - **Strongly Typed Composable Graphs**:
+ - **Specialized Research Teams of Agents**:
+
+ ## Quick Start
+
+ ### 1. Environment Setup
+
+ ```bash
+ # Install uv if you haven't already
+ pip install uv
+
+ # Sync dependencies
+ uv sync
+ ```
+
+ ### 2. Run the UI
+
+ ```bash
+ # Start the Gradio app
+ uv run gradio run src/app.py
+ ```
+
+ Open your browser to `http://localhost:7860`.
+
+ ### 3. Connect via MCP
+
+ This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
+
+ **MCP Server URL**: `http://localhost:7860/gradio_api/mcp/`
+
+ **Claude Desktop Configuration**:
+ Add this to your `claude_desktop_config.json`:
+ ```json
+ {
+   "mcpServers": {
+     "deepcritical": {
+       "url": "http://localhost:7860/gradio_api/mcp/"
+     }
+   }
+ }
+ ```
+
+ **Available Tools**:
+ - `search_pubmed`: Search peer-reviewed biomedical literature.
+ - `search_clinical_trials`: Search ClinicalTrials.gov.
+ - `search_biorxiv`: Search bioRxiv/medRxiv preprints.
+ - `search_all`: Search all sources simultaneously.
+ - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
+
+ ## Deep Research Flows
+
+ - iterativeResearch
+ - deepResearch
+ - researchTeam
+
+ ### Iterative Research
+
+ ```mermaid
+ sequenceDiagram
+     participant IterativeFlow
+     participant ThinkingAgent
+     participant KnowledgeGapAgent
+     participant ToolSelector
+     participant ToolExecutor
+     participant JudgeHandler
+     participant WriterAgent
+
+     IterativeFlow->>IterativeFlow: run(query)
+
+     loop Until complete or max_iterations
+         IterativeFlow->>ThinkingAgent: generate_observations()
+         ThinkingAgent-->>IterativeFlow: observations
+
+         IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
+         KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
+
+         alt Research complete
+             IterativeFlow->>WriterAgent: create_final_report()
+             WriterAgent-->>IterativeFlow: final_report
+         else Gaps remain
+             IterativeFlow->>ToolSelector: select_agents(gap)
+             ToolSelector-->>IterativeFlow: AgentSelectionPlan
+
+             IterativeFlow->>ToolExecutor: execute_tool_tasks()
+             ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
+
+             IterativeFlow->>JudgeHandler: assess_evidence()
+             JudgeHandler-->>IterativeFlow: should_continue
+         end
+     end
+ ```
+
+ ### Deep Research
+
+ ```mermaid
+ sequenceDiagram
+     actor User
+     participant GraphOrchestrator
+     participant InputParser
+     participant GraphBuilder
+     participant GraphExecutor
+     participant Agent
+     participant BudgetTracker
+     participant WorkflowState
+
+     User->>GraphOrchestrator: run(query)
+     GraphOrchestrator->>InputParser: detect_research_mode(query)
+     InputParser-->>GraphOrchestrator: mode (iterative/deep)
+     GraphOrchestrator->>GraphBuilder: build_graph(mode)
+     GraphBuilder-->>GraphOrchestrator: ResearchGraph
+     GraphOrchestrator->>WorkflowState: init_workflow_state()
+     GraphOrchestrator->>BudgetTracker: create_budget()
+     GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
+
+     loop For each node in graph
+         GraphExecutor->>Agent: execute_node(agent_node)
+         Agent->>Agent: process_input
+         Agent-->>GraphExecutor: result
+         GraphExecutor->>WorkflowState: update_state(result)
+         GraphExecutor->>BudgetTracker: add_tokens(used)
+         GraphExecutor->>BudgetTracker: check_budget()
+         alt Budget exceeded
+             GraphExecutor->>GraphOrchestrator: emit(error_event)
+         else Continue
+             GraphExecutor->>GraphOrchestrator: emit(progress_event)
+         end
+     end
+
+     GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
+ ```
+
+ ### Research Team
+
+ Critical Deep Research Agent
+
+ ## Development
+
+ ### Run Tests
+
+ ```bash
+ uv run pytest
+ ```
+
+ ### Run Checks
+
+ ```bash
+ make check
+ ```
+
+ ## Architecture
+
+ DeepCritical uses a Vertical Slice Architecture:
+
+ 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
+ 2. **Judge Slice**: Evaluating evidence quality using LLMs.
+ 3. **Orchestrator Slice**: Managing the research loop and UI.
+
+ Built with:
+ - **PydanticAI**: For robust agent interactions.
+ - **Gradio**: For the streaming user interface.
+ - **PubMed, ClinicalTrials.gov, bioRxiv**: For biomedical data.
+ - **MCP**: For universal tool access.
+ - **Modal**: For secure code execution.
+
+ ## Team
+
+ - The-Obstacle-Is-The-Way
+ - MarioAderman
+ - Josephrp
+
+ ## Links
+
+ - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
.github/workflows/ci.yml CHANGED
@@ -2,33 +2,66 @@ name: CI

  on:
    push:
-     branches: [main, dev]
+     branches: [main, develop]
    pull_request:
-     branches: [main, dev]
+     branches: [main, develop]

  jobs:
-   check:
+   test:
      runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         python-version: ["3.11"]

      steps:
        - uses: actions/checkout@v4

-       - name: Install uv
-         uses: astral-sh/setup-uv@v4
+       - name: Set up Python ${{ matrix.python-version }}
+         uses: actions/setup-python@v5
          with:
-           version: "latest"
-
-       - name: Set up Python 3.11
-         run: uv python install 3.11
+           python-version: ${{ matrix.python-version }}

        - name: Install dependencies
-         run: uv sync --all-extras
+         run: |
+           python -m pip install --upgrade pip
+           pip install -e ".[dev]"

        - name: Lint with ruff
-         run: uv run ruff check src tests
+         run: |
+           ruff check . --exclude tests
+           ruff format --check . --exclude tests

        - name: Type check with mypy
-         run: uv run mypy src
-
-       - name: Run tests
-         run: uv run pytest tests/unit/ -v
+         run: |
+           mypy src
+
+       - name: Install embedding dependencies
+         run: |
+           pip install -e ".[embeddings]"
+
+       - name: Run unit tests (excluding OpenAI and embedding providers)
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
+
+       - name: Run local embeddings tests
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire || true
+         continue-on-error: true  # Allow failures if dependencies not available
+
+       - name: Run HuggingFace integration tests
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire || true
+         continue-on-error: true  # Allow failures if HF_TOKEN not set
+
+       - name: Run non-OpenAI integration tests (excluding embedding providers)
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           pytest tests/integration/ -v -m "integration and not openai and not embedding_provider" --tb=short -p no:logfire || true
+         continue-on-error: true  # Allow failures if dependencies not available
.gitignore CHANGED
@@ -1,3 +1,6 @@
+ folder/
+ .cursor/
+ .ruff_cache/
  # Python
  __pycache__/
  *.py[cod]
.pre-commit-config.yaml CHANGED
@@ -3,9 +3,10 @@ repos:
      rev: v0.4.4
      hooks:
        - id: ruff
-         args: [--fix]
+         args: [--fix, --exclude, tests]
          exclude: ^reference_repos/
        - id: ruff-format
+         args: [--exclude, tests]
          exclude: ^reference_repos/

  - repo: https://github.com/pre-commit/mirrors-mypy
@@ -13,9 +14,51 @@ repos:
      hooks:
        - id: mypy
          files: ^src/
+         exclude: ^folder
          additional_dependencies:
            - pydantic>=2.7
            - pydantic-settings>=2.2
            - tenacity>=8.2
            - pydantic-ai>=0.0.16
          args: [--ignore-missing-imports]
+
+ - repo: local
+   hooks:
+     - id: pytest-unit
+       name: pytest unit tests (no OpenAI)
+       entry: uv
+       language: system
+       types: [python]
+       args: [
+         "run",
+         "pytest",
+         "tests/unit/",
+         "-v",
+         "-m",
+         "not openai and not embedding_provider",
+         "--tb=short",
+         "-p",
+         "no:logfire",
+       ]
+       pass_filenames: false
+       always_run: true
+       require_serial: false
+     - id: pytest-local-embeddings
+       name: pytest local embeddings tests
+       entry: uv
+       language: system
+       types: [python]
+       args: [
+         "run",
+         "pytest",
+         "tests/",
+         "-v",
+         "-m",
+         "local_embeddings",
+         "--tb=short",
+         "-p",
+         "no:logfire",
+       ]
+       pass_filenames: false
+       always_run: true
+       require_serial: false
.pre-commit-hooks/run_pytest.ps1 ADDED
@@ -0,0 +1,14 @@
+ # PowerShell pytest runner for pre-commit (Windows)
+ # Uses uv if available, otherwise falls back to pytest
+
+ if (Get-Command uv -ErrorAction SilentlyContinue) {
+     uv run pytest $args
+ } else {
+     Write-Warning "uv not found, using system pytest (may have missing dependencies)"
+     pytest $args
+ }
.pre-commit-hooks/run_pytest.sh ADDED
@@ -0,0 +1,15 @@
+ #!/bin/bash
+ # Cross-platform pytest runner for pre-commit
+ # Uses uv if available, otherwise falls back to pytest
+
+ if command -v uv >/dev/null 2>&1; then
+     uv run pytest "$@"
+ else
+     echo "Warning: uv not found, using system pytest (may have missing dependencies)"
+     pytest "$@"
+ fi
AGENTS.md DELETED
@@ -1,118 +0,0 @@
- # AGENTS.md
-
- This file provides guidance to AI agents when working with code in this repository.
-
- ## Project Overview
-
- DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".
-
- **Current Status:** Phases 1-13 COMPLETE (Foundation through Modal sandbox integration).
-
- ## Development Commands
-
- ```bash
- # Install all dependencies (including dev)
- make install  # or: uv sync --all-extras && uv run pre-commit install
-
- # Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
- make check
-
- # Individual commands
- make test       # uv run pytest tests/unit/ -v
- make lint       # uv run ruff check src tests
- make format     # uv run ruff format src tests
- make typecheck  # uv run mypy src
- make test-cov   # uv run pytest --cov=src --cov-report=term-missing
-
- # Run single test
- uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
-
- # Integration tests (real APIs)
- uv run pytest -m integration
- ```
-
- ## Architecture
-
- **Pattern**: Search-and-judge loop with multi-tool orchestration.
-
- ```text
- User Question → Orchestrator
-
- Search Loop:
-   1. Query PubMed, ClinicalTrials.gov, bioRxiv
-   2. Gather evidence
-   3. Judge quality ("Do we have enough?")
-   4. If NO → Refine query, search more
-   5. If YES → Synthesize findings (+ optional Modal analysis)
-
- Research Report with Citations
- ```
-
- **Key Components**:
-
- - `src/orchestrator.py` - Main agent loop
- - `src/tools/pubmed.py` - PubMed E-utilities search
- - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- - `src/tools/biorxiv.py` - bioRxiv/medRxiv preprint search
- - `src/tools/code_execution.py` - Modal sandbox execution
- - `src/tools/search_handler.py` - Scatter-gather orchestration
- - `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
- - `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- - `src/agent_factory/judges.py` - LLM-based evidence assessment
- - `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- - `src/mcp_tools.py` - MCP tool wrappers for Claude Desktop
- - `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- - `src/utils/models.py` - Evidence, Citation, SearchResult models
- - `src/utils/exceptions.py` - Exception hierarchy
- - `src/app.py` - Gradio UI with MCP server (HuggingFace Spaces)
-
- **Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).
-
- ## Configuration
-
- Settings via pydantic-settings from `.env`:
-
- - `LLM_PROVIDER`: "openai" or "anthropic"
- - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- - `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- - `MAX_ITERATIONS`: 1-50, default 10
- - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
-
- ## Exception Hierarchy
-
- ```text
- DeepCriticalError (base)
- ├── SearchError
- │   └── RateLimitError
- ├── JudgeError
- └── ConfigurationError
- ```
-
- ## Testing
-
- - **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- - **Markers**: `unit`, `integration`, `slow`
- - **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- - **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
-
- ## Coding Standards
-
- - Python 3.11+, strict mypy, ruff (100-char lines)
- - Type all functions, use Pydantic models for data
- - Use `structlog` for logging, not print
- - Conventional commits: `feat(scope):`, `fix:`, `docs:`
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Remote `origin`: GitHub (source of truth for PRs/code review)
- - Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
-
- **HuggingFace Spaces Collaboration:**
-
- - Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
- - **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
- - GitHub is the source of truth; HuggingFace is for deployment/demo
- - Consider using git hooks to prevent accidental pushes to protected branches
AGENTS.txt ADDED
@@ -0,0 +1,236 @@
+ # DeepCritical Project - Rules
+
+ ## Project-Wide Rules
+
+ **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
+
+ **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Maintain `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
+
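The `TYPE_CHECKING` import guard from the rule above, expanded into a minimal sketch (the `warm_cache` helper is illustrative; the module path comes from the rule):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only during type checking, so the circular import
    # never happens at runtime.
    from src.services.embeddings import EmbeddingService


def warm_cache(service: EmbeddingService) -> None:
    # Illustrative consumer of the type-only import.
    ...
```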
+ **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
+
+ **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
+
+ **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
+
+ **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
+
+ **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
+
+ **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
+
+ **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
+
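A sketch of the `respx` mocking style named above. The route, payload, and async test marker are assumptions (the marker depends on which pytest async plugin the project uses):

```python
import httpx
import pytest
import respx


@respx.mock
@pytest.mark.asyncio
async def test_search_returns_results() -> None:
    # Route and payload are illustrative; real tests mock the actual tool URLs.
    respx.get("https://api.example.org/search").mock(
        return_value=httpx.Response(200, json={"hits": []})
    )
    async with httpx.AsyncClient() as client:
        response = await client.get("https://api.example.org/search")
    assert response.json() == {"hits": []}
```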
+ **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
+
+ **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
+
+ ---
+
+ ## src/agents/ - Agent Implementation Rules
+
+ **Pattern**: All agents use the Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
+
+ **Agent Structure**:
+ - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
+ - Agent class with `__init__(model: Any | None = None)`
+ - Main method (e.g., `async def evaluate()`, `async def write_report()`)
+ - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
+
+ **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
+
+ **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
+
+ **Input Validation**: Validate that queries/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
+
+ **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
+
+ **Agent-Specific Rules**:
+ - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
+ - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
+ - `writer.py`: Returns markdown string. Includes citations in numbered format.
+ - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
+ - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
+ - `thinking.py`: Returns observation string from conversation history.
+ - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
+
+ ---
+
+ ## src/tools/ - Search Tool Rules
+
+ **Protocol**: All tools implement the `SearchTool` protocol from `src/tools/base.py`: a `name` property and `async def search(query, max_results) -> list[Evidence]`.
+
+ **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
+
+ **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return an empty list on non-critical errors (log a warning).
+
+ **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
+
+ **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
+
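A sketch of that conversion step. The field names on `Evidence` and `Citation` are assumptions for illustration; the real models live in `src/utils/models.py`:

```python
from pydantic import BaseModel, Field


class Citation(BaseModel):
    # Field names are hypothetical stand-ins for the real model.
    title: str
    url: str


class Evidence(BaseModel):
    citation: Citation
    snippet: str
    relevance: float = Field(ge=0.0, le=1.0)


def to_evidence(item: dict) -> Evidence:
    # Tolerate missing fields rather than raising, per the rule above.
    return Evidence(
        citation=Citation(title=item.get("title", "Untitled"), url=item.get("url", "")),
        snippet=item.get("abstract", ""),
        relevance=float(item.get("score", 0.5)),
    )
```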
+ **Tool-Specific Rules**:
+ - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
+ - `clinicaltrials.py`: Use the `requests` library (NOT httpx - the WAF blocks httpx). Run in a thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: only interventional studies, active/completed.
+ - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
+ - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
+ - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
+
+ ---
+
+ ## src/middleware/ - Middleware Rules
+
+ **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
+
+ **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
+
+ **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
+
+ **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
+
+ **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
+
+ ---
+
+ ## src/orchestrator/ - Orchestration Rules
+
+ **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
+
+ **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
+
+ **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
+
+ **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
+
+ **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
+
+ **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
+
+ ---
+
+ ## src/services/ - Service Rules
+
+ **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
+
+ **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
+
+ **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
+
+ **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
+
+ ---
+
+ ## src/utils/ - Utility Rules
+
+ **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation is needed. Use `Field()` with descriptions. Validate with constraints.
+
+ **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use the `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
+
+ **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
+
+ **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
+
+ **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
+
+ ---
+
+ ## src/orchestrator_factory.py Rules
+
+ **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
+
+ **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
+
+ **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
+
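A sketch of that detection logic; the signature is an assumption (the real `_determine_mode()` in `src/orchestrator_factory.py` reads `settings` directly):

```python
def _determine_mode(explicit_mode: str | None, has_openai_key: bool) -> str:
    # "magentic" is an alias for "advanced", per the rule above.
    if explicit_mode == "magentic":
        return "advanced"
    if explicit_mode in ("simple", "advanced"):
        return explicit_mode
    # Auto-detect: advanced when an OpenAI key is available.
    return "advanced" if has_openai_key else "simple"
```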
+ **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
+
+ **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
+
+ ---
+
+ ## src/orchestrator_hierarchical.py Rules
+
+ **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to the SubIterationTeam protocol.
+
+ **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
+
+ **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
+
+ **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
+
+ **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
+
+ ---
+
+ ## src/orchestrator_magentic.py Rules
+
+ **Purpose**: Magentic-based orchestrator using the ChatAgent pattern. Each agent has an internal LLM. A manager orchestrates the agents.
+
+ **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
+
+ **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
+
+ **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
+
+ **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
+
+ **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and an OpenAI API key.
+
+ **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
+
+ ---
+
+ ## src/agent_factory/ - Factory Rules
+
+ **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
+
+ **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
+
+ **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
+
+ **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
+
+ **Error Handling**: Raise `ConfigurationError` if required API keys are missing. Log agent creation. Handle import errors gracefully.
+
+ ---
+
+ ## src/prompts/ - Prompt Rules
+
+ **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
+
+ **Judge Prompts**: In `judge.py`. Handle the empty evidence case separately. Always request structured JSON output.
+
+ **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
+
+ **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
+
+ ---
+
+ ## Testing Rules
+
+ **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
+
+ **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
+
+ **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
+
+ **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
+
+ ---
+
+ ## File-Specific Agent Rules
+
+ **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
+
+ **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
+
+ **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
+
+ **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
+
+ **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
+
+ **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
+
+ **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines the query.
CLAUDE.md DELETED
@@ -1,111 +0,0 @@
- # CLAUDE.md
-
- This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
- ## Project Overview
-
- DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".
-
- **Current Status:** Phases 1-13 COMPLETE (Foundation through Modal sandbox integration).
-
- ## Development Commands
-
- ```bash
- # Install all dependencies (including dev)
- make install  # or: uv sync --all-extras && uv run pre-commit install
-
- # Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
- make check
-
- # Individual commands
- make test       # uv run pytest tests/unit/ -v
- make lint       # uv run ruff check src tests
- make format     # uv run ruff format src tests
- make typecheck  # uv run mypy src
- make test-cov   # uv run pytest --cov=src --cov-report=term-missing
-
- # Run single test
- uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
-
- # Integration tests (real APIs)
- uv run pytest -m integration
- ```
-
- ## Architecture
-
- **Pattern**: Search-and-judge loop with multi-tool orchestration.
-
- ```text
- User Question → Orchestrator
-
- Search Loop:
-   1. Query PubMed, ClinicalTrials.gov, bioRxiv
-   2. Gather evidence
-   3. Judge quality ("Do we have enough?")
-   4. If NO → Refine query, search more
-   5. If YES → Synthesize findings (+ optional Modal analysis)
-
- Research Report with Citations
- ```
-
- **Key Components**:
-
- - `src/orchestrator.py` - Main agent loop
- - `src/tools/pubmed.py` - PubMed E-utilities search
- - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- - `src/tools/biorxiv.py` - bioRxiv/medRxiv preprint search
- - `src/tools/code_execution.py` - Modal sandbox execution
- - `src/tools/search_handler.py` - Scatter-gather orchestration
- - `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
- - `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- - `src/agent_factory/judges.py` - LLM-based evidence assessment
- - `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- - `src/mcp_tools.py` - MCP tool wrappers for Claude Desktop
- - `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- - `src/utils/models.py` - Evidence, Citation, SearchResult models
- - `src/utils/exceptions.py` - Exception hierarchy
- - `src/app.py` - Gradio UI with MCP server (HuggingFace Spaces)
-
- **Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).
-
- ## Configuration
-
- Settings via pydantic-settings from `.env`:
-
- - `LLM_PROVIDER`: "openai" or "anthropic"
- - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- - `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- - `MAX_ITERATIONS`: 1-50, default 10
- - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
-
- ## Exception Hierarchy
-
- ```text
- DeepCriticalError (base)
- ├── SearchError
- │   └── RateLimitError
- ├── JudgeError
- └── ConfigurationError
- ```
-
- ## Testing
-
- - **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- - **Markers**: `unit`, `integration`, `slow`
- - **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- - **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Remote `origin`: GitHub (source of truth for PRs/code review)
104
- - Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
105
-
106
- **HuggingFace Spaces Collaboration:**
107
-
108
- - Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
109
- - **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
110
- - GitHub is the source of truth; HuggingFace is for deployment/demo
111
- - Consider using git hooks to prevent accidental pushes to protected branches
CONTRIBUTING.md ADDED
@@ -0,0 +1 @@
 
 
1
+ Make sure you run the full pre-commit checks before opening a PR (not a draft); otherwise The-Obstacle-Is-The-Way will lose his mind.
GEMINI.md DELETED
@@ -1,98 +0,0 @@
1
- # DeepCritical Context
2
-
3
- ## Project Overview
4
-
5
- **DeepCritical** is an AI-native Medical Drug Repurposing Research Agent.
6
- **Goal:** To accelerate the discovery of new uses for existing drugs by intelligently searching biomedical literature (PubMed, ClinicalTrials.gov, bioRxiv), evaluating evidence, and hypothesizing potential applications.
7
-
8
- **Architecture:**
9
- The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orchestrator) and adheres to **Strict TDD** (Test-Driven Development).
10
-
11
- **Current Status:**
12
-
13
- - **Phases 1-9:** COMPLETE. Foundation, Search, Judge, UI, Orchestrator, Embeddings, Hypothesis, Report, Cleanup.
14
- - **Phases 10-11:** COMPLETE. ClinicalTrials.gov and bioRxiv integration.
15
- - **Phase 12:** COMPLETE. MCP Server integration (Gradio MCP at `/gradio_api/mcp/`).
16
- - **Phase 13:** COMPLETE. Modal sandbox for statistical analysis.
17
-
18
- ## Tech Stack & Tooling
19
-
20
- - **Language:** Python 3.11 (Pinned)
21
- - **Package Manager:** `uv` (Rust-based, extremely fast)
22
- - **Frameworks:** `pydantic`, `pydantic-ai`, `httpx`, `gradio[mcp]`
23
- - **Vector DB:** `chromadb` with `sentence-transformers` for semantic search
24
- - **Code Execution:** `modal` for secure sandboxed Python execution
25
- - **Testing:** `pytest`, `pytest-asyncio`, `respx` (for mocking)
26
- - **Quality:** `ruff` (linting/formatting), `mypy` (strict type checking), `pre-commit`
27
-
28
- ## Building & Running
29
-
30
- | Command | Description |
31
- | :--- | :--- |
32
- | `make install` | Install dependencies and pre-commit hooks. |
33
- | `make test` | Run unit tests. |
34
- | `make lint` | Run Ruff linter. |
35
- | `make format` | Run Ruff formatter. |
36
- | `make typecheck` | Run Mypy static type checker. |
37
- | `make check` | **The Golden Gate:** Runs lint, typecheck, and test. Must pass before committing. |
38
- | `make clean` | Clean up cache and artifacts. |
39
-
40
- ## Directory Structure
41
-
42
- - `src/`: Source code
43
- - `utils/`: Shared utilities (`config.py`, `exceptions.py`, `models.py`)
44
- - `tools/`: Search tools (`pubmed.py`, `clinicaltrials.py`, `biorxiv.py`, `code_execution.py`)
45
- - `services/`: Services (`embeddings.py`, `statistical_analyzer.py`)
46
- - `agents/`: Magentic multi-agent mode agents
47
- - `agent_factory/`: Agent definitions (judges, prompts)
48
- - `mcp_tools.py`: MCP tool wrappers for Claude Desktop integration
49
- - `app.py`: Gradio UI with MCP server
50
- - `tests/`: Test suite
51
- - `unit/`: Isolated unit tests (Mocked)
52
- - `integration/`: Real API tests (Marked as slow/integration)
53
- - `docs/`: Documentation and Implementation Specs
54
- - `examples/`: Working demos for each phase
55
-
56
- ## Key Components
57
-
58
- - `src/orchestrator.py` - Main agent loop
59
- - `src/tools/pubmed.py` - PubMed E-utilities search
60
- - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
61
- - `src/tools/biorxiv.py` - bioRxiv/medRxiv preprint search
62
- - `src/tools/code_execution.py` - Modal sandbox execution
63
- - `src/services/statistical_analyzer.py` - Statistical analysis via Modal
64
- - `src/mcp_tools.py` - MCP tool wrappers
65
- - `src/app.py` - Gradio UI (HuggingFace Spaces) with MCP server
66
-
67
- ## Configuration
68
-
69
- Settings via pydantic-settings from `.env`:
70
-
71
- - `LLM_PROVIDER`: "openai" or "anthropic"
72
- - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
73
- - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
74
- - `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
75
- - `MAX_ITERATIONS`: 1-50, default 10
76
- - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
77
-
78
- ## Development Conventions
79
-
80
- 1. **Strict TDD:** Write failing tests in `tests/unit/` *before* implementing logic in `src/`.
81
- 2. **Type Safety:** All code must pass `mypy --strict`. Use Pydantic models for data exchange.
82
- 3. **Linting:** Zero tolerance for Ruff errors.
83
- 4. **Mocking:** Use `respx` or `unittest.mock` for all external API calls in unit tests.
84
- 5. **Vertical Slices:** Implement features end-to-end rather than layer-by-layer.
85
-
86
- ## Git Workflow
87
-
88
- - `main`: Production-ready (GitHub)
89
- - `dev`: Development integration (GitHub)
90
- - Remote `origin`: GitHub (source of truth for PRs/code review)
91
- - Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
92
-
93
- **HuggingFace Spaces Collaboration:**
94
-
95
- - Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
96
- - **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
97
- - GitHub is the source of truth; HuggingFace is for deployment/demo
98
- - Consider using git hooks to prevent accidental pushes to protected branches
Makefile CHANGED
@@ -8,15 +8,21 @@ install:
8
  uv run pre-commit install
9
 
10
  test:
11
- uv run pytest tests/unit/ -v
 
 
 
 
 
 
12
 
13
  # Coverage aliases
14
  cov: test-cov
15
  test-cov:
16
- uv run pytest --cov=src --cov-report=term-missing
17
 
18
  cov-html:
19
- uv run pytest --cov=src --cov-report=html
20
  @echo "Coverage report: open htmlcov/index.html"
21
 
22
  lint:
 
8
  uv run pre-commit install
9
 
10
  test:
11
+ uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
12
+
13
+ test-hf:
14
+ uv run pytest tests/ -v -m "huggingface" -p no:logfire
15
+
16
+ test-all:
17
+ uv run pytest tests/ -v -p no:logfire
18
 
19
  # Coverage aliases
20
  cov: test-cov
21
  test-cov:
22
+ uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire
23
 
24
  cov-html:
25
+ uv run pytest --cov=src --cov-report=html -p no:logfire
26
  @echo "Coverage report: open htmlcov/index.html"
27
 
28
  lint:
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
21
 
22
  # DeepCritical
23
 
24
- AI-Powered Drug Repurposing Research Agent
25
 
26
  ## Features
27
 
@@ -29,6 +29,10 @@ AI-Powered Drug Repurposing Research Agent
29
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
30
  - **Modal Sandbox**: Secure execution of AI-generated statistical code
31
  - **LlamaIndex RAG**: Semantic search and evidence synthesis
 
 
 
 
32
 
33
  ## Quick Start
34
 
@@ -46,7 +50,7 @@ uv sync
46
 
47
  ```bash
48
  # Start the Gradio app
49
- uv run python src/app.py
50
  ```
51
 
52
  Open your browser to `http://localhost:7860`.
@@ -76,6 +80,97 @@ Add this to your `claude_desktop_config.json`:
76
  - `search_all`: Search all sources simultaneously.
77
  - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
78
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
79
  ## Development
80
 
81
  ### Run Tests
@@ -90,22 +185,7 @@ uv run pytest
90
  make check
91
  ```
92
 
93
- ## Architecture
94
-
95
- DeepCritical uses a Vertical Slice Architecture:
96
-
97
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
98
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
99
- 3. **Orchestrator Slice**: Managing the research loop and UI.
100
-
101
- Built with:
102
- - **PydanticAI**: For robust agent interactions.
103
- - **Gradio**: For the streaming user interface.
104
- - **PubMed, ClinicalTrials.gov, bioRxiv**: For biomedical data.
105
- - **MCP**: For universal tool access.
106
- - **Modal**: For secure code execution.
107
-
108
- ## Team
109
 
110
  - The-Obstacle-Is-The-Way
111
  - MarioAderman
 
21
 
22
  # DeepCritical
23
 
24
+ ## Intro
+
+ DeepCritical is an AI-native drug repurposing research agent with iterative and deep research workflows.
25
 
26
  ## Features
27
 
 
29
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
30
  - **Modal Sandbox**: Secure execution of AI-generated statistical code
31
  - **LlamaIndex RAG**: Semantic search and evidence synthesis
32
+ - **HuggingFace Inference**: LLM and embedding support via the HuggingFace Inference API
+ - **HuggingFace MCP**: custom config for using community MCP tools
+ - **Strongly Typed Composable Graphs**: graph-based orchestration with typed nodes and edges
+ - **Specialized Research Teams of Agents**: iterative and deep research flows run by dedicated agent teams
36
 
37
  ## Quick Start
38
 
 
50
 
51
  ```bash
52
  # Start the Gradio app
53
+ uv run gradio src/app.py
54
  ```
55
 
56
  Open your browser to `http://localhost:7860`.
 
80
  - `search_all`: Search all sources simultaneously.
81
  - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
82
 
83
+
84
+
85
+ ## Architecture
86
+
87
+ DeepCritical uses a Vertical Slice Architecture:
88
+
89
+ 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
90
+ 2. **Judge Slice**: Evaluating evidence quality using LLMs.
91
+ 3. **Orchestrator Slice**: Managing the research loop and UI.
92
+
93
+ Three research modes are supported:
+
+ - iterativeResearch
+ - deepResearch
+ - researchTeam
96
+
97
+ ### Iterative Research
98
+
99
+ ```mermaid
+ sequenceDiagram
100
+ participant IterativeFlow
101
+ participant ThinkingAgent
102
+ participant KnowledgeGapAgent
103
+ participant ToolSelector
104
+ participant ToolExecutor
105
+ participant JudgeHandler
106
+ participant WriterAgent
107
+
108
+ IterativeFlow->>IterativeFlow: run(query)
109
+
110
+ loop Until complete or max_iterations
111
+ IterativeFlow->>ThinkingAgent: generate_observations()
112
+ ThinkingAgent-->>IterativeFlow: observations
113
+
114
+ IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
115
+ KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
116
+
117
+ alt Research complete
118
+ IterativeFlow->>WriterAgent: create_final_report()
119
+ WriterAgent-->>IterativeFlow: final_report
120
+ else Gaps remain
121
+ IterativeFlow->>ToolSelector: select_agents(gap)
122
+ ToolSelector-->>IterativeFlow: AgentSelectionPlan
123
+
124
+ IterativeFlow->>ToolExecutor: execute_tool_tasks()
125
+ ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
126
+
127
+ IterativeFlow->>JudgeHandler: assess_evidence()
128
+ JudgeHandler-->>IterativeFlow: should_continue
129
+ end
130
+ end
+ ```
131
+
132
+
133
+ ### Deep Research
134
+
135
+ ```mermaid
+ sequenceDiagram
136
+ actor User
137
+ participant GraphOrchestrator
138
+ participant InputParser
139
+ participant GraphBuilder
140
+ participant GraphExecutor
141
+ participant Agent
142
+ participant BudgetTracker
143
+ participant WorkflowState
144
+
145
+ User->>GraphOrchestrator: run(query)
146
+ GraphOrchestrator->>InputParser: detect_research_mode(query)
147
+ InputParser-->>GraphOrchestrator: mode (iterative/deep)
148
+ GraphOrchestrator->>GraphBuilder: build_graph(mode)
149
+ GraphBuilder-->>GraphOrchestrator: ResearchGraph
150
+ GraphOrchestrator->>WorkflowState: init_workflow_state()
151
+ GraphOrchestrator->>BudgetTracker: create_budget()
152
+ GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
153
+
154
+ loop For each node in graph
155
+ GraphExecutor->>Agent: execute_node(agent_node)
156
+ Agent->>Agent: process_input
157
+ Agent-->>GraphExecutor: result
158
+ GraphExecutor->>WorkflowState: update_state(result)
159
+ GraphExecutor->>BudgetTracker: add_tokens(used)
160
+ GraphExecutor->>BudgetTracker: check_budget()
161
+ alt Budget exceeded
162
+ GraphExecutor->>GraphOrchestrator: emit(error_event)
163
+ else Continue
164
+ GraphExecutor->>GraphOrchestrator: emit(progress_event)
165
+ end
166
+ end
167
+
168
+ GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
+ ```
169
+
170
+ ### Research Team
171
+
172
+ The research team mode composes the flows above (planner, parallel research loops, judges, and writer agents) into a critical deep research agent.
173
+
174
  ## Development
175
 
176
  ### Run Tests
 
185
  make check
186
  ```
187
 
188
+ ## Join Us
189
 
190
  - The-Obstacle-Is-The-Way
191
  - MarioAderman
docs/CONFIGURATION.md ADDED
@@ -0,0 +1,301 @@
1
+ # Configuration Guide
2
+
3
+ ## Overview
4
+
5
+ DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
6
+
7
+ ## Quick Start
8
+
9
+ 1. Copy `.env.example` to `.env` in the project root (or create a new `.env` file)
10
+ 2. Set at least one LLM API key (`OPENAI_API_KEY` or `ANTHROPIC_API_KEY`)
11
+ 3. Optionally configure other services as needed
12
+
13
+ ## Configuration System
14
+
15
+ ### How It Works
16
+
17
+ - **Settings Class**: `Settings` class in `src/utils/config.py` extends `BaseSettings` from `pydantic_settings`
18
+ - **Environment File**: Automatically loads from `.env` file (if present)
19
+ - **Environment Variables**: Reads from environment variables (case-insensitive)
20
+ - **Type Safety**: Strongly-typed fields with validation
21
+ - **Singleton Pattern**: Global `settings` instance for easy access
22
+
23
+ ### Usage
24
+
25
+ ```python
26
+ from src.utils.config import settings
27
+
28
+ # Check if API keys are available
29
+ if settings.has_openai_key:
30
+ # Use OpenAI
31
+ pass
32
+
33
+ # Access configuration values
34
+ max_iterations = settings.max_iterations
35
+ web_search_provider = settings.web_search_provider
36
+ ```
37
+
38
+ ## Required Configuration
39
+
40
+ ### At Least One LLM Provider
41
+
42
+ You must configure at least one LLM provider:
43
+
44
+ **OpenAI:**
45
+ ```bash
46
+ LLM_PROVIDER=openai
47
+ OPENAI_API_KEY=your_openai_api_key_here
48
+ OPENAI_MODEL=gpt-5.1
49
+ ```
50
+
51
+ **Anthropic:**
52
+ ```bash
53
+ LLM_PROVIDER=anthropic
54
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
55
+ ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
56
+ ```
57
+
58
+ ## Optional Configuration
59
+
60
+ ### Embedding Configuration
61
+
62
+ ```bash
63
+ # Embedding Provider: "openai", "local", or "huggingface"
64
+ EMBEDDING_PROVIDER=local
65
+
66
+ # OpenAI Embedding Model (used by LlamaIndex RAG)
67
+ OPENAI_EMBEDDING_MODEL=text-embedding-3-small
68
+
69
+ # Local Embedding Model (sentence-transformers)
70
+ LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
71
+
72
+ # HuggingFace Embedding Model
73
+ HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
74
+ ```
75
+
76
+ ### HuggingFace Configuration
77
+
78
+ ```bash
79
+ # HuggingFace API Token (for inference API)
80
+ HUGGINGFACE_API_KEY=your_huggingface_api_key_here
81
+ # Or use HF_TOKEN (alternative name)
82
+
83
+ # Default HuggingFace Model ID
84
+ HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
85
+ ```
86
+
87
+ ### Web Search Configuration
88
+
89
+ ```bash
90
+ # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
91
+ # Default: "duckduckgo" (no API key required)
92
+ WEB_SEARCH_PROVIDER=duckduckgo
93
+
94
+ # Serper API Key (for Google search via Serper)
95
+ SERPER_API_KEY=your_serper_api_key_here
96
+
97
+ # SearchXNG Host URL
98
+ SEARCHXNG_HOST=http://localhost:8080
99
+
100
+ # Brave Search API Key
101
+ BRAVE_API_KEY=your_brave_api_key_here
102
+
103
+ # Tavily API Key
104
+ TAVILY_API_KEY=your_tavily_api_key_here
105
+ ```
106
+
107
+ ### PubMed Configuration
108
+
109
+ ```bash
110
+ # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
111
+ NCBI_API_KEY=your_ncbi_api_key_here
112
+ ```
113
+
114
+ ### Agent Configuration
115
+
116
+ ```bash
117
+ # Maximum iterations per research loop
118
+ MAX_ITERATIONS=10
119
+
120
+ # Search timeout in seconds
121
+ SEARCH_TIMEOUT=30
122
+
123
+ # Use graph-based execution for research flows
124
+ USE_GRAPH_EXECUTION=false
125
+ ```
126
+
127
+ ### Budget & Rate Limiting Configuration
128
+
129
+ ```bash
130
+ # Default token budget per research loop
131
+ DEFAULT_TOKEN_LIMIT=100000
132
+
133
+ # Default time limit per research loop (minutes)
134
+ DEFAULT_TIME_LIMIT_MINUTES=10
135
+
136
+ # Default iterations limit per research loop
137
+ DEFAULT_ITERATIONS_LIMIT=10
138
+ ```
139
+
140
+ ### RAG Service Configuration
141
+
142
+ ```bash
143
+ # ChromaDB collection name for RAG
144
+ RAG_COLLECTION_NAME=deepcritical_evidence
145
+
146
+ # Number of top results to retrieve from RAG
147
+ RAG_SIMILARITY_TOP_K=5
148
+
149
+ # Automatically ingest evidence into RAG
150
+ RAG_AUTO_INGEST=true
151
+ ```
152
+
153
+ ### ChromaDB Configuration
154
+
155
+ ```bash
156
+ # ChromaDB storage path
157
+ CHROMA_DB_PATH=./chroma_db
158
+
159
+ # Whether to persist ChromaDB to disk
160
+ CHROMA_DB_PERSIST=true
161
+
162
+ # ChromaDB server host (for remote ChromaDB, optional)
163
+ # CHROMA_DB_HOST=localhost
164
+
165
+ # ChromaDB server port (for remote ChromaDB, optional)
166
+ # CHROMA_DB_PORT=8000
167
+ ```
168
+
169
+ ### External Services
170
+
171
+ ```bash
172
+ # Modal Token ID (for Modal sandbox execution)
173
+ MODAL_TOKEN_ID=your_modal_token_id_here
174
+
175
+ # Modal Token Secret
176
+ MODAL_TOKEN_SECRET=your_modal_token_secret_here
177
+ ```
178
+
179
+ ### Logging Configuration
180
+
181
+ ```bash
182
+ # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
183
+ LOG_LEVEL=INFO
184
+ ```
185
+
186
+ ## Configuration Properties
187
+
188
+ The `Settings` class provides helpful properties for checking configuration:
189
+
190
+ ```python
191
+ from src.utils.config import settings
192
+
193
+ # Check API key availability
194
+ settings.has_openai_key # bool
195
+ settings.has_anthropic_key # bool
196
+ settings.has_huggingface_key # bool
197
+ settings.has_any_llm_key # bool
198
+
199
+ # Check service availability
200
+ settings.modal_available # bool
201
+ settings.web_search_available # bool
202
+ ```
203
+
204
+ ## Environment Variables Reference
205
+
206
+ ### Required (at least one LLM)
207
+ - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` - At least one LLM provider key
208
+
209
+ ### Optional LLM Providers
210
+ - `DEEPSEEK_API_KEY` (Phase 2)
211
+ - `OPENROUTER_API_KEY` (Phase 2)
212
+ - `GEMINI_API_KEY` (Phase 2)
213
+ - `PERPLEXITY_API_KEY` (Phase 2)
214
+ - `HUGGINGFACE_API_KEY` or `HF_TOKEN`
215
+ - `AZURE_OPENAI_ENDPOINT` (Phase 2)
216
+ - `AZURE_OPENAI_DEPLOYMENT` (Phase 2)
217
+ - `AZURE_OPENAI_API_KEY` (Phase 2)
218
+ - `AZURE_OPENAI_API_VERSION` (Phase 2)
219
+ - `LOCAL_MODEL_URL` (Phase 2)
220
+
221
+ ### Web Search
222
+ - `WEB_SEARCH_PROVIDER` (default: "duckduckgo")
223
+ - `SERPER_API_KEY`
224
+ - `SEARCHXNG_HOST`
225
+ - `BRAVE_API_KEY`
226
+ - `TAVILY_API_KEY`
227
+
228
+ ### Embeddings
229
+ - `EMBEDDING_PROVIDER` (default: "local")
230
+ - `HUGGINGFACE_EMBEDDING_MODEL` (optional)
231
+
232
+ ### RAG
233
+ - `RAG_COLLECTION_NAME` (default: "deepcritical_evidence")
234
+ - `RAG_SIMILARITY_TOP_K` (default: 5)
235
+ - `RAG_AUTO_INGEST` (default: true)
236
+
237
+ ### ChromaDB
238
+ - `CHROMA_DB_PATH` (default: "./chroma_db")
239
+ - `CHROMA_DB_PERSIST` (default: true)
240
+ - `CHROMA_DB_HOST` (optional)
241
+ - `CHROMA_DB_PORT` (optional)
242
+
243
+ ### Budget
244
+ - `DEFAULT_TOKEN_LIMIT` (default: 100000)
245
+ - `DEFAULT_TIME_LIMIT_MINUTES` (default: 10)
246
+ - `DEFAULT_ITERATIONS_LIMIT` (default: 10)
247
+
248
+ ### Other
249
+ - `LLM_PROVIDER` (default: "openai")
250
+ - `NCBI_API_KEY` (optional)
251
+ - `MODAL_TOKEN_ID` (optional)
252
+ - `MODAL_TOKEN_SECRET` (optional)
253
+ - `MAX_ITERATIONS` (default: 10)
254
+ - `LOG_LEVEL` (default: "INFO")
255
+ - `USE_GRAPH_EXECUTION` (default: false)
256
+
257
+ ## Validation
258
+
259
+ Settings are validated on load using Pydantic validation:
260
+
261
+ - **Type checking**: All fields are strongly typed
262
+ - **Range validation**: Numeric fields have min/max constraints
263
+ - **Literal validation**: Enum fields only accept specific values
264
+ - **Required fields**: API keys are checked when accessed via `get_api_key()`
265
+
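+ A sketch of what these constraints look like on the `Settings` class (field names here are illustrative, chosen to match the documented env vars):
+
+ ```python
+ from typing import Literal
+
+ from pydantic import Field
+ from pydantic_settings import BaseSettings
+
+ class Settings(BaseSettings):
+     # Range validation: MAX_ITERATIONS must stay within 1-50
+     max_iterations: int = Field(default=10, ge=1, le=50)
+     # Literal validation: only these log levels are accepted
+     log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
+ ```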
266
+ ## Error Handling
267
+
268
+ Configuration errors raise `ConfigurationError`:
269
+
270
+ ```python
271
+ from src.utils.config import settings
272
+ from src.utils.exceptions import ConfigurationError
273
+
274
+ try:
275
+ api_key = settings.get_api_key()
276
+ except ConfigurationError as e:
277
+ print(f"Configuration error: {e}")
278
+ ```
279
+
280
+ ## Future Enhancements (Phase 2)
281
+
282
+ The following configurations are planned for Phase 2:
283
+
284
+ 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
285
+ 2. **Model Selection**: Reasoning/main/fast model configuration
286
+ 3. **Service Integration**: Migrate `folder/llm_config.py` to centralized config
287
+
288
+ See `CONFIGURATION_ANALYSIS.md` for the complete implementation plan.
289
+
docs/architecture/graph_orchestration.md ADDED
@@ -0,0 +1,151 @@
1
+ # Graph Orchestration Architecture
2
+
3
+ ## Overview
4
+
5
+ Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
6
+
7
+ ## Graph Structure
8
+
9
+ ### Nodes
10
+
11
+ Graph nodes represent different stages in the research workflow:
12
+
13
+ 1. **Agent Nodes**: Execute Pydantic AI agents
14
+ - Input: Prompt/query
15
+ - Output: Structured or unstructured response
16
+ - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
17
+
18
+ 2. **State Nodes**: Update or read workflow state
19
+ - Input: Current state
20
+ - Output: Updated state
21
+ - Examples: Update evidence, update conversation history
22
+
23
+ 3. **Decision Nodes**: Make routing decisions based on conditions
24
+ - Input: Current state/results
25
+ - Output: Next node ID
26
+ - Examples: Continue research vs. complete research
27
+
28
+ 4. **Parallel Nodes**: Execute multiple nodes concurrently
29
+ - Input: List of node IDs
30
+ - Output: Aggregated results
31
+ - Examples: Parallel iterative research loops
32
+
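+ A minimal sketch of how these node types might be modeled (names are illustrative, not the actual `graph_builder` API):
+
+ ```python
+ from collections.abc import Awaitable, Callable
+ from dataclasses import dataclass
+ from enum import Enum
+ from typing import Any
+
+ class NodeType(Enum):
+     AGENT = "agent"
+     STATE = "state"
+     DECISION = "decision"
+     PARALLEL = "parallel"
+
+ @dataclass
+ class GraphNode:
+     id: str
+     type: NodeType
+     run: Callable[[Any], Awaitable[Any]]  # async callable invoked when the node executes
+ ```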
33
+ ### Edges
34
+
35
+ Edges define transitions between nodes:
36
+
37
+ 1. **Sequential Edges**: Always traversed (no condition)
38
+ - From: Source node
39
+ - To: Target node
40
+ - Condition: None (always True)
41
+
42
+ 2. **Conditional Edges**: Traversed based on condition
43
+ - From: Source node
44
+ - To: Target node
45
+ - Condition: Callable that returns bool
46
+ - Example: If research complete → go to writer, else → continue loop
47
+
48
+ 3. **Parallel Edges**: Used for parallel execution branches
49
+ - From: Parallel node
50
+ - To: Multiple target nodes
51
+ - Execution: All targets run concurrently
52
+
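+ An edge can be sketched as a target plus an optional predicate over the current state (illustrative only):
+
+ ```python
+ from collections.abc import Callable
+ from dataclasses import dataclass
+ from typing import Any
+
+ @dataclass
+ class GraphEdge:
+     source: str
+     target: str
+     # None means sequential (always traversed); otherwise evaluated against state
+     condition: Callable[[Any], bool] | None = None
+
+ # Conditional edge: route from the knowledge-gap node to the writer once research is complete
+ edge = GraphEdge("knowledge_gap", "writer", condition=lambda s: s.research_complete)
+ ```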
53
+ ## Graph Patterns
54
+
55
+ ### Iterative Research Graph
56
+
57
+ ```
58
+ [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
+                                            ↓ No          ↓ Yes
+                                      [Tool Selector]   [Writer]
+                                            ↓
+                                      [Execute Tools] → [Loop Back]
63
+ ```
64
+
65
+ ### Deep Research Graph
66
+
67
+ ```
68
+ [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
+                           ↓        ↓        ↓
+                        [Loop1]  [Loop2]  [Loop3]
71
+ ```
72
+
73
+ ## State Management
74
+
75
+ State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:
76
+
77
+ - **Evidence**: Collected evidence from searches
78
+ - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
79
+ - **Embedding Service**: For semantic search
80
+
81
+ State transitions occur at state nodes, which update the global workflow state.
82
+
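+ A sketch of the `ContextVar` pattern (the actual `WorkflowState` fields live in `src/agents/state.py`; names here are illustrative):
+
+ ```python
+ from contextvars import ContextVar
+
+ _workflow_state: ContextVar["WorkflowState | None"] = ContextVar("workflow_state", default=None)
+
+ def get_workflow_state() -> "WorkflowState":
+     # Each asyncio task sees its own value, so parallel workflows stay isolated
+     state = _workflow_state.get()
+     if state is None:
+         raise RuntimeError("Workflow state not initialized")
+     return state
+ ```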
83
+ ## Execution Flow
84
+
85
+ 1. **Graph Construction**: Build graph from nodes and edges
86
+ 2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable)
87
+ 3. **Graph Execution**: Traverse graph from entry node
88
+ 4. **Node Execution**: Execute each node based on type
89
+ 5. **Edge Evaluation**: Determine next node(s) based on edges
90
+ 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
91
+ 7. **State Updates**: Update state at state nodes
92
+ 8. **Event Streaming**: Yield events during execution for UI
93
+
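+ A simplified traversal loop illustrating steps 3-5 (assumes `graph.nodes` is a mapping and `graph.edges_from()` returns outgoing edges; both names are assumptions):
+
+ ```python
+ from typing import Any
+
+ async def execute_graph(graph: Any, entry_id: str, state: Any) -> None:
+     node_id: str | None = entry_id
+     while node_id is not None:
+         node = graph.nodes[node_id]
+         await node.run(state)  # node execution (agent, state update, decision, ...)
+         # Follow the first edge whose condition holds (sequential edges always match)
+         node_id = next(
+             (e.target for e in graph.edges_from(node_id)
+              if e.condition is None or e.condition(state)),
+             None,
+         )
+ ```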
94
+ ## Conditional Routing
95
+
96
+ Decision nodes evaluate conditions and return next node IDs:
97
+
98
+ - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
99
+ - **Budget Decision**: If budget exceeded → exit, else → continue
100
+ - **Iteration Decision**: If max iterations → exit, else → continue
101
+
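+ A decision function covering these three checks might look like this (state and budget attribute names are assumptions):
+
+ ```python
+ from typing import Any
+
+ def route_after_gap_check(state: Any, budget: Any) -> str:
+     # Budget and iteration decisions take priority over the gap decision
+     if budget.exceeded() or state.iteration >= state.max_iterations:
+         return "exit"
+     return "writer" if state.research_complete else "tool_selector"
+ ```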
102
+ ## Parallel Execution
103
+
104
+ Parallel nodes execute multiple nodes concurrently:
105
+
106
+ - Each parallel branch runs independently
107
+ - Results are aggregated after all branches complete
108
+ - State is synchronized after parallel execution
109
+ - Errors in one branch don't stop other branches
110
+
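+ A sketch of the aggregation step; `return_exceptions=True` is what keeps one failing branch from cancelling the others:
+
+ ```python
+ import asyncio
+ from typing import Any
+
+ async def run_parallel(branches: list[Any], state: Any) -> list[Any]:
+     results = await asyncio.gather(
+         *(branch.run(state) for branch in branches),
+         return_exceptions=True,
+     )
+     # Keep successful results; failed branches surface as exception objects
+     return [r for r in results if not isinstance(r, BaseException)]
+ ```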
111
+ ## Budget Enforcement
112
+
113
+ Budget constraints are enforced at decision nodes:
114
+
115
+ - **Token Budget**: Track LLM token usage
116
+ - **Time Budget**: Track elapsed time
117
+ - **Iteration Budget**: Track iteration count
118
+
119
+ If any budget is exceeded, execution routes to exit node.
120
+
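+ A minimal budget check, using the defaults from the configuration guide (100K tokens, 10 minutes, 10 iterations); this is a sketch, not the actual `BudgetTracker`:
+
+ ```python
+ import time
+ from dataclasses import dataclass, field
+
+ @dataclass
+ class Budget:
+     token_limit: int = 100_000
+     time_limit_s: float = 600.0
+     max_iterations: int = 10
+     tokens_used: int = 0
+     iterations: int = 0
+     started: float = field(default_factory=time.monotonic)
+
+     def exceeded(self) -> bool:
+         return (
+             self.tokens_used >= self.token_limit
+             or time.monotonic() - self.started >= self.time_limit_s
+             or self.iterations >= self.max_iterations
+         )
+ ```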
121
+ ## Error Handling
122
+
123
+ Errors are handled at multiple levels:
124
+
125
+ 1. **Node Level**: Catch errors in individual node execution
126
+ 2. **Graph Level**: Handle errors during graph traversal
127
+ 3. **State Level**: Rollback state changes on error
128
+
129
+ Errors are logged and yield error events for UI.
130
+
131
+ ## Backward Compatibility
132
+
133
+ Graph execution is optional via feature flag:
134
+
135
+ - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
136
+ - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
137
+
138
+ This allows gradual migration and fallback if needed.
139
+
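+ Routing on the flag is a one-line check (the settings field and entry-point names below are assumptions based on the env var):
+
+ ```python
+ from src.utils.config import settings
+
+ async def run_graph(query: str) -> str: ...        # placeholder for the graph path
+ async def run_agent_chain(query: str) -> str: ...  # placeholder for the chain path
+
+ async def run_research(query: str) -> str:
+     # USE_GRAPH_EXECUTION toggles between the two execution paths
+     if settings.use_graph_execution:
+         return await run_graph(query)
+     return await run_agent_chain(query)
+ ```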
docs/examples/writer_agents_usage.md ADDED
@@ -0,0 +1,425 @@
1
+ # Writer Agents Usage Examples
2
+
3
+ This document provides examples of how to use the writer agents in DeepCritical for generating research reports.
4
+
5
+ ## Overview
6
+
7
+ DeepCritical provides three writer agents for different report generation scenarios:
8
+
9
+ 1. **WriterAgent** - Basic writer for simple reports from findings
10
+ 2. **LongWriterAgent** - Iterative writer for long-form multi-section reports
11
+ 3. **ProofreaderAgent** - Finalizes and polishes report drafts
12
+
13
+ ## WriterAgent
14
+
15
+ The `WriterAgent` generates final reports from research findings. It's used in iterative research flows.
16
+
17
+ ### Basic Usage
18
+
19
+ ```python
20
+ from src.agent_factory.agents import create_writer_agent
21
+
22
+ # Create writer agent
23
+ writer = create_writer_agent()
24
+
25
+ # Generate report
26
+ query = "What is the capital of France?"
27
+ findings = """
28
+ Paris is the capital of France [1].
29
+ It is located in the north-central part of the country [2].
30
+
31
+ [1] https://example.com/france-info
32
+ [2] https://example.com/paris-info
33
+ """
34
+
35
+ report = await writer.write_report(
36
+ query=query,
37
+ findings=findings,
38
+ )
39
+
40
+ print(report)
41
+ ```
42
+
43
+ ### With Output Length Specification
44
+
45
+ ```python
46
+ report = await writer.write_report(
47
+ query="Explain machine learning",
48
+ findings=findings,
49
+ output_length="500 words",
50
+ )
51
+ ```
52
+
53
+ ### With Additional Instructions
54
+
55
+ ```python
56
+ report = await writer.write_report(
57
+ query="Explain machine learning",
58
+ findings=findings,
59
+ output_length="A comprehensive overview",
60
+ output_instructions="Use formal academic language and include examples",
61
+ )
62
+ ```
63
+
64
+ ### Integration with IterativeResearchFlow
65
+
66
+ The `WriterAgent` is automatically used by `IterativeResearchFlow`:
67
+
68
+ ```python
69
+ from src.agent_factory.agents import create_iterative_flow
70
+
71
+ flow = create_iterative_flow(max_iterations=5, max_time_minutes=10)
72
+ report = await flow.run(
73
+ query="What is quantum computing?",
74
+ output_length="A detailed explanation",
75
+ output_instructions="Include practical applications",
76
+ )
77
+ ```
78
+
79
+ ## LongWriterAgent
80
+
81
+ The `LongWriterAgent` iteratively writes report sections with proper citation management. It's used in deep research flows.
82
+
83
+ ### Basic Usage
84
+
85
+ ```python
86
+ from src.agent_factory.agents import create_long_writer_agent
87
+ from src.utils.models import ReportDraft, ReportDraftSection
88
+
89
+ # Create long writer agent
90
+ long_writer = create_long_writer_agent()
91
+
92
+ # Create report draft with sections
93
+ report_draft = ReportDraft(
94
+ sections=[
95
+ ReportDraftSection(
96
+ section_title="Introduction",
97
+ section_content="Draft content for introduction with [1].",
98
+ ),
99
+ ReportDraftSection(
100
+ section_title="Methods",
101
+ section_content="Draft content for methods with [2].",
102
+ ),
103
+ ReportDraftSection(
104
+ section_title="Results",
105
+ section_content="Draft content for results with [3].",
106
+ ),
107
+ ]
108
+ )
109
+
110
+ # Generate full report
111
+ report = await long_writer.write_report(
112
+ original_query="What are the main features of Python?",
113
+ report_title="Python Programming Language Overview",
114
+ report_draft=report_draft,
115
+ )
116
+
117
+ print(report)
118
+ ```
119
+
120
+ ### Writing Individual Sections
121
+
122
+ You can also write sections one at a time:
123
+
124
+ ```python
125
+ # Write first section
126
+ section_output = await long_writer.write_next_section(
127
+ original_query="What is Python?",
128
+ report_draft="", # No existing draft
129
+ next_section_title="Introduction",
130
+ next_section_draft="Python is a programming language...",
131
+ )
132
+
133
+ print(section_output.next_section_markdown)
134
+ print(section_output.references)
135
+
136
+ # Write second section with existing draft
137
+ section_output = await long_writer.write_next_section(
138
+ original_query="What is Python?",
139
+ report_draft="# Report\n\n## Introduction\n\nContent...",
140
+ next_section_title="Features",
141
+ next_section_draft="Python features include...",
142
+ )
143
+ ```
144
+
145
+ ### Integration with DeepResearchFlow
146
+
147
+ The `LongWriterAgent` is automatically used by `DeepResearchFlow`:
148
+
149
+ ```python
150
+ from src.agent_factory.agents import create_deep_flow
151
+
152
+ flow = create_deep_flow(
153
+ max_iterations=5,
154
+ max_time_minutes=10,
155
+ use_long_writer=True, # Use long writer (default)
156
+ )
157
+
158
+ report = await flow.run("What are the main features of Python programming language?")
159
+ ```
160
+
161
+ ## ProofreaderAgent
162
+
163
+ The `ProofreaderAgent` finalizes and polishes report drafts by removing duplicates, adding summaries, and refining wording.
164
+
165
+ ### Basic Usage
166
+
167
+ ```python
168
+ from src.agent_factory.agents import create_proofreader_agent
169
+ from src.utils.models import ReportDraft, ReportDraftSection
170
+
171
+ # Create proofreader agent
172
+ proofreader = create_proofreader_agent()
173
+
174
+ # Create report draft
175
+ report_draft = ReportDraft(
176
+ sections=[
177
+ ReportDraftSection(
178
+ section_title="Introduction",
179
+ section_content="Python is a programming language [1].",
180
+ ),
181
+ ReportDraftSection(
182
+ section_title="Features",
183
+ section_content="Python has many features [2].",
184
+ ),
185
+ ]
186
+ )
187
+
188
+ # Proofread and finalize
189
+ final_report = await proofreader.proofread(
190
+ query="What is Python?",
191
+ report_draft=report_draft,
192
+ )
193
+
194
+ print(final_report)
195
+ ```
196
+
197
+ ### Integration with DeepResearchFlow
198
+
199
+ Use `ProofreaderAgent` instead of `LongWriterAgent`:
200
+
201
+ ```python
202
+ from src.agent_factory.agents import create_deep_flow
203
+
204
+ flow = create_deep_flow(
205
+ max_iterations=5,
206
+ max_time_minutes=10,
207
+ use_long_writer=False, # Use proofreader instead
208
+ )
209
+
210
+ report = await flow.run("What are the main features of Python?")
211
+ ```
212
+
213
+ ## Error Handling
214
+
215
+ All writer agents include robust error handling:
216
+
217
+ ### Handling Empty Inputs
218
+
219
+ ```python
220
+ # WriterAgent handles empty findings gracefully
221
+ report = await writer.write_report(
222
+ query="Test query",
223
+ findings="", # Empty findings
224
+ )
225
+ # Returns a fallback report
226
+
227
+ # LongWriterAgent handles empty sections
228
+ report = await long_writer.write_report(
229
+ original_query="Test",
230
+ report_title="Test Report",
231
+ report_draft=ReportDraft(sections=[]), # Empty draft
232
+ )
233
+ # Returns minimal report
234
+
235
+ # ProofreaderAgent handles empty drafts
236
+ report = await proofreader.proofread(
237
+ query="Test",
238
+ report_draft=ReportDraft(sections=[]),
239
+ )
240
+ # Returns minimal report
241
+ ```
242
+
243
+ ### Retry Logic
244
+
245
+ All agents automatically retry on transient errors (timeouts, connection errors):
246
+
247
+ ```python
248
+ # Automatically retries up to 3 times on transient failures
249
+ report = await writer.write_report(
250
+ query="Test query",
251
+ findings=findings,
252
+ )
253
+ ```
254
+
255
+ ### Fallback Reports
256
+
257
+ If all retries fail, agents return fallback reports:
258
+
259
+ ```python
260
+ # Returns fallback report with query and findings
261
+ report = await writer.write_report(
262
+ query="Test query",
263
+ findings=findings,
264
+ )
265
+ # Fallback includes: "# Research Report\n\n## Query\n...\n\n## Findings\n..."
266
+ ```
267
+
268
+ ## Citation Validation
269
+
270
+ ### For Markdown Reports
271
+
272
+ Use the markdown citation validator:
273
+
274
+ ```python
275
+ from src.utils.citation_validator import validate_markdown_citations
276
+ from src.utils.models import Evidence, Citation
277
+
278
+ # Collect evidence during research
279
+ evidence = [
280
+ Evidence(
281
+ content="Paris is the capital of France",
282
+ citation=Citation(
283
+ source="web",
284
+ title="France Information",
285
+ url="https://example.com/france",
286
+ date="2024-01-01",
287
+ ),
288
+ ),
289
+ ]
290
+
291
+ # Generate report
292
+ report = await writer.write_report(query="What is the capital of France?", findings=findings)
293
+
294
+ # Validate citations
295
+ validated_report, removed_count = validate_markdown_citations(report, evidence)
296
+
297
+ if removed_count > 0:
298
+ print(f"Removed {removed_count} invalid citations")
299
+ ```
300
+
301
+ ### For ResearchReport Objects
302
+
303
+ Use the structured citation validator:
304
+
305
+ ```python
306
+ from src.utils.citation_validator import validate_references
307
+
308
+ # For ResearchReport objects (from ReportAgent)
309
+ validated_report = validate_references(report, evidence)
310
+ ```
311
+
312
+ ## Custom Model Configuration
313
+
314
+ All writer agents support custom model configuration:
315
+
316
+ ```python
317
+ from pydantic_ai.models.openai import OpenAIModel
318
+
319
+ # Create custom model
320
+ custom_model = OpenAIModel("gpt-4")
321
+
322
+ # Use with writer agents
323
+ writer = create_writer_agent(model=custom_model)
324
+ long_writer = create_long_writer_agent(model=custom_model)
325
+ proofreader = create_proofreader_agent(model=custom_model)
326
+ ```
327
+
328
+ ## Best Practices
329
+
330
+ 1. **Use WriterAgent for simple reports** - When you have findings as a string and need a quick report
331
+ 2. **Use LongWriterAgent for structured reports** - When you need multiple sections with proper citation management
332
+ 3. **Use ProofreaderAgent for final polish** - When you have draft sections and need a polished final report
333
+ 4. **Validate citations** - Always validate citations against collected evidence
334
+ 5. **Handle errors gracefully** - All agents return fallback reports on failure
335
+ 6. **Specify output length** - Use `output_length` parameter to control report size
336
+ 7. **Provide instructions** - Use `output_instructions` for specific formatting requirements
337
+
338
+ ## Integration Examples
339
+
340
+ ### Full Iterative Research Flow
341
+
342
+ ```python
343
+ from src.agent_factory.agents import create_iterative_flow
344
+
345
+ flow = create_iterative_flow(
346
+ max_iterations=5,
347
+ max_time_minutes=10,
348
+ )
349
+
350
+ report = await flow.run(
351
+ query="What is machine learning?",
352
+ output_length="A comprehensive 1000-word explanation",
353
+ output_instructions="Include practical examples and use cases",
354
+ )
355
+ ```
356
+
357
+ ### Full Deep Research Flow with Long Writer
358
+
359
+ ```python
360
+ from src.agent_factory.agents import create_deep_flow
361
+
362
+ flow = create_deep_flow(
363
+ max_iterations=5,
364
+ max_time_minutes=10,
365
+ use_long_writer=True,
366
+ )
367
+
368
+ report = await flow.run("What are the main features of Python programming language?")
369
+ ```
370
+
371
+ ### Full Deep Research Flow with Proofreader
372
+
373
+ ```python
374
+ from src.agent_factory.agents import create_deep_flow
375
+
376
+ flow = create_deep_flow(
377
+ max_iterations=5,
378
+ max_time_minutes=10,
379
+ use_long_writer=False, # Use proofreader
380
+ )
381
+
382
+ report = await flow.run("Explain quantum computing basics")
383
+ ```
384
+
385
+ ## Troubleshooting
386
+
387
+ ### Empty Reports
388
+
389
+ If you get empty reports, check:
390
+ - Input validation logs (agents log warnings for empty inputs)
391
+ - LLM API key configuration
392
+ - Network connectivity
393
+
394
+ ### Citation Issues
395
+
396
+ If citations are missing or invalid:
397
+ - Use `validate_markdown_citations()` to check citations
398
+ - Ensure Evidence objects are properly collected during research
399
+ - Check that URLs in findings match Evidence URLs
400
+
401
+ ### Performance Issues
402
+
403
+ For large reports:
404
+ - Use `LongWriterAgent` for better section management
405
+ - Consider truncating very long findings (agents do this automatically)
406
+ - Use appropriate `max_time_minutes` settings
407
+
408
+ ## See Also
409
+
410
+ - [Research Flows Documentation](../orchestrator/research_flows.md)
411
+ - [Citation Validation](../utils/citation_validation.md)
412
+ - [Agent Factory](../agent_factory/agents.md)
413
+
docs/implementation/02_phase_search.md CHANGED
@@ -4,6 +4,8 @@
4
  **Philosophy**: "Real data, mocked connections."
5
  **Prerequisite**: Phase 1 complete (all tests passing)
6
 
 
 
7
  ---
8
 
9
  ## 1. The Slice Definition
@@ -12,17 +14,20 @@ This slice covers:
12
  1. **Input**: A string query (e.g., "metformin Alzheimer's disease").
13
  2. **Process**:
14
  - Fetch from PubMed (E-utilities API).
15
- - Fetch from Web (DuckDuckGo).
16
  - Normalize results into `Evidence` models.
17
  3. **Output**: A list of `Evidence` objects.
18
 
19
  **Files to Create**:
20
  - `src/utils/models.py` - Pydantic models (Evidence, Citation, SearchResult)
21
  - `src/tools/pubmed.py` - PubMed E-utilities tool
22
- - `src/tools/websearch.py` - DuckDuckGo search tool
23
  - `src/tools/search_handler.py` - Orchestrates multiple tools
24
  - `src/tools/__init__.py` - Exports
25
 
 
 
 
26
  ---
27
 
28
  ## 2. PubMed E-utilities API Reference
@@ -767,17 +772,23 @@ async def test_pubmed_live_search():
767
 
768
  ## 8. Implementation Checklist
769
 
770
- - [ ] Create `src/utils/models.py` with all Pydantic models (Evidence, Citation, SearchResult)
771
- - [ ] Create `src/tools/__init__.py` with SearchTool Protocol and exports
772
- - [ ] Implement `src/tools/pubmed.py` with PubMedTool class
773
- - [ ] Implement `src/tools/websearch.py` with WebTool class
774
- - [ ] Create `src/tools/search_handler.py` with SearchHandler class
775
- - [ ] Write tests in `tests/unit/tools/test_pubmed.py`
776
- - [ ] Write tests in `tests/unit/tools/test_websearch.py`
777
- - [ ] Write tests in `tests/unit/tools/test_search_handler.py`
778
- - [ ] Run `uv run pytest tests/unit/tools/ -v` — **ALL TESTS MUST PASS**
779
  - [ ] (Optional) Run integration test: `uv run pytest -m integration`
780
- - [ ] Commit: `git commit -m "feat: phase 2 search slice complete"`
 
 
 
 
 
 
781
 
782
  ---
783
 
@@ -785,20 +796,19 @@ async def test_pubmed_live_search():
785
 
786
  Phase 2 is **COMPLETE** when:
787
 
788
- 1. All unit tests pass: `uv run pytest tests/unit/tools/ -v`
789
- 2. `SearchHandler` can execute with both tools
790
- 3. Graceful degradation: if PubMed fails, WebTool results still return
791
- 4. Rate limiting is enforced (verify no 429 errors)
792
- 5. Can run this in Python REPL:
793
 
794
  ```python
795
  import asyncio
796
  from src.tools.pubmed import PubMedTool
797
- from src.tools.websearch import WebTool
798
  from src.tools.search_handler import SearchHandler
799
 
800
  async def test():
801
- handler = SearchHandler([PubMedTool(), WebTool()])
802
  result = await handler.execute("metformin alzheimer")
803
  print(f"Found {result.total_found} results")
804
  for e in result.evidence[:3]:
@@ -807,4 +817,6 @@ async def test():
807
  asyncio.run(test())
808
  ```
809
 
 
 
810
  **Proceed to Phase 3 ONLY after all checkboxes are complete.**
 
4
  **Philosophy**: "Real data, mocked connections."
5
  **Prerequisite**: Phase 1 complete (all tests passing)
6
 
7
+ > **⚠️ Implementation Note (2025-01-27)**: The DuckDuckGo WebTool specified in this phase was removed in favor of the Europe PMC tool (see Phase 11). Europe PMC provides better coverage for biomedical research by including preprints, peer-reviewed articles, and patents. The current implementation uses PubMed, ClinicalTrials.gov, and Europe PMC as search sources.
8
+
9
  ---
10
 
11
  ## 1. The Slice Definition
 
14
  1. **Input**: A string query (e.g., "metformin Alzheimer's disease").
15
  2. **Process**:
16
  - Fetch from PubMed (E-utilities API).
17
+ - ~~Fetch from Web (DuckDuckGo).~~ **REMOVED** - Replaced by Europe PMC in Phase 11
18
  - Normalize results into `Evidence` models.
19
  3. **Output**: A list of `Evidence` objects.
20
 
21
  **Files to Create**:
22
  - `src/utils/models.py` - Pydantic models (Evidence, Citation, SearchResult)
23
  - `src/tools/pubmed.py` - PubMed E-utilities tool
24
+ - ~~`src/tools/websearch.py` - DuckDuckGo search tool~~ **REMOVED** - See Phase 11 for Europe PMC replacement
25
  - `src/tools/search_handler.py` - Orchestrates multiple tools
26
  - `src/tools/__init__.py` - Exports
27
 
28
+ **Additional Files (Post-Phase 2 Enhancements)**:
29
+ - `src/tools/query_utils.py` - Query preprocessing (removes question words, expands medical synonyms)
30
+
31
  ---
32
 
33
  ## 2. PubMed E-utilities API Reference
 
772
 
773
  ## 8. Implementation Checklist
774
 
775
+ - [x] Create `src/utils/models.py` with all Pydantic models (Evidence, Citation, SearchResult) - **COMPLETE**
776
+ - [x] Create `src/tools/__init__.py` with SearchTool Protocol and exports - **COMPLETE**
777
+ - [x] Implement `src/tools/pubmed.py` with PubMedTool class - **COMPLETE**
778
+ - [ ] ~~Implement `src/tools/websearch.py` with WebTool class~~ - **REMOVED** (replaced by Europe PMC in Phase 11)
779
+ - [x] Create `src/tools/search_handler.py` with SearchHandler class - **COMPLETE**
780
+ - [x] Write tests in `tests/unit/tools/test_pubmed.py` - **COMPLETE** (basic tests)
781
+ - [ ] Write tests in `tests/unit/tools/test_websearch.py` - **N/A** (WebTool removed)
782
+ - [x] Write tests in `tests/unit/tools/test_search_handler.py` - **COMPLETE** (basic tests)
783
+ - [x] Run `uv run pytest tests/unit/tools/ -v` — **ALL TESTS MUST PASS** - **PASSING**
784
  - [ ] (Optional) Run integration test: `uv run pytest -m integration`
785
+ - [ ] Add edge case tests (rate limiting, error handling, timeouts) - **PENDING**
786
+ - [x] Commit: `git commit -m "feat: phase 2 search slice complete"` - **DONE**
787
+
788
+ **Post-Phase 2 Enhancements**:
789
+ - [x] Query preprocessing (`src/tools/query_utils.py`) - **ADDED**
790
+ - [x] Europe PMC tool (Phase 11) - **ADDED**
791
+ - [x] ClinicalTrials tool (Phase 10) - **ADDED**
792
 
793
  ---
794
 
 
796
 
797
  Phase 2 is **COMPLETE** when:
798
 
799
+ 1. All unit tests pass: `uv run pytest tests/unit/tools/ -v` - **PASSING**
800
+ 2. `SearchHandler` can execute with search tools - **WORKING**
801
+ 3. Graceful degradation: if one tool fails, other tools still return results - **IMPLEMENTED**
802
+ 4. Rate limiting is enforced (verify no 429 errors) - **IMPLEMENTED**
803
+ 5. Can run this in Python REPL:
804
 
805
  ```python
806
  import asyncio
807
  from src.tools.pubmed import PubMedTool
 
808
  from src.tools.search_handler import SearchHandler
809
 
810
  async def test():
811
+ handler = SearchHandler([PubMedTool()])
812
  result = await handler.execute("metformin alzheimer")
813
  print(f"Found {result.total_found} results")
814
  for e in result.evidence[:3]:
 
817
  asyncio.run(test())
818
  ```
819
 
820
+ **Note**: WebTool was removed in favor of Europe PMC (Phase 11). The current implementation uses PubMed as the primary Phase 2 tool, with Europe PMC and ClinicalTrials added in later phases.
821
+
822
  **Proceed to Phase 3 ONLY after all checkboxes are complete.**
examples/rate_limiting_demo.py CHANGED
@@ -22,7 +22,7 @@ async def test_basic_limiter():
22
  for i in range(6):
23
  await limiter.acquire()
24
  elapsed = time.monotonic() - start
25
- print(f" Request {i+1} at {elapsed:.2f}s")
26
 
27
  total = time.monotonic() - start
28
  print(f" Total time for 6 requests: {total:.2f}s (expected ~2s)")
 
22
  for i in range(6):
23
  await limiter.acquire()
24
  elapsed = time.monotonic() - start
25
+ print(f" Request {i + 1} at {elapsed:.2f}s")
26
 
27
  total = time.monotonic() - start
28
  print(f" Total time for 6 requests: {total:.2f}s (expected ~2s)")
main.py DELETED
@@ -1,6 +0,0 @@
1
- def main():
2
- print("Hello from deepcritical!")
3
-
4
-
5
- if __name__ == "__main__":
6
- main()
 
 
 
 
 
 
 
pyproject.toml CHANGED
@@ -24,8 +24,13 @@ dependencies = [
24
  "tenacity>=8.2", # Retry logic
25
  "structlog>=24.1", # Structured logging
26
  "requests>=2.32.5", # ClinicalTrials.gov (httpx blocked by WAF)
 
27
  "limits>=3.0", # Rate limiting
28
  "duckduckgo-search>=5.0", # Web search
 
 
 
 
29
  ]
30
 
31
  [project.optional-dependencies]
@@ -50,6 +55,7 @@ magentic = [
50
  embeddings = [
51
  "chromadb>=0.4.0",
52
  "sentence-transformers>=2.2.0",
 
53
  ]
54
  modal = [
55
  # Mario's Modal code execution + LlamaIndex RAG
@@ -59,6 +65,7 @@ modal = [
59
  "llama-index-embeddings-openai",
60
  "llama-index-vector-stores-chroma",
61
  "chromadb>=0.4.0",
 
62
  ]
63
 
64
  [build-system]
@@ -72,7 +79,13 @@ packages = ["src"]
72
  [tool.ruff]
73
  line-length = 100
74
  target-version = "py311"
75
- src = ["src", "tests"]
 
 
 
 
 
 
76
 
77
  [tool.ruff.lint]
78
  select = [
@@ -93,6 +106,7 @@ ignore = [
93
  "PLW0603", # Global statement (singleton pattern for Modal)
94
  "PLC0415", # Lazy imports for optional dependencies
95
  "E402", # Module level import not at top (needed for pytest.importorskip)
 
96
  "RUF100", # Unused noqa (version differences between local/CI)
97
  ]
98
 
@@ -107,9 +121,12 @@ ignore_missing_imports = true
107
  disallow_untyped_defs = true
108
  warn_return_any = true
109
  warn_unused_ignores = false
 
 
110
  exclude = [
111
  "^reference_repos/",
112
  "^examples/",
 
113
  ]
114
 
115
  # ============== PYTEST CONFIG ==============
@@ -120,11 +137,17 @@ addopts = [
120
  "-v",
121
  "--tb=short",
122
  "--strict-markers",
 
 
123
  ]
124
  markers = [
125
  "unit: Unit tests (mocked)",
126
  "integration: Integration tests (real APIs)",
127
  "slow: Slow tests",
 
 
 
 
128
  ]
129
 
130
  # ============== COVERAGE CONFIG ==============
@@ -139,5 +162,11 @@ exclude_lines = [
139
  "raise NotImplementedError",
140
  ]
141
 
 
 
 
 
 
 
142
  # Note: agent-framework-core is optional for magentic mode (multi-agent orchestration)
143
  # Version pinned to 1.0.0b* to avoid breaking changes. CI skips tests via pytest.importorskip
 
24
  "tenacity>=8.2", # Retry logic
25
  "structlog>=24.1", # Structured logging
26
  "requests>=2.32.5", # ClinicalTrials.gov (httpx blocked by WAF)
27
+ "pydantic-graph>=1.22.0",
28
  "limits>=3.0", # Rate limiting
29
  "duckduckgo-search>=5.0", # Web search
30
+ "llama-index-llms-huggingface>=0.6.1",
31
+ "llama-index-llms-huggingface-api>=0.6.1",
32
+ "llama-index-vector-stores-chroma>=0.5.3",
33
+ "llama-index>=0.14.8",
34
  ]
35
 
36
  [project.optional-dependencies]
 
55
  embeddings = [
56
  "chromadb>=0.4.0",
57
  "sentence-transformers>=2.2.0",
58
+ "numpy<2.0", # chromadb compatibility: uses np.float_ removed in NumPy 2.0
59
  ]
60
  modal = [
61
  # Mario's Modal code execution + LlamaIndex RAG
 
65
  "llama-index-embeddings-openai",
66
  "llama-index-vector-stores-chroma",
67
  "chromadb>=0.4.0",
68
+ "numpy<2.0", # chromadb compatibility: uses np.float_ removed in NumPy 2.0
69
  ]
70
 
71
  [build-system]
 
79
  [tool.ruff]
80
  line-length = 100
81
  target-version = "py311"
82
+ src = ["src"]
83
+ exclude = [
84
+ "tests/",
85
+ "examples/",
86
+ "reference_repos/",
87
+ "folder/",
88
+ ]
89
 
90
  [tool.ruff.lint]
91
  select = [
 
106
  "PLW0603", # Global statement (singleton pattern for Modal)
107
  "PLC0415", # Lazy imports for optional dependencies
108
  "E402", # Module level import not at top (needed for pytest.importorskip)
109
+ "E501", # Line too long (ignore line length violations)
110
  "RUF100", # Unused noqa (version differences between local/CI)
111
  ]
112
 
 
121
  disallow_untyped_defs = true
122
  warn_return_any = true
123
  warn_unused_ignores = false
124
+ explicit_package_bases = true
125
+ mypy_path = "."
126
  exclude = [
127
  "^reference_repos/",
128
  "^examples/",
129
+ "^folder/",
130
  ]
131
 
132
  # ============== PYTEST CONFIG ==============
 
137
  "-v",
138
  "--tb=short",
139
  "--strict-markers",
140
+ "-p",
141
+ "no:logfire",
142
  ]
143
  markers = [
144
  "unit: Unit tests (mocked)",
145
  "integration: Integration tests (real APIs)",
146
  "slow: Slow tests",
147
+ "openai: Tests that require OpenAI API key",
148
+ "huggingface: Tests that require HuggingFace API key or use HuggingFace models",
149
+ "embedding_provider: Tests that require API-based embedding providers (OpenAI, etc.)",
150
+ "local_embeddings: Tests that use local embeddings (sentence-transformers, ChromaDB)",
151
  ]
152
 
153
  # ============== COVERAGE CONFIG ==============
 
162
  "raise NotImplementedError",
163
  ]
164
 
165
+ [dependency-groups]
166
+ dev = [
167
+ "structlog>=25.5.0",
168
+ "ty>=0.0.1a28",
169
+ ]
170
+
171
  # Note: agent-framework-core is optional for magentic mode (multi-agent orchestration)
172
  # Version pinned to 1.0.0b* to avoid breaking changes. CI skips tests via pytest.importorskip
requirements.txt CHANGED
@@ -3,6 +3,7 @@ pydantic>=2.7
 pydantic-settings>=2.2
 pydantic-ai>=0.0.16

+
 # AI Providers
 openai>=1.0.0
 anthropic>=0.18.0
@@ -34,6 +35,7 @@ modal>=0.63.0
 # Optional: LlamaIndex RAG
 llama-index>=0.11.0
 llama-index-llms-openai
+ llama-index-llms-huggingface  # Optional: For HuggingFace LLM support in RAG
 llama-index-embeddings-openai
 llama-index-vector-stores-chroma
 chromadb>=0.4.0
src/agent_factory/agents.py CHANGED
@@ -0,0 +1,339 @@
+ """Agent factory functions for creating research agents.
+
+ Provides factory functions for creating all Pydantic AI agents used in
+ the research workflows, following the pattern from judges.py.
+ """
+
+ from typing import TYPE_CHECKING, Any
+
+ import structlog
+
+ from src.utils.config import settings
+ from src.utils.exceptions import ConfigurationError
+
+ if TYPE_CHECKING:
+     from src.agent_factory.graph_builder import GraphBuilder
+     from src.agents.input_parser import InputParserAgent
+     from src.agents.knowledge_gap import KnowledgeGapAgent
+     from src.agents.long_writer import LongWriterAgent
+     from src.agents.proofreader import ProofreaderAgent
+     from src.agents.thinking import ThinkingAgent
+     from src.agents.tool_selector import ToolSelectorAgent
+     from src.agents.writer import WriterAgent
+     from src.orchestrator.graph_orchestrator import GraphOrchestrator
+     from src.orchestrator.planner_agent import PlannerAgent
+     from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow
+
+ logger = structlog.get_logger()
+
+
+ def create_input_parser_agent(model: Any | None = None) -> "InputParserAgent":
+     """
+     Create input parser agent for query analysis and research mode detection.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured InputParserAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.input_parser import create_input_parser_agent as _create_agent
+
+     try:
+         logger.debug("Creating input parser agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create input parser agent", error=str(e))
+         raise ConfigurationError(f"Failed to create input parser agent: {e}") from e
+
+
+ def create_planner_agent(model: Any | None = None) -> "PlannerAgent":
+     """
+     Create planner agent with web search and crawl tools.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured PlannerAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     # Lazy import to avoid circular dependencies
+     from src.orchestrator.planner_agent import create_planner_agent as _create_planner_agent
+
+     try:
+         logger.debug("Creating planner agent")
+         return _create_planner_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create planner agent", error=str(e))
+         raise ConfigurationError(f"Failed to create planner agent: {e}") from e
+
+
+ def create_knowledge_gap_agent(model: Any | None = None) -> "KnowledgeGapAgent":
+     """
+     Create knowledge gap agent for evaluating research completeness.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured KnowledgeGapAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.knowledge_gap import create_knowledge_gap_agent as _create_agent
+
+     try:
+         logger.debug("Creating knowledge gap agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create knowledge gap agent", error=str(e))
+         raise ConfigurationError(f"Failed to create knowledge gap agent: {e}") from e
+
+
+ def create_tool_selector_agent(model: Any | None = None) -> "ToolSelectorAgent":
+     """
+     Create tool selector agent for choosing tools to address gaps.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured ToolSelectorAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.tool_selector import create_tool_selector_agent as _create_agent
+
+     try:
+         logger.debug("Creating tool selector agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create tool selector agent", error=str(e))
+         raise ConfigurationError(f"Failed to create tool selector agent: {e}") from e
+
+
+ def create_thinking_agent(model: Any | None = None) -> "ThinkingAgent":
+     """
+     Create thinking agent for generating observations.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured ThinkingAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.thinking import create_thinking_agent as _create_agent
+
+     try:
+         logger.debug("Creating thinking agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create thinking agent", error=str(e))
+         raise ConfigurationError(f"Failed to create thinking agent: {e}") from e
+
+
+ def create_writer_agent(model: Any | None = None) -> "WriterAgent":
+     """
+     Create writer agent for generating final reports.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured WriterAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.writer import create_writer_agent as _create_agent
+
+     try:
+         logger.debug("Creating writer agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create writer agent", error=str(e))
+         raise ConfigurationError(f"Failed to create writer agent: {e}") from e
+
+
+ def create_long_writer_agent(model: Any | None = None) -> "LongWriterAgent":
+     """
+     Create long writer agent for iteratively writing report sections.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured LongWriterAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.long_writer import create_long_writer_agent as _create_agent
+
+     try:
+         logger.debug("Creating long writer agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create long writer agent", error=str(e))
+         raise ConfigurationError(f"Failed to create long writer agent: {e}") from e
+
+
+ def create_proofreader_agent(model: Any | None = None) -> "ProofreaderAgent":
+     """
+     Create proofreader agent for finalizing report drafts.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured ProofreaderAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.proofreader import create_proofreader_agent as _create_agent
+
+     try:
+         logger.debug("Creating proofreader agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create proofreader agent", error=str(e))
+         raise ConfigurationError(f"Failed to create proofreader agent: {e}") from e
+
+
+ def create_iterative_flow(
+     max_iterations: int = 5,
+     max_time_minutes: int = 10,
+     verbose: bool = True,
+     use_graph: bool | None = None,
+ ) -> "IterativeResearchFlow":
+     """
+     Create iterative research flow.
+
+     Args:
+         max_iterations: Maximum number of iterations
+         max_time_minutes: Maximum time in minutes
+         verbose: Whether to log progress
+         use_graph: Whether to use graph execution. If None, reads from settings.use_graph_execution
+
+     Returns:
+         Configured IterativeResearchFlow instance
+     """
+     from src.orchestrator.research_flow import IterativeResearchFlow
+
+     try:
+         # Use settings default if not explicitly provided
+         if use_graph is None:
+             use_graph = settings.use_graph_execution
+
+         logger.debug("Creating iterative research flow", use_graph=use_graph)
+         return IterativeResearchFlow(
+             max_iterations=max_iterations,
+             max_time_minutes=max_time_minutes,
+             verbose=verbose,
+             use_graph=use_graph,
+         )
+     except Exception as e:
+         logger.error("Failed to create iterative flow", error=str(e))
+         raise ConfigurationError(f"Failed to create iterative flow: {e}") from e
+
+
+ def create_deep_flow(
+     max_iterations: int = 5,
+     max_time_minutes: int = 10,
+     verbose: bool = True,
+     use_long_writer: bool = True,
+     use_graph: bool | None = None,
+ ) -> "DeepResearchFlow":
+     """
+     Create deep research flow.
+
+     Args:
+         max_iterations: Maximum iterations per section
+         max_time_minutes: Maximum time per section
+         verbose: Whether to log progress
+         use_long_writer: Whether to use long writer (True) or proofreader (False)
+         use_graph: Whether to use graph execution. If None, reads from settings.use_graph_execution
+
+     Returns:
+         Configured DeepResearchFlow instance
+     """
+     from src.orchestrator.research_flow import DeepResearchFlow
+
+     try:
+         # Use settings default if not explicitly provided
+         if use_graph is None:
+             use_graph = settings.use_graph_execution
+
+         logger.debug("Creating deep research flow", use_graph=use_graph)
+         return DeepResearchFlow(
+             max_iterations=max_iterations,
+             max_time_minutes=max_time_minutes,
+             verbose=verbose,
+             use_long_writer=use_long_writer,
+             use_graph=use_graph,
+         )
+     except Exception as e:
+         logger.error("Failed to create deep flow", error=str(e))
+         raise ConfigurationError(f"Failed to create deep flow: {e}") from e
+
+
+ def create_graph_orchestrator(
+     mode: str = "auto",
+     max_iterations: int = 5,
+     max_time_minutes: int = 10,
+     use_graph: bool = True,
+ ) -> "GraphOrchestrator":
+     """
+     Create graph orchestrator.
+
+     Args:
+         mode: Research mode ("iterative", "deep", or "auto")
+         max_iterations: Maximum iterations per loop
+         max_time_minutes: Maximum time per loop
+         use_graph: Whether to use graph execution (True) or agent chains (False)
+
+     Returns:
+         Configured GraphOrchestrator instance
+     """
+     from src.orchestrator.graph_orchestrator import create_graph_orchestrator as _create
+
+     try:
+         logger.debug("Creating graph orchestrator", mode=mode, use_graph=use_graph)
+         return _create(
+             mode=mode,  # type: ignore[arg-type]
+             max_iterations=max_iterations,
+             max_time_minutes=max_time_minutes,
+             use_graph=use_graph,
+         )
+     except Exception as e:
+         logger.error("Failed to create graph orchestrator", error=str(e))
+         raise ConfigurationError(f"Failed to create graph orchestrator: {e}") from e
+
+
+ def create_graph_builder() -> "GraphBuilder":
+     """
+     Create a graph builder instance.
+
+     Returns:
+         GraphBuilder instance
+     """
+     from src.agent_factory.graph_builder import GraphBuilder
+
+     try:
+         logger.debug("Creating graph builder")
+         return GraphBuilder()
+     except Exception as e:
+         logger.error("Failed to create graph builder", error=str(e))
+         raise ConfigurationError(f"Failed to create graph builder: {e}") from e
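Taken together, these factories give callers a single import surface for every agent and flow. A hedged usage sketch (it assumes provider credentials are configured; the flow's run method is elided because it lives in research_flow.py, not in this diff):

import asyncio

from src.agent_factory.agents import create_deep_flow, create_input_parser_agent

async def main() -> None:
    parser = create_input_parser_agent()
    parsed = await parser.parse("Comprehensive report on GLP-1 agonists")
    # Route to the deep flow when the parser detects a multi-section query.
    if parsed.research_mode == "deep":
        flow = create_deep_flow(max_iterations=3, use_graph=False)
        ...  # invoke the flow's entry point (defined in research_flow.py)

asyncio.run(main())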
src/agent_factory/graph_builder.py ADDED
@@ -0,0 +1,608 @@
+ """Graph builder utilities for constructing research workflow graphs.
+
+ Provides classes and utilities for building graph-based orchestration systems
+ using Pydantic AI agents as nodes.
+ """
+
+ from collections.abc import Callable
+ from typing import TYPE_CHECKING, Any, Literal
+
+ import structlog
+ from pydantic import BaseModel, Field
+
+ if TYPE_CHECKING:
+     from pydantic_ai import Agent
+
+     from src.middleware.state_machine import WorkflowState
+
+ logger = structlog.get_logger()
+
+
+ # ============================================================================
+ # Graph Node Models
+ # ============================================================================
+
+
+ class GraphNode(BaseModel):
+     """Base class for graph nodes."""
+
+     node_id: str = Field(description="Unique identifier for the node")
+     node_type: Literal["agent", "state", "decision", "parallel"] = Field(description="Type of node")
+     description: str = Field(default="", description="Human-readable description of the node")
+
+     model_config = {"frozen": True}
+
+
+ class AgentNode(GraphNode):
+     """Node that executes a Pydantic AI agent."""
+
+     node_type: Literal["agent"] = "agent"
+     agent: Any = Field(description="Pydantic AI agent to execute")
+     input_transformer: Callable[[Any], Any] | None = Field(
+         default=None, description="Transform input before passing to agent"
+     )
+     output_transformer: Callable[[Any], Any] | None = Field(
+         default=None, description="Transform output after agent execution"
+     )
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ class StateNode(GraphNode):
+     """Node that updates or reads workflow state."""
+
+     node_type: Literal["state"] = "state"
+     state_updater: Callable[[Any, Any], Any] = Field(
+         description="Function to update workflow state"
+     )
+     state_reader: Callable[[Any], Any] | None = Field(
+         default=None, description="Function to read state (optional)"
+     )
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ class DecisionNode(GraphNode):
+     """Node that makes routing decisions based on conditions."""
+
+     node_type: Literal["decision"] = "decision"
+     decision_function: Callable[[Any], str] = Field(
+         description="Function that returns next node ID based on input"
+     )
+     options: list[str] = Field(description="List of possible next node IDs", min_length=1)
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ class ParallelNode(GraphNode):
+     """Node that executes multiple nodes in parallel."""
+
+     node_type: Literal["parallel"] = "parallel"
+     parallel_nodes: list[str] = Field(
+         description="List of node IDs to run in parallel", min_length=0
+     )
+     aggregator: Callable[[list[Any]], Any] | None = Field(
+         default=None, description="Function to aggregate parallel results"
+     )
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ # ============================================================================
+ # Graph Edge Models
+ # ============================================================================
+
+
+ class GraphEdge(BaseModel):
+     """Base class for graph edges."""
+
+     from_node: str = Field(description="Source node ID")
+     to_node: str = Field(description="Target node ID")
+     condition: Callable[[Any], bool] | None = Field(
+         default=None, description="Optional condition function"
+     )
+     weight: float = Field(default=1.0, description="Edge weight for routing decisions")
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ class SequentialEdge(GraphEdge):
+     """Edge that is always traversed (no condition)."""
+
+     condition: None = None
+
+
+ class ConditionalEdge(GraphEdge):
+     """Edge that is traversed based on a condition."""
+
+     condition: Callable[[Any], bool] = Field(description="Required condition function")
+     condition_description: str = Field(
+         default="", description="Human-readable description of condition"
+     )
+
+
+ class ParallelEdge(GraphEdge):
+     """Edge used for parallel execution branches."""
+
+     condition: None = None
+
+
+ # ============================================================================
+ # Research Graph Class
+ # ============================================================================
+
+
+ class ResearchGraph(BaseModel):
+     """Represents a research workflow graph with nodes and edges."""
+
+     nodes: dict[str, GraphNode] = Field(default_factory=dict, description="All nodes in the graph")
+     edges: dict[str, list[GraphEdge]] = Field(
+         default_factory=dict, description="Edges by source node ID"
+     )
+     entry_node: str = Field(description="Starting node ID")
+     exit_nodes: list[str] = Field(default_factory=list, description="Terminal node IDs")
+
+     model_config = {"arbitrary_types_allowed": True}
+
+     def add_node(self, node: GraphNode) -> None:
+         """Add a node to the graph.
+
+         Args:
+             node: The node to add
+
+         Raises:
+             ValueError: If node ID already exists
+         """
+         if node.node_id in self.nodes:
+             raise ValueError(f"Node {node.node_id} already exists in graph")
+         self.nodes[node.node_id] = node
+         logger.debug("Node added to graph", node_id=node.node_id, type=node.node_type)
+
+     def add_edge(self, edge: GraphEdge) -> None:
+         """Add an edge to the graph.
+
+         Args:
+             edge: The edge to add
+
+         Raises:
+             ValueError: If source or target node doesn't exist
+         """
+         if edge.from_node not in self.nodes:
+             raise ValueError(f"Source node {edge.from_node} not found in graph")
+         if edge.to_node not in self.nodes:
+             raise ValueError(f"Target node {edge.to_node} not found in graph")
+
+         if edge.from_node not in self.edges:
+             self.edges[edge.from_node] = []
+         self.edges[edge.from_node].append(edge)
+         logger.debug(
+             "Edge added to graph",
+             from_node=edge.from_node,
+             to_node=edge.to_node,
+         )
+
+     def get_node(self, node_id: str) -> GraphNode | None:
+         """Get a node by ID.
+
+         Args:
+             node_id: The node ID
+
+         Returns:
+             The node, or None if not found
+         """
+         return self.nodes.get(node_id)
+
+     def get_next_nodes(self, node_id: str, context: Any = None) -> list[tuple[str, GraphEdge]]:
+         """Get all possible next nodes from a given node.
+
+         Args:
+             node_id: The current node ID
+             context: Optional context for evaluating conditions
+
+         Returns:
+             List of (node_id, edge) tuples for valid next nodes
+         """
+         if node_id not in self.edges:
+             return []
+
+         next_nodes = []
+         for edge in self.edges[node_id]:
+             # Evaluate condition if present
+             if edge.condition is None or edge.condition(context):
+                 next_nodes.append((edge.to_node, edge))
+
+         return next_nodes
+
+     def validate_structure(self) -> list[str]:
+         """Validate the graph structure.
+
+         Returns:
+             List of validation error messages (empty if valid)
+         """
+         errors = []
+
+         # Check entry node exists
+         if self.entry_node not in self.nodes:
+             errors.append(f"Entry node {self.entry_node} not found in graph")
+
+         # Check exit nodes exist and at least one is defined
+         if not self.exit_nodes:
+             errors.append("At least one exit node must be defined")
+         for exit_node in self.exit_nodes:
+             if exit_node not in self.nodes:
+                 errors.append(f"Exit node {exit_node} not found in graph")
+
+         # Check all edges reference valid nodes
+         for from_node, edge_list in self.edges.items():
+             if from_node not in self.nodes:
+                 errors.append(f"Edge source node {from_node} not found")
+             for edge in edge_list:
+                 if edge.to_node not in self.nodes:
+                     errors.append(f"Edge target node {edge.to_node} not found")
+
+         # Check all nodes are reachable from entry node (basic check)
+         if self.entry_node in self.nodes:
+             reachable = {self.entry_node}
+             queue = [self.entry_node]
+             while queue:
+                 current = queue.pop(0)
+                 for next_node, _ in self.get_next_nodes(current):
+                     if next_node not in reachable:
+                         reachable.add(next_node)
+                         queue.append(next_node)
+
+             unreachable = set(self.nodes.keys()) - reachable
+             if unreachable:
+                 errors.append(f"Unreachable nodes from entry node: {', '.join(unreachable)}")
+
+         return errors
+
+
+ # ============================================================================
+ # Graph Builder Class
+ # ============================================================================
+
+
+ class GraphBuilder:
+     """Builder for constructing research workflow graphs."""
+
+     def __init__(self) -> None:
+         """Initialize the graph builder."""
+         self.graph = ResearchGraph(entry_node="", exit_nodes=[])
+
+     def add_agent_node(
+         self,
+         node_id: str,
+         agent: "Agent[Any, Any]",
+         description: str = "",
+         input_transformer: Callable[[Any], Any] | None = None,
+         output_transformer: Callable[[Any], Any] | None = None,
+     ) -> "GraphBuilder":
+         """Add an agent node to the graph.
+
+         Args:
+             node_id: Unique identifier for the node
+             agent: Pydantic AI agent to execute
+             description: Human-readable description
+             input_transformer: Optional input transformation function
+             output_transformer: Optional output transformation function
+
+         Returns:
+             Self for method chaining
+         """
+         node = AgentNode(
+             node_id=node_id,
+             agent=agent,
+             description=description,
+             input_transformer=input_transformer,
+             output_transformer=output_transformer,
+         )
+         self.graph.add_node(node)
+         return self
+
+     def add_state_node(
+         self,
+         node_id: str,
+         state_updater: Callable[["WorkflowState", Any], "WorkflowState"],
+         description: str = "",
+         state_reader: Callable[["WorkflowState"], Any] | None = None,
+     ) -> "GraphBuilder":
+         """Add a state node to the graph.
+
+         Args:
+             node_id: Unique identifier for the node
+             state_updater: Function to update workflow state
+             description: Human-readable description
+             state_reader: Optional function to read state
+
+         Returns:
+             Self for method chaining
+         """
+         node = StateNode(
+             node_id=node_id,
+             state_updater=state_updater,
+             description=description,
+             state_reader=state_reader,
+         )
+         self.graph.add_node(node)
+         return self
+
+     def add_decision_node(
+         self,
+         node_id: str,
+         decision_function: Callable[[Any], str],
+         options: list[str],
+         description: str = "",
+     ) -> "GraphBuilder":
+         """Add a decision node to the graph.
+
+         Args:
+             node_id: Unique identifier for the node
+             decision_function: Function that returns next node ID
+             options: List of possible next node IDs
+             description: Human-readable description
+
+         Returns:
+             Self for method chaining
+         """
+         node = DecisionNode(
+             node_id=node_id,
+             decision_function=decision_function,
+             options=options,
+             description=description,
+         )
+         self.graph.add_node(node)
+         return self
+
+     def add_parallel_node(
+         self,
+         node_id: str,
+         parallel_nodes: list[str],
+         description: str = "",
+         aggregator: Callable[[list[Any]], Any] | None = None,
+     ) -> "GraphBuilder":
+         """Add a parallel node to the graph.
+
+         Args:
+             node_id: Unique identifier for the node
+             parallel_nodes: List of node IDs to run in parallel
+             description: Human-readable description
+             aggregator: Optional function to aggregate results
+
+         Returns:
+             Self for method chaining
+         """
+         node = ParallelNode(
+             node_id=node_id,
+             parallel_nodes=parallel_nodes,
+             description=description,
+             aggregator=aggregator,
+         )
+         self.graph.add_node(node)
+         return self
+
+     def connect_nodes(
+         self,
+         from_node: str,
+         to_node: str,
+         condition: Callable[[Any], bool] | None = None,
+         condition_description: str = "",
+     ) -> "GraphBuilder":
+         """Connect two nodes with an edge.
+
+         Args:
+             from_node: Source node ID
+             to_node: Target node ID
+             condition: Optional condition function
+             condition_description: Description of condition (if conditional)
+
+         Returns:
+             Self for method chaining
+         """
+         if condition is None:
+             edge: GraphEdge = SequentialEdge(from_node=from_node, to_node=to_node)
+         else:
+             edge = ConditionalEdge(
+                 from_node=from_node,
+                 to_node=to_node,
+                 condition=condition,
+                 condition_description=condition_description,
+             )
+         self.graph.add_edge(edge)
+         return self
+
+     def set_entry_node(self, node_id: str) -> "GraphBuilder":
+         """Set the entry node for the graph.
+
+         Args:
+             node_id: The entry node ID
+
+         Returns:
+             Self for method chaining
+         """
+         self.graph.entry_node = node_id
+         return self
+
+     def set_exit_nodes(self, node_ids: list[str]) -> "GraphBuilder":
+         """Set the exit nodes for the graph.
+
+         Args:
+             node_ids: List of exit node IDs
+
+         Returns:
+             Self for method chaining
+         """
+         self.graph.exit_nodes = node_ids
+         return self
+
+     def build(self) -> ResearchGraph:
+         """Finalize graph construction and validate.
+
+         Returns:
+             The constructed ResearchGraph
+
+         Raises:
+             ValueError: If graph validation fails
+         """
+         errors = self.graph.validate_structure()
+         if errors:
+             error_msg = "Graph validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
+             logger.error("Graph validation failed", errors=errors)
+             raise ValueError(error_msg)
+
+         logger.info(
+             "Graph built successfully",
+             nodes=len(self.graph.nodes),
+             edges=sum(len(edges) for edges in self.graph.edges.values()),
+             entry_node=self.graph.entry_node,
+             exit_nodes=self.graph.exit_nodes,
+         )
+         return self.graph
+
+
+ # ============================================================================
+ # Factory Functions
+ # ============================================================================
+
+
+ def create_iterative_graph(
+     knowledge_gap_agent: "Agent[Any, Any]",
+     tool_selector_agent: "Agent[Any, Any]",
+     thinking_agent: "Agent[Any, Any]",
+     writer_agent: "Agent[Any, Any]",
+ ) -> ResearchGraph:
+     """Create a graph for iterative research flow.
+
+     Args:
+         knowledge_gap_agent: Agent for evaluating knowledge gaps
+         tool_selector_agent: Agent for selecting tools
+         thinking_agent: Agent for generating observations
+         writer_agent: Agent for writing final report
+
+     Returns:
+         Constructed ResearchGraph for iterative research
+     """
+     builder = GraphBuilder()
+
+     # Add nodes
+     builder.add_agent_node("thinking", thinking_agent, "Generate observations")
+     builder.add_agent_node("knowledge_gap", knowledge_gap_agent, "Evaluate knowledge gaps")
+     builder.add_decision_node(
+         "continue_decision",
+         decision_function=lambda result: "writer"
+         if getattr(result, "research_complete", False)
+         else "tool_selector",
+         options=["tool_selector", "writer"],
+         description="Decide whether to continue research or write report",
+     )
+     builder.add_agent_node("tool_selector", tool_selector_agent, "Select tools to address gap")
+     builder.add_state_node(
+         "execute_tools",
+         state_updater=lambda state, tasks: state,  # Placeholder - actual execution handled separately
+         description="Execute selected tools",
+     )
+     builder.add_agent_node("writer", writer_agent, "Write final report")
+
+     # Add edges
+     builder.connect_nodes("thinking", "knowledge_gap")
+     builder.connect_nodes("knowledge_gap", "continue_decision")
+     builder.connect_nodes("continue_decision", "tool_selector")
+     builder.connect_nodes("continue_decision", "writer")
+     builder.connect_nodes("tool_selector", "execute_tools")
+     builder.connect_nodes("execute_tools", "thinking")  # Loop back
+
+     # Set entry and exit
+     builder.set_entry_node("thinking")
+     builder.set_exit_nodes(["writer"])
+
+     return builder.build()
+
+
+ def create_deep_graph(
+     planner_agent: "Agent[Any, Any]",
+     knowledge_gap_agent: "Agent[Any, Any]",
+     tool_selector_agent: "Agent[Any, Any]",
+     thinking_agent: "Agent[Any, Any]",
+     writer_agent: "Agent[Any, Any]",
+     long_writer_agent: "Agent[Any, Any]",
+ ) -> ResearchGraph:
+     """Create a graph for deep research flow.
+
+     The graph structure: planner → store_plan → parallel_loops → collect_drafts → synthesizer
+
+     Args:
+         planner_agent: Agent for creating report plan
+         knowledge_gap_agent: Agent for evaluating knowledge gaps (not used directly, but needed for iterative flows)
+         tool_selector_agent: Agent for selecting tools (not used directly, but needed for iterative flows)
+         thinking_agent: Agent for generating observations (not used directly, but needed for iterative flows)
+         writer_agent: Agent for writing section reports (not used directly, but needed for iterative flows)
+         long_writer_agent: Agent for synthesizing final report
+
+     Returns:
+         Constructed ResearchGraph for deep research
+     """
+     from src.utils.models import ReportPlan
+
+     builder = GraphBuilder()
+
+     # Add nodes
+     # 1. Planner agent - creates report plan
+     builder.add_agent_node("planner", planner_agent, "Create report plan with sections")
+
+     # 2. State node - store report plan in workflow state
+     def store_plan(state: "WorkflowState", plan: ReportPlan) -> "WorkflowState":
+         """Store report plan in state for parallel loops to access."""
+         # Store plan in a custom attribute (we'll need to extend WorkflowState or use a dict)
+         # For now, we'll store it in the context's node_results
+         # The actual storage will happen in the graph execution
+         return state
+
+     builder.add_state_node(
+         "store_plan",
+         state_updater=store_plan,
+         description="Store report plan in state",
+     )
+
+     # 3. Parallel node - will execute iterative research flows for each section
+     # The actual execution will be handled dynamically in _execute_parallel_node()
+     # We use a special node ID that the executor will recognize
+     builder.add_parallel_node(
+         "parallel_loops",
+         parallel_nodes=[],  # Will be populated dynamically based on report plan
+         description="Execute parallel iterative research loops for each section",
+         aggregator=lambda results: results,  # Collect all section drafts
+     )
+
+     # 4. State node - collect section drafts into ReportDraft
+     def collect_drafts(state: "WorkflowState", section_drafts: list[str]) -> "WorkflowState":
+         """Collect section drafts into state for synthesizer."""
+         # Store drafts in state (will be accessed by synthesizer)
+         return state
+
+     builder.add_state_node(
+         "collect_drafts",
+         state_updater=collect_drafts,
+         description="Collect section drafts for synthesis",
+     )
+
+     # 5. Synthesizer agent - creates final report from drafts
+     builder.add_agent_node(
+         "synthesizer", long_writer_agent, "Synthesize final report from section drafts"
+     )
+
+     # Add edges
+     builder.connect_nodes("planner", "store_plan")
+     builder.connect_nodes("store_plan", "parallel_loops")
+     builder.connect_nodes("parallel_loops", "collect_drafts")
+     builder.connect_nodes("collect_drafts", "synthesizer")
+
+     # Set entry and exit
+     builder.set_entry_node("planner")
+     builder.set_exit_nodes(["synthesizer"])
+
+     return builder.build()
+
+
+ # No need to rebuild models since we're using Any types
+ # The models will work correctly with arbitrary_types_allowed=True
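The builder API above composes with any Pydantic AI agent. A small self-contained sketch, using pydantic-ai's TestModel as a stand-in so no API keys are needed (an illustrative assumption; real agents would come from the factories in agents.py):

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

from src.agent_factory.graph_builder import GraphBuilder

thinking_agent = Agent(TestModel())  # stand-in agents; no real LLM calls
writer_agent = Agent(TestModel())

builder = GraphBuilder()
builder.add_agent_node("thinking", thinking_agent, "Generate observations")
builder.add_agent_node("writer", writer_agent, "Write final report")
builder.add_decision_node(
    "route",
    decision_function=lambda r: "writer" if getattr(r, "research_complete", False) else "thinking",
    options=["thinking", "writer"],
)
builder.connect_nodes("thinking", "route")
builder.connect_nodes("route", "thinking")  # loop back while gaps remain
builder.connect_nodes("route", "writer")
builder.set_entry_node("thinking")
builder.set_exit_nodes(["writer"])

graph = builder.build()  # raises ValueError listing any missing/unreachable nodes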
src/agent_factory/judges.py CHANGED
@@ -9,7 +9,7 @@ from huggingface_hub import InferenceClient
 from pydantic_ai import Agent
 from pydantic_ai.models.anthropic import AnthropicModel
 from pydantic_ai.models.huggingface import HuggingFaceModel
- from pydantic_ai.models.openai import OpenAIModel
+ from pydantic_ai.models.openai import OpenAIChatModel as OpenAIModel
 from pydantic_ai.providers.anthropic import AnthropicProvider
 from pydantic_ai.providers.huggingface import HuggingFaceProvider
 from pydantic_ai.providers.openai import OpenAIProvider
@@ -40,15 +40,21 @@ def get_model() -> Any:

     if llm_provider == "huggingface":
         # Free tier - uses HF_TOKEN from environment if available
-         model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
+         model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
         hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
         return HuggingFaceModel(model_name, provider=hf_provider)

-     if llm_provider != "openai":
-         logger.warning("Unknown LLM provider, defaulting to OpenAI", provider=llm_provider)
-     openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
-     return OpenAIModel(settings.openai_model, provider=openai_provider)
+     if llm_provider == "openai":
+         openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
+         return OpenAIModel(settings.openai_model, provider=openai_provider)
+
+     # Default to HuggingFace if provider is unknown or not specified
+     if llm_provider != "huggingface":
+         logger.warning("Unknown LLM provider, defaulting to HuggingFace", provider=llm_provider)
+
+     model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
+     hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
+     return HuggingFaceModel(model_name, provider=hf_provider)


 class JudgeHandler:
@@ -359,6 +365,15 @@ IMPORTANT: Respond with ONLY valid JSON matching this schema:
     )


+ def create_judge_handler() -> JudgeHandler:
+     """Create a judge handler based on configuration.
+
+     Returns:
+         Configured JudgeHandler instance
+     """
+     return JudgeHandler()
+
+
 class MockJudgeHandler:
     """
     Mock JudgeHandler for demo mode without LLM calls.
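With this change the provider fallback inverts: HuggingFace is now the default and OpenAI is opt-in. A short sketch of the intended call sites (hedged; the actual model returned depends on settings.llm_provider and available keys):

from src.agent_factory.judges import create_judge_handler, get_model

model = get_model()  # HuggingFaceModel unless llm_provider == "openai"
handler = create_judge_handler()  # new convenience factory, equivalent to JudgeHandler()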
src/agents/code_executor_agent.py CHANGED
@@ -1,13 +1,13 @@
 """Code execution agent using Modal."""

 import asyncio
+ from typing import Any

 import structlog
 from agent_framework import ChatAgent, ai_function
- from agent_framework.openai import OpenAIChatClient

 from src.tools.code_execution import get_code_executor
- from src.utils.config import settings
+ from src.utils.llm_factory import get_chat_client_for_agent

 logger = structlog.get_logger()

@@ -40,19 +40,17 @@ async def execute_python_code(code: str) -> str:
         return f"Execution failed: {e}"


- def create_code_executor_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+ def create_code_executor_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a code executor agent.

     Args:
-         chat_client: Optional custom chat client.
+         chat_client: Optional custom chat client. If None, uses factory default
+             (HuggingFace preferred, OpenAI fallback).

     Returns:
         ChatAgent configured for code execution.
     """
-     client = chat_client or OpenAIChatClient(
-         model_id=settings.openai_model,
-         api_key=settings.openai_api_key,
-     )
+     client = chat_client or get_chat_client_for_agent()

     return ChatAgent(
         name="CodeExecutorAgent",
src/agents/input_parser.py ADDED
@@ -0,0 +1,178 @@
+ """Input parser agent for analyzing and improving user queries.
+
+ Determines research mode (iterative vs deep) and extracts key information
+ from user queries to improve research quality.
+ """
+
+ from typing import TYPE_CHECKING, Any, Literal
+
+ import structlog
+ from pydantic_ai import Agent
+
+ from src.agent_factory.judges import get_model
+ from src.utils.exceptions import ConfigurationError, JudgeError
+ from src.utils.models import ParsedQuery
+
+ if TYPE_CHECKING:
+     pass
+
+ logger = structlog.get_logger()
+
+ # System prompt for the input parser agent
+ SYSTEM_PROMPT = """
+ You are an expert research query analyzer. Your job is to analyze user queries and determine:
+ 1. Whether the query requires iterative research (single focused question) or deep research (multiple sections/topics)
+ 2. Improve and refine the query for better research results
+ 3. Extract key entities (drugs, diseases, targets, companies, etc.)
+ 4. Extract specific research questions
+
+ Guidelines for determining research mode:
+ - **Iterative mode**: Single focused question, straightforward research goal, can be answered with a focused search loop
+   Examples: "What is the mechanism of metformin?", "Find clinical trials for drug X"
+
+ - **Deep mode**: Complex query requiring multiple sections, comprehensive report, multiple related topics
+   Examples: "Write a comprehensive report on diabetes treatment", "Analyze the market for quantum computing"
+   Indicators: words like "comprehensive", "report", "sections", "analyze", "market analysis", "overview"
+
+ Your output must be valid JSON matching the ParsedQuery schema. Always provide:
+ - original_query: The exact input query
+ - improved_query: A refined, clearer version of the query
+ - research_mode: Either "iterative" or "deep"
+ - key_entities: List of important entities (drugs, diseases, companies, etc.)
+ - research_questions: List of specific questions to answer
+
+ Only output JSON. Do not output anything else.
+ """
+
+
+ class InputParserAgent:
+     """
+     Input parser agent that analyzes queries and determines research mode.
+
+     Uses Pydantic AI to generate structured ParsedQuery output with research
+     mode detection, query improvement, and entity extraction.
+     """
+
+     def __init__(self, model: Any | None = None) -> None:
+         """
+         Initialize the input parser agent.
+
+         Args:
+             model: Optional Pydantic AI model. If None, uses config default.
+         """
+         self.model = model or get_model()
+         self.logger = logger
+
+         # Initialize Pydantic AI Agent
+         self.agent = Agent(
+             model=self.model,
+             output_type=ParsedQuery,
+             system_prompt=SYSTEM_PROMPT,
+             retries=3,
+         )
+
+     async def parse(self, query: str) -> ParsedQuery:
+         """
+         Parse and analyze a user query.
+
+         Args:
+             query: The user's research query
+
+         Returns:
+             ParsedQuery with research mode, improved query, entities, and questions
+
+         Raises:
+             JudgeError: If parsing fails after retries
+             ConfigurationError: If agent configuration is invalid
+         """
+         self.logger.info("Parsing user query", query=query[:100])
+
+         user_message = f"QUERY: {query}"
+
+         try:
+             # Run the agent
+             result = await self.agent.run(user_message)
+             parsed_query = result.output
+
+             # Validate parsed query
+             if not parsed_query.original_query:
+                 self.logger.warning("Parsed query missing original_query", query=query[:100])
+                 raise JudgeError("Parsed query must have original_query")
+
+             if not parsed_query.improved_query:
+                 self.logger.warning("Parsed query missing improved_query", query=query[:100])
+                 # Use original as fallback
+                 parsed_query = ParsedQuery(
+                     original_query=parsed_query.original_query,
+                     improved_query=parsed_query.original_query,
+                     research_mode=parsed_query.research_mode,
+                     key_entities=parsed_query.key_entities,
+                     research_questions=parsed_query.research_questions,
+                 )
+
+             self.logger.info(
+                 "Query parsed successfully",
+                 mode=parsed_query.research_mode,
+                 entities=len(parsed_query.key_entities),
+                 questions=len(parsed_query.research_questions),
+             )
+
+             return parsed_query
+
+         except Exception as e:
+             self.logger.error("Query parsing failed", error=str(e), query=query[:100])
+
+             # Fallback: return basic parsed query with heuristic mode detection
+             if isinstance(e, JudgeError | ConfigurationError):
+                 raise
+
+             # Heuristic fallback
+             query_lower = query.lower()
+             research_mode: Literal["iterative", "deep"] = "iterative"
+             if any(
+                 keyword in query_lower
+                 for keyword in [
+                     "comprehensive",
+                     "report",
+                     "sections",
+                     "analyze",
+                     "analysis",
+                     "overview",
+                     "market",
+                 ]
+             ):
+                 research_mode = "deep"
+
+             return ParsedQuery(
+                 original_query=query,
+                 improved_query=query,
+                 research_mode=research_mode,
+                 key_entities=[],
+                 research_questions=[],
+             )
+
+
+ def create_input_parser_agent(model: Any | None = None) -> InputParserAgent:
+     """
+     Factory function to create an input parser agent.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured InputParserAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     try:
+         # Get model from settings if not provided
+         if model is None:
+             model = get_model()
+
+         # Create and return input parser agent
+         return InputParserAgent(model=model)
+
+     except Exception as e:
+         logger.error("Failed to create input parser agent", error=str(e))
+         raise ConfigurationError(f"Failed to create input parser agent: {e}") from e
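A minimal end-to-end sketch of the parser (assumes model credentials are configured; the query is illustrative):

import asyncio

from src.agents.input_parser import create_input_parser_agent

async def main() -> None:
    parser = create_input_parser_agent()
    parsed = await parser.parse("Write a comprehensive report on CRISPR therapeutics")
    # research_mode drives flow selection; note the heuristic fallback above
    # would also classify this query as "deep" via the "comprehensive" keyword.
    print(parsed.research_mode, parsed.key_entities, parsed.research_questions)

asyncio.run(main())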
src/agents/judge_agent.py CHANGED
@@ -12,7 +12,7 @@ from agent_framework import (
     Role,
 )

- from src.orchestrator import JudgeHandlerProtocol
+ from src.legacy_orchestrator import JudgeHandlerProtocol
 from src.utils.models import Evidence, JudgeAssessment
src/agents/knowledge_gap.py ADDED
@@ -0,0 +1,156 @@
+ """Knowledge gap agent for evaluating research completeness.
+
+ Converts the folder/knowledge_gap_agent.py implementation to use Pydantic AI.
+ """
+
+ from datetime import datetime
+ from typing import Any
+
+ import structlog
+ from pydantic_ai import Agent
+
+ from src.agent_factory.judges import get_model
+ from src.utils.exceptions import ConfigurationError
+ from src.utils.models import KnowledgeGapOutput
+
+ logger = structlog.get_logger()
+
+
+ # System prompt for the knowledge gap agent
+ SYSTEM_PROMPT = f"""
+ You are a Research State Evaluator. Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+ Your job is to critically analyze the current state of a research report,
+ identify what knowledge gaps still exist and determine the best next step to take.
+
+ You will be given:
+ 1. The original user query and any relevant background context to the query
+ 2. A full history of the tasks, actions, findings and thoughts you've made up until this point in the research process
+
+ Your task is to:
+ 1. Carefully review the findings and thoughts, particularly from the latest iteration, and assess their completeness in answering the original query
+ 2. Determine if the findings are sufficiently complete to end the research loop
+ 3. If not, identify up to 3 knowledge gaps that need to be addressed in sequence in order to continue with research - these should be relevant to the original query
+
+ Be specific in the gaps you identify and include relevant information as this will be passed onto another agent to process without additional context.
+
+ Only output JSON. Follow the JSON schema for KnowledgeGapOutput. Do not output anything else.
+ """
+
+
+ class KnowledgeGapAgent:
+     """
+     Agent that evaluates research state and identifies knowledge gaps.
+
+     Uses Pydantic AI to generate structured KnowledgeGapOutput indicating
+     whether research is complete and what gaps remain.
+     """
+
+     def __init__(self, model: Any | None = None) -> None:
+         """
+         Initialize the knowledge gap agent.
+
+         Args:
+             model: Optional Pydantic AI model. If None, uses config default.
+         """
+         self.model = model or get_model()
+         self.logger = logger
+
+         # Initialize Pydantic AI Agent
+         self.agent = Agent(
+             model=self.model,
+             output_type=KnowledgeGapOutput,
+             system_prompt=SYSTEM_PROMPT,
+             retries=3,
+         )
+
+     async def evaluate(
+         self,
+         query: str,
+         background_context: str = "",
+         conversation_history: str = "",
+         iteration: int = 0,
+         time_elapsed_minutes: float = 0.0,
+         max_time_minutes: int = 10,
+     ) -> KnowledgeGapOutput:
+         """
+         Evaluate research state and identify knowledge gaps.
+
+         Args:
+             query: The original research query
+             background_context: Optional background context
+             conversation_history: History of actions, findings, and thoughts
+             iteration: Current iteration number
+             time_elapsed_minutes: Time elapsed so far
+             max_time_minutes: Maximum time allowed
+
+         Returns:
+             KnowledgeGapOutput with research completeness and outstanding gaps
+
+         Raises:
+             JudgeError: If evaluation fails after retries
+         """
+         self.logger.info(
+             "Evaluating knowledge gaps",
+             query=query[:100],
+             iteration=iteration,
+         )
+
+         background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
+
+         user_message = f"""
+ Current Iteration Number: {iteration}
+ Time Elapsed: {time_elapsed_minutes:.2f} minutes of maximum {max_time_minutes} minutes
+
+ ORIGINAL QUERY:
+ {query}
+
+ {background}
+
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
+ {conversation_history or "No previous actions, findings or thoughts available."}
+ """
+
+         try:
+             # Run the agent
+             result = await self.agent.run(user_message)
+             evaluation = result.output
+
+             self.logger.info(
+                 "Knowledge gap evaluation complete",
+                 research_complete=evaluation.research_complete,
+                 gaps_count=len(evaluation.outstanding_gaps),
+             )
+
+             return evaluation
+
+         except Exception as e:
+             self.logger.error("Knowledge gap evaluation failed", error=str(e))
+             # Return fallback: research not complete, suggest continuing
+             return KnowledgeGapOutput(
+                 research_complete=False,
+                 outstanding_gaps=[f"Continue research on: {query}"],
+             )
+
+
+ def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent:
+     """
+     Factory function to create a knowledge gap agent.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured KnowledgeGapAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     try:
+         if model is None:
+             model = get_model()
+
+         return KnowledgeGapAgent(model=model)
+
+     except Exception as e:
+         logger.error("Failed to create knowledge gap agent", error=str(e))
+         raise ConfigurationError(f"Failed to create knowledge gap agent: {e}") from e
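One evaluation step as it might appear inside the iterative loop (a sketch; the history string is fabricated for illustration):

import asyncio

from src.agents.knowledge_gap import create_knowledge_gap_agent

async def main() -> None:
    agent = create_knowledge_gap_agent()
    result = await agent.evaluate(
        query="What is the mechanism of metformin?",
        conversation_history="Iteration 1: searched PubMed; found AMPK activation evidence.",
        iteration=1,
        time_elapsed_minutes=2.5,
        max_time_minutes=10,
    )
    if not result.research_complete:
        print("Outstanding gaps:", result.outstanding_gaps)

asyncio.run(main())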
src/agents/long_writer.py ADDED
@@ -0,0 +1,431 @@
1
+ """Long writer agent for iteratively writing report sections.
2
+
3
+ Converts the folder/long_writer_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ import re
7
+ from datetime import datetime
8
+from typing import Any
+
+import structlog
+from pydantic import BaseModel, Field
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+from src.utils.models import ReportDraft
+
+logger = structlog.get_logger()
+
+
+# LongWriterOutput model for structured output
+class LongWriterOutput(BaseModel):
+    """Output from the long writer agent for a single section."""
+
+    next_section_markdown: str = Field(
+        description="The final draft of the next section in markdown format"
+    )
+    references: list[str] = Field(
+        description="A list of URLs and their corresponding reference numbers for the section"
+    )
+
+    model_config = {"frozen": True}
+
+
+# System prompt for the long writer agent
+SYSTEM_PROMPT = f"""
+You are an expert report writer tasked with iteratively writing each section of a report.
+Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+You will be provided with:
+1. The original research query
+2. A final draft of the report containing the table of contents and all sections written up until this point (in the first iteration there will be no sections written yet)
+3. A first draft of the next section of the report to be written
+
+OBJECTIVE:
+1. Write a final draft of the next section of the report with numbered citations in square brackets in the body of the report
+2. Produce a list of references to be appended to the end of the report
+
+CITATIONS/REFERENCES:
+The citations should be in numerical order, written in numbered square brackets in the body of the report.
+Separately, a list of all URLs and their corresponding reference numbers will be included at the end of the report.
+Follow the example below for formatting.
+
+LongWriterOutput(
+    next_section_markdown="The company specializes in IT consulting [1]. It operates in the software services market which is expected to grow at 10% per year [2].",
+    references=["[1] https://example.com/first-source-url", "[2] https://example.com/second-source-url"]
+)
+
+GUIDELINES:
+- You can reformat and reorganize the flow of the content and headings within a section so that it flows logically, but DO NOT remove details that were included in the first draft
+- Only remove text from the first draft if it is already mentioned earlier in the report, or if it should be covered in a later section per the table of contents
+- Ensure the heading for the section matches the table of contents
+- Format the final output and references section as markdown
+- Do not include a title for the reference section, just a list of numbered references
+
+Only output JSON. Follow the JSON schema for LongWriterOutput. Do not output anything else.
+"""
+
+
+class LongWriterAgent:
+    """
+    Agent that iteratively writes report sections with proper citations.
+
+    Uses Pydantic AI to generate structured LongWriterOutput for each section.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the long writer agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent
+        self.agent = Agent(
+            model=self.model,
+            output_type=LongWriterOutput,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def write_next_section(
+        self,
+        original_query: str,
+        report_draft: str,
+        next_section_title: str,
+        next_section_draft: str,
+    ) -> LongWriterOutput:
+        """
+        Write the next section of the report.
+
+        Args:
+            original_query: The original research query
+            report_draft: Current report draft (all sections written so far)
+            next_section_title: Title of the section to write
+            next_section_draft: Draft content for the next section
+
+        Returns:
+            LongWriterOutput with formatted section and references.
+            Falls back to the unedited draft section if generation fails
+            after all retries.
+        """
+        # Input validation
+        if not original_query or not original_query.strip():
+            self.logger.warning("Empty query provided, using default")
+            original_query = "Research query"
+
+        if not next_section_title or not next_section_title.strip():
+            self.logger.warning("Empty section title provided, using default")
+            next_section_title = "Section"
+
+        if next_section_draft is None:
+            next_section_draft = ""
+
+        if report_draft is None:
+            report_draft = ""
+
+        # Truncate very long inputs
+        max_draft_length = 30000
+        if len(report_draft) > max_draft_length:
+            self.logger.warning(
+                "Report draft too long, truncating",
+                original_length=len(report_draft),
+            )
+            report_draft = report_draft[:max_draft_length] + "\n\n[Content truncated]"
+
+        if len(next_section_draft) > max_draft_length:
+            self.logger.warning(
+                "Section draft too long, truncating",
+                original_length=len(next_section_draft),
+            )
+            next_section_draft = next_section_draft[:max_draft_length] + "\n\n[Content truncated]"
+
+        self.logger.info(
+            "Writing next section",
+            section_title=next_section_title,
+            query=original_query[:100],
+        )
+
+        user_message = f"""
+<ORIGINAL QUERY>
+{original_query}
+</ORIGINAL QUERY>
+
+<CURRENT REPORT DRAFT>
+{report_draft or "No draft yet"}
+</CURRENT REPORT DRAFT>
+
+<TITLE OF NEXT SECTION TO WRITE>
+{next_section_title}
+</TITLE OF NEXT SECTION TO WRITE>
+
+<DRAFT OF NEXT SECTION>
+{next_section_draft}
+</DRAFT OF NEXT SECTION>
+"""
+
+        # Retry logic for transient failures
+        max_retries = 3
+        last_exception: Exception | None = None
+
+        for attempt in range(max_retries):
+            try:
+                # Run the agent
+                result = await self.agent.run(user_message)
+                output = result.output
+
+                # Validate output
+                if not output or not isinstance(output, LongWriterOutput):
+                    raise ValueError("Invalid output format")
+
+                if not output.next_section_markdown or not output.next_section_markdown.strip():
+                    self.logger.warning("Empty section generated, using fallback")
+                    raise ValueError("Empty section generated")
+
+                self.logger.info(
+                    "Section written",
+                    section_title=next_section_title,
+                    references_count=len(output.references),
+                    attempt=attempt + 1,
+                )
+
+                return output
+
+            except (TimeoutError, ConnectionError) as e:
+                # Transient errors - retry
+                last_exception = e
+                if attempt < max_retries - 1:
+                    self.logger.warning(
+                        "Transient error, retrying",
+                        error=str(e),
+                        attempt=attempt + 1,
+                        max_retries=max_retries,
+                    )
+                    continue
+                else:
+                    self.logger.error("Max retries exceeded for transient error", error=str(e))
+                    break
+
+            except Exception as e:
+                # Non-transient errors - don't retry
+                last_exception = e
+                self.logger.error(
+                    "Section writing failed",
+                    error=str(e),
+                    error_type=type(e).__name__,
+                )
+                break
+
+        # Return fallback section if all attempts failed
+        self.logger.error(
+            "Section writing failed after all attempts",
+            error=str(last_exception) if last_exception else "Unknown error",
+        )
+        return LongWriterOutput(
+            next_section_markdown=f"## {next_section_title}\n\n{next_section_draft}",
+            references=[],
+        )
+
+    async def write_report(
+        self,
+        original_query: str,
+        report_title: str,
+        report_draft: ReportDraft,
+    ) -> str:
+        """
+        Write the final report by iteratively writing each section.
+
+        Args:
+            original_query: The original research query
+            report_title: Title of the report
+            report_draft: ReportDraft with all sections
+
+        Returns:
+            Complete markdown report string. Sections that fail generation
+            fall back to their unedited drafts.
+        """
+        # Input validation
+        if not original_query or not original_query.strip():
+            self.logger.warning("Empty query provided, using default")
+            original_query = "Research query"
+
+        if not report_title or not report_title.strip():
+            self.logger.warning("Empty report title provided, using default")
+            report_title = "Research Report"
+
+        if not report_draft or not report_draft.sections:
+            self.logger.warning("Empty report draft provided, returning minimal report")
+            return f"# {report_title}\n\n## Query\n{original_query}\n\n*No sections available.*"
+
+        self.logger.info(
+            "Writing full report",
+            report_title=report_title,
+            sections_count=len(report_draft.sections),
+        )
+
+        # Initialize the final draft with title and table of contents
+        final_draft = (
+            f"# {report_title}\n\n## Table of Contents\n\n"
+            + "\n".join(
+                [
+                    f"{i + 1}. {section.section_title}"
+                    for i, section in enumerate(report_draft.sections)
+                ]
+            )
+            + "\n\n"
+        )
+        all_references: list[str] = []
+
+        for section in report_draft.sections:
+            # Write each section
+            next_section_output = await self.write_next_section(
+                original_query,
+                final_draft,
+                section.section_title,
+                section.section_content,
+            )
+
+            # Reformat references and update section markdown
+            section_markdown, all_references = self._reformat_references(
+                next_section_output.next_section_markdown,
+                next_section_output.references,
+                all_references,
+            )
+
+            # Reformat section headings
+            section_markdown = self._reformat_section_headings(section_markdown)
+
+            # Add to final draft
+            final_draft += section_markdown + "\n\n"
+
+        # Add final references
+        final_draft += "## References:\n\n" + "  \n".join(all_references)
+
+        self.logger.info("Full report written", length=len(final_draft))
+
+        return final_draft
+
+    def _reformat_references(
+        self,
+        section_markdown: str,
+        section_references: list[str],
+        all_references: list[str],
+    ) -> tuple[str, list[str]]:
+        """
+        Reformat references: re-number, de-duplicate, and update markdown.
+
+        Args:
+            section_markdown: Markdown content with inline references [1], [2]
+            section_references: List of references for this section
+            all_references: Accumulated references from previous sections
+
+        Returns:
+            Tuple of (updated markdown, updated all_references)
+        """
+
+        # Convert reference lists to maps (URL -> ref_num)
+        def convert_ref_list_to_map(ref_list: list[str]) -> dict[str, int]:
+            ref_map: dict[str, int] = {}
+            for ref in ref_list:
+                try:
+                    # Parse "[1] https://example.com" format
+                    parts = ref.split("]", 1)
+                    if len(parts) == 2:
+                        ref_num = int(parts[0].strip("["))
+                        url = parts[1].strip()
+                        ref_map[url] = ref_num
+                except (ValueError, IndexError):
+                    logger.warning("Invalid reference format", ref=ref)
+                    continue
+            return ref_map
+
+        section_ref_map = convert_ref_list_to_map(section_references)
+        report_ref_map = convert_ref_list_to_map(all_references)
+        section_to_report_ref_map: dict[int, int] = {}
+
+        report_urls = set(report_ref_map.keys())
+        ref_count = max(report_ref_map.values() or [0])
+
+        # Map section references to report references
+        for url, section_ref_num in section_ref_map.items():
+            if url in report_urls:
+                # URL already exists - reuse its reference number
+                section_to_report_ref_map[section_ref_num] = report_ref_map[url]
+            else:
+                # New URL - assign next reference number
+                ref_count += 1
+                section_to_report_ref_map[section_ref_num] = ref_count
+                all_references.append(f"[{ref_count}] {url}")
+
+        # Replace reference numbers in markdown
+        def replace_reference(match: re.Match[str]) -> str:
+            ref_num = int(match.group(1))
+            mapped_ref_num = section_to_report_ref_map.get(ref_num)
+            if mapped_ref_num:
+                return f"[{mapped_ref_num}]"
+            return ""
+
+        updated_markdown = re.sub(r"\[(\d+)\]", replace_reference, section_markdown)
+
+        return updated_markdown, all_references
+
+    def _reformat_section_headings(self, section_markdown: str) -> str:
+        """
+        Reformat section headings to be consistent (level-2 for main heading).
+
+        Args:
+            section_markdown: Markdown content with headings
+
+        Returns:
+            Updated markdown with adjusted heading levels
+        """
+        if not section_markdown.strip():
+            return section_markdown
+
+        # Find first heading level
+        first_heading_match = re.search(r"^(#+)\s", section_markdown, re.MULTILINE)
+        if not first_heading_match:
+            return section_markdown
+
+        # Calculate level adjustment needed (target is level 2)
+        first_heading_level = len(first_heading_match.group(1))
+        level_adjustment = 2 - first_heading_level
+
+        def adjust_heading_level(match: re.Match[str]) -> str:
+            hashes = match.group(1)
+            content = match.group(2)
+            new_level = max(2, len(hashes) + level_adjustment)
+            return "#" * new_level + " " + content
+
+        # Apply heading adjustment
+        return re.sub(r"^(#+)\s(.+)$", adjust_heading_level, section_markdown, flags=re.MULTILINE)
+
+
+def create_long_writer_agent(model: Any | None = None) -> LongWriterAgent:
+    """
+    Factory function to create a long writer agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured LongWriterAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return LongWriterAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create long writer agent", error=str(e))
+        raise ConfigurationError(f"Failed to create long writer agent: {e}") from e
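
A minimal usage sketch for the new module, assuming ReportDraft exposes a `sections` list whose items carry `section_title` and `section_content` (the fields `write_report` reads above) and that `get_model()` can resolve credentials from the environment; the dict-based construction avoids guessing the section model's class name:

import asyncio

from src.agents.long_writer import create_long_writer_agent
from src.utils.models import ReportDraft

async def main() -> None:
    agent = create_long_writer_agent()
    draft = ReportDraft.model_validate(
        {
            "sections": [
                {"section_title": "Background", "section_content": "..."},
                {"section_title": "Findings", "section_content": "..."},
            ]
        }
    )
    # Sections are rewritten one at a time; _reformat_references renumbers
    # and de-duplicates citations across the accumulated report.
    report = await agent.write_report(
        original_query="What is the evidence for drug X in disease Y?",
        report_title="Drug X in Disease Y",
        report_draft=draft,
    )
    print(report)

asyncio.run(main())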
src/agents/magentic_agents.py CHANGED
@@ -1,7 +1,8 @@
 """Magentic-compatible agents using ChatAgent pattern."""
 
+from typing import Any
+
 from agent_framework import ChatAgent
-from agent_framework.openai import OpenAIChatClient
 
 from src.agents.tools import (
     get_bibliography,
@@ -9,22 +10,20 @@ from src.agents.tools import (
     search_preprints,
     search_pubmed,
 )
-from src.utils.config import settings
+from src.utils.llm_factory import get_chat_client_for_agent
 
 
-def create_search_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_search_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a search agent with internal LLM and search tools.
 
     Args:
-        chat_client: Optional custom chat client. If None, uses default.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for biomedical search
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,  # Use configured model
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="SearchAgent",
@@ -50,19 +49,17 @@ Focus on finding: mechanisms of action, clinical evidence, and specific drug can
     )
 
 
-def create_judge_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_judge_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a judge agent that evaluates evidence quality.
 
     Args:
-        chat_client: Optional custom chat client. If None, uses default.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for evidence assessment
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="JudgeAgent",
@@ -89,19 +86,17 @@ Be rigorous but fair. Look for:
     )
 
 
-def create_hypothesis_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_hypothesis_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a hypothesis generation agent.
 
     Args:
-        chat_client: Optional custom chat client. If None, uses default.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for hypothesis generation
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="HypothesisAgent",
@@ -126,19 +121,17 @@ Focus on mechanistic plausibility and existing evidence.""",
     )
 
 
-def create_report_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_report_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a report synthesis agent.
 
     Args:
-        chat_client: Optional custom chat client. If None, uses default.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for report generation
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="ReportAgent",
src/agents/proofreader.py ADDED
@@ -0,0 +1,205 @@
+"""Proofreader agent for finalizing report drafts.
+
+Converts the folder/proofreader_agent.py implementation to use Pydantic AI.
+"""
+
+from datetime import datetime
+from typing import Any
+
+import structlog
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+from src.utils.models import ReportDraft
+
+logger = structlog.get_logger()
+
+
+# System prompt for the proofreader agent
+SYSTEM_PROMPT = f"""
+You are a research expert who proofreads and edits research reports.
+Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+
+You are given:
+1. The original query topic for the report
+2. A first draft of the report in ReportDraft format containing each section in sequence
+
+Your task is to:
+1. **Combine sections:** Concatenate the sections into a single string
+2. **Add section titles:** Add the section titles to the beginning of each section in markdown format, as well as a main title for the report
+3. **De-duplicate:** Remove duplicate content across sections to avoid repetition
+4. **Remove irrelevant sections:** If any sections or sub-sections are completely irrelevant to the query, remove them
+5. **Refine wording:** Edit the wording of the report to be polished, concise and punchy, but **without eliminating any detail** or large chunks of text
+6. **Add a summary:** Add a short report summary / outline to the beginning of the report to provide an overview of the sections and what is discussed
+7. **Preserve sources:** Preserve all sources / references - move the long list of references to the end of the report
+8. **Update reference numbers:** Continue to include reference numbers in square brackets ([1], [2], [3], etc.) in the main body of the report, but update the numbering to match the new order of references at the end of the report
+9. **Output final report:** Output the final report in markdown format (do not wrap it in a code block)
+
+Guidelines:
+- Do not add any new facts or data to the report
+- Do not remove any content from the report unless it is very clearly wrong, contradictory or irrelevant
+- Remove or reformat any redundant or excessive headings, and ensure that the final nesting of heading levels is correct
+- Ensure that the final report flows well and has a logical structure
+- Include all sources and references that are present in the final report
+"""
+
+
+class ProofreaderAgent:
+    """
+    Agent that proofreads and finalizes report drafts.
+
+    Uses Pydantic AI to generate polished markdown reports from draft sections.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the proofreader agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent (no structured output - returns markdown text)
+        self.agent = Agent(
+            model=self.model,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def proofread(
+        self,
+        query: str,
+        report_draft: ReportDraft,
+    ) -> str:
+        """
+        Proofread and finalize a report draft.
+
+        Args:
+            query: The original research query
+            report_draft: ReportDraft with all sections
+
+        Returns:
+            Final polished markdown report string. Falls back to a simple
+            concatenation of the draft sections if proofreading fails after
+            all retries.
+        """
+        # Input validation
+        if not query or not query.strip():
+            self.logger.warning("Empty query provided, using default")
+            query = "Research query"
+
+        if not report_draft or not report_draft.sections:
+            self.logger.warning("Empty report draft provided, returning minimal report")
+            return f"# Research Report\n\n## Query\n{query}\n\n*No sections available.*"
+
+        # Validate section structure
+        valid_sections = []
+        for section in report_draft.sections:
+            if section.section_title and section.section_title.strip():
+                valid_sections.append(section)
+            else:
+                self.logger.warning("Skipping section with empty title")
+
+        if not valid_sections:
+            self.logger.warning("No valid sections in draft, returning minimal report")
+            return f"# Research Report\n\n## Query\n{query}\n\n*No valid sections available.*"
+
+        self.logger.info(
+            "Proofreading report",
+            query=query[:100],
+            sections_count=len(valid_sections),
+        )
+
+        # Create validated draft
+        validated_draft = ReportDraft(sections=valid_sections)
+
+        user_message = f"""
+QUERY:
+{query}
+
+REPORT DRAFT:
+{validated_draft.model_dump_json()}
+"""
+
+        # Retry logic for transient failures
+        max_retries = 3
+        last_exception: Exception | None = None
+
+        for attempt in range(max_retries):
+            try:
+                # Run the agent
+                result = await self.agent.run(user_message)
+                final_report = result.output
+
+                # Validate output
+                if not final_report or not final_report.strip():
+                    self.logger.warning("Empty report generated, using fallback")
+                    raise ValueError("Empty report generated")
+
+                self.logger.info("Report proofread", length=len(final_report), attempt=attempt + 1)
+
+                return final_report
+
+            except (TimeoutError, ConnectionError) as e:
+                # Transient errors - retry
+                last_exception = e
+                if attempt < max_retries - 1:
+                    self.logger.warning(
+                        "Transient error, retrying",
+                        error=str(e),
+                        attempt=attempt + 1,
+                        max_retries=max_retries,
+                    )
+                    continue
+                else:
+                    self.logger.error("Max retries exceeded for transient error", error=str(e))
+                    break
+
+            except Exception as e:
+                # Non-transient errors - don't retry
+                last_exception = e
+                self.logger.error(
+                    "Proofreading failed",
+                    error=str(e),
+                    error_type=type(e).__name__,
+                )
+                break
+
+        # Return fallback: combine sections manually
+        self.logger.error(
+            "Proofreading failed after all attempts",
+            error=str(last_exception) if last_exception else "Unknown error",
+        )
+        sections = [
+            f"## {section.section_title}\n\n{section.section_content or 'Content unavailable.'}"
+            for section in valid_sections
+        ]
+        return f"# Research Report\n\n## Query\n{query}\n\n" + "\n\n".join(sections)
+
+
+def create_proofreader_agent(model: Any | None = None) -> ProofreaderAgent:
+    """
+    Factory function to create a proofreader agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured ProofreaderAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return ProofreaderAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create proofreader agent", error=str(e))
+        raise ConfigurationError(f"Failed to create proofreader agent: {e}") from e
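
Usage sketch (inside an async function; same ReportDraft assumptions as above):

agent = create_proofreader_agent()
final_markdown = await agent.proofread(query="...", report_draft=draft)

Because the agent is built without an `output_type`, the model's raw markdown text is returned directly rather than a structured object.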
src/agents/retrieval_agent.py CHANGED
@@ -1,12 +1,13 @@
 """Retrieval agent for web search and context management."""
 
+from typing import Any
+
 import structlog
 from agent_framework import ChatAgent, ai_function
-from agent_framework.openai import OpenAIChatClient
 
-from src.state import get_magentic_state
+from src.agents.state import get_magentic_state
 from src.tools.web_search import WebSearchTool
-from src.utils.config import settings
+from src.utils.llm_factory import get_chat_client_for_agent
 
 logger = structlog.get_logger()
 
@@ -56,19 +57,17 @@ async def search_web(query: str, max_results: int = 10) -> str:
     return "\n".join(output)
 
 
-def create_retrieval_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_retrieval_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a retrieval agent.
 
     Args:
-        chat_client: Optional custom chat client.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for retrieval.
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="RetrievalAgent",
src/agents/search_agent.py CHANGED
@@ -10,7 +10,7 @@ from agent_framework import (
     Role,
 )
 
-from src.orchestrator import SearchHandlerProtocol
+from src.legacy_orchestrator import SearchHandlerProtocol
 from src.utils.models import Citation, Evidence, SearchResult
 
 if TYPE_CHECKING:
@@ -1,9 +1,11 @@
1
  """Thread-safe state management for Magentic agents.
2
 
3
- Uses contextvars to ensure isolation between concurrent requests (e.g., multiple users
4
- searching simultaneously via Gradio).
 
5
  """
6
 
 
7
  from contextvars import ContextVar
8
  from typing import TYPE_CHECKING, Any
9
 
@@ -15,8 +17,20 @@ if TYPE_CHECKING:
15
  from src.services.embeddings import EmbeddingService
16
 
17
 
 
 
 
 
 
 
 
 
 
18
  class MagenticState(BaseModel):
19
- """Mutable state for a Magentic workflow session."""
 
 
 
20
 
21
  evidence: list[Evidence] = Field(default_factory=list)
22
  # Type as Any to avoid circular imports/runtime resolution issues
@@ -75,14 +89,22 @@ _magentic_state_var: ContextVar[MagenticState | None] = ContextVar("magentic_sta
75
 
76
 
77
  def init_magentic_state(embedding_service: "EmbeddingService | None" = None) -> MagenticState:
78
- """Initialize a new state for the current context."""
 
 
 
 
79
  state = MagenticState(embedding_service=embedding_service)
80
  _magentic_state_var.set(state)
81
  return state
82
 
83
 
84
  def get_magentic_state() -> MagenticState:
85
- """Get the current state. Raises RuntimeError if not initialized."""
 
 
 
 
86
  state = _magentic_state_var.get()
87
  if state is None:
88
  # Auto-initialize if missing (e.g. during tests or simple scripts)
 
1
  """Thread-safe state management for Magentic agents.
2
 
3
+ DEPRECATED: This module is deprecated. Use src.middleware.state_machine instead.
4
+
5
+ This file is kept for backward compatibility and will be removed in a future version.
6
  """
7
 
8
+ import warnings
9
  from contextvars import ContextVar
10
  from typing import TYPE_CHECKING, Any
11
 
 
17
  from src.services.embeddings import EmbeddingService
18
 
19
 
20
+ def _deprecation_warning() -> None:
21
+ """Emit deprecation warning for this module."""
22
+ warnings.warn(
23
+ "src.agents.state is deprecated. Use src.middleware.state_machine instead.",
24
+ DeprecationWarning,
25
+ stacklevel=3,
26
+ )
27
+
28
+
29
  class MagenticState(BaseModel):
30
+ """Mutable state for a Magentic workflow session.
31
+
32
+ DEPRECATED: Use WorkflowState from src.middleware.state_machine instead.
33
+ """
34
 
35
  evidence: list[Evidence] = Field(default_factory=list)
36
  # Type as Any to avoid circular imports/runtime resolution issues
 
89
 
90
 
91
  def init_magentic_state(embedding_service: "EmbeddingService | None" = None) -> MagenticState:
92
+ """Initialize a new state for the current context.
93
+
94
+ DEPRECATED: Use init_workflow_state from src.middleware.state_machine instead.
95
+ """
96
+ _deprecation_warning()
97
  state = MagenticState(embedding_service=embedding_service)
98
  _magentic_state_var.set(state)
99
  return state
100
 
101
 
102
  def get_magentic_state() -> MagenticState:
103
+ """Get the current state. Raises RuntimeError if not initialized.
104
+
105
+ DEPRECATED: Use get_workflow_state from src.middleware.state_machine instead.
106
+ """
107
+ _deprecation_warning()
108
  state = _magentic_state_var.get()
109
  if state is None:
110
  # Auto-initialize if missing (e.g. during tests or simple scripts)
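
Callers migrating off this module can make the deprecation visible in tests; a sketch:

import warnings

from src.agents.state import init_magentic_state

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    init_magentic_state()
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)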
src/agents/thinking.py ADDED
@@ -0,0 +1,148 @@
+"""Thinking agent for generating observations and reflections.
+
+Converts the folder/thinking_agent.py implementation to use Pydantic AI.
+"""
+
+from datetime import datetime
+from typing import Any
+
+import structlog
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+
+logger = structlog.get_logger()
+
+
+# System prompt for the thinking agent
+SYSTEM_PROMPT = f"""
+You are a research expert who is managing a research process in iterations. Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+
+You are given:
+1. The original research query along with some supporting background context
+2. A history of the tasks, actions, findings and thoughts you've made up until this point in the research process (on iteration 1 you will be at the start of the research process, so this will be empty)
+
+Your objective is to reflect on the research process so far and share your latest thoughts.
+
+Specifically, your thoughts should include reflections on questions such as:
+- What have you learned from the last iteration?
+- What new areas would you like to explore next, or existing topics you'd like to go deeper into?
+- Were you able to retrieve the information you were looking for in the last iteration?
+- If not, should we change our approach or move to the next topic?
+- Is there any info that is contradictory or conflicting?
+
+Guidelines:
+- Share your stream of consciousness on the above questions as raw text
+- Keep your response concise and informal
+- Focus most of your thoughts on the most recent iteration and how that influences this next iteration
+- Our aim is to do very deep and thorough research - bear this in mind when reflecting on the research process
+- DO NOT produce a draft of the final report. This is not your job.
+- If this is the first iteration (i.e. no data from prior iterations), provide thoughts on what info we need to gather in the first iteration to get started
+"""
+
+
+class ThinkingAgent:
+    """
+    Agent that generates observations and reflections on the research process.
+
+    Uses Pydantic AI to generate unstructured text observations about
+    the current state of research and next steps.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the thinking agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent (no structured output - returns text)
+        self.agent = Agent(
+            model=self.model,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def generate_observations(
+        self,
+        query: str,
+        background_context: str = "",
+        conversation_history: str = "",
+        iteration: int = 1,
+    ) -> str:
+        """
+        Generate observations about the research process.
+
+        Args:
+            query: The original research query
+            background_context: Optional background context
+            conversation_history: History of actions, findings, and thoughts
+            iteration: Current iteration number
+
+        Returns:
+            String containing observations and reflections. Falls back to a
+            canned observation if generation fails.
+        """
+        self.logger.info(
+            "Generating observations",
+            query=query[:100],
+            iteration=iteration,
+        )
+
+        background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
+
+        user_message = f"""
+You are starting iteration {iteration} of your research process.
+
+ORIGINAL QUERY:
+{query}
+
+{background}
+
+HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
+{conversation_history or "No previous actions, findings or thoughts available."}
+"""
+
+        try:
+            # Run the agent
+            result = await self.agent.run(user_message)
+            observations = result.output
+
+            self.logger.info("Observations generated", length=len(observations))
+
+            return observations
+
+        except Exception as e:
+            self.logger.error("Observation generation failed", error=str(e))
+            # Return fallback observations
+            return f"Starting iteration {iteration}. Need to gather information about: {query}"
+
+
+def create_thinking_agent(model: Any | None = None) -> ThinkingAgent:
+    """
+    Factory function to create a thinking agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured ThinkingAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return ThinkingAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create thinking agent", error=str(e))
+        raise ConfigurationError(f"Failed to create thinking agent: {e}") from e
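
Usage sketch (inside an async function); `conversation_history` is free-form text per the system prompt, so any iteration log format works:

agent = create_thinking_agent()
thoughts = await agent.generate_observations(
    query="Repurposing candidates for disease Y",
    conversation_history="[Iteration 1] Searched PubMed; found 12 relevant trials ...",
    iteration=2,
)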
src/agents/tool_selector.py ADDED
@@ -0,0 +1,168 @@
+"""Tool selector agent for choosing which tools to use for knowledge gaps.
+
+Converts the folder/tool_selector_agent.py implementation to use Pydantic AI.
+"""
+
+from datetime import datetime
+from typing import Any
+
+import structlog
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+from src.utils.models import AgentSelectionPlan
+
+logger = structlog.get_logger()
+
+
+# System prompt for the tool selector agent
+SYSTEM_PROMPT = f"""
+You are a Tool Selector responsible for determining which specialized agents should address a knowledge gap in a research project.
+Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+
+You will be given:
+1. The original user query
+2. A knowledge gap identified in the research
+3. A full history of the tasks, actions, findings and thoughts you've made up until this point in the research process
+
+Your task is to decide:
+1. Which specialized agents are best suited to address the gap
+2. What specific queries should be given to the agents (keep this short - 3-6 words)
+
+Available specialized agents:
+- WebSearchAgent: General web search for broad topics (can be called multiple times with different queries)
+- SiteCrawlerAgent: Crawl the pages of a specific website to retrieve information about it - use this if you want to find out something about a particular company, entity or product
+- RAGAgent: Semantic search within previously collected evidence - use when you need to find information from evidence already gathered in this research session. Best for finding connections, summarizing collected evidence, or retrieving specific details from earlier findings.
+
+Guidelines:
+- Aim to call at most 3 agents at a time in your final output
+- You can list the WebSearchAgent multiple times with different queries if needed to cover the full scope of the knowledge gap
+- Be specific and concise (3-6 words) with the agent queries - they should target exactly what information is needed
+- If you know the website or domain name of an entity being researched, always include it in the query
+- Use RAGAgent when: (1) You need to search within evidence already collected, (2) You want to find connections between different findings, (3) You need to retrieve specific details from earlier research iterations
+- Use WebSearchAgent or SiteCrawlerAgent when: (1) You need fresh information from the web, (2) You're starting a new research direction, (3) You need information not yet in the collected evidence
+- If a gap doesn't clearly match any agent's capability, default to the WebSearchAgent
+- Use the history of actions / tool calls as a guide - try not to repeat yourself if an approach didn't work previously
+
+Only output JSON. Follow the JSON schema for AgentSelectionPlan. Do not output anything else.
+"""
+
+
+class ToolSelectorAgent:
+    """
+    Agent that selects appropriate tools to address knowledge gaps.
+
+    Uses Pydantic AI to generate a structured AgentSelectionPlan with
+    specific tasks for web search and crawl agents.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the tool selector agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent
+        self.agent = Agent(
+            model=self.model,
+            output_type=AgentSelectionPlan,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def select_tools(
+        self,
+        gap: str,
+        query: str,
+        background_context: str = "",
+        conversation_history: str = "",
+    ) -> AgentSelectionPlan:
+        """
+        Select tools to address a knowledge gap.
+
+        Args:
+            gap: The knowledge gap to address
+            query: The original research query
+            background_context: Optional background context
+            conversation_history: History of actions, findings, and thoughts
+
+        Returns:
+            AgentSelectionPlan with tasks for selected agents. Falls back to
+            a single WebSearchAgent task if selection fails.
+        """
+        self.logger.info("Selecting tools for gap", gap=gap[:100], query=query[:100])
+
+        background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
+
+        user_message = f"""
+ORIGINAL QUERY:
+{query}
+
+KNOWLEDGE GAP TO ADDRESS:
+{gap}
+
+{background}
+
+HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
+{conversation_history or "No previous actions, findings or thoughts available."}
+"""
+
+        try:
+            # Run the agent
+            result = await self.agent.run(user_message)
+            selection_plan = result.output
+
+            self.logger.info(
+                "Tool selection complete",
+                tasks_count=len(selection_plan.tasks),
+                agents=[task.agent for task in selection_plan.tasks],
+            )
+
+            return selection_plan
+
+        except Exception as e:
+            self.logger.error("Tool selection failed", error=str(e))
+            # Return fallback: use web search
+            from src.utils.models import AgentTask
+
+            return AgentSelectionPlan(
+                tasks=[
+                    AgentTask(
+                        gap=gap,
+                        agent="WebSearchAgent",
+                        query=gap[:50],  # Use gap as query
+                        entity_website=None,
+                    )
+                ]
+            )
+
+
+def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent:
+    """
+    Factory function to create a tool selector agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured ToolSelectorAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return ToolSelectorAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create tool selector agent", error=str(e))
+        raise ConfigurationError(f"Failed to create tool selector agent: {e}") from e
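
Usage sketch (inside an async function); `AgentSelectionPlan.tasks` items carry `agent` and `query` fields, as the logging and fallback above rely on:

selector = create_tool_selector_agent()
plan = await selector.select_tools(
    gap="Competitor pricing for product Z",
    query="Market analysis of product Z",
)
for task in plan.tasks:
    print(task.agent, task.query)  # e.g. WebSearchAgent, "product Z pricing"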
src/agents/writer.py ADDED
@@ -0,0 +1,209 @@
+"""Writer agent for generating final reports from findings.
+
+Converts the folder/writer_agent.py implementation to use Pydantic AI.
+"""
+
+from datetime import datetime
+from typing import Any
+
+import structlog
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+
+logger = structlog.get_logger()
+
+
+# System prompt for the writer agent
+SYSTEM_PROMPT = f"""
+You are a senior researcher tasked with comprehensively answering a research query.
+Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+You will be provided with the original query along with research findings put together by a research assistant.
+Your objective is to generate the final response in markdown format.
+The response should be as lengthy and detailed as possible with the information provided, focusing on answering the original query.
+In your final output, include references to the source URLs for all information and data gathered.
+This should be formatted in the form of a numbered square bracket next to the relevant information,
+followed by a list of URLs at the end of the response, per the example below.
+
+EXAMPLE REFERENCE FORMAT:
+The company has XYZ products [1]. It operates in the software services market which is expected to grow at 10% per year [2].
+
+References:
+[1] https://example.com/first-source-url
+[2] https://example.com/second-source-url
+
+GUIDELINES:
+* Answer the query directly, do not include unrelated or tangential information.
+* Adhere to any instructions on the length of your final response if provided in the user prompt.
+* If any additional guidelines are provided in the user prompt, follow them exactly and give them precedence over these system instructions.
+"""
+
+
+class WriterAgent:
+    """
+    Agent that generates final reports from research findings.
+
+    Uses Pydantic AI to generate markdown reports with citations.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the writer agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent (no structured output - returns markdown text)
+        self.agent = Agent(
+            model=self.model,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def write_report(
+        self,
+        query: str,
+        findings: str,
+        output_length: str = "",
+        output_instructions: str = "",
+    ) -> str:
+        """
+        Write a final report from findings.
+
+        Args:
+            query: The original research query
+            findings: All findings collected during research
+            output_length: Optional description of desired output length
+            output_instructions: Optional additional instructions
+
+        Returns:
+            Markdown formatted report string. Falls back to a minimal report
+            built from the raw findings if generation fails after all retries.
+        """
+        # Input validation
+        if not query or not query.strip():
+            self.logger.warning("Empty query provided, using default")
+            query = "Research query"
+
+        if findings is None:
+            self.logger.warning("None findings provided, using empty string")
+            findings = "No findings available."
+
+        # Truncate very long inputs to prevent context overflow
+        max_findings_length = 50000  # ~12k tokens
+        if len(findings) > max_findings_length:
+            self.logger.warning(
+                "Findings too long, truncating",
+                original_length=len(findings),
+                truncated_length=max_findings_length,
+            )
+            findings = findings[:max_findings_length] + "\n\n[Content truncated due to length]"
+
+        self.logger.info("Writing final report", query=query[:100], findings_length=len(findings))
+
+        length_str = (
+            f"* The full response should be approximately {output_length}.\n"
+            if output_length
+            else ""
+        )
+        instructions_str = f"* {output_instructions}" if output_instructions else ""
+        guidelines_str = (
+            ("\n\nGUIDELINES:\n" + length_str + instructions_str).strip("\n")
+            if length_str or instructions_str
+            else ""
+        )
+
+        user_message = f"""
+Provide a response based on the query and findings below with as much detail as possible. {guidelines_str}
+
+QUERY: {query}
+
+FINDINGS:
+{findings}
+"""
+
+        # Retry logic for transient failures
+        max_retries = 3
+        last_exception: Exception | None = None
+
+        for attempt in range(max_retries):
+            try:
+                # Run the agent
+                result = await self.agent.run(user_message)
+                report = result.output
+
+                # Validate output
+                if not report or not report.strip():
+                    self.logger.warning("Empty report generated, using fallback")
+                    raise ValueError("Empty report generated")
+
+                self.logger.info("Report written", length=len(report), attempt=attempt + 1)
+
+                return report
+
+            except (TimeoutError, ConnectionError) as e:
+                # Transient errors - retry
+                last_exception = e
+                if attempt < max_retries - 1:
+                    self.logger.warning(
+                        "Transient error, retrying",
+                        error=str(e),
+                        attempt=attempt + 1,
+                        max_retries=max_retries,
+                    )
+                    continue
+                else:
+                    self.logger.error("Max retries exceeded for transient error", error=str(e))
+                    break
+
+            except Exception as e:
+                # Non-transient errors - don't retry
+                last_exception = e
+                self.logger.error(
+                    "Report writing failed", error=str(e), error_type=type(e).__name__
+                )
+                break
+
+        # Return fallback report if all attempts failed
+        self.logger.error(
+            "Report writing failed after all attempts",
+            error=str(last_exception) if last_exception else "Unknown error",
+        )
+        # Truncate findings in fallback if too long
+        fallback_findings = findings[:500] + "..." if len(findings) > 500 else findings
+        return (
+            f"# Research Report\n\n"
+            f"## Query\n{query}\n\n"
+            f"## Findings\n{fallback_findings}\n\n"
+            f"*Note: Report generation encountered an error. This is a fallback report.*"
+        )
+
+
+def create_writer_agent(model: Any | None = None) -> WriterAgent:
+    """
+    Factory function to create a writer agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured WriterAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return WriterAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create writer agent", error=str(e))
+        raise ConfigurationError(f"Failed to create writer agent: {e}") from e
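
Usage sketch (inside an async function); the two optional arguments are folded into the GUIDELINES block of the user prompt:

writer = create_writer_agent()
report_md = await writer.write_report(
    query="What is the evidence for drug X in disease Y?",
    findings="[1] https://example.com/trial - Phase II results ...",
    output_length="2-3 pages",
    output_instructions="Use tables for numeric comparisons",
)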
src/app.py CHANGED
@@ -6,8 +6,10 @@ from typing import Any
6
 
7
  import gradio as gr
8
  from pydantic_ai.models.anthropic import AnthropicModel
9
- from pydantic_ai.models.openai import OpenAIModel
 
10
  from pydantic_ai.providers.anthropic import AnthropicProvider
 
11
  from pydantic_ai.providers.openai import OpenAIProvider
12
 
13
  from src.agent_factory.judges import HFInferenceJudgeHandler, JudgeHandler, MockJudgeHandler
@@ -24,7 +26,7 @@ def configure_orchestrator(
24
  use_mock: bool = False,
25
  mode: str = "simple",
26
  user_api_key: str | None = None,
27
- api_provider: str = "openai",
28
  ) -> tuple[Any, str]:
29
  """
30
  Create an orchestrator instance.
@@ -33,7 +35,7 @@ def configure_orchestrator(
33
  use_mock: If True, use MockJudgeHandler (no API key needed)
34
  mode: Orchestrator mode ("simple" or "advanced")
35
  user_api_key: Optional user-provided API key (BYOK)
36
- api_provider: API provider ("openai" or "anthropic")
37
 
38
  Returns:
39
  Tuple of (Orchestrator instance, backend_name)
@@ -59,13 +61,17 @@ def configure_orchestrator(
59
  judge_handler = MockJudgeHandler()
60
  backend_info = "Mock (Testing)"
61
 
62
- # 2. Paid API Key (User provided or Env)
63
  elif (
64
  user_api_key
 
 
 
 
65
  or (api_provider == "openai" and os.getenv("OPENAI_API_KEY"))
66
  or (api_provider == "anthropic" and os.getenv("ANTHROPIC_API_KEY"))
67
  ):
68
- model: AnthropicModel | OpenAIModel | None = None
69
  if user_api_key:
70
  # Validate key/provider match to prevent silent auth failures
71
  if api_provider == "openai" and user_api_key.startswith("sk-ant-"):
@@ -75,15 +81,19 @@ def configure_orchestrator(
75
  )
76
  if api_provider == "anthropic" and is_openai_key:
77
  raise ValueError("OpenAI key provided but Anthropic provider selected")
78
- if api_provider == "anthropic":
 
 
 
 
79
  anthropic_provider = AnthropicProvider(api_key=user_api_key)
80
  model = AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
81
  elif api_provider == "openai":
82
  openai_provider = OpenAIProvider(api_key=user_api_key)
83
  model = OpenAIModel(settings.openai_model, provider=openai_provider)
84
- backend_info = f"Paid API ({api_provider.upper()})"
85
  else:
86
- backend_info = "Paid API (Env Config)"
87
 
88
  judge_handler = JudgeHandler(model=model)
89
 
@@ -107,7 +117,7 @@ async def research_agent(
107
  history: list[dict[str, Any]],
108
  mode: str = "simple",
109
  api_key: str = "",
110
- api_provider: str = "openai",
111
  ) -> AsyncGenerator[str, None]:
112
  """
113
  Gradio chat function that runs the research agent.
@@ -117,7 +127,7 @@ async def research_agent(
117
  history: Chat history (Gradio format)
118
  mode: Orchestrator mode ("simple" or "advanced")
119
  api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
120
- api_provider: API provider ("openai" or "anthropic")
121
 
122
  Yields:
123
  Markdown-formatted responses for streaming
@@ -130,6 +140,7 @@ async def research_agent(
130
  user_api_key = api_key.strip() if api_key else None
131
 
132
  # Check available keys
 
133
  has_openai = bool(os.getenv("OPENAI_API_KEY"))
134
  has_anthropic = bool(os.getenv("ANTHROPIC_API_KEY"))
135
  has_user_key = bool(user_api_key)
@@ -149,11 +160,11 @@ async def research_agent(
149
  f"🔑 **Using your {api_provider.upper()} API key** - "
150
  "Your key is used only for this session and is never stored.\n\n"
151
  )
152
- elif not has_paid_key:
153
- # No paid keys - will use FREE HuggingFace Inference
154
  yield (
155
  "🤗 **Free Tier**: Using HuggingFace Inference (Llama 3.1 / Mistral) for AI analysis.\n"
156
- "For premium models, enter an OpenAI or Anthropic API key below.\n\n"
157
  )
158
 
159
  # Run the agent and stream events
@@ -232,8 +243,7 @@ def create_demo() -> gr.ChatInterface:
232
  value="simple",
233
  label="Orchestrator Mode",
234
  info=(
235
- "Simple: Linear (Free Tier Friendly) | "
236
- "Advanced: Multi-Agent (Requires OpenAI)"
237
  ),
238
  ),
239
  gr.Textbox(
@@ -243,10 +253,10 @@ def create_demo() -> gr.ChatInterface:
243
  info="Enter your own API key. Never stored.",
244
  ),
245
  gr.Radio(
246
- choices=["openai", "anthropic"],
247
- value="openai",
248
  label="API Provider",
249
- info="Select the provider for your API key",
250
  ),
251
  ],
252
  )
 
6
 
7
  import gradio as gr
8
  from pydantic_ai.models.anthropic import AnthropicModel
9
+ from pydantic_ai.models.huggingface import HuggingFaceModel
10
+ from pydantic_ai.models.openai import OpenAIChatModel as OpenAIModel
11
  from pydantic_ai.providers.anthropic import AnthropicProvider
12
+ from pydantic_ai.providers.huggingface import HuggingFaceProvider
13
  from pydantic_ai.providers.openai import OpenAIProvider
14
 
15
  from src.agent_factory.judges import HFInferenceJudgeHandler, JudgeHandler, MockJudgeHandler
 
26
  use_mock: bool = False,
27
  mode: str = "simple",
28
  user_api_key: str | None = None,
29
+ api_provider: str = "huggingface",
30
  ) -> tuple[Any, str]:
31
  """
32
  Create an orchestrator instance.
 
35
  use_mock: If True, use MockJudgeHandler (no API key needed)
36
  mode: Orchestrator mode ("simple" or "advanced")
37
  user_api_key: Optional user-provided API key (BYOK)
38
+ api_provider: API provider ("huggingface", "openai", or "anthropic")
39
 
40
  Returns:
41
  Tuple of (Orchestrator instance, backend_name)
 
61
  judge_handler = MockJudgeHandler()
62
  backend_info = "Mock (Testing)"
63
 
64
+ # 2. API Key (User provided or Env) - HuggingFace, OpenAI, or Anthropic
65
  elif (
66
  user_api_key
67
+ or (
68
+ api_provider == "huggingface"
69
+ and (os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_API_KEY"))
70
+ )
71
  or (api_provider == "openai" and os.getenv("OPENAI_API_KEY"))
72
  or (api_provider == "anthropic" and os.getenv("ANTHROPIC_API_KEY"))
73
  ):
74
+ model: AnthropicModel | HuggingFaceModel | OpenAIModel | None = None
75
  if user_api_key:
76
  # Validate key/provider match to prevent silent auth failures
77
  if api_provider == "openai" and user_api_key.startswith("sk-ant-"):
 
81
  )
82
  if api_provider == "anthropic" and is_openai_key:
83
  raise ValueError("OpenAI key provided but Anthropic provider selected")
84
+ if api_provider == "huggingface":
85
+ model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
86
+ hf_provider = HuggingFaceProvider(api_key=user_api_key)
87
+ model = HuggingFaceModel(model_name, provider=hf_provider)
88
+ elif api_provider == "anthropic":
89
  anthropic_provider = AnthropicProvider(api_key=user_api_key)
90
  model = AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
91
  elif api_provider == "openai":
92
  openai_provider = OpenAIProvider(api_key=user_api_key)
93
  model = OpenAIModel(settings.openai_model, provider=openai_provider)
94
+ backend_info = f"API ({api_provider.upper()})"
95
  else:
96
+ backend_info = "API (Env Config)"
97
 
98
  judge_handler = JudgeHandler(model=model)
99
 
 
117
  history: list[dict[str, Any]],
118
  mode: str = "simple",
119
  api_key: str = "",
120
+ api_provider: str = "huggingface",
121
  ) -> AsyncGenerator[str, None]:
122
  """
123
  Gradio chat function that runs the research agent.
 
127
  history: Chat history (Gradio format)
128
  mode: Orchestrator mode ("simple" or "advanced")
129
  api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
130
+ api_provider: API provider ("huggingface", "openai", or "anthropic")
131
 
132
  Yields:
133
  Markdown-formatted responses for streaming
 
140
  user_api_key = api_key.strip() if api_key else None
141
 
142
  # Check available keys
143
+ has_huggingface = bool(os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_API_KEY"))
144
  has_openai = bool(os.getenv("OPENAI_API_KEY"))
145
  has_anthropic = bool(os.getenv("ANTHROPIC_API_KEY"))
146
  has_user_key = bool(user_api_key)
 
160
  f"🔑 **Using your {api_provider.upper()} API key** - "
161
  "Your key is used only for this session and is never stored.\n\n"
162
  )
163
+ elif not has_paid_key and not has_huggingface:
164
+ # No keys at all - will use FREE HuggingFace Inference (public models)
165
  yield (
166
  "🤗 **Free Tier**: Using HuggingFace Inference (Llama 3.1 / Mistral) for AI analysis.\n"
167
+ "For premium models or higher rate limits, enter a HuggingFace, OpenAI, or Anthropic API key below.\n\n"
168
  )
169
 
170
  # Run the agent and stream events
 
243
  value="simple",
244
  label="Orchestrator Mode",
245
  info=(
246
+ "Simple: Linear (Free Tier Friendly) | Advanced: Multi-Agent (Requires OpenAI)"
 
247
  ),
248
  ),
249
  gr.Textbox(
 
253
  info="Enter your own API key. Never stored.",
254
  ),
255
  gr.Radio(
256
+ choices=["huggingface", "openai", "anthropic"],
257
+ value="huggingface",
258
  label="API Provider",
259
+ info="Select the provider for your API key (HuggingFace is default and free)",
260
  ),
261
  ],
262
  )
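Taken together, the UI wiring above means the provider choice flows straight into create_orchestrator. A minimal calling sketch, assuming the factory is importable from src.app (the defining module path is not shown in this diff):

from src.app import create_orchestrator  # assumed import path

orchestrator, backend = create_orchestrator(
    use_mock=False,
    mode="simple",
    user_api_key="hf_xxx",        # BYOK placeholder; validated against the provider
    api_provider="huggingface",   # "huggingface" | "openai" | "anthropic"
)
print(backend)  # e.g. "API (HUGGINGFACE)"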
src/{orchestrator.py → legacy_orchestrator.py} RENAMED
File without changes
src/middleware/__init__.py CHANGED
@@ -1 +1,30 @@
1
- """Middleware components for orchestration."""
1
+ """Middleware for workflow state management, parallel loop coordination, and budget tracking.
2
+
3
+ This module provides:
4
+ - WorkflowState: Thread-safe state management using ContextVar
5
+ - WorkflowManager: Coordination of parallel research loops
6
+ - BudgetTracker: Token, time, and iteration budget tracking
7
+ """
8
+
9
+ from src.middleware.budget_tracker import BudgetStatus, BudgetTracker
10
+ from src.middleware.state_machine import (
11
+ WorkflowState,
12
+ get_workflow_state,
13
+ init_workflow_state,
14
+ )
15
+ from src.middleware.workflow_manager import (
16
+ LoopStatus,
17
+ ResearchLoop,
18
+ WorkflowManager,
19
+ )
20
+
21
+ __all__ = [
22
+ "BudgetStatus",
23
+ "BudgetTracker",
24
+ "LoopStatus",
25
+ "ResearchLoop",
26
+ "WorkflowManager",
27
+ "WorkflowState",
28
+ "get_workflow_state",
29
+ "init_workflow_state",
30
+ ]
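With these re-exports in place, callers can pull the middleware primitives from the package root instead of the individual submodules:

from src.middleware import (
    BudgetTracker,
    WorkflowManager,
    get_workflow_state,
    init_workflow_state,
)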
src/middleware/budget_tracker.py ADDED
@@ -0,0 +1,390 @@
1
+ """Budget tracking for research loops.
2
+
3
+ Tracks token usage, time elapsed, and iteration counts per loop and globally.
4
+ Enforces budget constraints to prevent infinite loops and excessive resource usage.
5
+ """
6
+
7
+ import time
8
+
9
+ import structlog
10
+ from pydantic import BaseModel, Field
11
+
12
+ logger = structlog.get_logger()
13
+
14
+
15
+ class BudgetStatus(BaseModel):
16
+ """Status of a budget (tokens, time, iterations)."""
17
+
18
+ tokens_used: int = Field(default=0, description="Total tokens used")
19
+ tokens_limit: int = Field(default=100000, description="Token budget limit", ge=0)
20
+ time_elapsed_seconds: float = Field(default=0.0, description="Time elapsed", ge=0.0)
21
+ time_limit_seconds: float = Field(
22
+ default=600.0, description="Time budget limit (10 min default)", ge=0.0
23
+ )
24
+ iterations: int = Field(default=0, description="Number of iterations completed", ge=0)
25
+ iterations_limit: int = Field(default=10, description="Maximum iterations", ge=1)
26
+ iteration_tokens: dict[int, int] = Field(
27
+ default_factory=dict,
28
+ description="Tokens used per iteration (iteration number -> token count)",
29
+ )
30
+
31
+ def is_exceeded(self) -> bool:
32
+ """Check if any budget limit has been exceeded.
33
+
34
+ Returns:
35
+ True if any limit is exceeded, False otherwise.
36
+ """
37
+ return (
38
+ self.tokens_used >= self.tokens_limit
39
+ or self.time_elapsed_seconds >= self.time_limit_seconds
40
+ or self.iterations >= self.iterations_limit
41
+ )
42
+
43
+ def remaining_tokens(self) -> int:
44
+ """Get remaining token budget.
45
+
46
+ Returns:
47
+ Remaining tokens (may be negative if exceeded).
48
+ """
49
+ return self.tokens_limit - self.tokens_used
50
+
51
+ def remaining_time_seconds(self) -> float:
52
+ """Get remaining time budget.
53
+
54
+ Returns:
55
+ Remaining time in seconds (may be negative if exceeded).
56
+ """
57
+ return self.time_limit_seconds - self.time_elapsed_seconds
58
+
59
+ def remaining_iterations(self) -> int:
60
+ """Get remaining iteration budget.
61
+
62
+ Returns:
63
+ Remaining iterations (may be negative if exceeded).
64
+ """
65
+ return self.iterations_limit - self.iterations
66
+
67
+ def add_iteration_tokens(self, iteration: int, tokens: int) -> None:
68
+ """Add tokens for a specific iteration.
69
+
70
+ Args:
71
+ iteration: Iteration number (1-indexed).
72
+ tokens: Number of tokens to add.
73
+ """
74
+ if iteration not in self.iteration_tokens:
75
+ self.iteration_tokens[iteration] = 0
76
+ self.iteration_tokens[iteration] += tokens
77
+ # Also add to total tokens
78
+ self.tokens_used += tokens
79
+
80
+ def get_iteration_tokens(self, iteration: int) -> int:
81
+ """Get tokens used for a specific iteration.
82
+
83
+ Args:
84
+ iteration: Iteration number.
85
+
86
+ Returns:
87
+ Token count for the iteration, or 0 if not found.
88
+ """
89
+ return self.iteration_tokens.get(iteration, 0)
90
+
91
+
92
+ class BudgetTracker:
93
+ """Tracks budgets per loop and globally."""
94
+
95
+ def __init__(self) -> None:
96
+ """Initialize the budget tracker."""
97
+ self._budgets: dict[str, BudgetStatus] = {}
98
+ self._start_times: dict[str, float] = {}
99
+ self._global_budget: BudgetStatus | None = None
100
+
101
+ def create_budget(
102
+ self,
103
+ loop_id: str,
104
+ tokens_limit: int = 100000,
105
+ time_limit_seconds: float = 600.0,
106
+ iterations_limit: int = 10,
107
+ ) -> BudgetStatus:
108
+ """Create a budget for a specific loop.
109
+
110
+ Args:
111
+ loop_id: Unique identifier for the loop.
112
+ tokens_limit: Maximum tokens allowed.
113
+ time_limit_seconds: Maximum time allowed in seconds.
114
+ iterations_limit: Maximum iterations allowed.
115
+
116
+ Returns:
117
+ The created BudgetStatus instance.
118
+ """
119
+ budget = BudgetStatus(
120
+ tokens_limit=tokens_limit,
121
+ time_limit_seconds=time_limit_seconds,
122
+ iterations_limit=iterations_limit,
123
+ )
124
+ self._budgets[loop_id] = budget
125
+ logger.debug(
126
+ "Budget created",
127
+ loop_id=loop_id,
128
+ tokens_limit=tokens_limit,
129
+ time_limit=time_limit_seconds,
130
+ iterations_limit=iterations_limit,
131
+ )
132
+ return budget
133
+
134
+ def get_budget(self, loop_id: str) -> BudgetStatus | None:
135
+ """Get the budget for a specific loop.
136
+
137
+ Args:
138
+ loop_id: Unique identifier for the loop.
139
+
140
+ Returns:
141
+ The BudgetStatus instance, or None if not found.
142
+ """
143
+ return self._budgets.get(loop_id)
144
+
145
+ def add_tokens(self, loop_id: str, tokens: int) -> None:
146
+ """Add tokens to a loop's budget.
147
+
148
+ Args:
149
+ loop_id: Unique identifier for the loop.
150
+ tokens: Number of tokens to add (can be negative).
151
+ """
152
+ if loop_id not in self._budgets:
153
+ logger.warning("Budget not found for loop", loop_id=loop_id)
154
+ return
155
+ self._budgets[loop_id].tokens_used += tokens
156
+ logger.debug("Tokens added", loop_id=loop_id, tokens=tokens)
157
+
158
+ def add_iteration_tokens(self, loop_id: str, iteration: int, tokens: int) -> None:
159
+ """Add tokens for a specific iteration.
160
+
161
+ Args:
162
+ loop_id: Loop identifier.
163
+ iteration: Iteration number (1-indexed).
164
+ tokens: Number of tokens to add.
165
+ """
166
+ if loop_id not in self._budgets:
167
+ logger.warning("Budget not found for loop", loop_id=loop_id)
168
+ return
169
+
170
+ budget = self._budgets[loop_id]
171
+ budget.add_iteration_tokens(iteration, tokens)
172
+
173
+ logger.debug(
174
+ "Iteration tokens added",
175
+ loop_id=loop_id,
176
+ iteration=iteration,
177
+ tokens=tokens,
178
+ total_iteration=budget.get_iteration_tokens(iteration),
179
+ )
180
+
181
+ def get_iteration_tokens(self, loop_id: str, iteration: int) -> int:
182
+ """Get tokens used for a specific iteration.
183
+
184
+ Args:
185
+ loop_id: Loop identifier.
186
+ iteration: Iteration number.
187
+
188
+ Returns:
189
+ Token count for the iteration, or 0 if not found.
190
+ """
191
+ if loop_id not in self._budgets:
192
+ return 0
193
+
194
+ return self._budgets[loop_id].get_iteration_tokens(iteration)
195
+
196
+ def start_timer(self, loop_id: str) -> None:
197
+ """Start the timer for a loop.
198
+
199
+ Args:
200
+ loop_id: Unique identifier for the loop.
201
+ """
202
+ self._start_times[loop_id] = time.time()
203
+ logger.debug("Timer started", loop_id=loop_id)
204
+
205
+ def update_timer(self, loop_id: str) -> None:
206
+ """Update the elapsed time for a loop.
207
+
208
+ Args:
209
+ loop_id: Unique identifier for the loop.
210
+ """
211
+ if loop_id not in self._start_times:
212
+ logger.warning("Timer not started for loop", loop_id=loop_id)
213
+ return
214
+ if loop_id not in self._budgets:
215
+ logger.warning("Budget not found for loop", loop_id=loop_id)
216
+ return
217
+
218
+ elapsed = time.time() - self._start_times[loop_id]
219
+ self._budgets[loop_id].time_elapsed_seconds = elapsed
220
+ logger.debug("Timer updated", loop_id=loop_id, elapsed=elapsed)
221
+
222
+ def increment_iteration(self, loop_id: str) -> None:
223
+ """Increment the iteration count for a loop.
224
+
225
+ Args:
226
+ loop_id: Unique identifier for the loop.
227
+ """
228
+ if loop_id not in self._budgets:
229
+ logger.warning("Budget not found for loop", loop_id=loop_id)
230
+ return
231
+ self._budgets[loop_id].iterations += 1
232
+ logger.debug(
233
+ "Iteration incremented",
234
+ loop_id=loop_id,
235
+ iterations=self._budgets[loop_id].iterations,
236
+ )
237
+
238
+ def check_budget(self, loop_id: str) -> tuple[bool, str]:
239
+ """Check if a loop's budget has been exceeded.
240
+
241
+ Args:
242
+ loop_id: Unique identifier for the loop.
243
+
244
+ Returns:
245
+ Tuple of (exceeded: bool, reason: str). Reason is empty if not exceeded.
246
+ """
247
+ if loop_id not in self._budgets:
248
+ return False, ""
249
+
250
+ budget = self._budgets[loop_id]
251
+ self.update_timer(loop_id) # Update time before checking
252
+
253
+ if budget.is_exceeded():
254
+ reasons = []
255
+ if budget.tokens_used >= budget.tokens_limit:
256
+ reasons.append("tokens")
257
+ if budget.time_elapsed_seconds >= budget.time_limit_seconds:
258
+ reasons.append("time")
259
+ if budget.iterations >= budget.iterations_limit:
260
+ reasons.append("iterations")
261
+ reason = f"Budget exceeded: {', '.join(reasons)}"
262
+ logger.warning("Budget exceeded", loop_id=loop_id, reason=reason)
263
+ return True, reason
264
+
265
+ return False, ""
266
+
267
+ def can_continue(self, loop_id: str) -> bool:
268
+ """Check if a loop can continue based on budget.
269
+
270
+ Args:
271
+ loop_id: Unique identifier for the loop.
272
+
273
+ Returns:
274
+ True if the loop can continue, False if budget is exceeded.
275
+ """
276
+ exceeded, _ = self.check_budget(loop_id)
277
+ return not exceeded
278
+
279
+ def get_budget_summary(self, loop_id: str) -> str:
280
+ """Get a formatted summary of a loop's budget status.
281
+
282
+ Args:
283
+ loop_id: Unique identifier for the loop.
284
+
285
+ Returns:
286
+ Formatted string summary.
287
+ """
288
+ if loop_id not in self._budgets:
289
+ return f"Budget not found for loop: {loop_id}"
290
+
291
+ budget = self._budgets[loop_id]
292
+ self.update_timer(loop_id)
293
+
294
+ return (
295
+ f"Loop {loop_id}: "
296
+ f"Tokens: {budget.tokens_used}/{budget.tokens_limit} "
297
+ f"({budget.remaining_tokens()} remaining), "
298
+ f"Time: {budget.time_elapsed_seconds:.1f}/{budget.time_limit_seconds:.1f}s "
299
+ f"({budget.remaining_time_seconds():.1f}s remaining), "
300
+ f"Iterations: {budget.iterations}/{budget.iterations_limit} "
301
+ f"({budget.remaining_iterations()} remaining)"
302
+ )
303
+
304
+ def reset_budget(self, loop_id: str) -> None:
305
+ """Reset the budget for a loop.
306
+
307
+ Args:
308
+ loop_id: Unique identifier for the loop.
309
+ """
310
+ if loop_id in self._budgets:
311
+ old_budget = self._budgets[loop_id]
312
+ # Preserve iteration_tokens when resetting
313
+ old_iteration_tokens = old_budget.iteration_tokens
314
+ self._budgets[loop_id] = BudgetStatus(
315
+ tokens_limit=old_budget.tokens_limit,
316
+ time_limit_seconds=old_budget.time_limit_seconds,
317
+ iterations_limit=old_budget.iterations_limit,
318
+ iteration_tokens=old_iteration_tokens, # Restore old iteration tokens
319
+ )
320
+ if loop_id in self._start_times:
321
+ self._start_times[loop_id] = time.time()
322
+ logger.debug("Budget reset", loop_id=loop_id)
323
+
324
+ def set_global_budget(
325
+ self,
326
+ tokens_limit: int = 100000,
327
+ time_limit_seconds: float = 600.0,
328
+ iterations_limit: int = 10,
329
+ ) -> None:
330
+ """Set a global budget that applies to all loops.
331
+
332
+ Args:
333
+ tokens_limit: Maximum tokens allowed globally.
334
+ time_limit_seconds: Maximum time allowed in seconds.
335
+ iterations_limit: Maximum iterations allowed globally.
336
+ """
337
+ self._global_budget = BudgetStatus(
338
+ tokens_limit=tokens_limit,
339
+ time_limit_seconds=time_limit_seconds,
340
+ iterations_limit=iterations_limit,
341
+ )
342
+ logger.debug(
343
+ "Global budget set",
344
+ tokens_limit=tokens_limit,
345
+ time_limit=time_limit_seconds,
346
+ iterations_limit=iterations_limit,
347
+ )
348
+
349
+ def get_global_budget(self) -> BudgetStatus | None:
350
+ """Get the global budget.
351
+
352
+ Returns:
353
+ The global BudgetStatus instance, or None if not set.
354
+ """
355
+ return self._global_budget
356
+
357
+ def add_global_tokens(self, tokens: int) -> None:
358
+ """Add tokens to the global budget.
359
+
360
+ Args:
361
+ tokens: Number of tokens to add (can be negative).
362
+ """
363
+ if self._global_budget is None:
364
+ logger.warning("Global budget not set")
365
+ return
366
+ self._global_budget.tokens_used += tokens
367
+ logger.debug("Global tokens added", tokens=tokens)
368
+
369
+ def estimate_tokens(self, text: str) -> int:
370
+ """Estimate token count from text (rough estimate: ~4 chars per token).
371
+
372
+ Args:
373
+ text: Text to estimate tokens for.
374
+
375
+ Returns:
376
+ Estimated token count.
377
+ """
378
+ return len(text) // 4
379
+
380
+ def estimate_llm_call_tokens(self, prompt: str, response: str) -> int:
381
+ """Estimate token count for an LLM call.
382
+
383
+ Args:
384
+ prompt: The prompt text.
385
+ response: The response text.
386
+
387
+ Returns:
388
+ Estimated total token count (prompt + response).
389
+ """
390
+ return self.estimate_tokens(prompt) + self.estimate_tokens(response)
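A short usage sketch of the tracker defined above (all names come from this module; the loop id is arbitrary):

from src.middleware.budget_tracker import BudgetTracker

tracker = BudgetTracker()
tracker.create_budget("loop-1", tokens_limit=50_000, time_limit_seconds=300.0, iterations_limit=5)
tracker.start_timer("loop-1")

tracker.increment_iteration("loop-1")
tokens = tracker.estimate_llm_call_tokens(prompt="summarize X", response="X is ...")
tracker.add_iteration_tokens("loop-1", iteration=1, tokens=tokens)

exceeded, reason = tracker.check_budget("loop-1")  # refreshes the timer, then checks limits
if exceeded:
    print(reason)  # e.g. "Budget exceeded: iterations"
print(tracker.get_budget_summary("loop-1"))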
src/middleware/state_machine.py ADDED
@@ -0,0 +1,129 @@
1
+ """Thread-safe state management for workflow agents.
2
+
3
+ Uses contextvars to ensure isolation between concurrent requests (e.g., multiple users
4
+ searching simultaneously via Gradio). Refactored from MagenticState to support both
5
+ iterative and deep research patterns.
6
+ """
7
+
8
+ from contextvars import ContextVar
9
+ from typing import TYPE_CHECKING, Any
10
+
11
+ import structlog
12
+ from pydantic import BaseModel, Field
13
+
14
+ from src.utils.models import Citation, Conversation, Evidence
15
+
16
+ if TYPE_CHECKING:
17
+ from src.services.embeddings import EmbeddingService
18
+
19
+ logger = structlog.get_logger()
20
+
21
+
22
+ class WorkflowState(BaseModel):
23
+ """Mutable state for a workflow session.
24
+
25
+ Supports both iterative and deep research patterns by tracking evidence,
26
+ conversation history, and providing semantic search capabilities.
27
+ """
28
+
29
+ evidence: list[Evidence] = Field(default_factory=list)
30
+ conversation: Conversation = Field(default_factory=Conversation)
31
+ # Type as Any to avoid circular imports/runtime resolution issues
32
+ # The actual object injected will be an EmbeddingService instance
33
+ embedding_service: Any = Field(default=None)
34
+
35
+ model_config = {"arbitrary_types_allowed": True}
36
+
37
+ def add_evidence(self, new_evidence: list[Evidence]) -> int:
38
+ """Add new evidence, deduplicating by URL.
39
+
40
+ Args:
41
+ new_evidence: List of Evidence objects to add.
42
+
43
+ Returns:
44
+ Number of *new* items added (excluding duplicates).
45
+ """
46
+ existing_urls = {e.citation.url for e in self.evidence}
47
+ count = 0
48
+ for item in new_evidence:
49
+ if item.citation.url not in existing_urls:
50
+ self.evidence.append(item)
51
+ existing_urls.add(item.citation.url)
52
+ count += 1
53
+ return count
54
+
55
+ async def search_related(self, query: str, n_results: int = 5) -> list[Evidence]:
56
+ """Search for semantically related evidence using the embedding service.
57
+
58
+ Args:
59
+ query: Search query string.
60
+ n_results: Maximum number of results to return.
61
+
62
+ Returns:
63
+ List of Evidence objects, ordered by relevance.
64
+ """
65
+ if not self.embedding_service:
66
+ logger.warning("Embedding service not available, returning empty results")
67
+ return []
68
+
69
+ results = await self.embedding_service.search_similar(query, n_results=n_results)
70
+
71
+ # Convert dict results back to Evidence objects
72
+ evidence_list = []
73
+ for item in results:
74
+ meta = item.get("metadata", {})
75
+ authors_str = meta.get("authors", "")
76
+ authors = [a.strip() for a in authors_str.split(",") if a.strip()]
77
+
78
+ ev = Evidence(
79
+ content=item["content"],
80
+ citation=Citation(
81
+ title=meta.get("title", "Related Evidence"),
82
+ url=item["id"],
83
+ source="pubmed", # Defaulting to pubmed if unknown
84
+ date=meta.get("date", "n.d."),
85
+ authors=authors,
86
+ ),
87
+ relevance=max(0.0, 1.0 - item.get("distance", 0.5)),
88
+ )
89
+ evidence_list.append(ev)
90
+
91
+ return evidence_list
92
+
93
+
94
+ # The ContextVar holds the WorkflowState for the current execution context
95
+ _workflow_state_var: ContextVar[WorkflowState | None] = ContextVar("workflow_state", default=None)
96
+
97
+
98
+ def init_workflow_state(
99
+ embedding_service: "EmbeddingService | None" = None,
100
+ ) -> WorkflowState:
101
+ """Initialize a new state for the current context.
102
+
103
+ Args:
104
+ embedding_service: Optional embedding service for semantic search.
105
+
106
+ Returns:
107
+ The initialized WorkflowState instance.
108
+ """
109
+ state = WorkflowState(embedding_service=embedding_service)
110
+ _workflow_state_var.set(state)
111
+ logger.debug("Workflow state initialized", has_embeddings=embedding_service is not None)
112
+ return state
113
+
114
+
115
+ def get_workflow_state() -> WorkflowState:
116
+ """Get the current state. Auto-initializes if not set.
117
+
118
+ Returns:
119
+ The current WorkflowState instance.
120
+ """
124
+ state = _workflow_state_var.get()
125
+ if state is None:
126
+ # Auto-initialize if missing (e.g. during tests or simple scripts)
127
+ logger.debug("Workflow state not found, auto-initializing")
128
+ return init_workflow_state()
129
+ return state
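Because the state lives in a ContextVar, concurrently spawned tasks each see their own WorkflowState; a minimal isolation sketch (evidence construction elided):

import asyncio

from src.middleware.state_machine import get_workflow_state, init_workflow_state

async def handle_request() -> int:
    init_workflow_state()  # fresh, isolated state for this task's context
    # get_workflow_state() now returns this task's state; add_evidence()
    # would deduplicate new items by citation URL.
    return len(get_workflow_state().evidence)

async def main() -> None:
    # gather() wraps each coroutine in a Task with its own context copy,
    # so the two states never interfere.
    print(await asyncio.gather(handle_request(), handle_request()))  # [0, 0]

asyncio.run(main())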
src/middleware/sub_iteration.py CHANGED
@@ -125,8 +125,7 @@ class SubIterationMiddleware:
125
  AgentEvent(
126
  type="looping",
127
  message=(
128
- f"Sub-iteration {i} result insufficient. "
129
- f"Feedback: {feedback[:100]}..."
130
  ),
131
  iteration=i,
132
  )
 
125
  AgentEvent(
126
  type="looping",
127
  message=(
128
+ f"Sub-iteration {i} result insufficient. Feedback: {feedback[:100]}..."
 
129
  ),
130
  iteration=i,
131
  )
src/middleware/workflow_manager.py ADDED
@@ -0,0 +1,322 @@
1
+ """Workflow manager for coordinating parallel research loops.
2
+
3
+ Manages multiple research loops running in parallel, tracks their status,
4
+ and synchronizes evidence between loops and the global state.
5
+ """
6
+
7
+ import asyncio
8
+ from collections.abc import Callable
9
+ from typing import Any, Literal
10
+
11
+ import structlog
12
+ from pydantic import BaseModel, Field
13
+
14
+ from src.middleware.state_machine import get_workflow_state
15
+ from src.utils.models import Evidence
16
+
17
+ logger = structlog.get_logger()
18
+
19
+ LoopStatus = Literal["pending", "running", "completed", "failed", "cancelled"]
20
+
21
+
22
+ class ResearchLoop(BaseModel):
23
+ """Represents a single research loop."""
24
+
25
+ loop_id: str = Field(description="Unique identifier for the loop")
26
+ query: str = Field(description="The research query for this loop")
27
+ status: LoopStatus = Field(default="pending")
28
+ evidence: list[Evidence] = Field(default_factory=list)
29
+ iteration_count: int = Field(default=0, ge=0)
30
+ error: str | None = Field(default=None)
31
+
32
+ model_config = {"frozen": False} # Mutable for status updates
33
+
34
+
35
+ class WorkflowManager:
36
+ """Manages parallel research loops and state synchronization."""
37
+
38
+ def __init__(self) -> None:
39
+ """Initialize the workflow manager."""
40
+ self._loops: dict[str, ResearchLoop] = {}
41
+
42
+ async def add_loop(self, loop_id: str, query: str) -> ResearchLoop:
43
+ """Add a new research loop.
44
+
45
+ Args:
46
+ loop_id: Unique identifier for the loop.
47
+ query: The research query for this loop.
48
+
49
+ Returns:
50
+ The created ResearchLoop instance.
51
+ """
52
+ loop = ResearchLoop(loop_id=loop_id, query=query, status="pending")
53
+ self._loops[loop_id] = loop
54
+ logger.info("Loop added", loop_id=loop_id, query=query)
55
+ return loop
56
+
57
+ async def get_loop(self, loop_id: str) -> ResearchLoop | None:
58
+ """Get a research loop by ID.
59
+
60
+ Args:
61
+ loop_id: Unique identifier for the loop.
62
+
63
+ Returns:
64
+ The ResearchLoop instance, or None if not found.
65
+ """
66
+ return self._loops.get(loop_id)
67
+
68
+ async def update_loop_status(
69
+ self, loop_id: str, status: LoopStatus, error: str | None = None
70
+ ) -> None:
71
+ """Update the status of a research loop.
72
+
73
+ Args:
74
+ loop_id: Unique identifier for the loop.
75
+ status: New status for the loop.
76
+ error: Optional error message if status is "failed".
77
+ """
78
+ if loop_id not in self._loops:
79
+ logger.warning("Loop not found", loop_id=loop_id)
80
+ return
81
+
82
+ self._loops[loop_id].status = status
83
+ if error:
84
+ self._loops[loop_id].error = error
85
+ logger.info("Loop status updated", loop_id=loop_id, status=status)
86
+
87
+ async def add_loop_evidence(self, loop_id: str, evidence: list[Evidence]) -> None:
88
+ """Add evidence to a research loop.
89
+
90
+ Args:
91
+ loop_id: Unique identifier for the loop.
92
+ evidence: List of Evidence objects to add.
93
+ """
94
+ if loop_id not in self._loops:
95
+ logger.warning("Loop not found", loop_id=loop_id)
96
+ return
97
+
98
+ self._loops[loop_id].evidence.extend(evidence)
99
+ logger.debug(
100
+ "Evidence added to loop",
101
+ loop_id=loop_id,
102
+ evidence_count=len(evidence),
103
+ )
104
+
105
+ async def increment_loop_iteration(self, loop_id: str) -> None:
106
+ """Increment the iteration count for a research loop.
107
+
108
+ Args:
109
+ loop_id: Unique identifier for the loop.
110
+ """
111
+ if loop_id not in self._loops:
112
+ logger.warning("Loop not found", loop_id=loop_id)
113
+ return
114
+
115
+ self._loops[loop_id].iteration_count += 1
116
+ logger.debug(
117
+ "Iteration incremented",
118
+ loop_id=loop_id,
119
+ iteration=self._loops[loop_id].iteration_count,
120
+ )
121
+
122
+ async def run_loops_parallel(
123
+ self,
124
+ loop_configs: list[dict[str, Any]],
125
+ loop_func: Callable[[dict[str, Any]], Any],
126
+ judge_handler: Any | None = None,
127
+ budget_tracker: Any | None = None,
128
+ ) -> list[Any]:
129
+ """Run multiple research loops in parallel.
130
+
131
+ Args:
132
+ loop_configs: List of configuration dicts, each of which must contain 'loop_id' and 'query'.
133
+ loop_func: Async function that takes a config dict and returns loop results.
134
+ judge_handler: Optional JudgeHandler for early termination based on evidence sufficiency.
135
+ budget_tracker: Optional BudgetTracker for budget enforcement.
136
+
137
+ Returns:
138
+ List of results from each loop, in the same order as loop_configs.
139
+ """
140
+ logger.info("Starting parallel loops", loop_count=len(loop_configs))
141
+
142
+ # Create loops
143
+ for config in loop_configs:
144
+ loop_id = config.get("loop_id")
145
+ query = config.get("query", "")
146
+ if loop_id:
147
+ await self.add_loop(loop_id, query)
148
+ await self.update_loop_status(loop_id, "running")
149
+
150
+ # Run loops in parallel
151
+ async def run_single_loop(config: dict[str, Any]) -> Any:
152
+ loop_id = config.get("loop_id", "unknown")
153
+ query = config.get("query", "")
154
+ try:
155
+ # Check budget before starting
156
+ if budget_tracker:
157
+ exceeded, reason = budget_tracker.check_budget(loop_id)
158
+ if exceeded:
159
+ await self.update_loop_status(loop_id, "cancelled", error=reason)
160
+ logger.warning(
161
+ "Loop cancelled due to budget", loop_id=loop_id, reason=reason
162
+ )
163
+ return None
164
+
165
+ # If loop_func supports periodic checkpoints, we could check judge here
166
+ # For now, the loop_func itself handles judge checks internally
167
+ result = await loop_func(config)
168
+
169
+ # Final check with judge if available
170
+ if judge_handler and query:
171
+ should_complete, reason = await self.check_loop_completion(
172
+ loop_id, query, judge_handler
173
+ )
174
+ if should_complete:
175
+ logger.info(
176
+ "Loop completed early based on judge assessment",
177
+ loop_id=loop_id,
178
+ reason=reason,
179
+ )
180
+
181
+ await self.update_loop_status(loop_id, "completed")
182
+ return result
183
+ except Exception as e:
184
+ error_msg = str(e)
185
+ await self.update_loop_status(loop_id, "failed", error=error_msg)
186
+ logger.error("Loop failed", loop_id=loop_id, error=error_msg)
187
+ raise
188
+
189
+ results = await asyncio.gather(
190
+ *(run_single_loop(config) for config in loop_configs),
191
+ return_exceptions=True,
192
+ )
193
+
194
+ # Log completion
195
+ completed = sum(1 for r in results if not isinstance(r, Exception))
196
+ failed = len(results) - completed
197
+ logger.info(
198
+ "Parallel loops completed",
199
+ total=len(loop_configs),
200
+ completed=completed,
201
+ failed=failed,
202
+ )
203
+
204
+ return results
205
+
206
+ async def wait_for_loops(
207
+ self, loop_ids: list[str], timeout: float | None = None
208
+ ) -> list[ResearchLoop]:
209
+ """Wait for loops to complete.
210
+
211
+ Args:
212
+ loop_ids: List of loop IDs to wait for.
213
+ timeout: Optional timeout in seconds.
214
+
215
+ Returns:
216
+ List of ResearchLoop instances (may be incomplete if timeout occurs).
217
+ """
218
+ start_time = asyncio.get_running_loop().time()
219
+
220
+ while True:
221
+ loops = [self._loops.get(loop_id) for loop_id in loop_ids]
222
+ all_complete = all(
223
+ loop and loop.status in ("completed", "failed", "cancelled") for loop in loops
224
+ )
225
+
226
+ if all_complete:
227
+ return [loop for loop in loops if loop is not None]
228
+
229
+ if timeout is not None:
230
+ elapsed = asyncio.get_running_loop().time() - start_time
231
+ if elapsed >= timeout:
232
+ logger.warning("Timeout waiting for loops", timeout=timeout)
233
+ return [loop for loop in loops if loop is not None]
234
+
235
+ await asyncio.sleep(0.1) # Small delay to avoid busy waiting
236
+
237
+ async def cancel_loop(self, loop_id: str) -> None:
238
+ """Cancel a research loop.
239
+
240
+ Args:
241
+ loop_id: Unique identifier for the loop.
242
+ """
243
+ await self.update_loop_status(loop_id, "cancelled")
244
+ logger.info("Loop cancelled", loop_id=loop_id)
245
+
246
+ async def get_all_loops(self) -> list[ResearchLoop]:
247
+ """Get all research loops.
248
+
249
+ Returns:
250
+ List of all ResearchLoop instances.
251
+ """
252
+ return list(self._loops.values())
253
+
254
+ async def sync_loop_evidence_to_state(self, loop_id: str) -> None:
255
+ """Synchronize evidence from a loop to the global state.
256
+
257
+ Args:
258
+ loop_id: Unique identifier for the loop.
259
+ """
260
+ if loop_id not in self._loops:
261
+ logger.warning("Loop not found", loop_id=loop_id)
262
+ return
263
+
264
+ loop = self._loops[loop_id]
265
+ state = get_workflow_state()
266
+ added_count = state.add_evidence(loop.evidence)
267
+ logger.debug(
268
+ "Loop evidence synced to state",
269
+ loop_id=loop_id,
270
+ evidence_count=len(loop.evidence),
271
+ added_count=added_count,
272
+ )
273
+
274
+ async def get_shared_evidence(self) -> list[Evidence]:
275
+ """Get evidence from the global state.
276
+
277
+ Returns:
278
+ List of Evidence objects from the global state.
279
+ """
280
+ state = get_workflow_state()
281
+ return state.evidence
282
+
283
+ async def get_loop_evidence(self, loop_id: str) -> list[Evidence]:
284
+ """Get evidence collected by a specific loop.
285
+
286
+ Args:
287
+ loop_id: Loop identifier.
288
+
289
+ Returns:
290
+ List of Evidence objects from the loop.
291
+ """
292
+ if loop_id not in self._loops:
293
+ return []
294
+
295
+ return self._loops[loop_id].evidence
296
+
297
+ async def check_loop_completion(
298
+ self, loop_id: str, query: str, judge_handler: Any
299
+ ) -> tuple[bool, str]:
300
+ """Check if a loop should complete using judge assessment.
301
+
302
+ Args:
303
+ loop_id: Loop identifier.
304
+ query: Research query.
305
+ judge_handler: JudgeHandler instance.
306
+
307
+ Returns:
308
+ Tuple of (should_complete: bool, reason: str).
309
+ """
310
+ evidence = await self.get_loop_evidence(loop_id)
311
+
312
+ if not evidence:
313
+ return False, "No evidence collected yet"
314
+
315
+ try:
316
+ assessment = await judge_handler.assess(query, evidence)
317
+ if assessment.sufficient:
318
+ return True, f"Judge assessment: {assessment.reasoning}"
319
+ return False, f"Judge assessment: {assessment.reasoning}"
320
+ except Exception as e:
321
+ logger.error("Judge assessment failed", error=str(e), loop_id=loop_id)
322
+ return False, f"Judge assessment failed: {e!s}"
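A sketch of driving two loops in parallel through the manager above; fake_loop is a stand-in for a real iterative research flow, and the judge/budget hooks are left at their None defaults:

import asyncio
from typing import Any

from src.middleware.workflow_manager import WorkflowManager

async def fake_loop(config: dict[str, Any]) -> str:
    await asyncio.sleep(0.1)  # stand-in for search/judge/iterate work
    return f"findings for {config['query']}"

async def main() -> None:
    manager = WorkflowManager()
    results = await manager.run_loops_parallel(
        loop_configs=[
            {"loop_id": "loop-1", "query": "topic A"},
            {"loop_id": "loop-2", "query": "topic B"},
        ],
        loop_func=fake_loop,
    )
    print(results)  # same order as loop_configs
    for loop in await manager.get_all_loops():
        print(loop.loop_id, loop.status)  # both "completed"

asyncio.run(main())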
src/orchestrator/__init__.py ADDED
@@ -0,0 +1,48 @@
1
+ """Orchestrator module for research flows and planner agent.
2
+
3
+ This module provides:
4
+ - PlannerAgent: Creates report plans with sections
5
+ - IterativeResearchFlow: Single research loop pattern
6
+ - DeepResearchFlow: Parallel research loops pattern
7
+ - GraphOrchestrator: Graph-based orchestration (Phase 4); falls back to agent chains when use_graph=False
8
+ - Protocols: SearchHandlerProtocol, JudgeHandlerProtocol (re-exported from legacy_orchestrator)
9
+ - Orchestrator: Legacy orchestrator class (re-exported from legacy_orchestrator)
10
+ """
11
+
12
+ from typing import TYPE_CHECKING
13
+
14
+ # Re-export protocols and Orchestrator from legacy_orchestrator for backward compatibility
15
+ from src.legacy_orchestrator import (
16
+ JudgeHandlerProtocol,
17
+ Orchestrator,
18
+ SearchHandlerProtocol,
19
+ )
20
+
21
+ # Type-checking-only imports; the concrete imports follow under "Public exports"
22
+ if TYPE_CHECKING:
23
+ from src.orchestrator.graph_orchestrator import GraphOrchestrator
24
+ from src.orchestrator.planner_agent import PlannerAgent, create_planner_agent
25
+ from src.orchestrator.research_flow import (
26
+ DeepResearchFlow,
27
+ IterativeResearchFlow,
28
+ )
29
+
30
+ # Public exports
31
+ from src.orchestrator.graph_orchestrator import (
32
+ GraphOrchestrator,
33
+ create_graph_orchestrator,
34
+ )
35
+ from src.orchestrator.planner_agent import PlannerAgent, create_planner_agent
36
+ from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow
37
+
38
+ __all__ = [
39
+ "DeepResearchFlow",
40
+ "GraphOrchestrator",
41
+ "IterativeResearchFlow",
42
+ "JudgeHandlerProtocol",
43
+ "Orchestrator",
44
+ "PlannerAgent",
45
+ "SearchHandlerProtocol",
46
+ "create_graph_orchestrator",
47
+ "create_planner_agent",
48
+ ]
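Downstream code consumes the orchestrator as an event stream; a sketch using the exports above (the class is constructed directly here since create_graph_orchestrator's signature is not shown in this diff):

import asyncio

from src.orchestrator import GraphOrchestrator

async def main() -> None:
    orchestrator = GraphOrchestrator(mode="auto", max_iterations=5, max_time_minutes=10)
    async for event in orchestrator.run("What is known about topic X?"):
        print(event.type, event.message[:80])

asyncio.run(main())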
src/orchestrator/graph_orchestrator.py ADDED
@@ -0,0 +1,974 @@
1
+ """Graph orchestrator for Phase 4.
2
+
3
+ Implements graph-based orchestration using Pydantic AI agents as nodes.
4
+ Supports both iterative and deep research patterns with parallel execution.
5
+ """
6
+
7
+ import asyncio
8
+ from collections.abc import AsyncGenerator, Callable
9
+ from typing import TYPE_CHECKING, Any, Literal
10
+
11
+ import structlog
12
+
13
+ from src.agent_factory.agents import (
14
+ create_input_parser_agent,
15
+ create_knowledge_gap_agent,
16
+ create_long_writer_agent,
17
+ create_planner_agent,
18
+ create_thinking_agent,
19
+ create_tool_selector_agent,
20
+ create_writer_agent,
21
+ )
22
+ from src.agent_factory.graph_builder import (
23
+ AgentNode,
24
+ DecisionNode,
25
+ ParallelNode,
26
+ ResearchGraph,
27
+ StateNode,
28
+ create_deep_graph,
29
+ create_iterative_graph,
30
+ )
31
+ from src.middleware.budget_tracker import BudgetTracker
32
+ from src.middleware.state_machine import WorkflowState, init_workflow_state
33
+ from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow
34
+ from src.utils.models import AgentEvent
35
+
36
+ if TYPE_CHECKING:
37
+ pass
38
+
39
+ logger = structlog.get_logger()
40
+
41
+
42
+ class GraphExecutionContext:
43
+ """Context for managing graph execution state."""
44
+
45
+ def __init__(self, state: WorkflowState, budget_tracker: BudgetTracker) -> None:
46
+ """Initialize execution context.
47
+
48
+ Args:
49
+ state: Current workflow state
50
+ budget_tracker: Budget tracker instance
51
+ """
52
+ self.current_node: str = ""
53
+ self.visited_nodes: set[str] = set()
54
+ self.node_results: dict[str, Any] = {}
55
+ self.state = state
56
+ self.budget_tracker = budget_tracker
57
+ self.iteration_count = 0
58
+
59
+ def set_node_result(self, node_id: str, result: Any) -> None:
60
+ """Store result from node execution.
61
+
62
+ Args:
63
+ node_id: The node ID
64
+ result: The execution result
65
+ """
66
+ self.node_results[node_id] = result
67
+
68
+ def get_node_result(self, node_id: str) -> Any:
69
+ """Get result from node execution.
70
+
71
+ Args:
72
+ node_id: The node ID
73
+
74
+ Returns:
75
+ The stored result, or None if not found
76
+ """
77
+ return self.node_results.get(node_id)
78
+
79
+ def has_visited(self, node_id: str) -> bool:
80
+ """Check if node was visited.
81
+
82
+ Args:
83
+ node_id: The node ID
84
+
85
+ Returns:
86
+ True if visited, False otherwise
87
+ """
88
+ return node_id in self.visited_nodes
89
+
90
+ def mark_visited(self, node_id: str) -> None:
91
+ """Mark node as visited.
92
+
93
+ Args:
94
+ node_id: The node ID
95
+ """
96
+ self.visited_nodes.add(node_id)
97
+
98
+ def update_state(
99
+ self, updater: Callable[[WorkflowState, Any], WorkflowState], data: Any
100
+ ) -> None:
101
+ """Update workflow state.
102
+
103
+ Args:
104
+ updater: Function to update state
105
+ data: Data to pass to updater
106
+ """
107
+ self.state = updater(self.state, data)
108
+
109
+
110
+ class GraphOrchestrator:
111
+ """
112
+ Graph orchestrator using Pydantic AI Graphs.
113
+
114
+ Executes research workflows as graphs with nodes (agents) and edges (transitions).
115
+ Supports parallel execution, conditional routing, and state management.
116
+ """
117
+
118
+ def __init__(
119
+ self,
120
+ mode: Literal["iterative", "deep", "auto"] = "auto",
121
+ max_iterations: int = 5,
122
+ max_time_minutes: int = 10,
123
+ use_graph: bool = True,
124
+ ) -> None:
125
+ """
126
+ Initialize graph orchestrator.
127
+
128
+ Args:
129
+ mode: Research mode ("iterative", "deep", or "auto" to detect)
130
+ max_iterations: Maximum iterations per loop
131
+ max_time_minutes: Maximum time per loop
132
+ use_graph: Whether to use graph execution (True) or agent chains (False)
133
+ """
134
+ self.mode = mode
135
+ self.max_iterations = max_iterations
136
+ self.max_time_minutes = max_time_minutes
137
+ self.use_graph = use_graph
138
+ self.logger = logger
139
+
140
+ # Initialize flows (for backward compatibility)
141
+ self._iterative_flow: IterativeResearchFlow | None = None
142
+ self._deep_flow: DeepResearchFlow | None = None
143
+
144
+ # Graph execution components (lazy initialization)
145
+ self._graph: ResearchGraph | None = None
146
+ self._budget_tracker: BudgetTracker | None = None
147
+
148
+ async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
149
+ """
150
+ Run the research workflow.
151
+
152
+ Args:
153
+ query: The user's research query
154
+
155
+ Yields:
156
+ AgentEvent objects for real-time UI updates
157
+ """
158
+ self.logger.info(
159
+ "Starting graph orchestrator",
160
+ query=query[:100],
161
+ mode=self.mode,
162
+ use_graph=self.use_graph,
163
+ )
164
+
165
+ yield AgentEvent(
166
+ type="started",
167
+ message=f"Starting research ({self.mode} mode): {query}",
168
+ iteration=0,
169
+ )
170
+
171
+ try:
172
+ # Determine research mode
173
+ research_mode = self.mode
174
+ if research_mode == "auto":
175
+ research_mode = await self._detect_research_mode(query)
176
+
177
+ # Use graph execution if enabled, otherwise fall back to agent chains
178
+ if self.use_graph:
179
+ async for event in self._run_with_graph(query, research_mode):
180
+ yield event
181
+ else:
182
+ async for event in self._run_with_chains(query, research_mode):
183
+ yield event
184
+
185
+ except Exception as e:
186
+ self.logger.error("Graph orchestrator failed", error=str(e), exc_info=True)
187
+ yield AgentEvent(
188
+ type="error",
189
+ message=f"Research failed: {e!s}",
190
+ iteration=0,
191
+ )
192
+
193
+ async def _run_with_graph(
194
+ self, query: str, research_mode: Literal["iterative", "deep"]
195
+ ) -> AsyncGenerator[AgentEvent, None]:
196
+ """Run workflow using graph execution.
197
+
198
+ Args:
199
+ query: The research query
200
+ research_mode: The research mode
201
+
202
+ Yields:
203
+ AgentEvent objects
204
+ """
205
+ # Initialize state and budget tracker
206
+ from src.services.embeddings import get_embedding_service
207
+
208
+ embedding_service = get_embedding_service()
209
+ state = init_workflow_state(embedding_service=embedding_service)
210
+ budget_tracker = BudgetTracker()
211
+ budget_tracker.create_budget(
212
+ loop_id="graph_execution",
213
+ tokens_limit=100000,
214
+ time_limit_seconds=self.max_time_minutes * 60,
215
+ iterations_limit=self.max_iterations,
216
+ )
217
+ budget_tracker.start_timer("graph_execution")
218
+
219
+ context = GraphExecutionContext(state, budget_tracker)
220
+
221
+ # Build graph
222
+ self._graph = await self._build_graph(research_mode)
223
+
224
+ # Execute graph
225
+ async for event in self._execute_graph(query, context):
226
+ yield event
227
+
228
+ async def _run_with_chains(
229
+ self, query: str, research_mode: Literal["iterative", "deep"]
230
+ ) -> AsyncGenerator[AgentEvent, None]:
231
+ """Run workflow using agent chains (backward compatibility).
232
+
233
+ Args:
234
+ query: The research query
235
+ research_mode: The research mode
236
+
237
+ Yields:
238
+ AgentEvent objects
239
+ """
240
+ if research_mode == "iterative":
241
+ yield AgentEvent(
242
+ type="searching",
243
+ message="Running iterative research flow...",
244
+ iteration=1,
245
+ )
246
+
247
+ if self._iterative_flow is None:
248
+ self._iterative_flow = IterativeResearchFlow(
249
+ max_iterations=self.max_iterations,
250
+ max_time_minutes=self.max_time_minutes,
251
+ )
252
+
253
+ try:
254
+ final_report = await self._iterative_flow.run(query)
255
+ except Exception as e:
256
+ self.logger.error("Iterative flow failed", error=str(e), exc_info=True)
257
+ # Yield error event - outer handler will also catch and yield error event
258
+ yield AgentEvent(
259
+ type="error",
260
+ message=f"Iterative research failed: {e!s}",
261
+ iteration=1,
262
+ )
263
+ # Re-raise so outer handler can also yield error event for consistency
264
+ raise
265
+
266
+ yield AgentEvent(
267
+ type="complete",
268
+ message=final_report,
269
+ data={"mode": "iterative"},
270
+ iteration=1,
271
+ )
272
+
273
+ elif research_mode == "deep":
274
+ yield AgentEvent(
275
+ type="searching",
276
+ message="Running deep research flow...",
277
+ iteration=1,
278
+ )
279
+
280
+ if self._deep_flow is None:
281
+ self._deep_flow = DeepResearchFlow(
282
+ max_iterations=self.max_iterations,
283
+ max_time_minutes=self.max_time_minutes,
284
+ )
285
+
286
+ try:
287
+ final_report = await self._deep_flow.run(query)
288
+ except Exception as e:
289
+ self.logger.error("Deep flow failed", error=str(e), exc_info=True)
290
+ # Yield error event before re-raising so test can capture it
291
+ yield AgentEvent(
292
+ type="error",
293
+ message=f"Deep research failed: {e!s}",
294
+ iteration=1,
295
+ )
296
+ raise
297
+
298
+ yield AgentEvent(
299
+ type="complete",
300
+ message=final_report,
301
+ data={"mode": "deep"},
302
+ iteration=1,
303
+ )
304
+
305
+ async def _build_graph(self, mode: Literal["iterative", "deep"]) -> ResearchGraph:
306
+ """Build graph for the specified mode.
307
+
308
+ Args:
309
+ mode: Research mode
310
+
311
+ Returns:
312
+ Constructed ResearchGraph
313
+ """
314
+ if mode == "iterative":
315
+ # Get agents
316
+ knowledge_gap_agent = create_knowledge_gap_agent()
317
+ tool_selector_agent = create_tool_selector_agent()
318
+ thinking_agent = create_thinking_agent()
319
+ writer_agent = create_writer_agent()
320
+
321
+ # Create graph
322
+ graph = create_iterative_graph(
323
+ knowledge_gap_agent=knowledge_gap_agent.agent,
324
+ tool_selector_agent=tool_selector_agent.agent,
325
+ thinking_agent=thinking_agent.agent,
326
+ writer_agent=writer_agent.agent,
327
+ )
328
+ else: # deep
329
+ # Get agents
330
+ planner_agent = create_planner_agent()
331
+ knowledge_gap_agent = create_knowledge_gap_agent()
332
+ tool_selector_agent = create_tool_selector_agent()
333
+ thinking_agent = create_thinking_agent()
334
+ writer_agent = create_writer_agent()
335
+ long_writer_agent = create_long_writer_agent()
336
+
337
+ # Create graph
338
+ graph = create_deep_graph(
339
+ planner_agent=planner_agent.agent,
340
+ knowledge_gap_agent=knowledge_gap_agent.agent,
341
+ tool_selector_agent=tool_selector_agent.agent,
342
+ thinking_agent=thinking_agent.agent,
343
+ writer_agent=writer_agent.agent,
344
+ long_writer_agent=long_writer_agent.agent,
345
+ )
346
+
347
+ return graph
348
+
349
+ def _emit_start_event(
350
+ self, node: Any, current_node_id: str, iteration: int, context: GraphExecutionContext
351
+ ) -> AgentEvent:
352
+ """Emit start event for a node.
353
+
354
+ Args:
355
+ node: The node being executed
356
+ current_node_id: Current node ID
357
+ iteration: Current iteration number
358
+ context: Execution context
359
+
360
+ Returns:
361
+ AgentEvent for the start of node execution
362
+ """
363
+ if node and node.node_id == "planner":
364
+ return AgentEvent(
365
+ type="searching",
366
+ message="Creating report plan...",
367
+ iteration=iteration,
368
+ )
369
+ elif node and node.node_id == "parallel_loops":
370
+ # Get report plan to show section count
371
+ report_plan = context.get_node_result("planner")
372
+ if report_plan and hasattr(report_plan, "report_outline"):
373
+ section_count = len(report_plan.report_outline)
374
+ return AgentEvent(
375
+ type="looping",
376
+ message=f"Running parallel research loops for {section_count} sections...",
377
+ iteration=iteration,
378
+ data={"sections": section_count},
379
+ )
380
+ return AgentEvent(
381
+ type="looping",
382
+ message="Running parallel research loops...",
383
+ iteration=iteration,
384
+ )
385
+ elif node and node.node_id == "synthesizer":
386
+ return AgentEvent(
387
+ type="synthesizing",
388
+ message="Synthesizing final report from section drafts...",
389
+ iteration=iteration,
390
+ )
391
+ return AgentEvent(
392
+ type="looping",
393
+ message=f"Executing node: {current_node_id}",
394
+ iteration=iteration,
395
+ )
396
+
397
+ def _emit_completion_event(
398
+ self, node: Any, current_node_id: str, result: Any, iteration: int
399
+ ) -> AgentEvent:
400
+ """Emit completion event for a node.
401
+
402
+ Args:
403
+ node: The node that was executed
404
+ current_node_id: Current node ID
405
+ result: Node execution result
406
+ iteration: Current iteration number
407
+
408
+ Returns:
409
+ AgentEvent for the completion of node execution
410
+ """
411
+ if not node:
412
+ return AgentEvent(
413
+ type="looping",
414
+ message=f"Completed node: {current_node_id}",
415
+ iteration=iteration,
416
+ )
417
+
418
+ if node.node_id == "planner":
419
+ if isinstance(result, dict) and "report_outline" in result:
420
+ section_count = len(result["report_outline"])
421
+ return AgentEvent(
422
+ type="search_complete",
423
+ message=f"Report plan created with {section_count} sections",
424
+ iteration=iteration,
425
+ data={"sections": section_count},
426
+ )
427
+ return AgentEvent(
428
+ type="search_complete",
429
+ message="Report plan created",
430
+ iteration=iteration,
431
+ )
432
+ elif node.node_id == "parallel_loops":
433
+ if isinstance(result, list):
434
+ return AgentEvent(
435
+ type="search_complete",
436
+ message=f"Completed parallel research for {len(result)} sections",
437
+ iteration=iteration,
438
+ data={"sections_completed": len(result)},
439
+ )
440
+ return AgentEvent(
441
+ type="search_complete",
442
+ message="Parallel research loops completed",
443
+ iteration=iteration,
444
+ )
445
+ elif node.node_id == "synthesizer":
446
+ return AgentEvent(
447
+ type="synthesizing",
448
+ message="Final report synthesis completed",
449
+ iteration=iteration,
450
+ )
451
+ return AgentEvent(
452
+ type="searching" if node.node_type == "agent" else "looping",
453
+ message=f"Completed {node.node_type} node: {current_node_id}",
454
+ iteration=iteration,
455
+ )
456
+
457
+ async def _execute_graph(
458
+ self, query: str, context: GraphExecutionContext
459
+ ) -> AsyncGenerator[AgentEvent, None]:
460
+ """Execute the graph from entry node.
461
+
462
+ Args:
463
+ query: The research query
464
+ context: Execution context
465
+
466
+ Yields:
467
+ AgentEvent objects
468
+ """
469
+ if not self._graph:
470
+ raise ValueError("Graph not built")
471
+
472
+ current_node_id = self._graph.entry_node
473
+ iteration = 0
474
+
475
+ while current_node_id and current_node_id not in self._graph.exit_nodes:
476
+ # Check budget
477
+ if not context.budget_tracker.can_continue("graph_execution"):
478
+ self.logger.warning("Budget exceeded, exiting graph execution")
479
+ break
480
+
481
+ # Execute current node
482
+ iteration += 1
483
+ context.current_node = current_node_id
484
+ node = self._graph.get_node(current_node_id)
485
+
486
+ # Emit start event
487
+ yield self._emit_start_event(node, current_node_id, iteration, context)
488
+
489
+ try:
490
+ result = await self._execute_node(current_node_id, query, context)
491
+ context.set_node_result(current_node_id, result)
492
+ context.mark_visited(current_node_id)
493
+
494
+ # Yield completion event
495
+ yield self._emit_completion_event(node, current_node_id, result, iteration)
496
+
497
+ except Exception as e:
498
+ self.logger.error("Node execution failed", node_id=current_node_id, error=str(e))
499
+ yield AgentEvent(
500
+ type="error",
501
+ message=f"Node {current_node_id} failed: {e!s}",
502
+ iteration=iteration,
503
+ )
504
+ break
505
+
506
+ # Get next node(s)
507
+ next_nodes = self._get_next_node(current_node_id, context)
508
+
509
+ if not next_nodes:
510
+ # No more nodes, check if we're at exit
511
+ if current_node_id in self._graph.exit_nodes:
512
+ break
513
+ # Otherwise, we've reached a dead end
514
+ self.logger.warning("Reached dead end in graph", node_id=current_node_id)
515
+ break
516
+
517
+ current_node_id = next_nodes[0] # For now, take first next node (handle parallel later)
518
+
519
+ # Final event
520
+ final_result = context.get_node_result(current_node_id) if current_node_id else None
521
+ yield AgentEvent(
522
+ type="complete",
523
+ message=final_result if isinstance(final_result, str) else "Research completed",
524
+ data={"mode": self.mode, "iterations": iteration},
525
+ iteration=iteration,
526
+ )
527
+
528
+ async def _execute_node(self, node_id: str, query: str, context: GraphExecutionContext) -> Any:
529
+ """Execute a single node.
530
+
531
+ Args:
532
+ node_id: The node ID
533
+ query: The research query
534
+ context: Execution context
535
+
536
+ Returns:
537
+ Node execution result
538
+ """
539
+ if not self._graph:
540
+ raise ValueError("Graph not built")
541
+
542
+ node = self._graph.get_node(node_id)
543
+ if not node:
544
+ raise ValueError(f"Node {node_id} not found")
545
+
546
+ if isinstance(node, AgentNode):
547
+ return await self._execute_agent_node(node, query, context)
548
+ elif isinstance(node, StateNode):
549
+ return await self._execute_state_node(node, query, context)
550
+ elif isinstance(node, DecisionNode):
551
+ return await self._execute_decision_node(node, query, context)
552
+ elif isinstance(node, ParallelNode):
553
+ return await self._execute_parallel_node(node, query, context)
554
+ else:
555
+ raise ValueError(f"Unknown node type: {type(node)}")
556
+
557
+ async def _execute_agent_node(
558
+ self, node: AgentNode, query: str, context: GraphExecutionContext
559
+ ) -> Any:
560
+ """Execute an agent node.
561
+
562
+ Special handling for deep research nodes:
563
+ - "planner": Takes query string, returns ReportPlan
564
+ - "synthesizer": Takes query + ReportPlan + section drafts, returns final report
565
+
566
+ Args:
567
+ node: The agent node
568
+ query: The research query
569
+ context: Execution context
570
+
571
+ Returns:
572
+ Agent execution result
573
+ """
574
+ # Special handling for synthesizer node
575
+ if node.node_id == "synthesizer":
576
+ # Call LongWriterAgent.write_report() directly instead of using agent.run()
577
+ from src.agent_factory.agents import create_long_writer_agent
578
+ from src.utils.models import ReportDraft, ReportDraftSection, ReportPlan
579
+
580
+ report_plan = context.get_node_result("planner")
581
+ section_drafts = context.get_node_result("parallel_loops") or []
582
+
583
+ if not isinstance(report_plan, ReportPlan):
584
+ raise ValueError("ReportPlan not found for synthesizer")
585
+
586
+ if not section_drafts:
587
+ raise ValueError("Section drafts not found for synthesizer")
588
+
589
+ # Create ReportDraft from section drafts
590
+ report_draft = ReportDraft(
591
+ sections=[
592
+ ReportDraftSection(
593
+ section_title=section.title,
594
+ section_content=draft,
595
+ )
596
+ for section, draft in zip(
597
+ report_plan.report_outline, section_drafts, strict=False
598
+ )
599
+ ]
600
+ )
601
+
602
+ # Get LongWriterAgent instance and call write_report directly
603
+ long_writer_agent = create_long_writer_agent()
604
+ final_report = await long_writer_agent.write_report(
605
+ original_query=query,
606
+ report_title=report_plan.report_title,
607
+ report_draft=report_draft,
608
+ )
609
+
610
+ # Estimate tokens (rough estimate)
611
+ estimated_tokens = len(final_report) // 4 # Rough token estimate
612
+ context.budget_tracker.add_tokens("graph_execution", estimated_tokens)
613
+
614
+ return final_report
615
+
616
+ # Standard agent execution
617
+ # Prepare input based on node type
618
+ if node.node_id == "planner":
619
+ # Planner takes the original query
620
+ input_data = query
621
+ else:
622
+ # Standard: use previous node result or query
623
+ prev_result = context.get_node_result(context.current_node)
624
+ input_data = prev_result if prev_result is not None else query
625
+
626
+ # Apply input transformer if provided
627
+ if node.input_transformer:
628
+ input_data = node.input_transformer(input_data)
629
+
630
+ # Execute agent
631
+ result = await node.agent.run(input_data)
632
+
633
+ # Transform output if needed
634
+ output = result.output
635
+ if node.output_transformer:
636
+ output = node.output_transformer(output)
637
+
638
+ # Estimate and track tokens
639
+ if hasattr(result, "usage") and result.usage:
640
+ tokens = result.usage.total_tokens if hasattr(result.usage, "total_tokens") else 0
641
+ context.budget_tracker.add_tokens("graph_execution", tokens)
642
+
643
+ return output
644
+
645
+ async def _execute_state_node(
646
+ self, node: StateNode, query: str, context: GraphExecutionContext
647
+ ) -> Any:
648
+ """Execute a state node.
649
+
650
+ Special handling for deep research state nodes:
651
+ - "store_plan": Stores ReportPlan in context for parallel loops
652
+ - "collect_drafts": Stores section drafts in context for synthesizer
653
+
654
+ Args:
655
+ node: The state node
656
+ query: The research query
657
+ context: Execution context
658
+
659
+ Returns:
660
+ State update result
661
+ """
662
+ # Get previous result for state update
663
+ # For "store_plan", get from planner node
664
+ # For "collect_drafts", get from parallel_loops node
665
+ if node.node_id == "store_plan":
666
+ prev_result = context.get_node_result("planner")
667
+ elif node.node_id == "collect_drafts":
668
+ prev_result = context.get_node_result("parallel_loops")
669
+ else:
670
+ prev_result = context.get_node_result(context.current_node)
671
+
672
+ # Update state
673
+ updated_state = node.state_updater(context.state, prev_result)
674
+ context.state = updated_state
675
+
676
+ # Store result in context for next nodes to access
677
+ context.set_node_result(node.node_id, prev_result)
678
+
679
+ # Read state if needed
680
+ if node.state_reader:
681
+ return node.state_reader(context.state)
682
+
683
+ return prev_result # Return the stored result for next nodes
684
+
685
+ async def _execute_decision_node(
686
+ self, node: DecisionNode, query: str, context: GraphExecutionContext
687
+ ) -> str:
688
+ """Execute a decision node.
689
+
690
+ Args:
691
+ node: The decision node
692
+ query: The research query
693
+ context: Execution context
694
+
695
+ Returns:
696
+ Next node ID
697
+ """
698
+ # Get previous result for decision
699
+ prev_result = context.get_node_result(context.current_node)
700
+
701
+ # Make decision
702
+ next_node_id = node.decision_function(prev_result)
703
+
704
+ # Validate decision
705
+ if next_node_id not in node.options:
706
+ self.logger.warning(
707
+ "Decision function returned invalid node",
708
+ node_id=node.node_id,
709
+ returned=next_node_id,
710
+ options=node.options,
711
+ )
712
+ # Default to first option
713
+ next_node_id = node.options[0]
714
+
715
+ return next_node_id
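A minimal sketch of a decision_function compatible with the validation above: it receives the previous node's result and must return a member of node.options, otherwise the orchestrator falls back to the first option. Node names here are hypothetical.

```python
from typing import Any


def route_on_sufficiency(prev_result: Any) -> str:
    # Illustrative: route to the synthesizer once the previous node reports
    # sufficiency, otherwise loop back to search.
    if getattr(prev_result, "sufficient", False):
        return "synthesizer"
    return "search"
```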
716
+
717
+ async def _execute_parallel_node(
718
+ self, node: ParallelNode, query: str, context: GraphExecutionContext
719
+ ) -> list[Any]:
720
+ """Execute a parallel node.
721
+
722
+ Special handling for deep research "parallel_loops" node:
723
+ - Extracts report plan from previous node result
724
+ - Creates IterativeResearchFlow instances for each section
725
+ - Executes them in parallel
726
+ - Returns section drafts
727
+
728
+ Args:
729
+ node: The parallel node
730
+ query: The research query
731
+ context: Execution context
732
+
733
+ Returns:
734
+ List of results from parallel nodes
735
+ """
736
+ # Special handling for deep research parallel_loops node
737
+ if node.node_id == "parallel_loops":
738
+ return await self._execute_deep_research_parallel_loops(node, query, context)
739
+
740
+ # Standard parallel node execution
741
+ # Execute all parallel nodes concurrently
742
+ tasks = [
743
+ self._execute_node(parallel_node_id, query, context)
744
+ for parallel_node_id in node.parallel_nodes
745
+ ]
746
+
747
+ results = await asyncio.gather(*tasks, return_exceptions=True)
748
+
749
+ # Handle exceptions
750
+ for i, result in enumerate(results):
751
+ if isinstance(result, Exception):
752
+ self.logger.error(
753
+ "Parallel node execution failed",
754
+ node_id=node.parallel_nodes[i] if i < len(node.parallel_nodes) else "unknown",
755
+ error=str(result),
756
+ )
757
+ results[i] = None
758
+
759
+ # Aggregate if needed
760
+ if node.aggregator:
761
+ aggregated = node.aggregator(results)
762
+ # Type cast: aggregator returns Any, but we expect list[Any]
763
+ return list(aggregated) if isinstance(aggregated, list) else [aggregated]
764
+
765
+ return results
766
+
767
+ async def _execute_deep_research_parallel_loops(
768
+ self, node: ParallelNode, query: str, context: GraphExecutionContext
769
+ ) -> list[str]:
770
+ """Execute parallel iterative research loops for deep research.
771
+
772
+ Args:
773
+ node: The parallel node (should be "parallel_loops")
774
+ query: The research query
775
+ context: Execution context
776
+
777
+ Returns:
778
+ List of section draft strings
779
+ """
780
+ from src.agent_factory.judges import create_judge_handler
781
+ from src.orchestrator.research_flow import IterativeResearchFlow
782
+ from src.utils.models import ReportPlan
783
+
784
+ # Get report plan from previous node (store_plan)
785
+ # The plan should be stored in context.node_results from the planner node
786
+ planner_result = context.get_node_result("planner")
787
+ if not isinstance(planner_result, ReportPlan):
788
+ self.logger.error(
789
+ "Planner result is not a ReportPlan",
790
+ type=type(planner_result),
791
+ )
792
+ raise ValueError("Planner must return ReportPlan for deep research")
793
+
794
+ report_plan: ReportPlan = planner_result
795
+ self.logger.info(
796
+ "Executing parallel loops for deep research",
797
+ sections=len(report_plan.report_outline),
798
+ )
799
+
800
+ # Create judge handler for iterative flows
801
+ judge_handler = create_judge_handler()
802
+
803
+ # Create and execute iterative research flows for each section
804
+ async def run_section_research(section_index: int) -> str:
805
+ """Run iterative research for a single section."""
806
+ section = report_plan.report_outline[section_index]
807
+
808
+ try:
809
+ # Create iterative research flow
810
+ flow = IterativeResearchFlow(
811
+ max_iterations=self.max_iterations,
812
+ max_time_minutes=self.max_time_minutes,
813
+ verbose=False, # Less verbose in parallel execution
814
+ use_graph=False, # Use agent chains for section research
815
+ judge_handler=judge_handler,
816
+ )
817
+
818
+ # Run research for this section
819
+ section_draft = await flow.run(
820
+ query=section.key_question,
821
+ background_context=report_plan.background_context,
822
+ )
823
+
824
+ self.logger.info(
825
+ "Section research completed",
826
+ section_index=section_index,
827
+ section_title=section.title,
828
+ draft_length=len(section_draft),
829
+ )
830
+
831
+ return section_draft
832
+
833
+ except Exception as e:
834
+ self.logger.error(
835
+ "Section research failed",
836
+ section_index=section_index,
837
+ section_title=section.title,
838
+ error=str(e),
839
+ )
840
+ # Return a placeholder draft that records the failure for this section
841
+ return f"# {section.title}\n\n[Research failed: {e!s}]"
842
+
843
+ # Execute all sections in parallel
844
+ section_drafts = await asyncio.gather(
845
+ *(run_section_research(i) for i in range(len(report_plan.report_outline))),
846
+ return_exceptions=True,
847
+ )
848
+
849
+ # Handle exceptions and filter None results
850
+ filtered_drafts: list[str] = []
851
+ for i, draft in enumerate(section_drafts):
852
+ if isinstance(draft, Exception):
853
+ self.logger.error(
854
+ "Section research exception",
855
+ section_index=i,
856
+ error=str(draft),
857
+ )
858
+ filtered_drafts.append(
859
+ f"# {report_plan.report_outline[i].title}\n\n[Research failed: {draft!s}]"
860
+ )
861
+ elif draft is not None:
862
+ # Type narrowing: after the Exception check, draft must be a str
863
+ assert isinstance(draft, str), "Expected str after Exception check"
864
+ filtered_drafts.append(draft)
865
+
866
+ self.logger.info(
867
+ "Parallel loops completed",
868
+ sections=len(filtered_drafts),
869
+ total_sections=len(report_plan.report_outline),
870
+ )
871
+
872
+ return filtered_drafts
873
+
874
+ def _get_next_node(self, node_id: str, context: GraphExecutionContext) -> list[str]:
875
+ """Get next node(s) from current node.
876
+
877
+ Args:
878
+ node_id: Current node ID
879
+ context: Execution context
880
+
881
+ Returns:
882
+ List of next node IDs
883
+ """
884
+ if not self._graph:
885
+ return []
886
+
887
+ # Get node result for condition evaluation
888
+ node_result = context.get_node_result(node_id)
889
+
890
+ # Get next nodes
891
+ next_nodes = self._graph.get_next_nodes(node_id, context=node_result)
892
+
893
+ # If this was a decision node, use its result
894
+ node = self._graph.get_node(node_id)
895
+ if isinstance(node, DecisionNode):
896
+ decision_result = node_result
897
+ if isinstance(decision_result, str):
898
+ return [decision_result]
899
+
900
+ # Return next node IDs
901
+ return [next_node_id for next_node_id, _ in next_nodes]
902
+
903
+ async def _detect_research_mode(self, query: str) -> Literal["iterative", "deep"]:
904
+ """
905
+ Detect research mode from query using input parser agent.
906
+
907
+ Uses input parser agent to analyze query and determine research mode.
908
+ Falls back to heuristic if parser fails.
909
+
910
+ Args:
911
+ query: The research query
912
+
913
+ Returns:
914
+ Detected research mode
915
+ """
916
+ try:
917
+ # Use input parser agent for intelligent mode detection
918
+ input_parser = create_input_parser_agent()
919
+ parsed_query = await input_parser.parse(query)
920
+ self.logger.info(
921
+ "Research mode detected by input parser",
922
+ mode=parsed_query.research_mode,
923
+ query=query[:100],
924
+ )
925
+ return parsed_query.research_mode
926
+ except Exception as e:
927
+ # Fallback to heuristic if parser fails
928
+ self.logger.warning(
929
+ "Input parser failed, using heuristic",
930
+ error=str(e),
931
+ query=query[:100],
932
+ )
933
+ query_lower = query.lower()
934
+ if any(
935
+ keyword in query_lower
936
+ for keyword in [
937
+ "section",
938
+ "sections",
939
+ "report",
940
+ "outline",
941
+ "structure",
942
+ "comprehensive",
943
+ "analyze",
944
+ "analysis",
945
+ ]
946
+ ):
947
+ return "deep"
948
+ return "iterative"
949
+
950
+
951
+ def create_graph_orchestrator(
952
+ mode: Literal["iterative", "deep", "auto"] = "auto",
953
+ max_iterations: int = 5,
954
+ max_time_minutes: int = 10,
955
+ use_graph: bool = True,
956
+ ) -> GraphOrchestrator:
957
+ """
958
+ Factory function to create a graph orchestrator.
959
+
960
+ Args:
961
+ mode: Research mode
962
+ max_iterations: Maximum iterations per loop
963
+ max_time_minutes: Maximum time per loop
964
+ use_graph: Whether to use graph execution (True) or agent chains (False)
965
+
966
+ Returns:
967
+ Configured GraphOrchestrator instance
968
+ """
969
+ return GraphOrchestrator(
970
+ mode=mode,
971
+ max_iterations=max_iterations,
972
+ max_time_minutes=max_time_minutes,
973
+ use_graph=use_graph,
974
+ )
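A minimal usage sketch for the factory above, assuming it is importable from src.orchestrator.graph_orchestrator; the events are consumed the same way _run_with_graph in research_flow.py consumes them (via .type and .message):

```python
import asyncio

from src.orchestrator.graph_orchestrator import create_graph_orchestrator


async def main() -> None:
    # "auto" defers to _detect_research_mode; run() yields events until a
    # "complete" (or "error") event arrives.
    orchestrator = create_graph_orchestrator(mode="auto", max_iterations=3)
    async for event in orchestrator.run("What are current treatments for ALS?"):
        if event.type == "complete":
            print(event.message)
            break


asyncio.run(main())
```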
src/orchestrator/planner_agent.py ADDED
@@ -0,0 +1,184 @@
1
+ """Planner agent for creating report plans with sections and background context.
2
+
3
+ Converts the folder/planner_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ from datetime import datetime
7
+ from typing import Any
8
+
9
+ import structlog
10
+ from pydantic_ai import Agent
11
+
12
+ from src.agent_factory.judges import get_model
13
+ from src.tools.crawl_adapter import crawl_website
14
+ from src.tools.web_search_adapter import web_search
15
+ from src.utils.exceptions import ConfigurationError, JudgeError
16
+ from src.utils.models import ReportPlan, ReportPlanSection
17
+
18
+ logger = structlog.get_logger()
19
+
20
+
21
+ # System prompt for the planner agent
22
+ SYSTEM_PROMPT = f"""
23
+ You are a research manager, managing a team of research agents. Today's date is {datetime.now().strftime("%Y-%m-%d")}.
24
+ Given a research query, your job is to produce an initial outline of the report (section titles and key questions),
25
+ as well as some background context. Each section will be assigned to a different researcher in your team who will then
26
+ carry out research on the section.
27
+
28
+ You will be given:
29
+ - An initial research query
30
+
31
+ Your task is to:
32
+ 1. Produce 1-2 paragraphs of initial background context (if needed) on the query by running web searches or crawling websites
33
+ 2. Produce an outline of the report that includes a list of section titles and the key question to be addressed in each section
34
+ 3. Provide a title for the report that will be used as the main heading
35
+
36
+ Guidelines:
37
+ - Each section should cover a single topic/question that is independent of other sections
38
+ - The key question for each section should include both the NAME and DOMAIN NAME / WEBSITE (if available and applicable) if it is related to a company, product or similar
39
+ - The background_context should not be more than 2 paragraphs
40
+ - The background_context should be very specific to the query and include any information that is relevant for researchers across all sections of the report
41
+ - The background_context should be drawn only from web search or crawl results rather than prior knowledge (i.e. it should only be included if you have called tools)
42
+ - For example, if the query is about a company, the background context should include some basic information about what the company does
43
+ - DO NOT do more than 2 tool calls
44
+
45
+ Only output JSON. Follow the JSON schema for ReportPlan. Do not output anything else.
46
+ """
47
+
48
+
49
+ class PlannerAgent:
50
+ """
51
+ Planner agent that creates report plans with sections and background context.
52
+
53
+ Uses Pydantic AI to generate structured ReportPlan output with optional
54
+ web search and crawl tool usage for background context.
55
+ """
56
+
57
+ def __init__(
58
+ self,
59
+ model: Any | None = None,
60
+ web_search_tool: Any | None = None,
61
+ crawl_tool: Any | None = None,
62
+ ) -> None:
63
+ """
64
+ Initialize the planner agent.
65
+
66
+ Args:
67
+ model: Optional Pydantic AI model. If None, uses config default.
68
+ web_search_tool: Optional web search tool function. If None, uses default.
69
+ crawl_tool: Optional crawl tool function. If None, uses default.
70
+ """
71
+ self.model = model or get_model()
72
+ self.web_search_tool = web_search_tool or web_search
73
+ self.crawl_tool = crawl_tool or crawl_website
74
+ self.logger = logger
75
+
76
+ # Validate tools are callable
77
+ if not callable(self.web_search_tool):
78
+ raise ConfigurationError("web_search_tool must be callable")
79
+ if not callable(self.crawl_tool):
80
+ raise ConfigurationError("crawl_tool must be callable")
81
+
82
+ # Initialize Pydantic AI Agent
83
+ self.agent = Agent(
84
+ model=self.model,
85
+ output_type=ReportPlan,
86
+ system_prompt=SYSTEM_PROMPT,
87
+ tools=[self.web_search_tool, self.crawl_tool],
88
+ retries=3,
89
+ )
90
+
91
+ async def run(self, query: str) -> ReportPlan:
92
+ """
93
+ Run the planner agent to generate a report plan.
94
+
95
+ Args:
96
+ query: The user's research query
97
+
98
+ Returns:
99
+ ReportPlan with sections, background context, and report title
100
+
101
+ Raises:
102
+ JudgeError: If planning fails after retries
103
+ ConfigurationError: If agent configuration is invalid
104
+ """
105
+ self.logger.info("Starting report planning", query=query[:100])
106
+
107
+ user_message = f"QUERY: {query}"
108
+
109
+ try:
110
+ # Run the agent
111
+ result = await self.agent.run(user_message)
112
+ report_plan = result.output
113
+
114
+ # Validate report plan
115
+ if not report_plan.report_outline:
116
+ self.logger.warning("Report plan has no sections", query=query[:100])
117
+ # Return fallback plan instead of raising error
118
+ return ReportPlan(
119
+ background_context=report_plan.background_context or "",
120
+ report_outline=[
121
+ ReportPlanSection(
122
+ title="Overview",
123
+ key_question=query,
124
+ )
125
+ ],
126
+ report_title=report_plan.report_title or f"Research Report: {query[:50]}",
127
+ )
128
+
129
+ if not report_plan.report_title:
130
+ self.logger.warning("Report plan has no title", query=query[:100])
131
+ raise JudgeError("Report plan must have a title")
132
+
133
+ self.logger.info(
134
+ "Report plan created",
135
+ sections=len(report_plan.report_outline),
136
+ has_background=bool(report_plan.background_context),
137
+ )
138
+
139
+ return report_plan
140
+
141
+ except Exception as e:
142
+ self.logger.error("Planning failed", error=str(e), query=query[:100])
143
+
144
+ # Fallback: return minimal report plan
145
+ if isinstance(e, JudgeError | ConfigurationError):
146
+ raise
147
+
148
+ # For other errors, return a minimal plan
149
+ return ReportPlan(
150
+ background_context="",
151
+ report_outline=[
152
+ ReportPlanSection(
153
+ title="Research Findings",
154
+ key_question=query,
155
+ )
156
+ ],
157
+ report_title=f"Research Report: {query[:50]}",
158
+ )
159
+
160
+
161
+ def create_planner_agent(model: Any | None = None) -> PlannerAgent:
162
+ """
163
+ Factory function to create a planner agent.
164
+
165
+ Args:
166
+ model: Optional Pydantic AI model. If None, uses settings default.
167
+
168
+ Returns:
169
+ Configured PlannerAgent instance
170
+
171
+ Raises:
172
+ ConfigurationError: If required API keys are missing
173
+ """
174
+ try:
175
+ # Get model from settings if not provided
176
+ if model is None:
177
+ model = get_model()
178
+
179
+ # Create and return planner agent
180
+ return PlannerAgent(model=model)
181
+
182
+ except Exception as e:
183
+ logger.error("Failed to create planner agent", error=str(e))
184
+ raise ConfigurationError(f"Failed to create planner agent: {e}") from e
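A minimal sketch of driving the planner directly; field names follow the ReportPlan/ReportPlanSection usage in this module, and the query is illustrative:

```python
import asyncio

from src.orchestrator.planner_agent import create_planner_agent


async def main() -> None:
    planner = create_planner_agent()  # raises ConfigurationError if the model cannot be configured
    plan = await planner.run("GLP-1 agonists and cardiovascular outcomes")
    print(plan.report_title)
    for section in plan.report_outline:
        print(f"- {section.title}: {section.key_question}")


asyncio.run(main())
```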
src/orchestrator/research_flow.py ADDED
@@ -0,0 +1,999 @@
1
+ """Research flow implementations for iterative and deep research patterns.
2
+
3
+ Converts the folder/iterative_research.py and folder/deep_research.py
4
+ implementations to use Pydantic AI agents.
5
+ """
6
+
7
+ import asyncio
8
+ import time
9
+ from typing import Any
10
+
11
+ import structlog
12
+
13
+ from src.agent_factory.agents import (
14
+ create_graph_orchestrator,
15
+ create_knowledge_gap_agent,
16
+ create_long_writer_agent,
17
+ create_planner_agent,
18
+ create_proofreader_agent,
19
+ create_thinking_agent,
20
+ create_tool_selector_agent,
21
+ create_writer_agent,
22
+ )
23
+ from src.agent_factory.judges import create_judge_handler
24
+ from src.middleware.budget_tracker import BudgetTracker
25
+ from src.middleware.state_machine import get_workflow_state, init_workflow_state
26
+ from src.middleware.workflow_manager import WorkflowManager
27
+ from src.services.llamaindex_rag import LlamaIndexRAGService, get_rag_service
28
+ from src.tools.tool_executor import execute_tool_tasks
29
+ from src.utils.exceptions import ConfigurationError
30
+ from src.utils.models import (
31
+ AgentSelectionPlan,
32
+ AgentTask,
33
+ Citation,
34
+ Conversation,
35
+ Evidence,
36
+ JudgeAssessment,
37
+ KnowledgeGapOutput,
38
+ ReportDraft,
39
+ ReportDraftSection,
40
+ ReportPlan,
41
+ SourceName,
42
+ ToolAgentOutput,
43
+ )
44
+
45
+ logger = structlog.get_logger()
46
+
47
+
48
+ class IterativeResearchFlow:
49
+ """
50
+ Iterative research flow that runs a single research loop.
51
+
52
+ Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Repeat
53
+ until research is complete or constraints are met.
54
+ """
55
+
56
+ def __init__(
57
+ self,
58
+ max_iterations: int = 5,
59
+ max_time_minutes: int = 10,
60
+ verbose: bool = True,
61
+ use_graph: bool = False,
62
+ judge_handler: Any | None = None,
63
+ ) -> None:
64
+ """
65
+ Initialize iterative research flow.
66
+
67
+ Args:
68
+ max_iterations: Maximum number of iterations
69
+ max_time_minutes: Maximum time in minutes
70
+ verbose: Whether to log progress
71
+ use_graph: Whether to use graph-based execution (True) or agent chains (False)
+ judge_handler: Optional shared judge handler; if None, a new one is created
72
+ """
73
+ self.max_iterations = max_iterations
74
+ self.max_time_minutes = max_time_minutes
75
+ self.verbose = verbose
76
+ self.use_graph = use_graph
77
+ self.logger = logger
78
+
79
+ # Initialize agents (only needed for agent chain execution)
80
+ if not use_graph:
81
+ self.knowledge_gap_agent = create_knowledge_gap_agent()
82
+ self.tool_selector_agent = create_tool_selector_agent()
83
+ self.thinking_agent = create_thinking_agent()
84
+ self.writer_agent = create_writer_agent()
85
+ # Initialize judge handler (use provided or create new)
86
+ self.judge_handler = judge_handler or create_judge_handler()
87
+
88
+ # Initialize state (only needed for agent chain execution)
89
+ if not use_graph:
90
+ self.conversation = Conversation()
91
+ self.iteration = 0
92
+ self.start_time: float | None = None
93
+ self.should_continue = True
94
+
95
+ # Initialize budget tracker
96
+ self.budget_tracker = BudgetTracker()
97
+ self.loop_id = "iterative_flow"
98
+ self.budget_tracker.create_budget(
99
+ loop_id=self.loop_id,
100
+ tokens_limit=100000,
101
+ time_limit_seconds=max_time_minutes * 60,
102
+ iterations_limit=max_iterations,
103
+ )
104
+ self.budget_tracker.start_timer(self.loop_id)
105
+
106
+ # Initialize RAG service (lazy, may be None if unavailable)
107
+ self._rag_service: LlamaIndexRAGService | None = None
108
+
109
+ # Graph orchestrator (lazy initialization)
110
+ self._graph_orchestrator: Any = None
111
+
112
+ async def run(
113
+ self,
114
+ query: str,
115
+ background_context: str = "",
116
+ output_length: str = "",
117
+ output_instructions: str = "",
118
+ ) -> str:
119
+ """
120
+ Run the iterative research flow.
121
+
122
+ Args:
123
+ query: The research query
124
+ background_context: Optional background context
125
+ output_length: Optional description of desired output length
126
+ output_instructions: Optional additional instructions
127
+
128
+ Returns:
129
+ Final report string
130
+ """
131
+ if self.use_graph:
132
+ return await self._run_with_graph(
133
+ query, background_context, output_length, output_instructions
134
+ )
135
+ else:
136
+ return await self._run_with_chains(
137
+ query, background_context, output_length, output_instructions
138
+ )
139
+
140
+ async def _run_with_chains(
141
+ self,
142
+ query: str,
143
+ background_context: str = "",
144
+ output_length: str = "",
145
+ output_instructions: str = "",
146
+ ) -> str:
147
+ """
148
+ Run the iterative research flow using agent chains.
149
+
150
+ Args:
151
+ query: The research query
152
+ background_context: Optional background context
153
+ output_length: Optional description of desired output length
154
+ output_instructions: Optional additional instructions
155
+
156
+ Returns:
157
+ Final report string
158
+ """
159
+ self.start_time = time.time()
160
+ self.logger.info("Starting iterative research (agent chains)", query=query[:100])
161
+
162
+ # Initialize conversation with first iteration
163
+ self.conversation.add_iteration()
164
+
165
+ # Main research loop
166
+ while self.should_continue and self._check_constraints():
167
+ self.iteration += 1
168
+ self.logger.info("Starting iteration", iteration=self.iteration)
169
+
170
+ # Add new iteration to conversation
171
+ self.conversation.add_iteration()
172
+
173
+ # 1. Generate observations
174
+ await self._generate_observations(query, background_context)
175
+
176
+ # 2. Evaluate gaps
177
+ evaluation = await self._evaluate_gaps(query, background_context)
178
+
179
+ # 3. Assess with judge (after tools execute, we'll assess again)
180
+ # For now, check knowledge gap evaluation
181
+ # After tool execution, we'll do a full judge assessment
182
+
183
+ # Check if research is complete (knowledge gap agent says complete)
184
+ if evaluation.research_complete:
185
+ self.should_continue = False
186
+ self.logger.info("Research marked as complete by knowledge gap agent")
187
+ break
188
+
189
+ # 4. Select tools for next gap
190
+ next_gap = evaluation.outstanding_gaps[0] if evaluation.outstanding_gaps else query
191
+ selection_plan = await self._select_agents(next_gap, query, background_context)
192
+
193
+ # 5. Execute tools
194
+ await self._execute_tools(selection_plan.tasks)
195
+
196
+ # 6. Assess evidence sufficiency with judge
197
+ judge_assessment = await self._assess_with_judge(query)
198
+
199
+ # Check if judge says evidence is sufficient
200
+ if judge_assessment.sufficient:
201
+ self.should_continue = False
202
+ self.logger.info(
203
+ "Research marked as complete by judge",
204
+ confidence=judge_assessment.confidence,
205
+ reasoning=judge_assessment.reasoning[:100],
206
+ )
207
+ break
208
+
209
+ # Update budget tracker
210
+ self.budget_tracker.increment_iteration(self.loop_id)
211
+ self.budget_tracker.update_timer(self.loop_id)
212
+
213
+ # Create final report
214
+ report = await self._create_final_report(query, output_length, output_instructions)
215
+
216
+ elapsed = time.time() - (self.start_time or time.time())
217
+ self.logger.info(
218
+ "Iterative research completed",
219
+ iterations=self.iteration,
220
+ elapsed_minutes=elapsed / 60,
221
+ )
222
+
223
+ return report
224
+
225
+ async def _run_with_graph(
226
+ self,
227
+ query: str,
228
+ background_context: str = "",
229
+ output_length: str = "",
230
+ output_instructions: str = "",
231
+ ) -> str:
232
+ """
233
+ Run the iterative research flow using graph execution.
234
+
235
+ Args:
236
+ query: The research query
237
+ background_context: Optional background context (currently ignored in graph execution)
238
+ output_length: Optional description of desired output length (currently ignored in graph execution)
239
+ output_instructions: Optional additional instructions (currently ignored in graph execution)
240
+
241
+ Returns:
242
+ Final report string
243
+ """
244
+ self.logger.info("Starting iterative research (graph execution)", query=query[:100])
245
+
246
+ # Create graph orchestrator (lazy initialization)
247
+ if self._graph_orchestrator is None:
248
+ self._graph_orchestrator = create_graph_orchestrator(
249
+ mode="iterative",
250
+ max_iterations=self.max_iterations,
251
+ max_time_minutes=self.max_time_minutes,
252
+ use_graph=True,
253
+ )
254
+
255
+ # Run orchestrator and collect events
256
+ final_report = ""
257
+ async for event in self._graph_orchestrator.run(query):
258
+ if event.type == "complete":
259
+ final_report = event.message
260
+ break
261
+ elif event.type == "error":
262
+ self.logger.error("Graph execution error", error=event.message)
263
+ raise RuntimeError(f"Graph execution failed: {event.message}")
264
+
265
+ if not final_report:
266
+ self.logger.warning("No complete event received from graph orchestrator")
267
+ final_report = "Research completed but no report was generated."
268
+
269
+ self.logger.info("Iterative research completed (graph execution)")
270
+
271
+ return final_report
272
+
273
+ def _check_constraints(self) -> bool:
274
+ """Check if we've exceeded constraints."""
275
+ if self.iteration >= self.max_iterations:
276
+ self.logger.info("Max iterations reached", max=self.max_iterations)
277
+ return False
278
+
279
+ if self.start_time:
280
+ elapsed_minutes = (time.time() - self.start_time) / 60
281
+ if elapsed_minutes >= self.max_time_minutes:
282
+ self.logger.info("Max time reached", max=self.max_time_minutes)
283
+ return False
284
+
285
+ # Check budget tracker
286
+ self.budget_tracker.update_timer(self.loop_id)
287
+ exceeded, reason = self.budget_tracker.check_budget(self.loop_id)
288
+ if exceeded:
289
+ self.logger.info("Budget exceeded", reason=reason)
290
+ return False
291
+
292
+ return True
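The three-layer guard above (iterations, wall clock, budget) mirrors the BudgetTracker setup in __init__; a small standalone sketch of that API as used in this module:

```python
from src.middleware.budget_tracker import BudgetTracker

tracker = BudgetTracker()
tracker.create_budget(
    loop_id="demo",
    tokens_limit=1_000,
    time_limit_seconds=60,
    iterations_limit=3,
)
tracker.start_timer("demo")
tracker.add_tokens("demo", 250)
tracker.increment_iteration("demo")
tracker.update_timer("demo")
exceeded, reason = tracker.check_budget("demo")  # unpacked exactly as _check_constraints does above
```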
293
+
294
+ async def _generate_observations(self, query: str, background_context: str = "") -> str:
295
+ """Generate observations from current research state."""
296
+ # Build input prompt for token estimation
297
+ conversation_history = self.conversation.compile_conversation_history()
298
+ # Build background context section separately to avoid backslash in f-string
299
+ background_section = (
300
+ f"BACKGROUND CONTEXT:\n{background_context}\n\n" if background_context else ""
301
+ )
302
+ input_prompt = f"""
303
+ You are starting iteration {self.iteration} of your research process.
304
+
305
+ ORIGINAL QUERY:
306
+ {query}
307
+
308
+ {background_section}HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
309
+ {conversation_history or "No previous actions, findings or thoughts available."}
310
+ """
311
+
312
+ observations = await self.thinking_agent.generate_observations(
313
+ query=query,
314
+ background_context=background_context,
315
+ conversation_history=conversation_history,
316
+ iteration=self.iteration,
317
+ )
318
+
319
+ # Track tokens for this iteration
320
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(input_prompt, observations)
321
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
322
+ self.logger.debug(
323
+ "Tokens tracked for thinking agent",
324
+ iteration=self.iteration,
325
+ tokens=estimated_tokens,
326
+ )
327
+
328
+ self.conversation.set_latest_thought(observations)
329
+ return observations
330
+
331
+ async def _evaluate_gaps(self, query: str, background_context: str = "") -> KnowledgeGapOutput:
332
+ """Evaluate knowledge gaps in current research."""
333
+ if self.start_time:
334
+ elapsed_minutes = (time.time() - self.start_time) / 60
335
+ else:
336
+ elapsed_minutes = 0.0
337
+
338
+ # Build input prompt for token estimation
339
+ conversation_history = self.conversation.compile_conversation_history()
340
+ background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
341
+ input_prompt = f"""
342
+ Current Iteration Number: {self.iteration}
343
+ Time Elapsed: {elapsed_minutes:.2f} minutes of maximum {self.max_time_minutes} minutes
344
+
345
+ ORIGINAL QUERY:
346
+ {query}
347
+
348
+ {background}
349
+
350
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
351
+ {conversation_history or "No previous actions, findings or thoughts available."}
352
+ """
353
+
354
+ evaluation = await self.knowledge_gap_agent.evaluate(
355
+ query=query,
356
+ background_context=background_context,
357
+ conversation_history=conversation_history,
358
+ iteration=self.iteration,
359
+ time_elapsed_minutes=elapsed_minutes,
360
+ max_time_minutes=self.max_time_minutes,
361
+ )
362
+
363
+ # Track tokens for this iteration
364
+ evaluation_text = f"research_complete={evaluation.research_complete}, gaps={len(evaluation.outstanding_gaps)}"
365
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
366
+ input_prompt, evaluation_text
367
+ )
368
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
369
+ self.logger.debug(
370
+ "Tokens tracked for knowledge gap agent",
371
+ iteration=self.iteration,
372
+ tokens=estimated_tokens,
373
+ )
374
+
375
+ if not evaluation.research_complete and evaluation.outstanding_gaps:
376
+ self.conversation.set_latest_gap(evaluation.outstanding_gaps[0])
377
+
378
+ return evaluation
379
+
380
+ async def _assess_with_judge(self, query: str) -> JudgeAssessment:
381
+ """Assess evidence sufficiency using JudgeHandler.
382
+
383
+ Args:
384
+ query: The research query
385
+
386
+ Returns:
387
+ JudgeAssessment with sufficiency evaluation
388
+ """
389
+ state = get_workflow_state()
390
+ evidence = state.evidence # Get all collected evidence
391
+
392
+ self.logger.info(
393
+ "Assessing evidence with judge",
394
+ query=query[:100],
395
+ evidence_count=len(evidence),
396
+ )
397
+
398
+ assessment = await self.judge_handler.assess(query, evidence)
399
+
400
+ # Track tokens for judge call
401
+ # Estimate tokens from query + evidence + assessment
402
+ evidence_text = "\n".join([e.content[:500] for e in evidence[:10]]) # Sample
403
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
404
+ query + evidence_text, str(assessment.reasoning)
405
+ )
406
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
407
+
408
+ self.logger.info(
409
+ "Judge assessment complete",
410
+ sufficient=assessment.sufficient,
411
+ confidence=assessment.confidence,
412
+ recommendation=assessment.recommendation,
413
+ )
414
+
415
+ return assessment
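The judge contract exercised here is small; a hypothetical termination check built on the same call and fields (the 0.7 threshold is illustrative, not part of this module):

```python
from src.utils.models import Evidence, JudgeAssessment


async def should_stop(judge, query: str, evidence: list[Evidence]) -> bool:
    # assess() returns a JudgeAssessment; sufficient/confidence gate the loop.
    assessment: JudgeAssessment = await judge.assess(query, evidence)
    return assessment.sufficient and assessment.confidence >= 0.7
```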
416
+
417
+ async def _select_agents(
418
+ self, gap: str, query: str, background_context: str = ""
419
+ ) -> AgentSelectionPlan:
420
+ """Select tools to address knowledge gap."""
421
+ # Build input prompt for token estimation
422
+ conversation_history = self.conversation.compile_conversation_history()
423
+ background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
424
+ input_prompt = f"""
425
+ ORIGINAL QUERY:
426
+ {query}
427
+
428
+ KNOWLEDGE GAP TO ADDRESS:
429
+ {gap}
430
+
431
+ {background}
432
+
433
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
434
+ {conversation_history or "No previous actions, findings or thoughts available."}
435
+ """
436
+
437
+ selection_plan = await self.tool_selector_agent.select_tools(
438
+ gap=gap,
439
+ query=query,
440
+ background_context=background_context,
441
+ conversation_history=conversation_history,
442
+ )
443
+
444
+ # Track tokens for this iteration
445
+ selection_text = f"tasks={len(selection_plan.tasks)}, agents={[task.agent for task in selection_plan.tasks]}"
446
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
447
+ input_prompt, selection_text
448
+ )
449
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
450
+ self.logger.debug(
451
+ "Tokens tracked for tool selector agent",
452
+ iteration=self.iteration,
453
+ tokens=estimated_tokens,
454
+ )
455
+
456
+ # Store tool calls in conversation
457
+ tool_calls = [
458
+ f"[Agent] {task.agent} [Query] {task.query} [Entity] {task.entity_website or 'null'}"
459
+ for task in selection_plan.tasks
460
+ ]
461
+ self.conversation.set_latest_tool_calls(tool_calls)
462
+
463
+ return selection_plan
464
+
465
+ def _get_rag_service(self) -> LlamaIndexRAGService | None:
466
+ """
467
+ Get or create RAG service instance.
468
+
469
+ Returns:
470
+ RAG service instance, or None if unavailable
471
+ """
472
+ if self._rag_service is None:
473
+ try:
474
+ self._rag_service = get_rag_service()
475
+ self.logger.info("RAG service initialized for research flow")
476
+ except (ConfigurationError, ImportError) as e:
477
+ self.logger.warning(
478
+ "RAG service unavailable", error=str(e), hint="OPENAI_API_KEY required"
479
+ )
480
+ return None
481
+ return self._rag_service
482
+
483
+ async def _execute_tools(self, tasks: list[AgentTask]) -> dict[str, ToolAgentOutput]:
484
+ """Execute selected tools concurrently."""
485
+ try:
486
+ results = await execute_tool_tasks(tasks)
487
+ except Exception as e:
488
+ # Handle tool execution errors gracefully
489
+ self.logger.error(
490
+ "Tool execution failed",
491
+ error=str(e),
492
+ task_count=len(tasks),
493
+ exc_info=True,
494
+ )
495
+ # Return empty results to allow research flow to continue
496
+ # The flow can still generate a report based on previous iterations
497
+ results = {}
498
+
499
+ # Store findings in conversation (only if we have results)
500
+ evidence_list: list[Evidence] = []
501
+ if results:
502
+ findings = [result.output for result in results.values()]
503
+ self.conversation.set_latest_findings(findings)
504
+
505
+ # Convert tool outputs to Evidence objects and store in workflow state
506
+ evidence_list = self._convert_tool_outputs_to_evidence(results)
507
+
508
+ if evidence_list:
509
+ state = get_workflow_state()
510
+ added_count = state.add_evidence(evidence_list)
511
+ self.logger.info(
512
+ "Evidence added to workflow state",
513
+ count=added_count,
514
+ total_evidence=len(state.evidence),
515
+ )
516
+
517
+ # Ingest evidence into RAG if available (Phase 6 requirement)
518
+ rag_service = self._get_rag_service()
519
+ if rag_service is not None:
520
+ try:
521
+ # ingest_evidence is synchronous; run it in an executor so it does not block the event loop
522
+ loop = asyncio.get_running_loop()
523
+ await loop.run_in_executor(None, rag_service.ingest_evidence, evidence_list)
524
+ self.logger.info(
525
+ "Evidence ingested into RAG",
526
+ count=len(evidence_list),
527
+ )
528
+ except Exception as e:
529
+ # Don't fail the research loop if RAG ingestion fails
530
+ self.logger.warning(
531
+ "Failed to ingest evidence into RAG",
532
+ error=str(e),
533
+ count=len(evidence_list),
534
+ )
535
+
536
+ return results
537
+
538
+ def _convert_tool_outputs_to_evidence(
539
+ self, tool_results: dict[str, ToolAgentOutput]
540
+ ) -> list[Evidence]:
541
+ """Convert ToolAgentOutput to Evidence objects.
542
+
543
+ Args:
544
+ tool_results: Dictionary of tool execution results
545
+
546
+ Returns:
547
+ List of Evidence objects
548
+ """
549
+ evidence_list = []
550
+ for key, result in tool_results.items():
551
+ # Extract URLs from sources
552
+ if result.sources:
553
+ # Create one Evidence object per source URL
554
+ for url in result.sources:
555
+ # Determine source type from URL or tool name
556
+ # Default to "web" for unknown web sources
557
+ source_type: SourceName = "web"
558
+ if "pubmed" in url.lower() or "ncbi" in url.lower():
559
+ source_type = "pubmed"
560
+ elif "clinicaltrials" in url.lower():
561
+ source_type = "clinicaltrials"
562
+ elif "europepmc" in url.lower():
563
+ source_type = "europepmc"
564
+ elif "biorxiv" in url.lower():
565
+ source_type = "biorxiv"
566
+ elif "arxiv" in url.lower() or "preprint" in url.lower():
567
+ source_type = "preprint"
568
+ # Note: "web" is now a valid SourceName for general web sources
569
+
570
+ citation = Citation(
571
+ title=f"Tool Result: {key}",
572
+ url=url,
573
+ source=source_type,
574
+ date="n.d.",
575
+ authors=[],
576
+ )
577
+ # Truncate content to reasonable length for judge (1500 chars)
578
+ content = result.output[:1500]
579
+ if len(result.output) > 1500:
580
+ content += "... [truncated]"
581
+
582
+ evidence = Evidence(
583
+ content=content,
584
+ citation=citation,
585
+ relevance=0.5, # Default relevance
586
+ )
587
+ evidence_list.append(evidence)
588
+ else:
589
+ # No URLs, create a single Evidence object with tool output
590
+ # Use a placeholder URL based on the tool name
591
+ # Determine source type from tool name
592
+ tool_source_type: SourceName = "web" # Default for unknown sources
593
+ if "RAG" in key:
594
+ tool_source_type = "rag"
595
+ elif "WebSearch" in key or "SiteCrawler" in key:
596
+ tool_source_type = "web"
597
+ # "web" is now a valid SourceName for general web sources
598
+
599
+ citation = Citation(
600
+ title=f"Tool Result: {key}",
601
+ url=f"tool://{key}",
602
+ source=tool_source_type,
603
+ date="n.d.",
604
+ authors=[],
605
+ )
606
+ content = result.output[:1500]
607
+ if len(result.output) > 1500:
608
+ content += "... [truncated]"
609
+
610
+ evidence = Evidence(
611
+ content=content,
612
+ citation=citation,
613
+ relevance=0.5,
614
+ )
615
+ evidence_list.append(evidence)
616
+
617
+ return evidence_list
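The URL-based source typing above reduces to a first-match classifier; an equivalent standalone version for illustration:

```python
def classify_source(url: str) -> str:
    # Mirrors the branching in _convert_tool_outputs_to_evidence.
    u = url.lower()
    if "pubmed" in u or "ncbi" in u:
        return "pubmed"
    if "clinicaltrials" in u:
        return "clinicaltrials"
    if "europepmc" in u:
        return "europepmc"
    if "biorxiv" in u:
        return "biorxiv"
    if "arxiv" in u or "preprint" in u:
        return "preprint"
    return "web"


assert classify_source("https://pubmed.ncbi.nlm.nih.gov/12345/") == "pubmed"
assert classify_source("https://example.com/article") == "web"
```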
618
+
619
+ async def _create_final_report(
620
+ self, query: str, length: str = "", instructions: str = ""
621
+ ) -> str:
622
+ """Create final report from all findings."""
623
+ all_findings = "\n\n".join(self.conversation.get_all_findings())
624
+ if not all_findings:
625
+ all_findings = "No findings available yet."
626
+
627
+ # Build input prompt for token estimation
628
+ length_str = f"* The full response should be approximately {length}.\n" if length else ""
629
+ instructions_str = f"* {instructions}" if instructions else ""
630
+ guidelines_str = (
631
+ ("\n\nGUIDELINES:\n" + length_str + instructions_str).strip("\n")
632
+ if length or instructions
633
+ else ""
634
+ )
635
+ input_prompt = f"""
636
+ Provide a response based on the query and findings below with as much detail as possible. {guidelines_str}
637
+
638
+ QUERY: {query}
639
+
640
+ FINDINGS:
641
+ {all_findings}
642
+ """
643
+
644
+ report = await self.writer_agent.write_report(
645
+ query=query,
646
+ findings=all_findings,
647
+ output_length=length,
648
+ output_instructions=instructions,
649
+ )
650
+
651
+ # Track tokens for final report (not per iteration, just total)
652
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(input_prompt, report)
653
+ self.budget_tracker.add_tokens(self.loop_id, estimated_tokens)
654
+ self.logger.debug(
655
+ "Tokens tracked for writer agent (final report)",
656
+ tokens=estimated_tokens,
657
+ )
658
+
659
+ # Note: Citation validation for markdown reports would require Evidence objects
660
+ # Currently, findings are strings, not Evidence objects. For full validation,
661
+ # consider using ResearchReport format or passing Evidence objects separately.
662
+ # See src/utils/citation_validator.py for markdown citation validation utilities.
663
+
664
+ return report
665
+
666
+
667
+ class DeepResearchFlow:
668
+ """
669
+ Deep research flow that runs parallel iterative loops per section.
670
+
671
+ Pattern: Plan → Parallel Iterative Loops (one per section) → Synthesis
672
+ """
673
+
674
+ def __init__(
675
+ self,
676
+ max_iterations: int = 5,
677
+ max_time_minutes: int = 10,
678
+ verbose: bool = True,
679
+ use_long_writer: bool = True,
680
+ use_graph: bool = False,
681
+ ) -> None:
682
+ """
683
+ Initialize deep research flow.
684
+
685
+ Args:
686
+ max_iterations: Maximum iterations per section
687
+ max_time_minutes: Maximum time per section
688
+ verbose: Whether to log progress
689
+ use_long_writer: Whether to use long writer (True) or proofreader (False)
690
+ use_graph: Whether to use graph-based execution (True) or agent chains (False)
691
+ """
692
+ self.max_iterations = max_iterations
693
+ self.max_time_minutes = max_time_minutes
694
+ self.verbose = verbose
695
+ self.use_long_writer = use_long_writer
696
+ self.use_graph = use_graph
697
+ self.logger = logger
698
+
699
+ # Initialize agents (only needed for agent chain execution)
700
+ if not use_graph:
701
+ self.planner_agent = create_planner_agent()
702
+ self.long_writer_agent = create_long_writer_agent()
703
+ self.proofreader_agent = create_proofreader_agent()
704
+ # Initialize judge handler for section loop completion
705
+ self.judge_handler = create_judge_handler()
706
+ # Initialize budget tracker for token tracking
707
+ self.budget_tracker = BudgetTracker()
708
+ self.loop_id = "deep_research_flow"
709
+ self.budget_tracker.create_budget(
710
+ loop_id=self.loop_id,
711
+ tokens_limit=200000, # Higher limit for deep research
712
+ time_limit_seconds=max_time_minutes
713
+ * 60
714
+ * 2, # Allow more time for parallel sections
715
+ iterations_limit=max_iterations * 10, # Allow for multiple sections
716
+ )
717
+ self.budget_tracker.start_timer(self.loop_id)
718
+
719
+ # Graph orchestrator (lazy initialization)
720
+ self._graph_orchestrator: Any = None
721
+
722
+ async def run(self, query: str) -> str:
723
+ """
724
+ Run the deep research flow.
725
+
726
+ Args:
727
+ query: The research query
728
+
729
+ Returns:
730
+ Final report string
731
+ """
732
+ if self.use_graph:
733
+ return await self._run_with_graph(query)
734
+ else:
735
+ return await self._run_with_chains(query)
736
+
737
+ async def _run_with_chains(self, query: str) -> str:
738
+ """
739
+ Run the deep research flow using agent chains.
740
+
741
+ Args:
742
+ query: The research query
743
+
744
+ Returns:
745
+ Final report string
746
+ """
747
+ self.logger.info("Starting deep research (agent chains)", query=query[:100])
748
+
749
+ # Initialize workflow state for deep research
750
+ try:
751
+ from src.services.embeddings import get_embedding_service
752
+
753
+ embedding_service = get_embedding_service()
754
+ except Exception:
755
+ # If embedding service is unavailable, initialize without it
756
+ embedding_service = None
757
+ self.logger.debug("Embedding service unavailable, initializing state without it")
758
+
759
+ init_workflow_state(embedding_service=embedding_service)
760
+ self.logger.debug("Workflow state initialized for deep research")
761
+
762
+ # 1. Build report plan
763
+ report_plan = await self._build_report_plan(query)
764
+ self.logger.info(
765
+ "Report plan created",
766
+ sections=len(report_plan.report_outline),
767
+ title=report_plan.report_title,
768
+ )
769
+
770
+ # 2. Run parallel research loops with state synchronization
771
+ section_drafts = await self._run_research_loops(report_plan)
772
+
773
+ # Verify state synchronization - log evidence count
774
+ state = get_workflow_state()
775
+ self.logger.info(
776
+ "State synchronization complete",
777
+ total_evidence=len(state.evidence),
778
+ sections_completed=len(section_drafts),
779
+ )
780
+
781
+ # 3. Create final report
782
+ final_report = await self._create_final_report(query, report_plan, section_drafts)
783
+
784
+ self.logger.info(
785
+ "Deep research completed",
786
+ sections=len(section_drafts),
787
+ final_report_length=len(final_report),
788
+ )
789
+
790
+ return final_report
791
+
792
+ async def _run_with_graph(self, query: str) -> str:
793
+ """
794
+ Run the deep research flow using graph execution.
795
+
796
+ Args:
797
+ query: The research query
798
+
799
+ Returns:
800
+ Final report string
801
+ """
802
+ self.logger.info("Starting deep research (graph execution)", query=query[:100])
803
+
804
+ # Create graph orchestrator (lazy initialization)
805
+ if self._graph_orchestrator is None:
806
+ self._graph_orchestrator = create_graph_orchestrator(
807
+ mode="deep",
808
+ max_iterations=self.max_iterations,
809
+ max_time_minutes=self.max_time_minutes,
810
+ use_graph=True,
811
+ )
812
+
813
+ # Run orchestrator and collect events
814
+ final_report = ""
815
+ async for event in self._graph_orchestrator.run(query):
816
+ if event.type == "complete":
817
+ final_report = event.message
818
+ break
819
+ elif event.type == "error":
820
+ self.logger.error("Graph execution error", error=event.message)
821
+ raise RuntimeError(f"Graph execution failed: {event.message}")
822
+
823
+ if not final_report:
824
+ self.logger.warning("No complete event received from graph orchestrator")
825
+ final_report = "Research completed but no report was generated."
826
+
827
+ self.logger.info("Deep research completed (graph execution)")
828
+
829
+ return final_report
830
+
831
+ async def _build_report_plan(self, query: str) -> ReportPlan:
832
+ """Build the initial report plan."""
833
+ self.logger.info("Building report plan")
834
+
835
+ # Build input prompt for token estimation
836
+ input_prompt = f"QUERY: {query}"
837
+
838
+ report_plan = await self.planner_agent.run(query)
839
+
840
+ # Track tokens for planner agent
841
+ if not self.use_graph and hasattr(self, "budget_tracker"):
842
+ plan_text = (
843
+ f"title={report_plan.report_title}, sections={len(report_plan.report_outline)}"
844
+ )
845
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(input_prompt, plan_text)
846
+ self.budget_tracker.add_tokens(self.loop_id, estimated_tokens)
847
+ self.logger.debug(
848
+ "Tokens tracked for planner agent",
849
+ tokens=estimated_tokens,
850
+ )
851
+
852
+ self.logger.info(
853
+ "Report plan created",
854
+ sections=len(report_plan.report_outline),
855
+ has_background=bool(report_plan.background_context),
856
+ )
857
+
858
+ return report_plan
859
+
860
+ async def _run_research_loops(self, report_plan: ReportPlan) -> list[str]:
861
+ """Run parallel iterative research loops for each section."""
862
+ self.logger.info("Running research loops", sections=len(report_plan.report_outline))
863
+
864
+ # Create workflow manager for parallel execution
865
+ workflow_manager = WorkflowManager()
866
+
867
+ # Create loop configurations
868
+ loop_configs = [
869
+ {
870
+ "loop_id": f"section_{i}",
871
+ "query": section.key_question,
872
+ "section_title": section.title,
873
+ "background_context": report_plan.background_context,
874
+ }
875
+ for i, section in enumerate(report_plan.report_outline)
876
+ ]
877
+
878
+ async def run_research_for_section(config: dict[str, Any]) -> str:
879
+ """Run iterative research for a single section."""
880
+ loop_id = config.get("loop_id", "unknown")
881
+ query = config.get("query", "")
882
+ background_context = config.get("background_context", "")
883
+
884
+ try:
885
+ # Update loop status
886
+ await workflow_manager.update_loop_status(loop_id, "running")
887
+
888
+ # Create iterative research flow
889
+ flow = IterativeResearchFlow(
890
+ max_iterations=self.max_iterations,
891
+ max_time_minutes=self.max_time_minutes,
892
+ verbose=self.verbose,
893
+ use_graph=self.use_graph,
894
+ judge_handler=self.judge_handler if not self.use_graph else None,
895
+ )
896
+
897
+ # Run research
898
+ result = await flow.run(
899
+ query=query,
900
+ background_context=background_context,
901
+ )
902
+
903
+ # Sync evidence from flow to loop
904
+ state = get_workflow_state()
905
+ if state.evidence:
906
+ await workflow_manager.add_loop_evidence(loop_id, state.evidence)
907
+
908
+ # Update loop status
909
+ await workflow_manager.update_loop_status(loop_id, "completed")
910
+
911
+ return result
912
+
913
+ except Exception as e:
914
+ error_msg = str(e)
915
+ await workflow_manager.update_loop_status(loop_id, "failed", error=error_msg)
916
+ self.logger.error(
917
+ "Section research failed",
918
+ loop_id=loop_id,
919
+ error=error_msg,
920
+ )
921
+ raise
922
+
923
+ # Run all sections in parallel using workflow manager
924
+ section_drafts = await workflow_manager.run_loops_parallel(
925
+ loop_configs=loop_configs,
926
+ loop_func=run_research_for_section,
927
+ judge_handler=self.judge_handler if not self.use_graph else None,
928
+ budget_tracker=self.budget_tracker if not self.use_graph else None,
929
+ )
930
+
931
+ # Sync evidence from all loops to global state
932
+ for config in loop_configs:
933
+ loop_id = config.get("loop_id")
934
+ if loop_id:
935
+ await workflow_manager.sync_loop_evidence_to_state(loop_id)
936
+
937
+ # Filter out None results (failed loops)
938
+ section_drafts = [draft for draft in section_drafts if draft is not None]
939
+
940
+ self.logger.info(
941
+ "Research loops completed",
942
+ drafts=len(section_drafts),
943
+ total_sections=len(report_plan.report_outline),
944
+ )
945
+
946
+ return section_drafts
947
+
948
+ async def _create_final_report(
949
+ self, query: str, report_plan: ReportPlan, section_drafts: list[str]
950
+ ) -> str:
951
+ """Create final report from section drafts."""
952
+ self.logger.info("Creating final report")
953
+
954
+ # Create ReportDraft from section drafts
955
+ report_draft = ReportDraft(
956
+ sections=[
957
+ ReportDraftSection(
958
+ section_title=section.title,
959
+ section_content=draft,
960
+ )
961
+ for section, draft in zip(report_plan.report_outline, section_drafts, strict=False)
962
+ ]
963
+ )
964
+
965
+ # Build input prompt for token estimation
966
+ draft_text = "\n".join(
967
+ [s.section_content[:500] for s in report_draft.sections[:5]]
968
+ ) # Sample
969
+ input_prompt = f"QUERY: {query}\nTITLE: {report_plan.report_title}\nDRAFT: {draft_text}"
970
+
971
+ if self.use_long_writer:
972
+ # Use long writer agent
973
+ final_report = await self.long_writer_agent.write_report(
974
+ original_query=query,
975
+ report_title=report_plan.report_title,
976
+ report_draft=report_draft,
977
+ )
978
+ else:
979
+ # Use proofreader agent
980
+ final_report = await self.proofreader_agent.proofread(
981
+ query=query,
982
+ report_draft=report_draft,
983
+ )
984
+
985
+ # Track tokens for final report synthesis
986
+ if not self.use_graph and hasattr(self, "budget_tracker"):
987
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
988
+ input_prompt, final_report
989
+ )
990
+ self.budget_tracker.add_tokens(self.loop_id, estimated_tokens)
991
+ self.logger.debug(
992
+ "Tokens tracked for final report synthesis",
993
+ tokens=estimated_tokens,
994
+ agent="long_writer" if self.use_long_writer else "proofreader",
995
+ )
996
+
997
+ self.logger.info("Final report created", length=len(final_report))
998
+
999
+ return final_report