Tonic committed on
Commit 40aa8de · unverified · 2 Parent(s): 898cd37 3ab54ea

Merge pull request #1 from Josephrp/feature/iterative-deep-research-workflows

Files changed (50)
  1. .cursorrules +240 -0
  2. .env.example +3 -3
  3. .github/README.md +203 -0
  4. .github/workflows/ci.yml +47 -14
  5. .gitignore +3 -0
  6. .pre-commit-config.yaml +44 -1
  7. .pre-commit-hooks/run_pytest.ps1 +14 -0
  8. .pre-commit-hooks/run_pytest.sh +15 -0
  9. AGENTS.md +0 -118
  10. AGENTS.txt +236 -0
  11. CLAUDE.md +0 -111
  12. CONTRIBUTING.md +1 -0
  13. GEMINI.md +0 -98
  14. Makefile +9 -3
  15. README.md +98 -18
  16. docs/CONFIGURATION.md +301 -0
  17. docs/architecture/graph_orchestration.md +151 -0
  18. docs/examples/writer_agents_usage.md +425 -0
  19. docs/implementation/02_phase_search.md +31 -19
  20. examples/rate_limiting_demo.py +1 -1
  21. main.py +0 -6
  22. pyproject.toml +30 -1
  23. requirements.txt +2 -0
  24. src/agent_factory/agents.py +339 -0
  25. src/agent_factory/graph_builder.py +608 -0
  26. src/agent_factory/judges.py +21 -6
  27. src/agents/code_executor_agent.py +6 -8
  28. src/agents/input_parser.py +178 -0
  29. src/agents/judge_agent.py +1 -1
  30. src/agents/knowledge_gap.py +156 -0
  31. src/agents/long_writer.py +431 -0
  32. src/agents/magentic_agents.py +19 -26
  33. src/agents/proofreader.py +205 -0
  34. src/agents/retrieval_agent.py +8 -9
  35. src/agents/search_agent.py +1 -1
  36. src/agents/state.py +27 -5
  37. src/agents/thinking.py +148 -0
  38. src/agents/tool_selector.py +168 -0
  39. src/agents/writer.py +209 -0
  40. src/app.py +28 -18
  41. src/{orchestrator.py → legacy_orchestrator.py} +0 -0
  42. src/middleware/__init__.py +30 -1
  43. src/middleware/budget_tracker.py +390 -0
  44. src/middleware/state_machine.py +129 -0
  45. src/middleware/sub_iteration.py +1 -2
  46. src/middleware/workflow_manager.py +322 -0
  47. src/orchestrator/__init__.py +48 -0
  48. src/orchestrator/graph_orchestrator.py +974 -0
  49. src/orchestrator/planner_agent.py +184 -0
  50. src/orchestrator/research_flow.py +999 -0
.cursorrules ADDED
@@ -0,0 +1,240 @@
+ # DeepCritical Project - Cursor Rules
+
+ ## Project-Wide Rules
+
+ **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
+
+ **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Maintain `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
+
+ **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
+
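A minimal sketch of the executor pattern above, assuming a toy `score_texts` function as the CPU-bound stand-in (all names here are illustrative, not from the codebase):

```python
import asyncio


def score_texts(texts: list[str]) -> list[float]:
    # Hypothetical CPU-bound stand-in; real code might do embedding math here.
    return [len(t) / 100.0 for t in texts]


async def gather_scores(batches: list[list[str]]) -> list[list[float]]:
    loop = asyncio.get_running_loop()
    # Push each CPU-bound batch onto the default executor so the event
    # loop stays free, then await all batches in parallel.
    tasks = [loop.run_in_executor(None, score_texts, batch) for batch in batches]
    return list(await asyncio.gather(*tasks))


print(asyncio.run(gather_scores([["alpha"], ["beta", "gamma"]])))
```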
+ **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
+
+ **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
+
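A short sketch combining the two rules above. `SearchError` is the project exception named in the rule, but it is redeclared here as a stand-in; the `fetch_page` helper and its httpx call are illustrative assumptions:

```python
import httpx
import structlog

logger = structlog.get_logger()


class SearchError(Exception):
    """Stand-in for the real class in src/utils/exceptions.py."""


async def fetch_page(url: str) -> str:
    try:
        async with httpx.AsyncClient() as client:
            response = await client.get(url)
            response.raise_for_status()
            return response.text
    except httpx.HTTPError as e:
        # Structured log first, then chain so the original traceback survives.
        logger.error("Search request failed", url=url, error=str(e))
        raise SearchError(f"Failed to fetch {url}") from e
```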
+ **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
+
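An illustration of the frozen-model convention; this `Citation` is a trimmed stand-in, not the real model from `src/utils/models.py`:

```python
from pydantic import BaseModel, Field


class Citation(BaseModel):
    """Trimmed, illustrative stand-in for a model in src/utils/models.py."""

    model_config = {"frozen": True}

    title: str = Field(min_length=1, description="Title of the cited source")
    url: str = Field(description="Canonical URL of the source")
    relevance: float = Field(ge=0.0, le=1.0, description="Relevance score")


c = Citation(title="Example", url="https://example.org", relevance=0.9)
# c.relevance = 1.0  # raises: frozen models reject mutation
```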
+ **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
+
+ **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
+
+ **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
+
+ **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
+
+ **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
+
+ ---
+
+ ## src/agents/ - Agent Implementation Rules
+
+ **Pattern**: All agents use the Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
+
+ **Agent Structure** (a sketch of this shape follows the list):
+ - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
+ - Agent class with `__init__(model: Any | None = None)`
+ - Main method (e.g., `async def evaluate()`, `async def write_report()`)
+ - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
+
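A skeleton following the structure above. `GapSummary`, the prompt, and the model string are illustrative, and the exact `Agent` keyword names (`output_type` vs. `result_type`, `result.output` vs. `result.data`) vary across pydantic-ai versions, so treat this as a sketch rather than the project's actual API:

```python
from typing import Any

from pydantic import BaseModel
from pydantic_ai import Agent

SYSTEM_PROMPT = "You summarize open questions in research findings."  # illustrative


class GapSummary(BaseModel):
    # Hypothetical output type; real agents use models from src/utils/models.py.
    gaps: list[str]


class GapSummaryAgent:
    def __init__(self, model: Any | None = None) -> None:
        # Real code would call get_model() from src/agent_factory/judges.py
        # when no model is supplied.
        self._agent = Agent(
            model or "openai:gpt-4o",
            output_type=GapSummary,
            system_prompt=SYSTEM_PROMPT,
            retries=3,
        )

    async def evaluate(self, findings: str) -> GapSummary:
        result = await self._agent.run(findings)
        return result.output


def create_gap_summary_agent(model: Any | None = None) -> GapSummaryAgent:
    return GapSummaryAgent(model)
```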
+ **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
+
+ **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
+
+ **Input Validation**: Validate that queries/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
+
+ **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
+
+ **Agent-Specific Rules**:
+ - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
+ - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
+ - `writer.py`: Returns markdown string. Includes citations in numbered format.
+ - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
+ - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
+ - `thinking.py`: Returns observation string from conversation history.
+ - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
+
+ ---
+
+ ## src/tools/ - Search Tool Rules
+
+ **Protocol**: All tools implement the `SearchTool` protocol from `src/tools/base.py`: a `name` property and `async def search(query, max_results) -> list[Evidence]`.
+
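A minimal sketch of that protocol shape, with `Evidence` reduced to a placeholder class:

```python
from typing import Protocol


class Evidence:
    """Stand-in for the real Pydantic model in src/utils/models.py."""


class SearchTool(Protocol):
    """The protocol shape described above (src/tools/base.py)."""

    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...
```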
+ **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
+
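A sketch of the tenacity decorator from the rule above on an async fetch helper; the wait parameters and the stub body are assumptions:

```python
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
async def _fetch(url: str) -> bytes:
    # Illustrative body: a real tool awaits its shared limiter from
    # src/tools/rate_limiter.py here, then calls the upstream API.
    return b""
```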
+ **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return an empty list on non-critical errors (log a warning).
+
+ **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
+
+ **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
+
+ **Tool-Specific Rules** (a thread-pool sketch for `clinicaltrials.py` follows the list):
+ - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
+ - `clinicaltrials.py`: Use the `requests` library (NOT httpx - the WAF blocks httpx). Run in a thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: only interventional studies, active/completed.
+ - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
+ - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
+ - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
+
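The thread-pool pattern for `clinicaltrials.py`, sketched with an assumed endpoint and query parameter (the URL and params are illustrative, not taken from the tool's code):

```python
import asyncio

import requests


async def fetch_trials(query: str) -> dict:
    # ClinicalTrials.gov is queried with `requests` (the WAF blocks httpx),
    # so the blocking call is pushed onto a worker thread.
    response = await asyncio.to_thread(
        requests.get,
        "https://clinicaltrials.gov/api/v2/studies",  # illustrative endpoint
        params={"query.term": query},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```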
+ ---
+
+ ## src/middleware/ - Middleware Rules
+
+ **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
+
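A simplified sketch of that `ContextVar` wiring; the real `WorkflowState` in `src/middleware/state_machine.py` carries more fields and methods:

```python
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WorkflowState:
    # Simplified stand-in for src/middleware/state_machine.WorkflowState.
    evidence: list[Any] = field(default_factory=list)
    embedding_service: Any = None


_state: ContextVar[WorkflowState | None] = ContextVar("workflow_state", default=None)


def init_workflow_state(embedding_service: Any = None) -> WorkflowState:
    state = WorkflowState(embedding_service=embedding_service)
    _state.set(state)
    return state


def get_workflow_state() -> WorkflowState:
    # Auto-initialize when accessed before init, per the rule above.
    return _state.get() or init_workflow_state()
```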
+ **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
+
+ **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
+
+ **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
+
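The token-estimation heuristic named above is simple enough to show directly; this is a sketch of the stated ~4-chars-per-token rule, not the tracker's exact code:

```python
def estimate_tokens(text: str) -> int:
    # Heuristic from the rule above: roughly 4 characters per token.
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    return estimate_tokens(prompt) + estimate_tokens(response)


assert estimate_tokens("abcdefgh") == 2  # 8 chars ≈ 2 tokens
```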
+ **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
+
+ ---
+
+ ## src/orchestrator/ - Orchestration Rules
+
+ **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
+
+ **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
+
+ **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
+
+ **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
+
+ **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
+
+ **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
+
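A sketch of what that streaming contract looks like from the caller's side, with `AgentEvent` reduced to a minimal stand-in (the real model lives in `src/utils/models.py`):

```python
from collections.abc import AsyncGenerator

from pydantic import BaseModel


class AgentEvent(BaseModel):
    # Minimal stand-in; the real model carries richer payloads.
    type: str
    iteration: int = 0
    data: dict = {}


async def run(query: str) -> AsyncGenerator[AgentEvent, None]:
    yield AgentEvent(type="started", data={"query": query})
    for i in range(1, 3):  # illustrative two-iteration loop
        yield AgentEvent(type="search_complete", iteration=i)
        yield AgentEvent(type="judge_complete", iteration=i)
    yield AgentEvent(type="complete")
```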
+ ---
+
+ ## src/services/ - Service Rules
+
+ **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
+
+ **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
+
+ **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
+
+ **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
+
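The inline one-liner above, expanded into proper form (the `EmbeddingService` name is reused for illustration):

```python
from functools import lru_cache


class EmbeddingService:
    def __init__(self) -> None:
        # Heavy setup (model load, DB client) happens once, on first call.
        ...


@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    return EmbeddingService()


assert get_embedding_service() is get_embedding_service()  # same instance
```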
+ ---
+
+ ## src/utils/ - Utility Rules
+
+ **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation is needed. Use `Field()` with descriptions. Validate with constraints.
+
+ **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use the `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
+
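A trimmed sketch of that settings shape; the field set here is an assumption, only the `has_openai_key` property and `.env` loading come from the rule:

```python
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Trimmed stand-in for src/utils/config.Settings; loads from .env.
    model_config = {"env_file": ".env"}

    openai_api_key: str = ""

    @property
    def has_openai_key(self) -> bool:
        return bool(self.openai_api_key)


settings = Settings()  # module-level singleton, as in the rule above
```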
+ **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
+
+ **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
+
+ **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
+
+ ---
+
+ ## src/orchestrator_factory.py Rules
+
+ **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
+
+ **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
+
+ **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
+
+ **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
+
+ **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
+
+ ---
+
+ ## src/orchestrator_hierarchical.py Rules
+
+ **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to the SubIterationTeam protocol.
+
+ **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
+
+ **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
+
+ **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
+
+ **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
+
+ ---
+
+ ## src/orchestrator_magentic.py Rules
+
+ **Purpose**: Magentic-based orchestrator using the ChatAgent pattern. Each agent has an internal LLM. A manager orchestrates the agents.
+
+ **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
+
+ **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
+
+ **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
+
+ **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
+
+ **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and an OpenAI API key.
+
+ **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
+
+ ---
+
+ ## src/agent_factory/ - Factory Rules
+
+ **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
+
+ **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
+
+ **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
+
+ **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
+
+ **Error Handling**: Raise `ConfigurationError` if required API keys are missing. Log agent creation. Handle import errors gracefully.
+
+ ---
+
+ ## src/prompts/ - Prompt Rules
+
+ **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
+
+ **Judge Prompts**: In `judge.py`. Handle the empty evidence case separately. Always request structured JSON output.
+
+ **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
+
+ **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
+
+ ---
+
+ ## Testing Rules
+
+ **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
+
+ **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
+
+ **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
+
+ **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
+
+ ---
+
+ ## File-Specific Agent Rules
+
+ **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
+
+ **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
+
+ **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
+
+ **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
+
+ **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
+
+ **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
+
+ **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines the query.
.env.example CHANGED
@@ -7,9 +7,9 @@ LLM_PROVIDER=openai
  OPENAI_API_KEY=sk-your-key-here
  ANTHROPIC_API_KEY=sk-ant-your-key-here

- # Model names (optional - sensible defaults)
- ANTHROPIC_MODEL=claude-3-5-sonnet-20240620
- OPENAI_MODEL=gpt-4-turbo
+ # Model names (optional - sensible defaults set in config.py)
+ # ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
+ # OPENAI_MODEL=gpt-5.1

  # ============== EMBEDDINGS ==============
.github/README.md ADDED
@@ -0,0 +1,203 @@
+ ---
+ title: DeepCritical
+ emoji: 🧬
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: "6.0.1"
+ python_version: "3.11"
+ app_file: src/app.py
+ pinned: false
+ license: mit
+ tags:
+   - mcp-in-action-track-enterprise
+   - mcp-hackathon
+   - drug-repurposing
+   - biomedical-ai
+   - pydantic-ai
+   - llamaindex
+   - modal
+ ---
+
+ # DeepCritical
+
+ ## Intro
+
+ ## Features
+
+ - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
+ - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
+ - **Modal Sandbox**: Secure execution of AI-generated statistical code
+ - **LlamaIndex RAG**: Semantic search and evidence synthesis
+ - **HuggingFace Inference**:
+ - **HuggingFace MCP Custom Config To Use Community Tools**:
+ - **Strongly Typed Composable Graphs**:
+ - **Specialized Research Teams of Agents**:
+
+ ## Quick Start
+
+ ### 1. Environment Setup
+
+ ```bash
+ # Install uv if you haven't already
+ pip install uv
+
+ # Sync dependencies
+ uv sync
+ ```
+
+ ### 2. Run the UI
+
+ ```bash
+ # Start the Gradio app
+ uv run gradio run src/app.py
+ ```
+
+ Open your browser to `http://localhost:7860`.
+
+ ### 3. Connect via MCP
+
+ This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
+
+ **MCP Server URL**: `http://localhost:7860/gradio_api/mcp/`
+
+ **Claude Desktop Configuration**:
+ Add this to your `claude_desktop_config.json`:
+ ```json
+ {
+   "mcpServers": {
+     "deepcritical": {
+       "url": "http://localhost:7860/gradio_api/mcp/"
+     }
+   }
+ }
+ ```
+
+ **Available Tools**:
+ - `search_pubmed`: Search peer-reviewed biomedical literature.
+ - `search_clinical_trials`: Search ClinicalTrials.gov.
+ - `search_biorxiv`: Search bioRxiv/medRxiv preprints.
+ - `search_all`: Search all sources simultaneously.
+ - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
+
+ ## Deep Research Flows
+
+ - iterativeResearch
+ - deepResearch
+ - researchTeam
+
+ ### Iterative Research
+
+ ```mermaid
+ sequenceDiagram
+     participant IterativeFlow
+     participant ThinkingAgent
+     participant KnowledgeGapAgent
+     participant ToolSelector
+     participant ToolExecutor
+     participant JudgeHandler
+     participant WriterAgent
+
+     IterativeFlow->>IterativeFlow: run(query)
+
+     loop Until complete or max_iterations
+         IterativeFlow->>ThinkingAgent: generate_observations()
+         ThinkingAgent-->>IterativeFlow: observations
+
+         IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
+         KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
+
+         alt Research complete
+             IterativeFlow->>WriterAgent: create_final_report()
+             WriterAgent-->>IterativeFlow: final_report
+         else Gaps remain
+             IterativeFlow->>ToolSelector: select_agents(gap)
+             ToolSelector-->>IterativeFlow: AgentSelectionPlan
+
+             IterativeFlow->>ToolExecutor: execute_tool_tasks()
+             ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
+
+             IterativeFlow->>JudgeHandler: assess_evidence()
+             JudgeHandler-->>IterativeFlow: should_continue
+         end
+     end
+ ```
+
+ ### Deep Research
+
+ ```mermaid
+ sequenceDiagram
+     actor User
+     participant GraphOrchestrator
+     participant InputParser
+     participant GraphBuilder
+     participant GraphExecutor
+     participant Agent
+     participant BudgetTracker
+     participant WorkflowState
+
+     User->>GraphOrchestrator: run(query)
+     GraphOrchestrator->>InputParser: detect_research_mode(query)
+     InputParser-->>GraphOrchestrator: mode (iterative/deep)
+     GraphOrchestrator->>GraphBuilder: build_graph(mode)
+     GraphBuilder-->>GraphOrchestrator: ResearchGraph
+     GraphOrchestrator->>WorkflowState: init_workflow_state()
+     GraphOrchestrator->>BudgetTracker: create_budget()
+     GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
+
+     loop For each node in graph
+         GraphExecutor->>Agent: execute_node(agent_node)
+         Agent->>Agent: process_input
+         Agent-->>GraphExecutor: result
+         GraphExecutor->>WorkflowState: update_state(result)
+         GraphExecutor->>BudgetTracker: add_tokens(used)
+         GraphExecutor->>BudgetTracker: check_budget()
+         alt Budget exceeded
+             GraphExecutor->>GraphOrchestrator: emit(error_event)
+         else Continue
+             GraphExecutor->>GraphOrchestrator: emit(progress_event)
+         end
+     end
+
+     GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
+ ```
+
+ ### Research Team
+
+ Critical Deep Research Agent
+
+ ## Development
+
+ ### Run Tests
+
+ ```bash
+ uv run pytest
+ ```
+
+ ### Run Checks
+
+ ```bash
+ make check
+ ```
+
+ ## Architecture
+
+ DeepCritical uses a Vertical Slice Architecture:
+
+ 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
+ 2. **Judge Slice**: Evaluating evidence quality using LLMs.
+ 3. **Orchestrator Slice**: Managing the research loop and UI.
+
+ Built with:
+ - **PydanticAI**: For robust agent interactions.
+ - **Gradio**: For the streaming user interface.
+ - **PubMed, ClinicalTrials.gov, bioRxiv**: For biomedical data.
+ - **MCP**: For universal tool access.
+ - **Modal**: For secure code execution.
+
+ ## Team
+
+ - The-Obstacle-Is-The-Way
+ - MarioAderman
+ - Josephrp
+
+ ## Links
+
+ - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
.github/workflows/ci.yml CHANGED
@@ -2,33 +2,66 @@ name: CI

  on:
    push:
-     branches: [main, dev]
+     branches: [main, develop]
    pull_request:
-     branches: [main, dev]
+     branches: [main, develop]

  jobs:
-   check:
+   test:
      runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         python-version: ["3.11"]

      steps:
        - uses: actions/checkout@v4

-       - name: Install uv
-         uses: astral-sh/setup-uv@v4
+       - name: Set up Python ${{ matrix.python-version }}
+         uses: actions/setup-python@v5
          with:
-           version: "latest"
-
-       - name: Set up Python 3.11
-         run: uv python install 3.11
+           python-version: ${{ matrix.python-version }}

        - name: Install dependencies
-         run: uv sync --all-extras
+         run: |
+           python -m pip install --upgrade pip
+           pip install -e ".[dev]"

        - name: Lint with ruff
-         run: uv run ruff check src tests
+         run: |
+           ruff check . --exclude tests
+           ruff format --check . --exclude tests

        - name: Type check with mypy
-         run: uv run mypy src
-
-       - name: Run tests
-         run: uv run pytest tests/unit/ -v
+         run: |
+           mypy src
+
+       - name: Install embedding dependencies
+         run: |
+           pip install -e ".[embeddings]"
+
+       - name: Run unit tests (excluding OpenAI and embedding providers)
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
+
+       - name: Run local embeddings tests
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire || true
+         continue-on-error: true  # Allow failures if dependencies not available
+
+       - name: Run HuggingFace integration tests
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire || true
+         continue-on-error: true  # Allow failures if HF_TOKEN not set
+
+       - name: Run non-OpenAI integration tests (excluding embedding providers)
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           pytest tests/integration/ -v -m "integration and not openai and not embedding_provider" --tb=short -p no:logfire || true
+         continue-on-error: true  # Allow failures if dependencies not available
.gitignore CHANGED
@@ -1,3 +1,6 @@
+ folder/
+ .cursor/
+ .ruff_cache/
  # Python
  __pycache__/
  *.py[cod]
.pre-commit-config.yaml CHANGED
@@ -3,9 +3,10 @@ repos:
      rev: v0.4.4
      hooks:
        - id: ruff
-         args: [--fix]
+         args: [--fix, --exclude, tests]
          exclude: ^reference_repos/
        - id: ruff-format
+         args: [--exclude, tests]
          exclude: ^reference_repos/

  - repo: https://github.com/pre-commit/mirrors-mypy
@@ -13,9 +14,51 @@ repos:
      hooks:
        - id: mypy
          files: ^src/
+         exclude: ^folder
          additional_dependencies:
            - pydantic>=2.7
            - pydantic-settings>=2.2
            - tenacity>=8.2
            - pydantic-ai>=0.0.16
          args: [--ignore-missing-imports]
+
+ - repo: local
+   hooks:
+     - id: pytest-unit
+       name: pytest unit tests (no OpenAI)
+       entry: uv
+       language: system
+       types: [python]
+       args: [
+         "run",
+         "pytest",
+         "tests/unit/",
+         "-v",
+         "-m",
+         "not openai and not embedding_provider",
+         "--tb=short",
+         "-p",
+         "no:logfire",
+       ]
+       pass_filenames: false
+       always_run: true
+       require_serial: false
+     - id: pytest-local-embeddings
+       name: pytest local embeddings tests
+       entry: uv
+       language: system
+       types: [python]
+       args: [
+         "run",
+         "pytest",
+         "tests/",
+         "-v",
+         "-m",
+         "local_embeddings",
+         "--tb=short",
+         "-p",
+         "no:logfire",
+       ]
+       pass_filenames: false
+       always_run: true
+       require_serial: false
.pre-commit-hooks/run_pytest.ps1 ADDED
@@ -0,0 +1,14 @@
+ # PowerShell pytest runner for pre-commit (Windows)
+ # Uses uv if available, otherwise falls back to pytest
+
+ if (Get-Command uv -ErrorAction SilentlyContinue) {
+     uv run pytest $args
+ } else {
+     Write-Warning "uv not found, using system pytest (may have missing dependencies)"
+     pytest $args
+ }
.pre-commit-hooks/run_pytest.sh ADDED
@@ -0,0 +1,15 @@
+ #!/bin/bash
+ # Cross-platform pytest runner for pre-commit
+ # Uses uv if available, otherwise falls back to pytest
+
+ if command -v uv >/dev/null 2>&1; then
+     uv run pytest "$@"
+ else
+     echo "Warning: uv not found, using system pytest (may have missing dependencies)"
+     pytest "$@"
+ fi
AGENTS.md DELETED
@@ -1,118 +0,0 @@
- # AGENTS.md
-
- This file provides guidance to AI agents when working with code in this repository.
-
- ## Project Overview
-
- DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".
-
- **Current Status:** Phases 1-13 COMPLETE (Foundation through Modal sandbox integration).
-
- ## Development Commands
-
- ```bash
- # Install all dependencies (including dev)
- make install  # or: uv sync --all-extras && uv run pre-commit install
-
- # Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
- make check
-
- # Individual commands
- make test       # uv run pytest tests/unit/ -v
- make lint       # uv run ruff check src tests
- make format     # uv run ruff format src tests
- make typecheck  # uv run mypy src
- make test-cov   # uv run pytest --cov=src --cov-report=term-missing
-
- # Run single test
- uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
-
- # Integration tests (real APIs)
- uv run pytest -m integration
- ```
-
- ## Architecture
-
- **Pattern**: Search-and-judge loop with multi-tool orchestration.
-
- ```text
- User Question → Orchestrator
-
- Search Loop:
-   1. Query PubMed, ClinicalTrials.gov, bioRxiv
-   2. Gather evidence
-   3. Judge quality ("Do we have enough?")
-   4. If NO → Refine query, search more
-   5. If YES → Synthesize findings (+ optional Modal analysis)
-
- Research Report with Citations
- ```
-
- **Key Components**:
-
- - `src/orchestrator.py` - Main agent loop
- - `src/tools/pubmed.py` - PubMed E-utilities search
- - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- - `src/tools/biorxiv.py` - bioRxiv/medRxiv preprint search
- - `src/tools/code_execution.py` - Modal sandbox execution
- - `src/tools/search_handler.py` - Scatter-gather orchestration
- - `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
- - `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- - `src/agent_factory/judges.py` - LLM-based evidence assessment
- - `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- - `src/mcp_tools.py` - MCP tool wrappers for Claude Desktop
- - `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- - `src/utils/models.py` - Evidence, Citation, SearchResult models
- - `src/utils/exceptions.py` - Exception hierarchy
- - `src/app.py` - Gradio UI with MCP server (HuggingFace Spaces)
-
- **Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).
-
- ## Configuration
-
- Settings via pydantic-settings from `.env`:
-
- - `LLM_PROVIDER`: "openai" or "anthropic"
- - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- - `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- - `MAX_ITERATIONS`: 1-50, default 10
- - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
-
- ## Exception Hierarchy
-
- ```text
- DeepCriticalError (base)
- ├── SearchError
- │   └── RateLimitError
- ├── JudgeError
- └── ConfigurationError
- ```
-
- ## Testing
-
- - **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- - **Markers**: `unit`, `integration`, `slow`
- - **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- - **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
-
- ## Coding Standards
-
- - Python 3.11+, strict mypy, ruff (100-char lines)
- - Type all functions, use Pydantic models for data
- - Use `structlog` for logging, not print
- - Conventional commits: `feat(scope):`, `fix:`, `docs:`
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Remote `origin`: GitHub (source of truth for PRs/code review)
- - Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
-
- **HuggingFace Spaces Collaboration:**
-
- - Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
- - **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
- - GitHub is the source of truth; HuggingFace is for deployment/demo
- - Consider using git hooks to prevent accidental pushes to protected branches
AGENTS.txt ADDED
@@ -0,0 +1,236 @@
+ # DeepCritical Project - Rules
+
+ ## Project-Wide Rules
+
+ **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
+
+ **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Maintain `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
+
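The `TYPE_CHECKING` import guard from the rule above, expanded into a minimal sketch (the `warm_cache` helper is illustrative; the module path comes from the rule):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only during type checking, so the circular import
    # never happens at runtime.
    from src.services.embeddings import EmbeddingService


def warm_cache(service: EmbeddingService) -> None:
    # Illustrative consumer of the type-only import.
    ...
```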
+ **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
+
+ **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
+
+ **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
+
+ **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
+
+ **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
+
+ **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
+
+ **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
+
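A sketch of the `respx` mocking style named above. The route, payload, and async test marker are assumptions (the marker depends on which pytest async plugin the project uses):

```python
import httpx
import pytest
import respx


@respx.mock
@pytest.mark.asyncio
async def test_search_returns_results() -> None:
    # Route and payload are illustrative; real tests mock the actual tool URLs.
    respx.get("https://api.example.org/search").mock(
        return_value=httpx.Response(200, json={"hits": []})
    )
    async with httpx.AsyncClient() as client:
        response = await client.get("https://api.example.org/search")
    assert response.json() == {"hits": []}
```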
+ **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
+
+ **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
+
+ ---
+
+ ## src/agents/ - Agent Implementation Rules
+
+ **Pattern**: All agents use the Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
+
+ **Agent Structure**:
+ - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
+ - Agent class with `__init__(model: Any | None = None)`
+ - Main method (e.g., `async def evaluate()`, `async def write_report()`)
+ - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
+
+ **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
+
+ **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
+
+ **Input Validation**: Validate that queries/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
+
+ **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
+
+ **Agent-Specific Rules**:
+ - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
+ - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
+ - `writer.py`: Returns markdown string. Includes citations in numbered format.
+ - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
+ - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
+ - `thinking.py`: Returns observation string from conversation history.
+ - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
+
+ ---
+
+ ## src/tools/ - Search Tool Rules
+
+ **Protocol**: All tools implement the `SearchTool` protocol from `src/tools/base.py`: a `name` property and `async def search(query, max_results) -> list[Evidence]`.
+
+ **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
+
+ **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return an empty list on non-critical errors (log a warning).
+
+ **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
+
+ **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
+
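A sketch of that conversion step. The field names on `Evidence` and `Citation` are assumptions for illustration; the real models live in `src/utils/models.py`:

```python
from pydantic import BaseModel, Field


class Citation(BaseModel):
    # Field names are hypothetical stand-ins for the real model.
    title: str
    url: str


class Evidence(BaseModel):
    citation: Citation
    snippet: str
    relevance: float = Field(ge=0.0, le=1.0)


def to_evidence(item: dict) -> Evidence:
    # Tolerate missing fields rather than raising, per the rule above.
    return Evidence(
        citation=Citation(title=item.get("title", "Untitled"), url=item.get("url", "")),
        snippet=item.get("abstract", ""),
        relevance=float(item.get("score", 0.5)),
    )
```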
+ **Tool-Specific Rules**:
+ - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
+ - `clinicaltrials.py`: Use the `requests` library (NOT httpx - the WAF blocks httpx). Run in a thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: only interventional studies, active/completed.
+ - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
+ - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
+ - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
+
+ ---
+
+ ## src/middleware/ - Middleware Rules
+
+ **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
+
+ **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
+
+ **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
+
+ **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
+
+ **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
+
+ ---
+
+ ## src/orchestrator/ - Orchestration Rules
+
+ **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
+
+ **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
+
+ **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
+
+ **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
+
+ **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
+
+ **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
+
+ ---
+
+ ## src/services/ - Service Rules
+
+ **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
+
+ **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
+
+ **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
+
+ **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
+
+ ---
+
+ ## src/utils/ - Utility Rules
+
+ **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation is needed. Use `Field()` with descriptions. Validate with constraints.
+
+ **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use the `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
+
+ **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
+
+ **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
+
+ **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
+
+ ---
+
+ ## src/orchestrator_factory.py Rules
+
+ **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
+
+ **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
+
+ **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
+
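A sketch of that detection logic; the signature is an assumption (the real `_determine_mode()` in `src/orchestrator_factory.py` reads `settings` directly):

```python
def _determine_mode(explicit_mode: str | None, has_openai_key: bool) -> str:
    # "magentic" is an alias for "advanced", per the rule above.
    if explicit_mode == "magentic":
        return "advanced"
    if explicit_mode in ("simple", "advanced"):
        return explicit_mode
    # Auto-detect: advanced when an OpenAI key is available.
    return "advanced" if has_openai_key else "simple"
```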
+ **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
+
+ **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
+
+ ---
+
+ ## src/orchestrator_hierarchical.py Rules
+
+ **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to the SubIterationTeam protocol.
+
+ **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
+
+ **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
+
+ **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
+
+ **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
+
+ ---
+
+ ## src/orchestrator_magentic.py Rules
+
+ **Purpose**: Magentic-based orchestrator using the ChatAgent pattern. Each agent has an internal LLM. A manager orchestrates the agents.
+
+ **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
+
+ **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
+
+ **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
+
+ **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
+
+ **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and an OpenAI API key.
+
+ **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
+
+ ---
+
+ ## src/agent_factory/ - Factory Rules
+
+ **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
+
+ **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
+
+ **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
+
+ **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
+
+ **Error Handling**: Raise `ConfigurationError` if required API keys are missing. Log agent creation. Handle import errors gracefully.
+
+ ---
+
+ ## src/prompts/ - Prompt Rules
+
+ **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
+
+ **Judge Prompts**: In `judge.py`. Handle the empty evidence case separately. Always request structured JSON output.
+
+ **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
+
+ **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
+
+ ---
+
+ ## Testing Rules
+
+ **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
+
+ **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
+
+ **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
+
+ **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
+
+ ---
+
+ ## File-Specific Agent Rules
+
+ **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
+
+ **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
+
+ **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
+
+ **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
+
+ **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
+
+ **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
+
+ **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines the query.
CLAUDE.md DELETED
@@ -1,111 +0,0 @@
- # CLAUDE.md
-
- This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
- ## Project Overview
-
- DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".
-
- **Current Status:** Phases 1-13 COMPLETE (Foundation through Modal sandbox integration).
-
- ## Development Commands
-
- ```bash
- # Install all dependencies (including dev)
- make install  # or: uv sync --all-extras && uv run pre-commit install
-
- # Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
- make check
-
- # Individual commands
- make test       # uv run pytest tests/unit/ -v
- make lint       # uv run ruff check src tests
- make format     # uv run ruff format src tests
- make typecheck  # uv run mypy src
- make test-cov   # uv run pytest --cov=src --cov-report=term-missing
-
- # Run single test
- uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
-
- # Integration tests (real APIs)
- uv run pytest -m integration
- ```
-
- ## Architecture
-
- **Pattern**: Search-and-judge loop with multi-tool orchestration.
-
- ```text
- User Question → Orchestrator
-
- Search Loop:
-   1. Query PubMed, ClinicalTrials.gov, bioRxiv
-   2. Gather evidence
-   3. Judge quality ("Do we have enough?")
-   4. If NO → Refine query, search more
-   5. If YES → Synthesize findings (+ optional Modal analysis)
-
- Research Report with Citations
- ```
-
- **Key Components**:
-
- - `src/orchestrator.py` - Main agent loop
- - `src/tools/pubmed.py` - PubMed E-utilities search
- - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- - `src/tools/biorxiv.py` - bioRxiv/medRxiv preprint search
- - `src/tools/code_execution.py` - Modal sandbox execution
- - `src/tools/search_handler.py` - Scatter-gather orchestration
- - `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
- - `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- - `src/agent_factory/judges.py` - LLM-based evidence assessment
- - `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- - `src/mcp_tools.py` - MCP tool wrappers for Claude Desktop
- - `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- - `src/utils/models.py` - Evidence, Citation, SearchResult models
- - `src/utils/exceptions.py` - Exception hierarchy
- - `src/app.py` - Gradio UI with MCP server (HuggingFace Spaces)
-
- **Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).
-
- ## Configuration
-
- Settings via pydantic-settings from `.env`:
-
- - `LLM_PROVIDER`: "openai" or "anthropic"
- - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- - `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- - `MAX_ITERATIONS`: 1-50, default 10
- - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
-
- ## Exception Hierarchy
-
- ```text
- DeepCriticalError (base)
- ├── SearchError
- │   └── RateLimitError
- ├── JudgeError
- └── ConfigurationError
- ```
-
- ## Testing
-
- - **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- - **Markers**: `unit`, `integration`, `slow`
- - **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- - **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Remote `origin`: GitHub (source of truth for PRs/code review)
104
- - Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
105
-
106
- **HuggingFace Spaces Collaboration:**
107
-
108
- - Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
109
- - **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
110
- - GitHub is the source of truth; HuggingFace is for deployment/demo
111
- - Consider using git hooks to prevent accidental pushes to protected branches
CONTRIBUTING.md ADDED
@@ -0,0 +1 @@
 
 
1
+ Make sure you run the full pre-commit checks before opening a PR (not a draft); otherwise The-Obstacle-Is-The-Way will lose his mind.
GEMINI.md DELETED
@@ -1,98 +0,0 @@
1
- # DeepCritical Context
2
-
3
- ## Project Overview
4
-
5
- **DeepCritical** is an AI-native Medical Drug Repurposing Research Agent.
6
- **Goal:** To accelerate the discovery of new uses for existing drugs by intelligently searching biomedical literature (PubMed, ClinicalTrials.gov, bioRxiv), evaluating evidence, and hypothesizing potential applications.
7
-
8
- **Architecture:**
9
- The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orchestrator) and adheres to **Strict TDD** (Test-Driven Development).
10
-
11
- **Current Status:**
12
-
13
- - **Phases 1-9:** COMPLETE. Foundation, Search, Judge, UI, Orchestrator, Embeddings, Hypothesis, Report, Cleanup.
14
- - **Phases 10-11:** COMPLETE. ClinicalTrials.gov and bioRxiv integration.
15
- - **Phase 12:** COMPLETE. MCP Server integration (Gradio MCP at `/gradio_api/mcp/`).
16
- - **Phase 13:** COMPLETE. Modal sandbox for statistical analysis.
17
-
18
- ## Tech Stack & Tooling
19
-
20
- - **Language:** Python 3.11 (Pinned)
21
- - **Package Manager:** `uv` (Rust-based, extremely fast)
22
- - **Frameworks:** `pydantic`, `pydantic-ai`, `httpx`, `gradio[mcp]`
23
- - **Vector DB:** `chromadb` with `sentence-transformers` for semantic search
24
- - **Code Execution:** `modal` for secure sandboxed Python execution
25
- - **Testing:** `pytest`, `pytest-asyncio`, `respx` (for mocking)
26
- - **Quality:** `ruff` (linting/formatting), `mypy` (strict type checking), `pre-commit`
27
-
28
- ## Building & Running
29
-
30
- | Command | Description |
31
- | :--- | :--- |
32
- | `make install` | Install dependencies and pre-commit hooks. |
33
- | `make test` | Run unit tests. |
34
- | `make lint` | Run Ruff linter. |
35
- | `make format` | Run Ruff formatter. |
36
- | `make typecheck` | Run Mypy static type checker. |
37
- | `make check` | **The Golden Gate:** Runs lint, typecheck, and test. Must pass before committing. |
38
- | `make clean` | Clean up cache and artifacts. |
39
-
40
- ## Directory Structure
41
-
42
- - `src/`: Source code
43
- - `utils/`: Shared utilities (`config.py`, `exceptions.py`, `models.py`)
44
- - `tools/`: Search tools (`pubmed.py`, `clinicaltrials.py`, `biorxiv.py`, `code_execution.py`)
45
- - `services/`: Services (`embeddings.py`, `statistical_analyzer.py`)
46
- - `agents/`: Magentic multi-agent mode agents
47
- - `agent_factory/`: Agent definitions (judges, prompts)
48
- - `mcp_tools.py`: MCP tool wrappers for Claude Desktop integration
49
- - `app.py`: Gradio UI with MCP server
50
- - `tests/`: Test suite
51
- - `unit/`: Isolated unit tests (Mocked)
52
- - `integration/`: Real API tests (Marked as slow/integration)
53
- - `docs/`: Documentation and Implementation Specs
54
- - `examples/`: Working demos for each phase
55
-
56
- ## Key Components
57
-
58
- - `src/orchestrator.py` - Main agent loop
59
- - `src/tools/pubmed.py` - PubMed E-utilities search
60
- - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
61
- - `src/tools/biorxiv.py` - bioRxiv/medRxiv preprint search
62
- - `src/tools/code_execution.py` - Modal sandbox execution
63
- - `src/services/statistical_analyzer.py` - Statistical analysis via Modal
64
- - `src/mcp_tools.py` - MCP tool wrappers
65
- - `src/app.py` - Gradio UI (HuggingFace Spaces) with MCP server
66
-
67
- ## Configuration
68
-
69
- Settings via pydantic-settings from `.env`:
70
-
71
- - `LLM_PROVIDER`: "openai" or "anthropic"
72
- - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
73
- - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
74
- - `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
75
- - `MAX_ITERATIONS`: 1-50, default 10
76
- - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
77
-
78
- ## Development Conventions
79
-
80
- 1. **Strict TDD:** Write failing tests in `tests/unit/` *before* implementing logic in `src/`.
81
- 2. **Type Safety:** All code must pass `mypy --strict`. Use Pydantic models for data exchange.
82
- 3. **Linting:** Zero tolerance for Ruff errors.
83
- 4. **Mocking:** Use `respx` or `unittest.mock` for all external API calls in unit tests.
84
- 5. **Vertical Slices:** Implement features end-to-end rather than layer-by-layer.
85
-
86
- ## Git Workflow
87
-
88
- - `main`: Production-ready (GitHub)
89
- - `dev`: Development integration (GitHub)
90
- - Remote `origin`: GitHub (source of truth for PRs/code review)
91
- - Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
92
-
93
- **HuggingFace Spaces Collaboration:**
94
-
95
- - Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
96
- - **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
97
- - GitHub is the source of truth; HuggingFace is for deployment/demo
98
- - Consider using git hooks to prevent accidental pushes to protected branches
Makefile CHANGED
@@ -8,15 +8,21 @@ install:
8
  uv run pre-commit install
9
 
10
  test:
11
- uv run pytest tests/unit/ -v
 
 
 
 
 
 
12
 
13
  # Coverage aliases
14
  cov: test-cov
15
  test-cov:
16
- uv run pytest --cov=src --cov-report=term-missing
17
 
18
  cov-html:
19
- uv run pytest --cov=src --cov-report=html
20
  @echo "Coverage report: open htmlcov/index.html"
21
 
22
  lint:
 
8
  uv run pre-commit install
9
 
10
  test:
11
+ uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
12
+
13
+ test-hf:
14
+ uv run pytest tests/ -v -m "huggingface" -p no:logfire
15
+
16
+ test-all:
17
+ uv run pytest tests/ -v -p no:logfire
18
 
19
  # Coverage aliases
20
  cov: test-cov
21
  test-cov:
22
+ uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire
23
 
24
  cov-html:
25
+ uv run pytest --cov=src --cov-report=html -p no:logfire
26
  @echo "Coverage report: open htmlcov/index.html"
27
 
28
  lint:
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
21
 
22
  # DeepCritical
23
 
24
- AI-Powered Drug Repurposing Research Agent
25
 
26
  ## Features
27
 
@@ -29,6 +29,10 @@ AI-Powered Drug Repurposing Research Agent
29
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
30
  - **Modal Sandbox**: Secure execution of AI-generated statistical code
31
  - **LlamaIndex RAG**: Semantic search and evidence synthesis
 
 
 
 
32
 
33
  ## Quick Start
34
 
@@ -46,7 +50,7 @@ uv sync
46
 
47
  ```bash
48
  # Start the Gradio app
49
- uv run python src/app.py
50
  ```
51
 
52
  Open your browser to `http://localhost:7860`.
@@ -76,6 +80,97 @@ Add this to your `claude_desktop_config.json`:
76
  - `search_all`: Search all sources simultaneously.
77
  - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
78
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
79
  ## Development
80
 
81
  ### Run Tests
@@ -90,22 +185,7 @@ uv run pytest
90
  make check
91
  ```
92
 
93
- ## Architecture
94
-
95
- DeepCritical uses a Vertical Slice Architecture:
96
-
97
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
98
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
99
- 3. **Orchestrator Slice**: Managing the research loop and UI.
100
-
101
- Built with:
102
- - **PydanticAI**: For robust agent interactions.
103
- - **Gradio**: For the streaming user interface.
104
- - **PubMed, ClinicalTrials.gov, bioRxiv**: For biomedical data.
105
- - **MCP**: For universal tool access.
106
- - **Modal**: For secure code execution.
107
-
108
- ## Team
109
 
110
  - The-Obstacle-Is-The-Way
111
  - MarioAderman
 
21
 
22
  # DeepCritical
23
 
24
+ ## Intro
+
+ DeepCritical is an AI-native drug repurposing research agent with iterative and deep research workflows.
25
 
26
  ## Features
27
 
 
29
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
30
  - **Modal Sandbox**: Secure execution of AI-generated statistical code
31
  - **LlamaIndex RAG**: Semantic search and evidence synthesis
32
+ - **HuggingFace Inference**: LLM and embedding support via the HuggingFace Inference API
+ - **HuggingFace MCP**: custom config for using community MCP tools
+ - **Strongly Typed Composable Graphs**: graph-based orchestration with typed nodes and edges
+ - **Specialized Research Teams of Agents**: iterative and deep research flows run by dedicated agent teams
36
 
37
  ## Quick Start
38
 
 
50
 
51
  ```bash
52
  # Start the Gradio app
53
+ uv run gradio src/app.py
54
  ```
55
 
56
  Open your browser to `http://localhost:7860`.
 
80
  - `search_all`: Search all sources simultaneously.
81
  - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
82
 
83
+
84
+
85
+ ## Architecture
86
+
87
+ DeepCritical uses a Vertical Slice Architecture:
88
+
89
+ 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
90
+ 2. **Judge Slice**: Evaluating evidence quality using LLMs.
91
+ 3. **Orchestrator Slice**: Managing the research loop and UI.
92
+
93
+ Three research modes are supported:
+
+ - iterativeResearch
+ - deepResearch
+ - researchTeam
96
+
97
+ ### Iterative Research
98
+
99
+ ```mermaid
+ sequenceDiagram
100
+ participant IterativeFlow
101
+ participant ThinkingAgent
102
+ participant KnowledgeGapAgent
103
+ participant ToolSelector
104
+ participant ToolExecutor
105
+ participant JudgeHandler
106
+ participant WriterAgent
107
+
108
+ IterativeFlow->>IterativeFlow: run(query)
109
+
110
+ loop Until complete or max_iterations
111
+ IterativeFlow->>ThinkingAgent: generate_observations()
112
+ ThinkingAgent-->>IterativeFlow: observations
113
+
114
+ IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
115
+ KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
116
+
117
+ alt Research complete
118
+ IterativeFlow->>WriterAgent: create_final_report()
119
+ WriterAgent-->>IterativeFlow: final_report
120
+ else Gaps remain
121
+ IterativeFlow->>ToolSelector: select_agents(gap)
122
+ ToolSelector-->>IterativeFlow: AgentSelectionPlan
123
+
124
+ IterativeFlow->>ToolExecutor: execute_tool_tasks()
125
+ ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
126
+
127
+ IterativeFlow->>JudgeHandler: assess_evidence()
128
+ JudgeHandler-->>IterativeFlow: should_continue
129
+ end
130
+ end
+ ```
131
+
132
+
133
+ ### Deep Research
134
+
135
+ ```mermaid
+ sequenceDiagram
136
+ actor User
137
+ participant GraphOrchestrator
138
+ participant InputParser
139
+ participant GraphBuilder
140
+ participant GraphExecutor
141
+ participant Agent
142
+ participant BudgetTracker
143
+ participant WorkflowState
144
+
145
+ User->>GraphOrchestrator: run(query)
146
+ GraphOrchestrator->>InputParser: detect_research_mode(query)
147
+ InputParser-->>GraphOrchestrator: mode (iterative/deep)
148
+ GraphOrchestrator->>GraphBuilder: build_graph(mode)
149
+ GraphBuilder-->>GraphOrchestrator: ResearchGraph
150
+ GraphOrchestrator->>WorkflowState: init_workflow_state()
151
+ GraphOrchestrator->>BudgetTracker: create_budget()
152
+ GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
153
+
154
+ loop For each node in graph
155
+ GraphExecutor->>Agent: execute_node(agent_node)
156
+ Agent->>Agent: process_input
157
+ Agent-->>GraphExecutor: result
158
+ GraphExecutor->>WorkflowState: update_state(result)
159
+ GraphExecutor->>BudgetTracker: add_tokens(used)
160
+ GraphExecutor->>BudgetTracker: check_budget()
161
+ alt Budget exceeded
162
+ GraphExecutor->>GraphOrchestrator: emit(error_event)
163
+ else Continue
164
+ GraphExecutor->>GraphOrchestrator: emit(progress_event)
165
+ end
166
+ end
167
+
168
+ GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
+ ```
169
+
170
+ ### Research Team
171
+
172
+ The research team mode composes the flows above (planner, parallel research loops, judges, and writer agents) into a critical deep research agent.
173
+
174
  ## Development
175
 
176
  ### Run Tests
 
185
  make check
186
  ```
187
 
188
+ ## Join Us
189
 
190
  - The-Obstacle-Is-The-Way
191
  - MarioAderman
docs/CONFIGURATION.md ADDED
@@ -0,0 +1,301 @@
1
+ # Configuration Guide
2
+
3
+ ## Overview
4
+
5
+ DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
6
+
7
+ ## Quick Start
8
+
9
+ 1. Copy `.env.example` to `.env` in the project root (or create a new `.env` file)
10
+ 2. Set at least one LLM API key (`OPENAI_API_KEY` or `ANTHROPIC_API_KEY`)
11
+ 3. Optionally configure other services as needed
12
+
13
+ ## Configuration System
14
+
15
+ ### How It Works
16
+
17
+ - **Settings Class**: `Settings` class in `src/utils/config.py` extends `BaseSettings` from `pydantic_settings`
18
+ - **Environment File**: Automatically loads from `.env` file (if present)
19
+ - **Environment Variables**: Reads from environment variables (case-insensitive)
20
+ - **Type Safety**: Strongly-typed fields with validation
21
+ - **Singleton Pattern**: Global `settings` instance for easy access
22
+
23
+ ### Usage
24
+
25
+ ```python
26
+ from src.utils.config import settings
27
+
28
+ # Check if API keys are available
29
+ if settings.has_openai_key:
30
+ # Use OpenAI
31
+ pass
32
+
33
+ # Access configuration values
34
+ max_iterations = settings.max_iterations
35
+ web_search_provider = settings.web_search_provider
36
+ ```
37
+
38
+ ## Required Configuration
39
+
40
+ ### At Least One LLM Provider
41
+
42
+ You must configure at least one LLM provider:
43
+
44
+ **OpenAI:**
45
+ ```bash
46
+ LLM_PROVIDER=openai
47
+ OPENAI_API_KEY=your_openai_api_key_here
48
+ OPENAI_MODEL=gpt-5.1
49
+ ```
50
+
51
+ **Anthropic:**
52
+ ```bash
53
+ LLM_PROVIDER=anthropic
54
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
55
+ ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
56
+ ```
57
+
58
+ ## Optional Configuration
59
+
60
+ ### Embedding Configuration
61
+
62
+ ```bash
63
+ # Embedding Provider: "openai", "local", or "huggingface"
64
+ EMBEDDING_PROVIDER=local
65
+
66
+ # OpenAI Embedding Model (used by LlamaIndex RAG)
67
+ OPENAI_EMBEDDING_MODEL=text-embedding-3-small
68
+
69
+ # Local Embedding Model (sentence-transformers)
70
+ LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
71
+
72
+ # HuggingFace Embedding Model
73
+ HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
74
+ ```
75
+
76
+ ### HuggingFace Configuration
77
+
78
+ ```bash
79
+ # HuggingFace API Token (for inference API)
80
+ HUGGINGFACE_API_KEY=your_huggingface_api_key_here
81
+ # Or use HF_TOKEN (alternative name)
82
+
83
+ # Default HuggingFace Model ID
84
+ HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
85
+ ```
86
+
87
+ ### Web Search Configuration
88
+
89
+ ```bash
90
+ # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
91
+ # Default: "duckduckgo" (no API key required)
92
+ WEB_SEARCH_PROVIDER=duckduckgo
93
+
94
+ # Serper API Key (for Google search via Serper)
95
+ SERPER_API_KEY=your_serper_api_key_here
96
+
97
+ # SearchXNG Host URL
98
+ SEARCHXNG_HOST=http://localhost:8080
99
+
100
+ # Brave Search API Key
101
+ BRAVE_API_KEY=your_brave_api_key_here
102
+
103
+ # Tavily API Key
104
+ TAVILY_API_KEY=your_tavily_api_key_here
105
+ ```
106
+
107
+ ### PubMed Configuration
108
+
109
+ ```bash
110
+ # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
111
+ NCBI_API_KEY=your_ncbi_api_key_here
112
+ ```
113
+
114
+ ### Agent Configuration
115
+
116
+ ```bash
117
+ # Maximum iterations per research loop
118
+ MAX_ITERATIONS=10
119
+
120
+ # Search timeout in seconds
121
+ SEARCH_TIMEOUT=30
122
+
123
+ # Use graph-based execution for research flows
124
+ USE_GRAPH_EXECUTION=false
125
+ ```
126
+
127
+ ### Budget & Rate Limiting Configuration
128
+
129
+ ```bash
130
+ # Default token budget per research loop
131
+ DEFAULT_TOKEN_LIMIT=100000
132
+
133
+ # Default time limit per research loop (minutes)
134
+ DEFAULT_TIME_LIMIT_MINUTES=10
135
+
136
+ # Default iterations limit per research loop
137
+ DEFAULT_ITERATIONS_LIMIT=10
138
+ ```
139
+
140
+ ### RAG Service Configuration
141
+
142
+ ```bash
143
+ # ChromaDB collection name for RAG
144
+ RAG_COLLECTION_NAME=deepcritical_evidence
145
+
146
+ # Number of top results to retrieve from RAG
147
+ RAG_SIMILARITY_TOP_K=5
148
+
149
+ # Automatically ingest evidence into RAG
150
+ RAG_AUTO_INGEST=true
151
+ ```
152
+
153
+ ### ChromaDB Configuration
154
+
155
+ ```bash
156
+ # ChromaDB storage path
157
+ CHROMA_DB_PATH=./chroma_db
158
+
159
+ # Whether to persist ChromaDB to disk
160
+ CHROMA_DB_PERSIST=true
161
+
162
+ # ChromaDB server host (for remote ChromaDB, optional)
163
+ # CHROMA_DB_HOST=localhost
164
+
165
+ # ChromaDB server port (for remote ChromaDB, optional)
166
+ # CHROMA_DB_PORT=8000
167
+ ```
168
+
169
+ ### External Services
170
+
171
+ ```bash
172
+ # Modal Token ID (for Modal sandbox execution)
173
+ MODAL_TOKEN_ID=your_modal_token_id_here
174
+
175
+ # Modal Token Secret
176
+ MODAL_TOKEN_SECRET=your_modal_token_secret_here
177
+ ```
178
+
179
+ ### Logging Configuration
180
+
181
+ ```bash
182
+ # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
183
+ LOG_LEVEL=INFO
184
+ ```
185
+
186
+ ## Configuration Properties
187
+
188
+ The `Settings` class provides helpful properties for checking configuration:
189
+
190
+ ```python
191
+ from src.utils.config import settings
192
+
193
+ # Check API key availability
194
+ settings.has_openai_key # bool
195
+ settings.has_anthropic_key # bool
196
+ settings.has_huggingface_key # bool
197
+ settings.has_any_llm_key # bool
198
+
199
+ # Check service availability
200
+ settings.modal_available # bool
201
+ settings.web_search_available # bool
202
+ ```
203
+
204
+ ## Environment Variables Reference
205
+
206
+ ### Required (at least one LLM)
207
+ - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` - At least one LLM provider key
208
+
209
+ ### Optional LLM Providers
210
+ - `DEEPSEEK_API_KEY` (Phase 2)
211
+ - `OPENROUTER_API_KEY` (Phase 2)
212
+ - `GEMINI_API_KEY` (Phase 2)
213
+ - `PERPLEXITY_API_KEY` (Phase 2)
214
+ - `HUGGINGFACE_API_KEY` or `HF_TOKEN`
215
+ - `AZURE_OPENAI_ENDPOINT` (Phase 2)
216
+ - `AZURE_OPENAI_DEPLOYMENT` (Phase 2)
217
+ - `AZURE_OPENAI_API_KEY` (Phase 2)
218
+ - `AZURE_OPENAI_API_VERSION` (Phase 2)
219
+ - `LOCAL_MODEL_URL` (Phase 2)
220
+
221
+ ### Web Search
222
+ - `WEB_SEARCH_PROVIDER` (default: "duckduckgo")
223
+ - `SERPER_API_KEY`
224
+ - `SEARCHXNG_HOST`
225
+ - `BRAVE_API_KEY`
226
+ - `TAVILY_API_KEY`
227
+
228
+ ### Embeddings
229
+ - `EMBEDDING_PROVIDER` (default: "local")
230
+ - `HUGGINGFACE_EMBEDDING_MODEL` (optional)
231
+
232
+ ### RAG
233
+ - `RAG_COLLECTION_NAME` (default: "deepcritical_evidence")
234
+ - `RAG_SIMILARITY_TOP_K` (default: 5)
235
+ - `RAG_AUTO_INGEST` (default: true)
236
+
237
+ ### ChromaDB
238
+ - `CHROMA_DB_PATH` (default: "./chroma_db")
239
+ - `CHROMA_DB_PERSIST` (default: true)
240
+ - `CHROMA_DB_HOST` (optional)
241
+ - `CHROMA_DB_PORT` (optional)
242
+
243
+ ### Budget
244
+ - `DEFAULT_TOKEN_LIMIT` (default: 100000)
245
+ - `DEFAULT_TIME_LIMIT_MINUTES` (default: 10)
246
+ - `DEFAULT_ITERATIONS_LIMIT` (default: 10)
247
+
248
+ ### Other
249
+ - `LLM_PROVIDER` (default: "openai")
250
+ - `NCBI_API_KEY` (optional)
251
+ - `MODAL_TOKEN_ID` (optional)
252
+ - `MODAL_TOKEN_SECRET` (optional)
253
+ - `MAX_ITERATIONS` (default: 10)
254
+ - `LOG_LEVEL` (default: "INFO")
255
+ - `USE_GRAPH_EXECUTION` (default: false)
256
+
257
+ ## Validation
258
+
259
+ Settings are validated on load using Pydantic validation:
260
+
261
+ - **Type checking**: All fields are strongly typed
262
+ - **Range validation**: Numeric fields have min/max constraints
263
+ - **Literal validation**: Enum fields only accept specific values
264
+ - **Required fields**: API keys are checked when accessed via `get_api_key()`
265
+
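+ A sketch of what these constraints look like on the `Settings` class (field names here are illustrative, chosen to match the documented env vars):
+
+ ```python
+ from typing import Literal
+
+ from pydantic import Field
+ from pydantic_settings import BaseSettings
+
+ class Settings(BaseSettings):
+     # Range validation: MAX_ITERATIONS must stay within 1-50
+     max_iterations: int = Field(default=10, ge=1, le=50)
+     # Literal validation: only these log levels are accepted
+     log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
+ ```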
266
+ ## Error Handling
267
+
268
+ Configuration errors raise `ConfigurationError`:
269
+
270
+ ```python
271
+ from src.utils.config import settings
272
+ from src.utils.exceptions import ConfigurationError
273
+
274
+ try:
275
+ api_key = settings.get_api_key()
276
+ except ConfigurationError as e:
277
+ print(f"Configuration error: {e}")
278
+ ```
279
+
280
+ ## Future Enhancements (Phase 2)
281
+
282
+ The following configurations are planned for Phase 2:
283
+
284
+ 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
285
+ 2. **Model Selection**: Reasoning/main/fast model configuration
286
+ 3. **Service Integration**: Migrate `folder/llm_config.py` to centralized config
287
+
288
+ See `CONFIGURATION_ANALYSIS.md` for the complete implementation plan.
289
+
docs/architecture/graph_orchestration.md ADDED
@@ -0,0 +1,151 @@
1
+ # Graph Orchestration Architecture
2
+
3
+ ## Overview
4
+
5
+ Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
6
+
7
+ ## Graph Structure
8
+
9
+ ### Nodes
10
+
11
+ Graph nodes represent different stages in the research workflow:
12
+
13
+ 1. **Agent Nodes**: Execute Pydantic AI agents
14
+ - Input: Prompt/query
15
+ - Output: Structured or unstructured response
16
+ - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
17
+
18
+ 2. **State Nodes**: Update or read workflow state
19
+ - Input: Current state
20
+ - Output: Updated state
21
+ - Examples: Update evidence, update conversation history
22
+
23
+ 3. **Decision Nodes**: Make routing decisions based on conditions
24
+ - Input: Current state/results
25
+ - Output: Next node ID
26
+ - Examples: Continue research vs. complete research
27
+
28
+ 4. **Parallel Nodes**: Execute multiple nodes concurrently
29
+ - Input: List of node IDs
30
+ - Output: Aggregated results
31
+ - Examples: Parallel iterative research loops
32
+
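+ A minimal sketch of how these node types might be modeled (names are illustrative, not the actual `graph_builder` API):
+
+ ```python
+ from collections.abc import Awaitable, Callable
+ from dataclasses import dataclass
+ from enum import Enum
+ from typing import Any
+
+ class NodeType(Enum):
+     AGENT = "agent"
+     STATE = "state"
+     DECISION = "decision"
+     PARALLEL = "parallel"
+
+ @dataclass
+ class GraphNode:
+     id: str
+     type: NodeType
+     run: Callable[[Any], Awaitable[Any]]  # async callable invoked when the node executes
+ ```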
33
+ ### Edges
34
+
35
+ Edges define transitions between nodes:
36
+
37
+ 1. **Sequential Edges**: Always traversed (no condition)
38
+ - From: Source node
39
+ - To: Target node
40
+ - Condition: None (always True)
41
+
42
+ 2. **Conditional Edges**: Traversed based on condition
43
+ - From: Source node
44
+ - To: Target node
45
+ - Condition: Callable that returns bool
46
+ - Example: If research complete → go to writer, else → continue loop
47
+
48
+ 3. **Parallel Edges**: Used for parallel execution branches
49
+ - From: Parallel node
50
+ - To: Multiple target nodes
51
+ - Execution: All targets run concurrently
52
+
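+ An edge can be sketched as a target plus an optional predicate over the current state (illustrative only):
+
+ ```python
+ from collections.abc import Callable
+ from dataclasses import dataclass
+ from typing import Any
+
+ @dataclass
+ class GraphEdge:
+     source: str
+     target: str
+     # None means sequential (always traversed); otherwise evaluated against state
+     condition: Callable[[Any], bool] | None = None
+
+ # Conditional edge: route from the knowledge-gap node to the writer once research is complete
+ edge = GraphEdge("knowledge_gap", "writer", condition=lambda s: s.research_complete)
+ ```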
53
+ ## Graph Patterns
54
+
55
+ ### Iterative Research Graph
56
+
57
+ ```
58
+ [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
+                                            ↓ No          ↓ Yes
+                                      [Tool Selector]   [Writer]
+                                            ↓
+                                      [Execute Tools] → [Loop Back]
63
+ ```
64
+
65
+ ### Deep Research Graph
66
+
67
+ ```
68
+ [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
+                           ↓        ↓        ↓
+                        [Loop1]  [Loop2]  [Loop3]
71
+ ```
72
+
73
+ ## State Management
74
+
75
+ State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:
76
+
77
+ - **Evidence**: Collected evidence from searches
78
+ - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
79
+ - **Embedding Service**: For semantic search
80
+
81
+ State transitions occur at state nodes, which update the global workflow state.
82
+
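+ A sketch of the `ContextVar` pattern (the actual `WorkflowState` fields live in `src/agents/state.py`; names here are illustrative):
+
+ ```python
+ from contextvars import ContextVar
+
+ _workflow_state: ContextVar["WorkflowState | None"] = ContextVar("workflow_state", default=None)
+
+ def get_workflow_state() -> "WorkflowState":
+     # Each asyncio task sees its own value, so parallel workflows stay isolated
+     state = _workflow_state.get()
+     if state is None:
+         raise RuntimeError("Workflow state not initialized")
+     return state
+ ```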
83
+ ## Execution Flow
84
+
85
+ 1. **Graph Construction**: Build graph from nodes and edges
86
+ 2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable)
87
+ 3. **Graph Execution**: Traverse graph from entry node
88
+ 4. **Node Execution**: Execute each node based on type
89
+ 5. **Edge Evaluation**: Determine next node(s) based on edges
90
+ 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
91
+ 7. **State Updates**: Update state at state nodes
92
+ 8. **Event Streaming**: Yield events during execution for UI
93
+
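+ A simplified traversal loop illustrating steps 3-5 (assumes `graph.nodes` is a mapping and `graph.edges_from()` returns outgoing edges; both names are assumptions):
+
+ ```python
+ from typing import Any
+
+ async def execute_graph(graph: Any, entry_id: str, state: Any) -> None:
+     node_id: str | None = entry_id
+     while node_id is not None:
+         node = graph.nodes[node_id]
+         await node.run(state)  # node execution (agent, state update, decision, ...)
+         # Follow the first edge whose condition holds (sequential edges always match)
+         node_id = next(
+             (e.target for e in graph.edges_from(node_id)
+              if e.condition is None or e.condition(state)),
+             None,
+         )
+ ```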
94
+ ## Conditional Routing
95
+
96
+ Decision nodes evaluate conditions and return next node IDs:
97
+
98
+ - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
99
+ - **Budget Decision**: If budget exceeded → exit, else → continue
100
+ - **Iteration Decision**: If max iterations → exit, else → continue
101
+
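+ A decision function covering these three checks might look like this (state and budget attribute names are assumptions):
+
+ ```python
+ from typing import Any
+
+ def route_after_gap_check(state: Any, budget: Any) -> str:
+     # Budget and iteration decisions take priority over the gap decision
+     if budget.exceeded() or state.iteration >= state.max_iterations:
+         return "exit"
+     return "writer" if state.research_complete else "tool_selector"
+ ```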
102
+ ## Parallel Execution
103
+
104
+ Parallel nodes execute multiple nodes concurrently:
105
+
106
+ - Each parallel branch runs independently
107
+ - Results are aggregated after all branches complete
108
+ - State is synchronized after parallel execution
109
+ - Errors in one branch don't stop other branches
110
+
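+ A sketch of the aggregation step; `return_exceptions=True` is what keeps one failing branch from cancelling the others:
+
+ ```python
+ import asyncio
+ from typing import Any
+
+ async def run_parallel(branches: list[Any], state: Any) -> list[Any]:
+     results = await asyncio.gather(
+         *(branch.run(state) for branch in branches),
+         return_exceptions=True,
+     )
+     # Keep successful results; failed branches surface as exception objects
+     return [r for r in results if not isinstance(r, BaseException)]
+ ```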
111
+ ## Budget Enforcement
112
+
113
+ Budget constraints are enforced at decision nodes:
114
+
115
+ - **Token Budget**: Track LLM token usage
116
+ - **Time Budget**: Track elapsed time
117
+ - **Iteration Budget**: Track iteration count
118
+
119
+ If any budget is exceeded, execution routes to exit node.
120
+
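+ A minimal budget check, using the defaults from the configuration guide (100K tokens, 10 minutes, 10 iterations); this is a sketch, not the actual `BudgetTracker`:
+
+ ```python
+ import time
+ from dataclasses import dataclass, field
+
+ @dataclass
+ class Budget:
+     token_limit: int = 100_000
+     time_limit_s: float = 600.0
+     max_iterations: int = 10
+     tokens_used: int = 0
+     iterations: int = 0
+     started: float = field(default_factory=time.monotonic)
+
+     def exceeded(self) -> bool:
+         return (
+             self.tokens_used >= self.token_limit
+             or time.monotonic() - self.started >= self.time_limit_s
+             or self.iterations >= self.max_iterations
+         )
+ ```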
121
+ ## Error Handling
122
+
123
+ Errors are handled at multiple levels:
124
+
125
+ 1. **Node Level**: Catch errors in individual node execution
126
+ 2. **Graph Level**: Handle errors during graph traversal
127
+ 3. **State Level**: Rollback state changes on error
128
+
129
+ Errors are logged and yield error events for UI.
130
+
131
+ ## Backward Compatibility
132
+
133
+ Graph execution is optional via feature flag:
134
+
135
+ - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
136
+ - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
137
+
138
+ This allows gradual migration and fallback if needed.
139
+
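+ Routing on the flag is a one-line check (the settings field and entry-point names below are assumptions based on the env var):
+
+ ```python
+ from src.utils.config import settings
+
+ async def run_graph(query: str) -> str: ...        # placeholder for the graph path
+ async def run_agent_chain(query: str) -> str: ...  # placeholder for the chain path
+
+ async def run_research(query: str) -> str:
+     # USE_GRAPH_EXECUTION toggles between the two execution paths
+     if settings.use_graph_execution:
+         return await run_graph(query)
+     return await run_agent_chain(query)
+ ```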
docs/examples/writer_agents_usage.md ADDED
@@ -0,0 +1,425 @@
1
+ # Writer Agents Usage Examples
2
+
3
+ This document provides examples of how to use the writer agents in DeepCritical for generating research reports.
4
+
5
+ ## Overview
6
+
7
+ DeepCritical provides three writer agents for different report generation scenarios:
8
+
9
+ 1. **WriterAgent** - Basic writer for simple reports from findings
10
+ 2. **LongWriterAgent** - Iterative writer for long-form multi-section reports
11
+ 3. **ProofreaderAgent** - Finalizes and polishes report drafts
12
+
13
+ ## WriterAgent
14
+
15
+ The `WriterAgent` generates final reports from research findings. It's used in iterative research flows.
16
+
17
+ ### Basic Usage
18
+
19
+ ```python
20
+ from src.agent_factory.agents import create_writer_agent
21
+
22
+ # Create writer agent
23
+ writer = create_writer_agent()
24
+
25
+ # Generate report
26
+ query = "What is the capital of France?"
27
+ findings = """
28
+ Paris is the capital of France [1].
29
+ It is located in the north-central part of the country [2].
30
+
31
+ [1] https://example.com/france-info
32
+ [2] https://example.com/paris-info
33
+ """
34
+
35
+ report = await writer.write_report(
36
+ query=query,
37
+ findings=findings,
38
+ )
39
+
40
+ print(report)
41
+ ```
42
+
43
+ ### With Output Length Specification
44
+
45
+ ```python
46
+ report = await writer.write_report(
47
+ query="Explain machine learning",
48
+ findings=findings,
49
+ output_length="500 words",
50
+ )
51
+ ```
52
+
53
+ ### With Additional Instructions
54
+
55
+ ```python
56
+ report = await writer.write_report(
57
+ query="Explain machine learning",
58
+ findings=findings,
59
+ output_length="A comprehensive overview",
60
+ output_instructions="Use formal academic language and include examples",
61
+ )
62
+ ```
63
+
64
+ ### Integration with IterativeResearchFlow
65
+
66
+ The `WriterAgent` is automatically used by `IterativeResearchFlow`:
67
+
68
+ ```python
69
+ from src.agent_factory.agents import create_iterative_flow
70
+
71
+ flow = create_iterative_flow(max_iterations=5, max_time_minutes=10)
72
+ report = await flow.run(
73
+ query="What is quantum computing?",
74
+ output_length="A detailed explanation",
75
+ output_instructions="Include practical applications",
76
+ )
77
+ ```
78
+
79
+ ## LongWriterAgent
80
+
81
+ The `LongWriterAgent` iteratively writes report sections with proper citation management. It's used in deep research flows.
82
+
83
+ ### Basic Usage
84
+
85
+ ```python
86
+ from src.agent_factory.agents import create_long_writer_agent
87
+ from src.utils.models import ReportDraft, ReportDraftSection
88
+
89
+ # Create long writer agent
90
+ long_writer = create_long_writer_agent()
91
+
92
+ # Create report draft with sections
93
+ report_draft = ReportDraft(
94
+ sections=[
95
+ ReportDraftSection(
96
+ section_title="Introduction",
97
+ section_content="Draft content for introduction with [1].",
98
+ ),
99
+ ReportDraftSection(
100
+ section_title="Methods",
101
+ section_content="Draft content for methods with [2].",
102
+ ),
103
+ ReportDraftSection(
104
+ section_title="Results",
105
+ section_content="Draft content for results with [3].",
106
+ ),
107
+ ]
108
+ )
109
+
110
+ # Generate full report
111
+ report = await long_writer.write_report(
112
+ original_query="What are the main features of Python?",
113
+ report_title="Python Programming Language Overview",
114
+ report_draft=report_draft,
115
+ )
116
+
117
+ print(report)
118
+ ```
119
+
120
+ ### Writing Individual Sections
121
+
122
+ You can also write sections one at a time:
123
+
124
+ ```python
125
+ # Write first section
126
+ section_output = await long_writer.write_next_section(
127
+ original_query="What is Python?",
128
+ report_draft="", # No existing draft
129
+ next_section_title="Introduction",
130
+ next_section_draft="Python is a programming language...",
131
+ )
132
+
133
+ print(section_output.next_section_markdown)
134
+ print(section_output.references)
135
+
136
+ # Write second section with existing draft
137
+ section_output = await long_writer.write_next_section(
138
+ original_query="What is Python?",
139
+ report_draft="# Report\n\n## Introduction\n\nContent...",
140
+ next_section_title="Features",
141
+ next_section_draft="Python features include...",
142
+ )
143
+ ```
144
+
145
+ ### Integration with DeepResearchFlow
146
+
147
+ The `LongWriterAgent` is automatically used by `DeepResearchFlow`:
148
+
149
+ ```python
150
+ from src.agent_factory.agents import create_deep_flow
151
+
152
+ flow = create_deep_flow(
153
+ max_iterations=5,
154
+ max_time_minutes=10,
155
+ use_long_writer=True, # Use long writer (default)
156
+ )
157
+
158
+ report = await flow.run("What are the main features of Python programming language?")
159
+ ```
160
+
161
+ ## ProofreaderAgent
162
+
163
+ The `ProofreaderAgent` finalizes and polishes report drafts by removing duplicates, adding summaries, and refining wording.
164
+
165
+ ### Basic Usage
166
+
167
+ ```python
168
+ from src.agent_factory.agents import create_proofreader_agent
169
+ from src.utils.models import ReportDraft, ReportDraftSection
170
+
171
+ # Create proofreader agent
172
+ proofreader = create_proofreader_agent()
173
+
174
+ # Create report draft
175
+ report_draft = ReportDraft(
176
+ sections=[
177
+ ReportDraftSection(
178
+ section_title="Introduction",
179
+ section_content="Python is a programming language [1].",
180
+ ),
181
+ ReportDraftSection(
182
+ section_title="Features",
183
+ section_content="Python has many features [2].",
184
+ ),
185
+ ]
186
+ )
187
+
188
+ # Proofread and finalize
189
+ final_report = await proofreader.proofread(
190
+ query="What is Python?",
191
+ report_draft=report_draft,
192
+ )
193
+
194
+ print(final_report)
195
+ ```
196
+
197
+ ### Integration with DeepResearchFlow
198
+
199
+ Use `ProofreaderAgent` instead of `LongWriterAgent`:
200
+
201
+ ```python
202
+ from src.agent_factory.agents import create_deep_flow
203
+
204
+ flow = create_deep_flow(
205
+ max_iterations=5,
206
+ max_time_minutes=10,
207
+ use_long_writer=False, # Use proofreader instead
208
+ )
209
+
210
+ report = await flow.run("What are the main features of Python?")
211
+ ```
212
+
213
+ ## Error Handling
214
+
215
+ All writer agents include robust error handling:
216
+
217
+ ### Handling Empty Inputs
218
+
219
+ ```python
220
+ # WriterAgent handles empty findings gracefully
221
+ report = await writer.write_report(
222
+ query="Test query",
223
+ findings="", # Empty findings
224
+ )
225
+ # Returns a fallback report
226
+
227
+ # LongWriterAgent handles empty sections
228
+ report = await long_writer.write_report(
229
+ original_query="Test",
230
+ report_title="Test Report",
231
+ report_draft=ReportDraft(sections=[]), # Empty draft
232
+ )
233
+ # Returns minimal report
234
+
235
+ # ProofreaderAgent handles empty drafts
236
+ report = await proofreader.proofread(
237
+ query="Test",
238
+ report_draft=ReportDraft(sections=[]),
239
+ )
240
+ # Returns minimal report
241
+ ```
242
+
243
+ ### Retry Logic
244
+
245
+ All agents automatically retry on transient errors (timeouts, connection errors):
246
+
247
+ ```python
248
+ # Automatically retries up to 3 times on transient failures
249
+ report = await writer.write_report(
250
+ query="Test query",
251
+ findings=findings,
252
+ )
253
+ ```
254
+
255
+ ### Fallback Reports
256
+
257
+ If all retries fail, agents return fallback reports:
258
+
259
+ ```python
260
+ # Returns fallback report with query and findings
261
+ report = await writer.write_report(
262
+ query="Test query",
263
+ findings=findings,
264
+ )
265
+ # Fallback includes: "# Research Report\n\n## Query\n...\n\n## Findings\n..."
266
+ ```
267
+
268
+ ## Citation Validation
269
+
270
+ ### For Markdown Reports
271
+
272
+ Use the markdown citation validator:
273
+
274
+ ```python
275
+ from src.utils.citation_validator import validate_markdown_citations
276
+ from src.utils.models import Evidence, Citation
277
+
278
+ # Collect evidence during research
279
+ evidence = [
280
+ Evidence(
281
+ content="Paris is the capital of France",
282
+ citation=Citation(
283
+ source="web",
284
+ title="France Information",
285
+ url="https://example.com/france",
286
+ date="2024-01-01",
287
+ ),
288
+ ),
289
+ ]
290
+
291
+ # Generate report
292
+ report = await writer.write_report(query="What is the capital of France?", findings=findings)
293
+
294
+ # Validate citations
295
+ validated_report, removed_count = validate_markdown_citations(report, evidence)
296
+
297
+ if removed_count > 0:
298
+ print(f"Removed {removed_count} invalid citations")
299
+ ```
300
+
301
+ ### For ResearchReport Objects
302
+
303
+ Use the structured citation validator:
304
+
305
+ ```python
306
+ from src.utils.citation_validator import validate_references
307
+
308
+ # For ResearchReport objects (from ReportAgent)
309
+ validated_report = validate_references(report, evidence)
310
+ ```
311
+
312
+ ## Custom Model Configuration
313
+
314
+ All writer agents support custom model configuration:
315
+
316
+ ```python
317
+ from pydantic_ai.models.openai import OpenAIModel
318
+
319
+ # Create custom model
320
+ custom_model = OpenAIModel("gpt-4")
321
+
322
+ # Use with writer agents
323
+ writer = create_writer_agent(model=custom_model)
324
+ long_writer = create_long_writer_agent(model=custom_model)
325
+ proofreader = create_proofreader_agent(model=custom_model)
326
+ ```
327
+
328
+ ## Best Practices
329
+
330
+ 1. **Use WriterAgent for simple reports** - When you have findings as a string and need a quick report
331
+ 2. **Use LongWriterAgent for structured reports** - When you need multiple sections with proper citation management
332
+ 3. **Use ProofreaderAgent for final polish** - When you have draft sections and need a polished final report
333
+ 4. **Validate citations** - Always validate citations against collected evidence
334
+ 5. **Handle errors gracefully** - All agents return fallback reports on failure
335
+ 6. **Specify output length** - Use `output_length` parameter to control report size
336
+ 7. **Provide instructions** - Use `output_instructions` for specific formatting requirements
337
+
338
+ ## Integration Examples
339
+
340
+ ### Full Iterative Research Flow
341
+
342
+ ```python
343
+ from src.agent_factory.agents import create_iterative_flow
344
+
345
+ flow = create_iterative_flow(
346
+ max_iterations=5,
347
+ max_time_minutes=10,
348
+ )
349
+
350
+ report = await flow.run(
351
+ query="What is machine learning?",
352
+ output_length="A comprehensive 1000-word explanation",
353
+ output_instructions="Include practical examples and use cases",
354
+ )
355
+ ```
356
+
357
+ ### Full Deep Research Flow with Long Writer
358
+
359
+ ```python
360
+ from src.agent_factory.agents import create_deep_flow
361
+
362
+ flow = create_deep_flow(
363
+ max_iterations=5,
364
+ max_time_minutes=10,
365
+ use_long_writer=True,
366
+ )
367
+
368
+ report = await flow.run("What are the main features of Python programming language?")
369
+ ```
370
+
371
+ ### Full Deep Research Flow with Proofreader
372
+
373
+ ```python
374
+ from src.agent_factory.agents import create_deep_flow
375
+
376
+ flow = create_deep_flow(
377
+ max_iterations=5,
378
+ max_time_minutes=10,
379
+ use_long_writer=False, # Use proofreader
380
+ )
381
+
382
+ report = await flow.run("Explain quantum computing basics")
383
+ ```
384
+
385
+ ## Troubleshooting
386
+
387
+ ### Empty Reports
388
+
389
+ If you get empty reports, check:
390
+ - Input validation logs (agents log warnings for empty inputs)
391
+ - LLM API key configuration
392
+ - Network connectivity
393
+
394
+ ### Citation Issues
395
+
396
+ If citations are missing or invalid:
397
+ - Use `validate_markdown_citations()` to check citations
398
+ - Ensure Evidence objects are properly collected during research
399
+ - Check that URLs in findings match Evidence URLs
400
+
401
+ ### Performance Issues
402
+
403
+ For large reports:
404
+ - Use `LongWriterAgent` for better section management
405
+ - Consider truncating very long findings (agents do this automatically)
406
+ - Use appropriate `max_time_minutes` settings
407
+
408
+ ## See Also
409
+
410
+ - [Research Flows Documentation](../orchestrator/research_flows.md)
411
+ - [Citation Validation](../utils/citation_validation.md)
412
+ - [Agent Factory](../agent_factory/agents.md)
413
+
docs/implementation/02_phase_search.md CHANGED
@@ -4,6 +4,8 @@
4
  **Philosophy**: "Real data, mocked connections."
5
  **Prerequisite**: Phase 1 complete (all tests passing)
6
 
 
 
7
  ---
8
 
9
  ## 1. The Slice Definition
@@ -12,17 +14,20 @@ This slice covers:
12
  1. **Input**: A string query (e.g., "metformin Alzheimer's disease").
13
  2. **Process**:
14
  - Fetch from PubMed (E-utilities API).
15
- - Fetch from Web (DuckDuckGo).
16
  - Normalize results into `Evidence` models.
17
  3. **Output**: A list of `Evidence` objects.
18
 
19
  **Files to Create**:
20
  - `src/utils/models.py` - Pydantic models (Evidence, Citation, SearchResult)
21
  - `src/tools/pubmed.py` - PubMed E-utilities tool
22
- - `src/tools/websearch.py` - DuckDuckGo search tool
23
  - `src/tools/search_handler.py` - Orchestrates multiple tools
24
  - `src/tools/__init__.py` - Exports
25
 
 
 
 
26
  ---
27
 
28
  ## 2. PubMed E-utilities API Reference
@@ -767,17 +772,23 @@ async def test_pubmed_live_search():
767
 
768
  ## 8. Implementation Checklist
769
 
770
- - [ ] Create `src/utils/models.py` with all Pydantic models (Evidence, Citation, SearchResult)
771
- - [ ] Create `src/tools/__init__.py` with SearchTool Protocol and exports
772
- - [ ] Implement `src/tools/pubmed.py` with PubMedTool class
773
- - [ ] Implement `src/tools/websearch.py` with WebTool class
774
- - [ ] Create `src/tools/search_handler.py` with SearchHandler class
775
- - [ ] Write tests in `tests/unit/tools/test_pubmed.py`
776
- - [ ] Write tests in `tests/unit/tools/test_websearch.py`
777
- - [ ] Write tests in `tests/unit/tools/test_search_handler.py`
778
- - [ ] Run `uv run pytest tests/unit/tools/ -v` — **ALL TESTS MUST PASS**
779
  - [ ] (Optional) Run integration test: `uv run pytest -m integration`
780
- - [ ] Commit: `git commit -m "feat: phase 2 search slice complete"`
 
 
 
 
 
 
781
 
782
  ---
783
 
@@ -785,20 +796,19 @@ async def test_pubmed_live_search():
785
 
786
  Phase 2 is **COMPLETE** when:
787
 
788
- 1. All unit tests pass: `uv run pytest tests/unit/tools/ -v`
789
- 2. `SearchHandler` can execute with both tools
790
- 3. Graceful degradation: if PubMed fails, WebTool results still return
791
- 4. Rate limiting is enforced (verify no 429 errors)
792
- 5. Can run this in Python REPL:
793
 
794
  ```python
795
  import asyncio
796
  from src.tools.pubmed import PubMedTool
797
- from src.tools.websearch import WebTool
798
  from src.tools.search_handler import SearchHandler
799
 
800
  async def test():
801
- handler = SearchHandler([PubMedTool(), WebTool()])
802
  result = await handler.execute("metformin alzheimer")
803
  print(f"Found {result.total_found} results")
804
  for e in result.evidence[:3]:
@@ -807,4 +817,6 @@ async def test():
807
  asyncio.run(test())
808
  ```
809
 
 
 
810
  **Proceed to Phase 3 ONLY after all checkboxes are complete.**
 
4
  **Philosophy**: "Real data, mocked connections."
5
  **Prerequisite**: Phase 1 complete (all tests passing)
6
 
7
+ > **⚠️ Implementation Note (2025-01-27)**: The DuckDuckGo WebTool specified in this phase was removed in favor of the Europe PMC tool (see Phase 11). Europe PMC provides better coverage for biomedical research by including preprints, peer-reviewed articles, and patents. The current implementation uses PubMed, ClinicalTrials.gov, and Europe PMC as search sources.
8
+
9
  ---
10
 
11
  ## 1. The Slice Definition
 
14
  1. **Input**: A string query (e.g., "metformin Alzheimer's disease").
15
  2. **Process**:
16
  - Fetch from PubMed (E-utilities API).
17
+ - ~~Fetch from Web (DuckDuckGo).~~ **REMOVED** - Replaced by Europe PMC in Phase 11
18
  - Normalize results into `Evidence` models.
19
  3. **Output**: A list of `Evidence` objects.
20
 
21
  **Files to Create**:
22
  - `src/utils/models.py` - Pydantic models (Evidence, Citation, SearchResult)
23
  - `src/tools/pubmed.py` - PubMed E-utilities tool
24
+ - ~~`src/tools/websearch.py` - DuckDuckGo search tool~~ **REMOVED** - See Phase 11 for Europe PMC replacement
25
  - `src/tools/search_handler.py` - Orchestrates multiple tools
26
  - `src/tools/__init__.py` - Exports
27
 
28
+ **Additional Files (Post-Phase 2 Enhancements)**:
29
+ - `src/tools/query_utils.py` - Query preprocessing (removes question words, expands medical synonyms)
30
+
31
  ---
32
 
33
  ## 2. PubMed E-utilities API Reference
 
772
 
773
  ## 8. Implementation Checklist
774
 
775
+ - [x] Create `src/utils/models.py` with all Pydantic models (Evidence, Citation, SearchResult) - **COMPLETE**
776
+ - [x] Create `src/tools/__init__.py` with SearchTool Protocol and exports - **COMPLETE**
777
+ - [x] Implement `src/tools/pubmed.py` with PubMedTool class - **COMPLETE**
778
+ - [ ] ~~Implement `src/tools/websearch.py` with WebTool class~~ - **REMOVED** (replaced by Europe PMC in Phase 11)
779
+ - [x] Create `src/tools/search_handler.py` with SearchHandler class - **COMPLETE**
780
+ - [x] Write tests in `tests/unit/tools/test_pubmed.py` - **COMPLETE** (basic tests)
781
+ - [ ] Write tests in `tests/unit/tools/test_websearch.py` - **N/A** (WebTool removed)
782
+ - [x] Write tests in `tests/unit/tools/test_search_handler.py` - **COMPLETE** (basic tests)
783
+ - [x] Run `uv run pytest tests/unit/tools/ -v` — **ALL TESTS MUST PASS** - **PASSING**
784
  - [ ] (Optional) Run integration test: `uv run pytest -m integration`
785
+ - [ ] Add edge case tests (rate limiting, error handling, timeouts) - **PENDING**
786
+ - [x] Commit: `git commit -m "feat: phase 2 search slice complete"` - **DONE**
787
+
788
+ **Post-Phase 2 Enhancements**:
789
+ - [x] Query preprocessing (`src/tools/query_utils.py`) - **ADDED**
790
+ - [x] Europe PMC tool (Phase 11) - **ADDED**
791
+ - [x] ClinicalTrials tool (Phase 10) - **ADDED**
792
 
793
  ---
794
 
 
796
 
797
  Phase 2 is **COMPLETE** when:
798
 
799
+ 1. All unit tests pass: `uv run pytest tests/unit/tools/ -v` - **PASSING**
800
+ 2. `SearchHandler` can execute with search tools - **WORKING**
801
+ 3. Graceful degradation: if one tool fails, other tools still return results - **IMPLEMENTED**
802
+ 4. Rate limiting is enforced (verify no 429 errors) - **IMPLEMENTED**
803
+ 5. Can run this in Python REPL:
804
 
805
  ```python
806
  import asyncio
807
  from src.tools.pubmed import PubMedTool
 
808
  from src.tools.search_handler import SearchHandler
809
 
810
  async def test():
811
+ handler = SearchHandler([PubMedTool()])
812
  result = await handler.execute("metformin alzheimer")
813
  print(f"Found {result.total_found} results")
814
  for e in result.evidence[:3]:
 
817
  asyncio.run(test())
818
  ```
819
 
820
+ **Note**: WebTool was removed in favor of Europe PMC (Phase 11). The current implementation uses PubMed as the primary Phase 2 tool, with Europe PMC and ClinicalTrials added in later phases.
821
+
822
  **Proceed to Phase 3 ONLY after all checkboxes are complete.**
examples/rate_limiting_demo.py CHANGED
@@ -22,7 +22,7 @@ async def test_basic_limiter():
22
  for i in range(6):
23
  await limiter.acquire()
24
  elapsed = time.monotonic() - start
25
- print(f" Request {i+1} at {elapsed:.2f}s")
26
 
27
  total = time.monotonic() - start
28
  print(f" Total time for 6 requests: {total:.2f}s (expected ~2s)")
 
22
  for i in range(6):
23
  await limiter.acquire()
24
  elapsed = time.monotonic() - start
25
+ print(f" Request {i + 1} at {elapsed:.2f}s")
26
 
27
  total = time.monotonic() - start
28
  print(f" Total time for 6 requests: {total:.2f}s (expected ~2s)")
main.py DELETED
@@ -1,6 +0,0 @@
1
- def main():
2
- print("Hello from deepcritical!")
3
-
4
-
5
- if __name__ == "__main__":
6
- main()
 
 
 
 
 
 
 
pyproject.toml CHANGED
@@ -24,8 +24,13 @@ dependencies = [
24
  "tenacity>=8.2", # Retry logic
25
  "structlog>=24.1", # Structured logging
26
  "requests>=2.32.5", # ClinicalTrials.gov (httpx blocked by WAF)
 
27
  "limits>=3.0", # Rate limiting
28
  "duckduckgo-search>=5.0", # Web search
 
 
 
 
29
  ]
30
 
31
  [project.optional-dependencies]
@@ -50,6 +55,7 @@ magentic = [
50
  embeddings = [
51
  "chromadb>=0.4.0",
52
  "sentence-transformers>=2.2.0",
 
53
  ]
54
  modal = [
55
  # Mario's Modal code execution + LlamaIndex RAG
@@ -59,6 +65,7 @@ modal = [
59
  "llama-index-embeddings-openai",
60
  "llama-index-vector-stores-chroma",
61
  "chromadb>=0.4.0",
 
62
  ]
63
 
64
  [build-system]
@@ -72,7 +79,13 @@ packages = ["src"]
72
  [tool.ruff]
73
  line-length = 100
74
  target-version = "py311"
75
- src = ["src", "tests"]
 
 
 
 
 
 
76
 
77
  [tool.ruff.lint]
78
  select = [
@@ -93,6 +106,7 @@ ignore = [
93
  "PLW0603", # Global statement (singleton pattern for Modal)
94
  "PLC0415", # Lazy imports for optional dependencies
95
  "E402", # Module level import not at top (needed for pytest.importorskip)
 
96
  "RUF100", # Unused noqa (version differences between local/CI)
97
  ]
98
 
@@ -107,9 +121,12 @@ ignore_missing_imports = true
107
  disallow_untyped_defs = true
108
  warn_return_any = true
109
  warn_unused_ignores = false
 
 
110
  exclude = [
111
  "^reference_repos/",
112
  "^examples/",
 
113
  ]
114
 
115
  # ============== PYTEST CONFIG ==============
@@ -120,11 +137,17 @@ addopts = [
120
  "-v",
121
  "--tb=short",
122
  "--strict-markers",
 
 
123
  ]
124
  markers = [
125
  "unit: Unit tests (mocked)",
126
  "integration: Integration tests (real APIs)",
127
  "slow: Slow tests",
 
 
 
 
128
  ]
129
 
130
  # ============== COVERAGE CONFIG ==============
@@ -139,5 +162,11 @@ exclude_lines = [
139
  "raise NotImplementedError",
140
  ]
141
 
 
 
 
 
 
 
142
  # Note: agent-framework-core is optional for magentic mode (multi-agent orchestration)
143
  # Version pinned to 1.0.0b* to avoid breaking changes. CI skips tests via pytest.importorskip
 
24
  "tenacity>=8.2", # Retry logic
25
  "structlog>=24.1", # Structured logging
26
  "requests>=2.32.5", # ClinicalTrials.gov (httpx blocked by WAF)
27
+ "pydantic-graph>=1.22.0",
28
  "limits>=3.0", # Rate limiting
29
  "duckduckgo-search>=5.0", # Web search
30
+ "llama-index-llms-huggingface>=0.6.1",
31
+ "llama-index-llms-huggingface-api>=0.6.1",
32
+ "llama-index-vector-stores-chroma>=0.5.3",
33
+ "llama-index>=0.14.8",
34
  ]
35
 
36
  [project.optional-dependencies]
 
55
  embeddings = [
56
  "chromadb>=0.4.0",
57
  "sentence-transformers>=2.2.0",
58
+ "numpy<2.0", # chromadb compatibility: uses np.float_ removed in NumPy 2.0
59
  ]
60
  modal = [
61
  # Mario's Modal code execution + LlamaIndex RAG
 
65
  "llama-index-embeddings-openai",
66
  "llama-index-vector-stores-chroma",
67
  "chromadb>=0.4.0",
68
+ "numpy<2.0", # chromadb compatibility: uses np.float_ removed in NumPy 2.0
69
  ]
70
 
71
  [build-system]
 
79
  [tool.ruff]
80
  line-length = 100
81
  target-version = "py311"
82
+ src = ["src"]
83
+ exclude = [
84
+ "tests/",
85
+ "examples/",
86
+ "reference_repos/",
87
+ "folder/",
88
+ ]
89
 
90
  [tool.ruff.lint]
91
  select = [
 
106
  "PLW0603", # Global statement (singleton pattern for Modal)
107
  "PLC0415", # Lazy imports for optional dependencies
108
  "E402", # Module level import not at top (needed for pytest.importorskip)
109
+ "E501", # Line too long (ignore line length violations)
110
  "RUF100", # Unused noqa (version differences between local/CI)
111
  ]
112
 
 
121
  disallow_untyped_defs = true
122
  warn_return_any = true
123
  warn_unused_ignores = false
124
+ explicit_package_bases = true
125
+ mypy_path = "."
126
  exclude = [
127
  "^reference_repos/",
128
  "^examples/",
129
+ "^folder/",
130
  ]
131
 
132
  # ============== PYTEST CONFIG ==============
 
137
  "-v",
138
  "--tb=short",
139
  "--strict-markers",
140
+ "-p",
141
+ "no:logfire",
142
  ]
143
  markers = [
144
  "unit: Unit tests (mocked)",
145
  "integration: Integration tests (real APIs)",
146
  "slow: Slow tests",
147
+ "openai: Tests that require OpenAI API key",
148
+ "huggingface: Tests that require HuggingFace API key or use HuggingFace models",
149
+ "embedding_provider: Tests that require API-based embedding providers (OpenAI, etc.)",
150
+ "local_embeddings: Tests that use local embeddings (sentence-transformers, ChromaDB)",
151
  ]
152
 
153
  # ============== COVERAGE CONFIG ==============
 
162
  "raise NotImplementedError",
163
  ]
164
 
165
+ [dependency-groups]
166
+ dev = [
167
+ "structlog>=25.5.0",
168
+ "ty>=0.0.1a28",
169
+ ]
170
+
171
  # Note: agent-framework-core is optional for magentic mode (multi-agent orchestration)
172
  # Version pinned to 1.0.0b* to avoid breaking changes. CI skips tests via pytest.importorskip
requirements.txt CHANGED
@@ -3,6 +3,7 @@ pydantic>=2.7
 pydantic-settings>=2.2
 pydantic-ai>=0.0.16

+
 # AI Providers
 openai>=1.0.0
 anthropic>=0.18.0
@@ -34,6 +35,7 @@ modal>=0.63.0
 # Optional: LlamaIndex RAG
 llama-index>=0.11.0
 llama-index-llms-openai
+ llama-index-llms-huggingface  # Optional: For HuggingFace LLM support in RAG
 llama-index-embeddings-openai
 llama-index-vector-stores-chroma
 chromadb>=0.4.0
src/agent_factory/agents.py CHANGED
@@ -0,0 +1,339 @@
+ """Agent factory functions for creating research agents.
+
+ Provides factory functions for creating all Pydantic AI agents used in
+ the research workflows, following the pattern from judges.py.
+ """
+
+ from typing import TYPE_CHECKING, Any
+
+ import structlog
+
+ from src.utils.config import settings
+ from src.utils.exceptions import ConfigurationError
+
+ if TYPE_CHECKING:
+     from src.agent_factory.graph_builder import GraphBuilder
+     from src.agents.input_parser import InputParserAgent
+     from src.agents.knowledge_gap import KnowledgeGapAgent
+     from src.agents.long_writer import LongWriterAgent
+     from src.agents.proofreader import ProofreaderAgent
+     from src.agents.thinking import ThinkingAgent
+     from src.agents.tool_selector import ToolSelectorAgent
+     from src.agents.writer import WriterAgent
+     from src.orchestrator.graph_orchestrator import GraphOrchestrator
+     from src.orchestrator.planner_agent import PlannerAgent
+     from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow
+
+ logger = structlog.get_logger()
+
+
+ def create_input_parser_agent(model: Any | None = None) -> "InputParserAgent":
+     """
+     Create input parser agent for query analysis and research mode detection.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured InputParserAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.input_parser import create_input_parser_agent as _create_agent
+
+     try:
+         logger.debug("Creating input parser agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create input parser agent", error=str(e))
+         raise ConfigurationError(f"Failed to create input parser agent: {e}") from e
+
+
+ def create_planner_agent(model: Any | None = None) -> "PlannerAgent":
+     """
+     Create planner agent with web search and crawl tools.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured PlannerAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     # Lazy import to avoid circular dependencies
+     from src.orchestrator.planner_agent import create_planner_agent as _create_planner_agent
+
+     try:
+         logger.debug("Creating planner agent")
+         return _create_planner_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create planner agent", error=str(e))
+         raise ConfigurationError(f"Failed to create planner agent: {e}") from e
+
+
+ def create_knowledge_gap_agent(model: Any | None = None) -> "KnowledgeGapAgent":
+     """
+     Create knowledge gap agent for evaluating research completeness.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured KnowledgeGapAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.knowledge_gap import create_knowledge_gap_agent as _create_agent
+
+     try:
+         logger.debug("Creating knowledge gap agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create knowledge gap agent", error=str(e))
+         raise ConfigurationError(f"Failed to create knowledge gap agent: {e}") from e
+
+
+ def create_tool_selector_agent(model: Any | None = None) -> "ToolSelectorAgent":
+     """
+     Create tool selector agent for choosing tools to address gaps.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured ToolSelectorAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.tool_selector import create_tool_selector_agent as _create_agent
+
+     try:
+         logger.debug("Creating tool selector agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create tool selector agent", error=str(e))
+         raise ConfigurationError(f"Failed to create tool selector agent: {e}") from e
+
+
+ def create_thinking_agent(model: Any | None = None) -> "ThinkingAgent":
+     """
+     Create thinking agent for generating observations.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured ThinkingAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.thinking import create_thinking_agent as _create_agent
+
+     try:
+         logger.debug("Creating thinking agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create thinking agent", error=str(e))
+         raise ConfigurationError(f"Failed to create thinking agent: {e}") from e
+
+
+ def create_writer_agent(model: Any | None = None) -> "WriterAgent":
+     """
+     Create writer agent for generating final reports.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured WriterAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.writer import create_writer_agent as _create_agent
+
+     try:
+         logger.debug("Creating writer agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create writer agent", error=str(e))
+         raise ConfigurationError(f"Failed to create writer agent: {e}") from e
+
+
+ def create_long_writer_agent(model: Any | None = None) -> "LongWriterAgent":
+     """
+     Create long writer agent for iteratively writing report sections.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured LongWriterAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.long_writer import create_long_writer_agent as _create_agent
+
+     try:
+         logger.debug("Creating long writer agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create long writer agent", error=str(e))
+         raise ConfigurationError(f"Failed to create long writer agent: {e}") from e
+
+
+ def create_proofreader_agent(model: Any | None = None) -> "ProofreaderAgent":
+     """
+     Create proofreader agent for finalizing report drafts.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured ProofreaderAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.proofreader import create_proofreader_agent as _create_agent
+
+     try:
+         logger.debug("Creating proofreader agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create proofreader agent", error=str(e))
+         raise ConfigurationError(f"Failed to create proofreader agent: {e}") from e
+
+
+ def create_iterative_flow(
+     max_iterations: int = 5,
+     max_time_minutes: int = 10,
+     verbose: bool = True,
+     use_graph: bool | None = None,
+ ) -> "IterativeResearchFlow":
+     """
+     Create iterative research flow.
+
+     Args:
+         max_iterations: Maximum number of iterations
+         max_time_minutes: Maximum time in minutes
+         verbose: Whether to log progress
+         use_graph: Whether to use graph execution. If None, reads from settings.use_graph_execution
+
+     Returns:
+         Configured IterativeResearchFlow instance
+     """
+     from src.orchestrator.research_flow import IterativeResearchFlow
+
+     try:
+         # Use settings default if not explicitly provided
+         if use_graph is None:
+             use_graph = settings.use_graph_execution
+
+         logger.debug("Creating iterative research flow", use_graph=use_graph)
+         return IterativeResearchFlow(
+             max_iterations=max_iterations,
+             max_time_minutes=max_time_minutes,
+             verbose=verbose,
+             use_graph=use_graph,
+         )
+     except Exception as e:
+         logger.error("Failed to create iterative flow", error=str(e))
+         raise ConfigurationError(f"Failed to create iterative flow: {e}") from e
+
+
+ def create_deep_flow(
+     max_iterations: int = 5,
+     max_time_minutes: int = 10,
+     verbose: bool = True,
+     use_long_writer: bool = True,
+     use_graph: bool | None = None,
+ ) -> "DeepResearchFlow":
+     """
+     Create deep research flow.
+
+     Args:
+         max_iterations: Maximum iterations per section
+         max_time_minutes: Maximum time per section
+         verbose: Whether to log progress
+         use_long_writer: Whether to use long writer (True) or proofreader (False)
+         use_graph: Whether to use graph execution. If None, reads from settings.use_graph_execution
+
+     Returns:
+         Configured DeepResearchFlow instance
+     """
+     from src.orchestrator.research_flow import DeepResearchFlow
+
+     try:
+         # Use settings default if not explicitly provided
+         if use_graph is None:
+             use_graph = settings.use_graph_execution
+
+         logger.debug("Creating deep research flow", use_graph=use_graph)
+         return DeepResearchFlow(
+             max_iterations=max_iterations,
+             max_time_minutes=max_time_minutes,
+             verbose=verbose,
+             use_long_writer=use_long_writer,
+             use_graph=use_graph,
+         )
+     except Exception as e:
+         logger.error("Failed to create deep flow", error=str(e))
+         raise ConfigurationError(f"Failed to create deep flow: {e}") from e
+
+
+ def create_graph_orchestrator(
+     mode: str = "auto",
+     max_iterations: int = 5,
+     max_time_minutes: int = 10,
+     use_graph: bool = True,
+ ) -> "GraphOrchestrator":
+     """
+     Create graph orchestrator.
+
+     Args:
+         mode: Research mode ("iterative", "deep", or "auto")
+         max_iterations: Maximum iterations per loop
+         max_time_minutes: Maximum time per loop
+         use_graph: Whether to use graph execution (True) or agent chains (False)
+
+     Returns:
+         Configured GraphOrchestrator instance
+     """
+     from src.orchestrator.graph_orchestrator import create_graph_orchestrator as _create
+
+     try:
+         logger.debug("Creating graph orchestrator", mode=mode, use_graph=use_graph)
+         return _create(
+             mode=mode,  # type: ignore[arg-type]
+             max_iterations=max_iterations,
+             max_time_minutes=max_time_minutes,
+             use_graph=use_graph,
+         )
+     except Exception as e:
+         logger.error("Failed to create graph orchestrator", error=str(e))
+         raise ConfigurationError(f"Failed to create graph orchestrator: {e}") from e
+
+
+ def create_graph_builder() -> "GraphBuilder":
+     """
+     Create a graph builder instance.
+
+     Returns:
+         GraphBuilder instance
+     """
+     from src.agent_factory.graph_builder import GraphBuilder
+
+     try:
+         logger.debug("Creating graph builder")
+         return GraphBuilder()
+     except Exception as e:
+         logger.error("Failed to create graph builder", error=str(e))
+         raise ConfigurationError(f"Failed to create graph builder: {e}") from e
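Taken together, these factories give callers a single import surface for every agent and flow. A hedged usage sketch (it assumes provider credentials are configured; the flow's run method is elided because it lives in research_flow.py, not in this diff):

import asyncio

from src.agent_factory.agents import create_deep_flow, create_input_parser_agent

async def main() -> None:
    parser = create_input_parser_agent()
    parsed = await parser.parse("Comprehensive report on GLP-1 agonists")
    # Route to the deep flow when the parser detects a multi-section query.
    if parsed.research_mode == "deep":
        flow = create_deep_flow(max_iterations=3, use_graph=False)
        ...  # invoke the flow's entry point (defined in research_flow.py)

asyncio.run(main())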
src/agent_factory/graph_builder.py ADDED
@@ -0,0 +1,608 @@
+ """Graph builder utilities for constructing research workflow graphs.
+
+ Provides classes and utilities for building graph-based orchestration systems
+ using Pydantic AI agents as nodes.
+ """
+
+ from collections.abc import Callable
+ from typing import TYPE_CHECKING, Any, Literal
+
+ import structlog
+ from pydantic import BaseModel, Field
+
+ if TYPE_CHECKING:
+     from pydantic_ai import Agent
+
+     from src.middleware.state_machine import WorkflowState
+
+ logger = structlog.get_logger()
+
+
+ # ============================================================================
+ # Graph Node Models
+ # ============================================================================
+
+
+ class GraphNode(BaseModel):
+     """Base class for graph nodes."""
+
+     node_id: str = Field(description="Unique identifier for the node")
+     node_type: Literal["agent", "state", "decision", "parallel"] = Field(description="Type of node")
+     description: str = Field(default="", description="Human-readable description of the node")
+
+     model_config = {"frozen": True}
+
+
+ class AgentNode(GraphNode):
+     """Node that executes a Pydantic AI agent."""
+
+     node_type: Literal["agent"] = "agent"
+     agent: Any = Field(description="Pydantic AI agent to execute")
+     input_transformer: Callable[[Any], Any] | None = Field(
+         default=None, description="Transform input before passing to agent"
+     )
+     output_transformer: Callable[[Any], Any] | None = Field(
+         default=None, description="Transform output after agent execution"
+     )
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ class StateNode(GraphNode):
+     """Node that updates or reads workflow state."""
+
+     node_type: Literal["state"] = "state"
+     state_updater: Callable[[Any, Any], Any] = Field(
+         description="Function to update workflow state"
+     )
+     state_reader: Callable[[Any], Any] | None = Field(
+         default=None, description="Function to read state (optional)"
+     )
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ class DecisionNode(GraphNode):
+     """Node that makes routing decisions based on conditions."""
+
+     node_type: Literal["decision"] = "decision"
+     decision_function: Callable[[Any], str] = Field(
+         description="Function that returns next node ID based on input"
+     )
+     options: list[str] = Field(description="List of possible next node IDs", min_length=1)
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ class ParallelNode(GraphNode):
+     """Node that executes multiple nodes in parallel."""
+
+     node_type: Literal["parallel"] = "parallel"
+     parallel_nodes: list[str] = Field(
+         description="List of node IDs to run in parallel", min_length=0
+     )
+     aggregator: Callable[[list[Any]], Any] | None = Field(
+         default=None, description="Function to aggregate parallel results"
+     )
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ # ============================================================================
+ # Graph Edge Models
+ # ============================================================================
+
+
+ class GraphEdge(BaseModel):
+     """Base class for graph edges."""
+
+     from_node: str = Field(description="Source node ID")
+     to_node: str = Field(description="Target node ID")
+     condition: Callable[[Any], bool] | None = Field(
+         default=None, description="Optional condition function"
+     )
+     weight: float = Field(default=1.0, description="Edge weight for routing decisions")
+
+     model_config = {"arbitrary_types_allowed": True}
+
+
+ class SequentialEdge(GraphEdge):
+     """Edge that is always traversed (no condition)."""
+
+     condition: None = None
+
+
+ class ConditionalEdge(GraphEdge):
+     """Edge that is traversed based on a condition."""
+
+     condition: Callable[[Any], bool] = Field(description="Required condition function")
+     condition_description: str = Field(
+         default="", description="Human-readable description of condition"
+     )
+
+
+ class ParallelEdge(GraphEdge):
+     """Edge used for parallel execution branches."""
+
+     condition: None = None
+
+
+ # ============================================================================
+ # Research Graph Class
+ # ============================================================================
+
+
+ class ResearchGraph(BaseModel):
+     """Represents a research workflow graph with nodes and edges."""
+
+     nodes: dict[str, GraphNode] = Field(default_factory=dict, description="All nodes in the graph")
+     edges: dict[str, list[GraphEdge]] = Field(
+         default_factory=dict, description="Edges by source node ID"
+     )
+     entry_node: str = Field(description="Starting node ID")
+     exit_nodes: list[str] = Field(default_factory=list, description="Terminal node IDs")
+
+     model_config = {"arbitrary_types_allowed": True}
+
+     def add_node(self, node: GraphNode) -> None:
+         """Add a node to the graph.
+
+         Args:
+             node: The node to add
+
+         Raises:
+             ValueError: If node ID already exists
+         """
+         if node.node_id in self.nodes:
+             raise ValueError(f"Node {node.node_id} already exists in graph")
+         self.nodes[node.node_id] = node
+         logger.debug("Node added to graph", node_id=node.node_id, type=node.node_type)
+
+     def add_edge(self, edge: GraphEdge) -> None:
+         """Add an edge to the graph.
+
+         Args:
+             edge: The edge to add
+
+         Raises:
+             ValueError: If source or target node doesn't exist
+         """
+         if edge.from_node not in self.nodes:
+             raise ValueError(f"Source node {edge.from_node} not found in graph")
+         if edge.to_node not in self.nodes:
+             raise ValueError(f"Target node {edge.to_node} not found in graph")
+
+         if edge.from_node not in self.edges:
+             self.edges[edge.from_node] = []
+         self.edges[edge.from_node].append(edge)
+         logger.debug(
+             "Edge added to graph",
+             from_node=edge.from_node,
+             to_node=edge.to_node,
+         )
+
+     def get_node(self, node_id: str) -> GraphNode | None:
+         """Get a node by ID.
+
+         Args:
+             node_id: The node ID
+
+         Returns:
+             The node, or None if not found
+         """
+         return self.nodes.get(node_id)
+
+     def get_next_nodes(self, node_id: str, context: Any = None) -> list[tuple[str, GraphEdge]]:
+         """Get all possible next nodes from a given node.
+
+         Args:
+             node_id: The current node ID
+             context: Optional context for evaluating conditions
+
+         Returns:
+             List of (node_id, edge) tuples for valid next nodes
+         """
+         if node_id not in self.edges:
+             return []
+
+         next_nodes = []
+         for edge in self.edges[node_id]:
+             # Evaluate condition if present
+             if edge.condition is None or edge.condition(context):
+                 next_nodes.append((edge.to_node, edge))
+
+         return next_nodes
+
+     def validate_structure(self) -> list[str]:
+         """Validate the graph structure.
+
+         Returns:
+             List of validation error messages (empty if valid)
+         """
+         errors = []
+
+         # Check entry node exists
+         if self.entry_node not in self.nodes:
+             errors.append(f"Entry node {self.entry_node} not found in graph")
+
+         # Check exit nodes exist and at least one is defined
+         if not self.exit_nodes:
+             errors.append("At least one exit node must be defined")
+         for exit_node in self.exit_nodes:
+             if exit_node not in self.nodes:
+                 errors.append(f"Exit node {exit_node} not found in graph")
+
+         # Check all edges reference valid nodes
+         for from_node, edge_list in self.edges.items():
+             if from_node not in self.nodes:
+                 errors.append(f"Edge source node {from_node} not found")
+             for edge in edge_list:
+                 if edge.to_node not in self.nodes:
+                     errors.append(f"Edge target node {edge.to_node} not found")
+
+         # Check all nodes are reachable from entry node (basic check)
+         if self.entry_node in self.nodes:
+             reachable = {self.entry_node}
+             queue = [self.entry_node]
+             while queue:
+                 current = queue.pop(0)
+                 for next_node, _ in self.get_next_nodes(current):
+                     if next_node not in reachable:
+                         reachable.add(next_node)
+                         queue.append(next_node)
+
+             unreachable = set(self.nodes.keys()) - reachable
+             if unreachable:
+                 errors.append(f"Unreachable nodes from entry node: {', '.join(unreachable)}")
+
+         return errors
+
+
+ # ============================================================================
+ # Graph Builder Class
+ # ============================================================================
+
+
+ class GraphBuilder:
+     """Builder for constructing research workflow graphs."""
+
+     def __init__(self) -> None:
+         """Initialize the graph builder."""
+         self.graph = ResearchGraph(entry_node="", exit_nodes=[])
+
+     def add_agent_node(
+         self,
+         node_id: str,
+         agent: "Agent[Any, Any]",
+         description: str = "",
+         input_transformer: Callable[[Any], Any] | None = None,
+         output_transformer: Callable[[Any], Any] | None = None,
+     ) -> "GraphBuilder":
+         """Add an agent node to the graph.
+
+         Args:
+             node_id: Unique identifier for the node
+             agent: Pydantic AI agent to execute
+             description: Human-readable description
+             input_transformer: Optional input transformation function
+             output_transformer: Optional output transformation function
+
+         Returns:
+             Self for method chaining
+         """
+         node = AgentNode(
+             node_id=node_id,
+             agent=agent,
+             description=description,
+             input_transformer=input_transformer,
+             output_transformer=output_transformer,
+         )
+         self.graph.add_node(node)
+         return self
+
+     def add_state_node(
+         self,
+         node_id: str,
+         state_updater: Callable[["WorkflowState", Any], "WorkflowState"],
+         description: str = "",
+         state_reader: Callable[["WorkflowState"], Any] | None = None,
+     ) -> "GraphBuilder":
+         """Add a state node to the graph.
+
+         Args:
+             node_id: Unique identifier for the node
+             state_updater: Function to update workflow state
+             description: Human-readable description
+             state_reader: Optional function to read state
+
+         Returns:
+             Self for method chaining
+         """
+         node = StateNode(
+             node_id=node_id,
+             state_updater=state_updater,
+             description=description,
+             state_reader=state_reader,
+         )
+         self.graph.add_node(node)
+         return self
+
+     def add_decision_node(
+         self,
+         node_id: str,
+         decision_function: Callable[[Any], str],
+         options: list[str],
+         description: str = "",
+     ) -> "GraphBuilder":
+         """Add a decision node to the graph.
+
+         Args:
+             node_id: Unique identifier for the node
+             decision_function: Function that returns next node ID
+             options: List of possible next node IDs
+             description: Human-readable description
+
+         Returns:
+             Self for method chaining
+         """
+         node = DecisionNode(
+             node_id=node_id,
+             decision_function=decision_function,
+             options=options,
+             description=description,
+         )
+         self.graph.add_node(node)
+         return self
+
+     def add_parallel_node(
+         self,
+         node_id: str,
+         parallel_nodes: list[str],
+         description: str = "",
+         aggregator: Callable[[list[Any]], Any] | None = None,
+     ) -> "GraphBuilder":
+         """Add a parallel node to the graph.
+
+         Args:
+             node_id: Unique identifier for the node
+             parallel_nodes: List of node IDs to run in parallel
+             description: Human-readable description
+             aggregator: Optional function to aggregate results
+
+         Returns:
+             Self for method chaining
+         """
+         node = ParallelNode(
+             node_id=node_id,
+             parallel_nodes=parallel_nodes,
+             description=description,
+             aggregator=aggregator,
+         )
+         self.graph.add_node(node)
+         return self
+
+     def connect_nodes(
+         self,
+         from_node: str,
+         to_node: str,
+         condition: Callable[[Any], bool] | None = None,
+         condition_description: str = "",
+     ) -> "GraphBuilder":
+         """Connect two nodes with an edge.
+
+         Args:
+             from_node: Source node ID
+             to_node: Target node ID
+             condition: Optional condition function
+             condition_description: Description of condition (if conditional)
+
+         Returns:
+             Self for method chaining
+         """
+         if condition is None:
+             edge: GraphEdge = SequentialEdge(from_node=from_node, to_node=to_node)
+         else:
+             edge = ConditionalEdge(
+                 from_node=from_node,
+                 to_node=to_node,
+                 condition=condition,
+                 condition_description=condition_description,
+             )
+         self.graph.add_edge(edge)
+         return self
+
+     def set_entry_node(self, node_id: str) -> "GraphBuilder":
+         """Set the entry node for the graph.
+
+         Args:
+             node_id: The entry node ID
+
+         Returns:
+             Self for method chaining
+         """
+         self.graph.entry_node = node_id
+         return self
+
+     def set_exit_nodes(self, node_ids: list[str]) -> "GraphBuilder":
+         """Set the exit nodes for the graph.
+
+         Args:
+             node_ids: List of exit node IDs
+
+         Returns:
+             Self for method chaining
+         """
+         self.graph.exit_nodes = node_ids
+         return self
+
+     def build(self) -> ResearchGraph:
+         """Finalize graph construction and validate.
+
+         Returns:
+             The constructed ResearchGraph
+
+         Raises:
+             ValueError: If graph validation fails
+         """
+         errors = self.graph.validate_structure()
+         if errors:
+             error_msg = "Graph validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
+             logger.error("Graph validation failed", errors=errors)
+             raise ValueError(error_msg)
+
+         logger.info(
+             "Graph built successfully",
+             nodes=len(self.graph.nodes),
+             edges=sum(len(edges) for edges in self.graph.edges.values()),
+             entry_node=self.graph.entry_node,
+             exit_nodes=self.graph.exit_nodes,
+         )
+         return self.graph
+
+
+ # ============================================================================
+ # Factory Functions
+ # ============================================================================
+
+
+ def create_iterative_graph(
+     knowledge_gap_agent: "Agent[Any, Any]",
+     tool_selector_agent: "Agent[Any, Any]",
+     thinking_agent: "Agent[Any, Any]",
+     writer_agent: "Agent[Any, Any]",
+ ) -> ResearchGraph:
+     """Create a graph for iterative research flow.
+
+     Args:
+         knowledge_gap_agent: Agent for evaluating knowledge gaps
+         tool_selector_agent: Agent for selecting tools
+         thinking_agent: Agent for generating observations
+         writer_agent: Agent for writing final report
+
+     Returns:
+         Constructed ResearchGraph for iterative research
+     """
+     builder = GraphBuilder()
+
+     # Add nodes
+     builder.add_agent_node("thinking", thinking_agent, "Generate observations")
+     builder.add_agent_node("knowledge_gap", knowledge_gap_agent, "Evaluate knowledge gaps")
+     builder.add_decision_node(
+         "continue_decision",
+         decision_function=lambda result: "writer"
+         if getattr(result, "research_complete", False)
+         else "tool_selector",
+         options=["tool_selector", "writer"],
+         description="Decide whether to continue research or write report",
+     )
+     builder.add_agent_node("tool_selector", tool_selector_agent, "Select tools to address gap")
+     builder.add_state_node(
+         "execute_tools",
+         state_updater=lambda state, tasks: state,  # Placeholder - actual execution handled separately
+         description="Execute selected tools",
+     )
+     builder.add_agent_node("writer", writer_agent, "Write final report")
+
+     # Add edges
+     builder.connect_nodes("thinking", "knowledge_gap")
+     builder.connect_nodes("knowledge_gap", "continue_decision")
+     builder.connect_nodes("continue_decision", "tool_selector")
+     builder.connect_nodes("continue_decision", "writer")
+     builder.connect_nodes("tool_selector", "execute_tools")
+     builder.connect_nodes("execute_tools", "thinking")  # Loop back
+
+     # Set entry and exit
+     builder.set_entry_node("thinking")
+     builder.set_exit_nodes(["writer"])
+
+     return builder.build()
+
+
+ def create_deep_graph(
+     planner_agent: "Agent[Any, Any]",
+     knowledge_gap_agent: "Agent[Any, Any]",
+     tool_selector_agent: "Agent[Any, Any]",
+     thinking_agent: "Agent[Any, Any]",
+     writer_agent: "Agent[Any, Any]",
+     long_writer_agent: "Agent[Any, Any]",
+ ) -> ResearchGraph:
+     """Create a graph for deep research flow.
+
+     The graph structure: planner → store_plan → parallel_loops → collect_drafts → synthesizer
+
+     Args:
+         planner_agent: Agent for creating report plan
+         knowledge_gap_agent: Agent for evaluating knowledge gaps (not used directly, but needed for iterative flows)
+         tool_selector_agent: Agent for selecting tools (not used directly, but needed for iterative flows)
+         thinking_agent: Agent for generating observations (not used directly, but needed for iterative flows)
+         writer_agent: Agent for writing section reports (not used directly, but needed for iterative flows)
+         long_writer_agent: Agent for synthesizing final report
+
+     Returns:
+         Constructed ResearchGraph for deep research
+     """
+     from src.utils.models import ReportPlan
+
+     builder = GraphBuilder()
+
+     # Add nodes
+     # 1. Planner agent - creates report plan
+     builder.add_agent_node("planner", planner_agent, "Create report plan with sections")
+
+     # 2. State node - store report plan in workflow state
+     def store_plan(state: "WorkflowState", plan: ReportPlan) -> "WorkflowState":
+         """Store report plan in state for parallel loops to access."""
+         # Store plan in a custom attribute (we'll need to extend WorkflowState or use a dict)
+         # For now, we'll store it in the context's node_results
+         # The actual storage will happen in the graph execution
+         return state
+
+     builder.add_state_node(
+         "store_plan",
+         state_updater=store_plan,
+         description="Store report plan in state",
+     )
+
+     # 3. Parallel node - will execute iterative research flows for each section
+     # The actual execution will be handled dynamically in _execute_parallel_node()
+     # We use a special node ID that the executor will recognize
+     builder.add_parallel_node(
+         "parallel_loops",
+         parallel_nodes=[],  # Will be populated dynamically based on report plan
+         description="Execute parallel iterative research loops for each section",
+         aggregator=lambda results: results,  # Collect all section drafts
+     )
+
+     # 4. State node - collect section drafts into ReportDraft
+     def collect_drafts(state: "WorkflowState", section_drafts: list[str]) -> "WorkflowState":
+         """Collect section drafts into state for synthesizer."""
+         # Store drafts in state (will be accessed by synthesizer)
+         return state
+
+     builder.add_state_node(
+         "collect_drafts",
+         state_updater=collect_drafts,
+         description="Collect section drafts for synthesis",
+     )
+
+     # 5. Synthesizer agent - creates final report from drafts
+     builder.add_agent_node(
+         "synthesizer", long_writer_agent, "Synthesize final report from section drafts"
+     )
+
+     # Add edges
+     builder.connect_nodes("planner", "store_plan")
+     builder.connect_nodes("store_plan", "parallel_loops")
+     builder.connect_nodes("parallel_loops", "collect_drafts")
+     builder.connect_nodes("collect_drafts", "synthesizer")
+
+     # Set entry and exit
+     builder.set_entry_node("planner")
+     builder.set_exit_nodes(["synthesizer"])
+
+     return builder.build()
+
+
+ # No need to rebuild models since we're using Any types
+ # The models will work correctly with arbitrary_types_allowed=True
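The builder API above composes with any Pydantic AI agent. A small self-contained sketch, using pydantic-ai's TestModel as a stand-in so no API keys are needed (an illustrative assumption; real agents would come from the factories in agents.py):

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

from src.agent_factory.graph_builder import GraphBuilder

thinking_agent = Agent(TestModel())  # stand-in agents; no real LLM calls
writer_agent = Agent(TestModel())

builder = GraphBuilder()
builder.add_agent_node("thinking", thinking_agent, "Generate observations")
builder.add_agent_node("writer", writer_agent, "Write final report")
builder.add_decision_node(
    "route",
    decision_function=lambda r: "writer" if getattr(r, "research_complete", False) else "thinking",
    options=["thinking", "writer"],
)
builder.connect_nodes("thinking", "route")
builder.connect_nodes("route", "thinking")  # loop back while gaps remain
builder.connect_nodes("route", "writer")
builder.set_entry_node("thinking")
builder.set_exit_nodes(["writer"])

graph = builder.build()  # raises ValueError listing any missing/unreachable nodes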
src/agent_factory/judges.py CHANGED
@@ -9,7 +9,7 @@ from huggingface_hub import InferenceClient
 from pydantic_ai import Agent
 from pydantic_ai.models.anthropic import AnthropicModel
 from pydantic_ai.models.huggingface import HuggingFaceModel
- from pydantic_ai.models.openai import OpenAIModel
+ from pydantic_ai.models.openai import OpenAIChatModel as OpenAIModel
 from pydantic_ai.providers.anthropic import AnthropicProvider
 from pydantic_ai.providers.huggingface import HuggingFaceProvider
 from pydantic_ai.providers.openai import OpenAIProvider
@@ -40,15 +40,21 @@ def get_model() -> Any:

     if llm_provider == "huggingface":
         # Free tier - uses HF_TOKEN from environment if available
-         model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
+         model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
         hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
         return HuggingFaceModel(model_name, provider=hf_provider)

-     if llm_provider != "openai":
-         logger.warning("Unknown LLM provider, defaulting to OpenAI", provider=llm_provider)
-     openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
-     return OpenAIModel(settings.openai_model, provider=openai_provider)
+     if llm_provider == "openai":
+         openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
+         return OpenAIModel(settings.openai_model, provider=openai_provider)
+
+     # Default to HuggingFace if provider is unknown or not specified
+     if llm_provider != "huggingface":
+         logger.warning("Unknown LLM provider, defaulting to HuggingFace", provider=llm_provider)
+
+     model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
+     hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
+     return HuggingFaceModel(model_name, provider=hf_provider)


 class JudgeHandler:
@@ -359,6 +365,15 @@ IMPORTANT: Respond with ONLY valid JSON matching this schema:
     )


+ def create_judge_handler() -> JudgeHandler:
+     """Create a judge handler based on configuration.
+
+     Returns:
+         Configured JudgeHandler instance
+     """
+     return JudgeHandler()
+
+
 class MockJudgeHandler:
     """
     Mock JudgeHandler for demo mode without LLM calls.
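With this change the provider fallback inverts: HuggingFace is now the default and OpenAI is opt-in. A short sketch of the intended call sites (hedged; the actual model returned depends on settings.llm_provider and available keys):

from src.agent_factory.judges import create_judge_handler, get_model

model = get_model()  # HuggingFaceModel unless llm_provider == "openai"
handler = create_judge_handler()  # new convenience factory, equivalent to JudgeHandler()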
src/agents/code_executor_agent.py CHANGED
@@ -1,13 +1,13 @@
 """Code execution agent using Modal."""

 import asyncio
+ from typing import Any

 import structlog
 from agent_framework import ChatAgent, ai_function
- from agent_framework.openai import OpenAIChatClient

 from src.tools.code_execution import get_code_executor
- from src.utils.config import settings
+ from src.utils.llm_factory import get_chat_client_for_agent

 logger = structlog.get_logger()

@@ -40,19 +40,17 @@ async def execute_python_code(code: str) -> str:
         return f"Execution failed: {e}"


- def create_code_executor_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+ def create_code_executor_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a code executor agent.

     Args:
-         chat_client: Optional custom chat client.
+         chat_client: Optional custom chat client. If None, uses factory default
+             (HuggingFace preferred, OpenAI fallback).

     Returns:
         ChatAgent configured for code execution.
     """
-     client = chat_client or OpenAIChatClient(
-         model_id=settings.openai_model,
-         api_key=settings.openai_api_key,
-     )
+     client = chat_client or get_chat_client_for_agent()

     return ChatAgent(
         name="CodeExecutorAgent",
src/agents/input_parser.py ADDED
@@ -0,0 +1,178 @@
+ """Input parser agent for analyzing and improving user queries.
+
+ Determines research mode (iterative vs deep) and extracts key information
+ from user queries to improve research quality.
+ """
+
+ from typing import TYPE_CHECKING, Any, Literal
+
+ import structlog
+ from pydantic_ai import Agent
+
+ from src.agent_factory.judges import get_model
+ from src.utils.exceptions import ConfigurationError, JudgeError
+ from src.utils.models import ParsedQuery
+
+ if TYPE_CHECKING:
+     pass
+
+ logger = structlog.get_logger()
+
+ # System prompt for the input parser agent
+ SYSTEM_PROMPT = """
+ You are an expert research query analyzer. Your job is to analyze user queries and determine:
+ 1. Whether the query requires iterative research (single focused question) or deep research (multiple sections/topics)
+ 2. Improve and refine the query for better research results
+ 3. Extract key entities (drugs, diseases, targets, companies, etc.)
+ 4. Extract specific research questions
+
+ Guidelines for determining research mode:
+ - **Iterative mode**: Single focused question, straightforward research goal, can be answered with a focused search loop
+   Examples: "What is the mechanism of metformin?", "Find clinical trials for drug X"
+
+ - **Deep mode**: Complex query requiring multiple sections, comprehensive report, multiple related topics
+   Examples: "Write a comprehensive report on diabetes treatment", "Analyze the market for quantum computing"
+   Indicators: words like "comprehensive", "report", "sections", "analyze", "market analysis", "overview"
+
+ Your output must be valid JSON matching the ParsedQuery schema. Always provide:
+ - original_query: The exact input query
+ - improved_query: A refined, clearer version of the query
+ - research_mode: Either "iterative" or "deep"
+ - key_entities: List of important entities (drugs, diseases, companies, etc.)
+ - research_questions: List of specific questions to answer
+
+ Only output JSON. Do not output anything else.
+ """
+
+
+ class InputParserAgent:
+     """
+     Input parser agent that analyzes queries and determines research mode.
+
+     Uses Pydantic AI to generate structured ParsedQuery output with research
+     mode detection, query improvement, and entity extraction.
+     """
+
+     def __init__(self, model: Any | None = None) -> None:
+         """
+         Initialize the input parser agent.
+
+         Args:
+             model: Optional Pydantic AI model. If None, uses config default.
+         """
+         self.model = model or get_model()
+         self.logger = logger
+
+         # Initialize Pydantic AI Agent
+         self.agent = Agent(
+             model=self.model,
+             output_type=ParsedQuery,
+             system_prompt=SYSTEM_PROMPT,
+             retries=3,
+         )
+
+     async def parse(self, query: str) -> ParsedQuery:
+         """
+         Parse and analyze a user query.
+
+         Args:
+             query: The user's research query
+
+         Returns:
+             ParsedQuery with research mode, improved query, entities, and questions
+
+         Raises:
+             JudgeError: If parsing fails after retries
+             ConfigurationError: If agent configuration is invalid
+         """
+         self.logger.info("Parsing user query", query=query[:100])
+
+         user_message = f"QUERY: {query}"
+
+         try:
+             # Run the agent
+             result = await self.agent.run(user_message)
+             parsed_query = result.output
+
+             # Validate parsed query
+             if not parsed_query.original_query:
+                 self.logger.warning("Parsed query missing original_query", query=query[:100])
+                 raise JudgeError("Parsed query must have original_query")
+
+             if not parsed_query.improved_query:
+                 self.logger.warning("Parsed query missing improved_query", query=query[:100])
+                 # Use original as fallback
+                 parsed_query = ParsedQuery(
+                     original_query=parsed_query.original_query,
+                     improved_query=parsed_query.original_query,
+                     research_mode=parsed_query.research_mode,
+                     key_entities=parsed_query.key_entities,
+                     research_questions=parsed_query.research_questions,
+                 )
+
+             self.logger.info(
+                 "Query parsed successfully",
+                 mode=parsed_query.research_mode,
+                 entities=len(parsed_query.key_entities),
+                 questions=len(parsed_query.research_questions),
+             )
+
+             return parsed_query
+
+         except Exception as e:
+             self.logger.error("Query parsing failed", error=str(e), query=query[:100])
+
+             # Fallback: return basic parsed query with heuristic mode detection
+             if isinstance(e, JudgeError | ConfigurationError):
+                 raise
+
+             # Heuristic fallback
+             query_lower = query.lower()
+             research_mode: Literal["iterative", "deep"] = "iterative"
+             if any(
+                 keyword in query_lower
+                 for keyword in [
+                     "comprehensive",
+                     "report",
+                     "sections",
+                     "analyze",
+                     "analysis",
+                     "overview",
+                     "market",
+                 ]
+             ):
+                 research_mode = "deep"
+
+             return ParsedQuery(
+                 original_query=query,
+                 improved_query=query,
+                 research_mode=research_mode,
+                 key_entities=[],
+                 research_questions=[],
+             )
+
+
+ def create_input_parser_agent(model: Any | None = None) -> InputParserAgent:
+     """
+     Factory function to create an input parser agent.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured InputParserAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     try:
+         # Get model from settings if not provided
+         if model is None:
+             model = get_model()
+
+         # Create and return input parser agent
+         return InputParserAgent(model=model)
+
+     except Exception as e:
+         logger.error("Failed to create input parser agent", error=str(e))
+         raise ConfigurationError(f"Failed to create input parser agent: {e}") from e
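A minimal end-to-end sketch of the parser (assumes model credentials are configured; the query is illustrative):

import asyncio

from src.agents.input_parser import create_input_parser_agent

async def main() -> None:
    parser = create_input_parser_agent()
    parsed = await parser.parse("Write a comprehensive report on CRISPR therapeutics")
    # research_mode drives flow selection; note the heuristic fallback above
    # would also classify this query as "deep" via the "comprehensive" keyword.
    print(parsed.research_mode, parsed.key_entities, parsed.research_questions)

asyncio.run(main())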
src/agents/judge_agent.py CHANGED
@@ -12,7 +12,7 @@ from agent_framework import (
     Role,
 )

- from src.orchestrator import JudgeHandlerProtocol
+ from src.legacy_orchestrator import JudgeHandlerProtocol
 from src.utils.models import Evidence, JudgeAssessment
src/agents/knowledge_gap.py ADDED
@@ -0,0 +1,156 @@
+ """Knowledge gap agent for evaluating research completeness.
+
+ Converts the folder/knowledge_gap_agent.py implementation to use Pydantic AI.
+ """
+
+ from datetime import datetime
+ from typing import Any
+
+ import structlog
+ from pydantic_ai import Agent
+
+ from src.agent_factory.judges import get_model
+ from src.utils.exceptions import ConfigurationError
+ from src.utils.models import KnowledgeGapOutput
+
+ logger = structlog.get_logger()
+
+
+ # System prompt for the knowledge gap agent
+ SYSTEM_PROMPT = f"""
+ You are a Research State Evaluator. Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+ Your job is to critically analyze the current state of a research report,
+ identify what knowledge gaps still exist and determine the best next step to take.
+
+ You will be given:
+ 1. The original user query and any relevant background context to the query
+ 2. A full history of the tasks, actions, findings and thoughts you've made up until this point in the research process
+
+ Your task is to:
+ 1. Carefully review the findings and thoughts, particularly from the latest iteration, and assess their completeness in answering the original query
+ 2. Determine if the findings are sufficiently complete to end the research loop
+ 3. If not, identify up to 3 knowledge gaps that need to be addressed in sequence in order to continue with research - these should be relevant to the original query
+
+ Be specific in the gaps you identify and include relevant information as this will be passed onto another agent to process without additional context.
+
+ Only output JSON. Follow the JSON schema for KnowledgeGapOutput. Do not output anything else.
+ """
+
+
+ class KnowledgeGapAgent:
+     """
+     Agent that evaluates research state and identifies knowledge gaps.
+
+     Uses Pydantic AI to generate structured KnowledgeGapOutput indicating
+     whether research is complete and what gaps remain.
+     """
+
+     def __init__(self, model: Any | None = None) -> None:
+         """
+         Initialize the knowledge gap agent.
+
+         Args:
+             model: Optional Pydantic AI model. If None, uses config default.
+         """
+         self.model = model or get_model()
+         self.logger = logger
+
+         # Initialize Pydantic AI Agent
+         self.agent = Agent(
+             model=self.model,
+             output_type=KnowledgeGapOutput,
+             system_prompt=SYSTEM_PROMPT,
+             retries=3,
+         )
+
+     async def evaluate(
+         self,
+         query: str,
+         background_context: str = "",
+         conversation_history: str = "",
+         iteration: int = 0,
+         time_elapsed_minutes: float = 0.0,
+         max_time_minutes: int = 10,
+     ) -> KnowledgeGapOutput:
+         """
+         Evaluate research state and identify knowledge gaps.
+
+         Args:
+             query: The original research query
+             background_context: Optional background context
+             conversation_history: History of actions, findings, and thoughts
+             iteration: Current iteration number
+             time_elapsed_minutes: Time elapsed so far
+             max_time_minutes: Maximum time allowed
+
+         Returns:
+             KnowledgeGapOutput with research completeness and outstanding gaps
+
+         Raises:
+             JudgeError: If evaluation fails after retries
+         """
+         self.logger.info(
+             "Evaluating knowledge gaps",
+             query=query[:100],
+             iteration=iteration,
+         )
+
+         background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
+
+         user_message = f"""
+ Current Iteration Number: {iteration}
+ Time Elapsed: {time_elapsed_minutes:.2f} minutes of maximum {max_time_minutes} minutes
+
+ ORIGINAL QUERY:
+ {query}
+
+ {background}
+
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
+ {conversation_history or "No previous actions, findings or thoughts available."}
+ """
+
+         try:
+             # Run the agent
+             result = await self.agent.run(user_message)
+             evaluation = result.output
+
+             self.logger.info(
+                 "Knowledge gap evaluation complete",
+                 research_complete=evaluation.research_complete,
+                 gaps_count=len(evaluation.outstanding_gaps),
+             )
+
+             return evaluation
+
+         except Exception as e:
+             self.logger.error("Knowledge gap evaluation failed", error=str(e))
+             # Return fallback: research not complete, suggest continuing
+             return KnowledgeGapOutput(
+                 research_complete=False,
+                 outstanding_gaps=[f"Continue research on: {query}"],
+             )
+
+
+ def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent:
+     """
+     Factory function to create a knowledge gap agent.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured KnowledgeGapAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     try:
+         if model is None:
+             model = get_model()
+
+         return KnowledgeGapAgent(model=model)
+
+     except Exception as e:
+         logger.error("Failed to create knowledge gap agent", error=str(e))
+         raise ConfigurationError(f"Failed to create knowledge gap agent: {e}") from e
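One evaluation step as it might appear inside the iterative loop (a sketch; the history string is fabricated for illustration):

import asyncio

from src.agents.knowledge_gap import create_knowledge_gap_agent

async def main() -> None:
    agent = create_knowledge_gap_agent()
    result = await agent.evaluate(
        query="What is the mechanism of metformin?",
        conversation_history="Iteration 1: searched PubMed; found AMPK activation evidence.",
        iteration=1,
        time_elapsed_minutes=2.5,
        max_time_minutes=10,
    )
    if not result.research_complete:
        print("Outstanding gaps:", result.outstanding_gaps)

asyncio.run(main())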
src/agents/long_writer.py ADDED
@@ -0,0 +1,431 @@
1
+ """Long writer agent for iteratively writing report sections.
2
+
3
+ Converts the folder/long_writer_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ import re
7
+ from datetime import datetime
8
+from typing import Any
+
+import structlog
+from pydantic import BaseModel, Field
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+from src.utils.models import ReportDraft
+
+logger = structlog.get_logger()
+
+
+# LongWriterOutput model for structured output
+class LongWriterOutput(BaseModel):
+    """Output from the long writer agent for a single section."""
+
+    next_section_markdown: str = Field(
+        description="The final draft of the next section in markdown format"
+    )
+    references: list[str] = Field(
+        description="A list of URLs and their corresponding reference numbers for the section"
+    )
+
+    model_config = {"frozen": True}
+
+
+# System prompt for the long writer agent
+SYSTEM_PROMPT = f"""
+You are an expert report writer tasked with iteratively writing each section of a report.
+Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+You will be provided with:
+1. The original research query
+2. A final draft of the report containing the table of contents and all sections written up until this point (in the first iteration there will be no sections written yet)
+3. A first draft of the next section of the report to be written
+
+OBJECTIVE:
+1. Write a final draft of the next section of the report with numbered citations in square brackets in the body of the report
+2. Produce a list of references to be appended to the end of the report
+
+CITATIONS/REFERENCES:
+The citations should be in numerical order, written in numbered square brackets in the body of the report.
+Separately, a list of all URLs and their corresponding reference numbers will be included at the end of the report.
+Follow the example below for formatting.
+
+LongWriterOutput(
+    next_section_markdown="The company specializes in IT consulting [1]. It operates in the software services market which is expected to grow at 10% per year [2].",
+    references=["[1] https://example.com/first-source-url", "[2] https://example.com/second-source-url"]
+)
+
+GUIDELINES:
+- You can reformat and reorganize the flow of the content and headings within a section so that it flows logically, but DO NOT remove details that were included in the first draft
+- Only remove text from the first draft if it is already mentioned earlier in the report, or if it should be covered in a later section per the table of contents
+- Ensure the heading for the section matches the table of contents
+- Format the final output and references section as markdown
+- Do not include a title for the reference section, just a list of numbered references
+
+Only output JSON. Follow the JSON schema for LongWriterOutput. Do not output anything else.
+"""
+
+
+class LongWriterAgent:
+    """
+    Agent that iteratively writes report sections with proper citations.
+
+    Uses Pydantic AI to generate structured LongWriterOutput for each section.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the long writer agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent
+        self.agent = Agent(
+            model=self.model,
+            output_type=LongWriterOutput,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def write_next_section(
+        self,
+        original_query: str,
+        report_draft: str,
+        next_section_title: str,
+        next_section_draft: str,
+    ) -> LongWriterOutput:
+        """
+        Write the next section of the report.
+
+        Args:
+            original_query: The original research query
+            report_draft: Current report draft (all sections written so far)
+            next_section_title: Title of the section to write
+            next_section_draft: Draft content for the next section
+
+        Returns:
+            LongWriterOutput with formatted section and references.
+            Falls back to the unedited draft section if generation fails
+            after all retries.
+        """
+        # Input validation
+        if not original_query or not original_query.strip():
+            self.logger.warning("Empty query provided, using default")
+            original_query = "Research query"
+
+        if not next_section_title or not next_section_title.strip():
+            self.logger.warning("Empty section title provided, using default")
+            next_section_title = "Section"
+
+        if next_section_draft is None:
+            next_section_draft = ""
+
+        if report_draft is None:
+            report_draft = ""
+
+        # Truncate very long inputs
+        max_draft_length = 30000
+        if len(report_draft) > max_draft_length:
+            self.logger.warning(
+                "Report draft too long, truncating",
+                original_length=len(report_draft),
+            )
+            report_draft = report_draft[:max_draft_length] + "\n\n[Content truncated]"
+
+        if len(next_section_draft) > max_draft_length:
+            self.logger.warning(
+                "Section draft too long, truncating",
+                original_length=len(next_section_draft),
+            )
+            next_section_draft = next_section_draft[:max_draft_length] + "\n\n[Content truncated]"
+
+        self.logger.info(
+            "Writing next section",
+            section_title=next_section_title,
+            query=original_query[:100],
+        )
+
+        user_message = f"""
+<ORIGINAL QUERY>
+{original_query}
+</ORIGINAL QUERY>
+
+<CURRENT REPORT DRAFT>
+{report_draft or "No draft yet"}
+</CURRENT REPORT DRAFT>
+
+<TITLE OF NEXT SECTION TO WRITE>
+{next_section_title}
+</TITLE OF NEXT SECTION TO WRITE>
+
+<DRAFT OF NEXT SECTION>
+{next_section_draft}
+</DRAFT OF NEXT SECTION>
+"""
+
+        # Retry logic for transient failures
+        max_retries = 3
+        last_exception: Exception | None = None
+
+        for attempt in range(max_retries):
+            try:
+                # Run the agent
+                result = await self.agent.run(user_message)
+                output = result.output
+
+                # Validate output
+                if not output or not isinstance(output, LongWriterOutput):
+                    raise ValueError("Invalid output format")
+
+                if not output.next_section_markdown or not output.next_section_markdown.strip():
+                    self.logger.warning("Empty section generated, using fallback")
+                    raise ValueError("Empty section generated")
+
+                self.logger.info(
+                    "Section written",
+                    section_title=next_section_title,
+                    references_count=len(output.references),
+                    attempt=attempt + 1,
+                )
+
+                return output
+
+            except (TimeoutError, ConnectionError) as e:
+                # Transient errors - retry
+                last_exception = e
+                if attempt < max_retries - 1:
+                    self.logger.warning(
+                        "Transient error, retrying",
+                        error=str(e),
+                        attempt=attempt + 1,
+                        max_retries=max_retries,
+                    )
+                    continue
+                else:
+                    self.logger.error("Max retries exceeded for transient error", error=str(e))
+                    break
+
+            except Exception as e:
+                # Non-transient errors - don't retry
+                last_exception = e
+                self.logger.error(
+                    "Section writing failed",
+                    error=str(e),
+                    error_type=type(e).__name__,
+                )
+                break
+
+        # Return fallback section if all attempts failed
+        self.logger.error(
+            "Section writing failed after all attempts",
+            error=str(last_exception) if last_exception else "Unknown error",
+        )
+        return LongWriterOutput(
+            next_section_markdown=f"## {next_section_title}\n\n{next_section_draft}",
+            references=[],
+        )
+
+    async def write_report(
+        self,
+        original_query: str,
+        report_title: str,
+        report_draft: ReportDraft,
+    ) -> str:
+        """
+        Write the final report by iteratively writing each section.
+
+        Args:
+            original_query: The original research query
+            report_title: Title of the report
+            report_draft: ReportDraft with all sections
+
+        Returns:
+            Complete markdown report string. Sections that fail generation
+            fall back to their unedited drafts.
+        """
+        # Input validation
+        if not original_query or not original_query.strip():
+            self.logger.warning("Empty query provided, using default")
+            original_query = "Research query"
+
+        if not report_title or not report_title.strip():
+            self.logger.warning("Empty report title provided, using default")
+            report_title = "Research Report"
+
+        if not report_draft or not report_draft.sections:
+            self.logger.warning("Empty report draft provided, returning minimal report")
+            return f"# {report_title}\n\n## Query\n{original_query}\n\n*No sections available.*"
+
+        self.logger.info(
+            "Writing full report",
+            report_title=report_title,
+            sections_count=len(report_draft.sections),
+        )
+
+        # Initialize the final draft with title and table of contents
+        final_draft = (
+            f"# {report_title}\n\n## Table of Contents\n\n"
+            + "\n".join(
+                [
+                    f"{i + 1}. {section.section_title}"
+                    for i, section in enumerate(report_draft.sections)
+                ]
+            )
+            + "\n\n"
+        )
+        all_references: list[str] = []
+
+        for section in report_draft.sections:
+            # Write each section
+            next_section_output = await self.write_next_section(
+                original_query,
+                final_draft,
+                section.section_title,
+                section.section_content,
+            )
+
+            # Reformat references and update section markdown
+            section_markdown, all_references = self._reformat_references(
+                next_section_output.next_section_markdown,
+                next_section_output.references,
+                all_references,
+            )
+
+            # Reformat section headings
+            section_markdown = self._reformat_section_headings(section_markdown)
+
+            # Add to final draft
+            final_draft += section_markdown + "\n\n"
+
+        # Add final references
+        final_draft += "## References:\n\n" + "  \n".join(all_references)
+
+        self.logger.info("Full report written", length=len(final_draft))
+
+        return final_draft
+
+    def _reformat_references(
+        self,
+        section_markdown: str,
+        section_references: list[str],
+        all_references: list[str],
+    ) -> tuple[str, list[str]]:
+        """
+        Reformat references: re-number, de-duplicate, and update markdown.
+
+        Args:
+            section_markdown: Markdown content with inline references [1], [2]
+            section_references: List of references for this section
+            all_references: Accumulated references from previous sections
+
+        Returns:
+            Tuple of (updated markdown, updated all_references)
+        """
+
+        # Convert reference lists to maps (URL -> ref_num)
+        def convert_ref_list_to_map(ref_list: list[str]) -> dict[str, int]:
+            ref_map: dict[str, int] = {}
+            for ref in ref_list:
+                try:
+                    # Parse "[1] https://example.com" format
+                    parts = ref.split("]", 1)
+                    if len(parts) == 2:
+                        ref_num = int(parts[0].strip("["))
+                        url = parts[1].strip()
+                        ref_map[url] = ref_num
+                except (ValueError, IndexError):
+                    logger.warning("Invalid reference format", ref=ref)
+                    continue
+            return ref_map
+
+        section_ref_map = convert_ref_list_to_map(section_references)
+        report_ref_map = convert_ref_list_to_map(all_references)
+        section_to_report_ref_map: dict[int, int] = {}
+
+        report_urls = set(report_ref_map.keys())
+        ref_count = max(report_ref_map.values() or [0])
+
+        # Map section references to report references
+        for url, section_ref_num in section_ref_map.items():
+            if url in report_urls:
+                # URL already exists - reuse its reference number
+                section_to_report_ref_map[section_ref_num] = report_ref_map[url]
+            else:
+                # New URL - assign next reference number
+                ref_count += 1
+                section_to_report_ref_map[section_ref_num] = ref_count
+                all_references.append(f"[{ref_count}] {url}")
+
+        # Replace reference numbers in markdown
+        def replace_reference(match: re.Match[str]) -> str:
+            ref_num = int(match.group(1))
+            mapped_ref_num = section_to_report_ref_map.get(ref_num)
+            if mapped_ref_num:
+                return f"[{mapped_ref_num}]"
+            return ""
+
+        updated_markdown = re.sub(r"\[(\d+)\]", replace_reference, section_markdown)
+
+        return updated_markdown, all_references
+
+    def _reformat_section_headings(self, section_markdown: str) -> str:
+        """
+        Reformat section headings to be consistent (level-2 for main heading).
+
+        Args:
+            section_markdown: Markdown content with headings
+
+        Returns:
+            Updated markdown with adjusted heading levels
+        """
+        if not section_markdown.strip():
+            return section_markdown
+
+        # Find first heading level
+        first_heading_match = re.search(r"^(#+)\s", section_markdown, re.MULTILINE)
+        if not first_heading_match:
+            return section_markdown
+
+        # Calculate level adjustment needed (target is level 2)
+        first_heading_level = len(first_heading_match.group(1))
+        level_adjustment = 2 - first_heading_level
+
+        def adjust_heading_level(match: re.Match[str]) -> str:
+            hashes = match.group(1)
+            content = match.group(2)
+            new_level = max(2, len(hashes) + level_adjustment)
+            return "#" * new_level + " " + content
+
+        # Apply heading adjustment
+        return re.sub(r"^(#+)\s(.+)$", adjust_heading_level, section_markdown, flags=re.MULTILINE)
+
+
+def create_long_writer_agent(model: Any | None = None) -> LongWriterAgent:
+    """
+    Factory function to create a long writer agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured LongWriterAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return LongWriterAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create long writer agent", error=str(e))
+        raise ConfigurationError(f"Failed to create long writer agent: {e}") from e
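
A minimal usage sketch for the new module, assuming ReportDraft exposes a `sections` list whose items carry `section_title` and `section_content` (the fields `write_report` reads above) and that `get_model()` can resolve credentials from the environment; the dict-based construction avoids guessing the section model's class name:

import asyncio

from src.agents.long_writer import create_long_writer_agent
from src.utils.models import ReportDraft

async def main() -> None:
    agent = create_long_writer_agent()
    draft = ReportDraft.model_validate(
        {
            "sections": [
                {"section_title": "Background", "section_content": "..."},
                {"section_title": "Findings", "section_content": "..."},
            ]
        }
    )
    # Sections are rewritten one at a time; _reformat_references renumbers
    # and de-duplicates citations across the accumulated report.
    report = await agent.write_report(
        original_query="What is the evidence for drug X in disease Y?",
        report_title="Drug X in Disease Y",
        report_draft=draft,
    )
    print(report)

asyncio.run(main())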
src/agents/magentic_agents.py CHANGED
@@ -1,7 +1,8 @@
 """Magentic-compatible agents using ChatAgent pattern."""
 
+from typing import Any
+
 from agent_framework import ChatAgent
-from agent_framework.openai import OpenAIChatClient
 
 from src.agents.tools import (
     get_bibliography,
@@ -9,22 +10,20 @@ from src.agents.tools import (
     search_preprints,
     search_pubmed,
 )
-from src.utils.config import settings
+from src.utils.llm_factory import get_chat_client_for_agent
 
 
-def create_search_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_search_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a search agent with internal LLM and search tools.
 
     Args:
-        chat_client: Optional custom chat client. If None, uses default.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for biomedical search
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,  # Use configured model
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="SearchAgent",
@@ -50,19 +49,17 @@ Focus on finding: mechanisms of action, clinical evidence, and specific drug can
     )
 
 
-def create_judge_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_judge_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a judge agent that evaluates evidence quality.
 
     Args:
-        chat_client: Optional custom chat client. If None, uses default.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for evidence assessment
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="JudgeAgent",
@@ -89,19 +86,17 @@ Be rigorous but fair. Look for:
     )
 
 
-def create_hypothesis_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_hypothesis_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a hypothesis generation agent.
 
     Args:
-        chat_client: Optional custom chat client. If None, uses default.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for hypothesis generation
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="HypothesisAgent",
@@ -126,19 +121,17 @@ Focus on mechanistic plausibility and existing evidence.""",
     )
 
 
-def create_report_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_report_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a report synthesis agent.
 
     Args:
-        chat_client: Optional custom chat client. If None, uses default.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for report generation
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="ReportAgent",
src/agents/proofreader.py ADDED
@@ -0,0 +1,205 @@
+"""Proofreader agent for finalizing report drafts.
+
+Converts the folder/proofreader_agent.py implementation to use Pydantic AI.
+"""
+
+from datetime import datetime
+from typing import Any
+
+import structlog
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+from src.utils.models import ReportDraft
+
+logger = structlog.get_logger()
+
+
+# System prompt for the proofreader agent
+SYSTEM_PROMPT = f"""
+You are a research expert who proofreads and edits research reports.
+Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+
+You are given:
+1. The original query topic for the report
+2. A first draft of the report in ReportDraft format containing each section in sequence
+
+Your task is to:
+1. **Combine sections:** Concatenate the sections into a single string
+2. **Add section titles:** Add the section titles to the beginning of each section in markdown format, as well as a main title for the report
+3. **De-duplicate:** Remove duplicate content across sections to avoid repetition
+4. **Remove irrelevant sections:** If any sections or sub-sections are completely irrelevant to the query, remove them
+5. **Refine wording:** Edit the wording of the report to be polished, concise and punchy, but **without eliminating any detail** or large chunks of text
+6. **Add a summary:** Add a short report summary / outline to the beginning of the report to provide an overview of the sections and what is discussed
+7. **Preserve sources:** Preserve all sources / references - move the long list of references to the end of the report
+8. **Update reference numbers:** Continue to include reference numbers in square brackets ([1], [2], [3], etc.) in the main body of the report, but update the numbering to match the new order of references at the end of the report
+9. **Output final report:** Output the final report in markdown format (do not wrap it in a code block)
+
+Guidelines:
+- Do not add any new facts or data to the report
+- Do not remove any content from the report unless it is very clearly wrong, contradictory or irrelevant
+- Remove or reformat any redundant or excessive headings, and ensure that the final nesting of heading levels is correct
+- Ensure that the final report flows well and has a logical structure
+- Include all sources and references that are present in the final report
+"""
+
+
+class ProofreaderAgent:
+    """
+    Agent that proofreads and finalizes report drafts.
+
+    Uses Pydantic AI to generate polished markdown reports from draft sections.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the proofreader agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent (no structured output - returns markdown text)
+        self.agent = Agent(
+            model=self.model,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def proofread(
+        self,
+        query: str,
+        report_draft: ReportDraft,
+    ) -> str:
+        """
+        Proofread and finalize a report draft.
+
+        Args:
+            query: The original research query
+            report_draft: ReportDraft with all sections
+
+        Returns:
+            Final polished markdown report string. Falls back to a simple
+            concatenation of the draft sections if proofreading fails after
+            all retries.
+        """
+        # Input validation
+        if not query or not query.strip():
+            self.logger.warning("Empty query provided, using default")
+            query = "Research query"
+
+        if not report_draft or not report_draft.sections:
+            self.logger.warning("Empty report draft provided, returning minimal report")
+            return f"# Research Report\n\n## Query\n{query}\n\n*No sections available.*"
+
+        # Validate section structure
+        valid_sections = []
+        for section in report_draft.sections:
+            if section.section_title and section.section_title.strip():
+                valid_sections.append(section)
+            else:
+                self.logger.warning("Skipping section with empty title")
+
+        if not valid_sections:
+            self.logger.warning("No valid sections in draft, returning minimal report")
+            return f"# Research Report\n\n## Query\n{query}\n\n*No valid sections available.*"
+
+        self.logger.info(
+            "Proofreading report",
+            query=query[:100],
+            sections_count=len(valid_sections),
+        )
+
+        # Create validated draft
+        validated_draft = ReportDraft(sections=valid_sections)
+
+        user_message = f"""
+QUERY:
+{query}
+
+REPORT DRAFT:
+{validated_draft.model_dump_json()}
+"""
+
+        # Retry logic for transient failures
+        max_retries = 3
+        last_exception: Exception | None = None
+
+        for attempt in range(max_retries):
+            try:
+                # Run the agent
+                result = await self.agent.run(user_message)
+                final_report = result.output
+
+                # Validate output
+                if not final_report or not final_report.strip():
+                    self.logger.warning("Empty report generated, using fallback")
+                    raise ValueError("Empty report generated")
+
+                self.logger.info("Report proofread", length=len(final_report), attempt=attempt + 1)
+
+                return final_report
+
+            except (TimeoutError, ConnectionError) as e:
+                # Transient errors - retry
+                last_exception = e
+                if attempt < max_retries - 1:
+                    self.logger.warning(
+                        "Transient error, retrying",
+                        error=str(e),
+                        attempt=attempt + 1,
+                        max_retries=max_retries,
+                    )
+                    continue
+                else:
+                    self.logger.error("Max retries exceeded for transient error", error=str(e))
+                    break
+
+            except Exception as e:
+                # Non-transient errors - don't retry
+                last_exception = e
+                self.logger.error(
+                    "Proofreading failed",
+                    error=str(e),
+                    error_type=type(e).__name__,
+                )
+                break
+
+        # Return fallback: combine sections manually
+        self.logger.error(
+            "Proofreading failed after all attempts",
+            error=str(last_exception) if last_exception else "Unknown error",
+        )
+        sections = [
+            f"## {section.section_title}\n\n{section.section_content or 'Content unavailable.'}"
+            for section in valid_sections
+        ]
+        return f"# Research Report\n\n## Query\n{query}\n\n" + "\n\n".join(sections)
+
+
+def create_proofreader_agent(model: Any | None = None) -> ProofreaderAgent:
+    """
+    Factory function to create a proofreader agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured ProofreaderAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return ProofreaderAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create proofreader agent", error=str(e))
+        raise ConfigurationError(f"Failed to create proofreader agent: {e}") from e
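
Usage sketch (inside an async function; same ReportDraft assumptions as above):

agent = create_proofreader_agent()
final_markdown = await agent.proofread(query="...", report_draft=draft)

Because the agent is built without an `output_type`, the model's raw markdown text is returned directly rather than a structured object.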
src/agents/retrieval_agent.py CHANGED
@@ -1,12 +1,13 @@
 """Retrieval agent for web search and context management."""
 
+from typing import Any
+
 import structlog
 from agent_framework import ChatAgent, ai_function
-from agent_framework.openai import OpenAIChatClient
 
-from src.state import get_magentic_state
+from src.agents.state import get_magentic_state
 from src.tools.web_search import WebSearchTool
-from src.utils.config import settings
+from src.utils.llm_factory import get_chat_client_for_agent
 
 logger = structlog.get_logger()
 
@@ -56,19 +57,17 @@ async def search_web(query: str, max_results: int = 10) -> str:
     return "\n".join(output)
 
 
-def create_retrieval_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_retrieval_agent(chat_client: Any | None = None) -> ChatAgent:
     """Create a retrieval agent.
 
     Args:
-        chat_client: Optional custom chat client.
+        chat_client: Optional custom chat client. If None, uses factory default
+            (HuggingFace preferred, OpenAI fallback).
 
     Returns:
         ChatAgent configured for retrieval.
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client_for_agent()
 
     return ChatAgent(
         name="RetrievalAgent",
src/agents/search_agent.py CHANGED
@@ -10,7 +10,7 @@ from agent_framework import (
     Role,
 )
 
-from src.orchestrator import SearchHandlerProtocol
+from src.legacy_orchestrator import SearchHandlerProtocol
 from src.utils.models import Citation, Evidence, SearchResult
 
 if TYPE_CHECKING:
@@ -1,9 +1,11 @@
1
  """Thread-safe state management for Magentic agents.
2
 
3
- Uses contextvars to ensure isolation between concurrent requests (e.g., multiple users
4
- searching simultaneously via Gradio).
 
5
  """
6
 
 
7
  from contextvars import ContextVar
8
  from typing import TYPE_CHECKING, Any
9
 
@@ -15,8 +17,20 @@ if TYPE_CHECKING:
15
  from src.services.embeddings import EmbeddingService
16
 
17
 
 
 
 
 
 
 
 
 
 
18
  class MagenticState(BaseModel):
19
- """Mutable state for a Magentic workflow session."""
 
 
 
20
 
21
  evidence: list[Evidence] = Field(default_factory=list)
22
  # Type as Any to avoid circular imports/runtime resolution issues
@@ -75,14 +89,22 @@ _magentic_state_var: ContextVar[MagenticState | None] = ContextVar("magentic_sta
75
 
76
 
77
  def init_magentic_state(embedding_service: "EmbeddingService | None" = None) -> MagenticState:
78
- """Initialize a new state for the current context."""
 
 
 
 
79
  state = MagenticState(embedding_service=embedding_service)
80
  _magentic_state_var.set(state)
81
  return state
82
 
83
 
84
  def get_magentic_state() -> MagenticState:
85
- """Get the current state. Raises RuntimeError if not initialized."""
 
 
 
 
86
  state = _magentic_state_var.get()
87
  if state is None:
88
  # Auto-initialize if missing (e.g. during tests or simple scripts)
 
1
  """Thread-safe state management for Magentic agents.
2
 
3
+ DEPRECATED: This module is deprecated. Use src.middleware.state_machine instead.
4
+
5
+ This file is kept for backward compatibility and will be removed in a future version.
6
  """
7
 
8
+ import warnings
9
  from contextvars import ContextVar
10
  from typing import TYPE_CHECKING, Any
11
 
 
17
  from src.services.embeddings import EmbeddingService
18
 
19
 
20
+ def _deprecation_warning() -> None:
21
+ """Emit deprecation warning for this module."""
22
+ warnings.warn(
23
+ "src.agents.state is deprecated. Use src.middleware.state_machine instead.",
24
+ DeprecationWarning,
25
+ stacklevel=3,
26
+ )
27
+
28
+
29
  class MagenticState(BaseModel):
30
+ """Mutable state for a Magentic workflow session.
31
+
32
+ DEPRECATED: Use WorkflowState from src.middleware.state_machine instead.
33
+ """
34
 
35
  evidence: list[Evidence] = Field(default_factory=list)
36
  # Type as Any to avoid circular imports/runtime resolution issues
 
89
 
90
 
91
  def init_magentic_state(embedding_service: "EmbeddingService | None" = None) -> MagenticState:
92
+ """Initialize a new state for the current context.
93
+
94
+ DEPRECATED: Use init_workflow_state from src.middleware.state_machine instead.
95
+ """
96
+ _deprecation_warning()
97
  state = MagenticState(embedding_service=embedding_service)
98
  _magentic_state_var.set(state)
99
  return state
100
 
101
 
102
  def get_magentic_state() -> MagenticState:
103
+ """Get the current state. Raises RuntimeError if not initialized.
104
+
105
+ DEPRECATED: Use get_workflow_state from src.middleware.state_machine instead.
106
+ """
107
+ _deprecation_warning()
108
  state = _magentic_state_var.get()
109
  if state is None:
110
  # Auto-initialize if missing (e.g. during tests or simple scripts)
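
Callers migrating off this module can make the deprecation visible in tests; a sketch:

import warnings

from src.agents.state import init_magentic_state

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    init_magentic_state()
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)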
src/agents/thinking.py ADDED
@@ -0,0 +1,148 @@
+"""Thinking agent for generating observations and reflections.
+
+Converts the folder/thinking_agent.py implementation to use Pydantic AI.
+"""
+
+from datetime import datetime
+from typing import Any
+
+import structlog
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+
+logger = structlog.get_logger()
+
+
+# System prompt for the thinking agent
+SYSTEM_PROMPT = f"""
+You are a research expert who is managing a research process in iterations. Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+
+You are given:
+1. The original research query along with some supporting background context
+2. A history of the tasks, actions, findings and thoughts you've made up until this point in the research process (on iteration 1 you will be at the start of the research process, so this will be empty)
+
+Your objective is to reflect on the research process so far and share your latest thoughts.
+
+Specifically, your thoughts should include reflections on questions such as:
+- What have you learned from the last iteration?
+- What new areas would you like to explore next, or existing topics you'd like to go deeper into?
+- Were you able to retrieve the information you were looking for in the last iteration?
+- If not, should we change our approach or move to the next topic?
+- Is there any info that is contradictory or conflicting?
+
+Guidelines:
+- Share your stream of consciousness on the above questions as raw text
+- Keep your response concise and informal
+- Focus most of your thoughts on the most recent iteration and how that influences this next iteration
+- Our aim is to do very deep and thorough research - bear this in mind when reflecting on the research process
+- DO NOT produce a draft of the final report. This is not your job.
+- If this is the first iteration (i.e. no data from prior iterations), provide thoughts on what info we need to gather in the first iteration to get started
+"""
+
+
+class ThinkingAgent:
+    """
+    Agent that generates observations and reflections on the research process.
+
+    Uses Pydantic AI to generate unstructured text observations about
+    the current state of research and next steps.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the thinking agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent (no structured output - returns text)
+        self.agent = Agent(
+            model=self.model,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def generate_observations(
+        self,
+        query: str,
+        background_context: str = "",
+        conversation_history: str = "",
+        iteration: int = 1,
+    ) -> str:
+        """
+        Generate observations about the research process.
+
+        Args:
+            query: The original research query
+            background_context: Optional background context
+            conversation_history: History of actions, findings, and thoughts
+            iteration: Current iteration number
+
+        Returns:
+            String containing observations and reflections. Falls back to a
+            canned observation if generation fails.
+        """
+        self.logger.info(
+            "Generating observations",
+            query=query[:100],
+            iteration=iteration,
+        )
+
+        background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
+
+        user_message = f"""
+You are starting iteration {iteration} of your research process.
+
+ORIGINAL QUERY:
+{query}
+
+{background}
+
+HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
+{conversation_history or "No previous actions, findings or thoughts available."}
+"""
+
+        try:
+            # Run the agent
+            result = await self.agent.run(user_message)
+            observations = result.output
+
+            self.logger.info("Observations generated", length=len(observations))
+
+            return observations
+
+        except Exception as e:
+            self.logger.error("Observation generation failed", error=str(e))
+            # Return fallback observations
+            return f"Starting iteration {iteration}. Need to gather information about: {query}"
+
+
+def create_thinking_agent(model: Any | None = None) -> ThinkingAgent:
+    """
+    Factory function to create a thinking agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured ThinkingAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return ThinkingAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create thinking agent", error=str(e))
+        raise ConfigurationError(f"Failed to create thinking agent: {e}") from e
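
Usage sketch (inside an async function); `conversation_history` is free-form text per the system prompt, so any iteration log format works:

agent = create_thinking_agent()
thoughts = await agent.generate_observations(
    query="Repurposing candidates for disease Y",
    conversation_history="[Iteration 1] Searched PubMed; found 12 relevant trials ...",
    iteration=2,
)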
src/agents/tool_selector.py ADDED
@@ -0,0 +1,168 @@
+"""Tool selector agent for choosing which tools to use for knowledge gaps.
+
+Converts the folder/tool_selector_agent.py implementation to use Pydantic AI.
+"""
+
+from datetime import datetime
+from typing import Any
+
+import structlog
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+from src.utils.models import AgentSelectionPlan
+
+logger = structlog.get_logger()
+
+
+# System prompt for the tool selector agent
+SYSTEM_PROMPT = f"""
+You are a Tool Selector responsible for determining which specialized agents should address a knowledge gap in a research project.
+Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+
+You will be given:
+1. The original user query
+2. A knowledge gap identified in the research
+3. A full history of the tasks, actions, findings and thoughts you've made up until this point in the research process
+
+Your task is to decide:
+1. Which specialized agents are best suited to address the gap
+2. What specific queries should be given to the agents (keep this short - 3-6 words)
+
+Available specialized agents:
+- WebSearchAgent: General web search for broad topics (can be called multiple times with different queries)
+- SiteCrawlerAgent: Crawl the pages of a specific website to retrieve information about it - use this if you want to find out something about a particular company, entity or product
+- RAGAgent: Semantic search within previously collected evidence - use when you need to find information from evidence already gathered in this research session. Best for finding connections, summarizing collected evidence, or retrieving specific details from earlier findings.
+
+Guidelines:
+- Aim to call at most 3 agents at a time in your final output
+- You can list the WebSearchAgent multiple times with different queries if needed to cover the full scope of the knowledge gap
+- Be specific and concise (3-6 words) with the agent queries - they should target exactly what information is needed
+- If you know the website or domain name of an entity being researched, always include it in the query
+- Use RAGAgent when: (1) You need to search within evidence already collected, (2) You want to find connections between different findings, (3) You need to retrieve specific details from earlier research iterations
+- Use WebSearchAgent or SiteCrawlerAgent when: (1) You need fresh information from the web, (2) You're starting a new research direction, (3) You need information not yet in the collected evidence
+- If a gap doesn't clearly match any agent's capability, default to the WebSearchAgent
+- Use the history of actions / tool calls as a guide - try not to repeat yourself if an approach didn't work previously
+
+Only output JSON. Follow the JSON schema for AgentSelectionPlan. Do not output anything else.
+"""
+
+
+class ToolSelectorAgent:
+    """
+    Agent that selects appropriate tools to address knowledge gaps.
+
+    Uses Pydantic AI to generate a structured AgentSelectionPlan with
+    specific tasks for web search and crawl agents.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the tool selector agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent
+        self.agent = Agent(
+            model=self.model,
+            output_type=AgentSelectionPlan,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def select_tools(
+        self,
+        gap: str,
+        query: str,
+        background_context: str = "",
+        conversation_history: str = "",
+    ) -> AgentSelectionPlan:
+        """
+        Select tools to address a knowledge gap.
+
+        Args:
+            gap: The knowledge gap to address
+            query: The original research query
+            background_context: Optional background context
+            conversation_history: History of actions, findings, and thoughts
+
+        Returns:
+            AgentSelectionPlan with tasks for selected agents. Falls back to
+            a single WebSearchAgent task if selection fails.
+        """
+        self.logger.info("Selecting tools for gap", gap=gap[:100], query=query[:100])
+
+        background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
+
+        user_message = f"""
+ORIGINAL QUERY:
+{query}
+
+KNOWLEDGE GAP TO ADDRESS:
+{gap}
+
+{background}
+
+HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
+{conversation_history or "No previous actions, findings or thoughts available."}
+"""
+
+        try:
+            # Run the agent
+            result = await self.agent.run(user_message)
+            selection_plan = result.output
+
+            self.logger.info(
+                "Tool selection complete",
+                tasks_count=len(selection_plan.tasks),
+                agents=[task.agent for task in selection_plan.tasks],
+            )
+
+            return selection_plan
+
+        except Exception as e:
+            self.logger.error("Tool selection failed", error=str(e))
+            # Return fallback: use web search
+            from src.utils.models import AgentTask
+
+            return AgentSelectionPlan(
+                tasks=[
+                    AgentTask(
+                        gap=gap,
+                        agent="WebSearchAgent",
+                        query=gap[:50],  # Use gap as query
+                        entity_website=None,
+                    )
+                ]
+            )
+
+
+def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent:
+    """
+    Factory function to create a tool selector agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured ToolSelectorAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return ToolSelectorAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create tool selector agent", error=str(e))
+        raise ConfigurationError(f"Failed to create tool selector agent: {e}") from e
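
Usage sketch (inside an async function); `AgentSelectionPlan.tasks` items carry `agent` and `query` fields, as the logging and fallback above rely on:

selector = create_tool_selector_agent()
plan = await selector.select_tools(
    gap="Competitor pricing for product Z",
    query="Market analysis of product Z",
)
for task in plan.tasks:
    print(task.agent, task.query)  # e.g. WebSearchAgent, "product Z pricing"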
src/agents/writer.py ADDED
@@ -0,0 +1,209 @@
+"""Writer agent for generating final reports from findings.
+
+Converts the folder/writer_agent.py implementation to use Pydantic AI.
+"""
+
+from datetime import datetime
+from typing import Any
+
+import structlog
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.utils.exceptions import ConfigurationError
+
+logger = structlog.get_logger()
+
+
+# System prompt for the writer agent
+SYSTEM_PROMPT = f"""
+You are a senior researcher tasked with comprehensively answering a research query.
+Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+You will be provided with the original query along with research findings put together by a research assistant.
+Your objective is to generate the final response in markdown format.
+The response should be as lengthy and detailed as possible with the information provided, focusing on answering the original query.
+In your final output, include references to the source URLs for all information and data gathered.
+This should be formatted in the form of a numbered square bracket next to the relevant information,
+followed by a list of URLs at the end of the response, per the example below.
+
+EXAMPLE REFERENCE FORMAT:
+The company has XYZ products [1]. It operates in the software services market which is expected to grow at 10% per year [2].
+
+References:
+[1] https://example.com/first-source-url
+[2] https://example.com/second-source-url
+
+GUIDELINES:
+* Answer the query directly, do not include unrelated or tangential information.
+* Adhere to any instructions on the length of your final response if provided in the user prompt.
+* If any additional guidelines are provided in the user prompt, follow them exactly and give them precedence over these system instructions.
+"""
+
+
+class WriterAgent:
+    """
+    Agent that generates final reports from research findings.
+
+    Uses Pydantic AI to generate markdown reports with citations.
+    """
+
+    def __init__(self, model: Any | None = None) -> None:
+        """
+        Initialize the writer agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+        """
+        self.model = model or get_model()
+        self.logger = logger
+
+        # Initialize Pydantic AI Agent (no structured output - returns markdown text)
+        self.agent = Agent(
+            model=self.model,
+            system_prompt=SYSTEM_PROMPT,
+            retries=3,
+        )
+
+    async def write_report(
+        self,
+        query: str,
+        findings: str,
+        output_length: str = "",
+        output_instructions: str = "",
+    ) -> str:
+        """
+        Write a final report from findings.
+
+        Args:
+            query: The original research query
+            findings: All findings collected during research
+            output_length: Optional description of desired output length
+            output_instructions: Optional additional instructions
+
+        Returns:
+            Markdown formatted report string. Falls back to a minimal report
+            built from the raw findings if generation fails after all retries.
+        """
+        # Input validation
+        if not query or not query.strip():
+            self.logger.warning("Empty query provided, using default")
+            query = "Research query"
+
+        if findings is None:
+            self.logger.warning("None findings provided, using empty string")
+            findings = "No findings available."
+
+        # Truncate very long inputs to prevent context overflow
+        max_findings_length = 50000  # ~12k tokens
+        if len(findings) > max_findings_length:
+            self.logger.warning(
+                "Findings too long, truncating",
+                original_length=len(findings),
+                truncated_length=max_findings_length,
+            )
+            findings = findings[:max_findings_length] + "\n\n[Content truncated due to length]"
+
+        self.logger.info("Writing final report", query=query[:100], findings_length=len(findings))
+
+        length_str = (
+            f"* The full response should be approximately {output_length}.\n"
+            if output_length
+            else ""
+        )
+        instructions_str = f"* {output_instructions}" if output_instructions else ""
+        guidelines_str = (
+            ("\n\nGUIDELINES:\n" + length_str + instructions_str).strip("\n")
+            if length_str or instructions_str
+            else ""
+        )
+
+        user_message = f"""
+Provide a response based on the query and findings below with as much detail as possible. {guidelines_str}
+
+QUERY: {query}
+
+FINDINGS:
+{findings}
+"""
+
+        # Retry logic for transient failures
+        max_retries = 3
+        last_exception: Exception | None = None
+
+        for attempt in range(max_retries):
+            try:
+                # Run the agent
+                result = await self.agent.run(user_message)
+                report = result.output
+
+                # Validate output
+                if not report or not report.strip():
+                    self.logger.warning("Empty report generated, using fallback")
+                    raise ValueError("Empty report generated")
+
+                self.logger.info("Report written", length=len(report), attempt=attempt + 1)
+
+                return report
+
+            except (TimeoutError, ConnectionError) as e:
+                # Transient errors - retry
+                last_exception = e
+                if attempt < max_retries - 1:
+                    self.logger.warning(
+                        "Transient error, retrying",
+                        error=str(e),
+                        attempt=attempt + 1,
+                        max_retries=max_retries,
+                    )
+                    continue
+                else:
+                    self.logger.error("Max retries exceeded for transient error", error=str(e))
+                    break
+
+            except Exception as e:
+                # Non-transient errors - don't retry
+                last_exception = e
+                self.logger.error(
+                    "Report writing failed", error=str(e), error_type=type(e).__name__
+                )
+                break
+
+        # Return fallback report if all attempts failed
+        self.logger.error(
+            "Report writing failed after all attempts",
+            error=str(last_exception) if last_exception else "Unknown error",
+        )
+        # Truncate findings in fallback if too long
+        fallback_findings = findings[:500] + "..." if len(findings) > 500 else findings
+        return (
+            f"# Research Report\n\n"
+            f"## Query\n{query}\n\n"
+            f"## Findings\n{fallback_findings}\n\n"
+            f"*Note: Report generation encountered an error. This is a fallback report.*"
+        )
+
+
+def create_writer_agent(model: Any | None = None) -> WriterAgent:
+    """
+    Factory function to create a writer agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured WriterAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        if model is None:
+            model = get_model()
+
+        return WriterAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create writer agent", error=str(e))
+        raise ConfigurationError(f"Failed to create writer agent: {e}") from e
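
Usage sketch (inside an async function); the two optional arguments are folded into the GUIDELINES block of the user prompt:

writer = create_writer_agent()
report_md = await writer.write_report(
    query="What is the evidence for drug X in disease Y?",
    findings="[1] https://example.com/trial - Phase II results ...",
    output_length="2-3 pages",
    output_instructions="Use tables for numeric comparisons",
)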
src/app.py CHANGED
@@ -6,8 +6,10 @@ from typing import Any
6
 
7
  import gradio as gr
8
  from pydantic_ai.models.anthropic import AnthropicModel
9
- from pydantic_ai.models.openai import OpenAIModel
 
10
  from pydantic_ai.providers.anthropic import AnthropicProvider
 
11
  from pydantic_ai.providers.openai import OpenAIProvider
12
 
13
  from src.agent_factory.judges import HFInferenceJudgeHandler, JudgeHandler, MockJudgeHandler
@@ -24,7 +26,7 @@ def configure_orchestrator(
24
  use_mock: bool = False,
25
  mode: str = "simple",
26
  user_api_key: str | None = None,
27
- api_provider: str = "openai",
28
  ) -> tuple[Any, str]:
29
  """
30
  Create an orchestrator instance.
@@ -33,7 +35,7 @@ def configure_orchestrator(
33
  use_mock: If True, use MockJudgeHandler (no API key needed)
34
  mode: Orchestrator mode ("simple" or "advanced")
35
  user_api_key: Optional user-provided API key (BYOK)
36
- api_provider: API provider ("openai" or "anthropic")
37
 
38
  Returns:
39
  Tuple of (Orchestrator instance, backend_name)
@@ -59,13 +61,17 @@ def configure_orchestrator(
59
  judge_handler = MockJudgeHandler()
60
  backend_info = "Mock (Testing)"
61
 
62
- # 2. Paid API Key (User provided or Env)
63
  elif (
64
  user_api_key
 
 
 
 
65
  or (api_provider == "openai" and os.getenv("OPENAI_API_KEY"))
66
  or (api_provider == "anthropic" and os.getenv("ANTHROPIC_API_KEY"))
67
  ):
68
- model: AnthropicModel | OpenAIModel | None = None
69
  if user_api_key:
70
  # Validate key/provider match to prevent silent auth failures
71
  if api_provider == "openai" and user_api_key.startswith("sk-ant-"):
@@ -75,15 +81,19 @@ def configure_orchestrator(
75
  )
76
  if api_provider == "anthropic" and is_openai_key:
77
  raise ValueError("OpenAI key provided but Anthropic provider selected")
78
- if api_provider == "anthropic":
 
 
 
 
79
  anthropic_provider = AnthropicProvider(api_key=user_api_key)
80
  model = AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
81
  elif api_provider == "openai":
82
  openai_provider = OpenAIProvider(api_key=user_api_key)
83
  model = OpenAIModel(settings.openai_model, provider=openai_provider)
84
- backend_info = f"Paid API ({api_provider.upper()})"
85
  else:
86
- backend_info = "Paid API (Env Config)"
87
 
88
  judge_handler = JudgeHandler(model=model)
89
 
@@ -107,7 +117,7 @@ async def research_agent(
107
  history: list[dict[str, Any]],
108
  mode: str = "simple",
109
  api_key: str = "",
110
- api_provider: str = "openai",
111
  ) -> AsyncGenerator[str, None]:
112
  """
113
  Gradio chat function that runs the research agent.
@@ -117,7 +127,7 @@ async def research_agent(
117
  history: Chat history (Gradio format)
118
  mode: Orchestrator mode ("simple" or "advanced")
119
  api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
120
- api_provider: API provider ("openai" or "anthropic")
121
 
122
  Yields:
123
  Markdown-formatted responses for streaming
@@ -130,6 +140,7 @@ async def research_agent(
130
  user_api_key = api_key.strip() if api_key else None
131
 
132
  # Check available keys
 
133
  has_openai = bool(os.getenv("OPENAI_API_KEY"))
134
  has_anthropic = bool(os.getenv("ANTHROPIC_API_KEY"))
135
  has_user_key = bool(user_api_key)
@@ -149,11 +160,11 @@ async def research_agent(
149
  f"🔑 **Using your {api_provider.upper()} API key** - "
150
  "Your key is used only for this session and is never stored.\n\n"
151
  )
152
- elif not has_paid_key:
153
- # No paid keys - will use FREE HuggingFace Inference
154
  yield (
155
  "🤗 **Free Tier**: Using HuggingFace Inference (Llama 3.1 / Mistral) for AI analysis.\n"
156
- "For premium models, enter an OpenAI or Anthropic API key below.\n\n"
157
  )
158
 
159
  # Run the agent and stream events
@@ -232,8 +243,7 @@ def create_demo() -> gr.ChatInterface:
232
  value="simple",
233
  label="Orchestrator Mode",
234
  info=(
235
- "Simple: Linear (Free Tier Friendly) | "
236
- "Advanced: Multi-Agent (Requires OpenAI)"
237
  ),
238
  ),
239
  gr.Textbox(
@@ -243,10 +253,10 @@ def create_demo() -> gr.ChatInterface:
243
  info="Enter your own API key. Never stored.",
244
  ),
245
  gr.Radio(
246
- choices=["openai", "anthropic"],
247
- value="openai",
248
  label="API Provider",
249
- info="Select the provider for your API key",
250
  ),
251
  ],
252
  )
 
6
 
7
  import gradio as gr
8
  from pydantic_ai.models.anthropic import AnthropicModel
9
+ from pydantic_ai.models.huggingface import HuggingFaceModel
10
+ from pydantic_ai.models.openai import OpenAIChatModel as OpenAIModel
11
  from pydantic_ai.providers.anthropic import AnthropicProvider
12
+ from pydantic_ai.providers.huggingface import HuggingFaceProvider
13
  from pydantic_ai.providers.openai import OpenAIProvider
14
 
15
  from src.agent_factory.judges import HFInferenceJudgeHandler, JudgeHandler, MockJudgeHandler
 
26
  use_mock: bool = False,
27
  mode: str = "simple",
28
  user_api_key: str | None = None,
29
+ api_provider: str = "huggingface",
30
  ) -> tuple[Any, str]:
31
  """
32
  Create an orchestrator instance.
 
35
  use_mock: If True, use MockJudgeHandler (no API key needed)
36
  mode: Orchestrator mode ("simple" or "advanced")
37
  user_api_key: Optional user-provided API key (BYOK)
38
+ api_provider: API provider ("huggingface", "openai", or "anthropic")
39
 
40
  Returns:
41
  Tuple of (Orchestrator instance, backend_name)
 
61
  judge_handler = MockJudgeHandler()
62
  backend_info = "Mock (Testing)"
63
 
64
+ # 2. API Key (User provided or Env) - HuggingFace, OpenAI, or Anthropic
65
  elif (
66
  user_api_key
67
+ or (
68
+ api_provider == "huggingface"
69
+ and (os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_API_KEY"))
70
+ )
71
  or (api_provider == "openai" and os.getenv("OPENAI_API_KEY"))
72
  or (api_provider == "anthropic" and os.getenv("ANTHROPIC_API_KEY"))
73
  ):
74
+ model: AnthropicModel | HuggingFaceModel | OpenAIModel | None = None
75
  if user_api_key:
76
  # Validate key/provider match to prevent silent auth failures
77
  if api_provider == "openai" and user_api_key.startswith("sk-ant-"):
 
81
  )
82
  if api_provider == "anthropic" and is_openai_key:
83
  raise ValueError("OpenAI key provided but Anthropic provider selected")
84
+ if api_provider == "huggingface":
85
+ model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
86
+ hf_provider = HuggingFaceProvider(api_key=user_api_key)
87
+ model = HuggingFaceModel(model_name, provider=hf_provider)
88
+ elif api_provider == "anthropic":
89
  anthropic_provider = AnthropicProvider(api_key=user_api_key)
90
  model = AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
91
  elif api_provider == "openai":
92
  openai_provider = OpenAIProvider(api_key=user_api_key)
93
  model = OpenAIModel(settings.openai_model, provider=openai_provider)
94
+ backend_info = f"API ({api_provider.upper()})"
95
  else:
96
+ backend_info = "API (Env Config)"
97
 
98
  judge_handler = JudgeHandler(model=model)
99
 
 
117
  history: list[dict[str, Any]],
118
  mode: str = "simple",
119
  api_key: str = "",
120
+ api_provider: str = "huggingface",
121
  ) -> AsyncGenerator[str, None]:
122
  """
123
  Gradio chat function that runs the research agent.
 
127
  history: Chat history (Gradio format)
128
  mode: Orchestrator mode ("simple" or "advanced")
129
  api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
130
+ api_provider: API provider ("huggingface", "openai", or "anthropic")
131
 
132
  Yields:
133
  Markdown-formatted responses for streaming
 
140
  user_api_key = api_key.strip() if api_key else None
141
 
142
  # Check available keys
143
+ has_huggingface = bool(os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_API_KEY"))
144
  has_openai = bool(os.getenv("OPENAI_API_KEY"))
145
  has_anthropic = bool(os.getenv("ANTHROPIC_API_KEY"))
146
  has_user_key = bool(user_api_key)
 
160
  f"🔑 **Using your {api_provider.upper()} API key** - "
161
  "Your key is used only for this session and is never stored.\n\n"
162
  )
163
+ elif not has_paid_key and not has_huggingface:
164
+ # No keys at all - will use FREE HuggingFace Inference (public models)
165
  yield (
166
  "🤗 **Free Tier**: Using HuggingFace Inference (Llama 3.1 / Mistral) for AI analysis.\n"
167
+ "For premium models or higher rate limits, enter a HuggingFace, OpenAI, or Anthropic API key below.\n\n"
168
  )
169
 
170
  # Run the agent and stream events
 
243
  value="simple",
244
  label="Orchestrator Mode",
245
  info=(
246
+ "Simple: Linear (Free Tier Friendly) | Advanced: Multi-Agent (Requires OpenAI)"
 
247
  ),
248
  ),
249
  gr.Textbox(
 
253
  info="Enter your own API key. Never stored.",
254
  ),
255
  gr.Radio(
256
+ choices=["huggingface", "openai", "anthropic"],
257
+ value="huggingface",
258
  label="API Provider",
259
+ info="Select the provider for your API key (HuggingFace is default and free)",
260
  ),
261
  ],
262
  )
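Taken together, the UI wiring above means the provider choice flows straight into create_orchestrator. A minimal calling sketch, assuming the factory is importable from src.app (the defining module path is not shown in this diff):

from src.app import create_orchestrator  # assumed import path

orchestrator, backend = create_orchestrator(
    use_mock=False,
    mode="simple",
    user_api_key="hf_xxx",        # BYOK placeholder; validated against the provider
    api_provider="huggingface",   # "huggingface" | "openai" | "anthropic"
)
print(backend)  # e.g. "API (HUGGINGFACE)"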
src/{orchestrator.py → legacy_orchestrator.py} RENAMED
File without changes
src/middleware/__init__.py CHANGED
@@ -1 +1,30 @@
1
- """Middleware components for orchestration."""
1
+ """Middleware for workflow state management, parallel loop coordination, and budget tracking.
2
+
3
+ This module provides:
4
+ - WorkflowState: Thread-safe state management using ContextVar
5
+ - WorkflowManager: Coordination of parallel research loops
6
+ - BudgetTracker: Token, time, and iteration budget tracking
7
+ """
8
+
9
+ from src.middleware.budget_tracker import BudgetStatus, BudgetTracker
10
+ from src.middleware.state_machine import (
11
+ WorkflowState,
12
+ get_workflow_state,
13
+ init_workflow_state,
14
+ )
15
+ from src.middleware.workflow_manager import (
16
+ LoopStatus,
17
+ ResearchLoop,
18
+ WorkflowManager,
19
+ )
20
+
21
+ __all__ = [
22
+ "BudgetStatus",
23
+ "BudgetTracker",
24
+ "LoopStatus",
25
+ "ResearchLoop",
26
+ "WorkflowManager",
27
+ "WorkflowState",
28
+ "get_workflow_state",
29
+ "init_workflow_state",
30
+ ]
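With these re-exports in place, callers can pull the middleware primitives from the package root instead of the individual submodules:

from src.middleware import (
    BudgetTracker,
    WorkflowManager,
    get_workflow_state,
    init_workflow_state,
)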
src/middleware/budget_tracker.py ADDED
@@ -0,0 +1,390 @@
1
+ """Budget tracking for research loops.
2
+
3
+ Tracks token usage, time elapsed, and iteration counts per loop and globally.
4
+ Enforces budget constraints to prevent infinite loops and excessive resource usage.
5
+ """
6
+
7
+ import time
8
+
9
+ import structlog
10
+ from pydantic import BaseModel, Field
11
+
12
+ logger = structlog.get_logger()
13
+
14
+
15
+ class BudgetStatus(BaseModel):
16
+ """Status of a budget (tokens, time, iterations)."""
17
+
18
+ tokens_used: int = Field(default=0, description="Total tokens used")
19
+ tokens_limit: int = Field(default=100000, description="Token budget limit", ge=0)
20
+ time_elapsed_seconds: float = Field(default=0.0, description="Time elapsed", ge=0.0)
21
+ time_limit_seconds: float = Field(
22
+ default=600.0, description="Time budget limit (10 min default)", ge=0.0
23
+ )
24
+ iterations: int = Field(default=0, description="Number of iterations completed", ge=0)
25
+ iterations_limit: int = Field(default=10, description="Maximum iterations", ge=1)
26
+ iteration_tokens: dict[int, int] = Field(
27
+ default_factory=dict,
28
+ description="Tokens used per iteration (iteration number -> token count)",
29
+ )
30
+
31
+ def is_exceeded(self) -> bool:
32
+ """Check if any budget limit has been exceeded.
33
+
34
+ Returns:
35
+ True if any limit is exceeded, False otherwise.
36
+ """
37
+ return (
38
+ self.tokens_used >= self.tokens_limit
39
+ or self.time_elapsed_seconds >= self.time_limit_seconds
40
+ or self.iterations >= self.iterations_limit
41
+ )
42
+
43
+ def remaining_tokens(self) -> int:
44
+ """Get remaining token budget.
45
+
46
+ Returns:
47
+ Remaining tokens (may be negative if exceeded).
48
+ """
49
+ return self.tokens_limit - self.tokens_used
50
+
51
+ def remaining_time_seconds(self) -> float:
52
+ """Get remaining time budget.
53
+
54
+ Returns:
55
+ Remaining time in seconds (may be negative if exceeded).
56
+ """
57
+ return self.time_limit_seconds - self.time_elapsed_seconds
58
+
59
+ def remaining_iterations(self) -> int:
60
+ """Get remaining iteration budget.
61
+
62
+ Returns:
63
+ Remaining iterations (may be negative if exceeded).
64
+ """
65
+ return self.iterations_limit - self.iterations
66
+
67
+ def add_iteration_tokens(self, iteration: int, tokens: int) -> None:
68
+ """Add tokens for a specific iteration.
69
+
70
+ Args:
71
+ iteration: Iteration number (1-indexed).
72
+ tokens: Number of tokens to add.
73
+ """
74
+ if iteration not in self.iteration_tokens:
75
+ self.iteration_tokens[iteration] = 0
76
+ self.iteration_tokens[iteration] += tokens
77
+ # Also add to total tokens
78
+ self.tokens_used += tokens
79
+
80
+ def get_iteration_tokens(self, iteration: int) -> int:
81
+ """Get tokens used for a specific iteration.
82
+
83
+ Args:
84
+ iteration: Iteration number.
85
+
86
+ Returns:
87
+ Token count for the iteration, or 0 if not found.
88
+ """
89
+ return self.iteration_tokens.get(iteration, 0)
90
+
91
+
92
+ class BudgetTracker:
93
+ """Tracks budgets per loop and globally."""
94
+
95
+ def __init__(self) -> None:
96
+ """Initialize the budget tracker."""
97
+ self._budgets: dict[str, BudgetStatus] = {}
98
+ self._start_times: dict[str, float] = {}
99
+ self._global_budget: BudgetStatus | None = None
100
+
101
+ def create_budget(
102
+ self,
103
+ loop_id: str,
104
+ tokens_limit: int = 100000,
105
+ time_limit_seconds: float = 600.0,
106
+ iterations_limit: int = 10,
107
+ ) -> BudgetStatus:
108
+ """Create a budget for a specific loop.
109
+
110
+ Args:
111
+ loop_id: Unique identifier for the loop.
112
+ tokens_limit: Maximum tokens allowed.
113
+ time_limit_seconds: Maximum time allowed in seconds.
114
+ iterations_limit: Maximum iterations allowed.
115
+
116
+ Returns:
117
+ The created BudgetStatus instance.
118
+ """
119
+ budget = BudgetStatus(
120
+ tokens_limit=tokens_limit,
121
+ time_limit_seconds=time_limit_seconds,
122
+ iterations_limit=iterations_limit,
123
+ )
124
+ self._budgets[loop_id] = budget
125
+ logger.debug(
126
+ "Budget created",
127
+ loop_id=loop_id,
128
+ tokens_limit=tokens_limit,
129
+ time_limit=time_limit_seconds,
130
+ iterations_limit=iterations_limit,
131
+ )
132
+ return budget
133
+
134
+ def get_budget(self, loop_id: str) -> BudgetStatus | None:
135
+ """Get the budget for a specific loop.
136
+
137
+ Args:
138
+ loop_id: Unique identifier for the loop.
139
+
140
+ Returns:
141
+ The BudgetStatus instance, or None if not found.
142
+ """
143
+ return self._budgets.get(loop_id)
144
+
145
+ def add_tokens(self, loop_id: str, tokens: int) -> None:
146
+ """Add tokens to a loop's budget.
147
+
148
+ Args:
149
+ loop_id: Unique identifier for the loop.
150
+ tokens: Number of tokens to add (can be negative).
151
+ """
152
+ if loop_id not in self._budgets:
153
+ logger.warning("Budget not found for loop", loop_id=loop_id)
154
+ return
155
+ self._budgets[loop_id].tokens_used += tokens
156
+ logger.debug("Tokens added", loop_id=loop_id, tokens=tokens)
157
+
158
+ def add_iteration_tokens(self, loop_id: str, iteration: int, tokens: int) -> None:
159
+ """Add tokens for a specific iteration.
160
+
161
+ Args:
162
+ loop_id: Loop identifier.
163
+ iteration: Iteration number (1-indexed).
164
+ tokens: Number of tokens to add.
165
+ """
166
+ if loop_id not in self._budgets:
167
+ logger.warning("Budget not found for loop", loop_id=loop_id)
168
+ return
169
+
170
+ budget = self._budgets[loop_id]
171
+ budget.add_iteration_tokens(iteration, tokens)
172
+
173
+ logger.debug(
174
+ "Iteration tokens added",
175
+ loop_id=loop_id,
176
+ iteration=iteration,
177
+ tokens=tokens,
178
+ total_iteration=budget.get_iteration_tokens(iteration),
179
+ )
180
+
181
+ def get_iteration_tokens(self, loop_id: str, iteration: int) -> int:
182
+ """Get tokens used for a specific iteration.
183
+
184
+ Args:
185
+ loop_id: Loop identifier.
186
+ iteration: Iteration number.
187
+
188
+ Returns:
189
+ Token count for the iteration, or 0 if not found.
190
+ """
191
+ if loop_id not in self._budgets:
192
+ return 0
193
+
194
+ return self._budgets[loop_id].get_iteration_tokens(iteration)
195
+
196
+ def start_timer(self, loop_id: str) -> None:
197
+ """Start the timer for a loop.
198
+
199
+ Args:
200
+ loop_id: Unique identifier for the loop.
201
+ """
202
+ self._start_times[loop_id] = time.time()
203
+ logger.debug("Timer started", loop_id=loop_id)
204
+
205
+ def update_timer(self, loop_id: str) -> None:
206
+ """Update the elapsed time for a loop.
207
+
208
+ Args:
209
+ loop_id: Unique identifier for the loop.
210
+ """
211
+ if loop_id not in self._start_times:
212
+ logger.warning("Timer not started for loop", loop_id=loop_id)
213
+ return
214
+ if loop_id not in self._budgets:
215
+ logger.warning("Budget not found for loop", loop_id=loop_id)
216
+ return
217
+
218
+ elapsed = time.time() - self._start_times[loop_id]
219
+ self._budgets[loop_id].time_elapsed_seconds = elapsed
220
+ logger.debug("Timer updated", loop_id=loop_id, elapsed=elapsed)
221
+
222
+ def increment_iteration(self, loop_id: str) -> None:
223
+ """Increment the iteration count for a loop.
224
+
225
+ Args:
226
+ loop_id: Unique identifier for the loop.
227
+ """
228
+ if loop_id not in self._budgets:
229
+ logger.warning("Budget not found for loop", loop_id=loop_id)
230
+ return
231
+ self._budgets[loop_id].iterations += 1
232
+ logger.debug(
233
+ "Iteration incremented",
234
+ loop_id=loop_id,
235
+ iterations=self._budgets[loop_id].iterations,
236
+ )
237
+
238
+ def check_budget(self, loop_id: str) -> tuple[bool, str]:
239
+ """Check if a loop's budget has been exceeded.
240
+
241
+ Args:
242
+ loop_id: Unique identifier for the loop.
243
+
244
+ Returns:
245
+ Tuple of (exceeded: bool, reason: str). Reason is empty if not exceeded.
246
+ """
247
+ if loop_id not in self._budgets:
248
+ return False, ""
249
+
250
+ budget = self._budgets[loop_id]
251
+ self.update_timer(loop_id) # Update time before checking
252
+
253
+ if budget.is_exceeded():
254
+ reasons = []
255
+ if budget.tokens_used >= budget.tokens_limit:
256
+ reasons.append("tokens")
257
+ if budget.time_elapsed_seconds >= budget.time_limit_seconds:
258
+ reasons.append("time")
259
+ if budget.iterations >= budget.iterations_limit:
260
+ reasons.append("iterations")
261
+ reason = f"Budget exceeded: {', '.join(reasons)}"
262
+ logger.warning("Budget exceeded", loop_id=loop_id, reason=reason)
263
+ return True, reason
264
+
265
+ return False, ""
266
+
267
+ def can_continue(self, loop_id: str) -> bool:
268
+ """Check if a loop can continue based on budget.
269
+
270
+ Args:
271
+ loop_id: Unique identifier for the loop.
272
+
273
+ Returns:
274
+ True if the loop can continue, False if budget is exceeded.
275
+ """
276
+ exceeded, _ = self.check_budget(loop_id)
277
+ return not exceeded
278
+
279
+ def get_budget_summary(self, loop_id: str) -> str:
280
+ """Get a formatted summary of a loop's budget status.
281
+
282
+ Args:
283
+ loop_id: Unique identifier for the loop.
284
+
285
+ Returns:
286
+ Formatted string summary.
287
+ """
288
+ if loop_id not in self._budgets:
289
+ return f"Budget not found for loop: {loop_id}"
290
+
291
+ budget = self._budgets[loop_id]
292
+ self.update_timer(loop_id)
293
+
294
+ return (
295
+ f"Loop {loop_id}: "
296
+ f"Tokens: {budget.tokens_used}/{budget.tokens_limit} "
297
+ f"({budget.remaining_tokens()} remaining), "
298
+ f"Time: {budget.time_elapsed_seconds:.1f}/{budget.time_limit_seconds:.1f}s "
299
+ f"({budget.remaining_time_seconds():.1f}s remaining), "
300
+ f"Iterations: {budget.iterations}/{budget.iterations_limit} "
301
+ f"({budget.remaining_iterations()} remaining)"
302
+ )
303
+
304
+ def reset_budget(self, loop_id: str) -> None:
305
+ """Reset the budget for a loop.
306
+
307
+ Args:
308
+ loop_id: Unique identifier for the loop.
309
+ """
310
+ if loop_id in self._budgets:
311
+ old_budget = self._budgets[loop_id]
312
+ # Preserve iteration_tokens when resetting
313
+ old_iteration_tokens = old_budget.iteration_tokens
314
+ self._budgets[loop_id] = BudgetStatus(
315
+ tokens_limit=old_budget.tokens_limit,
316
+ time_limit_seconds=old_budget.time_limit_seconds,
317
+ iterations_limit=old_budget.iterations_limit,
318
+ iteration_tokens=old_iteration_tokens, # Restore old iteration tokens
319
+ )
320
+ if loop_id in self._start_times:
321
+ self._start_times[loop_id] = time.time()
322
+ logger.debug("Budget reset", loop_id=loop_id)
323
+
324
+ def set_global_budget(
325
+ self,
326
+ tokens_limit: int = 100000,
327
+ time_limit_seconds: float = 600.0,
328
+ iterations_limit: int = 10,
329
+ ) -> None:
330
+ """Set a global budget that applies to all loops.
331
+
332
+ Args:
333
+ tokens_limit: Maximum tokens allowed globally.
334
+ time_limit_seconds: Maximum time allowed in seconds.
335
+ iterations_limit: Maximum iterations allowed globally.
336
+ """
337
+ self._global_budget = BudgetStatus(
338
+ tokens_limit=tokens_limit,
339
+ time_limit_seconds=time_limit_seconds,
340
+ iterations_limit=iterations_limit,
341
+ )
342
+ logger.debug(
343
+ "Global budget set",
344
+ tokens_limit=tokens_limit,
345
+ time_limit=time_limit_seconds,
346
+ iterations_limit=iterations_limit,
347
+ )
348
+
349
+ def get_global_budget(self) -> BudgetStatus | None:
350
+ """Get the global budget.
351
+
352
+ Returns:
353
+ The global BudgetStatus instance, or None if not set.
354
+ """
355
+ return self._global_budget
356
+
357
+ def add_global_tokens(self, tokens: int) -> None:
358
+ """Add tokens to the global budget.
359
+
360
+ Args:
361
+ tokens: Number of tokens to add (can be negative).
362
+ """
363
+ if self._global_budget is None:
364
+ logger.warning("Global budget not set")
365
+ return
366
+ self._global_budget.tokens_used += tokens
367
+ logger.debug("Global tokens added", tokens=tokens)
368
+
369
+ def estimate_tokens(self, text: str) -> int:
370
+ """Estimate token count from text (rough estimate: ~4 chars per token).
371
+
372
+ Args:
373
+ text: Text to estimate tokens for.
374
+
375
+ Returns:
376
+ Estimated token count.
377
+ """
378
+ return len(text) // 4
379
+
380
+ def estimate_llm_call_tokens(self, prompt: str, response: str) -> int:
381
+ """Estimate token count for an LLM call.
382
+
383
+ Args:
384
+ prompt: The prompt text.
385
+ response: The response text.
386
+
387
+ Returns:
388
+ Estimated total token count (prompt + response).
389
+ """
390
+ return self.estimate_tokens(prompt) + self.estimate_tokens(response)
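A short usage sketch of the tracker defined above (all names come from this module; the loop id is arbitrary):

from src.middleware.budget_tracker import BudgetTracker

tracker = BudgetTracker()
tracker.create_budget("loop-1", tokens_limit=50_000, time_limit_seconds=300.0, iterations_limit=5)
tracker.start_timer("loop-1")

tracker.increment_iteration("loop-1")
tokens = tracker.estimate_llm_call_tokens(prompt="summarize X", response="X is ...")
tracker.add_iteration_tokens("loop-1", iteration=1, tokens=tokens)

exceeded, reason = tracker.check_budget("loop-1")  # refreshes the timer, then checks limits
if exceeded:
    print(reason)  # e.g. "Budget exceeded: iterations"
print(tracker.get_budget_summary("loop-1"))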
src/middleware/state_machine.py ADDED
@@ -0,0 +1,129 @@
1
+ """Thread-safe state management for workflow agents.
2
+
3
+ Uses contextvars to ensure isolation between concurrent requests (e.g., multiple users
4
+ searching simultaneously via Gradio). Refactored from MagenticState to support both
5
+ iterative and deep research patterns.
6
+ """
7
+
8
+ from contextvars import ContextVar
9
+ from typing import TYPE_CHECKING, Any
10
+
11
+ import structlog
12
+ from pydantic import BaseModel, Field
13
+
14
+ from src.utils.models import Citation, Conversation, Evidence
15
+
16
+ if TYPE_CHECKING:
17
+ from src.services.embeddings import EmbeddingService
18
+
19
+ logger = structlog.get_logger()
20
+
21
+
22
+ class WorkflowState(BaseModel):
23
+ """Mutable state for a workflow session.
24
+
25
+ Supports both iterative and deep research patterns by tracking evidence,
26
+ conversation history, and providing semantic search capabilities.
27
+ """
28
+
29
+ evidence: list[Evidence] = Field(default_factory=list)
30
+ conversation: Conversation = Field(default_factory=Conversation)
31
+ # Type as Any to avoid circular imports/runtime resolution issues
32
+ # The actual object injected will be an EmbeddingService instance
33
+ embedding_service: Any = Field(default=None)
34
+
35
+ model_config = {"arbitrary_types_allowed": True}
36
+
37
+ def add_evidence(self, new_evidence: list[Evidence]) -> int:
38
+ """Add new evidence, deduplicating by URL.
39
+
40
+ Args:
41
+ new_evidence: List of Evidence objects to add.
42
+
43
+ Returns:
44
+ Number of *new* items added (excluding duplicates).
45
+ """
46
+ existing_urls = {e.citation.url for e in self.evidence}
47
+ count = 0
48
+ for item in new_evidence:
49
+ if item.citation.url not in existing_urls:
50
+ self.evidence.append(item)
51
+ existing_urls.add(item.citation.url)
52
+ count += 1
53
+ return count
54
+
55
+ async def search_related(self, query: str, n_results: int = 5) -> list[Evidence]:
56
+ """Search for semantically related evidence using the embedding service.
57
+
58
+ Args:
59
+ query: Search query string.
60
+ n_results: Maximum number of results to return.
61
+
62
+ Returns:
63
+ List of Evidence objects, ordered by relevance.
64
+ """
65
+ if not self.embedding_service:
66
+ logger.warning("Embedding service not available, returning empty results")
67
+ return []
68
+
69
+ results = await self.embedding_service.search_similar(query, n_results=n_results)
70
+
71
+ # Convert dict results back to Evidence objects
72
+ evidence_list = []
73
+ for item in results:
74
+ meta = item.get("metadata", {})
75
+ authors_str = meta.get("authors", "")
76
+ authors = [a.strip() for a in authors_str.split(",") if a.strip()]
77
+
78
+ ev = Evidence(
79
+ content=item["content"],
80
+ citation=Citation(
81
+ title=meta.get("title", "Related Evidence"),
82
+ url=item["id"],
83
+ source="pubmed", # Defaulting to pubmed if unknown
84
+ date=meta.get("date", "n.d."),
85
+ authors=authors,
86
+ ),
87
+ relevance=max(0.0, 1.0 - item.get("distance", 0.5)),
88
+ )
89
+ evidence_list.append(ev)
90
+
91
+ return evidence_list
92
+
93
+
94
+ # The ContextVar holds the WorkflowState for the current execution context
95
+ _workflow_state_var: ContextVar[WorkflowState | None] = ContextVar("workflow_state", default=None)
96
+
97
+
98
+ def init_workflow_state(
99
+ embedding_service: "EmbeddingService | None" = None,
100
+ ) -> WorkflowState:
101
+ """Initialize a new state for the current context.
102
+
103
+ Args:
104
+ embedding_service: Optional embedding service for semantic search.
105
+
106
+ Returns:
107
+ The initialized WorkflowState instance.
108
+ """
109
+ state = WorkflowState(embedding_service=embedding_service)
110
+ _workflow_state_var.set(state)
111
+ logger.debug("Workflow state initialized", has_embeddings=embedding_service is not None)
112
+ return state
113
+
114
+
115
+ def get_workflow_state() -> WorkflowState:
116
+ """Get the current state. Auto-initializes if not set.
117
+
118
+ Returns:
119
+ The current WorkflowState instance.
120
+ """
124
+ state = _workflow_state_var.get()
125
+ if state is None:
126
+ # Auto-initialize if missing (e.g. during tests or simple scripts)
127
+ logger.debug("Workflow state not found, auto-initializing")
128
+ return init_workflow_state()
129
+ return state
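Because the state lives in a ContextVar, concurrently spawned tasks each see their own WorkflowState; a minimal isolation sketch (evidence construction elided):

import asyncio

from src.middleware.state_machine import get_workflow_state, init_workflow_state

async def handle_request() -> int:
    init_workflow_state()  # fresh, isolated state for this task's context
    # get_workflow_state() now returns this task's state; add_evidence()
    # would deduplicate new items by citation URL.
    return len(get_workflow_state().evidence)

async def main() -> None:
    # gather() wraps each coroutine in a Task with its own context copy,
    # so the two states never interfere.
    print(await asyncio.gather(handle_request(), handle_request()))  # [0, 0]

asyncio.run(main())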
src/middleware/sub_iteration.py CHANGED
@@ -125,8 +125,7 @@ class SubIterationMiddleware:
125
  AgentEvent(
126
  type="looping",
127
  message=(
128
- f"Sub-iteration {i} result insufficient. "
129
- f"Feedback: {feedback[:100]}..."
130
  ),
131
  iteration=i,
132
  )
 
125
  AgentEvent(
126
  type="looping",
127
  message=(
128
+ f"Sub-iteration {i} result insufficient. Feedback: {feedback[:100]}..."
 
129
  ),
130
  iteration=i,
131
  )
src/middleware/workflow_manager.py ADDED
@@ -0,0 +1,322 @@
1
+ """Workflow manager for coordinating parallel research loops.
2
+
3
+ Manages multiple research loops running in parallel, tracks their status,
4
+ and synchronizes evidence between loops and the global state.
5
+ """
6
+
7
+ import asyncio
8
+ from collections.abc import Callable
9
+ from typing import Any, Literal
10
+
11
+ import structlog
12
+ from pydantic import BaseModel, Field
13
+
14
+ from src.middleware.state_machine import get_workflow_state
15
+ from src.utils.models import Evidence
16
+
17
+ logger = structlog.get_logger()
18
+
19
+ LoopStatus = Literal["pending", "running", "completed", "failed", "cancelled"]
20
+
21
+
22
+ class ResearchLoop(BaseModel):
23
+ """Represents a single research loop."""
24
+
25
+ loop_id: str = Field(description="Unique identifier for the loop")
26
+ query: str = Field(description="The research query for this loop")
27
+ status: LoopStatus = Field(default="pending")
28
+ evidence: list[Evidence] = Field(default_factory=list)
29
+ iteration_count: int = Field(default=0, ge=0)
30
+ error: str | None = Field(default=None)
31
+
32
+ model_config = {"frozen": False} # Mutable for status updates
33
+
34
+
35
+ class WorkflowManager:
36
+ """Manages parallel research loops and state synchronization."""
37
+
38
+ def __init__(self) -> None:
39
+ """Initialize the workflow manager."""
40
+ self._loops: dict[str, ResearchLoop] = {}
41
+
42
+ async def add_loop(self, loop_id: str, query: str) -> ResearchLoop:
43
+ """Add a new research loop.
44
+
45
+ Args:
46
+ loop_id: Unique identifier for the loop.
47
+ query: The research query for this loop.
48
+
49
+ Returns:
50
+ The created ResearchLoop instance.
51
+ """
52
+ loop = ResearchLoop(loop_id=loop_id, query=query, status="pending")
53
+ self._loops[loop_id] = loop
54
+ logger.info("Loop added", loop_id=loop_id, query=query)
55
+ return loop
56
+
57
+ async def get_loop(self, loop_id: str) -> ResearchLoop | None:
58
+ """Get a research loop by ID.
59
+
60
+ Args:
61
+ loop_id: Unique identifier for the loop.
62
+
63
+ Returns:
64
+ The ResearchLoop instance, or None if not found.
65
+ """
66
+ return self._loops.get(loop_id)
67
+
68
+ async def update_loop_status(
69
+ self, loop_id: str, status: LoopStatus, error: str | None = None
70
+ ) -> None:
71
+ """Update the status of a research loop.
72
+
73
+ Args:
74
+ loop_id: Unique identifier for the loop.
75
+ status: New status for the loop.
76
+ error: Optional error message if status is "failed".
77
+ """
78
+ if loop_id not in self._loops:
79
+ logger.warning("Loop not found", loop_id=loop_id)
80
+ return
81
+
82
+ self._loops[loop_id].status = status
83
+ if error:
84
+ self._loops[loop_id].error = error
85
+ logger.info("Loop status updated", loop_id=loop_id, status=status)
86
+
87
+ async def add_loop_evidence(self, loop_id: str, evidence: list[Evidence]) -> None:
88
+ """Add evidence to a research loop.
89
+
90
+ Args:
91
+ loop_id: Unique identifier for the loop.
92
+ evidence: List of Evidence objects to add.
93
+ """
94
+ if loop_id not in self._loops:
95
+ logger.warning("Loop not found", loop_id=loop_id)
96
+ return
97
+
98
+ self._loops[loop_id].evidence.extend(evidence)
99
+ logger.debug(
100
+ "Evidence added to loop",
101
+ loop_id=loop_id,
102
+ evidence_count=len(evidence),
103
+ )
104
+
105
+ async def increment_loop_iteration(self, loop_id: str) -> None:
106
+ """Increment the iteration count for a research loop.
107
+
108
+ Args:
109
+ loop_id: Unique identifier for the loop.
110
+ """
111
+ if loop_id not in self._loops:
112
+ logger.warning("Loop not found", loop_id=loop_id)
113
+ return
114
+
115
+ self._loops[loop_id].iteration_count += 1
116
+ logger.debug(
117
+ "Iteration incremented",
118
+ loop_id=loop_id,
119
+ iteration=self._loops[loop_id].iteration_count,
120
+ )
121
+
122
+ async def run_loops_parallel(
123
+ self,
124
+ loop_configs: list[dict[str, Any]],
125
+ loop_func: Callable[[dict[str, Any]], Any],
126
+ judge_handler: Any | None = None,
127
+ budget_tracker: Any | None = None,
128
+ ) -> list[Any]:
129
+ """Run multiple research loops in parallel.
130
+
131
+ Args:
132
+ loop_configs: List of configuration dicts, each of which must contain 'loop_id' and 'query'.
133
+ loop_func: Async function that takes a config dict and returns loop results.
134
+ judge_handler: Optional JudgeHandler for early termination based on evidence sufficiency.
135
+ budget_tracker: Optional BudgetTracker for budget enforcement.
136
+
137
+ Returns:
138
+ List of results from each loop, in the same order as loop_configs.
139
+ """
140
+ logger.info("Starting parallel loops", loop_count=len(loop_configs))
141
+
142
+ # Create loops
143
+ for config in loop_configs:
144
+ loop_id = config.get("loop_id")
145
+ query = config.get("query", "")
146
+ if loop_id:
147
+ await self.add_loop(loop_id, query)
148
+ await self.update_loop_status(loop_id, "running")
149
+
150
+ # Run loops in parallel
151
+ async def run_single_loop(config: dict[str, Any]) -> Any:
152
+ loop_id = config.get("loop_id", "unknown")
153
+ query = config.get("query", "")
154
+ try:
155
+ # Check budget before starting
156
+ if budget_tracker:
157
+ exceeded, reason = budget_tracker.check_budget(loop_id)
158
+ if exceeded:
159
+ await self.update_loop_status(loop_id, "cancelled", error=reason)
160
+ logger.warning(
161
+ "Loop cancelled due to budget", loop_id=loop_id, reason=reason
162
+ )
163
+ return None
164
+
165
+ # If loop_func supports periodic checkpoints, we could check judge here
166
+ # For now, the loop_func itself handles judge checks internally
167
+ result = await loop_func(config)
168
+
169
+ # Final check with judge if available
170
+ if judge_handler and query:
171
+ should_complete, reason = await self.check_loop_completion(
172
+ loop_id, query, judge_handler
173
+ )
174
+ if should_complete:
175
+ logger.info(
176
+ "Loop completed early based on judge assessment",
177
+ loop_id=loop_id,
178
+ reason=reason,
179
+ )
180
+
181
+ await self.update_loop_status(loop_id, "completed")
182
+ return result
183
+ except Exception as e:
184
+ error_msg = str(e)
185
+ await self.update_loop_status(loop_id, "failed", error=error_msg)
186
+ logger.error("Loop failed", loop_id=loop_id, error=error_msg)
187
+ raise
188
+
189
+ results = await asyncio.gather(
190
+ *(run_single_loop(config) for config in loop_configs),
191
+ return_exceptions=True,
192
+ )
193
+
194
+ # Log completion
195
+ completed = sum(1 for r in results if not isinstance(r, Exception))
196
+ failed = len(results) - completed
197
+ logger.info(
198
+ "Parallel loops completed",
199
+ total=len(loop_configs),
200
+ completed=completed,
201
+ failed=failed,
202
+ )
203
+
204
+ return results
205
+
206
+ async def wait_for_loops(
207
+ self, loop_ids: list[str], timeout: float | None = None
208
+ ) -> list[ResearchLoop]:
209
+ """Wait for loops to complete.
210
+
211
+ Args:
212
+ loop_ids: List of loop IDs to wait for.
213
+ timeout: Optional timeout in seconds.
214
+
215
+ Returns:
216
+ List of ResearchLoop instances (may be incomplete if timeout occurs).
217
+ """
218
+ start_time = asyncio.get_running_loop().time()
219
+
220
+ while True:
221
+ loops = [self._loops.get(loop_id) for loop_id in loop_ids]
222
+ all_complete = all(
223
+ loop and loop.status in ("completed", "failed", "cancelled") for loop in loops
224
+ )
225
+
226
+ if all_complete:
227
+ return [loop for loop in loops if loop is not None]
228
+
229
+ if timeout is not None:
230
+ elapsed = asyncio.get_running_loop().time() - start_time
231
+ if elapsed >= timeout:
232
+ logger.warning("Timeout waiting for loops", timeout=timeout)
233
+ return [loop for loop in loops if loop is not None]
234
+
235
+ await asyncio.sleep(0.1) # Small delay to avoid busy waiting
236
+
237
+ async def cancel_loop(self, loop_id: str) -> None:
238
+ """Cancel a research loop.
239
+
240
+ Args:
241
+ loop_id: Unique identifier for the loop.
242
+ """
243
+ await self.update_loop_status(loop_id, "cancelled")
244
+ logger.info("Loop cancelled", loop_id=loop_id)
245
+
246
+ async def get_all_loops(self) -> list[ResearchLoop]:
247
+ """Get all research loops.
248
+
249
+ Returns:
250
+ List of all ResearchLoop instances.
251
+ """
252
+ return list(self._loops.values())
253
+
254
+ async def sync_loop_evidence_to_state(self, loop_id: str) -> None:
255
+ """Synchronize evidence from a loop to the global state.
256
+
257
+ Args:
258
+ loop_id: Unique identifier for the loop.
259
+ """
260
+ if loop_id not in self._loops:
261
+ logger.warning("Loop not found", loop_id=loop_id)
262
+ return
263
+
264
+ loop = self._loops[loop_id]
265
+ state = get_workflow_state()
266
+ added_count = state.add_evidence(loop.evidence)
267
+ logger.debug(
268
+ "Loop evidence synced to state",
269
+ loop_id=loop_id,
270
+ evidence_count=len(loop.evidence),
271
+ added_count=added_count,
272
+ )
273
+
274
+ async def get_shared_evidence(self) -> list[Evidence]:
275
+ """Get evidence from the global state.
276
+
277
+ Returns:
278
+ List of Evidence objects from the global state.
279
+ """
280
+ state = get_workflow_state()
281
+ return state.evidence
282
+
283
+ async def get_loop_evidence(self, loop_id: str) -> list[Evidence]:
284
+ """Get evidence collected by a specific loop.
285
+
286
+ Args:
287
+ loop_id: Loop identifier.
288
+
289
+ Returns:
290
+ List of Evidence objects from the loop.
291
+ """
292
+ if loop_id not in self._loops:
293
+ return []
294
+
295
+ return self._loops[loop_id].evidence
296
+
297
+ async def check_loop_completion(
298
+ self, loop_id: str, query: str, judge_handler: Any
299
+ ) -> tuple[bool, str]:
300
+ """Check if a loop should complete using judge assessment.
301
+
302
+ Args:
303
+ loop_id: Loop identifier.
304
+ query: Research query.
305
+ judge_handler: JudgeHandler instance.
306
+
307
+ Returns:
308
+ Tuple of (should_complete: bool, reason: str).
309
+ """
310
+ evidence = await self.get_loop_evidence(loop_id)
311
+
312
+ if not evidence:
313
+ return False, "No evidence collected yet"
314
+
315
+ try:
316
+ assessment = await judge_handler.assess(query, evidence)
317
+ if assessment.sufficient:
318
+ return True, f"Judge assessment: {assessment.reasoning}"
319
+ return False, f"Judge assessment: {assessment.reasoning}"
320
+ except Exception as e:
321
+ logger.error("Judge assessment failed", error=str(e), loop_id=loop_id)
322
+ return False, f"Judge assessment failed: {e!s}"
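A sketch of driving two loops in parallel through the manager above; fake_loop is a stand-in for a real iterative research flow, and the judge/budget hooks are left at their None defaults:

import asyncio
from typing import Any

from src.middleware.workflow_manager import WorkflowManager

async def fake_loop(config: dict[str, Any]) -> str:
    await asyncio.sleep(0.1)  # stand-in for search/judge/iterate work
    return f"findings for {config['query']}"

async def main() -> None:
    manager = WorkflowManager()
    results = await manager.run_loops_parallel(
        loop_configs=[
            {"loop_id": "loop-1", "query": "topic A"},
            {"loop_id": "loop-2", "query": "topic B"},
        ],
        loop_func=fake_loop,
    )
    print(results)  # same order as loop_configs
    for loop in await manager.get_all_loops():
        print(loop.loop_id, loop.status)  # both "completed"

asyncio.run(main())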
src/orchestrator/__init__.py ADDED
@@ -0,0 +1,48 @@
1
+ """Orchestrator module for research flows and planner agent.
2
+
3
+ This module provides:
4
+ - PlannerAgent: Creates report plans with sections
5
+ - IterativeResearchFlow: Single research loop pattern
6
+ - DeepResearchFlow: Parallel research loops pattern
7
+ - GraphOrchestrator: Graph-based orchestration (Phase 4); falls back to agent chains when use_graph=False
8
+ - Protocols: SearchHandlerProtocol, JudgeHandlerProtocol (re-exported from legacy_orchestrator)
9
+ - Orchestrator: Legacy orchestrator class (re-exported from legacy_orchestrator)
10
+ """
11
+
12
+ from typing import TYPE_CHECKING
13
+
14
+ # Re-export protocols and Orchestrator from legacy_orchestrator for backward compatibility
15
+ from src.legacy_orchestrator import (
16
+ JudgeHandlerProtocol,
17
+ Orchestrator,
18
+ SearchHandlerProtocol,
19
+ )
20
+
21
+ # Type-checking-only imports; the concrete imports follow under "Public exports"
22
+ if TYPE_CHECKING:
23
+ from src.orchestrator.graph_orchestrator import GraphOrchestrator
24
+ from src.orchestrator.planner_agent import PlannerAgent, create_planner_agent
25
+ from src.orchestrator.research_flow import (
26
+ DeepResearchFlow,
27
+ IterativeResearchFlow,
28
+ )
29
+
30
+ # Public exports
31
+ from src.orchestrator.graph_orchestrator import (
32
+ GraphOrchestrator,
33
+ create_graph_orchestrator,
34
+ )
35
+ from src.orchestrator.planner_agent import PlannerAgent, create_planner_agent
36
+ from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow
37
+
38
+ __all__ = [
39
+ "DeepResearchFlow",
40
+ "GraphOrchestrator",
41
+ "IterativeResearchFlow",
42
+ "JudgeHandlerProtocol",
43
+ "Orchestrator",
44
+ "PlannerAgent",
45
+ "SearchHandlerProtocol",
46
+ "create_graph_orchestrator",
47
+ "create_planner_agent",
48
+ ]
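Downstream code consumes the orchestrator as an event stream; a sketch using the exports above (the class is constructed directly here since create_graph_orchestrator's signature is not shown in this diff):

import asyncio

from src.orchestrator import GraphOrchestrator

async def main() -> None:
    orchestrator = GraphOrchestrator(mode="auto", max_iterations=5, max_time_minutes=10)
    async for event in orchestrator.run("What is known about topic X?"):
        print(event.type, event.message[:80])

asyncio.run(main())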
src/orchestrator/graph_orchestrator.py ADDED
@@ -0,0 +1,974 @@
1
+ """Graph orchestrator for Phase 4.
2
+
3
+ Implements graph-based orchestration using Pydantic AI agents as nodes.
4
+ Supports both iterative and deep research patterns with parallel execution.
5
+ """
6
+
7
+ import asyncio
8
+ from collections.abc import AsyncGenerator, Callable
9
+ from typing import TYPE_CHECKING, Any, Literal
10
+
11
+ import structlog
12
+
13
+ from src.agent_factory.agents import (
14
+ create_input_parser_agent,
15
+ create_knowledge_gap_agent,
16
+ create_long_writer_agent,
17
+ create_planner_agent,
18
+ create_thinking_agent,
19
+ create_tool_selector_agent,
20
+ create_writer_agent,
21
+ )
22
+ from src.agent_factory.graph_builder import (
23
+ AgentNode,
24
+ DecisionNode,
25
+ ParallelNode,
26
+ ResearchGraph,
27
+ StateNode,
28
+ create_deep_graph,
29
+ create_iterative_graph,
30
+ )
31
+ from src.middleware.budget_tracker import BudgetTracker
32
+ from src.middleware.state_machine import WorkflowState, init_workflow_state
33
+ from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow
34
+ from src.utils.models import AgentEvent
35
+
36
+ if TYPE_CHECKING:
37
+ pass
38
+
39
+ logger = structlog.get_logger()
40
+
41
+
42
+ class GraphExecutionContext:
43
+ """Context for managing graph execution state."""
44
+
45
+ def __init__(self, state: WorkflowState, budget_tracker: BudgetTracker) -> None:
46
+ """Initialize execution context.
47
+
48
+ Args:
49
+ state: Current workflow state
50
+ budget_tracker: Budget tracker instance
51
+ """
52
+ self.current_node: str = ""
53
+ self.visited_nodes: set[str] = set()
54
+ self.node_results: dict[str, Any] = {}
55
+ self.state = state
56
+ self.budget_tracker = budget_tracker
57
+ self.iteration_count = 0
58
+
59
+ def set_node_result(self, node_id: str, result: Any) -> None:
60
+ """Store result from node execution.
61
+
62
+ Args:
63
+ node_id: The node ID
64
+ result: The execution result
65
+ """
66
+ self.node_results[node_id] = result
67
+
68
+ def get_node_result(self, node_id: str) -> Any:
69
+ """Get result from node execution.
70
+
71
+ Args:
72
+ node_id: The node ID
73
+
74
+ Returns:
75
+ The stored result, or None if not found
76
+ """
77
+ return self.node_results.get(node_id)
78
+
79
+ def has_visited(self, node_id: str) -> bool:
80
+ """Check if node was visited.
81
+
82
+ Args:
83
+ node_id: The node ID
84
+
85
+ Returns:
86
+ True if visited, False otherwise
87
+ """
88
+ return node_id in self.visited_nodes
89
+
90
+ def mark_visited(self, node_id: str) -> None:
91
+ """Mark node as visited.
92
+
93
+ Args:
94
+ node_id: The node ID
95
+ """
96
+ self.visited_nodes.add(node_id)
97
+
98
+ def update_state(
99
+ self, updater: Callable[[WorkflowState, Any], WorkflowState], data: Any
100
+ ) -> None:
101
+ """Update workflow state.
102
+
103
+ Args:
104
+ updater: Function to update state
105
+ data: Data to pass to updater
106
+ """
107
+ self.state = updater(self.state, data)
108
+
109
+
110
+ class GraphOrchestrator:
111
+ """
112
+ Graph orchestrator using Pydantic AI Graphs.
113
+
114
+ Executes research workflows as graphs with nodes (agents) and edges (transitions).
115
+ Supports parallel execution, conditional routing, and state management.
116
+ """
117
+
118
+ def __init__(
119
+ self,
120
+ mode: Literal["iterative", "deep", "auto"] = "auto",
121
+ max_iterations: int = 5,
122
+ max_time_minutes: int = 10,
123
+ use_graph: bool = True,
124
+ ) -> None:
125
+ """
126
+ Initialize graph orchestrator.
127
+
128
+ Args:
129
+ mode: Research mode ("iterative", "deep", or "auto" to detect)
130
+ max_iterations: Maximum iterations per loop
131
+ max_time_minutes: Maximum time per loop
132
+ use_graph: Whether to use graph execution (True) or agent chains (False)
133
+ """
134
+ self.mode = mode
135
+ self.max_iterations = max_iterations
136
+ self.max_time_minutes = max_time_minutes
137
+ self.use_graph = use_graph
138
+ self.logger = logger
139
+
140
+ # Initialize flows (for backward compatibility)
141
+ self._iterative_flow: IterativeResearchFlow | None = None
142
+ self._deep_flow: DeepResearchFlow | None = None
143
+
144
+ # Graph execution components (lazy initialization)
145
+ self._graph: ResearchGraph | None = None
146
+ self._budget_tracker: BudgetTracker | None = None
147
+
148
+ async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
149
+ """
150
+ Run the research workflow.
151
+
152
+ Args:
153
+ query: The user's research query
154
+
155
+ Yields:
156
+ AgentEvent objects for real-time UI updates
157
+ """
158
+ self.logger.info(
159
+ "Starting graph orchestrator",
160
+ query=query[:100],
161
+ mode=self.mode,
162
+ use_graph=self.use_graph,
163
+ )
164
+
165
+ yield AgentEvent(
166
+ type="started",
167
+ message=f"Starting research ({self.mode} mode): {query}",
168
+ iteration=0,
169
+ )
170
+
171
+ try:
172
+ # Determine research mode
173
+ research_mode = self.mode
174
+ if research_mode == "auto":
175
+ research_mode = await self._detect_research_mode(query)
176
+
177
+ # Use graph execution if enabled, otherwise fall back to agent chains
178
+ if self.use_graph:
179
+ async for event in self._run_with_graph(query, research_mode):
180
+ yield event
181
+ else:
182
+ async for event in self._run_with_chains(query, research_mode):
183
+ yield event
184
+
185
+ except Exception as e:
186
+ self.logger.error("Graph orchestrator failed", error=str(e), exc_info=True)
187
+ yield AgentEvent(
188
+ type="error",
189
+ message=f"Research failed: {e!s}",
190
+ iteration=0,
191
+ )
192
+
193
+ async def _run_with_graph(
194
+ self, query: str, research_mode: Literal["iterative", "deep"]
195
+ ) -> AsyncGenerator[AgentEvent, None]:
196
+ """Run workflow using graph execution.
197
+
198
+ Args:
199
+ query: The research query
200
+ research_mode: The research mode
201
+
202
+ Yields:
203
+ AgentEvent objects
204
+ """
205
+ # Initialize state and budget tracker
206
+ from src.services.embeddings import get_embedding_service
207
+
208
+ embedding_service = get_embedding_service()
209
+ state = init_workflow_state(embedding_service=embedding_service)
210
+ budget_tracker = BudgetTracker()
211
+ budget_tracker.create_budget(
212
+ loop_id="graph_execution",
213
+ tokens_limit=100000,
214
+ time_limit_seconds=self.max_time_minutes * 60,
215
+ iterations_limit=self.max_iterations,
216
+ )
217
+ budget_tracker.start_timer("graph_execution")
218
+
219
+ context = GraphExecutionContext(state, budget_tracker)
220
+
221
+ # Build graph
222
+ self._graph = await self._build_graph(research_mode)
223
+
224
+ # Execute graph
225
+ async for event in self._execute_graph(query, context):
226
+ yield event
227
+
228
+ async def _run_with_chains(
229
+ self, query: str, research_mode: Literal["iterative", "deep"]
230
+ ) -> AsyncGenerator[AgentEvent, None]:
231
+ """Run workflow using agent chains (backward compatibility).
232
+
233
+ Args:
234
+ query: The research query
235
+ research_mode: The research mode
236
+
237
+ Yields:
238
+ AgentEvent objects
239
+ """
240
+ if research_mode == "iterative":
241
+ yield AgentEvent(
242
+ type="searching",
243
+ message="Running iterative research flow...",
244
+ iteration=1,
245
+ )
246
+
247
+ if self._iterative_flow is None:
248
+ self._iterative_flow = IterativeResearchFlow(
249
+ max_iterations=self.max_iterations,
250
+ max_time_minutes=self.max_time_minutes,
251
+ )
252
+
253
+ try:
254
+ final_report = await self._iterative_flow.run(query)
255
+ except Exception as e:
256
+ self.logger.error("Iterative flow failed", error=str(e), exc_info=True)
257
+ # Yield error event - outer handler will also catch and yield error event
258
+ yield AgentEvent(
259
+ type="error",
260
+ message=f"Iterative research failed: {e!s}",
261
+ iteration=1,
262
+ )
263
+ # Re-raise so outer handler can also yield error event for consistency
264
+ raise
265
+
266
+ yield AgentEvent(
267
+ type="complete",
268
+ message=final_report,
269
+ data={"mode": "iterative"},
270
+ iteration=1,
271
+ )
272
+
273
+ elif research_mode == "deep":
274
+ yield AgentEvent(
275
+ type="searching",
276
+ message="Running deep research flow...",
277
+ iteration=1,
278
+ )
279
+
280
+ if self._deep_flow is None:
281
+ self._deep_flow = DeepResearchFlow(
282
+ max_iterations=self.max_iterations,
283
+ max_time_minutes=self.max_time_minutes,
284
+ )
285
+
286
+ try:
287
+ final_report = await self._deep_flow.run(query)
288
+ except Exception as e:
289
+ self.logger.error("Deep flow failed", error=str(e), exc_info=True)
290
+ # Yield error event before re-raising so test can capture it
291
+ yield AgentEvent(
292
+ type="error",
293
+ message=f"Deep research failed: {e!s}",
294
+ iteration=1,
295
+ )
296
+ raise
297
+
298
+ yield AgentEvent(
299
+ type="complete",
300
+ message=final_report,
301
+ data={"mode": "deep"},
302
+ iteration=1,
303
+ )
304
+
305
+ async def _build_graph(self, mode: Literal["iterative", "deep"]) -> ResearchGraph:
306
+ """Build graph for the specified mode.
307
+
308
+ Args:
309
+ mode: Research mode
310
+
311
+ Returns:
312
+ Constructed ResearchGraph
313
+ """
314
+ if mode == "iterative":
315
+ # Get agents
316
+ knowledge_gap_agent = create_knowledge_gap_agent()
317
+ tool_selector_agent = create_tool_selector_agent()
318
+ thinking_agent = create_thinking_agent()
319
+ writer_agent = create_writer_agent()
320
+
321
+ # Create graph
322
+ graph = create_iterative_graph(
323
+ knowledge_gap_agent=knowledge_gap_agent.agent,
324
+ tool_selector_agent=tool_selector_agent.agent,
325
+ thinking_agent=thinking_agent.agent,
326
+ writer_agent=writer_agent.agent,
327
+ )
328
+ else: # deep
329
+ # Get agents
330
+ planner_agent = create_planner_agent()
331
+ knowledge_gap_agent = create_knowledge_gap_agent()
332
+ tool_selector_agent = create_tool_selector_agent()
333
+ thinking_agent = create_thinking_agent()
334
+ writer_agent = create_writer_agent()
335
+ long_writer_agent = create_long_writer_agent()
336
+
337
+ # Create graph
338
+ graph = create_deep_graph(
339
+ planner_agent=planner_agent.agent,
340
+ knowledge_gap_agent=knowledge_gap_agent.agent,
341
+ tool_selector_agent=tool_selector_agent.agent,
342
+ thinking_agent=thinking_agent.agent,
343
+ writer_agent=writer_agent.agent,
344
+ long_writer_agent=long_writer_agent.agent,
345
+ )
346
+
347
+ return graph
348
+
349
+ def _emit_start_event(
350
+ self, node: Any, current_node_id: str, iteration: int, context: GraphExecutionContext
351
+ ) -> AgentEvent:
352
+ """Emit start event for a node.
353
+
354
+ Args:
355
+ node: The node being executed
356
+ current_node_id: Current node ID
357
+ iteration: Current iteration number
358
+ context: Execution context
359
+
360
+ Returns:
361
+ AgentEvent for the start of node execution
362
+ """
363
+ if node and node.node_id == "planner":
364
+ return AgentEvent(
365
+ type="searching",
366
+ message="Creating report plan...",
367
+ iteration=iteration,
368
+ )
369
+ elif node and node.node_id == "parallel_loops":
370
+ # Get report plan to show section count
371
+ report_plan = context.get_node_result("planner")
372
+ if report_plan and hasattr(report_plan, "report_outline"):
373
+ section_count = len(report_plan.report_outline)
374
+ return AgentEvent(
375
+ type="looping",
376
+ message=f"Running parallel research loops for {section_count} sections...",
377
+ iteration=iteration,
378
+ data={"sections": section_count},
379
+ )
380
+ return AgentEvent(
381
+ type="looping",
382
+ message="Running parallel research loops...",
383
+ iteration=iteration,
384
+ )
385
+ elif node and node.node_id == "synthesizer":
386
+ return AgentEvent(
387
+ type="synthesizing",
388
+ message="Synthesizing final report from section drafts...",
389
+ iteration=iteration,
390
+ )
391
+ return AgentEvent(
392
+ type="looping",
393
+ message=f"Executing node: {current_node_id}",
394
+ iteration=iteration,
395
+ )
396
+
397
+ def _emit_completion_event(
398
+ self, node: Any, current_node_id: str, result: Any, iteration: int
399
+ ) -> AgentEvent:
400
+ """Emit completion event for a node.
401
+
402
+ Args:
403
+ node: The node that was executed
404
+ current_node_id: Current node ID
405
+ result: Node execution result
406
+ iteration: Current iteration number
407
+
408
+ Returns:
409
+ AgentEvent for the completion of node execution
410
+ """
411
+ if not node:
412
+ return AgentEvent(
413
+ type="looping",
414
+ message=f"Completed node: {current_node_id}",
415
+ iteration=iteration,
416
+ )
417
+
418
+ if node.node_id == "planner":
419
+ if isinstance(result, dict) and "report_outline" in result:
420
+ section_count = len(result["report_outline"])
421
+ return AgentEvent(
422
+ type="search_complete",
423
+ message=f"Report plan created with {section_count} sections",
424
+ iteration=iteration,
425
+ data={"sections": section_count},
426
+ )
427
+ return AgentEvent(
428
+ type="search_complete",
429
+ message="Report plan created",
430
+ iteration=iteration,
431
+ )
432
+ elif node.node_id == "parallel_loops":
433
+ if isinstance(result, list):
434
+ return AgentEvent(
435
+ type="search_complete",
436
+ message=f"Completed parallel research for {len(result)} sections",
437
+ iteration=iteration,
438
+ data={"sections_completed": len(result)},
439
+ )
440
+ return AgentEvent(
441
+ type="search_complete",
442
+ message="Parallel research loops completed",
443
+ iteration=iteration,
444
+ )
445
+ elif node.node_id == "synthesizer":
446
+ return AgentEvent(
447
+ type="synthesizing",
448
+ message="Final report synthesis completed",
449
+ iteration=iteration,
450
+ )
451
+ return AgentEvent(
452
+ type="searching" if node.node_type == "agent" else "looping",
453
+ message=f"Completed {node.node_type} node: {current_node_id}",
454
+ iteration=iteration,
455
+ )
456
+
457
+ async def _execute_graph(
458
+ self, query: str, context: GraphExecutionContext
459
+ ) -> AsyncGenerator[AgentEvent, None]:
460
+ """Execute the graph from entry node.
461
+
462
+ Args:
463
+ query: The research query
464
+ context: Execution context
465
+
466
+ Yields:
467
+ AgentEvent objects
468
+ """
469
+ if not self._graph:
470
+ raise ValueError("Graph not built")
471
+
472
+ current_node_id = self._graph.entry_node
473
+ iteration = 0
474
+
475
+ while current_node_id and current_node_id not in self._graph.exit_nodes:
476
+ # Check budget
477
+ if not context.budget_tracker.can_continue("graph_execution"):
478
+ self.logger.warning("Budget exceeded, exiting graph execution")
479
+ break
480
+
481
+ # Execute current node
482
+ iteration += 1
483
+ context.current_node = current_node_id
484
+ node = self._graph.get_node(current_node_id)
485
+
486
+ # Emit start event
487
+ yield self._emit_start_event(node, current_node_id, iteration, context)
488
+
489
+ try:
490
+ result = await self._execute_node(current_node_id, query, context)
491
+ context.set_node_result(current_node_id, result)
492
+ context.mark_visited(current_node_id)
493
+
494
+ # Yield completion event
495
+ yield self._emit_completion_event(node, current_node_id, result, iteration)
496
+
497
+ except Exception as e:
498
+ self.logger.error("Node execution failed", node_id=current_node_id, error=str(e))
499
+ yield AgentEvent(
500
+ type="error",
501
+ message=f"Node {current_node_id} failed: {e!s}",
502
+ iteration=iteration,
503
+ )
504
+ break
505
+
506
+ # Get next node(s)
507
+ next_nodes = self._get_next_node(current_node_id, context)
508
+
509
+ if not next_nodes:
510
+ # No more nodes, check if we're at exit
511
+ if current_node_id in self._graph.exit_nodes:
512
+ break
513
+ # Otherwise, we've reached a dead end
514
+ self.logger.warning("Reached dead end in graph", node_id=current_node_id)
515
+ break
516
+
517
+ current_node_id = next_nodes[0] # For now, take first next node (handle parallel later)
518
+
519
+ # Final event
520
+ final_result = context.get_node_result(current_node_id) if current_node_id else None
521
+ yield AgentEvent(
522
+ type="complete",
523
+ message=final_result if isinstance(final_result, str) else "Research completed",
524
+ data={"mode": self.mode, "iterations": iteration},
525
+ iteration=iteration,
526
+ )
527
+
528
+ async def _execute_node(self, node_id: str, query: str, context: GraphExecutionContext) -> Any:
529
+ """Execute a single node.
530
+
531
+ Args:
532
+ node_id: The node ID
533
+ query: The research query
534
+ context: Execution context
535
+
536
+ Returns:
537
+ Node execution result
538
+ """
539
+ if not self._graph:
540
+ raise ValueError("Graph not built")
541
+
542
+ node = self._graph.get_node(node_id)
543
+ if not node:
544
+ raise ValueError(f"Node {node_id} not found")
545
+
546
+ if isinstance(node, AgentNode):
547
+ return await self._execute_agent_node(node, query, context)
548
+ elif isinstance(node, StateNode):
549
+ return await self._execute_state_node(node, query, context)
550
+ elif isinstance(node, DecisionNode):
551
+ return await self._execute_decision_node(node, query, context)
552
+ elif isinstance(node, ParallelNode):
553
+ return await self._execute_parallel_node(node, query, context)
554
+ else:
555
+ raise ValueError(f"Unknown node type: {type(node)}")
556
+
557
+ async def _execute_agent_node(
558
+ self, node: AgentNode, query: str, context: GraphExecutionContext
559
+ ) -> Any:
560
+ """Execute an agent node.
561
+
562
+ Special handling for deep research nodes:
563
+ - "planner": Takes query string, returns ReportPlan
564
+ - "synthesizer": Takes query + ReportPlan + section drafts, returns final report
565
+
566
+ Args:
567
+ node: The agent node
568
+ query: The research query
569
+ context: Execution context
570
+
571
+ Returns:
572
+ Agent execution result
573
+ """
574
+ # Special handling for synthesizer node
575
+ if node.node_id == "synthesizer":
576
+ # Call LongWriterAgent.write_report() directly instead of using agent.run()
577
+ from src.agent_factory.agents import create_long_writer_agent
578
+ from src.utils.models import ReportDraft, ReportDraftSection, ReportPlan
579
+
580
+ report_plan = context.get_node_result("planner")
581
+ section_drafts = context.get_node_result("parallel_loops") or []
582
+
583
+ if not isinstance(report_plan, ReportPlan):
584
+ raise ValueError("ReportPlan not found for synthesizer")
585
+
586
+ if not section_drafts:
587
+ raise ValueError("Section drafts not found for synthesizer")
588
+
589
+ # Create ReportDraft from section drafts
590
+ report_draft = ReportDraft(
591
+ sections=[
592
+ ReportDraftSection(
593
+ section_title=section.title,
594
+ section_content=draft,
595
+ )
596
+ for section, draft in zip(
597
+ report_plan.report_outline, section_drafts, strict=False
598
+ )
599
+ ]
600
+ )
601
+
602
+ # Get LongWriterAgent instance and call write_report directly
603
+ long_writer_agent = create_long_writer_agent()
604
+ final_report = await long_writer_agent.write_report(
605
+ original_query=query,
606
+ report_title=report_plan.report_title,
607
+ report_draft=report_draft,
608
+ )
609
+
610
+ # Estimate tokens (rough estimate)
611
+ estimated_tokens = len(final_report) // 4 # Rough token estimate
612
+ context.budget_tracker.add_tokens("graph_execution", estimated_tokens)
613
+
614
+ return final_report
615
+
616
+ # Standard agent execution
617
+ # Prepare input based on node type
618
+ if node.node_id == "planner":
619
+ # Planner takes the original query
620
+ input_data = query
621
+ else:
622
+ # Standard: use previous node result or query
623
+ prev_result = context.get_node_result(context.current_node)
624
+ input_data = prev_result if prev_result is not None else query
625
+
626
+ # Apply input transformer if provided
627
+ if node.input_transformer:
628
+ input_data = node.input_transformer(input_data)
629
+
630
+ # Execute agent
631
+ result = await node.agent.run(input_data)
632
+
633
+ # Transform output if needed
634
+ output = result.output
635
+ if node.output_transformer:
636
+ output = node.output_transformer(output)
637
+
638
+ # Estimate and track tokens
639
+ if hasattr(result, "usage") and result.usage:
640
+ tokens = result.usage.total_tokens if hasattr(result.usage, "total_tokens") else 0
641
+ context.budget_tracker.add_tokens("graph_execution", tokens)
642
+
643
+ return output
644
+
645
+ async def _execute_state_node(
646
+ self, node: StateNode, query: str, context: GraphExecutionContext
647
+ ) -> Any:
648
+ """Execute a state node.
649
+
650
+ Special handling for deep research state nodes:
651
+ - "store_plan": Stores ReportPlan in context for parallel loops
652
+ - "collect_drafts": Stores section drafts in context for synthesizer
653
+
654
+ Args:
655
+ node: The state node
656
+ query: The research query
657
+ context: Execution context
658
+
659
+ Returns:
660
+ State update result
661
+ """
662
+ # Get previous result for state update
663
+ # For "store_plan", get from planner node
664
+ # For "collect_drafts", get from parallel_loops node
665
+ if node.node_id == "store_plan":
666
+ prev_result = context.get_node_result("planner")
667
+ elif node.node_id == "collect_drafts":
668
+ prev_result = context.get_node_result("parallel_loops")
669
+ else:
670
+ prev_result = context.get_node_result(context.current_node)
671
+
672
+ # Update state
673
+ updated_state = node.state_updater(context.state, prev_result)
674
+ context.state = updated_state
675
+
676
+ # Store result in context for next nodes to access
677
+ context.set_node_result(node.node_id, prev_result)
678
+
679
+ # Read state if needed
680
+ if node.state_reader:
681
+ return node.state_reader(context.state)
682
+
683
+ return prev_result # Return the stored result for next nodes
684
+
685
+ async def _execute_decision_node(
686
+ self, node: DecisionNode, query: str, context: GraphExecutionContext
687
+ ) -> str:
688
+ """Execute a decision node.
689
+
690
+ Args:
691
+ node: The decision node
692
+ query: The research query
693
+ context: Execution context
694
+
695
+ Returns:
696
+ Next node ID
697
+ """
698
+ # Get previous result for decision
699
+ prev_result = context.get_node_result(context.current_node)
700
+
701
+ # Make decision
702
+ next_node_id = node.decision_function(prev_result)
703
+
704
+ # Validate decision
705
+ if next_node_id not in node.options:
706
+ self.logger.warning(
707
+ "Decision function returned invalid node",
708
+ node_id=node.node_id,
709
+ returned=next_node_id,
710
+ options=node.options,
711
+ )
712
+ # Default to first option
713
+ next_node_id = node.options[0]
714
+
715
+ return next_node_id
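A minimal sketch of a decision_function compatible with the validation above: it receives the previous node's result and must return a member of node.options, otherwise the orchestrator falls back to the first option. Node names here are hypothetical.

```python
from typing import Any


def route_on_sufficiency(prev_result: Any) -> str:
    # Illustrative: route to the synthesizer once the previous node reports
    # sufficiency, otherwise loop back to search.
    if getattr(prev_result, "sufficient", False):
        return "synthesizer"
    return "search"
```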
716
+
717
+ async def _execute_parallel_node(
718
+ self, node: ParallelNode, query: str, context: GraphExecutionContext
719
+ ) -> list[Any]:
720
+ """Execute a parallel node.
721
+
722
+ Special handling for deep research "parallel_loops" node:
723
+ - Extracts report plan from previous node result
724
+ - Creates IterativeResearchFlow instances for each section
725
+ - Executes them in parallel
726
+ - Returns section drafts
727
+
728
+ Args:
729
+ node: The parallel node
730
+ query: The research query
731
+ context: Execution context
732
+
733
+ Returns:
734
+ List of results from parallel nodes
735
+ """
736
+ # Special handling for deep research parallel_loops node
737
+ if node.node_id == "parallel_loops":
738
+ return await self._execute_deep_research_parallel_loops(node, query, context)
739
+
740
+ # Standard parallel node execution
741
+ # Execute all parallel nodes concurrently
742
+ tasks = [
743
+ self._execute_node(parallel_node_id, query, context)
744
+ for parallel_node_id in node.parallel_nodes
745
+ ]
746
+
747
+ results = await asyncio.gather(*tasks, return_exceptions=True)
748
+
749
+ # Handle exceptions
750
+ for i, result in enumerate(results):
751
+ if isinstance(result, Exception):
752
+ self.logger.error(
753
+ "Parallel node execution failed",
754
+ node_id=node.parallel_nodes[i] if i < len(node.parallel_nodes) else "unknown",
755
+ error=str(result),
756
+ )
757
+ results[i] = None
758
+
759
+ # Aggregate if needed
760
+ if node.aggregator:
761
+ aggregated = node.aggregator(results)
762
+ # Type cast: aggregator returns Any, but we expect list[Any]
763
+ return list(aggregated) if isinstance(aggregated, list) else [aggregated]
764
+
765
+ return results
766
+
767
+ async def _execute_deep_research_parallel_loops(
768
+ self, node: ParallelNode, query: str, context: GraphExecutionContext
769
+ ) -> list[str]:
770
+ """Execute parallel iterative research loops for deep research.
771
+
772
+ Args:
773
+ node: The parallel node (should be "parallel_loops")
774
+ query: The research query
775
+ context: Execution context
776
+
777
+ Returns:
778
+ List of section draft strings
779
+ """
780
+ from src.agent_factory.judges import create_judge_handler
781
+ from src.orchestrator.research_flow import IterativeResearchFlow
782
+ from src.utils.models import ReportPlan
783
+
784
+ # Get report plan from previous node (store_plan)
785
+ # The plan should be stored in context.node_results from the planner node
786
+ planner_result = context.get_node_result("planner")
787
+ if not isinstance(planner_result, ReportPlan):
788
+ self.logger.error(
789
+ "Planner result is not a ReportPlan",
790
+ type=type(planner_result),
791
+ )
792
+ raise ValueError("Planner must return ReportPlan for deep research")
793
+
794
+ report_plan: ReportPlan = planner_result
795
+ self.logger.info(
796
+ "Executing parallel loops for deep research",
797
+ sections=len(report_plan.report_outline),
798
+ )
799
+
800
+ # Create judge handler for iterative flows
801
+ judge_handler = create_judge_handler()
802
+
803
+ # Create and execute iterative research flows for each section
804
+ async def run_section_research(section_index: int) -> str:
805
+ """Run iterative research for a single section."""
806
+ section = report_plan.report_outline[section_index]
807
+
808
+ try:
809
+ # Create iterative research flow
810
+ flow = IterativeResearchFlow(
811
+ max_iterations=self.max_iterations,
812
+ max_time_minutes=self.max_time_minutes,
813
+ verbose=False, # Less verbose in parallel execution
814
+ use_graph=False, # Use agent chains for section research
815
+ judge_handler=judge_handler,
816
+ )
817
+
818
+ # Run research for this section
819
+ section_draft = await flow.run(
820
+ query=section.key_question,
821
+ background_context=report_plan.background_context,
822
+ )
823
+
824
+ self.logger.info(
825
+ "Section research completed",
826
+ section_index=section_index,
827
+ section_title=section.title,
828
+ draft_length=len(section_draft),
829
+ )
830
+
831
+ return section_draft
832
+
833
+ except Exception as e:
834
+ self.logger.error(
835
+ "Section research failed",
836
+ section_index=section_index,
837
+ section_title=section.title,
838
+ error=str(e),
839
+ )
840
+ # Return a placeholder draft that records the failure for this section
841
+ return f"# {section.title}\n\n[Research failed: {e!s}]"
842
+
843
+ # Execute all sections in parallel
844
+ section_drafts = await asyncio.gather(
845
+ *(run_section_research(i) for i in range(len(report_plan.report_outline))),
846
+ return_exceptions=True,
847
+ )
848
+
849
+ # Handle exceptions and filter None results
850
+ filtered_drafts: list[str] = []
851
+ for i, draft in enumerate(section_drafts):
852
+ if isinstance(draft, Exception):
853
+ self.logger.error(
854
+ "Section research exception",
855
+ section_index=i,
856
+ error=str(draft),
857
+ )
858
+ filtered_drafts.append(
859
+ f"# {report_plan.report_outline[i].title}\n\n[Research failed: {draft!s}]"
860
+ )
861
+ elif draft is not None:
862
+ # Type narrowing: after the Exception check, draft must be a str
863
+ assert isinstance(draft, str), "Expected str after Exception check"
864
+ filtered_drafts.append(draft)
865
+
866
+ self.logger.info(
867
+ "Parallel loops completed",
868
+ sections=len(filtered_drafts),
869
+ total_sections=len(report_plan.report_outline),
870
+ )
871
+
872
+ return filtered_drafts
873
+
874
+ def _get_next_node(self, node_id: str, context: GraphExecutionContext) -> list[str]:
875
+ """Get next node(s) from current node.
876
+
877
+ Args:
878
+ node_id: Current node ID
879
+ context: Execution context
880
+
881
+ Returns:
882
+ List of next node IDs
883
+ """
884
+ if not self._graph:
885
+ return []
886
+
887
+ # Get node result for condition evaluation
888
+ node_result = context.get_node_result(node_id)
889
+
890
+ # Get next nodes
891
+ next_nodes = self._graph.get_next_nodes(node_id, context=node_result)
892
+
893
+ # If this was a decision node, use its result
894
+ node = self._graph.get_node(node_id)
895
+ if isinstance(node, DecisionNode):
896
+ decision_result = node_result
897
+ if isinstance(decision_result, str):
898
+ return [decision_result]
899
+
900
+ # Return next node IDs
901
+ return [next_node_id for next_node_id, _ in next_nodes]
902
+
903
+ async def _detect_research_mode(self, query: str) -> Literal["iterative", "deep"]:
904
+ """
905
+ Detect research mode from query using input parser agent.
906
+
907
+ Uses input parser agent to analyze query and determine research mode.
908
+ Falls back to heuristic if parser fails.
909
+
910
+ Args:
911
+ query: The research query
912
+
913
+ Returns:
914
+ Detected research mode
915
+ """
916
+ try:
917
+ # Use input parser agent for intelligent mode detection
918
+ input_parser = create_input_parser_agent()
919
+ parsed_query = await input_parser.parse(query)
920
+ self.logger.info(
921
+ "Research mode detected by input parser",
922
+ mode=parsed_query.research_mode,
923
+ query=query[:100],
924
+ )
925
+ return parsed_query.research_mode
926
+ except Exception as e:
927
+ # Fallback to heuristic if parser fails
928
+ self.logger.warning(
929
+ "Input parser failed, using heuristic",
930
+ error=str(e),
931
+ query=query[:100],
932
+ )
933
+ query_lower = query.lower()
934
+ if any(
935
+ keyword in query_lower
936
+ for keyword in [
937
+ "section",
938
+ "sections",
939
+ "report",
940
+ "outline",
941
+ "structure",
942
+ "comprehensive",
943
+ "analyze",
944
+ "analysis",
945
+ ]
946
+ ):
947
+ return "deep"
948
+ return "iterative"
949
+
950
+
951
+ def create_graph_orchestrator(
952
+ mode: Literal["iterative", "deep", "auto"] = "auto",
953
+ max_iterations: int = 5,
954
+ max_time_minutes: int = 10,
955
+ use_graph: bool = True,
956
+ ) -> GraphOrchestrator:
957
+ """
958
+ Factory function to create a graph orchestrator.
959
+
960
+ Args:
961
+ mode: Research mode
962
+ max_iterations: Maximum iterations per loop
963
+ max_time_minutes: Maximum time per loop
964
+ use_graph: Whether to use graph execution (True) or agent chains (False)
965
+
966
+ Returns:
967
+ Configured GraphOrchestrator instance
968
+ """
969
+ return GraphOrchestrator(
970
+ mode=mode,
971
+ max_iterations=max_iterations,
972
+ max_time_minutes=max_time_minutes,
973
+ use_graph=use_graph,
974
+ )
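A minimal usage sketch for the factory above, assuming it is importable from src.orchestrator.graph_orchestrator; the events are consumed the same way _run_with_graph in research_flow.py consumes them (via .type and .message):

```python
import asyncio

from src.orchestrator.graph_orchestrator import create_graph_orchestrator


async def main() -> None:
    # "auto" defers to _detect_research_mode; run() yields events until a
    # "complete" (or "error") event arrives.
    orchestrator = create_graph_orchestrator(mode="auto", max_iterations=3)
    async for event in orchestrator.run("What are current treatments for ALS?"):
        if event.type == "complete":
            print(event.message)
            break


asyncio.run(main())
```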
src/orchestrator/planner_agent.py ADDED
@@ -0,0 +1,184 @@
1
+ """Planner agent for creating report plans with sections and background context.
2
+
3
+ Converts the folder/planner_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ from datetime import datetime
7
+ from typing import Any
8
+
9
+ import structlog
10
+ from pydantic_ai import Agent
11
+
12
+ from src.agent_factory.judges import get_model
13
+ from src.tools.crawl_adapter import crawl_website
14
+ from src.tools.web_search_adapter import web_search
15
+ from src.utils.exceptions import ConfigurationError, JudgeError
16
+ from src.utils.models import ReportPlan, ReportPlanSection
17
+
18
+ logger = structlog.get_logger()
19
+
20
+
21
+ # System prompt for the planner agent
22
+ SYSTEM_PROMPT = f"""
23
+ You are a research manager, managing a team of research agents. Today's date is {datetime.now().strftime("%Y-%m-%d")}.
24
+ Given a research query, your job is to produce an initial outline of the report (section titles and key questions),
25
+ as well as some background context. Each section will be assigned to a different researcher in your team who will then
26
+ carry out research on the section.
27
+
28
+ You will be given:
29
+ - An initial research query
30
+
31
+ Your task is to:
32
+ 1. Produce 1-2 paragraphs of initial background context (if needed) on the query by running web searches or crawling websites
33
+ 2. Produce an outline of the report that includes a list of section titles and the key question to be addressed in each section
34
+ 3. Provide a title for the report that will be used as the main heading
35
+
36
+ Guidelines:
37
+ - Each section should cover a single topic/question that is independent of other sections
38
+ - The key question for each section should include both the NAME and DOMAIN NAME / WEBSITE (if available and applicable) if it is related to a company, product or similar
39
+ - The background_context should not be more than 2 paragraphs
40
+ - The background_context should be very specific to the query and include any information that is relevant for researchers across all sections of the report
41
+ - The background_context should be drawn only from web search or crawl results rather than prior knowledge (i.e. it should only be included if you have called tools)
42
+ - For example, if the query is about a company, the background context should include some basic information about what the company does
43
+ - DO NOT do more than 2 tool calls
44
+
45
+ Only output JSON. Follow the JSON schema for ReportPlan. Do not output anything else.
46
+ """
47
+
48
+
49
+ class PlannerAgent:
50
+ """
51
+ Planner agent that creates report plans with sections and background context.
52
+
53
+ Uses Pydantic AI to generate structured ReportPlan output with optional
54
+ web search and crawl tool usage for background context.
55
+ """
56
+
57
+ def __init__(
58
+ self,
59
+ model: Any | None = None,
60
+ web_search_tool: Any | None = None,
61
+ crawl_tool: Any | None = None,
62
+ ) -> None:
63
+ """
64
+ Initialize the planner agent.
65
+
66
+ Args:
67
+ model: Optional Pydantic AI model. If None, uses config default.
68
+ web_search_tool: Optional web search tool function. If None, uses default.
69
+ crawl_tool: Optional crawl tool function. If None, uses default.
70
+ """
71
+ self.model = model or get_model()
72
+ self.web_search_tool = web_search_tool or web_search
73
+ self.crawl_tool = crawl_tool or crawl_website
74
+ self.logger = logger
75
+
76
+ # Validate tools are callable
77
+ if not callable(self.web_search_tool):
78
+ raise ConfigurationError("web_search_tool must be callable")
79
+ if not callable(self.crawl_tool):
80
+ raise ConfigurationError("crawl_tool must be callable")
81
+
82
+ # Initialize Pydantic AI Agent
83
+ self.agent = Agent(
84
+ model=self.model,
85
+ output_type=ReportPlan,
86
+ system_prompt=SYSTEM_PROMPT,
87
+ tools=[self.web_search_tool, self.crawl_tool],
88
+ retries=3,
89
+ )
90
+
91
+ async def run(self, query: str) -> ReportPlan:
92
+ """
93
+ Run the planner agent to generate a report plan.
94
+
95
+ Args:
96
+ query: The user's research query
97
+
98
+ Returns:
99
+ ReportPlan with sections, background context, and report title
100
+
101
+ Raises:
102
+ JudgeError: If planning fails after retries
103
+ ConfigurationError: If agent configuration is invalid
104
+ """
105
+ self.logger.info("Starting report planning", query=query[:100])
106
+
107
+ user_message = f"QUERY: {query}"
108
+
109
+ try:
110
+ # Run the agent
111
+ result = await self.agent.run(user_message)
112
+ report_plan = result.output
113
+
114
+ # Validate report plan
115
+ if not report_plan.report_outline:
116
+ self.logger.warning("Report plan has no sections", query=query[:100])
117
+ # Return fallback plan instead of raising error
118
+ return ReportPlan(
119
+ background_context=report_plan.background_context or "",
120
+ report_outline=[
121
+ ReportPlanSection(
122
+ title="Overview",
123
+ key_question=query,
124
+ )
125
+ ],
126
+ report_title=report_plan.report_title or f"Research Report: {query[:50]}",
127
+ )
128
+
129
+ if not report_plan.report_title:
130
+ self.logger.warning("Report plan has no title", query=query[:100])
131
+ raise JudgeError("Report plan must have a title")
132
+
133
+ self.logger.info(
134
+ "Report plan created",
135
+ sections=len(report_plan.report_outline),
136
+ has_background=bool(report_plan.background_context),
137
+ )
138
+
139
+ return report_plan
140
+
141
+ except Exception as e:
142
+ self.logger.error("Planning failed", error=str(e), query=query[:100])
143
+
144
+ # Fallback: return minimal report plan
145
+ if isinstance(e, JudgeError | ConfigurationError):
146
+ raise
147
+
148
+ # For other errors, return a minimal plan
149
+ return ReportPlan(
150
+ background_context="",
151
+ report_outline=[
152
+ ReportPlanSection(
153
+ title="Research Findings",
154
+ key_question=query,
155
+ )
156
+ ],
157
+ report_title=f"Research Report: {query[:50]}",
158
+ )
159
+
160
+
161
+ def create_planner_agent(model: Any | None = None) -> PlannerAgent:
162
+ """
163
+ Factory function to create a planner agent.
164
+
165
+ Args:
166
+ model: Optional Pydantic AI model. If None, uses settings default.
167
+
168
+ Returns:
169
+ Configured PlannerAgent instance
170
+
171
+ Raises:
172
+ ConfigurationError: If required API keys are missing
173
+ """
174
+ try:
175
+ # Get model from settings if not provided
176
+ if model is None:
177
+ model = get_model()
178
+
179
+ # Create and return planner agent
180
+ return PlannerAgent(model=model)
181
+
182
+ except Exception as e:
183
+ logger.error("Failed to create planner agent", error=str(e))
184
+ raise ConfigurationError(f"Failed to create planner agent: {e}") from e
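A minimal sketch of driving the planner directly; field names follow the ReportPlan/ReportPlanSection usage in this module, and the query is illustrative:

```python
import asyncio

from src.orchestrator.planner_agent import create_planner_agent


async def main() -> None:
    planner = create_planner_agent()  # raises ConfigurationError if the model cannot be configured
    plan = await planner.run("GLP-1 agonists and cardiovascular outcomes")
    print(plan.report_title)
    for section in plan.report_outline:
        print(f"- {section.title}: {section.key_question}")


asyncio.run(main())
```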
src/orchestrator/research_flow.py ADDED
@@ -0,0 +1,999 @@
1
+ """Research flow implementations for iterative and deep research patterns.
2
+
3
+ Converts the folder/iterative_research.py and folder/deep_research.py
4
+ implementations to use Pydantic AI agents.
5
+ """
6
+
7
+ import asyncio
8
+ import time
9
+ from typing import Any
10
+
11
+ import structlog
12
+
13
+ from src.agent_factory.agents import (
14
+ create_graph_orchestrator,
15
+ create_knowledge_gap_agent,
16
+ create_long_writer_agent,
17
+ create_planner_agent,
18
+ create_proofreader_agent,
19
+ create_thinking_agent,
20
+ create_tool_selector_agent,
21
+ create_writer_agent,
22
+ )
23
+ from src.agent_factory.judges import create_judge_handler
24
+ from src.middleware.budget_tracker import BudgetTracker
25
+ from src.middleware.state_machine import get_workflow_state, init_workflow_state
26
+ from src.middleware.workflow_manager import WorkflowManager
27
+ from src.services.llamaindex_rag import LlamaIndexRAGService, get_rag_service
28
+ from src.tools.tool_executor import execute_tool_tasks
29
+ from src.utils.exceptions import ConfigurationError
30
+ from src.utils.models import (
31
+ AgentSelectionPlan,
32
+ AgentTask,
33
+ Citation,
34
+ Conversation,
35
+ Evidence,
36
+ JudgeAssessment,
37
+ KnowledgeGapOutput,
38
+ ReportDraft,
39
+ ReportDraftSection,
40
+ ReportPlan,
41
+ SourceName,
42
+ ToolAgentOutput,
43
+ )
44
+
45
+ logger = structlog.get_logger()
46
+
47
+
48
+ class IterativeResearchFlow:
49
+ """
50
+ Iterative research flow that runs a single research loop.
51
+
52
+ Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Repeat
53
+ until research is complete or constraints are met.
54
+ """
55
+
56
+ def __init__(
57
+ self,
58
+ max_iterations: int = 5,
59
+ max_time_minutes: int = 10,
60
+ verbose: bool = True,
61
+ use_graph: bool = False,
62
+ judge_handler: Any | None = None,
63
+ ) -> None:
64
+ """
65
+ Initialize iterative research flow.
66
+
67
+ Args:
68
+ max_iterations: Maximum number of iterations
69
+ max_time_minutes: Maximum time in minutes
70
+ verbose: Whether to log progress
71
+ use_graph: Whether to use graph-based execution (True) or agent chains (False)
+ judge_handler: Optional shared judge handler; if None, a new one is created
72
+ """
73
+ self.max_iterations = max_iterations
74
+ self.max_time_minutes = max_time_minutes
75
+ self.verbose = verbose
76
+ self.use_graph = use_graph
77
+ self.logger = logger
78
+
79
+ # Initialize agents (only needed for agent chain execution)
80
+ if not use_graph:
81
+ self.knowledge_gap_agent = create_knowledge_gap_agent()
82
+ self.tool_selector_agent = create_tool_selector_agent()
83
+ self.thinking_agent = create_thinking_agent()
84
+ self.writer_agent = create_writer_agent()
85
+ # Initialize judge handler (use provided or create new)
86
+ self.judge_handler = judge_handler or create_judge_handler()
87
+
88
+ # Initialize state (only needed for agent chain execution)
89
+ if not use_graph:
90
+ self.conversation = Conversation()
91
+ self.iteration = 0
92
+ self.start_time: float | None = None
93
+ self.should_continue = True
94
+
95
+ # Initialize budget tracker
96
+ self.budget_tracker = BudgetTracker()
97
+ self.loop_id = "iterative_flow"
98
+ self.budget_tracker.create_budget(
99
+ loop_id=self.loop_id,
100
+ tokens_limit=100000,
101
+ time_limit_seconds=max_time_minutes * 60,
102
+ iterations_limit=max_iterations,
103
+ )
104
+ self.budget_tracker.start_timer(self.loop_id)
105
+
106
+ # Initialize RAG service (lazy, may be None if unavailable)
107
+ self._rag_service: LlamaIndexRAGService | None = None
108
+
109
+ # Graph orchestrator (lazy initialization)
110
+ self._graph_orchestrator: Any = None
111
+
112
+ async def run(
113
+ self,
114
+ query: str,
115
+ background_context: str = "",
116
+ output_length: str = "",
117
+ output_instructions: str = "",
118
+ ) -> str:
119
+ """
120
+ Run the iterative research flow.
121
+
122
+ Args:
123
+ query: The research query
124
+ background_context: Optional background context
125
+ output_length: Optional description of desired output length
126
+ output_instructions: Optional additional instructions
127
+
128
+ Returns:
129
+ Final report string
130
+ """
131
+ if self.use_graph:
132
+ return await self._run_with_graph(
133
+ query, background_context, output_length, output_instructions
134
+ )
135
+ else:
136
+ return await self._run_with_chains(
137
+ query, background_context, output_length, output_instructions
138
+ )
139
+
140
+ async def _run_with_chains(
141
+ self,
142
+ query: str,
143
+ background_context: str = "",
144
+ output_length: str = "",
145
+ output_instructions: str = "",
146
+ ) -> str:
147
+ """
148
+ Run the iterative research flow using agent chains.
149
+
150
+ Args:
151
+ query: The research query
152
+ background_context: Optional background context
153
+ output_length: Optional description of desired output length
154
+ output_instructions: Optional additional instructions
155
+
156
+ Returns:
157
+ Final report string
158
+ """
159
+ self.start_time = time.time()
160
+ self.logger.info("Starting iterative research (agent chains)", query=query[:100])
161
+
162
+ # Initialize conversation with first iteration
163
+ self.conversation.add_iteration()
164
+
165
+ # Main research loop
166
+ while self.should_continue and self._check_constraints():
167
+ self.iteration += 1
168
+ self.logger.info("Starting iteration", iteration=self.iteration)
169
+
170
+ # Add new iteration to conversation
171
+ self.conversation.add_iteration()
172
+
173
+ # 1. Generate observations
174
+ await self._generate_observations(query, background_context)
175
+
176
+ # 2. Evaluate gaps
177
+ evaluation = await self._evaluate_gaps(query, background_context)
178
+
179
+ # 3. Assess with judge (after tools execute, we'll assess again)
180
+ # For now, check knowledge gap evaluation
181
+ # After tool execution, we'll do a full judge assessment
182
+
183
+ # Check if research is complete (knowledge gap agent says complete)
184
+ if evaluation.research_complete:
185
+ self.should_continue = False
186
+ self.logger.info("Research marked as complete by knowledge gap agent")
187
+ break
188
+
189
+ # 4. Select tools for next gap
190
+ next_gap = evaluation.outstanding_gaps[0] if evaluation.outstanding_gaps else query
191
+ selection_plan = await self._select_agents(next_gap, query, background_context)
192
+
193
+ # 5. Execute tools
194
+ await self._execute_tools(selection_plan.tasks)
195
+
196
+ # 6. Assess evidence sufficiency with judge
197
+ judge_assessment = await self._assess_with_judge(query)
198
+
199
+ # Check if judge says evidence is sufficient
200
+ if judge_assessment.sufficient:
201
+ self.should_continue = False
202
+ self.logger.info(
203
+ "Research marked as complete by judge",
204
+ confidence=judge_assessment.confidence,
205
+ reasoning=judge_assessment.reasoning[:100],
206
+ )
207
+ break
208
+
209
+ # Update budget tracker
210
+ self.budget_tracker.increment_iteration(self.loop_id)
211
+ self.budget_tracker.update_timer(self.loop_id)
212
+
213
+ # Create final report
214
+ report = await self._create_final_report(query, output_length, output_instructions)
215
+
216
+ elapsed = time.time() - (self.start_time or time.time())
217
+ self.logger.info(
218
+ "Iterative research completed",
219
+ iterations=self.iteration,
220
+ elapsed_minutes=elapsed / 60,
221
+ )
222
+
223
+ return report
224
+
225
+ async def _run_with_graph(
226
+ self,
227
+ query: str,
228
+ background_context: str = "",
229
+ output_length: str = "",
230
+ output_instructions: str = "",
231
+ ) -> str:
232
+ """
233
+ Run the iterative research flow using graph execution.
234
+
235
+ Args:
236
+ query: The research query
237
+ background_context: Optional background context (currently ignored in graph execution)
238
+ output_length: Optional description of desired output length (currently ignored in graph execution)
239
+ output_instructions: Optional additional instructions (currently ignored in graph execution)
240
+
241
+ Returns:
242
+ Final report string
243
+ """
244
+ self.logger.info("Starting iterative research (graph execution)", query=query[:100])
245
+
246
+ # Create graph orchestrator (lazy initialization)
247
+ if self._graph_orchestrator is None:
248
+ self._graph_orchestrator = create_graph_orchestrator(
249
+ mode="iterative",
250
+ max_iterations=self.max_iterations,
251
+ max_time_minutes=self.max_time_minutes,
252
+ use_graph=True,
253
+ )
254
+
255
+ # Run orchestrator and collect events
256
+ final_report = ""
257
+ async for event in self._graph_orchestrator.run(query):
258
+ if event.type == "complete":
259
+ final_report = event.message
260
+ break
261
+ elif event.type == "error":
262
+ self.logger.error("Graph execution error", error=event.message)
263
+ raise RuntimeError(f"Graph execution failed: {event.message}")
264
+
265
+ if not final_report:
266
+ self.logger.warning("No complete event received from graph orchestrator")
267
+ final_report = "Research completed but no report was generated."
268
+
269
+ self.logger.info("Iterative research completed (graph execution)")
270
+
271
+ return final_report
272
+
273
+ def _check_constraints(self) -> bool:
274
+ """Check if we've exceeded constraints."""
275
+ if self.iteration >= self.max_iterations:
276
+ self.logger.info("Max iterations reached", max=self.max_iterations)
277
+ return False
278
+
279
+ if self.start_time:
280
+ elapsed_minutes = (time.time() - self.start_time) / 60
281
+ if elapsed_minutes >= self.max_time_minutes:
282
+ self.logger.info("Max time reached", max=self.max_time_minutes)
283
+ return False
284
+
285
+ # Check budget tracker
286
+ self.budget_tracker.update_timer(self.loop_id)
287
+ exceeded, reason = self.budget_tracker.check_budget(self.loop_id)
288
+ if exceeded:
289
+ self.logger.info("Budget exceeded", reason=reason)
290
+ return False
291
+
292
+ return True
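The three-layer guard above (iterations, wall clock, budget) mirrors the BudgetTracker setup in __init__; a small standalone sketch of that API as used in this module:

```python
from src.middleware.budget_tracker import BudgetTracker

tracker = BudgetTracker()
tracker.create_budget(
    loop_id="demo",
    tokens_limit=1_000,
    time_limit_seconds=60,
    iterations_limit=3,
)
tracker.start_timer("demo")
tracker.add_tokens("demo", 250)
tracker.increment_iteration("demo")
tracker.update_timer("demo")
exceeded, reason = tracker.check_budget("demo")  # unpacked exactly as _check_constraints does above
```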
293
+
294
+ async def _generate_observations(self, query: str, background_context: str = "") -> str:
295
+ """Generate observations from current research state."""
296
+ # Build input prompt for token estimation
297
+ conversation_history = self.conversation.compile_conversation_history()
298
+ # Build background context section separately to avoid backslash in f-string
299
+ background_section = (
300
+ f"BACKGROUND CONTEXT:\n{background_context}\n\n" if background_context else ""
301
+ )
302
+ input_prompt = f"""
303
+ You are starting iteration {self.iteration} of your research process.
304
+
305
+ ORIGINAL QUERY:
306
+ {query}
307
+
308
+ {background_section}HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
309
+ {conversation_history or "No previous actions, findings or thoughts available."}
310
+ """
311
+
312
+ observations = await self.thinking_agent.generate_observations(
313
+ query=query,
314
+ background_context=background_context,
315
+ conversation_history=conversation_history,
316
+ iteration=self.iteration,
317
+ )
318
+
319
+ # Track tokens for this iteration
320
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(input_prompt, observations)
321
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
322
+ self.logger.debug(
323
+ "Tokens tracked for thinking agent",
324
+ iteration=self.iteration,
325
+ tokens=estimated_tokens,
326
+ )
327
+
328
+ self.conversation.set_latest_thought(observations)
329
+ return observations
330
+
331
+ async def _evaluate_gaps(self, query: str, background_context: str = "") -> KnowledgeGapOutput:
332
+ """Evaluate knowledge gaps in current research."""
333
+ if self.start_time:
334
+ elapsed_minutes = (time.time() - self.start_time) / 60
335
+ else:
336
+ elapsed_minutes = 0.0
337
+
338
+ # Build input prompt for token estimation
339
+ conversation_history = self.conversation.compile_conversation_history()
340
+ background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
341
+ input_prompt = f"""
342
+ Current Iteration Number: {self.iteration}
343
+ Time Elapsed: {elapsed_minutes:.2f} minutes of maximum {self.max_time_minutes} minutes
344
+
345
+ ORIGINAL QUERY:
346
+ {query}
347
+
348
+ {background}
349
+
350
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
351
+ {conversation_history or "No previous actions, findings or thoughts available."}
352
+ """
353
+
354
+ evaluation = await self.knowledge_gap_agent.evaluate(
355
+ query=query,
356
+ background_context=background_context,
357
+ conversation_history=conversation_history,
358
+ iteration=self.iteration,
359
+ time_elapsed_minutes=elapsed_minutes,
360
+ max_time_minutes=self.max_time_minutes,
361
+ )
362
+
363
+ # Track tokens for this iteration
364
+ evaluation_text = f"research_complete={evaluation.research_complete}, gaps={len(evaluation.outstanding_gaps)}"
365
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
366
+ input_prompt, evaluation_text
367
+ )
368
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
369
+ self.logger.debug(
370
+ "Tokens tracked for knowledge gap agent",
371
+ iteration=self.iteration,
372
+ tokens=estimated_tokens,
373
+ )
374
+
375
+ if not evaluation.research_complete and evaluation.outstanding_gaps:
376
+ self.conversation.set_latest_gap(evaluation.outstanding_gaps[0])
377
+
378
+ return evaluation
379
+
380
+ async def _assess_with_judge(self, query: str) -> JudgeAssessment:
381
+ """Assess evidence sufficiency using JudgeHandler.
382
+
383
+ Args:
384
+ query: The research query
385
+
386
+ Returns:
387
+ JudgeAssessment with sufficiency evaluation
388
+ """
389
+ state = get_workflow_state()
390
+ evidence = state.evidence # Get all collected evidence
391
+
392
+ self.logger.info(
393
+ "Assessing evidence with judge",
394
+ query=query[:100],
395
+ evidence_count=len(evidence),
396
+ )
397
+
398
+ assessment = await self.judge_handler.assess(query, evidence)
399
+
400
+ # Track tokens for judge call
401
+ # Estimate tokens from query + evidence + assessment
402
+ evidence_text = "\n".join([e.content[:500] for e in evidence[:10]]) # Sample
403
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
404
+ query + evidence_text, str(assessment.reasoning)
405
+ )
406
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
407
+
408
+ self.logger.info(
409
+ "Judge assessment complete",
410
+ sufficient=assessment.sufficient,
411
+ confidence=assessment.confidence,
412
+ recommendation=assessment.recommendation,
413
+ )
414
+
415
+ return assessment
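The judge contract exercised here is small; a hypothetical termination check built on the same call and fields (the 0.7 threshold is illustrative, not part of this module):

```python
from src.utils.models import Evidence, JudgeAssessment


async def should_stop(judge, query: str, evidence: list[Evidence]) -> bool:
    # assess() returns a JudgeAssessment; sufficient/confidence gate the loop.
    assessment: JudgeAssessment = await judge.assess(query, evidence)
    return assessment.sufficient and assessment.confidence >= 0.7
```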
416
+
417
+ async def _select_agents(
418
+ self, gap: str, query: str, background_context: str = ""
419
+ ) -> AgentSelectionPlan:
420
+ """Select tools to address knowledge gap."""
421
+ # Build input prompt for token estimation
422
+ conversation_history = self.conversation.compile_conversation_history()
423
+ background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
424
+ input_prompt = f"""
425
+ ORIGINAL QUERY:
426
+ {query}
427
+
428
+ KNOWLEDGE GAP TO ADDRESS:
429
+ {gap}
430
+
431
+ {background}
432
+
433
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
434
+ {conversation_history or "No previous actions, findings or thoughts available."}
435
+ """
436
+
437
+ selection_plan = await self.tool_selector_agent.select_tools(
438
+ gap=gap,
439
+ query=query,
440
+ background_context=background_context,
441
+ conversation_history=conversation_history,
442
+ )
443
+
444
+ # Track tokens for this iteration
445
+ selection_text = f"tasks={len(selection_plan.tasks)}, agents={[task.agent for task in selection_plan.tasks]}"
446
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
447
+ input_prompt, selection_text
448
+ )
449
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
450
+ self.logger.debug(
451
+ "Tokens tracked for tool selector agent",
452
+ iteration=self.iteration,
453
+ tokens=estimated_tokens,
454
+ )
455
+
456
+ # Store tool calls in conversation
457
+ tool_calls = [
458
+ f"[Agent] {task.agent} [Query] {task.query} [Entity] {task.entity_website or 'null'}"
459
+ for task in selection_plan.tasks
460
+ ]
461
+ self.conversation.set_latest_tool_calls(tool_calls)
462
+
463
+ return selection_plan
464
+
465
+ def _get_rag_service(self) -> LlamaIndexRAGService | None:
466
+ """
467
+ Get or create RAG service instance.
468
+
469
+ Returns:
470
+ RAG service instance, or None if unavailable
471
+ """
472
+ if self._rag_service is None:
473
+ try:
474
+ self._rag_service = get_rag_service()
475
+ self.logger.info("RAG service initialized for research flow")
476
+ except (ConfigurationError, ImportError) as e:
477
+ self.logger.warning(
478
+ "RAG service unavailable", error=str(e), hint="OPENAI_API_KEY required"
479
+ )
480
+ return None
481
+ return self._rag_service
482
+
483
+ async def _execute_tools(self, tasks: list[AgentTask]) -> dict[str, ToolAgentOutput]:
484
+ """Execute selected tools concurrently."""
485
+ try:
486
+ results = await execute_tool_tasks(tasks)
487
+ except Exception as e:
488
+ # Handle tool execution errors gracefully
489
+ self.logger.error(
490
+ "Tool execution failed",
491
+ error=str(e),
492
+ task_count=len(tasks),
493
+ exc_info=True,
494
+ )
495
+ # Return empty results to allow research flow to continue
496
+ # The flow can still generate a report based on previous iterations
497
+ results = {}
498
+
499
+ # Store findings in conversation (only if we have results)
500
+ evidence_list: list[Evidence] = []
501
+ if results:
502
+ findings = [result.output for result in results.values()]
503
+ self.conversation.set_latest_findings(findings)
504
+
505
+ # Convert tool outputs to Evidence objects and store in workflow state
506
+ evidence_list = self._convert_tool_outputs_to_evidence(results)
507
+
508
+ if evidence_list:
509
+ state = get_workflow_state()
510
+ added_count = state.add_evidence(evidence_list)
511
+ self.logger.info(
512
+ "Evidence added to workflow state",
513
+ count=added_count,
514
+ total_evidence=len(state.evidence),
515
+ )
516
+
517
+ # Ingest evidence into RAG if available (Phase 6 requirement)
518
+ rag_service = self._get_rag_service()
519
+ if rag_service is not None:
520
+ try:
521
+ # ingest_evidence is synchronous; run it in an executor so it does not block the event loop
522
+ loop = asyncio.get_running_loop()
523
+ await loop.run_in_executor(None, rag_service.ingest_evidence, evidence_list)
524
+ self.logger.info(
525
+ "Evidence ingested into RAG",
526
+ count=len(evidence_list),
527
+ )
528
+ except Exception as e:
529
+ # Don't fail the research loop if RAG ingestion fails
530
+ self.logger.warning(
531
+ "Failed to ingest evidence into RAG",
532
+ error=str(e),
533
+ count=len(evidence_list),
534
+ )
535
+
536
+ return results
537
+
538
+ def _convert_tool_outputs_to_evidence(
539
+ self, tool_results: dict[str, ToolAgentOutput]
540
+ ) -> list[Evidence]:
541
+ """Convert ToolAgentOutput to Evidence objects.
542
+
543
+ Args:
544
+ tool_results: Dictionary of tool execution results
545
+
546
+ Returns:
547
+ List of Evidence objects
548
+ """
549
+ evidence_list = []
550
+ for key, result in tool_results.items():
551
+ # Extract URLs from sources
552
+ if result.sources:
553
+ # Create one Evidence object per source URL
554
+ for url in result.sources:
555
+ # Determine source type from URL or tool name
556
+ # Default to "web" for unknown web sources
557
+ source_type: SourceName = "web"
558
+ if "pubmed" in url.lower() or "ncbi" in url.lower():
559
+ source_type = "pubmed"
560
+ elif "clinicaltrials" in url.lower():
561
+ source_type = "clinicaltrials"
562
+ elif "europepmc" in url.lower():
563
+ source_type = "europepmc"
564
+ elif "biorxiv" in url.lower():
565
+ source_type = "biorxiv"
566
+ elif "arxiv" in url.lower() or "preprint" in url.lower():
567
+ source_type = "preprint"
568
+ # Note: "web" is now a valid SourceName for general web sources
569
+
570
+ citation = Citation(
571
+ title=f"Tool Result: {key}",
572
+ url=url,
573
+ source=source_type,
574
+ date="n.d.",
575
+ authors=[],
576
+ )
577
+ # Truncate content to reasonable length for judge (1500 chars)
578
+ content = result.output[:1500]
579
+ if len(result.output) > 1500:
580
+ content += "... [truncated]"
581
+
582
+ evidence = Evidence(
583
+ content=content,
584
+ citation=citation,
585
+ relevance=0.5, # Default relevance
586
+ )
587
+ evidence_list.append(evidence)
588
+ else:
589
+ # No URLs, create a single Evidence object with tool output
590
+ # Use a placeholder URL based on the tool name
591
+ # Determine source type from tool name
592
+ tool_source_type: SourceName = "web" # Default for unknown sources
593
+ if "RAG" in key:
594
+ tool_source_type = "rag"
595
+ elif "WebSearch" in key or "SiteCrawler" in key:
596
+ tool_source_type = "web"
597
+ # "web" is now a valid SourceName for general web sources
598
+
599
+ citation = Citation(
600
+ title=f"Tool Result: {key}",
601
+ url=f"tool://{key}",
602
+ source=tool_source_type,
603
+ date="n.d.",
604
+ authors=[],
605
+ )
606
+ content = result.output[:1500]
607
+ if len(result.output) > 1500:
608
+ content += "... [truncated]"
609
+
610
+ evidence = Evidence(
611
+ content=content,
612
+ citation=citation,
613
+ relevance=0.5,
614
+ )
615
+ evidence_list.append(evidence)
616
+
617
+ return evidence_list
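The URL-based source typing above reduces to a first-match classifier; an equivalent standalone version for illustration:

```python
def classify_source(url: str) -> str:
    # Mirrors the branching in _convert_tool_outputs_to_evidence.
    u = url.lower()
    if "pubmed" in u or "ncbi" in u:
        return "pubmed"
    if "clinicaltrials" in u:
        return "clinicaltrials"
    if "europepmc" in u:
        return "europepmc"
    if "biorxiv" in u:
        return "biorxiv"
    if "arxiv" in u or "preprint" in u:
        return "preprint"
    return "web"


assert classify_source("https://pubmed.ncbi.nlm.nih.gov/12345/") == "pubmed"
assert classify_source("https://example.com/article") == "web"
```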
618
+
619
+ async def _create_final_report(
620
+ self, query: str, length: str = "", instructions: str = ""
621
+ ) -> str:
622
+ """Create final report from all findings."""
623
+ all_findings = "\n\n".join(self.conversation.get_all_findings())
624
+ if not all_findings:
625
+ all_findings = "No findings available yet."
626
+
627
+ # Build input prompt for token estimation
628
+ length_str = f"* The full response should be approximately {length}.\n" if length else ""
629
+ instructions_str = f"* {instructions}" if instructions else ""
630
+ guidelines_str = (
631
+ ("\n\nGUIDELINES:\n" + length_str + instructions_str).strip("\n")
632
+ if length or instructions
633
+ else ""
634
+ )
635
+ input_prompt = f"""
636
+ Provide a response based on the query and findings below with as much detail as possible. {guidelines_str}
637
+
638
+ QUERY: {query}
639
+
640
+ FINDINGS:
641
+ {all_findings}
642
+ """
643
+
644
+ report = await self.writer_agent.write_report(
645
+ query=query,
646
+ findings=all_findings,
647
+ output_length=length,
648
+ output_instructions=instructions,
649
+ )
650
+
651
+ # Track tokens for final report (not per iteration, just total)
652
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(input_prompt, report)
653
+ self.budget_tracker.add_tokens(self.loop_id, estimated_tokens)
654
+ self.logger.debug(
655
+ "Tokens tracked for writer agent (final report)",
656
+ tokens=estimated_tokens,
657
+ )
658
+
659
+ # Note: Citation validation for markdown reports would require Evidence objects
660
+ # Currently, findings are strings, not Evidence objects. For full validation,
661
+ # consider using ResearchReport format or passing Evidence objects separately.
662
+ # See src/utils/citation_validator.py for markdown citation validation utilities.
663
+
664
+ return report
665
+
666
+
667
+ class DeepResearchFlow:
668
+ """
669
+ Deep research flow that runs parallel iterative loops per section.
670
+
671
+ Pattern: Plan → Parallel Iterative Loops (one per section) → Synthesis
672
+ """
673
+
674
+ def __init__(
675
+ self,
676
+ max_iterations: int = 5,
677
+ max_time_minutes: int = 10,
678
+ verbose: bool = True,
679
+ use_long_writer: bool = True,
680
+ use_graph: bool = False,
681
+ ) -> None:
682
+ """
683
+ Initialize deep research flow.
684
+
685
+ Args:
686
+ max_iterations: Maximum iterations per section
687
+ max_time_minutes: Maximum time per section
688
+ verbose: Whether to log progress
689
+ use_long_writer: Whether to use long writer (True) or proofreader (False)
690
+ use_graph: Whether to use graph-based execution (True) or agent chains (False)
691
+ """
692
+ self.max_iterations = max_iterations
693
+ self.max_time_minutes = max_time_minutes
694
+ self.verbose = verbose
695
+ self.use_long_writer = use_long_writer
696
+ self.use_graph = use_graph
697
+ self.logger = logger
698
+
699
+ # Initialize agents (only needed for agent chain execution)
700
+ if not use_graph:
701
+ self.planner_agent = create_planner_agent()
702
+ self.long_writer_agent = create_long_writer_agent()
703
+ self.proofreader_agent = create_proofreader_agent()
704
+ # Initialize judge handler for section loop completion
705
+ self.judge_handler = create_judge_handler()
706
+ # Initialize budget tracker for token tracking
707
+ self.budget_tracker = BudgetTracker()
708
+ self.loop_id = "deep_research_flow"
709
+ self.budget_tracker.create_budget(
710
+ loop_id=self.loop_id,
711
+ tokens_limit=200000, # Higher limit for deep research
712
+ time_limit_seconds=max_time_minutes
713
+ * 60
714
+ * 2, # Allow more time for parallel sections
715
+ iterations_limit=max_iterations * 10, # Allow for multiple sections
716
+ )
717
+ self.budget_tracker.start_timer(self.loop_id)
718
+
719
+ # Graph orchestrator (lazy initialization)
720
+ self._graph_orchestrator: Any = None
721
+
722
+ async def run(self, query: str) -> str:
723
+ """
724
+ Run the deep research flow.
725
+
726
+ Args:
727
+ query: The research query
728
+
729
+ Returns:
730
+ Final report string
731
+ """
732
+ if self.use_graph:
733
+ return await self._run_with_graph(query)
734
+ else:
735
+ return await self._run_with_chains(query)
736
+
737
+ async def _run_with_chains(self, query: str) -> str:
738
+ """
739
+ Run the deep research flow using agent chains.
740
+
741
+ Args:
742
+ query: The research query
743
+
744
+ Returns:
745
+ Final report string
746
+ """
747
+ self.logger.info("Starting deep research (agent chains)", query=query[:100])
748
+
749
+ # Initialize workflow state for deep research
750
+ try:
751
+ from src.services.embeddings import get_embedding_service
752
+
753
+ embedding_service = get_embedding_service()
754
+ except Exception:
755
+ # If embedding service is unavailable, initialize without it
756
+ embedding_service = None
757
+ self.logger.debug("Embedding service unavailable, initializing state without it")
758
+
759
+ init_workflow_state(embedding_service=embedding_service)
760
+ self.logger.debug("Workflow state initialized for deep research")
761
+
762
+ # 1. Build report plan
763
+ report_plan = await self._build_report_plan(query)
764
+ self.logger.info(
765
+ "Report plan created",
766
+ sections=len(report_plan.report_outline),
767
+ title=report_plan.report_title,
768
+ )
769
+
770
+ # 2. Run parallel research loops with state synchronization
771
+ section_drafts = await self._run_research_loops(report_plan)
772
+
773
+ # Verify state synchronization - log evidence count
774
+ state = get_workflow_state()
775
+ self.logger.info(
776
+ "State synchronization complete",
777
+ total_evidence=len(state.evidence),
778
+ sections_completed=len(section_drafts),
779
+ )
780
+
781
+ # 3. Create final report
782
+ final_report = await self._create_final_report(query, report_plan, section_drafts)
783
+
784
+ self.logger.info(
785
+ "Deep research completed",
786
+ sections=len(section_drafts),
787
+ final_report_length=len(final_report),
788
+ )
789
+
790
+ return final_report
791
+
792
+ async def _run_with_graph(self, query: str) -> str:
793
+ """
794
+ Run the deep research flow using graph execution.
795
+
796
+ Args:
797
+ query: The research query
798
+
799
+ Returns:
800
+ Final report string
801
+ """
802
+ self.logger.info("Starting deep research (graph execution)", query=query[:100])
803
+
804
+ # Create graph orchestrator (lazy initialization)
805
+ if self._graph_orchestrator is None:
806
+ self._graph_orchestrator = create_graph_orchestrator(
807
+ mode="deep",
808
+ max_iterations=self.max_iterations,
809
+ max_time_minutes=self.max_time_minutes,
810
+ use_graph=True,
811
+ )
812
+
813
+ # Run orchestrator and collect events
814
+ final_report = ""
815
+ async for event in self._graph_orchestrator.run(query):
816
+ if event.type == "complete":
817
+ final_report = event.message
818
+ break
819
+ elif event.type == "error":
820
+ self.logger.error("Graph execution error", error=event.message)
821
+ raise RuntimeError(f"Graph execution failed: {event.message}")
822
+
823
+ if not final_report:
824
+ self.logger.warning("No complete event received from graph orchestrator")
825
+ final_report = "Research completed but no report was generated."
826
+
827
+ self.logger.info("Deep research completed (graph execution)")
828
+
829
+ return final_report
830
+
831
+ async def _build_report_plan(self, query: str) -> ReportPlan:
832
+ """Build the initial report plan."""
833
+ self.logger.info("Building report plan")
834
+
835
+ # Build input prompt for token estimation
836
+ input_prompt = f"QUERY: {query}"
837
+
838
+ report_plan = await self.planner_agent.run(query)
839
+
840
+ # Track tokens for planner agent
841
+ if not self.use_graph and hasattr(self, "budget_tracker"):
842
+ plan_text = (
843
+ f"title={report_plan.report_title}, sections={len(report_plan.report_outline)}"
844
+ )
845
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(input_prompt, plan_text)
846
+ self.budget_tracker.add_tokens(self.loop_id, estimated_tokens)
847
+ self.logger.debug(
848
+ "Tokens tracked for planner agent",
849
+ tokens=estimated_tokens,
850
+ )
851
+
852
+ self.logger.info(
853
+ "Report plan created",
854
+ sections=len(report_plan.report_outline),
855
+ has_background=bool(report_plan.background_context),
856
+ )
857
+
858
+ return report_plan
859
+
860
+ async def _run_research_loops(self, report_plan: ReportPlan) -> list[str]:
861
+ """Run parallel iterative research loops for each section."""
862
+ self.logger.info("Running research loops", sections=len(report_plan.report_outline))
863
+
864
+ # Create workflow manager for parallel execution
865
+ workflow_manager = WorkflowManager()
866
+
867
+ # Create loop configurations
868
+ loop_configs = [
869
+ {
870
+ "loop_id": f"section_{i}",
871
+ "query": section.key_question,
872
+ "section_title": section.title,
873
+ "background_context": report_plan.background_context,
874
+ }
875
+ for i, section in enumerate(report_plan.report_outline)
876
+ ]
877
+
878
+ async def run_research_for_section(config: dict[str, Any]) -> str:
879
+ """Run iterative research for a single section."""
880
+ loop_id = config.get("loop_id", "unknown")
881
+ query = config.get("query", "")
882
+ background_context = config.get("background_context", "")
883
+
884
+ try:
885
+ # Update loop status
886
+ await workflow_manager.update_loop_status(loop_id, "running")
887
+
888
+ # Create iterative research flow
889
+ flow = IterativeResearchFlow(
890
+ max_iterations=self.max_iterations,
891
+ max_time_minutes=self.max_time_minutes,
892
+ verbose=self.verbose,
893
+ use_graph=self.use_graph,
894
+ judge_handler=self.judge_handler if not self.use_graph else None,
895
+ )
896
+
897
+ # Run research
898
+ result = await flow.run(
899
+ query=query,
900
+ background_context=background_context,
901
+ )
902
+
903
+ # Sync evidence from flow to loop
904
+ state = get_workflow_state()
905
+ if state.evidence:
906
+ await workflow_manager.add_loop_evidence(loop_id, state.evidence)
907
+
908
+ # Update loop status
909
+ await workflow_manager.update_loop_status(loop_id, "completed")
910
+
911
+ return result
912
+
913
+ except Exception as e:
914
+ error_msg = str(e)
915
+ await workflow_manager.update_loop_status(loop_id, "failed", error=error_msg)
916
+ self.logger.error(
917
+ "Section research failed",
918
+ loop_id=loop_id,
919
+ error=error_msg,
920
+ )
921
+ raise
922
+
923
+ # Run all sections in parallel using workflow manager
924
+ section_drafts = await workflow_manager.run_loops_parallel(
925
+ loop_configs=loop_configs,
926
+ loop_func=run_research_for_section,
927
+ judge_handler=self.judge_handler if not self.use_graph else None,
928
+ budget_tracker=self.budget_tracker if not self.use_graph else None,
929
+ )
930
+
931
+ # Sync evidence from all loops to global state
932
+ for config in loop_configs:
933
+ loop_id = config.get("loop_id")
934
+ if loop_id:
935
+ await workflow_manager.sync_loop_evidence_to_state(loop_id)
936
+
937
+ # Filter out None results (failed loops)
938
+ section_drafts = [draft for draft in section_drafts if draft is not None]
939
+
940
+ self.logger.info(
941
+ "Research loops completed",
942
+ drafts=len(section_drafts),
943
+ total_sections=len(report_plan.report_outline),
944
+ )
945
+
946
+ return section_drafts
947
+
948
+ async def _create_final_report(
949
+ self, query: str, report_plan: ReportPlan, section_drafts: list[str]
950
+ ) -> str:
951
+ """Create final report from section drafts."""
952
+ self.logger.info("Creating final report")
953
+
954
+ # Create ReportDraft from section drafts
955
+ report_draft = ReportDraft(
956
+ sections=[
957
+ ReportDraftSection(
958
+ section_title=section.title,
959
+ section_content=draft,
960
+ )
961
+ for section, draft in zip(report_plan.report_outline, section_drafts, strict=False)
962
+ ]
963
+ )
964
+
965
+ # Build input prompt for token estimation
966
+ draft_text = "\n".join(
967
+ [s.section_content[:500] for s in report_draft.sections[:5]]
968
+ ) # Sample
969
+ input_prompt = f"QUERY: {query}\nTITLE: {report_plan.report_title}\nDRAFT: {draft_text}"
970
+
971
+ if self.use_long_writer:
972
+ # Use long writer agent
973
+ final_report = await self.long_writer_agent.write_report(
974
+ original_query=query,
975
+ report_title=report_plan.report_title,
976
+ report_draft=report_draft,
977
+ )
978
+ else:
979
+ # Use proofreader agent
980
+ final_report = await self.proofreader_agent.proofread(
981
+ query=query,
982
+ report_draft=report_draft,
983
+ )
984
+
985
+ # Track tokens for final report synthesis
986
+ if not self.use_graph and hasattr(self, "budget_tracker"):
987
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
988
+ input_prompt, final_report
989
+ )
990
+ self.budget_tracker.add_tokens(self.loop_id, estimated_tokens)
991
+ self.logger.debug(
992
+ "Tokens tracked for final report synthesis",
993
+ tokens=estimated_tokens,
994
+ agent="long_writer" if self.use_long_writer else "proofreader",
995
+ )
996
+
997
+ self.logger.info("Final report created", length=len(final_report))
998
+
999
+ return final_report