# Implementation Roadmap: DeepCritical (Vertical Slices)

**Philosophy:** AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).

This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.

---

## The 2025 "Gucci" Tooling Stack

We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.

| Category | Tool | Why? |
|----------|------|------|
| **Package Manager** | **`uv`** | Rust-based; advertised as 10-100x faster than pip. Manages Python versions, venvs, and deps. |
| **Linting/Format** | **`ruff`** | Rust-based and near-instant. Replaces Black, isort, and Flake8. |
| **Type Checking** | **`mypy`** | Strict static typing. Run via `uv run mypy`. |
| **Testing** | **`pytest`** | The standard. |
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |

---

## Architecture: Vertical Slices

Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
Each slice implements a feature from **Entry Point (UI/API) -> Logic -> Data/External**.

### Directory Structure (Maintainer's Structure)

```bash
src/
├── app.py                      # Entry point (Gradio UI)
├── orchestrator.py             # Agent loop (Search -> Judge -> Loop)
├── agent_factory/              # Agent creation and judges
│   ├── __init__.py
│   ├── agents.py               # PydanticAI agent definitions
│   └── judges.py               # JudgeHandler for evidence assessment
├── tools/                      # Search tools
│   ├── __init__.py
│   ├── pubmed.py               # PubMed E-utilities tool
│   ├── websearch.py            # DuckDuckGo search tool
│   └── search_handler.py       # Orchestrates multiple tools
├── prompts/                    # Prompt templates
│   ├── __init__.py
│   └── judge.py                # Judge prompts
├── utils/                      # Shared utilities
│   ├── __init__.py
│   ├── config.py               # Settings/configuration
│   ├── exceptions.py           # Custom exceptions
│   ├── models.py               # Shared Pydantic models
│   ├── dataloaders.py          # Data loading utilities
│   └── parsers.py              # Parsing utilities
├── middleware/                 # (Future: middleware components)
├── database_services/          # (Future: database integrations)
└── retrieval_factory/          # (Future: RAG components)

tests/
├── unit/
│   ├── tools/
│   │   ├── test_pubmed.py
│   │   ├── test_websearch.py
│   │   └── test_search_handler.py
│   ├── agent_factory/
│   │   └── test_judges.py
│   └── test_orchestrator.py
└── integration/
    └── test_pubmed_live.py
```

---

## Phased Execution Plan

### **Phase 1: Foundation & Tooling (Day 1)**

*Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*

- [ ] Initialize `pyproject.toml` with `uv`.
- [ ] Configure `ruff` (strict) and `mypy` (strict).
- [ ] Set up `pytest` with `pytest-sugar` and `pytest-cov`.
- [ ] Implement `src/utils/config.py` (Configuration Slice).
- [ ] Implement `src/utils/exceptions.py` (Custom exceptions).
- **Deliverable**: A repo that passes CI with `uv run pytest`.
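
As a rough illustration of the configuration and exceptions items in this phase, the sketch below loads settings from environment variables and fails loudly when a required value is missing. Everything named here (`DeepCriticalError`, `ConfigurationError`, `Settings`, `load_settings`, and the `LLM_API_KEY` / `MAX_ITERATIONS` variables) is a hypothetical placeholder, and it uses only the standard library; the real `config.py` may well be built on a settings library instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


class DeepCriticalError(Exception):
    """Base class for project-specific errors (hypothetical name)."""


class ConfigurationError(DeepCriticalError):
    """Raised when a required setting is missing or invalid."""


@dataclass(frozen=True)
class Settings:
    """Runtime settings; the field names are illustrative assumptions."""

    llm_api_key: str
    max_iterations: int = 5


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an environment mapping (defaults to os.environ)."""
    source = os.environ if env is None else env

    api_key = source.get("LLM_API_KEY", "")
    if not api_key:
        raise ConfigurationError("LLM_API_KEY is not set")

    raw_iterations = source.get("MAX_ITERATIONS", "5")
    try:
        max_iterations = int(raw_iterations)
    except ValueError as exc:
        raise ConfigurationError(
            f"MAX_ITERATIONS must be an integer, got {raw_iterations!r}"
        ) from exc

    return Settings(llm_api_key=api_key, max_iterations=max_iterations)
```

Passing the environment in as a mapping keeps the function easy to unit test: a test can call `load_settings({"LLM_API_KEY": "test"})` without touching the real process environment.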

### **Phase 2: The "Search" Vertical Slice (Day 2)**

*Goal: Agent can receive a query and get raw results from PubMed/Web.*

- [ ] **TDD**: Write test for `SearchHandler`.
- [ ] Implement `src/tools/pubmed.py` (PubMed E-utilities).
- [ ] Implement `src/tools/websearch.py` (DuckDuckGo).
- [ ] Implement `src/tools/search_handler.py` (Orchestrates tools).
- [ ] Implement `src/utils/models.py` (Evidence, Citation, SearchResult).
- **Deliverable**: A function that takes a query such as "long covid" and returns `List[Evidence]`.
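
One possible shape for this slice is sketched below: a handler fans a query out to several tools concurrently and merges whatever comes back. The `SearchTool` protocol, the `execute` method, the model fields, and `FakeTool` are assumptions made for illustration. The models are plain dataclasses so the sketch has no dependencies, whereas the roadmap calls for Pydantic models, and no real PubMed or DuckDuckGo calls are made.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Citation:
    source: str
    title: str
    url: str


@dataclass(frozen=True)
class Evidence:
    content: str
    citation: Citation


class SearchTool(Protocol):
    """Interface each tool is assumed to expose (hypothetical)."""

    name: str

    async def search(self, query: str, max_results: int) -> List[Evidence]: ...


class SearchHandler:
    """Runs every tool concurrently and merges the results."""

    def __init__(self, tools: List[SearchTool]) -> None:
        self.tools = tools

    async def execute(self, query: str, max_results: int = 5) -> List[Evidence]:
        results = await asyncio.gather(
            *(tool.search(query, max_results) for tool in self.tools),
            return_exceptions=True,
        )
        evidence: List[Evidence] = []
        for result in results:
            if isinstance(result, BaseException):
                continue  # a failing tool should not sink the whole search
            evidence.extend(result)
        return evidence


class FakeTool:
    """Test double standing in for a real search tool."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    async def search(self, query: str, max_results: int) -> List[Evidence]:
        if self.fail:
            raise RuntimeError(f"{self.name} is unavailable")
        citation = Citation(self.name, f"{query} study", "https://example.org")
        return [Evidence(f"{self.name} result for {query}", citation)]
```

A TDD-style test can then drive the handler with fakes, for example `asyncio.run(SearchHandler([FakeTool("pubmed"), FakeTool("web", fail=True)]).execute("long covid"))`, and assert that only the healthy tool's evidence is returned.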

### **Phase 3: The "Judge" Vertical Slice (Day 3)**

*Goal: Agent can decide if evidence is sufficient.*

- [ ] **TDD**: Write test for `JudgeHandler` (Mocked LLM).
- [ ] Implement `src/prompts/judge.py` (Structured outputs).
- [ ] Implement `src/agent_factory/judges.py` (LLM interaction).
- **Deliverable**: A function that takes `List[Evidence]` and returns a `JudgeAssessment`.
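
To make the "mocked LLM" idea concrete, here is a minimal sketch in which the judge receives its LLM as an injected async callable, so a test can substitute a canned response. The field names on `JudgeAssessment`, the `assess` method, the JSON reply format, and the prompt wording are all assumptions. The actual handler is expected to rely on PydanticAI structured outputs rather than hand-parsed JSON.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass(frozen=True)
class Evidence:
    content: str
    source: str


@dataclass(frozen=True)
class JudgeAssessment:
    sufficient: bool
    reasoning: str
    next_queries: List[str]


# Any async function mapping a prompt to raw model text (hypothetical seam).
LLMCall = Callable[[str], Awaitable[str]]


class JudgeHandler:
    def __init__(self, llm: LLMCall) -> None:
        self.llm = llm

    async def assess(self, question: str, evidence: List[Evidence]) -> JudgeAssessment:
        listing = "\n".join(f"- [{e.source}] {e.content}" for e in evidence)
        prompt = (
            f"Question: {question}\nEvidence:\n{listing}\n"
            "Reply as JSON with keys sufficient, reasoning, next_queries."
        )
        raw = await self.llm(prompt)
        try:
            data = json.loads(raw)
            return JudgeAssessment(
                sufficient=bool(data["sufficient"]),
                reasoning=str(data["reasoning"]),
                next_queries=[str(q) for q in data.get("next_queries", [])],
            )
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            # Treat unparseable output as "not sufficient" and retry the question.
            return JudgeAssessment(False, "Unparseable judge output", [question])


async def fake_llm(prompt: str) -> str:
    """Mocked LLM that always returns a well-formed verdict."""
    return json.dumps(
        {"sufficient": True, "reasoning": "Two sources agree", "next_queries": []}
    )
```

Injecting the LLM keeps unit tests fast and deterministic: `asyncio.run(JudgeHandler(fake_llm).assess("q", []))` exercises the parsing path without any network access.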

### **Phase 4: The "Loop" & UI Slice (Day 4)**

*Goal: End-to-End User Value.*

- [ ] Implement `src/orchestrator.py` (Connects Search and Judge in a loop).
- [ ] Build `src/app.py` (Gradio with Streaming).
- **Deliverable**: Working DeepCritical Agent on HuggingFace.
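
The Search -> Judge -> Loop control flow could look roughly like the sketch below. It is written as an async generator so that progress messages can be streamed to a UI as they happen. `run_loop`, `Verdict`, the callable signatures, and the message strings are hypothetical, and the search and judge steps are stubbed so the example runs on its own.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, List


@dataclass(frozen=True)
class Verdict:
    sufficient: bool
    next_query: str


SearchFn = Callable[[str], Awaitable[List[str]]]
JudgeFn = Callable[[str, List[str]], Awaitable[Verdict]]


async def run_loop(
    question: str,
    search: SearchFn,
    judge: JudgeFn,
    max_iterations: int = 3,
) -> AsyncIterator[str]:
    """Search, judge, and repeat until evidence suffices or the budget runs out."""
    evidence: List[str] = []
    query = question
    for iteration in range(1, max_iterations + 1):
        evidence.extend(await search(query))
        yield f"iteration {iteration}: {len(evidence)} evidence items"

        verdict = await judge(question, evidence)
        if verdict.sufficient:
            yield "done: evidence judged sufficient"
            return
        # Let the judge steer the next search; fall back to the question.
        query = verdict.next_query or question
    yield "done: iteration budget exhausted"


async def fake_search(query: str) -> List[str]:
    return [f"finding about {query}"]


async def fake_judge(question: str, evidence: List[str]) -> Verdict:
    # Stub rule: two pieces of evidence are "enough".
    return Verdict(sufficient=len(evidence) >= 2, next_query="follow-up")


async def collect(question: str) -> List[str]:
    return [event async for event in run_loop(question, fake_search, fake_judge)]
```

The hard iteration cap matters here: without it, a judge that never returns `sufficient=True` would keep the agent searching indefinitely.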

---

### **Phase 5: Magentic Integration (OPTIONAL - Post-MVP)**

*Goal: Upgrade orchestrator to use Microsoft Agent Framework patterns.*

- [ ] Wrap SearchHandler as `AgentProtocol` (SearchAgent) with strict protocol compliance.
- [ ] Wrap JudgeHandler as `AgentProtocol` (JudgeAgent) with strict protocol compliance.
- [ ] Implement `MagenticOrchestrator` using `MagenticBuilder`.
- [ ] Create factory pattern for switching implementations.
- **Deliverable**: Same API, better multi-agent orchestration engine.

**NOTE**: Only implement Phase 5 if time permits after MVP is shipped.
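
The factory item above can be sketched independently of any framework. In the example below, both implementations satisfy one small protocol and a registry picks between them by name. `Orchestrator`, `SimpleOrchestrator`, `MagenticStub`, and `create_orchestrator` are hypothetical names, and the stub deliberately makes no Microsoft Agent Framework calls, since that API is outside the scope of this roadmap.

```python
import asyncio
from typing import Callable, Dict, List, Protocol


class Orchestrator(Protocol):
    """The shared API both implementations are assumed to expose."""

    async def run(self, question: str) -> List[str]: ...


class SimpleOrchestrator:
    """Stand-in for the MVP Search -> Judge loop."""

    async def run(self, question: str) -> List[str]:
        return [f"simple: {question}"]


class MagenticStub:
    """Placeholder for the Phase 5 implementation; contains no framework code."""

    async def run(self, question: str) -> List[str]:
        return [f"magentic: {question}"]


_REGISTRY: Dict[str, Callable[[], Orchestrator]] = {
    "simple": SimpleOrchestrator,
    "magentic": MagenticStub,
}


def create_orchestrator(mode: str = "simple") -> Orchestrator:
    """Return the orchestrator registered under `mode`."""
    try:
        return _REGISTRY[mode]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {known}") from None
```

Because callers depend only on `run`, the UI code can stay unchanged when the mode switches, which is what "same API" in the deliverable implies.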

---

## Spec Documents

1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)**
2. **[Phase 2 Spec: Search Slice](02_phase_search.md)**
3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)**
4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)**
5. **[Phase 5 Spec: Magentic Integration](05_phase_magentic.md)** *(Optional)*

*Start by reading Phase 1 Spec to initialize the repo.*