VibecoderMcSwaggins committed 5e7604a (1 parent: 521d97d)

docs: add AI agent context files for team collaboration

Added AGENTS.md, CLAUDE.md, GEMINI.md - AI-assisted scoping documents
that help AI coding tools understand the project architecture,
coding standards, and development workflow.

Requested by @Tonic for team acceleration.

Files changed (3)
  1. AGENTS.md +102 -0
  2. CLAUDE.md +95 -0
  3. GEMINI.md +54 -0
AGENTS.md ADDED
@@ -0,0 +1,102 @@
+ # AGENTS.md
+
+ This file provides guidance to AI agents when working with code in this repository.
+
+ ## Project Overview
+
+ DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".
+
+ ## Development Commands
+
+ ```bash
+ # Install all dependencies (including dev)
+ make install # or: uv sync --all-extras && uv run pre-commit install
+
+ # Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
+ make check
+
+ # Individual commands
+ make test # uv run pytest tests/unit/ -v
+ make lint # uv run ruff check src tests
+ make format # uv run ruff format src tests
+ make typecheck # uv run mypy src
+ make test-cov # uv run pytest --cov=src --cov-report=term-missing
+
+ # Run single test
+ uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
+
+ # Integration tests (real APIs)
+ uv run pytest -m integration
+ ```
+
+ ## Architecture
+
+ **Pattern**: Search-and-judge loop with multi-tool orchestration.
+
+ ```
+ User Question → Orchestrator
+     ↓
+ Search Loop:
+   1. Query PubMed
+   2. Gather evidence
+   3. Judge quality ("Do we have enough?")
+   4. If NO → Refine query, search more
+   5. If YES → Synthesize findings
+     ↓
+ Research Report with Citations
+ ```
+
+ **Key Components**:
+ - `src/orchestrator.py` - Main agent loop
+ - `src/tools/pubmed.py` - PubMed E-utilities search
+ - `src/tools/search_handler.py` - Scatter-gather orchestration
+ - `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
+ - `src/agent_factory/judges.py` - LLM-based evidence assessment
+ - `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
+ - `src/utils/config.py` - Pydantic Settings (loads from `.env`)
+ - `src/utils/models.py` - Evidence, Citation, SearchResult models
+ - `src/utils/exceptions.py` - Exception hierarchy
+ - `src/app.py` - Gradio UI (HuggingFace Spaces)
+
+ **Break Conditions**: The loop stops on judge approval, when the token budget (50K tokens max) is exhausted, or after the maximum number of iterations (default 10).
+
+ ## Configuration
+
+ Settings are loaded from `.env` via pydantic-settings:
+ - `LLM_PROVIDER`: "openai" or "anthropic"
+ - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: API key for the selected LLM provider
+ - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
+ - `MAX_ITERATIONS`: 1-50, default 10
+ - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
+
+ ## Exception Hierarchy
+
+ ```
+ DeepCriticalError (base)
+ ├── SearchError
+ │   └── RateLimitError
+ ├── JudgeError
+ └── ConfigurationError
+ ```
+
+ ## Testing
+
+ - **TDD**: Write tests first in `tests/unit/`, implement in `src/`
+ - **Markers**: `unit`, `integration`, `slow`
+ - **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
+ - **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
+
+ ## Coding Standards
+
+ - Python 3.11+, strict mypy, ruff (100-char lines)
+ - Type all functions, use Pydantic models for data
+ - Use `structlog` for logging, not `print`
+ - Conventional commits: `feat(scope):`, `fix:`, `docs:`
+
+ ## Git Workflow
+
+ - `main`: Production-ready
+ - `dev`: Development
+ - `vcms-dev`: HuggingFace Spaces sandbox
+ - Remote `origin`: GitHub
+ - Remote `huggingface-upstream`: HuggingFace Spaces
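The search-and-judge loop and its three break conditions can be sketched in plain Python. This is a minimal, stdlib-only sketch under stated assumptions, not the code in `src/orchestrator.py`. `run_search_judge_loop`, `Judgement`, and `LoopResult` are hypothetical names. `search` stands in for the PubMed tool and `judge` for the LLM judge, and the judge is assumed to report its own token usage on each call.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Judgement:
    """Hypothetical judge output; the project itself uses Pydantic models."""
    sufficient: bool   # "Do we have enough?"
    next_query: str    # refined query to try if not sufficient
    tokens_used: int   # tokens spent on this judge call (assumed to be reported)


@dataclass
class LoopResult:
    evidence: list[str] = field(default_factory=list)
    iterations: int = 0
    stop_reason: str = ""


def run_search_judge_loop(
    question: str,
    search: Callable[[str], list[str]],
    judge: Callable[[str, list[str]], Judgement],
    max_iterations: int = 10,
    token_budget: int = 50_000,
) -> LoopResult:
    result = LoopResult()
    query = question
    tokens_spent = 0
    while result.iterations < max_iterations:
        result.iterations += 1
        # Gather evidence; exact-match dedup stands in for ChromaDB semantic dedup.
        for item in search(query):
            if item not in result.evidence:
                result.evidence.append(item)
        verdict = judge(question, result.evidence)
        tokens_spent += verdict.tokens_used
        if verdict.sufficient:  # break condition 1: judge approval
            result.stop_reason = "judge_approved"
            return result
        if tokens_spent >= token_budget:  # break condition 2: token budget
            result.stop_reason = "token_budget"
            return result
        query = verdict.next_query  # refine the query and search again
    result.stop_reason = "max_iterations"  # break condition 3
    return result


# Usage with stub tools: the judge approves once four pieces of evidence exist.
def fake_search(query: str) -> list[str]:
    return [f"{query}-a", f"{query}-b"]


demo = run_search_judge_loop(
    "long COVID fatigue",
    fake_search,
    lambda q, ev: Judgement(sufficient=len(ev) >= 4, next_query=q + " treatment", tokens_used=100),
)
print(demo.stop_reason, demo.iterations)  # judge_approved 2
```

Checking judge approval before the budget means an approving judgement still counts even if that call pushes spending past the limit. A stricter variant would check the budget first.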
CLAUDE.md ADDED
@@ -0,0 +1,95 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".
+
+ ## Development Commands
+
+ ```bash
+ # Install all dependencies (including dev)
+ make install # or: uv sync --all-extras && uv run pre-commit install
+
+ # Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
+ make check
+
+ # Individual commands
+ make test # uv run pytest tests/unit/ -v
+ make lint # uv run ruff check src tests
+ make format # uv run ruff format src tests
+ make typecheck # uv run mypy src
+ make test-cov # uv run pytest --cov=src --cov-report=term-missing
+
+ # Run single test
+ uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
+
+ # Integration tests (real APIs)
+ uv run pytest -m integration
+ ```
+
+ ## Architecture
+
+ **Pattern**: Search-and-judge loop with multi-tool orchestration.
+
+ ```
+ User Question → Orchestrator
+     ↓
+ Search Loop:
+   1. Query PubMed
+   2. Gather evidence
+   3. Judge quality ("Do we have enough?")
+   4. If NO → Refine query, search more
+   5. If YES → Synthesize findings
+     ↓
+ Research Report with Citations
+ ```
+
+ **Key Components**:
+ - `src/orchestrator.py` - Main agent loop
+ - `src/tools/pubmed.py` - PubMed E-utilities search
+ - `src/tools/search_handler.py` - Scatter-gather orchestration
+ - `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
+ - `src/agent_factory/judges.py` - LLM-based evidence assessment
+ - `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
+ - `src/utils/config.py` - Pydantic Settings (loads from `.env`)
+ - `src/utils/models.py` - Evidence, Citation, SearchResult models
+ - `src/utils/exceptions.py` - Exception hierarchy
+ - `src/app.py` - Gradio UI (HuggingFace Spaces)
+
+ **Break Conditions**: The loop stops on judge approval, when the token budget (50K tokens max) is exhausted, or after the maximum number of iterations (default 10).
+
+ ## Configuration
+
+ Settings are loaded from `.env` via pydantic-settings:
+ - `LLM_PROVIDER`: "openai" or "anthropic"
+ - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: API key for the selected LLM provider
+ - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
+ - `MAX_ITERATIONS`: 1-50, default 10
+ - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
+
+ ## Exception Hierarchy
+
+ ```
+ DeepCriticalError (base)
+ ├── SearchError
+ │   └── RateLimitError
+ ├── JudgeError
+ └── ConfigurationError
+ ```
+
+ ## Testing
+
+ - **TDD**: Write tests first in `tests/unit/`, implement in `src/`
+ - **Markers**: `unit`, `integration`, `slow`
+ - **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
+ - **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
+
+ ## Git Workflow
+
+ - `main`: Production-ready
+ - `dev`: Development
+ - `vcms-dev`: HuggingFace Spaces sandbox
+ - Remote `origin`: GitHub
+ - Remote `huggingface-upstream`: HuggingFace Spaces
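The exception tree maps onto a Python class hierarchy. The sketch below assumes that each class subclasses its parent directly and carries no extra fields, which may not match `src/utils/exceptions.py`. `describe` is a hypothetical helper that shows why a handler must catch `RateLimitError` before `SearchError`.

```python
class DeepCriticalError(Exception):
    """Base class for every project-specific error."""


class SearchError(DeepCriticalError):
    """A search tool (e.g. PubMed) failed."""


class RateLimitError(SearchError):
    """The upstream API rejected the request for exceeding its rate limit."""


class JudgeError(DeepCriticalError):
    """The LLM judge could not produce a usable assessment."""


class ConfigurationError(DeepCriticalError):
    """Settings are missing or invalid."""


def describe(exc: Exception) -> str:
    """Hypothetical handler: map an error to the action a caller might take."""
    try:
        raise exc
    except RateLimitError:  # most specific first: it is also a SearchError
        return "rate limited: back off and retry"
    except SearchError:
        return "search failed: skip this source"
    except DeepCriticalError:  # JudgeError, ConfigurationError, ...
        return "project error"
    except Exception:
        return "unexpected error"


print(describe(RateLimitError("HTTP 429")))  # rate limited: back off and retry
```

Every project error derives from `DeepCriticalError`, so a single `except DeepCriticalError` at the UI boundary can catch all of them while unrelated exceptions still propagate.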
GEMINI.md ADDED
@@ -0,0 +1,54 @@
+ # DeepCritical Context
+
+ ## Project Overview
+ **DeepCritical** is an AI-native medical drug repurposing research agent.
+ **Goal:** To accelerate the discovery of new uses for existing drugs by intelligently searching biomedical literature (PubMed), evaluating evidence, and hypothesizing potential applications.
+
+ **Architecture:**
+ The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orchestrator) and adheres to **Strict TDD** (Test-Driven Development).
+
+ **Current Status:**
+ - **Phases 1-8:** COMPLETE. Foundation, Search, Judge, UI, Orchestrator, Embeddings, Hypothesis, Report.
+ - **Phase 9 (Source Cleanup):** COMPLETE. Removed DuckDuckGo web search (unreliable for scientific research).
+ - **Phases 10-11:** PLANNED. ClinicalTrials.gov and bioRxiv integration.
+
+ ## Tech Stack & Tooling
+ - **Language:** Python 3.11 (pinned)
+ - **Package Manager:** `uv` (Rust-based, extremely fast)
+ - **Frameworks:** `pydantic`, `pydantic-ai`, `httpx`, `gradio`
+ - **Vector DB:** `chromadb` with `sentence-transformers` for semantic search
+ - **Testing:** `pytest`, `pytest-asyncio`, `respx` (for mocking)
+ - **Quality:** `ruff` (linting/formatting), `mypy` (strict type checking), `pre-commit`
+
+ ## Building & Running
+ We use a `Makefile` to standardize developer commands.
+
+ | Command | Description |
+ | :--- | :--- |
+ | `make install` | Install dependencies and pre-commit hooks. |
+ | `make test` | Run unit tests. |
+ | `make lint` | Run Ruff linter. |
+ | `make format` | Run Ruff formatter. |
+ | `make typecheck` | Run Mypy static type checker. |
+ | `make check` | **The Golden Gate:** Runs lint, typecheck, and test. Must pass before committing. |
+ | `make clean` | Clean up cache and artifacts. |
+
+ ## Directory Structure
+ - `src/`: Source code
+   - `utils/`: Shared utilities (`config.py`, `exceptions.py`, `models.py`)
+   - `tools/`: Search tools (`pubmed.py`, `base.py`, `search_handler.py`)
+   - `services/`: Services (`embeddings.py` - ChromaDB vector store)
+   - `agents/`: Magentic multi-agent mode agents
+   - `agent_factory/`: Agent definitions (judges, prompts)
+ - `tests/`: Test suite
+   - `unit/`: Isolated unit tests (mocked)
+   - `integration/`: Real API tests (marked as slow/integration)
+ - `docs/`: Documentation and implementation specs
+ - `examples/`: Working demos for each phase
+
+ ## Development Conventions
+ 1. **Strict TDD:** Write failing tests in `tests/unit/` *before* implementing logic in `src/`.
+ 2. **Type Safety:** All code must pass `mypy --strict`. Use Pydantic models for data exchange.
+ 3. **Linting:** Zero tolerance for Ruff errors.
+ 4. **Mocking:** Use `respx` or `unittest.mock` for all external API calls in unit tests. Real calls go in `tests/integration/`.
+ 5. **Vertical Slices:** Implement features end-to-end (Search -> Judge -> UI) rather than layer-by-layer.
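Under the mocking convention, a unit test might look roughly like the sketch below. `search_pubmed_ids` and `HttpClient` are hypothetical. They take an injected client and do not mirror `src/tools/pubmed.py`. The esearch URL, its `db`/`term`/`retmode`/`retmax` parameters, and the `esearchresult.idlist` JSON shape come from NCBI E-utilities.

```python
from typing import Any, Protocol
from unittest import mock

# NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


class HttpClient(Protocol):
    """Hypothetical minimal client interface, roughly the shape of httpx.Client.get."""

    def get(self, url: str, *, params: dict[str, Any]) -> Any: ...


def search_pubmed_ids(client: HttpClient, query: str, max_results: int = 5) -> list[str]:
    """Hypothetical helper: return PubMed IDs matching a query via esearch."""
    response = client.get(
        ESEARCH_URL,
        params={"db": "pubmed", "term": query, "retmode": "json", "retmax": max_results},
    )
    response.raise_for_status()
    return list(response.json()["esearchresult"]["idlist"])


def test_search_pubmed_ids_parses_idlist() -> None:
    # No network: the client is a Mock whose response returns canned esearch JSON.
    client = mock.Mock()
    client.get.return_value.json.return_value = {"esearchresult": {"idlist": ["111", "222"]}}

    ids = search_pubmed_ids(client, "metformin long covid", max_results=2)

    assert ids == ["111", "222"]
    client.get.assert_called_once()
    assert client.get.call_args.kwargs["params"]["term"] == "metformin long covid"
```

This runs under `uv run pytest` like any other unit test. With `respx`, the mock would instead intercept requests at the httpx transport level, so a real `httpx` client could be used without injecting a stand-in.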