# Contributing to DeepCritical

Thank you for your interest in contributing to DeepCritical! This guide will help you get started.

## Table of Contents

- [Git Workflow](#git-workflow)
- [Getting Started](#getting-started)
- [Development Commands](#development-commands)
- [Code Style & Conventions](#code-style--conventions)
- [Type Safety](#type-safety)
- [Error Handling & Logging](#error-handling--logging)
- [Testing Requirements](#testing-requirements)
- [Implementation Patterns](#implementation-patterns)
- [Code Quality & Documentation](#code-quality--documentation)
- [Prompt Engineering & Citation Validation](#prompt-engineering--citation-validation)
- [MCP Integration](#mcp-integration)
- [Common Pitfalls](#common-pitfalls)
- [Key Principles](#key-principles)
- [Pull Request Process](#pull-request-process)

## Git Workflow

- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Use feature branches: `yourname-dev`
- **NEVER** push directly to `main` or `dev` on HuggingFace
- GitHub is source of truth; HuggingFace is for deployment

## Getting Started

1. **Fork the repository** on GitHub
2. **Clone your fork**:

   ```bash
   git clone https://github.com/yourusername/GradioDemo.git
   cd GradioDemo
   ```

3. **Install dependencies**:

   ```bash
   make install
   ```

4. **Create a feature branch**:

   ```bash
   git checkout -b yourname-feature-name
   ```

5. **Make your changes** following the guidelines below
6. **Run checks**:

   ```bash
   make check
   ```

7. **Commit and push**:

   ```bash
   git commit -m "Description of changes"
   git push origin yourname-feature-name
   ```

8. **Create a pull request** on GitHub

## Development Commands

```bash
make install      # Install dependencies + pre-commit
make check        # Lint + typecheck + test (MUST PASS)
make test         # Run unit tests
make lint         # Run ruff
make format       # Format with ruff
make typecheck    # Run mypy
make test-cov     # Test with coverage
make docs-build   # Build documentation
make docs-serve   # Serve documentation locally
```

## Code Style & Conventions

### Type Safety

- **ALWAYS** use type hints for all function parameters and return types
- Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
- Use `TYPE_CHECKING` imports for circular dependencies:

```python
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService
```

### Pydantic Models

- All data exchange uses Pydantic models (`src/utils/models.py`)
- Models are frozen (`model_config = {"frozen": True}`) for immutability
- Use `Field()` with descriptions for all model fields
- Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints

### Async Patterns

- **ALL** I/O operations must be async (`async def`, `await`)
- Use `asyncio.gather()` for parallel operations
- CPU-bound work (embeddings, parsing) must use `run_in_executor()`:

```python
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(None, cpu_bound_function, args)
```

- Never block the event loop with synchronous I/O
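
A minimal sketch of the two patterns above, using only the standard library. `fetch_source` and `tokenize` are hypothetical stand-ins for a real I/O call and a real CPU-bound step; they are not functions from this codebase.

```python
import asyncio


async def fetch_source(name: str) -> str:
    """Hypothetical async I/O call; asyncio.sleep stands in for a network request."""
    await asyncio.sleep(0.01)
    return f"raw text from {name}"


def tokenize(text: str) -> list[str]:
    """Hypothetical CPU-bound step (deliberately synchronous)."""
    return text.split()


async def gather_and_parse(sources: list[str]) -> list[list[str]]:
    # Run the I/O calls concurrently; gather() returns results in input order.
    texts = await asyncio.gather(*(fetch_source(s) for s in sources))

    # Hand the CPU-bound work to the default executor so the event loop stays free.
    loop = asyncio.get_running_loop()
    parsed = await asyncio.gather(
        *(loop.run_in_executor(None, tokenize, text) for text in texts)
    )
    return list(parsed)
```

Calling `asyncio.run(gather_and_parse(["pubmed", "clinicaltrials"]))` runs both fetches concurrently and then tokenizes each result off the event loop.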

### Linting

- Ruff with 100-char line length
- Ignore rules documented in `pyproject.toml`:
  - `PLR0913`: Too many arguments (agents need many params)
  - `PLR0912`: Too many branches (complex orchestrator logic)
  - `PLR0911`: Too many return statements (complex agent logic)
  - `PLR2004`: Magic values (statistical constants)
  - `PLW0603`: Global statement (singleton pattern)
  - `PLC0415`: Lazy imports for optional dependencies

### Pre-commit

- Run `make check` before committing
- Must pass: lint + typecheck + test-cov
- Pre-commit hooks installed via `make install`
- **CRITICAL**: Make sure you run the full pre-commit checks before opening a PR (not draft), otherwise Obstacle is the Way will lose his mind

## Error Handling & Logging

### Exception Hierarchy

Use custom exception hierarchy (`src/utils/exceptions.py`):

- `DeepCriticalError` (base)
- `SearchError` → `RateLimitError`
- `JudgeError`
- `ConfigurationError`
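
As a sketch, the hierarchy above corresponds to class definitions along these lines. This reads the arrow as "is subclassed by", which is an assumption; the docstrings are illustrative, and the authoritative definitions are the ones in `src/utils/exceptions.py`.

```python
class DeepCriticalError(Exception):
    """Base class for all project-specific errors."""


class SearchError(DeepCriticalError):
    """A search tool failed."""


class RateLimitError(SearchError):
    """An external API rejected the request because of rate limits."""


class JudgeError(DeepCriticalError):
    """Evidence assessment failed."""


class ConfigurationError(DeepCriticalError):
    """A required setting or dependency is missing."""
```

With this layout, `except SearchError` also catches `RateLimitError`, and `except DeepCriticalError` catches everything the project raises on purpose.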

### Error Handling Rules

- Always chain exceptions: `raise SearchError(...) from e`
- Log errors with context using `structlog`:

```python
logger.error("Operation failed", error=str(e), context=value)
```

- Never silently swallow exceptions
- Provide actionable error messages

### Logging

- Use `structlog` for all logging (NOT `print` or `logging`)
- Import: `import structlog; logger = structlog.get_logger()`
- Log with structured data: `logger.info("event", key=value)`
- Use appropriate levels: DEBUG, INFO, WARNING, ERROR

### Logging Examples

```python
logger.info("Starting search", query=query, tools=[t.name for t in tools])
logger.warning("Search tool failed", tool=tool.name, error=str(result))
logger.error("Assessment failed", error=str(e))
```

### Error Chaining

Always preserve exception context:

```python
try:
    result = await api_call()
except httpx.HTTPError as e:
    raise SearchError(f"API call failed: {e}") from e
```

## Testing Requirements

### Test Structure

- Unit tests in `tests/unit/` (mocked, fast)
- Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
- Use markers: `unit`, `integration`, `slow`

### Mocking

- Use `respx` for httpx mocking
- Use `pytest-mock` for general mocking
- Mock LLM calls in unit tests (use `MockJudgeHandler`)
- Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`

### TDD Workflow

1. Write failing test in `tests/unit/`
2. Implement in `src/`
3. Ensure test passes
4. Run `make check` (lint + typecheck + test)

### Test Examples

```python
@pytest.mark.unit
async def test_pubmed_search(mock_httpx_client):
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=5)
    assert len(results) > 0
    assert all(isinstance(r, Evidence) for r in results)

@pytest.mark.integration
async def test_real_pubmed_search():
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=3)
    assert len(results) <= 3
```

### Test Coverage

- Run `make test-cov` for coverage report
- Aim for >80% coverage on critical paths
- Exclude: `__init__.py`, `TYPE_CHECKING` blocks

## Implementation Patterns

### Search Tools

All tools implement `SearchTool` protocol (`src/tools/base.py`):

- Must have `name` property
- Must implement `async def search(query, max_results) -> list[Evidence]`
- Use `@retry` decorator from tenacity for resilience
- Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed)
- Error handling: Raise `SearchError` or `RateLimitError` on failures

Example pattern:

```python
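from tenacity import retry, stop_after_attempt, wait_exponential

# Evidence is the project's evidence model (import omitted here).


class MySearchTool:
    @property
    def name(self) -> str:
        return "mytool"

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        # Implementation: call the external API and build Evidence objects
        evidence_list: list[Evidence] = []
        return evidence_list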
```
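
The `_rate_limit()` hook mentioned above could be sketched as follows. This is one possible approach, not the project's implementation: the class name `RateLimitedTool`, the `min_interval` parameter and its default, and the lock-plus-timestamp strategy are all assumptions for illustration.

```python
import asyncio
import time


class RateLimitedTool:
    """Hypothetical tool that spaces out requests by a minimum interval."""

    def __init__(self, min_interval: float = 0.5) -> None:
        self._min_interval = min_interval  # seconds between requests (arbitrary default)
        self._last_call = 0.0
        self._lock = asyncio.Lock()

    async def _rate_limit(self) -> None:
        # Serialize callers so concurrent searches cannot fire at the same time.
        async with self._lock:
            elapsed = time.monotonic() - self._last_call
            if elapsed < self._min_interval:
                await asyncio.sleep(self._min_interval - elapsed)
            self._last_call = time.monotonic()

    async def search(self, query: str, max_results: int = 10) -> list[str]:
        await self._rate_limit()
        # A real tool would call the external API here and return Evidence objects.
        return [f"{query}-{i}" for i in range(max_results)]
```

Because the wait happens with `await asyncio.sleep(...)`, other coroutines keep running while a call is being held back.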

### Judge Handlers

- Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`)
- Use pydantic-ai `Agent` with `output_type=JudgeAssessment`
- System prompts in `src/prompts/judge.py`
- Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler`
- Always return valid `JudgeAssessment` (never raise exceptions)
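
The "never raise" rule can be sketched as a wrapper that turns any handler failure into a conservative assessment. The `JudgeAssessment` dataclass below is a simplified stand-in with made-up fields (`sufficient`, `reasoning`); the real model will differ. `SafeJudgeHandler` and `FailingHandler` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class JudgeAssessment:
    """Simplified stand-in for the project's model; the fields are assumptions."""

    sufficient: bool
    reasoning: str


class JudgeHandlerProtocol(Protocol):
    async def assess(self, question: str, evidence: list[str]) -> JudgeAssessment: ...


class SafeJudgeHandler:
    """Hypothetical wrapper: delegates to `inner` and falls back instead of raising."""

    def __init__(self, inner: JudgeHandlerProtocol) -> None:
        self._inner = inner

    async def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        try:
            return await self._inner.assess(question, evidence)
        except Exception as e:  # broad on purpose: callers must always get a result
            # A real handler would also log the failure with structlog here.
            return JudgeAssessment(
                sufficient=False,
                reasoning=f"Assessment failed, treating evidence as insufficient: {e}",
            )


class FailingHandler:
    """Test double that always fails."""

    async def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        raise RuntimeError("model unavailable")
```

The fallback keeps the failure visible in `reasoning`, so the error is reported rather than silently swallowed.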

### Agent Factory Pattern

- Use factory functions for creating agents (`src/agent_factory/`)
- Lazy initialization for optional dependencies (e.g., embeddings, Modal)
- Check requirements before initialization:

```python
def check_magentic_requirements() -> None:
    if not settings.has_openai_key:
        raise ConfigurationError("Magentic requires OpenAI")
```
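
Lazy initialization for an optional dependency could look like the sketch below. `load_optional` is a hypothetical helper, and the local `ConfigurationError` is a stand-in for the project's own class so that the snippet runs on its own.

```python
import importlib
from types import ModuleType


class ConfigurationError(Exception):
    """Stand-in for the project's ConfigurationError."""


def load_optional(module_name: str, feature: str) -> ModuleType:
    """Import `module_name` only when `feature` is actually requested."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Chain the original error so the missing package is easy to diagnose.
        raise ConfigurationError(
            f"{feature} requires the optional package '{module_name}'"
        ) from e
```

Calling the helper inside a factory function, rather than at module level, keeps the import from running until the feature is used.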

### State Management

- **Magentic Mode**: Use `ContextVar` for context-local state that stays isolated per thread and per asyncio task (`src/agents/state.py`)
- **Simple Mode**: Pass state via function parameters
- Never use global mutable state (except singletons via `@lru_cache`)
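
A minimal sketch of `ContextVar`-based state, using only `contextvars` and `asyncio`. The variable `current_query` and the coroutines `run_agent` and `run_many` are hypothetical; see `src/agents/state.py` for the real state objects.

```python
import asyncio
from contextvars import ContextVar

current_query: ContextVar[str] = ContextVar("current_query", default="")


async def run_agent(query: str) -> str:
    # Each asyncio task runs in its own copy of the context, so this set()
    # is invisible to the other tasks started by gather().
    current_query.set(query)
    await asyncio.sleep(0)  # yield control so the tasks interleave
    return current_query.get()


async def run_many(queries: list[str]) -> list[str]:
    return list(await asyncio.gather(*(run_agent(q) for q in queries)))
```

Each task reads back the value it set itself, even though all tasks share the same module-level variable.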

### Singleton Pattern

Use `@lru_cache(maxsize=1)` for singletons:

```python
@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    return EmbeddingService()
```

- Lazy initialization to avoid requiring dependencies at import time
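
The sketch below shows the behavior this pattern relies on, with a stand-in `EmbeddingService` that counts how often it is constructed. The counter is for illustration only and is not part of the real service.

```python
from functools import lru_cache


class EmbeddingService:
    """Stand-in for an expensive-to-construct service."""

    instances_created = 0

    def __init__(self) -> None:
        EmbeddingService.instances_created += 1


@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    # Nothing is constructed until the first call; later calls reuse the cached object.
    return EmbeddingService()
```

In tests, `get_embedding_service.cache_clear()` resets the singleton so that each test can start from a fresh instance.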

## Code Quality & Documentation

### Docstrings

- Google-style docstrings for all public functions
- Include Args, Returns, Raises sections
- Use type hints in docstrings only if needed for clarity

Example:

```python
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    """Search PubMed and return evidence.

    Args:
        query: The search query string
        max_results: Maximum number of results to return

    Returns:
        List of Evidence objects

    Raises:
        SearchError: If the search fails
        RateLimitError: If we hit rate limits
    """
```

### Code Comments

- Explain WHY, not WHAT
- Document non-obvious patterns (e.g., why `requests` not `httpx` for ClinicalTrials)
- Mark critical sections: `# CRITICAL: ...`
- Document rate limiting rationale
- Explain async patterns when non-obvious

## Prompt Engineering & Citation Validation

### Judge Prompts

- System prompt in `src/prompts/judge.py`
- Format evidence with truncation (1500 chars per item)
- Handle empty evidence case separately
- Always request structured JSON output
- Use `format_user_prompt()` and `format_empty_evidence_prompt()` helpers

### Hypothesis Prompts

- Use diverse evidence selection (MMR algorithm)
- Sentence-aware truncation (`truncate_at_sentence()`)
- Format: Drug → Target → Pathway → Effect
- System prompt emphasizes mechanistic reasoning
- Use `format_hypothesis_prompt()` with embeddings for diversity
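
Sentence-aware truncation could be sketched as below. The signature, the 1500-character default (borrowed from the judge prompt limit), and the fallback to a word boundary are assumptions; the project's `truncate_at_sentence()` may behave differently.

```python
def truncate_at_sentence(text: str, max_chars: int = 1500) -> str:
    """Cut `text` to at most `max_chars`, preferring to end on a sentence boundary."""
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]

    # Position of the last sentence terminator that is followed by a space.
    cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
    if cut != -1:
        return window[: cut + 1]

    # No sentence boundary in the window: fall back to the last word boundary.
    space = window.rfind(" ")
    return window[:space] if space > 0 else window
```

For example, truncating `"First sentence. Second sentence is longer. Third."` to 30 characters returns `"First sentence."` rather than cutting in the middle of the second sentence.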

### Report Prompts

- Include full citation details for validation
- Use diverse evidence selection (n=20)
- **CRITICAL**: Emphasize citation validation rules
- Format hypotheses with support/contradiction counts
- System prompt includes explicit JSON structure requirements

### Citation Validation

- **ALWAYS** validate references before returning reports
- Use `validate_references()` from `src/utils/citation_validator.py`
- Remove hallucinated citations (URLs not in evidence)
- Log warnings for removed citations
- Never trust LLM-generated citations without validation

### Citation Validation Rules

1. Every reference URL must EXACTLY match a provided evidence URL
2. Do NOT invent, fabricate, or hallucinate any references
3. Do NOT modify paper titles, authors, dates, or URLs
4. If unsure about a citation, OMIT it rather than guess
5. Copy URLs exactly as provided - do not create similar-looking URLs
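
Rule 1 amounts to an exact set-membership check, sketched below. `Reference` and `filter_references` are hypothetical names with assumed fields; the project's `validate_references()` may have a different signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Simplified stand-in for a report reference; the fields are assumptions."""

    title: str
    url: str


def filter_references(
    references: list[Reference], evidence_urls: set[str]
) -> tuple[list[Reference], list[Reference]]:
    """Split references into (kept, removed) by exact URL match against the evidence."""
    kept = [r for r in references if r.url in evidence_urls]
    removed = [r for r in references if r.url not in evidence_urls]
    return kept, removed
```

The comparison is deliberately strict: a URL that differs only by a trailing slash counts as not matching and is removed. The caller would log a warning for each entry in `removed`.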

### Evidence Selection

- Use `select_diverse_evidence()` for MMR-based selection
- Balance relevance vs diversity (lambda=0.7 default)
- Sentence-aware truncation preserves meaning
- Limit evidence per prompt to avoid context overflow
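
As a rough illustration of the relevance/diversity trade-off, here is a pure-Python sketch of Maximal Marginal Relevance over embedding vectors. `cosine` and `mmr_select` are hypothetical helpers, not the project's `select_diverse_evidence()`; only the `lam=0.7` default is taken from the text above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: list[float], docs: list[list[float]], n: int, lam: float = 0.7
) -> list[int]:
    """Return indices of up to `n` docs, chosen greedily by MMR score."""
    selected: list[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < n:
        best_index, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, docs[i])
            # Similarity to the closest already-selected doc (0.0 for the first pick).
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            score = lam * relevance - (1.0 - lam) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected
```

With `lam=1.0` the selection reduces to plain relevance ranking; lowering `lam` increasingly penalizes items that resemble evidence already chosen.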

## MCP Integration

### MCP Tools

- Functions in `src/mcp_tools.py` for Claude Desktop
- Full type hints required
- Google-style docstrings with Args/Returns sections
- Formatted string returns (markdown)
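
A function that follows these conventions might look like the sketch below. `format_search_summary` is a hypothetical example written for this guide, not a function from `src/mcp_tools.py`.

```python
def format_search_summary(query: str, titles: list[str], max_items: int = 5) -> str:
    """Format search result titles as a markdown summary.

    Args:
        query: The search query that produced the results
        titles: Result titles, most relevant first
        max_items: Maximum number of titles to include

    Returns:
        Markdown string with a heading and a numbered list of titles
    """
    if not titles:
        return f"## Results for: {query}\n\nNo results found."
    lines = [f"## Results for: {query}", ""]
    lines += [f"{i}. {title}" for i, title in enumerate(titles[:max_items], start=1)]
    return "\n".join(lines)
```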

### Gradio MCP Server

- Enable with `mcp_server=True` in `demo.launch()`
- Endpoint: `/gradio_api/mcp/`
- Use `ssr_mode=False` to fix hydration issues in HF Spaces

## Common Pitfalls

1. **Blocking the event loop**: Never use sync I/O in async functions
2. **Missing type hints**: All functions must have complete type annotations
3. **Hallucinated citations**: Always validate references
4. **Global mutable state**: Use ContextVar or pass via parameters
5. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings)
6. **Rate limiting**: Always implement for external APIs
7. **Error chaining**: Always use `from e` when raising exceptions

## Key Principles

1. **Type Safety First**: All code must pass `mypy --strict`
2. **Async Everything**: All I/O must be async
3. **Test-Driven**: Write tests before implementation
4. **No Hallucinations**: Validate all citations
5. **Graceful Degradation**: Support free tier (HF Inference) when no API keys
6. **Lazy Loading**: Don't require optional dependencies at import time
7. **Structured Logging**: Use structlog, never print()
8. **Error Chaining**: Always preserve exception context

## Pull Request Process

1. Ensure all checks pass: `make check`
2. Update documentation if needed
3. Add tests for new features
4. Update CHANGELOG if applicable
5. Request review from maintainers
6. Address review feedback
7. Wait for approval before merging

## Questions?

- Open an issue on GitHub
- Check existing documentation
- Review code examples in the codebase

Thank you for contributing to DeepCritical!