Spaces:
Running
Running
File size: 12,576 Bytes
53c4c46 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 |
# Contributing to DeepCritical
Thank you for your interest in contributing to DeepCritical! This guide will help you get started.
## Table of Contents
- [Git Workflow](#git-workflow)
- [Getting Started](#getting-started)
- [Development Commands](#development-commands)
- [Code Style & Conventions](#code-style--conventions)
- [Type Safety](#type-safety)
- [Error Handling & Logging](#error-handling--logging)
- [Testing Requirements](#testing-requirements)
- [Implementation Patterns](#implementation-patterns)
- [Code Quality & Documentation](#code-quality--documentation)
- [Prompt Engineering & Citation Validation](#prompt-engineering--citation-validation)
- [MCP Integration](#mcp-integration)
- [Common Pitfalls](#common-pitfalls)
- [Key Principles](#key-principles)
- [Pull Request Process](#pull-request-process)
## Git Workflow
- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Use feature branches: `yourname-dev`
- **NEVER** push directly to `main` or `dev` on HuggingFace
- GitHub is source of truth; HuggingFace is for deployment
## Getting Started
1. **Fork the repository** on GitHub
2. **Clone your fork**:
```bash
git clone https://github.com/yourusername/GradioDemo.git
cd GradioDemo
```
3. **Install dependencies**:
```bash
make install
```
4. **Create a feature branch**:
```bash
git checkout -b yourname-feature-name
```
5. **Make your changes** following the guidelines below
6. **Run checks**:
```bash
make check
```
7. **Commit and push**:
```bash
git commit -m "Description of changes"
git push origin yourname-feature-name
```
8. **Create a pull request** on GitHub
## Development Commands
```bash
make install # Install dependencies + pre-commit
make check # Lint + typecheck + test (MUST PASS)
make test # Run unit tests
make lint # Run ruff
make format # Format with ruff
make typecheck # Run mypy
make test-cov # Test with coverage
make docs-build # Build documentation
make docs-serve # Serve documentation locally
```
## Code Style & Conventions
### Type Safety
- **ALWAYS** use type hints for all function parameters and return types
- Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
- Use `TYPE_CHECKING` imports for circular dependencies:
```python
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from src.services.embeddings import EmbeddingService
```
### Pydantic Models
- All data exchange uses Pydantic models (`src/utils/models.py`)
- Models are frozen (`model_config = {"frozen": True}`) for immutability
- Use `Field()` with descriptions for all model fields
- Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints
### Async Patterns
- **ALL** I/O operations must be async (`async def`, `await`)
- Use `asyncio.gather()` for parallel operations
- CPU-bound work (embeddings, parsing) must use `run_in_executor()`:
```python
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(None, cpu_bound_function, args)
```
- Never block the event loop with synchronous I/O
### Linting
- Ruff with 100-char line length
- Ignore rules documented in `pyproject.toml`:
- `PLR0913`: Too many arguments (agents need many params)
- `PLR0912`: Too many branches (complex orchestrator logic)
- `PLR0911`: Too many return statements (complex agent logic)
- `PLR2004`: Magic values (statistical constants)
- `PLW0603`: Global statement (singleton pattern)
- `PLC0415`: Lazy imports for optional dependencies
### Pre-commit
- Run `make check` before committing
- Must pass: lint + typecheck + test-cov
- Pre-commit hooks installed via `make install`
- **CRITICAL**: Make sure you run the full pre-commit checks before opening a PR (not draft), otherwise Obstacle is the Way will lose his mind
## Error Handling & Logging
### Exception Hierarchy
Use custom exception hierarchy (`src/utils/exceptions.py`):
- `DeepCriticalError` (base)
- `SearchError` β `RateLimitError`
- `JudgeError`
- `ConfigurationError`
### Error Handling Rules
- Always chain exceptions: `raise SearchError(...) from e`
- Log errors with context using `structlog`:
```python
logger.error("Operation failed", error=str(e), context=value)
```
- Never silently swallow exceptions
- Provide actionable error messages
### Logging
- Use `structlog` for all logging (NOT `print` or `logging`)
- Import: `import structlog; logger = structlog.get_logger()`
- Log with structured data: `logger.info("event", key=value)`
- Use appropriate levels: DEBUG, INFO, WARNING, ERROR
### Logging Examples
```python
logger.info("Starting search", query=query, tools=[t.name for t in tools])
logger.warning("Search tool failed", tool=tool.name, error=str(result))
logger.error("Assessment failed", error=str(e))
```
### Error Chaining
Always preserve exception context:
```python
try:
result = await api_call()
except httpx.HTTPError as e:
raise SearchError(f"API call failed: {e}") from e
```
## Testing Requirements
### Test Structure
- Unit tests in `tests/unit/` (mocked, fast)
- Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
- Use markers: `unit`, `integration`, `slow`
### Mocking
- Use `respx` for httpx mocking
- Use `pytest-mock` for general mocking
- Mock LLM calls in unit tests (use `MockJudgeHandler`)
- Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`
### TDD Workflow
1. Write failing test in `tests/unit/`
2. Implement in `src/`
3. Ensure test passes
4. Run `make check` (lint + typecheck + test)
### Test Examples
```python
@pytest.mark.unit
async def test_pubmed_search(mock_httpx_client):
tool = PubMedTool()
results = await tool.search("metformin", max_results=5)
assert len(results) > 0
assert all(isinstance(r, Evidence) for r in results)
@pytest.mark.integration
async def test_real_pubmed_search():
tool = PubMedTool()
results = await tool.search("metformin", max_results=3)
assert len(results) <= 3
```
### Test Coverage
- Run `make test-cov` for coverage report
- Aim for >80% coverage on critical paths
- Exclude: `__init__.py`, `TYPE_CHECKING` blocks
## Implementation Patterns
### Search Tools
All tools implement `SearchTool` protocol (`src/tools/base.py`):
- Must have `name` property
- Must implement `async def search(query, max_results) -> list[Evidence]`
- Use `@retry` decorator from tenacity for resilience
- Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed)
- Error handling: Raise `SearchError` or `RateLimitError` on failures
Example pattern:
```python
class MySearchTool:
@property
def name(self) -> str:
return "mytool"
@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
# Implementation
return evidence_list
```
### Judge Handlers
- Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`)
- Use pydantic-ai `Agent` with `output_type=JudgeAssessment`
- System prompts in `src/prompts/judge.py`
- Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler`
- Always return valid `JudgeAssessment` (never raise exceptions)
### Agent Factory Pattern
- Use factory functions for creating agents (`src/agent_factory/`)
- Lazy initialization for optional dependencies (e.g., embeddings, Modal)
- Check requirements before initialization:
```python
def check_magentic_requirements() -> None:
if not settings.has_openai_key:
raise ConfigurationError("Magentic requires OpenAI")
```
### State Management
- **Magentic Mode**: Use `ContextVar` for thread-safe state (`src/agents/state.py`)
- **Simple Mode**: Pass state via function parameters
- Never use global mutable state (except singletons via `@lru_cache`)
### Singleton Pattern
Use `@lru_cache(maxsize=1)` for singletons:
```python
@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
return EmbeddingService()
```
- Lazy initialization to avoid requiring dependencies at import time
## Code Quality & Documentation
### Docstrings
- Google-style docstrings for all public functions
- Include Args, Returns, Raises sections
- Use type hints in docstrings only if needed for clarity
Example:
```python
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
"""Search PubMed and return evidence.
Args:
query: The search query string
max_results: Maximum number of results to return
Returns:
List of Evidence objects
Raises:
SearchError: If the search fails
RateLimitError: If we hit rate limits
"""
```
### Code Comments
- Explain WHY, not WHAT
- Document non-obvious patterns (e.g., why `requests` not `httpx` for ClinicalTrials)
- Mark critical sections: `# CRITICAL: ...`
- Document rate limiting rationale
- Explain async patterns when non-obvious
## Prompt Engineering & Citation Validation
### Judge Prompts
- System prompt in `src/prompts/judge.py`
- Format evidence with truncation (1500 chars per item)
- Handle empty evidence case separately
- Always request structured JSON output
- Use `format_user_prompt()` and `format_empty_evidence_prompt()` helpers
### Hypothesis Prompts
- Use diverse evidence selection (MMR algorithm)
- Sentence-aware truncation (`truncate_at_sentence()`)
- Format: Drug β Target β Pathway β Effect
- System prompt emphasizes mechanistic reasoning
- Use `format_hypothesis_prompt()` with embeddings for diversity
### Report Prompts
- Include full citation details for validation
- Use diverse evidence selection (n=20)
- **CRITICAL**: Emphasize citation validation rules
- Format hypotheses with support/contradiction counts
- System prompt includes explicit JSON structure requirements
### Citation Validation
- **ALWAYS** validate references before returning reports
- Use `validate_references()` from `src/utils/citation_validator.py`
- Remove hallucinated citations (URLs not in evidence)
- Log warnings for removed citations
- Never trust LLM-generated citations without validation
### Citation Validation Rules
1. Every reference URL must EXACTLY match a provided evidence URL
2. Do NOT invent, fabricate, or hallucinate any references
3. Do NOT modify paper titles, authors, dates, or URLs
4. If unsure about a citation, OMIT it rather than guess
5. Copy URLs exactly as provided - do not create similar-looking URLs
### Evidence Selection
- Use `select_diverse_evidence()` for MMR-based selection
- Balance relevance vs diversity (lambda=0.7 default)
- Sentence-aware truncation preserves meaning
- Limit evidence per prompt to avoid context overflow
## MCP Integration
### MCP Tools
- Functions in `src/mcp_tools.py` for Claude Desktop
- Full type hints required
- Google-style docstrings with Args/Returns sections
- Formatted string returns (markdown)
### Gradio MCP Server
- Enable with `mcp_server=True` in `demo.launch()`
- Endpoint: `/gradio_api/mcp/`
- Use `ssr_mode=False` to fix hydration issues in HF Spaces
## Common Pitfalls
1. **Blocking the event loop**: Never use sync I/O in async functions
2. **Missing type hints**: All functions must have complete type annotations
3. **Hallucinated citations**: Always validate references
4. **Global mutable state**: Use ContextVar or pass via parameters
5. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings)
6. **Rate limiting**: Always implement for external APIs
7. **Error chaining**: Always use `from e` when raising exceptions
## Key Principles
1. **Type Safety First**: All code must pass `mypy --strict`
2. **Async Everything**: All I/O must be async
3. **Test-Driven**: Write tests before implementation
4. **No Hallucinations**: Validate all citations
5. **Graceful Degradation**: Support free tier (HF Inference) when no API keys
6. **Lazy Loading**: Don't require optional dependencies at import time
7. **Structured Logging**: Use structlog, never print()
8. **Error Chaining**: Always preserve exception context
## Pull Request Process
1. Ensure all checks pass: `make check`
2. Update documentation if needed
3. Add tests for new features
4. Update CHANGELOG if applicable
5. Request review from maintainers
6. Address review feedback
7. Wait for approval before merging
## Questions?
- Open an issue on GitHub
- Check existing documentation
- Review code examples in the codebase
Thank you for contributing to DeepCritical!
|