# Services Architecture

DeepCritical provides several services for embeddings, RAG, and statistical analysis.

## Embedding Service

**File**: `src/services/embeddings.py`

**Purpose**: Local sentence-transformers embeddings for semantic search and deduplication

**Features**:
- **No API Key Required**: Uses local sentence-transformers models
- **Async-Safe**: All operations use `run_in_executor()` so model inference does not block the event loop
- **ChromaDB Storage**: Vector storage for embeddings
- **Deduplication**: Default similarity threshold of 0.85 (pairs scoring 0.85 or higher are treated as duplicates)
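The `run_in_executor()` pattern can be sketched as follows. `_blocking_encode` is a hypothetical placeholder for the synchronous sentence-transformers call; the real service's internals may differ, so treat this as an illustration of the technique only.

```python
import asyncio
import time


def _blocking_encode(text: str) -> list[float]:
    """Hypothetical stand-in for a CPU-bound call such as model.encode()."""
    time.sleep(0.01)  # simulate model inference latency
    return [float(len(text)), float(text.count(" "))]


async def embed(text: str) -> list[float]:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; the event loop keeps
    # serving other coroutines while the blocking call runs in a worker thread.
    return await loop.run_in_executor(None, _blocking_encode, text)


async def embed_many(texts: list[str]) -> list[list[float]]:
    # Several blocking calls can now overlap instead of running back to back.
    return list(await asyncio.gather(*(embed(t) for t in texts)))
```

Calling `asyncio.run(embed("a b c"))` returns `[5.0, 2.0]` with this placeholder encoder.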

**Model**: Configurable via `settings.local_embedding_model` (default: `all-MiniLM-L6-v2`)

**Methods**:
- `async def embed(text: str) -> list[float]`: Generate embeddings
- `async def embed_batch(texts: list[str]) -> list[list[float]]`: Batch embedding
- `async def similarity(text1: str, text2: str) -> float`: Calculate similarity
- `async def find_duplicates(texts: list[str], threshold: float = 0.85) -> list[tuple[int, int]]`: Find duplicates
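A minimal sketch of threshold-based duplicate detection in the spirit of `find_duplicates`, assuming cosine similarity and an inclusive `>=` comparison (both are assumptions, not confirmed by the source). To stay runnable without a model, it works on precomputed vectors rather than raw texts, and `find_duplicate_pairs` is a hypothetical name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose similarity is >= threshold."""
    pairs: list[tuple[int, int]] = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

For example, `find_duplicate_pairs([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])` returns `[(0, 1)]`. The pairwise loop is O(n²), which is fine for small batches of search results.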

**Usage**:
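```python
import asyncio
from src.services.embeddings import get_embedding_service

async def main() -> None:
    service = get_embedding_service()  # cached, process-wide instance
    embedding = await service.embed("text to embed")  # list[float]

asyncio.run(main())
```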

## LlamaIndex RAG Service

**File**: `src/services/rag.py`

**Purpose**: Retrieval-Augmented Generation using LlamaIndex

**Features**:
- **OpenAI Embeddings**: Requires `OPENAI_API_KEY`
- **ChromaDB Storage**: Vector database for document storage
- **Metadata Preservation**: Preserves source, title, URL, date, authors
- **Lazy Initialization**: Graceful fallback if the OpenAI key is not available

**Methods**:
- `async def ingest_evidence(evidence: list[Evidence]) -> None`: Ingest evidence into RAG
- `async def retrieve(query: str, top_k: int = 5) -> list[Document]`: Retrieve relevant documents
- `async def query(query: str, top_k: int = 5) -> str`: Query with RAG
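In the actual service the vector search is delegated to LlamaIndex and ChromaDB. The toy version below only illustrates what `top_k` means and how metadata travels with each retrieved result; `ToyDocument` and `retrieve_top_k` are hypothetical names, and the brute-force ranking is a simplification of a real vector index.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical document: text, its embedding, and preserved metadata."""
    text: str
    vector: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def retrieve_top_k(
    query_vector: list[float], docs: list[ToyDocument], top_k: int = 5
) -> list[ToyDocument]:
    # Rank every document by similarity to the query and keep the best top_k.
    ranked = sorted(docs, key=lambda d: _cosine(query_vector, d.vector), reverse=True)
    return ranked[:top_k]
```

Because each result is the whole document object, fields such as title or URL in `metadata` come back alongside the text without any extra lookup.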

**Usage**:
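```python
import asyncio
from src.services.rag import get_rag_service

async def main() -> None:
    service = get_rag_service()  # falsy when OPENAI_API_KEY is unavailable
    if service:
        documents = await service.retrieve("query", top_k=5)

asyncio.run(main())
```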
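The `if service:` check in the usage example implies the factory returns a falsy value when the key is missing. A minimal sketch of that lazy, fail-soft behaviour, assuming it returns `None` and reads the key straight from the environment (the real factory may go through `settings` instead); `RAGServiceSketch` and `get_rag_service_sketch` are hypothetical names.

```python
import os
from typing import Optional


class RAGServiceSketch:
    """Hypothetical stand-in for the real RAG service."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


def get_rag_service_sketch() -> Optional[RAGServiceSketch]:
    # Nothing is constructed until this function is called (lazy initialization).
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        # Graceful fallback: signal "unavailable" instead of raising.
        return None
    return RAGServiceSketch(api_key)
```

Returning `None` lets callers skip RAG with a single truthiness check rather than wrapping every call in `try`/`except`.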

## Statistical Analyzer

**File**: `src/services/statistical_analyzer.py`

**Purpose**: Secure execution of AI-generated statistical code

**Features**:
- **Modal Sandbox**: Secure, isolated execution environment
- **Code Generation**: Generates Python analysis code with an LLM
- **Library Pinning**: Version-pinned libraries in `SANDBOX_LIBRARIES`
- **Network Isolation**: `block_network=True` by default

**Libraries Available**:
- pandas, numpy, scipy
- matplotlib, scikit-learn
- statsmodels

**Output**: `AnalysisResult` with:
- `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
- `code`: Generated analysis code
- `output`: Execution output
- `error`: Error message if execution failed
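The result fields listed above can be pictured as the following shape. This is a sketch under assumptions: the real `AnalysisResult` may be a Pydantic model, and the `Verdict` enum, field types, and the `describe` helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(str, Enum):
    SUPPORTED = "SUPPORTED"
    REFUTED = "REFUTED"
    INCONCLUSIVE = "INCONCLUSIVE"


@dataclass
class AnalysisResultSketch:
    verdict: Verdict
    code: str                    # generated analysis code
    output: str                  # captured execution output
    error: Optional[str] = None  # set only when execution failed


def describe(result: AnalysisResultSketch) -> str:
    # Check for an execution error before relying on the verdict.
    if result.error is not None:
        return f"execution failed: {result.error}"
    return f"verdict: {result.verdict.value}"
```

For example, `describe(AnalysisResultSketch(Verdict.SUPPORTED, "print(1)", "1\n"))` returns `"verdict: SUPPORTED"`.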

**Usage**:
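```python
from src.services.statistical_analyzer import StatisticalAnalyzer

async def run_analysis(evidence_list) -> None:
    # evidence_list: the Evidence items to analyze
    analyzer = StatisticalAnalyzer()
    result = await analyzer.analyze(
        hypothesis="Metformin reduces cancer risk",
        evidence=evidence_list,
    )
    print(result.verdict)  # SUPPORTED, REFUTED, or INCONCLUSIVE
```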

## Singleton Pattern

All services use the singleton pattern with `@lru_cache(maxsize=1)`:

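```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    # First call constructs the service; later calls return the cached instance
    return EmbeddingService()
```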

This ensures:
- Single instance per process
- Lazy initialization on the first call
- No construction-time work (such as model loading) happens at import time
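The caching behaviour can be checked with a self-contained demo. `ExpensiveService` and `get_expensive_service` are hypothetical names; `cache_clear()` is part of the standard `functools.lru_cache` wrapper.

```python
from functools import lru_cache


class ExpensiveService:
    """Hypothetical service whose constructor is costly (e.g. loads a model)."""

    instances_created = 0

    def __init__(self) -> None:
        ExpensiveService.instances_created += 1


@lru_cache(maxsize=1)
def get_expensive_service() -> ExpensiveService:
    return ExpensiveService()


# Defining the factory constructs nothing yet.
assert ExpensiveService.instances_created == 0

first = get_expensive_service()
second = get_expensive_service()
assert first is second  # same cached instance
assert ExpensiveService.instances_created == 1

# cache_clear() drops the cached instance, which is handy for resetting
# state between tests; the next call builds a fresh one.
get_expensive_service.cache_clear()
assert get_expensive_service() is not first
```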

## Service Availability

Check the relevant `settings` flag before using an optional backend:

```python
from src.utils.config import settings

if settings.modal_available:
    # Use Modal sandbox
    pass

if settings.has_openai_key:
    # Use OpenAI embeddings for RAG
    pass
```
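How such flags might be derived is sketched below, under assumptions: `SettingsSketch` is hypothetical, and reading Modal credentials from `MODAL_TOKEN_ID`/`MODAL_TOKEN_SECRET` is an assumption about how this project supplies them, not something the source confirms.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SettingsSketch:
    openai_api_key: Optional[str] = None
    modal_token_id: Optional[str] = None
    modal_token_secret: Optional[str] = None

    @property
    def has_openai_key(self) -> bool:
        return bool(self.openai_api_key)

    @property
    def modal_available(self) -> bool:
        # Both halves of the Modal credential must be present.
        return bool(self.modal_token_id and self.modal_token_secret)

    @classmethod
    def from_env(cls) -> "SettingsSketch":
        return cls(
            openai_api_key=os.environ.get("OPENAI_API_KEY"),
            modal_token_id=os.environ.get("MODAL_TOKEN_ID"),
            modal_token_secret=os.environ.get("MODAL_TOKEN_SECRET"),
        )
```

Exposing availability as derived properties keeps the credential checks in one place, so call sites only test a boolean.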

## See Also

- [Tools](tools.md) - How services are used by search tools
- [API Reference - Services](../api/services.md) - API documentation
- [Configuration](../configuration/index.md) - Service configuration