# RAG Pipeline API Documentation
## Overview
A FastAPI-based Retrieval-Augmented Generation (RAG) pipeline with OpenRouter GLM integration for intelligent tool calling.
## Base URL
```
http://localhost:8000
```
## Endpoints
### `/chat` - Main Chat Endpoint
**Method:** `POST`
**Description:** Intelligent chat with RAG tool calling. GLM automatically determines when to use RAG vs. general conversation.
#### Request Body
```json
{
  "messages": [
    {
      "role": "user|assistant|system",
      "content": "string"
    }
  ]
}
```
#### Response Format
```json
{
  "response": "string",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"string\", \"dataset\": \"string\"}"
    }
  ]
}
```
`tool_calls` is `null` when the model answers without invoking RAG.
#### Examples
**1. General Greeting (No RAG):**
```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hi"}]}'
```
**Response:**
```json
{
  "response": "Hi! I'm Rohit's AI assistant. I can help you learn about his professional background, skills, and experience. What would you like to know about Rohit?",
  "tool_calls": null
}
```
**2. Portfolio Question (RAG Enabled):**
```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"What is your current role?"}]}'
```
**Response:**
```json
{
  "response": "Based on the portfolio information, Rohit is currently working as a Tech Lead at FleetEnable, where he leads UI development for a logistics SaaS product focused on drayage and freight management...",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"What is your current role?\"}"
    }
  ]
}
```
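The same calls can be made programmatically. A minimal Python client sketch, assuming the `requests` package is installed:
```python
import requests

BASE_URL = "http://localhost:8000"  # adjust for your deployment

def chat(messages: list[dict]) -> dict:
    """POST a conversation to /chat and return the parsed JSON reply."""
    resp = requests.post(f"{BASE_URL}/chat", json={"messages": messages})
    resp.raise_for_status()
    return resp.json()

# Greeting: expected to answer directly, with tool_calls null/None
print(chat([{"role": "user", "content": "hi"}])["response"])

# Portfolio question: expected to trigger the rag_qa tool
reply = chat([{"role": "user", "content": "What is your current role?"}])
print(reply["response"], reply["tool_calls"])
```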
### `/health` - Health Check
**Method:** `GET`
**Description:** Check API and dataset loading status.
#### Response
```json
{
  "status": "healthy",
  "datasets_loaded": 1,
  "available_datasets": ["developer-portfolio"]
}
```
### `/datasets` - List Available Datasets
**Method:** `GET`
**Description:** Get list of available datasets.
#### Response
```json
{
  "datasets": ["developer-portfolio"]
}
```
## Features
### 🔧 Intelligent Tool Calling
- **Automatic Detection:** GLM determines when a question needs RAG vs. general conversation (see the tool schema sketch below)
- **Context-Aware:** Uses portfolio information for relevant questions
- **Natural Responses:** Synthesizes RAG results into conversational answers
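As a sketch of what that detection rests on, the `rag_qa` tool could be declared to GLM in the OpenAI-compatible `tools` format that OpenRouter accepts; the exact schema in this app may differ:
```python
# Illustrative declaration of the rag_qa tool (field names follow the
# OpenAI-compatible "tools" format; the app's actual schema may differ).
RAG_TOOL = {
    "type": "function",
    "function": {
        "name": "rag_qa",
        "description": "Answer questions about Rohit's portfolio "
                       "from retrieved documents.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string"},
                "dataset": {"type": "string",
                            "description": "e.g. 'developer-portfolio'"},
            },
            "required": ["question"],
        },
    },
}
```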
### 🎯 Third-Person AI Assistant
- **Portfolio Focus:** Responds about Rohit's experience (not "my" experience)
- **Professional Tone:** Maintains proper third-person references
- **Context Integration:** Combines multiple data points coherently
### ⚡ Performance Optimizations
- **On-Demand Loading:** Datasets load only when RAG is needed (see the sketch below)
- **Clean Output:** No verbose ML logging for general conversations
- **Fast Responses:** Sub-second for greetings; ~20 s for the first RAG query
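A minimal sketch of the on-demand loading pattern (the loader body is a stand-in; the real app builds an embedding index here):
```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_dataset(name: str) -> list[str]:
    """Load a dataset on first request, then serve later calls from cache."""
    print(f"Loading dataset '{name}' ...")  # printed only on the first call
    return [f"document {i} from {name}" for i in range(19)]  # stand-in loader

get_dataset("developer-portfolio")  # first RAG query pays the ~20 s cost
get_dataset("developer-portfolio")  # subsequent queries hit the cache
```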
## Available Datasets
### `developer-portfolio`
- **Content:** Work experience, skills, projects, achievements
- **Topics:** FleetEnable, Coditude, technologies, leadership
- **Size:** 19 documents with full metadata
## Error Handling
### Common Responses
- **Datasets Loading:** "RAG Pipeline is running but datasets are still loading..."
- **Dataset Not Found:** "Dataset 'xyz' not available. Available datasets: [...]"
- **API Errors:** HTTP 500 with error details
### Status Codes
- `200` - Success
- `400` - Bad Request (invalid JSON, missing fields)
- `500` - Internal Server Error
## Environment Variables
Create a `.env` file:
```bash
OPENROUTER_API_KEY=sk-or-v1-your-key-here
PORT=8000
TOKENIZERS_PARALLELISM=false
```
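If you need the same values in your own scripts, they can be read with `python-dotenv` (an assumption; the app itself may load them differently):
```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]  # fail fast if missing
PORT = int(os.getenv("PORT", "8000"))                  # default matches the docs
```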
## Development
### Running Locally
```bash
# Install dependencies
pip install -r requirements.txt
# Start server
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Or use script
./start.sh
```
### Testing
```bash
# Health check
curl http://localhost:8000/health
# Chat test
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hi"}]}'
```
## Deployment
### Docker
```bash
# Build
docker build -t rag-pipeline .
# Run
docker run -p 8000:8000 rag-pipeline
```
### Hugging Face Spaces
1. Push code to repository
2. Connect Space to repository
3. Set environment variables in Space settings
4. Automatic deployment from `main` branch
## Architecture
```
OpenRouter GLM-4.5-air (Parent AI)
├── Tool Calling Logic
│   ├── Automatically detects RAG-worthy questions
│   └── Falls back to general knowledge
├── RAG Tool Function
│   ├── Dataset selection (developer-portfolio)
│   ├── Document retrieval
│   └── Context formatting
└── Response Generation
    ├── Tool results integration
    └── Natural language responses
```
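Read top to bottom, this is a standard two-step tool-calling loop. A condensed sketch of how the pieces might fit together, reusing the `RAG_TOOL` schema sketched under Features (the model id, `rag_qa` body, and client wiring are illustrative, not the app's actual code):
```python
import json
from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible API

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-v1-...")
MODEL = "z-ai/glm-4.5-air"  # illustrative OpenRouter model id

def rag_qa(question: str, dataset: str = "developer-portfolio") -> str:
    """Stand-in for the real tool: select dataset, retrieve, format context."""
    return f"(retrieved context for {question!r} from {dataset})"

def chat(messages: list[dict]) -> str:
    # Step 1: let GLM decide whether the question is RAG-worthy.
    first = client.chat.completions.create(
        model=MODEL, messages=messages, tools=[RAG_TOOL]
    )
    msg = first.choices[0].message
    if not msg.tool_calls:
        return msg.content  # general conversation; no retrieval needed
    # Step 2: run the tool(s) and feed results back for the final answer.
    messages = messages + [msg]
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": rag_qa(**args)})
    final = client.chat.completions.create(model=MODEL, messages=messages)
    return final.choices[0].message.content
```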
## Changelog
### v2.0 - Current
- ✅ OpenRouter GLM integration with tool calling
- ✅ Intelligent RAG vs. conversation detection
- ✅ Third-person AI assistant for Rohit's portfolio
- ✅ On-demand dataset loading
- ✅ Removed `/answer` endpoint (use `/chat` only)
- ✅ Environment variable configuration
- ✅ Performance optimizations
### v1.0 - Legacy
- Google Gemini integration
- Multiple endpoints (`/answer`, `/chat`)
- Background dataset loading
- First-person responses