# RAG Pipeline API Documentation
## Overview
FastAPI-based RAG (Retrieval-Augmented Generation) pipeline with OpenRouter GLM integration for intelligent tool calling.
## Base URL
```
http://localhost:8000
```
## Endpoints
### `/chat` - Main Chat Endpoint
**Method:** `POST`
**Description:** Intelligent chat with RAG tool calling. GLM automatically determines when to use RAG vs. general conversation.
#### Request Body
```json
{
  "messages": [
    {
      "role": "user|assistant|system",
      "content": "string"
    }
  ]
}
```
#### Response Format
```json
{
  "response": "string",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"string\", \"dataset\": \"string\"}"
    }
  ] | null
}
```
#### Examples
**1. General Greeting (No RAG):**
```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hi"}]}'
```
**Response:**
```json
{
  "response": "Hi! I'm Rohit's AI assistant. I can help you learn about his professional background, skills, and experience. What would you like to know about Rohit?",
  "tool_calls": null
}
```
**2. Portfolio Question (RAG Enabled):**
```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"What is your current role?"}]}'
```
**Response:**
```json
{
  "response": "Based on the portfolio information, Rohit is currently working as a Tech Lead at FleetEnable, where he leads UI development for a logistics SaaS product focused on drayage and freight management...",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"What is your current role?\"}"
    }
  ]
}
```
### `/health` - Health Check
**Method:** `GET`
**Description:** Check API and dataset loading status.
#### Response
```json
{
  "status": "healthy",
  "datasets_loaded": 1,
  "available_datasets": ["developer-portfolio"]
}
```
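Because the first RAG query is slower while data loads (see Performance Optimizations), a client can poll this endpoint before sending portfolio questions. A minimal sketch, assuming the `requests` package:
```python
import time
import requests

def wait_until_healthy(base_url="http://localhost:8000", timeout=60.0):
    """Poll /health every 2s until the API reports a loaded dataset."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            data = requests.get(f"{base_url}/health", timeout=5).json()
            if data.get("status") == "healthy" and data.get("datasets_loaded", 0) > 0:
                return data
        except requests.RequestException:
            pass  # server may still be starting up
        time.sleep(2)
    raise TimeoutError("RAG pipeline did not become healthy in time")
```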
### `/datasets` - List Available Datasets
**Method:** `GET`
**Description:** Get list of available datasets.
#### Response
```json
{
  "datasets": ["developer-portfolio"]
}
```
## Features
### 🧠 Intelligent Tool Calling
- **Automatic Detection:** GLM determines when questions need RAG vs. general conversation (see the tool-definition sketch after this list)
- **Context-Aware:** Uses portfolio information for relevant questions
- **Natural Responses:** Synthesizes RAG results into conversational answers
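The document does not show the actual tool definition, but given the `rag_qa` arguments in the `/chat` response schema, an OpenAI-compatible definition passed to OpenRouter would look roughly like this (field descriptions are illustrative assumptions):
```python
# Hypothetical rag_qa tool definition in OpenAI-compatible "tools" format.
# Only "question" is marked required, since the example /chat response above
# shows a call without a "dataset" argument.
RAG_TOOL = {
    "type": "function",
    "function": {
        "name": "rag_qa",
        "description": "Answer questions about Rohit's portfolio via retrieval.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string", "description": "The user's question."},
                "dataset": {
                    "type": "string",
                    "description": "Dataset to search, e.g. 'developer-portfolio'.",
                },
            },
            "required": ["question"],
        },
    },
}
```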
### 🎯 Third-Person AI Assistant
- **Portfolio Focus:** Responds about Rohit's experience (not "my" experience)
- **Professional Tone:** Maintains proper third-person references
- **Context Integration:** Combines multiple data points coherently
### ⚡ Performance Optimizations
- **On-Demand Loading:** Datasets load only when RAG is needed
- **Clean Output:** No verbose ML logging for general conversations
- **Fast Responses:** Sub-second replies for greetings; roughly 20s for the first RAG query while the dataset loads (see the sketch after this list)
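On-demand loading can be as simple as a cached loader that touches disk only on the first request for a dataset. A minimal sketch; the data directory, file layout, and function name are illustrative assumptions, not the project's actual implementation:
```python
import json
from functools import lru_cache
from pathlib import Path

DATA_DIR = Path("data")  # hypothetical location of dataset files

@lru_cache(maxsize=None)
def get_dataset(name: str) -> list[dict]:
    """Load a dataset from disk on first use, then serve it from cache.

    General conversations that never trigger rag_qa never pay this cost.
    """
    path = DATA_DIR / f"{name}.json"
    if not path.exists():
        raise KeyError(f"Dataset '{name}' not available")
    return json.loads(path.read_text())
```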
## Available Datasets
### `developer-portfolio`
- **Content:** Work experience, skills, projects, achievements
- **Topics:** FleetEnable, Coditude, technologies, leadership
- **Size:** 19 documents with full metadata
## Error Handling
### Common Responses
- **Datasets Loading:** "RAG Pipeline is running but datasets are still loading..."
- **Dataset Not Found:** "Dataset 'xyz' not available. Available datasets: [...]"
- **API Errors:** HTTP 500 with error details (example body shown below)
### Status Codes
- `200` - Success
- `400` - Bad Request (invalid JSON, missing fields)
- `500` - Internal Server Error
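FastAPI serializes errors raised via `HTTPException` into a JSON body with a single `detail` field, so error responses should look something like:
```json
{
  "detail": "Dataset 'xyz' not available. Available datasets: ['developer-portfolio']"
}
```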
## Environment Variables
Create `.env` file:
```bash
OPENROUTER_API_KEY=sk-or-v1-your-key-here
PORT=8000
TOKENIZERS_PARALLELISM=false
```
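How the app consumes these variables isn't shown here; a common pattern is `python-dotenv` plus `os.getenv`, sketched below as an assumption rather than the project's actual startup code:
```python
import os
from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the process environment

OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]  # fail fast if missing
PORT = int(os.getenv("PORT", "8000"))
```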
## Development
### Running Locally
```bash
# Install dependencies
pip install -r requirements.txt
# Start server
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Or use script
./start.sh
```
### Testing
```bash
# Health check
curl http://localhost:8000/health
# Chat test
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hi"}]}'
```
## Deployment
### Docker
```bash
# Build
docker build -t rag-pipeline .
# Run
docker run -p 8000:8000 --env-file .env rag-pipeline
```
### Hugging Face Spaces
1. Push code to repository
2. Connect Space to repository
3. Set environment variables in Space settings
4. Automatic deployment from `main` branch
## Architecture
```
OpenRouter GLM-4.5-air (Parent AI)
├── Tool Calling Logic
│   ├── Automatically detects RAG-worthy questions
│   └── Falls back to general knowledge
├── RAG Tool Function
│   ├── Dataset selection (developer-portfolio)
│   ├── Document retrieval
│   └── Context formatting
└── Response Generation
    ├── Tool results integration
    └── Natural language responses
```
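In code, this flow typically reduces to one tool-calling round trip against OpenRouter's OpenAI-compatible API. The sketch below shows the general shape only: the model ID is assumed, and `RAG_TOOL` / `get_dataset` refer to the hypothetical helpers sketched in earlier sections, not the project's real modules.
```python
import json
import os
from openai import OpenAI  # OpenRouter speaks the OpenAI-compatible protocol

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
MODEL = "z-ai/glm-4.5-air"  # assumed OpenRouter ID for GLM-4.5-air

def answer(messages: list[dict]) -> dict:
    """One round trip: the model decides whether to call rag_qa."""
    first = client.chat.completions.create(
        model=MODEL, tools=[RAG_TOOL], messages=messages
    )
    reply = first.choices[0].message
    if not reply.tool_calls:  # general conversation, no RAG needed
        return {"response": reply.content, "tool_calls": None}

    followup = messages + [reply]
    for call in reply.tool_calls:  # run each requested rag_qa call
        args = json.loads(call.function.arguments)
        docs = get_dataset(args.get("dataset", "developer-portfolio"))
        context = "\n\n".join(doc["text"] for doc in docs)  # naive formatting
        followup.append(
            {"role": "tool", "tool_call_id": call.id, "content": context}
        )
    final = client.chat.completions.create(model=MODEL, messages=followup)
    return {
        "response": final.choices[0].message.content,
        "tool_calls": [
            {"name": c.function.name, "arguments": c.function.arguments}
            for c in reply.tool_calls
        ],
    }
```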
## Changelog
### v2.0 - Current
- ✅ OpenRouter GLM integration with tool calling
- ✅ Intelligent RAG vs. conversation detection
- ✅ Third-person AI assistant for Rohit's portfolio
- ✅ On-demand dataset loading
- ✅ Removed `/answer` endpoint (use `/chat` only)
- ✅ Environment variable configuration
- ✅ Performance optimizations
### v1.0 - Legacy
- Google Gemini integration
- Multiple endpoints (`/answer`, `/chat`)
- Background dataset loading
- First-person responses