# RAG Pipeline API Documentation

## Overview

FastAPI-based RAG (Retrieval-Augmented Generation) pipeline with OpenRouter GLM integration for intelligent tool calling.

## Base URL

```
http://localhost:8000
```

## Endpoints

### `/chat` - Main Chat Endpoint

**Method:** `POST`

**Description:** Intelligent chat with RAG tool calling. GLM automatically determines when to use RAG vs. general conversation.

#### Request Body

```json
{
  "messages": [
    {
      "role": "user|assistant|system",
      "content": "string"
    }
  ]
}
```

#### Response Format

```json
{
  "response": "string",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"string\", \"dataset\": \"string\"}"
    }
  ] | null
}
```

#### Examples

**1. General Greeting (No RAG):**

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hi"}]}'
```

**Response:**

```json
{
  "response": "Hi! I'm Rohit's AI assistant. I can help you learn about his professional background, skills, and experience. What would you like to know about Rohit?",
  "tool_calls": null
}
```

**2. Portfolio Question (RAG Enabled):**

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is your current role?"}]}'
```

**Response:**

```json
{
  "response": "Based on the portfolio information, Rohit is currently working as a Tech Lead at FleetEnable, where he leads UI development for a logistics SaaS product focused on drayage and freight management...",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"What is your current role?\"}"
    }
  ]
}
```
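The same endpoint is easy to call from Python. Below is a minimal client sketch, assuming the server is running at the base URL above and that `requests` is installed; the `chat` helper is illustrative, not part of this repo:

```python
"""Minimal /chat client sketch (assumes the server is running locally)."""
import requests

BASE_URL = "http://localhost:8000"


def chat(messages: list[dict]) -> dict:
    """POST a message list to /chat and return the parsed JSON body."""
    resp = requests.post(f"{BASE_URL}/chat", json={"messages": messages}, timeout=60)
    resp.raise_for_status()  # raises on the 400/500 codes listed under Error Handling
    return resp.json()


if __name__ == "__main__":
    result = chat([{"role": "user", "content": "What is your current role?"}])
    print(result["response"])
    # `tool_calls` is null (None) for general chat, a list when RAG was invoked
    print(result["tool_calls"])
```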
### `/health` - Health Check

**Method:** `GET`

**Description:** Check API and dataset loading status.

#### Response

```json
{
  "status": "healthy",
  "datasets_loaded": 1,
  "available_datasets": ["developer-portfolio"]
}
```

### `/datasets` - List Available Datasets

**Method:** `GET`

**Description:** Get the list of available datasets.

#### Response

```json
{
  "datasets": ["developer-portfolio"]
}
```

## Features

### 🧠 Intelligent Tool Calling

- **Automatic Detection:** GLM determines when questions need RAG vs. general conversation
- **Context-Aware:** Uses portfolio information for relevant questions
- **Natural Responses:** Synthesizes RAG results into conversational answers

### 🎯 Third-Person AI Assistant

- **Portfolio Focus:** Responds about Rohit's experience (not "my" experience)
- **Professional Tone:** Maintains proper third-person references
- **Context Integration:** Combines multiple data points coherently

### ⚡ Performance Optimizations

- **On-Demand Loading:** Datasets load only when RAG is needed
- **Clean Output:** No verbose ML logging for general conversations
- **Fast Responses:** Sub-second responses for greetings; roughly 20 s for the first RAG query while the dataset loads

## Available Datasets

### `developer-portfolio`

- **Content:** Work experience, skills, projects, achievements
- **Topics:** FleetEnable, Coditude, technologies, leadership
- **Size:** 19 documents with full metadata

## Error Handling

### Common Responses

- **Datasets Loading:** "RAG Pipeline is running but datasets are still loading..."
- **Dataset Not Found:** "Dataset 'xyz' not available. Available datasets: [...]"
- **API Errors:** HTTP 500 with error details

### Status Codes

- `200` - Success
- `400` - Bad Request (invalid JSON, missing fields)
- `500` - Internal Server Error

## Environment Variables

Create a `.env` file:

```bash
OPENROUTER_API_KEY=sk-or-v1-your-key-here
PORT=8000
TOKENIZERS_PARALLELISM=false
```

## Development

### Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Start server
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Or use the script
./start.sh
```

### Testing

```bash
# Health check
curl http://localhost:8000/health

# Chat test
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hi"}]}'
```

## Deployment

### Docker

```bash
# Build
docker build -t rag-pipeline .

# Run
docker run -p 8000:8000 rag-pipeline
```

### Hugging Face Spaces

1. Push code to the repository
2. Connect the Space to the repository
3. Set environment variables in the Space settings
4. Deployment from the `main` branch is automatic

## Architecture

The tool-calling flow below is sketched in Python at the end of this document.

```
OpenRouter GLM-4.5-air (Parent AI)
├── Tool Calling Logic
│   ├── Automatically detects RAG-worthy questions
│   └── Falls back to general knowledge
├── RAG Tool Function
│   ├── Dataset selection (developer-portfolio)
│   ├── Document retrieval
│   └── Context formatting
└── Response Generation
    ├── Tool results integration
    └── Natural language responses
```

## Changelog

### v2.0 - Current

- ✅ OpenRouter GLM integration with tool calling
- ✅ Intelligent RAG vs. conversation detection
- ✅ Third-person AI assistant for Rohit's portfolio
- ✅ On-demand dataset loading
- ✅ Removed `/answer` endpoint (use `/chat` only)
- ✅ Environment variable configuration
- ✅ Performance optimizations

### v1.0 - Legacy

- Google Gemini integration
- Multiple endpoints (`/answer`, `/chat`)
- Background dataset loading
- First-person responses
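For reference, here is a minimal sketch of the tool-calling flow from the Architecture section, written against OpenRouter's OpenAI-compatible API. The model slug, the `rag_qa` tool schema, and the `run_rag_qa` callable are illustrative assumptions, not this repo's actual code:

```python
"""Sketch of the GLM tool-calling loop via OpenRouter (illustrative, not repo code)."""
import json
import os

from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible API

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

MODEL = "z-ai/glm-4.5-air"  # assumed slug for GLM-4.5-air on OpenRouter

# Tool schema matching the rag_qa arguments shown in the /chat response format.
RAG_QA_TOOL = {
    "type": "function",
    "function": {
        "name": "rag_qa",
        "description": "Answer questions about Rohit's portfolio via retrieval.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string"},
                "dataset": {"type": "string", "enum": ["developer-portfolio"]},
            },
            "required": ["question"],
        },
    },
}


def run_turn(messages: list, run_rag_qa) -> str:
    """One chat turn: let GLM decide between RAG and general conversation."""
    first = client.chat.completions.create(
        model=MODEL, messages=messages, tools=[RAG_QA_TOOL]
    )
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content  # general conversation; no RAG needed

    # Execute each requested rag_qa call and feed the results back to the model.
    messages.append(reply)
    for call in reply.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": run_rag_qa(**args)}
        )
    second = client.chat.completions.create(model=MODEL, messages=messages)
    return second.choices[0].message.content
```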