# RAG Pipeline API Documentation
## Overview
FastAPI-based RAG (Retrieval-Augmented Generation) pipeline with OpenRouter GLM integration for intelligent tool calling.
## Base URL
```
http://localhost:8000
```
## Endpoints
### `/chat` - Main Chat Endpoint
**Method:** `POST`
**Description:** Intelligent chat with RAG tool calling. GLM automatically determines when to use RAG vs. general conversation.
#### Request Body
```json
{
  "messages": [
    {
      "role": "user|assistant|system",
      "content": "string"
    }
  ]
}
```
#### Response Format
```json
{
  "response": "string",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"string\", \"dataset\": \"string\"}"
    }
  ] | null
}
```
#### Examples
**1. General Greeting (No RAG):**
```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hi"}]}'
```
**Response:**
```json
{
  "response": "Hi! I'm Rohit's AI assistant. I can help you learn about his professional background, skills, and experience. What would you like to know about Rohit?",
  "tool_calls": null
}
```
**2. Portfolio Question (RAG Enabled):**
```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"What is your current role?"}]}'
```
**Response:**
```json
{
  "response": "Based on the portfolio information, Rohit is currently working as a Tech Lead at FleetEnable, where he leads UI development for a logistics SaaS product focused on drayage and freight management...",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"What is your current role?\"}"
    }
  ]
}
```
### `/health` - Health Check
**Method:** `GET`
**Description:** Check API and dataset loading status.
#### Response
```json
{
  "status": "healthy",
  "datasets_loaded": 1,
  "available_datasets": ["developer-portfolio"]
}
```
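Because the first RAG query is slower while data loads (see Performance Optimizations), a client can poll this endpoint before sending portfolio questions. A minimal sketch, assuming the `requests` package:
```python
import time
import requests

def wait_until_healthy(base_url="http://localhost:8000", timeout=60.0):
    """Poll /health every 2s until the API reports a loaded dataset."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            data = requests.get(f"{base_url}/health", timeout=5).json()
            if data.get("status") == "healthy" and data.get("datasets_loaded", 0) > 0:
                return data
        except requests.RequestException:
            pass  # server may still be starting up
        time.sleep(2)
    raise TimeoutError("RAG pipeline did not become healthy in time")
```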
### `/datasets` - List Available Datasets
**Method:** `GET`
**Description:** Get list of available datasets.
#### Response
```json
{
  "datasets": ["developer-portfolio"]
}
```
## Features
### 🧠 Intelligent Tool Calling
- **Automatic Detection:** GLM determines when questions need RAG vs. general conversation (see the tool-definition sketch after this list)
- **Context-Aware:** Uses portfolio information for relevant questions
- **Natural Responses:** Synthesizes RAG results into conversational answers
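The document does not show the actual tool definition, but given the `rag_qa` arguments in the `/chat` response schema, an OpenAI-compatible definition passed to OpenRouter would look roughly like this (field descriptions are illustrative assumptions):
```python
# Hypothetical rag_qa tool definition in OpenAI-compatible "tools" format.
# Only "question" is marked required, since the example /chat response above
# shows a call without a "dataset" argument.
RAG_TOOL = {
    "type": "function",
    "function": {
        "name": "rag_qa",
        "description": "Answer questions about Rohit's portfolio via retrieval.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string", "description": "The user's question."},
                "dataset": {
                    "type": "string",
                    "description": "Dataset to search, e.g. 'developer-portfolio'.",
                },
            },
            "required": ["question"],
        },
    },
}
```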
### 🎯 Third-Person AI Assistant
- **Portfolio Focus:** Responds about Rohit's experience (not "my" experience)
- **Professional Tone:** Maintains proper third-person references
- **Context Integration:** Combines multiple data points coherently
### ⚡ Performance Optimizations
- **On-Demand Loading:** Datasets load only when RAG is needed
- **Clean Output:** No verbose ML logging for general conversations
- **Fast Responses:** Sub-second replies for greetings; roughly 20s for the first RAG query while the dataset loads (see the sketch after this list)
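On-demand loading can be as simple as a cached loader that touches disk only on the first request for a dataset. A minimal sketch; the data directory, file layout, and function name are illustrative assumptions, not the project's actual implementation:
```python
import json
from functools import lru_cache
from pathlib import Path

DATA_DIR = Path("data")  # hypothetical location of dataset files

@lru_cache(maxsize=None)
def get_dataset(name: str) -> list[dict]:
    """Load a dataset from disk on first use, then serve it from cache.

    General conversations that never trigger rag_qa never pay this cost.
    """
    path = DATA_DIR / f"{name}.json"
    if not path.exists():
        raise KeyError(f"Dataset '{name}' not available")
    return json.loads(path.read_text())
```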
## Available Datasets
### `developer-portfolio`
- **Content:** Work experience, skills, projects, achievements
- **Topics:** FleetEnable, Coditude, technologies, leadership
- **Size:** 19 documents with full metadata
## Error Handling
### Common Responses
- **Datasets Loading:** "RAG Pipeline is running but datasets are still loading..."
- **Dataset Not Found:** "Dataset 'xyz' not available. Available datasets: [...]"
- **API Errors:** HTTP 500 with error details (example body shown below)
### Status Codes
- `200` - Success
- `400` - Bad Request (invalid JSON, missing fields)
- `500` - Internal Server Error
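FastAPI serializes errors raised via `HTTPException` into a JSON body with a single `detail` field, so error responses should look something like:
```json
{
  "detail": "Dataset 'xyz' not available. Available datasets: ['developer-portfolio']"
}
```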
## Environment Variables
Create `.env` file:
```bash
OPENROUTER_API_KEY=sk-or-v1-your-key-here
PORT=8000
TOKENIZERS_PARALLELISM=false
```
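How the app consumes these variables isn't shown here; a common pattern is `python-dotenv` plus `os.getenv`, sketched below as an assumption rather than the project's actual startup code:
```python
import os
from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the process environment

OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]  # fail fast if missing
PORT = int(os.getenv("PORT", "8000"))
```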
## Development
### Running Locally
```bash
# Install dependencies
pip install -r requirements.txt
# Start server
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Or use script
./start.sh
```
### Testing
```bash
# Health check
curl http://localhost:8000/health
# Chat test
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hi"}]}'
```
## Deployment
### Docker
```bash
# Build
docker build -t rag-pipeline .
# Run
docker run -p 8000:8000 --env-file .env rag-pipeline
```
### Hugging Face Spaces
1. Push code to repository
2. Connect Space to repository
3. Set environment variables in Space settings
4. Automatic deployment from `main` branch
## Architecture
```
OpenRouter GLM-4.5-air (Parent AI)
├── Tool Calling Logic
│   ├── Automatically detects RAG-worthy questions
│   └── Falls back to general knowledge
├── RAG Tool Function
│   ├── Dataset selection (developer-portfolio)
│   ├── Document retrieval
│   └── Context formatting
└── Response Generation
    ├── Tool results integration
    └── Natural language responses
```
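In code, this flow typically reduces to one tool-calling round trip against OpenRouter's OpenAI-compatible API. The sketch below shows the general shape only: the model ID is assumed, and `RAG_TOOL` / `get_dataset` refer to the hypothetical helpers sketched in earlier sections, not the project's real modules.
```python
import json
import os
from openai import OpenAI  # OpenRouter speaks the OpenAI-compatible protocol

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
MODEL = "z-ai/glm-4.5-air"  # assumed OpenRouter ID for GLM-4.5-air

def answer(messages: list[dict]) -> dict:
    """One round trip: the model decides whether to call rag_qa."""
    first = client.chat.completions.create(
        model=MODEL, tools=[RAG_TOOL], messages=messages
    )
    reply = first.choices[0].message
    if not reply.tool_calls:  # general conversation, no RAG needed
        return {"response": reply.content, "tool_calls": None}

    followup = messages + [reply]
    for call in reply.tool_calls:  # run each requested rag_qa call
        args = json.loads(call.function.arguments)
        docs = get_dataset(args.get("dataset", "developer-portfolio"))
        context = "\n\n".join(doc["text"] for doc in docs)  # naive formatting
        followup.append(
            {"role": "tool", "tool_call_id": call.id, "content": context}
        )
    final = client.chat.completions.create(model=MODEL, messages=followup)
    return {
        "response": final.choices[0].message.content,
        "tool_calls": [
            {"name": c.function.name, "arguments": c.function.arguments}
            for c in reply.tool_calls
        ],
    }
```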
## Changelog
### v2.0 - Current
- ✅ OpenRouter GLM integration with tool calling
- ✅ Intelligent RAG vs. conversation detection
- ✅ Third-person AI assistant for Rohit's portfolio
- ✅ On-demand dataset loading
- ✅ Removed `/answer` endpoint (use `/chat` only)
- ✅ Environment variable configuration
- ✅ Performance optimizations
### v1.0 - Legacy
- Google Gemini integration
- Multiple endpoints (`/answer`, `/chat`)
- Background dataset loading
- First-person responses