Krishna Chaitanya Cheedella committed
Commit f3045de · Parent: 1e90386

Update deployment guides for OpenAI + HuggingFace setup

Files changed:
- DEPLOYMENT_GUIDE.md +190 -148
- QUICKSTART.md +64 -54
DEPLOYMENT_GUIDE.md
CHANGED
# LLM Council - Comprehensive Guide

## 📝 Overview

The LLM Council is a sophisticated multi-agent system that uses multiple Large Language Models (LLMs) to collectively answer questions through a 3-stage deliberation process:

1. **Stage 1 - Initial Responses**: Each council member independently answers the question in parallel
2. **Stage 2 - Peer Review**: Council members rank each other's anonymized responses
3. **Stage 3 - Synthesis**: A chairman model synthesizes the final answer based on all inputs

**Current Implementation**: Uses FREE HuggingFace models (60%) + cheap OpenAI models (40%)

## 🏗️ Architecture

### Current Implementation

## 🔧 Current Models (FREE HuggingFace + OpenAI)

### Council Members (5 models)

**FREE HuggingFace Models** (via Inference API):
- `meta-llama/Llama-3.3-70B-Instruct` - Meta's latest Llama (FREE)
- `Qwen/Qwen2.5-72B-Instruct` - Alibaba's Qwen (FREE)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` - Mistral MoE (FREE)

**OpenAI Models** (paid but cheap):
- `gpt-4o-mini` - Fast, affordable GPT-4 variant
- `gpt-3.5-turbo` - Ultra cheap, still capable

### Chairman

- `gpt-4o-mini` - Final synthesis model

**Benefits of Current Setup:**
- 60% of models are completely FREE (HuggingFace)
- 40% use cheap OpenAI models ($0.001-0.01 per query)
- 90-99% cost reduction compared to all-paid alternatives
- No experimental/beta endpoints - all stable APIs
- Diverse model providers for varied perspectives
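The 3-stage flow described above can be sketched with `asyncio`. This is a minimal illustration, not the repo's actual code: `deliberate`, `query_model`, and the prompt wording are all hypothetical names chosen for the example.

```python
import asyncio

async def deliberate(question, council, chairman, query_model):
    # Stage 1: all council members answer in parallel
    answers = await asyncio.gather(*(query_model(m, question) for m in council))

    # Stage 2: each member ranks the anonymized answers
    anonymized = [f"Response {i + 1}: {a}" for i, a in enumerate(answers)]
    rank_prompt = "Rank these responses:\n" + "\n".join(anonymized)
    rankings = await asyncio.gather(*(query_model(m, rank_prompt) for m in council))

    # Stage 3: the chairman synthesizes a final answer from all inputs
    synth_prompt = (
        f"Question: {question}\n"
        + "\n".join(anonymized)
        + "\nRankings:\n" + "\n".join(rankings)
        + "\nSynthesize the best final answer."
    )
    return await query_model(chairman, synth_prompt)

async def fake_query(model, prompt):
    # offline stand-in for a real API call, so the sketch runs without keys
    return f"{model}: {prompt[:20]}"

if __name__ == "__main__":
    print(asyncio.run(deliberate("Why is the sky blue?", ["m1", "m2"], "chair", fake_query)))
```

Swapping `fake_query` for a real API client gives the production behavior; the parallel `gather` calls are what keep Stage 1 and Stage 2 fast.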
## ✨ Alternative Model Configurations

### All-FREE Council (100% HuggingFace)

```python
COUNCIL_MODELS = [
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
    {"provider": "huggingface", "model": "mistralai/Mixtral-8x7B-Instruct-v0.1"},
    {"provider": "huggingface", "model": "meta-llama/Llama-3.1-405B-Instruct"},
    {"provider": "huggingface", "model": "microsoft/Phi-3.5-MoE-instruct"},
]
CHAIRMAN_MODEL = {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"}
```

**Cost**: $0.00 per query!
### Premium Council (OpenAI + HuggingFace)

```python
COUNCIL_MODELS = [
    {"provider": "openai", "model": "gpt-4o"},
    {"provider": "openai", "model": "gpt-4-turbo"},
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
    {"provider": "openai", "model": "gpt-3.5-turbo"},
]
CHAIRMAN_MODEL = {"provider": "openai", "model": "gpt-4o"}
```

**Cost**: ~$0.05-0.15 per query
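A small dispatch table is enough to route the provider-tagged entries above to the right endpoint and API-key variable. This is a sketch of the idea; the repo's real client lives in `backend/api_client.py` and the helper name `resolve` is illustrative.

```python
# Map each provider to its endpoint and the env var holding its key.
PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1/chat/completions",
        "key_env": "OPENAI_API_KEY",
    },
    "huggingface": {
        "base_url": "https://router.huggingface.co/v1/chat/completions",
        "key_env": "HUGGINGFACE_API_KEY",
    },
}

def resolve(entry: dict) -> dict:
    """Expand a {"provider": ..., "model": ...} config entry into call settings."""
    p = PROVIDERS[entry["provider"]]
    return {"url": p["base_url"], "key_env": p["key_env"], "model": entry["model"]}

council = [
    {"provider": "openai", "model": "gpt-4o-mini"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
]
calls = [resolve(e) for e in council]
```

Because both providers speak the same chat-completions format, nothing beyond the URL and key needs to change per entry.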
## 🚀 Running on Hugging Face Spaces

### Prerequisites

1. **OpenAI API Key**:
   - Sign up at [platform.openai.com](https://platform.openai.com/)
   - Go to API Keys → Create new secret key
   - Copy your key (starts with `sk-`)
   - Add billing info and credits ($5-10 is plenty)

2. **HuggingFace API Token**:
   - Sign up at [huggingface.co](https://huggingface.co/)
   - Go to Settings → Access Tokens → New token
   - Copy your token (starts with `hf_`)
   - FREE! No billing required

3. **HuggingFace Account**: For deploying Spaces
### Step-by-Step Deployment

#### Method 1: Deploy Your Existing Code

1. **Create New Space**
   - Go to huggingface.co/new-space
   - Choose "Gradio" as SDK
   - Select SDK version: 6.0.0
   - Choose hardware: CPU (free)

2. **Push Your Code**
   ```bash
   # Clone your space
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME

   # Copy your LLM Council code
   cp -r /path/to/llm_council/* .

   # Commit and push
   git add .
   git commit -m "Initial deployment"
   git push
   ```

3. **Configure Secrets**
   - Go to your space → Settings → Repository secrets
   - Add secret #1:
     - Name: `OPENAI_API_KEY`
     - Value: (your OpenAI key starting with `sk-`)
   - Add secret #2:
     - Name: `HUGGINGFACE_API_KEY`
     - Value: (your HuggingFace token starting with `hf_`)

4. **Space Auto-Restarts**
   - HF Spaces will automatically rebuild and deploy
   - Check the "Logs" tab to verify successful startup

### Required Files Structure
## 🔐 Environment Variables

### Required Variables

**For Local Development** (`.env` file, DO NOT commit to git):
```bash
OPENAI_API_KEY=sk-proj-your-key-here
HUGGINGFACE_API_KEY=hf_your-token-here
```

**For HuggingFace Spaces** (Settings → Repository secrets):
- Secret 1: `OPENAI_API_KEY` = `sk-proj-...`
- Secret 2: `HUGGINGFACE_API_KEY` = `hf_...`

### API Endpoints Used

**HuggingFace Inference API**:
- Endpoint: `https://router.huggingface.co/v1/chat/completions`
- Format: OpenAI-compatible
- Cost: FREE for inference API
- Models: Llama, Qwen, Mixtral, etc.

**OpenAI API**:
- Endpoint: `https://api.openai.com/v1/chat/completions`
- Format: Native OpenAI
- Cost: Pay-per-token (very cheap for mini/3.5-turbo)
- Models: GPT-4o-mini, GPT-3.5-turbo, GPT-4o

For Hugging Face Spaces, use Repository Secrets instead of a `.env` file.
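Because both endpoints above speak the same OpenAI-style chat format, a single request builder can serve either one. A stdlib-only sketch (the helper name `build_chat_request` is illustrative, not the repo's API):

```python
import json
import os

ENDPOINTS = {
    "openai": "https://api.openai.com/v1/chat/completions",
    "huggingface": "https://router.huggingface.co/v1/chat/completions",
}
KEY_ENV = {"openai": "OPENAI_API_KEY", "huggingface": "HUGGINGFACE_API_KEY"}

def build_chat_request(provider: str, model: str, question: str):
    """Return (url, headers, body) for an OpenAI-style chat completion call."""
    headers = {
        "Authorization": f"Bearer {os.environ[KEY_ENV[provider]]}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": question}],
    })
    return ENDPOINTS[provider], headers, body
```

The resulting tuple can be sent with any HTTP client, e.g. `httpx.post(url, headers=headers, content=body)` as in the project's requirements.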
```
gradio>=6.0.0
httpx>=0.27.0
python-dotenv>=1.0.0
openai>=1.0.0       # For OpenAI API
```

**Note**: The system uses:
- `httpx` for async HTTP requests to the HuggingFace API
- `openai` SDK for OpenAI API calls
- `python-dotenv` to load environment variables from `.env`
## 💻 Running Locally

```bash
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Create .env file with both API keys
echo OPENAI_API_KEY=sk-proj-your-key-here > .env
echo HUGGINGFACE_API_KEY=hf_your-token-here >> .env

# 5. Run the app
python app.py
```

The app will be available at `http://localhost:7860`
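A fail-fast check at startup catches a missing key before the first council run wastes a minute timing out. A sketch using only the standard library (`check_env` is an illustrative name, not an existing function in the repo):

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "HUGGINGFACE_API_KEY")

def check_env() -> list:
    """Return the names of any required API keys that are missing or empty."""
    return [k for k in REQUIRED_KEYS if not os.environ.get(k)]

missing = check_env()
if missing:
    print(f"Missing keys: {', '.join(missing)} - add them to .env or Space secrets")
```

Calling this at the top of `app.py` would surface configuration problems immediately instead of as mid-run 401 errors.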
## 🔧 Code Architecture

### Key Components

**1. Dual API Client** (`backend/api_client.py`):
- Supports both HuggingFace and OpenAI APIs
- Automatic retry logic with exponential backoff
- Graceful error handling and fallbacks
- Parallel model querying for efficiency

**2. FREE Model Configuration** (`backend/config_free.py`):
- Mix of FREE HuggingFace + cheap OpenAI models
- Configurable timeouts and retries
- Easy to customize and extend

**3. Council Orchestration** (`backend/council_free.py`):
- Stage 1: Parallel response collection
- Stage 2: Peer ranking system
- Stage 3: Chairman synthesis with streaming

### Error Handling Features

- Retry logic with exponential backoff (3 attempts)
- Graceful handling of individual model failures
- Detailed error logging for debugging
- Timeout management (60s default)

### Benefits of Current Architecture

- **Cost Efficient**: 60% FREE models, 40% ultra-cheap
- **Robust**: Retry logic handles transient failures
- **Fast**: Parallel execution minimizes wait time
- **Flexible**: Easy to add/remove models
- **Observable**: Detailed logging for debugging
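The retry behavior described above (3 attempts, exponential backoff) can be sketched as follows. Delays are shortened for illustration, and `with_retries` is a stand-in name; the actual implementation in `backend/api_client.py` may differ.

```python
import time

def with_retries(call, attempts=3, base_delay=0.01):
    """Retry `call` up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...
```

Wrapping each model call this way is what lets a transient HuggingFace rate limit or network blip resolve without failing the whole council run.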
## 📊 Performance Characteristics

### Typical Response Times (Current Setup)

- **Stage 1**: 10-30 seconds (5 models in parallel)
- **Stage 2**: 15-45 seconds (peer rankings)
- **Stage 3**: 15-40 seconds (synthesis with streaming)
- **Total**: ~40-115 seconds per question

### Cost per Query (Current Setup)

- **FREE HuggingFace portion**: $0.00 (3 models)
- **OpenAI portion**: $0.001-0.01 (2 models)
- **Total**: ~$0.001-0.01 per query

**Comparison to alternatives**:
- 90-99% cheaper than all-paid services
- Similar quality to premium setups
- Faster than sequential execution

*Costs vary based on prompt length and response complexity*
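The per-query figure comes from simple arithmetic: only the two OpenAI members cost anything. A sketch of that calculation; the per-token rates and token counts below are illustrative assumptions, not quoted prices - check current pricing at platform.openai.com.

```python
# Illustrative per-1K-token rates (assumed, NOT official pricing).
ILLUSTRATIVE_RATE_PER_1K_TOKENS = {"gpt-4o-mini": 0.0006, "gpt-3.5-turbo": 0.0015}

def query_cost(tokens_per_model: int, paid_models: list) -> float:
    """Total cost of one query: FREE HuggingFace members contribute $0."""
    return sum(
        tokens_per_model / 1000 * ILLUSTRATIVE_RATE_PER_1K_TOKENS[m]
        for m in paid_models
    )

# e.g. ~2000 tokens through each paid model across all three stages:
cost = query_cost(2000, ["gpt-4o-mini", "gpt-3.5-turbo"])
```

Even with generous token counts, the paid portion stays in the fraction-of-a-cent range, which is where the $0.001-0.01 estimate comes from.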
### Common Issues

1. **"401 Unauthorized" errors**
   - Check both API keys are set correctly
   - Verify OpenAI key starts with `sk-`
   - Verify HuggingFace key starts with `hf_`
   - Ensure OpenAI account has billing/credits enabled
   - Check Space secrets are named exactly: `OPENAI_API_KEY` and `HUGGINGFACE_API_KEY`

2. **Timeout errors**
   - Increase timeout in `backend/config_free.py`
   - Check network connectivity
   - Some models may be slow - consider replacing them

3. **Space won't start**
   - Verify `requirements.txt` includes all dependencies
   - Check logs in Space → Logs tab
   - Ensure both secrets are added (not just one)
   - Verify Python version compatibility (3.10+)

4. **Some models fail, others work**
   - Normal! The system is designed to handle partial failures
   - Check logs to see which models failed
   - HuggingFace API may have rate limits (rare)
   - OpenAI API requires billing setup

5. **HuggingFace 410 error**
   - The old endpoint is deprecated
   - Ensure you are using `router.huggingface.co/v1/chat/completions`
   - Update `backend/api_client.py` if needed
## 🎯 Best Practices

1. **Model Selection**
   - Use 3-5 council members (sweet spot for quality vs speed)
   - Mix FREE HuggingFace + cheap OpenAI for best value
   - Choose diverse models for varied perspectives
   - Match chairman to task complexity

2. **Cost Management**
   - Start with the current setup ($0.001-0.01 per query)
   - Consider the all-FREE HuggingFace config for $0 cost
   - Monitor OpenAI usage at platform.openai.com/usage
   - Set spending limits in OpenAI billing settings

3. **Quality Optimization**
   - Use more council members for important queries (5-7)
   - Use a better chairman (gpt-4o instead of gpt-4o-mini)
   - Adjust timeouts based on model speed
   - Test different model combinations

4. **Security**
   - NEVER commit `.env` to git (use `.gitignore`)
   - Use HuggingFace Space secrets for production
   - Rotate API keys periodically
   - Monitor usage for anomalies
## 📚 Additional Resources

- [OpenAI API Documentation](https://platform.openai.com/docs)
- [HuggingFace Inference API](https://huggingface.co/docs/api-inference)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces)
QUICKSTART.md
CHANGED
## ⚡ Quick Setup (5 minutes)

### 1️⃣ Get API Keys

**OpenAI API Key** (for GPT models):
1. Go to [platform.openai.com](https://platform.openai.com/)
2. Sign up / Login
3. Go to API Keys → Create new secret key
4. Copy your API key (starts with `sk-`)

**HuggingFace API Key** (for FREE models):
1. Go to [huggingface.co](https://huggingface.co/)
2. Sign up / Login
3. Go to Settings → Access Tokens → Create new token
4. Copy your token (starts with `hf_`)
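The key prefixes from the steps above make a quick sanity check possible before anything hits the network. A tiny sketch (`looks_valid` is an illustrative helper, not part of the project):

```python
def looks_valid(openai_key: str, hf_token: str) -> bool:
    """OpenAI keys start with 'sk-', HuggingFace tokens with 'hf_'."""
    return openai_key.startswith("sk-") and hf_token.startswith("hf_")

# Catches the classic mistake of pasting the keys into the wrong slots:
assert looks_valid("sk-proj-abc123", "hf_abc123")
assert not looks_valid("hf_abc123", "sk-proj-abc123")  # swapped by mistake
```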
### 2️⃣ Set Up Locally

```bash
pip install -r requirements.txt

# Create environment file
echo OPENAI_API_KEY=your_openai_key_here > .env
echo HUGGINGFACE_API_KEY=your_hf_token_here >> .env
```

### 3️⃣ Run It!

Visit `http://localhost:7860` 🎉
## 🌐 Deploy to Hugging Face Spaces (FREE)

### Step 1: Create New Space

1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. Choose Gradio SDK 6.0.0
3. Clone and push your code:

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
cp -r /path/to/llm_council/* .
git add .
git commit -m "Initial commit"
git push
```
### Step 2: Add API Keys as Secrets

1. Go to your Space → Settings → Repository secrets
2. Add first secret:
   - Name: `OPENAI_API_KEY`
   - Value: (your OpenAI API key starting with `sk-`)
3. Add second secret:
   - Name: `HUGGINGFACE_API_KEY`
   - Value: (your HuggingFace token starting with `hf_`)
4. Space will auto-restart and deploy!
## 🎯 Usage Examples
|
| 71 |
|
| 72 |
### Simple Question
|
| 73 |
```
|
| 74 |
Question: What is the capital of France?
|
| 75 |
+
⏱️ Response time: ~20-40 seconds
|
| 76 |
+
💰 Cost: ~$0.001-0.005 (90% cheaper with FREE HF models!)
|
| 77 |
```
|
| 78 |
|
| 79 |
### Complex Analysis
|
| 80 |
```
|
| 81 |
Question: Compare pros and cons of renewable energy
|
| 82 |
+
⏱️ Response time: ~60-120 seconds
|
| 83 |
+
💰 Cost: ~$0.005-0.015 (3 FREE HF models + 2 cheap OpenAI models)
|
| 84 |
```
|
| 85 |
|
| 86 |
+
## 🤖 Current Models
|
| 87 |
|
| 88 |
+
**FREE HuggingFace Models** (60% of council):
|
| 89 |
+
- Meta Llama 3.3 70B Instruct
|
| 90 |
+
- Qwen 2.5 72B Instruct
|
| 91 |
+
- Mistral Mixtral 8x7B Instruct
|
| 92 |
|
| 93 |
+
**OpenAI Models** (40% of council):
|
| 94 |
+
- GPT-4o-mini (very cheap)
|
| 95 |
+
- GPT-3.5-turbo (ultra cheap)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 96 |
|
| 97 |
+
**Chairman**: GPT-4o-mini (final synthesis)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 98 |
|
## 📊 Monitor Usage

**OpenAI Costs**: Check at [platform.openai.com/usage](https://platform.openai.com/usage)

**HuggingFace**: FREE! No monitoring needed

Typical costs per query:
- **Current Setup**: $0.001-0.01 (90-99% cheaper than alternatives!)
- 3 models are completely FREE (HuggingFace)
- Only pay for the OpenAI models (GPT-4o-mini, GPT-3.5-turbo)
## ❓ Troubleshooting

**"401 Unauthorized" errors**
- ✅ Check both API keys in `.env` (locally) or Space secrets (HuggingFace)
- ✅ Verify OpenAI key starts with `sk-`
- ✅ Verify HuggingFace key starts with `hf_`
- ✅ Ensure OpenAI account has credits (check billing)

**Space won't start on HF**
- ✅ Check logs in Space → Logs tab
- ✅ Verify secret names are exact: `OPENAI_API_KEY` and `HUGGINGFACE_API_KEY`
- ✅ Ensure requirements.txt is present
- ✅ Both secrets must be added (not just one)

**Slow responses**
- ✅ Normal! 3 stages take 45-135 seconds
## 💡 Tips

1. **Already using FREE models!** - 3 out of 5 models cost nothing
2. **Very cheap**: Only ~$0.001-0.01 per query (OpenAI portion)
3. **Monitor OpenAI usage** at platform.openai.com/usage
4. **Set OpenAI spending limits** in billing settings to avoid surprises

## 🎨 Customization

Edit `backend/config_free.py` to:
- Change council models (add more FREE HuggingFace models!)
- Adjust chairman model
- Modify timeouts
- Configure retries

**FREE HuggingFace models you can add**:
- `meta-llama/Llama-3.1-405B-Instruct` (huge!)
- `mistralai/Mistral-Nemo-Instruct-2407`
- `microsoft/Phi-3.5-MoE-instruct`

See `backend/config_free.py` for examples!

---