Krishna Chaitanya Cheedella committed on
Commit
f3045de
·
1 Parent(s): 1e90386

Update deployment guides for OpenAI + HuggingFace setup

Files changed (2)
  1. DEPLOYMENT_GUIDE.md +190 -148
  2. QUICKSTART.md +64 -54
DEPLOYMENT_GUIDE.md CHANGED
@@ -1,6 +1,6 @@
1
  # LLM Council - Comprehensive Guide
2
 
3
- ## 📋 Overview
4
 
5
  The LLM Council is a sophisticated multi-agent system that uses multiple Large Language Models (LLMs) to collectively answer questions through a 3-stage deliberation process:
6
 
@@ -8,6 +8,8 @@ The LLM Council is a sophisticated multi-agent system that uses multiple Large L
8
  2. **Stage 2 - Peer Review**: Council members rank each other's anonymized responses
9
  3. **Stage 3 - Synthesis**: A chairman model synthesizes the final answer based on all inputs
10
 
 
 
11
  ## 🏗️ Architecture
12
 
13
  ### Current Implementation
@@ -41,143 +43,115 @@ The LLM Council is a sophisticated multi-agent system that uses multiple Large L
41
  └─────────────────────────────────────────────────────────────┘
42
  ```
43
 
44
- ## 🔧 Current Models (Original)
45
 
46
- ### Council Members
47
- - `openai/gpt-oss-120b:hyperbolic` - Open source model via Hyperbolic
48
- - `deepseek-ai/DeepSeek-V3.2-Exp:novita` - DeepSeek experimental via Novita
49
- - `Qwen/Qwen3-235B-A22B-Instruct-2507:hyperbolic` - Qwen large model
50
 
51
  ### Chairman
52
- - `deepseek-ai/DeepSeek-V3.2-Exp:novita`
53
 
54
- **Issues with Current Setup:**
55
- - Using experimental/beta endpoints which may be unstable
56
- - Limited diversity in model providers
57
- - Some models may not be optimally configured
 
 
58
 
59
- ## ✨ IMPROVED Model Recommendations
60
 
61
- ### Recommended Council (Balanced Quality & Cost)
62
 
63
  ```python
64
  COUNCIL_MODELS = [
65
- "deepseek/deepseek-chat", # DeepSeek V3 - excellent reasoning
66
- "anthropic/claude-3.7-sonnet", # Claude 3.7 - strong analysis
67
- "openai/gpt-4o", # GPT-4o - reliable & versatile
68
- "google/gemini-2.0-flash-thinking-exp:free", # Fast thinking
69
- "qwen/qwq-32b-preview", # Strong reasoning
70
  ]
71
-
72
- CHAIRMAN_MODEL = "deepseek/deepseek-reasoner" # DeepSeek R1 for synthesis
73
  ```
 
74
 
75
- ### Alternative Configurations
76
-
77
- #### Budget Council (Fast & Cost-Effective)
78
- ```python
79
- COUNCIL_MODELS = [
80
- "deepseek/deepseek-chat",
81
- "google/gemini-2.0-flash-exp:free",
82
- "qwen/qwen-2.5-72b-instruct",
83
- "meta-llama/llama-3.3-70b-instruct",
84
- ]
85
- CHAIRMAN_MODEL = "deepseek/deepseek-chat"
86
- ```
87
 
88
- #### Premium Council (Maximum Quality)
89
  ```python
90
  COUNCIL_MODELS = [
91
- "anthropic/claude-3.7-sonnet",
92
- "openai/o1",
93
- "google/gemini-exp-1206",
94
- "anthropic/claude-3-opus",
95
- "x-ai/grok-2-1212",
96
  ]
97
- CHAIRMAN_MODEL = "openai/o1" # or "anthropic/claude-3.7-sonnet"
98
- ```
99
-
100
- #### Reasoning Council (Complex Problems)
101
- ```python
102
- COUNCIL_MODELS = [
103
- "openai/o1-mini",
104
- "deepseek/deepseek-reasoner",
105
- "google/gemini-2.0-flash-thinking-exp:free",
106
- "qwen/qwq-32b-preview",
107
- ]
108
- CHAIRMAN_MODEL = "deepseek/deepseek-reasoner"
109
  ```
 
110
 
111
  ## 🚀 Running on Hugging Face Spaces
112
 
113
  ### Prerequisites
114
 
115
- 1. **OpenRouter API Key**: Sign up at [openrouter.ai](https://openrouter.ai/) and get your API key
116
-
117
- 2. **Hugging Face Account**: Create account at [huggingface.co](https://huggingface.co/)
118
-
119
- ### Step-by-Step Deployment
120
-
121
- #### Method 1: Using Existing Space (Fork)
122
 
123
- 1. **Fork the Space**
124
- - Visit your existing HuggingFace Space
125
- - Click "⋮" → "Duplicate this Space"
126
- - Choose a name for your space
 
127
 
128
- 2. **Configure Secrets**
129
- - Go to your space → Settings → Repository secrets
130
- - Add secret: `OPENROUTER_API_KEY` with your OpenRouter API key
131
 
132
- 3. **Update Models (Optional)**
133
- - Edit `backend/config.py` to use recommended models
134
- - Commit changes
135
 
136
- 4. **Space Auto-Restarts**
137
- - HF Spaces will automatically rebuild and deploy
138
 
139
- #### Method 2: Create New Space from Scratch
140
 
141
  1. **Create New Space**
142
- ```
143
  - Go to huggingface.co/new-space
144
  - Choose "Gradio" as SDK
145
  - Select SDK version: 6.0.0
146
- - Choose hardware: CPU (free) or GPU (paid)
147
- ```
148
 
149
- 2. **Upload Files**
150
  ```bash
151
- # Clone your local repo
152
  git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
153
  cd YOUR_SPACE_NAME
154
 
155
- # Copy your files
156
  cp -r /path/to/llm_council/* .
157
 
158
- # Add and commit
159
  git add .
160
- git commit -m "Initial commit"
161
  git push
162
  ```
163
 
164
- 3. **Configure Space**
165
- - Create `README.md` with metadata:
166
- ```markdown
167
- ---
168
- title: LLM Council
169
- emoji: 🏢
170
- colorFrom: pink
171
- colorTo: green
172
- sdk: gradio
173
- sdk_version: 6.0.0
174
- app_file: app.py
175
- pinned: false
176
- ---
177
- ```
178
 
179
- 4. **Add Secret**
180
- - Settings → Repository secrets → Add `OPENROUTER_API_KEY`
 
181
 
182
  ### Required Files Structure
183
 
@@ -198,10 +172,37 @@ your-space/
198
 
199
  ## 🔐 Environment Variables
200
 
201
  Create `.env` file locally (DO NOT commit to git):
202
 
203
  ```env
204
- OPENROUTER_API_KEY=your_openrouter_api_key_here
 
205
  ```
206
 
207
  For Hugging Face Spaces, use Repository Secrets instead of `.env` file.
@@ -212,11 +213,14 @@ For Hugging Face Spaces, use Repository Secrets instead of `.env` file.
212
  gradio>=6.0.0
213
  httpx>=0.27.0
214
  python-dotenv>=1.0.0
215
- fastapi>=0.115.0 # Optional - for REST API
216
- uvicorn>=0.30.0 # Optional - for REST API
217
- pydantic>=2.0.0 # Optional - for REST API
218
  ```
219
 
220
  ## 💻 Running Locally
221
 
222
  ```bash
@@ -231,8 +235,9 @@ source venv/bin/activate # On Windows: venv\Scripts\activate
231
  # 3. Install dependencies
232
  pip install -r requirements.txt
233
 
234
- # 4. Create .env file
235
- echo "OPENROUTER_API_KEY=your_key_here" > .env
 
236
 
237
  # 5. Run the app
238
  python app.py
@@ -240,44 +245,56 @@ python app.py
240
 
241
  The app will be available at `http://localhost:7860`
242
 
243
- ## 🔧 Code Improvements Made
244
-
245
- ### 1. Enhanced Error Handling
246
- - Retry logic with exponential backoff
247
- - Graceful handling of model failures
248
- - Better timeout management
249
- - Detailed error logging
250
 
251
- ### 2. Better Model Configuration
252
- - Updated to latest stable models
253
- - Multiple configuration presets
254
- - Configurable timeouts and retries
255
- - Clear documentation of alternatives
256
 
257
- ### 3. Improved API Client
258
- - Proper HTTP headers (Referer, Title)
259
- - Robust streaming support
260
- - Better exception handling
261
- - Status reporting during parallel queries
262
 
263
- ### 4. Documentation
264
- - Comprehensive deployment guide
265
- - Architecture diagrams
266
- - Configuration examples
267
- - Troubleshooting tips
 
268
 
269
  ## 📊 Performance Characteristics
270
 
271
- ### Typical Response Times (Balanced Config)
272
- - **Stage 1**: 10-30 seconds (parallel execution)
273
- - **Stage 2**: 15-45 seconds (parallel ranking)
274
- - **Stage 3**: 20-60 seconds (synthesis)
275
- - **Total**: ~45-135 seconds per question
 
 
 
 
 
276
 
277
- ### Cost per Query (Approximate)
278
- - Budget Council: $0.01 - $0.03
279
- - Balanced Council: $0.05 - $0.15
280
- - Premium Council: $0.20 - $0.50
281
 
282
  *Costs vary based on prompt length and response complexity*
283
 
@@ -285,36 +302,60 @@ The app will be available at `http://localhost:7860`
285
 
286
  ### Common Issues
287
 
288
- 1. **"All models failed to respond"**
289
- - Check API key is valid
290
- - Verify OpenRouter credit balance
291
- - Check model availability on OpenRouter
 
 
292
 
293
  2. **Timeout errors**
294
- - Increase timeout in config
295
- - Use faster models
296
  - Check network connectivity
 
297
 
298
  3. **Space won't start**
299
- - Verify `requirements.txt` is correct
300
  - Check logs in Space → Logs tab
301
- - Ensure Python version compatibility
 
302
 
303
- 4. **Slow responses**
304
- - Consider Budget Council configuration
305
- - Reduce number of council members
306
- - Use faster models
 
307
 
308
  ## 🎯 Best Practices
309
 
310
  1. **Model Selection**
311
- - Use 3-5 council members (sweet spot)
312
- - Choose diverse models from different providers
 
313
  - Match chairman to task complexity
314
 
315
  2. **Cost Management**
316
- - Start with Budget Council for testing
317
- - Monitor usage on OpenRouter dashboard
 
318
  - Set spending limits
319
 
320
  3. **Quality Optimization**
@@ -324,7 +365,8 @@ The app will be available at `http://localhost:7860`
324
 
325
  ## 📚 Additional Resources
326
 
327
- - [OpenRouter Documentation](https://openrouter.ai/docs)
 
328
  - [Gradio Documentation](https://gradio.app/docs)
329
  - [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces)
330
 
 
1
  # LLM Council - Comprehensive Guide
2
 
3
+ ## 📝 Overview
4
 
5
  The LLM Council is a sophisticated multi-agent system that uses multiple Large Language Models (LLMs) to collectively answer questions through a 3-stage deliberation process:
6
 
 
8
  2. **Stage 2 - Peer Review**: Council members rank each other's anonymized responses
9
  3. **Stage 3 - Synthesis**: A chairman model synthesizes the final answer based on all inputs
10
 
11
+ **Current Implementation**: Uses FREE HuggingFace models (60%) + cheap OpenAI models (40%)
12
+
13
  ## 🏗️ Architecture
14
 
15
  ### Current Implementation
 
43
  └─────────────────────────────────────────────────────────────┘
44
  ```
45
 
46
+ ## 🔧 Current Models (FREE HuggingFace + OpenAI)
47
+
48
+ ### Council Members (5 models)
49
+ **FREE HuggingFace Models** (via Inference API):
50
+ - `meta-llama/Llama-3.3-70B-Instruct` - Meta's latest Llama (FREE)
51
+ - `Qwen/Qwen2.5-72B-Instruct` - Alibaba's Qwen (FREE)
52
+ - `mistralai/Mixtral-8x7B-Instruct-v0.1` - Mistral MoE (FREE)
53
 
54
+ **OpenAI Models** (paid but cheap):
55
+ - `gpt-4o-mini` - Fast, affordable GPT-4o variant
56
+ - `gpt-3.5-turbo` - Ultra cheap, still capable
 
57
 
58
  ### Chairman
59
+ - `gpt-4o-mini` - Final synthesis model
60
 
61
+ **Benefits of Current Setup:**
62
+ - 60% of models are completely FREE (HuggingFace)
63
+ - 40% use cheap OpenAI models ($0.001-0.01 per query)
64
+ - 90-99% cost reduction compared to all-paid alternatives
65
+ - No experimental/beta endpoints - all stable APIs
66
+ - Diverse model providers for varied perspectives
67
 
68
+ ## ✨ Alternative Model Configurations
69
 
70
+ ### All-FREE Council (100% HuggingFace)
71
 
72
  ```python
73
  COUNCIL_MODELS = [
74
+ {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
75
+ {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
76
+ {"provider": "huggingface", "model": "mistralai/Mixtral-8x7B-Instruct-v0.1"},
77
+ {"provider": "huggingface", "model": "meta-llama/Llama-3.1-405B-Instruct"},
78
+ {"provider": "huggingface", "model": "microsoft/Phi-3.5-MoE-instruct"},
79
  ]
80
+ CHAIRMAN_MODEL = {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"}
 
81
  ```
82
+ **Cost**: $0.00 per query!
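Since each council entry is a `{"provider", "model"}` dict, the client can route a call by its `provider` field. A minimal sketch (the endpoint constants match the endpoints listed later in this guide; `endpoint_for` is an illustrative name, not the repo's actual function):

```python
# Route a council entry to the right chat-completions endpoint
# based on its "provider" field.
HF_URL = "https://router.huggingface.co/v1/chat/completions"
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

def endpoint_for(entry: dict) -> str:
    """Return the chat-completions URL for a council entry."""
    if entry["provider"] == "huggingface":
        return HF_URL
    if entry["provider"] == "openai":
        return OPENAI_URL
    raise ValueError(f"unknown provider: {entry['provider']}")

council = [
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "openai", "model": "gpt-4o-mini"},
]
for entry in council:
    print(entry["model"], "->", endpoint_for(entry))
```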
83
 
84
+ ### Premium Council (OpenAI + HuggingFace)
 
85
 
 
86
  ```python
87
  COUNCIL_MODELS = [
88
+ {"provider": "openai", "model": "gpt-4o"},
89
+ {"provider": "openai", "model": "gpt-4-turbo"},
90
+ {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
91
+ {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
92
+ {"provider": "openai", "model": "gpt-3.5-turbo"},
93
  ]
94
+ CHAIRMAN_MODEL = {"provider": "openai", "model": "gpt-4o"}
 
95
  ```
96
+ **Cost**: ~$0.05-0.15 per query
97
 
98
  ## 🚀 Running on Hugging Face Spaces
99
 
100
  ### Prerequisites
101
 
102
+ 1. **OpenAI API Key**:
103
+ - Sign up at [platform.openai.com](https://platform.openai.com/)
104
+ - Go to API Keys → Create new secret key
105
+ - Copy your key (starts with `sk-`)
106
+ - Add billing info and credits ($5-10 is plenty)
 
 
107
 
108
+ 2. **HuggingFace API Token**:
109
+ - Sign up at [huggingface.co](https://huggingface.co/)
110
+ - Go to Settings → Access Tokens → New token
111
+ - Copy your token (starts with `hf_`)
112
+ - FREE! No billing required
113
 
114
+ 3. **HuggingFace Account**: For deploying Spaces
 
 
115
 
116
+ ### Step-by-Step Deployment
 
119
 
120
+ #### Method 1: Deploy Your Existing Code
121
 
122
  1. **Create New Space**
 
123
  - Go to huggingface.co/new-space
124
  - Choose "Gradio" as SDK
125
  - Select SDK version: 6.0.0
126
+ - Choose hardware: CPU (free)
 
127
 
128
+ 2. **Push Your Code**
129
  ```bash
130
+ # Clone your space
131
  git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
132
  cd YOUR_SPACE_NAME
133
 
134
+ # Copy your LLM Council code
135
  cp -r /path/to/llm_council/* .
136
 
137
+ # Commit and push
138
  git add .
139
+ git commit -m "Initial deployment"
140
  git push
141
  ```
142
 
143
+ 3. **Configure Secrets**
144
+ - Go to your space → Settings → Repository secrets
145
+ - Add secret #1:
146
+ - Name: `OPENAI_API_KEY`
147
+ - Value: (your OpenAI key starting with `sk-`)
148
+ - Add secret #2:
149
+ - Name: `HUGGINGFACE_API_KEY`
150
+ - Value: (your HuggingFace token starting with `hf_`)
151
 
152
+ 4. **Space Auto-Restarts**
153
+ - HF Spaces will automatically rebuild and deploy
154
+ - Check the "Logs" tab to verify successful startup
155
 
156
  ### Required Files Structure
157
 
 
172
 
173
  ## 🔐 Environment Variables
174
 
175
+ ### Required Variables
176
+
177
+ **For Local Development** (`.env` file):
178
+ ```bash
179
+ OPENAI_API_KEY=sk-proj-your-key-here
180
+ HUGGINGFACE_API_KEY=hf_your-token-here
181
+ ```
182
+
183
+ **For HuggingFace Spaces** (Settings → Repository secrets):
184
+ - Secret 1: `OPENAI_API_KEY` = `sk-proj-...`
185
+ - Secret 2: `HUGGINGFACE_API_KEY` = `hf_...`
186
+
187
+ ### API Endpoints Used
188
+
189
+ **HuggingFace Inference API**:
190
+ - Endpoint: `https://router.huggingface.co/v1/chat/completions`
191
+ - Format: OpenAI-compatible
192
+ - Cost: FREE for inference API
193
+ - Models: Llama, Qwen, Mixtral, etc.
194
+
195
+ **OpenAI API**:
196
+ - Endpoint: `https://api.openai.com/v1/chat/completions`
197
+ - Format: Native OpenAI
198
+ - Cost: Pay-per-token (very cheap for mini/3.5-turbo)
199
+ - Models: GPT-4o-mini, GPT-3.5-turbo, GPT-4o
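Because both endpoints accept the same OpenAI-compatible request shape, one request builder can serve them. A hedged sketch (the helper name and placeholder token are illustrative; sending is left to `httpx` in a comment):

```python
# Build an OpenAI-compatible chat-completions request for either endpoint.
HF_URL = "https://router.huggingface.co/v1/chat/completions"

def build_chat_request(url: str, api_key: str, model: str, prompt: str):
    """Return (url, headers, payload) for a single chat-completions call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, payload

url, headers, payload = build_chat_request(
    HF_URL, "hf_your-token-here",
    "meta-llama/Llama-3.3-70B-Instruct", "What is the capital of France?",
)
print(payload["model"])
# To actually send (requires a valid token):
#   httpx.post(url, headers=headers, json=payload, timeout=60).json()
```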
200
+
201
  Create `.env` file locally (DO NOT commit to git):
202
 
203
  ```env
204
+ OPENAI_API_KEY=sk-proj-your-key-here
205
+ HUGGINGFACE_API_KEY=hf_your-token-here
206
  ```
207
 
208
  For Hugging Face Spaces, use Repository Secrets instead of `.env` file.
 
213
  gradio>=6.0.0
214
  httpx>=0.27.0
215
  python-dotenv>=1.0.0
216
+ openai>=1.0.0 # For OpenAI API
 
 
217
  ```
218
 
219
+ **Note**: The system uses:
220
+ - `httpx` for async HTTP requests to HuggingFace API
221
+ - `openai` SDK for OpenAI API calls
222
+ - `python-dotenv` to load environment variables from `.env`
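A fail-fast check at startup keeps a missing or misnamed key from surfacing later as a confusing 401. A minimal sketch (the import is guarded so it also runs without python-dotenv installed; `require_key` is an illustrative name):

```python
import os

try:  # load .env if python-dotenv is available; harmless otherwise
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

def require_key(name: str) -> str:
    """Return the named environment variable or fail loudly."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"missing environment variable: {name}")
    return value

# At startup:
#   openai_key = require_key("OPENAI_API_KEY")
#   hf_key = require_key("HUGGINGFACE_API_KEY")
```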
223
+
224
  ## 💻 Running Locally
225
 
226
  ```bash
 
235
  # 3. Install dependencies
236
  pip install -r requirements.txt
237
 
238
+ # 4. Create .env file with both API keys
239
+ echo OPENAI_API_KEY=sk-proj-your-key-here > .env
240
+ echo HUGGINGFACE_API_KEY=hf_your-token-here >> .env
241
 
242
  # 5. Run the app
243
  python app.py
 
245
 
246
  The app will be available at `http://localhost:7860`
247
 
248
+ ## 🔧 Code Architecture
 
249
 
250
+ ### Key Components
 
251
 
252
+ **1. Dual API Client** (`backend/api_client.py`):
253
+ - Supports both HuggingFace and OpenAI APIs
254
+ - Automatic retry logic with exponential backoff
255
+ - Graceful error handling and fallbacks
256
+ - Parallel model querying for efficiency
257
 
258
+ **2. FREE Model Configuration** (`backend/config_free.py`):
259
+ - Mix of FREE HuggingFace + cheap OpenAI models
260
+ - Configurable timeouts and retries
261
+ - Easy to customize and extend
262
+
263
+ **3. Council Orchestration** (`backend/council_free.py`):
264
+ - Stage 1: Parallel response collection
265
+ - Stage 2: Peer ranking system
266
+ - Stage 3: Chairman synthesis with streaming
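The three stages above can be sketched with stubbed model calls (no network; `ask` and `run_council` are illustrative names, not the repo's actual functions):

```python
import asyncio

async def ask(model: str, prompt: str) -> str:
    # Stand-in for a real chat-completions call.
    return f"{model}: answer to {prompt!r}"

async def run_council(models, chairman, question):
    # Stage 1: collect first-pass answers in parallel
    answers = await asyncio.gather(*(ask(m, question) for m in models))
    # Stage 2: each member ranks the anonymized answers, also in parallel
    labels = [f"Response {i + 1}" for i in range(len(answers))]
    rankings = await asyncio.gather(*(ask(m, f"rank {labels}") for m in models))
    # Stage 3: chairman synthesizes from all answers and rankings
    brief = f"synthesize {len(answers)} answers and {len(rankings)} rankings"
    return await ask(chairman, brief)

result = asyncio.run(run_council(["model-a", "model-b"], "chairman", "Q?"))
print(result)
```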
267
+
268
+ ### Error Handling Features
269
+ - Retry logic with exponential backoff (3 attempts)
270
+ - Graceful handling of individual model failures
271
+ - Detailed error logging for debugging
272
+ - Timeout management (60s default)
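A hedged sketch of that retry policy (3 attempts, exponential backoff, 60 s per-attempt timeout; `with_retries` is an illustrative helper, not the repo's exact code):

```python
import asyncio

async def with_retries(coro_fn, attempts: int = 3, base_delay: float = 1.0,
                       timeout: float = 60.0):
    """Call coro_fn with a per-attempt timeout and exponential backoff."""
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(coro_fn(), timeout=timeout)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)  # 1s, 2s, ...

# Demo: a call that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = asyncio.run(with_retries(flaky, base_delay=0.01))
print(result)  # -> ok
```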
273
+
274
+ ### Benefits of Current Architecture
275
+ - **Cost Efficient**: 60% FREE models, 40% ultra-cheap
276
+ - **Robust**: Retry logic handles transient failures
277
+ - **Fast**: Parallel execution minimizes wait time
278
+ - **Flexible**: Easy to add/remove models
279
+ - **Observable**: Detailed logging for debugging
280
 
281
  ## 📊 Performance Characteristics
282
 
283
+ ### Typical Response Times (Current Setup)
284
+ - **Stage 1**: 10-30 seconds (5 models in parallel)
285
+ - **Stage 2**: 15-45 seconds (peer rankings)
286
+ - **Stage 3**: 15-40 seconds (synthesis with streaming)
287
+ - **Total**: ~40-115 seconds per question
288
+
289
+ ### Cost per Query (Current Setup)
290
+ - **FREE HuggingFace portion**: $0.00 (3 models)
291
+ - **OpenAI portion**: $0.001-0.01 (2 models)
292
+ - **Total**: ~$0.001-0.01 per query
293
 
294
+ **Comparison to alternatives**:
295
+ - 90-99% cheaper than all-paid services
296
+ - Similar quality to premium setups
297
+ - Faster than sequential execution
298
 
299
  *Costs vary based on prompt length and response complexity*
300
 
 
302
 
303
  ### Common Issues
304
 
305
+ 1. **"401 Unauthorized" errors**
306
+ - Check both API keys are set correctly
307
+ - Verify OpenAI key starts with `sk-`
308
+ - Verify HuggingFace key starts with `hf_`
309
+ - Ensure OpenAI account has billing/credits enabled
310
+ - Check Space secrets are named exactly: `OPENAI_API_KEY` and `HUGGINGFACE_API_KEY`
311
 
312
  2. **Timeout errors**
313
+ - Increase timeout in `backend/config_free.py`
 
314
  - Check network connectivity
315
+ - Some models may be slow - consider replacing
316
 
317
  3. **Space won't start**
318
+ - Verify `requirements.txt` includes all dependencies
319
  - Check logs in Space → Logs tab
320
+ - Ensure both secrets are added (not just one)
321
+ - Verify Python version compatibility (3.10+)
322
 
323
+ 4. **Some models fail, others work**
324
+ - Normal! System is designed to handle partial failures
325
+ - Check logs to see which models failed
326
+ - HuggingFace API may have rate limits (rare)
327
+ - OpenAI API requires billing setup
328
+
329
+ 5. **HuggingFace 410 error**
330
+ - Old endpoint deprecated
331
+ - Ensure using `router.huggingface.co/v1/chat/completions`
332
+ - Update `backend/api_client.py` if needed
333
 
334
  ## 🎯 Best Practices
335
 
336
  1. **Model Selection**
337
+ - Use 3-5 council members (sweet spot for quality vs speed)
338
+ - Mix FREE HuggingFace + cheap OpenAI for best value
339
+ - Choose diverse models for varied perspectives
340
  - Match chairman to task complexity
341
 
342
  2. **Cost Management**
343
+ - Start with current setup ($0.001-0.01 per query)
344
+ - Consider all-FREE HuggingFace config for $0 cost
345
+ - Monitor OpenAI usage at platform.openai.com/usage
346
+ - Set spending limits in OpenAI billing settings
347
+
348
+ 3. **Quality Optimization**
349
+ - Use more council members for important queries (5-7)
350
+ - Use better chairman (gpt-4o instead of gpt-4o-mini)
351
+ - Adjust timeouts based on model speed
352
+ - Test different model combinations
353
+
354
+ 4. **Security**
355
+ - NEVER commit .env to git (use .gitignore)
356
+ - Use HuggingFace Space secrets for production
357
+ - Rotate API keys periodically
358
+ - Monitor usage for anomalies
359
 
365
 
366
  ## 📚 Additional Resources
367
 
368
+ - [OpenAI API Documentation](https://platform.openai.com/docs)
369
+ - [HuggingFace Inference API](https://huggingface.co/docs/api-inference)
370
  - [Gradio Documentation](https://gradio.app/docs)
371
  - [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces)
372
 
QUICKSTART.md CHANGED
@@ -9,11 +9,19 @@ A sophisticated multi-LLM system where multiple AI models:
9
 
10
  ## ⚡ Quick Setup (5 minutes)
11
 
12
- ### 1️⃣ Get OpenRouter API Key
13
- 1. Go to [openrouter.ai](https://openrouter.ai/)
 
14
  2. Sign up / Login
15
- 3. Go to Keys → Create new key
16
- 4. Copy your API key
17
 
18
  ### 2️⃣ Set Up Locally
19
 
@@ -22,10 +30,8 @@ A sophisticated multi-LLM system where multiple AI models:
22
  pip install -r requirements.txt
23
 
24
  # Create environment file
25
- cp .env.example .env
26
-
27
- # Edit .env and add your API key
28
- # OPENROUTER_API_KEY=your_key_here
29
  ```
30
 
31
  ### 3️⃣ Run It!
@@ -38,13 +44,7 @@ Visit `http://localhost:7860` 🎉
38
 
39
  ## 🌐 Deploy to Hugging Face Spaces (FREE)
40
 
41
- ### Option A: Fork Existing Space
42
- 1. Visit your HuggingFace Space
43
- 2. Click "⋮" → "Duplicate this Space"
44
- 3. Settings → Repository secrets → Add `OPENROUTER_API_KEY`
45
- 4. Done! Your space will auto-deploy
46
-
47
- ### Option B: Create New Space
48
  1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
49
  2. Choose Gradio SDK 6.0.0
50
  3. Clone and push your code:
@@ -56,65 +56,70 @@ git add .
56
  git commit -m "Initial commit"
57
  git push
58
  ```
59
- 4. Settings → Repository secrets → Add `OPENROUTER_API_KEY`
 
60
 
61
  ## 🎯 Usage Examples
62
 
63
  ### Simple Question
64
  ```
65
  Question: What is the capital of France?
66
- ⏱️ Response time: ~30 seconds
67
- 💰 Cost: ~$0.01
68
  ```
69
 
70
  ### Complex Analysis
71
  ```
72
  Question: Compare pros and cons of renewable energy
73
- ⏱️ Response time: ~90 seconds
74
- 💰 Cost: ~$0.07
75
  ```
76
 
77
- ## 🔧 Use Improved Models
78
 
79
- Replace these files to use latest stable models:
 
 
 
80
 
81
- ```bash
82
- # Backup originals
83
- mv backend/config.py backend/config_old.py
84
- mv backend/openrouter.py backend/openrouter_old.py
85
-
86
- # Use improved versions
87
- mv backend/config_improved.py backend/config.py
88
- mv backend/openrouter_improved.py backend/openrouter.py
89
- ```
90
 
91
- **Improved models:**
92
- - DeepSeek V3 (Chat & Reasoner)
93
- - Claude 3.7 Sonnet
94
- - GPT-4o
95
- - Gemini 2.0 Flash Thinking
96
- - QwQ 32B
97
 
98
  ## 📊 Monitor Usage
99
 
100
- Check your costs at: [openrouter.ai/activity](https://openrouter.ai/activity)
101
 
102
- Typical costs:
103
- - Budget Council: $0.01-0.03 per query
104
- - Balanced Council: $0.05-0.15 per query
105
- - Premium Council: $0.20-0.50 per query
 
 
106
 
107
  ## ❓ Troubleshooting
108
 
109
- **"All models failed to respond"**
110
- - ✅ Check API key in .env
111
- - ✅ Verify OpenRouter credit balance
112
- - ✅ Test API key: https://openrouter.ai/playground
 
113
 
114
  **Space won't start on HF**
115
  - ✅ Check logs in Space → Logs tab
116
- - ✅ Verify secret name is exact: `OPENROUTER_API_KEY`
117
  - ✅ Ensure requirements.txt is present
 
118
 
119
  **Slow responses**
120
  - ✅ Normal! 3 stages take 45-135 seconds
@@ -128,20 +133,25 @@ Typical costs:
128
 
129
  ## 💡 Tips
130
 
131
- 1. **Start with Budget Council** to test without spending much
132
- 2. **Use Premium Council** for important questions
133
- 3. **Monitor costs** in OpenRouter dashboard
134
- 4. **Set spending limits** to avoid surprises
135
 
136
  ## 🎨 Customization
137
 
138
- Edit `backend/config.py` to:
139
- - Change council models
140
  - Adjust chairman model
141
  - Modify timeouts
142
  - Configure retries
143
 
144
- See `DEPLOYMENT_GUIDE.md` for preset configurations!
 
145
 
146
  ---
147
 
 
9
 
10
  ## ⚡ Quick Setup (5 minutes)
11
 
12
+ ### 1️⃣ Get API Keys
13
+
14
+ **OpenAI API Key** (for GPT models):
15
+ 1. Go to [platform.openai.com](https://platform.openai.com/)
16
+ 2. Sign up / Login
17
+ 3. Go to API Keys → Create new secret key
18
+ 4. Copy your API key (starts with `sk-`)
19
+
20
+ **HuggingFace API Key** (for FREE models):
21
+ 1. Go to [huggingface.co](https://huggingface.co/)
22
  2. Sign up / Login
23
+ 3. Go to Settings → Access Tokens → Create new token
24
+ 4. Copy your token (starts with `hf_`)
25
 
26
  ### 2️⃣ Set Up Locally
27
 
 
30
  pip install -r requirements.txt
31
 
32
  # Create environment file
33
+ echo OPENAI_API_KEY=your_openai_key_here > .env
34
+ echo HUGGINGFACE_API_KEY=your_hf_token_here >> .env
 
 
35
  ```
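To sanity-check that both keys landed in `.env` before launching (a minimal sketch; assumes `.env` sits in the current directory, and only checks names, not values):

```python
# Confirm both expected key names appear in .env.
from pathlib import Path

env_path = Path(".env")
env_text = env_path.read_text() if env_path.exists() else ""
for name in ("OPENAI_API_KEY", "HUGGINGFACE_API_KEY"):
    print(f"{name}: {'found' if name in env_text else 'MISSING'}")
```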
36
 
37
  ### 3️⃣ Run It!
 
44
 
45
  ## 🌐 Deploy to Hugging Face Spaces (FREE)
46
 
47
+ ### Step 1: Create New Space
 
48
  1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
49
  2. Choose Gradio SDK 6.0.0
50
  3. Clone and push your code:
 
56
  git commit -m "Initial commit"
57
  git push
58
  ```
59
+
60
+ ### Step 2: Add API Keys as Secrets
61
+ 1. Go to your Space → Settings → Repository secrets
62
+ 2. Add first secret:
63
+ - Name: `OPENAI_API_KEY`
64
+ - Value: (your OpenAI API key starting with `sk-`)
65
+ 3. Add second secret:
66
+ - Name: `HUGGINGFACE_API_KEY`
67
+ - Value: (your HuggingFace token starting with `hf_`)
68
+ 4. Space will auto-restart and deploy!
69
 
70
  ## 🎯 Usage Examples
71
 
72
  ### Simple Question
73
  ```
74
  Question: What is the capital of France?
75
+ ⏱️ Response time: ~20-40 seconds
76
+ 💰 Cost: ~$0.001-0.005 (90% cheaper with FREE HF models!)
77
  ```
78
 
79
  ### Complex Analysis
80
  ```
81
  Question: Compare pros and cons of renewable energy
82
+ ⏱️ Response time: ~60-120 seconds
83
+ 💰 Cost: ~$0.005-0.015 (3 FREE HF models + 2 cheap OpenAI models)
84
  ```
85
 
86
+ ## 🤖 Current Models
87
 
88
+ **FREE HuggingFace Models** (60% of council):
89
+ - Meta Llama 3.3 70B Instruct
90
+ - Qwen 2.5 72B Instruct
91
+ - Mixtral 8x7B Instruct (Mistral MoE)
92
 
93
+ **OpenAI Models** (40% of council):
94
+ - GPT-4o-mini (very cheap)
95
+ - GPT-3.5-turbo (ultra cheap)
 
96
 
97
+ **Chairman**: GPT-4o-mini (final synthesis)
 
98
 
99
  ## 📊 Monitor Usage
100
 
101
+ **OpenAI Costs**: Check at [platform.openai.com/usage](https://platform.openai.com/usage)
102
 
103
+ **HuggingFace**: FREE! No monitoring needed
104
+
105
+ Typical costs per query:
106
+ - **Current Setup**: $0.001-0.01 (90-99% cheaper than alternatives!)
107
+ - 3 models are completely FREE (HuggingFace)
108
+ - Only pay for OpenAI models (GPT-4o-mini, GPT-3.5-turbo)
109
 
110
  ## ❓ Troubleshooting
111
 
112
+ **"401 Unauthorized" errors**
113
+ - ✅ Check both API keys in .env (locally) or Space secrets (HuggingFace)
114
+ - ✅ Verify OpenAI key starts with `sk-`
115
+ - ✅ Verify HuggingFace key starts with `hf_`
116
+ - ✅ Ensure OpenAI account has credits (check billing)
117
 
118
  **Space won't start on HF**
119
  - ✅ Check logs in Space → Logs tab
120
+ - ✅ Verify secret names are exact: `OPENAI_API_KEY` and `HUGGINGFACE_API_KEY`
121
  - ✅ Ensure requirements.txt is present
122
+ - ✅ Both secrets must be added (not just one)
123
 
124
  **Slow responses**
125
  - ✅ Normal! 3 stages take 45-135 seconds
 
133
 
134
  ## 💡 Tips
135
 
136
+ 1. **Already using FREE models!** - 3 out of 5 models cost nothing
137
+ 2. **Very cheap**: Only ~$0.001-0.01 per query (OpenAI portion)
138
+ 3. **Monitor OpenAI usage** at platform.openai.com/usage
139
+ 4. **Set OpenAI spending limits** in billing settings to avoid surprises
140
 
141
  ## 🎨 Customization
142
 
143
+ Edit `backend/config_free.py` to:
144
+ - Change council models (add more FREE HuggingFace models!)
145
  - Adjust chairman model
146
  - Modify timeouts
147
  - Configure retries
148
 
149
+ **FREE HuggingFace models you can add**:
150
+ - `meta-llama/Llama-3.1-405B-Instruct` (huge!)
151
+ - `mistralai/Mistral-Nemo-Instruct-2407`
152
+ - `microsoft/Phi-3.5-MoE-instruct`
153
+
154
+ See `backend/config_free.py` for examples!
155
 
156
  ---
157