Fix QuantizationConfig: use config_groups with BaseQuantizationConfig ecf6a69 Alikestocode committed on Nov 10
Fix delete_revisions import - use fallback cache cleanup method 4be72e0 Alikestocode committed on Nov 10
Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization f0033ab Alikestocode committed on Nov 10
Replace AutoAWQ with LLM Compressor (vLLM native) in Colab notebook ae07f77 Alikestocode committed on Nov 10
Clarify LLM Compressor optional status - vLLM has native AWQ support b2bf767 Alikestocode committed on Nov 10
Fix vLLM token parameter and improve streaming error handling b4fd5e9 Alikestocode committed on Nov 10
Fix streaming loop break condition - only break when finished is True d6f9002 Alikestocode committed on Nov 9
Fix Gradio UI structure and add comprehensive fallback logging 03689e3 Alikestocode committed on Nov 8
Fix syntax error: correct indentation in BitsAndBytes fallback block f43bdac Alikestocode committed on Nov 8
Suppress AutoAWQ deprecation warnings and improve vLLM logging 83a232d Alikestocode committed on Nov 8
Implement vLLM with LLM Compressor and performance optimizations a79facb Alikestocode committed on Nov 8
Fix: Pre-create GPU wrappers at module load time for startup detection cdac920 Alikestocode committed on Nov 8
Make GPU duration slider functional with dynamic wrapper creation fc0ab14 Alikestocode committed on Nov 8
Fix indentation errors in _generate_router_plan_streaming_internal c454e43 Alikestocode committed on Nov 8
Improve streaming with incremental JSON parsing and plan end token f5a609d Alikestocode committed on Nov 7
Update app.py and requirements.txt for CourseGPT-Pro router models 4c3d05b Alikestocode committed on Nov 7