Commit History

Fix QuantizationConfig: use config_groups with BaseQuantizationConfig
ecf6a69

Alikestocode committed on

Fix AWQModifier: use quantization_config with num_bits
022b2da

Alikestocode committed on

Add note about restarting kernel if AWQModifier errors occur
33a1d2e

Alikestocode committed on

Simplify AWQModifier usage - remove try/except wrapper
e08f8c4

Alikestocode committed on

Fix AWQModifier parameters - use default configuration
cef8ecd

Alikestocode committed on

Fix delete_revisions import with fallback cache cleanup
7a2a590

Alikestocode committed on

Fix delete_revisions import - use fallback cache cleanup method
4be72e0

Alikestocode committed on

Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization
f0033ab

Alikestocode committed on

Fix LLM Compressor package name: llmcompressor (no hyphen)
2326498

Alikestocode committed on

Remove duplicate LLM Compressor section - now primary method
d4bc333

Alikestocode committed on

Replace AutoAWQ with LLM Compressor (vLLM native) in Colab notebook
ae07f77

Alikestocode committed on

Add advanced vLLM and LLM Compressor optimizations
808203f

Alikestocode committed on

Add disk space cleanup after quantization in Colab notebook
24107f3

Alikestocode committed on

Fix linter error: use %pip instead of !pip in Colab notebook
2dff966

Alikestocode committed on

Add Colab notebook for AWQ quantization of router models
a79bc8f

Alikestocode committed on

Clarify LLM Compressor optional status - vLLM has native AWQ support
b2bf767

Alikestocode committed on

Fix vLLM device detection for ZeroGPU
2ddfeca

Alikestocode committed on

Fix vLLM token parameter and improve streaming error handling
b4fd5e9

Alikestocode committed on

Add debug logging for model loading and generation issues
54880b1

Alikestocode committed on

Fix streaming loop break condition - only break when finished is True
d6f9002

Alikestocode committed on

Add deployment status document after re-authentication
1fb66ec

Alikestocode committed on

Add permission fix guide for spherical-gate-477614-q7 project
162c75a

Alikestocode committed on

Add Cloud Build deployment script and permission setup helper
fd26b3d

Alikestocode committed on

Add Cloud Run PORT environment variable support
1b04006

Alikestocode committed on

Add Google Cloud Platform deployment configurations
aa65d00

Alikestocode committed on

Fix Gradio UI structure and add comprehensive fallback logging
03689e3

Alikestocode committed on

Fix all indentation errors in Gradio UI components
06aef1b

Alikestocode committed on

Fix syntax error: correct indentation in BitsAndBytes fallback block
f43bdac

Alikestocode committed on

Suppress AutoAWQ deprecation warnings and improve vLLM logging
83a232d

Alikestocode committed on

Implement vLLM with LLM Compressor and performance optimizations
a79facb

Alikestocode committed on

Migrate to AWQ quantization with FlashAttention-2
06b4cf5

Alikestocode committed on

Fix: Pre-create GPU wrappers at module load time for startup detection
cdac920

Alikestocode committed on

Make GPU duration slider functional with dynamic wrapper creation
fc0ab14

Alikestocode committed on

Fix indentation errors in _generate_router_plan_streaming_internal
c454e43

Alikestocode committed on

Fix: Remove context manager usage for spaces.GPU decorator
a217627

Alikestocode committed on

Add user-configurable GPU duration slider (60-1800 seconds)
9a4d6d3

Alikestocode committed on

Fix: Move trim_at_stop_sequences function before it's used
597f1a9

Alikestocode committed on

Add Gradio client API test script
de18e95

Alikestocode committed on

Fix API launch configuration
9773e4b

Alikestocode committed on

Enable API in Gradio launch configuration
1b16b00

Alikestocode committed on

Update README and clean up old files
9592189

Alikestocode committed on

Improve streaming with incremental JSON parsing and plan end token
f5a609d

Alikestocode committed on

Add streaming support and increase max tokens to 20000
4f65341

Alikestocode committed on

Fix deprecation warnings and improve error handling
bf2fdae

Alikestocode committed on

Update app.py and requirements.txt for CourseGPT-Pro router models
4c3d05b

Alikestocode committed on

Update README: Focus on CourseGPT-Pro router checkpoints
4706b45

Alikestocode committed on

Update README with correct space URL
9af4b77

Alikestocode committed on

Add .gitignore and remove cache files
7bc8a45

Alikestocode committed on

Initial commit: ZeroGPU LLM Inference Space
f91e906

Alikestocode committed on