Hugging Face Space: Alovestocode/ZeroGPU-LLM-Inference (status: Sleeping, 0 likes)
Branch: main · ZeroGPU-LLM-Inference · 165 kB · 1 contributor · History: 83 commits
Latest commit: "Add GPU estimator, DDG search, and cancel support" by Alikestocode (4ce42e8, 27 days ago)
File listing (all files marked Safe):

File · Size · Last commit message · Last updated
.dockerignore · 104 Bytes · Add Google Cloud Platform deployment configurations · about 1 month ago
.gitattributes · 1.52 kB · Initial commit: ZeroGPU LLM Inference Space · about 1 month ago
.gitignore · 27 Bytes · Add .gitignore and remove cache files · about 1 month ago
DEPLOYMENT_STATUS.md · 2.21 kB · Add deployment status document after re-authentication · 30 days ago
Dockerfile · 1.02 kB · Fix delete_revisions import with fallback cache cleanup · 29 days ago
FIX_PERMISSIONS.md · 2.05 kB · Add permission fix guide for spherical-gate-477614-q7 project · about 1 month ago
LLM_COMPRESSOR_FEATURES.md · 6.24 kB · Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization · 29 days ago
MANUAL_DEPLOY.md · 1.59 kB · Fix delete_revisions import with fallback cache cleanup · 29 days ago
QUANTIZE_AWQ.md · 3.21 kB · Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization · 29 days ago
README.md · 4.23 kB · Implement vLLM with LLM Compressor and performance optimizations · about 1 month ago
app.py · 56.9 kB · Add GPU estimator, DDG search, and cancel support · 27 days ago
apt.txt · 11 Bytes · Initial commit: ZeroGPU LLM Inference Space · about 1 month ago
cloudbuild.yaml · 1.36 kB · Add Cloud Build deployment script and permission setup helper · about 1 month ago
deploy-cloud-build.sh · 3.31 kB · Add Cloud Build deployment script and permission setup helper · about 1 month ago
deploy-compute-engine.sh · 4.23 kB · Add Google Cloud Platform deployment configurations · about 1 month ago
deploy-gcp.sh · 2.67 kB · Add Google Cloud Platform deployment configurations · about 1 month ago
gcp-deployment.md · 5.32 kB · Add Google Cloud Platform deployment configurations · about 1 month ago
quantize_to_awq_colab.ipynb · 32.9 kB · Lower Gemma AWQ group size to 16 · 28 days ago
requirements.txt · 397 Bytes · Clarify LLM Compressor optional status - vLLM has native AWQ support · 29 days ago
setup-gcp-permissions.sh · 1.8 kB · Add Cloud Build deployment script and permission setup helper · about 1 month ago
style.css · 2.84 kB · Initial commit: ZeroGPU LLM Inference Space · about 1 month ago
test_api.py · 3.43 kB · Migrate to AWQ quantization with FlashAttention-2 · about 1 month ago
test_api_gradio_client.py · 7.2 kB · Implement vLLM with LLM Compressor and performance optimizations · about 1 month ago
test_awq_models.py · 3.12 kB · Add test scripts for AWQ models on ZeroGPU Space · 28 days ago
test_quantization_notebook.py · 9.84 kB · Update Qwen model to use AWQ quantized version · 28 days ago
test_space_awq.sh · 1.93 kB · Add test scripts for AWQ models on ZeroGPU Space · 28 days ago
test_space_simple.py · 3.49 kB · Add test scripts for AWQ models on ZeroGPU Space · 28 days ago
test_space_simple.sh · 1.68 kB · Fix delete_revisions import with fallback cache cleanup · 29 days ago