Introduction
Use https://github.com/im0qianqian/llama.cpp to quantize this model.
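For reference, a typical GGUF quantization workflow with llama.cpp tooling looks roughly like the following sketch; the script name, file paths, and the Q4_K_M type here are illustrative assumptions, so check the fork above for the exact commands it ships:
# Convert the original Hugging Face checkpoint to a GGUF file (paths are assumed)
python convert_hf_to_gguf.py ./Ling-flash-2.0 --outfile ling-flash-2.0-f16.gguf
# Quantize the converted file to a smaller type, e.g. 4-bit K-quants
llama-quantize ling-flash-2.0-f16.gguf ling-flash-2.0-Q4_K_M.gguf Q4_K_M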
For model inference, please download our release package from https://github.com/im0qianqian/llama.cpp/releases.
Quick start
# Use a local model file
llama-cli -m my_model.gguf
# Launch OpenAI-compatible API server
llama-server -m my_model.gguf
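Once llama-server is running, it exposes an OpenAI-compatible HTTP API (by default on http://localhost:8080). A minimal sketch of a chat request, assuming the default host and port:
# Send a chat completion request to the local server (default port assumed)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, who are you?"}]}'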
Demo
PR
We look forward to the following PR being merged:
Hardware compatibility
Available quantization precisions: 2-bit, 4-bit, 6-bit, 8-bit.
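To fetch only one quantization level from this repository, you can filter by filename pattern; a sketch using huggingface-cli, where the *Q4_K_M* pattern is an assumed example and should be matched against the actual file names in the repo:
# Download only the 4-bit (Q4_K_M) GGUF files from this repository (pattern is assumed)
huggingface-cli download inclusionAI/Ling-flash-2.0-GGUF --include "*Q4_K_M*" --local-dir ./Ling-flash-2.0-GGUF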
Model tree for inclusionAI/Ling-flash-2.0-GGUF
Base model: inclusionAI/Ling-flash-base-2.0
Finetuned: inclusionAI/Ling-flash-2.0
