Quantization was performed using exllama3 v0.0.15.

| Quant    | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | K=2    | K=3    | K=4    | K=5    |
|----------|-----------|----------------------|----------------------|------------|-----------|--------|--------|--------|--------|
| 2.0bpw   | 57        | 0.41617405           | 0.63545585           | 5.76802263 | 0.7706    | 0.4538 | 0.2223 | 0.0960 | 0.0410 |
| 3.0bpw   | 84        | 0.14266876           | 0.19399434           | 4.36054284 | 0.8777    | 0.6373 | 0.4059 | 0.2369 | 0.1316 |
| 4.0bpw   | 111       | 0.02823074           | 0.03066939           | 3.85603126 | 0.9442    | 0.8062 | 0.6315 | 0.4630 | 0.3231 |
| 5.0bpw   | 138       | 0.00948993           | 0.00957515           | 3.75920156 | 0.9666    | 0.8749 | 0.7485 | 0.6102 | 0.4783 |
| 5.5bpw   | 151       | 0.00514412           | 0.00524848           | 3.75774266 | 0.9753    | 0.9075 | 0.8081 | 0.6913 | 0.5751 |
| 6.0bpw   | 165       | 0.00399840           | 0.00401824           | 3.75147423 | 0.9774    | 0.9144 | 0.8214 | 0.7118 | 0.6006 |
| original | 437       | -                    | -                    | 3.73952143 | -         | -      | -      | -      | -      |

Metrics explanation

  • KL-divergence: Measures the difference between the probability distributions of the quantized and original models. Lower is better (closer to the original).
  • Perplexity: Indicates how well the model predicts the next token. Lower values mean better prediction quality.
  • Top-K agreement: Shows how often the quantized model's top-K predictions exactly match the original model's top-K predictions. Higher values indicate better preservation of the original model's behavior (1.0 = perfect match):
    • K=1: The fraction of times both models predict the exact same token as their top-1 choice
    • K=5: The fraction of times both models have the same 5 tokens in their top-5 predictions in the exact same ranking order
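To make the three metrics concrete, here is a minimal sketch of how each can be computed from per-position probability distributions of the two models. The function names and the `eps` smoothing term are illustrative assumptions, not the exact evaluation code used for the table above.

```python
import numpy as np

def kl_div(p, q, eps=1e-10):
    # KL(p || q) between two probability vectors over the vocabulary;
    # eps guards against log(0) for tokens with zero probability.
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def perplexity(true_token_probs):
    # exp of the average negative log-probability the model assigned
    # to the actual next tokens of the evaluation text.
    logs = np.log(np.asarray(true_token_probs, dtype=np.float64))
    return float(np.exp(-logs.mean()))

def topk_agreement(orig_dists, quant_dists, k):
    # Fraction of positions where both models produce the same top-k
    # token IDs in the same ranking order.
    hits = 0
    for p, q in zip(orig_dists, quant_dists):
        top_p = np.argsort(p)[::-1][:k]
        top_q = np.argsort(q)[::-1][:k]
        hits += int(np.array_equal(top_p, top_q))
    return hits / len(orig_dists)
```

For example, `perplexity([0.5, 0.5])` is 2.0 (the model is choosing between two equally likely tokens), and `kl_div(p, p)` is 0 for any distribution `p`.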

Tool Calls Support for Qwen/GLM Models

The official tabbyAPI doesn't support tool calls for Qwen and GLM models yet.

If you're using Qwen-Code, OpenClaw, or similar software that needs tool-call support, you can use my fork with the tools-support branch:

Clone directly:

git clone -b tools-support https://github.com/NeuroSenko/tabbyAPI

Or add to existing tabbyAPI installation:

git remote add neurosenko https://github.com/NeuroSenko/tabbyAPI
git fetch neurosenko
git checkout -b tools-support neurosenko/tools-support

This branch includes native tool calling support for Qwen and GLM model families.
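Once the fork is running, tool calls go through the usual OpenAI-compatible chat completions endpoint. Below is a sketch of a request payload; the model name, endpoint URL, and the `get_weather` tool are placeholder assumptions to adapt to your own setup.

```python
import json

# Minimal OpenAI-style tool-call request (illustrative; model name and
# tool definition are placeholders, not part of this repo).
payload = {
    "model": "Baichuan-M3-235B-exl3",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)
# POST `body` to your tabbyAPI instance's /v1/chat/completions endpoint
# with your API key, using curl or any OpenAI-compatible client.
```

If the model decides to call the tool, the response's message will carry a `tool_calls` entry instead of plain text content.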

Model tree for NeuroSenko/Baichuan-M3-235B-exl3

Quantized
(7)
this model