Models in CI - a nm-testing Collection

updated about 8 hours ago

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-FP8-Channelwise-compressed-tensors

Text Generation • 8B • Updated Oct 9, 2024 • 7 • 2
nm-testing/Meta-Llama-3-8B-Instruct-FBGEMM-nonuniform

Text Generation • 8B • Updated Jul 20, 2024 • 7
nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test

Text Generation • 8B • Updated Oct 9, 2024 • 16.3k
nm-testing/Meta-Llama-3-8B-Instruct-W8-Channel-A8-Dynamic-Asym-Per-Token-Test

8B • Updated Oct 9, 2024 • 6.91k • 1
nm-testing/Meta-Llama-3-8B-Instruct-W8-Channel-A8-Dynamic-Per-Token-Test

Text Generation • 8B • Updated Oct 9, 2024 • 14
nm-testing/Meta-Llama-3-8B-Instruct-nonuniform-test

Text Generation • 8B • Updated Oct 9, 2024 • 24.8k
nm-testing/Meta-Llama-3-70B-Instruct-FBGEMM-nonuniform

Text Generation • 71B • Updated Jul 20, 2024 • 814 • 1
nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16

14B • Updated Feb 24, 2025 • 112k • 1
nm-testing/Qwen2-1.5B-Instruct-FP8W8

Text Generation • 2B • Updated Oct 9, 2024 • 13
nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-chnl_wts_per_tok_dyn_act_fp8-BitM

5B • Updated Dec 17, 2024 • 3
nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change

Text Generation • 1B • Updated Oct 9, 2024 • 64k
nm-testing/pixtral-12b-FP8-dynamic

Image-Text-to-Text • Updated Apr 11, 2025 • 178 • 1
RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic

Image-Text-to-Text • 24B • Updated Oct 29, 2025 • 4.98k • 9
nm-testing/Llama-3.2-1B-Instruct-FP8-KV

1B • Updated Nov 1, 2024 • 13k
nm-testing/tinyllama-oneshot-w8a8-channel-dynamic-token-v2

Text Generation • 1B • Updated Oct 9, 2024 • 21.5k
nm-testing/tinyllama-oneshot-w8-channel-a8-tensor

Text Generation • 1B • Updated Oct 9, 2024 • 864
RedHatAI/Llama-3.2-1B-quantized.w8a8

1B • Updated Jan 16, 2025 • 61.6k • 1
nm-testing/tinyllama-oneshot-w8a8-dynamic-token-v2

Text Generation • 1B • Updated Oct 9, 2024 • 16.4k
nm-testing/asym-w8w8-int8-static-per-tensor-tiny-llama

1B • Updated Oct 9, 2024 • 8.51k
nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Static-Per-Tensor-Sym

8B • Updated Dec 10, 2024 • 22
nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Static-Per-Tensor-Asym

8B • Updated Dec 11, 2024 • 23
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-chnl_wts_per_tok_dyn_act_int8-BitM

0.7B • Updated Dec 17, 2024 • 12
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-chnl_wts_tensor_act_int8-BitM

0.7B • Updated Dec 17, 2024 • 9
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-tensor_wts_per_tok_dyn_act_int8-BitM

0.7B • Updated Dec 17, 2024 • 11
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-tensor_wts_tensor_act_int8-BitM

0.7B • Updated Dec 17, 2024 • 9
nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Channel-Weight-testing

1B • Updated Dec 8, 2024 • 9
nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Static-testing

1B • Updated Dec 8, 2024 • 8
nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Tensor-Weight-testing

1B • Updated Dec 8, 2024 • 8
nm-testing/TinyLlama-1.1B-Chat-v1.0-2of4-Sparse-Dense-Compressor

1B • Updated Dec 8, 2024 • 11
nm-testing/llama2.c-stories42M-pruned2.4-compressed

48.6M • Updated Jan 22, 2025 • 9
nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4

0.7B • Updated about 8 hours ago • 25.8k
nm-testing/Llama-3.2-1B-Instruct-spinquantR1R2R4-w4a16

0.7B • Updated Aug 22, 2025 • 7.92k
nm-testing/Llama-3.2-1B-Instruct-quip-w4a16

0.8B • Updated Sep 12, 2025 • 7.9k
nm-testing/tinyllama-oneshot-w4a16-channel-v2

Text Generation • 0.3B • Updated Oct 9, 2024 • 20.2k • 1
nm-testing/test-w4a16-mixtral-actorder-group

6B • Updated Dec 26, 2024 • 1.48k
nm-testing/TinyLlama-1.1B-Chat-v1.0-kvcache-fp8-attn_head

1B • Updated Jan 14 • 80
nm-testing/TinyLlama-1.1B-Chat-v1.0-kvcache-fp8-tensor

1B • Updated Jan 14 • 7.91k
nm-testing/Qwen3-30B-A3B-MXFP4A16

17B • Updated Feb 17 • 10.8k
nm-testing/Qwen3-0.6B-MXFP8

0.8B • Updated Mar 19 • 2
nm-testing/TinyLlama-1.1B-Chat-v1.0-MXFP8

1B • Updated Mar 19 • 3
nm-testing/dflash-qwen3-8b-speculators

2B • Updated 27 days ago • 14.2k
nm-testing/TinyLlama-1.1B-Chat-v1.0-MXFP4

0.6B • Updated 7 days ago • 105
nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4A16

0.7B • Updated about 8 hours ago