
Qwen3.5-264B

REAP-pruned Qwen/Qwen3.5-397B-A17B.

At a glance

Base model Qwen/Qwen3.5-397B-A17B
Format BF16
Total params 264B
Active / token
Experts / layer 336
Layers 60
Hidden size 4096
Context 262,144
On-disk size 527 GB

Which variant should I pick?

Variant Format Link
Qwen3.5-264B (this) BF16 link
Qwen3.5-264B-FP8 FP8 link
Qwen3.5-264B-W4A16 W4A16 link
Qwen3.5-28B BF16 link
Qwen3.5-35B-EXL3-4bpw EXL3-4bpw link
Qwen3.5-76B BF16 link
Qwen3.5-76B-GGUF GGUF link
Qwen3.5-88B BF16 link
Qwen3.5-99B BF16 link
Qwen3.5-99B-GGUF GGUF link
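
As a rough aid for choosing between the 264B formats, the weight footprint can be estimated from the parameter count and the bytes stored per weight. The sketch below is only an approximation: the helper name `estimated_size_gb` is hypothetical, and the bytes-per-weight figures ignore quantization scales and any layers kept in higher precision, so real checkpoints will differ somewhat.

```python
# Approximate bytes per stored weight for each format.
# Assumption: ignores quantization scales/zero-points and any tensors
# that a quantized checkpoint keeps in higher precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "W4A16": 0.5}


def estimated_size_gb(total_params: float, fmt: str) -> float:
    """Rough weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    print(f"264B {fmt}: ~{estimated_size_gb(264e9, fmt):.0f} GB")
```

The BF16 estimate of about 528 GB is in line with the 527 GB on-disk size listed for this repository.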
  • Repository: 0xSero/Qwen3.5-264B
  • Base model: Qwen/Qwen3.5-397B-A17B
  • Artifact kind: pruned
  • Compression ratio: 34% (176 of 512 experts removed per MoE layer)
  • Prune metric: reap

Details

  • Maintainer: 0xSero
  • Organization: Sybil Solutions
  • Project: REAP PR17
  • Hub owner: 0xSero
  • Summary: BF16 REAP-pruned Qwen3.5-397B-A17B with 176 of 512 experts removed per MoE layer, retaining 336 experts per layer, for an estimated 264B total parameters.
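
The figures in the summary can be cross-checked with a little arithmetic. The sketch below uses only numbers stated in this card; treating all 60 layers as MoE layers is an assumption made for illustration, and the per-expert size it derives is an implied estimate from the rounded parameter totals, not a published value.

```python
EXPERTS_BEFORE = 512    # experts per MoE layer in the base model
EXPERTS_REMOVED = 176   # experts pruned per MoE layer
LAYERS = 60             # assumption: every layer is an MoE layer
BASE_PARAMS = 397e9     # nominal total of the base model
PRUNED_PARAMS = 264e9   # estimated total after pruning

retained = EXPERTS_BEFORE - EXPERTS_REMOVED
removed_fraction = EXPERTS_REMOVED / EXPERTS_BEFORE

# Implied size of one expert, if all removed parameters belong to pruned experts.
params_per_expert = (BASE_PARAMS - PRUNED_PARAMS) / (EXPERTS_REMOVED * LAYERS)

print(f"retained experts per layer: {retained}")
print(f"fraction of experts removed: {removed_fraction:.3f}")
print(f"implied parameters per expert: {params_per_expert / 1e6:.1f}M")
```

The removed fraction of roughly 0.34 matches the stated 34% compression ratio, and the retained count matches the 336 experts per layer listed above.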

Provenance

  • Observer state: /home/ubuntu/qwen397-full/observer-calibv1/qwen397-pr17-calibv1-23k-16k-observer-state.raw.pt
  • Detail state: /home/ubuntu/qwen397-full/observer-calibv1/qwen397-pr17-calibv1-23k-16k-detail-state.raw.pt

Benchmarks

No benchmark summary was found.

Custom Stress

No custom stress summary was found.

Usage

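from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "0xSero/Qwen3.5-264B"
# The BF16 weights are about 527 GB, so shard them across the available devices.
# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)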
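
Before loading, it can help to estimate how many accelerators are needed just to hold the weights. This is a minimal sketch under stated assumptions: `min_devices` and `usable_fraction` are hypothetical names, the 10% headroom is an arbitrary placeholder, and the estimate covers weights only, so KV cache and activations (which grow with context length) need additional memory.

```python
import math


def min_devices(weights_gb: float, device_gb: float, usable_fraction: float = 0.9) -> int:
    """Smallest device count whose usable memory holds the weights.

    Assumption: weights only; KV cache and activations are not included.
    """
    usable_per_device = device_gb * usable_fraction
    return math.ceil(weights_gb / usable_per_device)


# 527 GB is the on-disk size listed for this BF16 checkpoint.
print(min_devices(527, 80))   # devices with 80 GB of memory each
print(min_devices(527, 141))  # devices with 141 GB of memory each
```

Under these assumptions the weights alone need at least 8 devices with 80 GB each, or 5 with 141 GB each.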

License & citation

License inherited from the base model.

@misc{lasby2025reap,
  title  = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
  author = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
  year   = {2025}, eprint = {2510.13999}, archivePrefix = {arXiv}
}

Sponsors

Made possible by NVIDIA · TNG Technology · Lambda · Prime Intellect · Hot Aisle.

Safetensors · Model size: 263B params · Tensor type: BF16