Training Details

SpecForge Setup

To set up SpecForge for training Eagle3 models:

https://docs.sglang.ai/SpecForge/get_started/install.html
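
A typical source install looks like the following (a minimal sketch; the repository URL and editable-install step are assumptions based on the SGLang project layout, so defer to the linked guide):

# install SpecForge from source in editable mode
git clone https://github.com/sgl-project/SpecForge.git
cd SpecForge
pip install -e .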

Dataset Preparation

This model was trained on a 1,000-sample subset of the UltraChat 200k dataset. Dataset preparation involved two main steps using SpecForge:

1. Prepare data using SpecForge

python scripts/prepare_data.py --dataset ultrachat

2. Sample Creation

python create_sample.py \
  --input cache/dataset/ultrachat_train.jsonl \
  --output cache/dataset/ultrachat_1k_sample_train.jsonl \
  --size 1000
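
If you only need an equivalent subset rather than the exact script, this step can be approximated with a shell one-liner, assuming create_sample.py simply draws 1,000 random lines from the JSONL file (an assumption about its behavior):

# randomly sample 1,000 lines (one JSON record per line)
shuf -n 1000 cache/dataset/ultrachat_train.jsonl > cache/dataset/ultrachat_1k_sample_train.jsonl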

Training Command

The model was trained using the following command:

torchrun --standalone --nproc_per_node=1 scripts/train_eagle3_online.py \
  --target-model-path Qwen/Qwen3-30B-A3B \
  --draft-model-config configs/qwen3-30B-A3B-eagle3.json \
  --train-data-path cache/dataset/ultrachat_1k_sample_train.jsonl \
  --output-dir out/qwen3-30b-a3b-eagle3-ultra-1k-sample \
  --num-epochs 1 \
  --batch-size 1 \
  --learning-rate 1e-4 \
  --max-length 1024 \
  --chat-template qwen \
  --cache-dir cache \
  --embedding-key model.embed_tokens.weight \
  --tp-size 1 \
  --ttt-length 7

Training Environment

  • GPU: Single H100 PCIe
  • Framework: SpecForge with Eagle3 architecture
  • Precision: bfloat16
  • FSDP: Enabled with NO_SHARD strategy (single GPU)

Training Results

  • Total Steps: 625 (1 epoch × 625 steps per epoch)
  • Model Size: 366 MB (draft model weights only, ~0.2B parameters)
  • Training Time: ~30 minutes on a single H100
  • Base Model: Qwen/Qwen3-30B-A3B (~60 GB)
  • Draft Model: HathoraResearch/qwen3_30b_moe_eagle3-ultra-1k-sample (366 MB)
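
To sanity-check the published draft weights locally, you can pull them with the Hugging Face CLI and inspect the footprint (a minimal sketch; the --local-dir layout is an illustrative choice):

# ~366 MB download
huggingface-cli download HathoraResearch/qwen3_30b_moe_eagle3-ultra-1k-sample --local-dir draft-model
du -sh draft-model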

Reproduction

To reproduce this training:

  1. Install SpecForge framework (see setup section above)
  2. Download and prepare the UltraChat 200k dataset with python scripts/prepare_data.py --dataset ultrachat
  3. Create the 1,000-sample subset with create_sample.py (see Dataset Preparation above)
  4. Run the training command above
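
Once trained (or using the published checkpoint), the draft model can be exercised with SGLang's speculative decoding. The sketch below assumes the current SGLang EAGLE3 server flags, which may differ across versions; the step, top-k, and draft-token values are illustrative rather than tuned:

# serve the target model with the Eagle3 draft attached
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-30B-A3B \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path HathoraResearch/qwen3_30b_moe_eagle3-ultra-1k-sample \
  --speculative-num-steps 5 \
  --speculative-eagle-topk 8 \
  --speculative-num-draft-tokens 32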