Training Details

SpecForge Setup

To set up SpecForge for training Eagle3 models:

https://docs.sglang.ai/SpecForge/get_started/install.html
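
A typical source install looks like the following (a minimal sketch; the repository URL and editable-install step are assumptions based on the SGLang project layout, so defer to the linked guide):

# install SpecForge from source in editable mode
git clone https://github.com/sgl-project/SpecForge.git
cd SpecForge
pip install -e .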

Dataset Preparation

This model was trained on a 1,000-sample subset of the UltraChat 200k dataset. Dataset preparation involved two main steps using SpecForge:

1. Prepare data using SpecForge

python scripts/prepare_data.py --dataset ultrachat

2. Sample Creation

python create_sample.py \
  --input cache/dataset/ultrachat_train.jsonl \
  --output cache/dataset/ultrachat_1k_sample_train.jsonl \
  --size 1000
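
If you only need an equivalent subset rather than the exact script, this step can be approximated with a shell one-liner, assuming create_sample.py simply draws 1,000 random lines from the JSONL file (an assumption about its behavior):

# randomly sample 1,000 lines (one JSON record per line)
shuf -n 1000 cache/dataset/ultrachat_train.jsonl > cache/dataset/ultrachat_1k_sample_train.jsonl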

Training Command

The model was trained using the following command:

torchrun --standalone --nproc_per_node=1 scripts/train_eagle3_online.py \
  --target-model-path Qwen/Qwen3-30B-A3B \
  --draft-model-config configs/qwen3-30B-A3B-eagle3.json \
  --train-data-path cache/dataset/ultrachat_1k_sample_train.jsonl \
  --output-dir out/qwen3-30b-a3b-eagle3-ultra-1k-sample \
  --num-epochs 1 \
  --batch-size 1 \
  --learning-rate 1e-4 \
  --max-length 1024 \
  --chat-template qwen \
  --cache-dir cache \
  --embedding-key model.embed_tokens.weight \
  --tp-size 1 \
  --ttt-length 7

Training Environment

  • GPU: Single H100 PCIe
  • Framework: SpecForge with Eagle3 architecture
  • Precision: bfloat16
  • FSDP: Enabled with NO_SHARD strategy (single GPU)

Training Results

  • Total Steps: 625 (1 epoch × 625 steps per epoch)
  • Model Size: 366 MB (draft model weights only, ~0.2B parameters)
  • Training Time: ~30 minutes on a single H100
  • Base Model: Qwen/Qwen3-30B-A3B (~60 GB)
  • Draft Model: HathoraResearch/qwen3_30b_moe_eagle3-ultra-1k-sample (366 MB)
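
To sanity-check the published draft weights locally, you can pull them with the Hugging Face CLI and inspect the footprint (a minimal sketch; the --local-dir layout is an illustrative choice):

# ~366 MB download
huggingface-cli download HathoraResearch/qwen3_30b_moe_eagle3-ultra-1k-sample --local-dir draft-model
du -sh draft-model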

Reproduction

To reproduce this training:

  1. Install SpecForge framework (see setup section above)
  2. Download and prepare the UltraChat 200k dataset with python scripts/prepare_data.py --dataset ultrachat
  3. Create the 1,000-sample subset with create_sample.py (see Dataset Preparation above)
  4. Run the training command above
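
Once trained (or using the published checkpoint), the draft model can be exercised with SGLang's speculative decoding. The sketch below assumes the current SGLang EAGLE3 server flags, which may differ across versions; the step, top-k, and draft-token values are illustrative rather than tuned:

# serve the target model with the Eagle3 draft attached
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-30B-A3B \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path HathoraResearch/qwen3_30b_moe_eagle3-ultra-1k-sample \
  --speculative-num-steps 5 \
  --speculative-eagle-topk 8 \
  --speculative-num-draft-tokens 32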