## Training Details
### SpecForge Setup

To set up SpecForge for training Eagle3 models, follow the [installation guide](https://docs.sglang.ai/SpecForge/get_started/install.html).
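A minimal install-from-source sketch, assuming the standard clone-and-install flow (the repository URL and steps below are assumptions; the linked guide is authoritative):

```bash
# Assumed install-from-source steps; see the official SpecForge guide above.
git clone https://github.com/sgl-project/SpecForge.git
cd SpecForge
pip install -e .
```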
### Dataset Preparation
This model was trained on a 1,000-sample subset of the UltraChat 200k dataset. Dataset preparation involved two main steps using SpecForge:
1. **Prepare the data with SpecForge:**

   ```bash
   python scripts/prepare_data.py --dataset ultrachat
   ```

2. **Create the 1,000-sample training subset** (a rough shell equivalent is sketched after this list):

   ```bash
   python create_sample.py \
       --input cache/dataset/ultrachat_train.jsonl \
       --output cache/dataset/ultrachat_1k_sample_train.jsonl \
       --size 1000
   ```
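The subsetting step simply extracts 1,000 records from the prepared JSONL file. A rough shell equivalent, assuming a straight truncation of the file is acceptable (the provided `create_sample.py` may instead sample randomly):

```bash
# Assumption: the first 1,000 JSONL records form the subset; use `shuf -n 1000`
# in place of `head` if a random sample is preferred.
head -n 1000 cache/dataset/ultrachat_train.jsonl \
    > cache/dataset/ultrachat_1k_sample_train.jsonl
```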
### Training Command
The model was trained using the following command:
```bash
torchrun --standalone --nproc_per_node=1 scripts/train_eagle3_online.py \
    --target-model-path Qwen/Qwen3-30B-A3B \
    --draft-model-config configs/qwen3-30B-A3B-eagle3.json \
    --train-data-path cache/dataset/ultrachat_1k_sample_train.jsonl \
    --output-dir out/qwen3-30b-a3b-eagle3-ultra-1k-sample \
    --num-epochs 1 \
    --batch-size 1 \
    --learning-rate 1e-4 \
    --max-length 1024 \
    --chat-template qwen \
    --cache-dir cache \
    --embedding-key model.embed_tokens.weight \
    --tp-size 1 \
    --ttt-length 7
```
### Training Environment
- GPU: Single H100 PCIe
- Framework: SpecForge with Eagle3 architecture
- Precision: bfloat16
- FSDP: Enabled with NO_SHARD strategy (single GPU)
### Training Results
- Total Steps: 625 (1 epoch × 625 steps per epoch)
- Model Size: 366MB (draft model weights only)
- Training Time: ~30 minutes on H100
- Base Model: Qwen/Qwen3-30B-A3B (60GB)
- Draft Model: HathoraResearch/qwen3_30b_moe_eagle3-ultra-1k-sample (366MB)
### Reproduction
To reproduce this training:
- Install the SpecForge framework (see the setup section above)
- Download the UltraChat 200k dataset with `scripts/prepare_data.py --dataset ultrachat`
- Create the 1,000-sample subset with the provided `create_sample.py` script
- Run the training command above
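To try the resulting draft model, SGLang can serve the target model with Eagle3 speculative decoding. A hedged sketch; the speculative-decoding flag values below are illustrative assumptions, so check the SGLang documentation for the options supported by your version:

```bash
# Hedged example: serve Qwen3-30B-A3B with the trained Eagle3 draft model.
# Flag values (num-steps, topk, num-draft-tokens) are illustrative assumptions.
python -m sglang.launch_server \
    --model-path Qwen/Qwen3-30B-A3B \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path HathoraResearch/qwen3_30b_moe_eagle3-ultra-1k-sample \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 32
```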