RL-Homework

This is a homework model repo containing two checkpoints: a base pretrained checkpoint and a supervised fine-tuned (SFT) checkpoint.

Files

  • model_base.pth: base model checkpoint exported in the homework's LLaMA-like single-file format
  • model_sft.pth: supervised fine-tuned checkpoint trained further on the GSM8K training set
  • params.json: model architecture parameters for the homework loader

Model Info

Architecture from params.json:

  • dimension: 1024
  • feed-forward dimension: 4096
  • heads: 16
  • layers: 8
  • max sequence length: 1024
  • vocabulary size: 50432
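The parameters above would correspond to a params.json along these lines. The key names below follow common LLaMA-style conventions and are an assumption for illustration, not taken from the actual file:

```python
import json

# Hypothetical params.json contents; the key names are an assumption
# (LLaMA-style naming), only the values come from this model card.
params_text = """
{
  "dim": 1024,
  "hidden_dim": 4096,
  "n_heads": 16,
  "n_layers": 8,
  "max_seq_len": 1024,
  "vocab_size": 50432
}
"""

params = json.loads(params_text)

# With multi-head attention, each head covers dim / n_heads channels.
head_dim = params["dim"] // params["n_heads"]
print(head_dim)  # → 64
```

The homework loader presumably reads this file to size the model before loading the .pth weights.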

Training Summary

Base model

model_base.pth is the final FineWebEDU-pretrained checkpoint, exported in the homework loader format.

SFT model

model_sft.pth starts from the same base model family and is further trained on the GSM8K training set for the homework's supervised fine-tuning stage.
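For context, GSM8K reference solutions end with a final line of the form `#### <answer>`, and models fine-tuned on GSM8K are typically scored by comparing that final answer. A minimal extractor for that convention might look like the sketch below; the helper name is ours, not part of the homework code:

```python
def extract_gsm8k_answer(completion: str):
    """Return the text after the last '####' marker, or None if absent.

    GSM8K reference solutions end with a line like '#### 42'; the
    function name here is illustrative, not from the homework repo.
    """
    marker = "####"
    if marker not in completion:
        return None
    return completion.rsplit(marker, 1)[1].strip()

print(extract_gsm8k_answer("She has 3 + 4 = 7 apples.\n#### 7"))  # → 7
```

Returning None for completions without the marker makes it easy to count unparseable outputs separately from wrong answers during evaluation.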

Intended Use

  • homework reproduction
  • educational experiments
  • small-scale reasoning and RL homework pipelines

Limitations

  • these are homework checkpoints, not production models
  • outputs may still be repetitive or incorrect
  • GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning