RL-Homework

This is a homework model repo containing two checkpoints: a base pretrained checkpoint and a supervised fine-tuned (SFT) checkpoint.

Files

  • model_base.pth: base model checkpoint exported in the homework's LLaMA-like single-file format
  • model_sft.pth: supervised fine-tuned checkpoint trained further on the GSM8K training set
  • params.json: model architecture parameters for the homework loader

Model Info

Architecture from params.json:

  • dimension: 1024
  • feed-forward dimension: 4096
  • heads: 16
  • layers: 8
  • max sequence length: 1024
  • vocabulary size: 50432
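The parameters above would correspond to a params.json along these lines. The key names below follow common LLaMA-style conventions and are an assumption for illustration, not taken from the actual file:

```python
import json

# Hypothetical params.json contents; the key names are an assumption
# (LLaMA-style naming), only the values come from this model card.
params_text = """
{
  "dim": 1024,
  "hidden_dim": 4096,
  "n_heads": 16,
  "n_layers": 8,
  "max_seq_len": 1024,
  "vocab_size": 50432
}
"""

params = json.loads(params_text)

# With multi-head attention, each head covers dim / n_heads channels.
head_dim = params["dim"] // params["n_heads"]
print(head_dim)  # → 64
```

The homework loader presumably reads this file to size the model before loading the .pth weights.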

Training Summary

Base model

model_base.pth is the final FineWebEDU-pretrained checkpoint, exported in the homework loader format.

SFT model

model_sft.pth starts from the same base model family and is further trained on the GSM8K training set for the homework's supervised fine-tuning stage.
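For context, GSM8K reference solutions end with a final line of the form `#### <answer>`, and models fine-tuned on GSM8K are typically scored by comparing that final answer. A minimal extractor for that convention might look like the sketch below; the helper name is ours, not part of the homework code:

```python
def extract_gsm8k_answer(completion: str):
    """Return the text after the last '####' marker, or None if absent.

    GSM8K reference solutions end with a line like '#### 42'; the
    function name here is illustrative, not from the homework repo.
    """
    marker = "####"
    if marker not in completion:
        return None
    return completion.rsplit(marker, 1)[1].strip()

print(extract_gsm8k_answer("She has 3 + 4 = 7 apples.\n#### 7"))  # → 7
```

Returning None for completions without the marker makes it easy to count unparseable outputs separately from wrong answers during evaluation.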

Intended Use

  • homework reproduction
  • educational experiments
  • small-scale reasoning and RL homework pipelines

Limitations

  • these are homework checkpoints, not production models
  • outputs may still be repetitive or incorrect
  • GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning