Training data: HuggingFaceFW/fineweb
This is a 124M-parameter causal language model (GPT-2 Small architecture) trained from scratch in PyTorch.
It was built as a baseline for a research ablation study of training dynamics, and reaches a validation loss of 4.485.
⚠️ Important: Because this model was trained using a custom PyTorch class (not the standard Hugging Face GPT2LMHeadModel), you must define the model architecture in your code before loading the weights.
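A minimal sketch of what that looks like in practice. The class below is illustrative, not the actual training code: the layer names, checkpoint filename, and hyperparameters are assumptions based on the standard GPT-2 Small configuration (12 layers, 12 heads, 768-dim, ~124M parameters with tied embeddings), and your own class must match the state-dict keys of the released weights exactly.

```python
import torch
import torch.nn as nn

class GPT2Small(nn.Module):
    """Hypothetical GPT-2 Small-shaped decoder-only LM (names are assumptions)."""

    def __init__(self, vocab_size=50257, n_layer=12, n_head=12,
                 d_model=768, block_size=1024):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_head, dim_feedforward=4 * d_model,
            activation="gelu", batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying (as in GPT-2) keeps the total near 124M parameters.
        self.head.weight = self.tok_emb.weight

    def forward(self, idx):
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: True entries are masked out (no attending to the future).
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                     device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))

model = GPT2Small()
# Loading the released weights (filename is an assumption; adjust as needed):
# state_dict = torch.load("pytorch_model.bin", map_location="cpu")
# model.load_state_dict(state_dict)
model.eval()

idx = torch.randint(0, 50257, (1, 8))   # dummy batch of 8 token ids
with torch.no_grad():
    logits = model(idx)                  # shape: (1, 8, vocab_size)
```

If `load_state_dict` raises a key-mismatch error, the attribute names above differ from those used during training; rename them (or remap the state-dict keys) until they line up.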