# nano-gpt2-fp128

A nano GPT-2 style causal language model trained on TinyStories with double-double (~FP128) arithmetic in the forward pass.

## Architecture

| Hyper-parameter | Value |
|---|---|
| Embedding dim | 32 |
| Attention heads | 2 |
| Transformer layers | 2 |
| Context window | 64 |
| Vocabulary | 82 (char-level, TinyStories) |
| Parameters | 32,768 |
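
The table above can be collected into a small config object. This is a minimal sketch with hypothetical field names (the card does not specify the actual config class or attribute names); the values come from the Architecture table.

```python
from dataclasses import dataclass


@dataclass
class NanoGPT2Config:
    # Field names are hypothetical; values are from the model card.
    n_embd: int = 32      # embedding dimension
    n_head: int = 2       # attention heads
    n_layer: int = 2      # transformer layers
    block_size: int = 64  # context window
    vocab_size: int = 82  # char-level TinyStories vocabulary

cfg = NanoGPT2Config()
# Per-head dimension must divide evenly: 32 / 2 = 16
assert cfg.n_embd % cfg.n_head == 0
```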

## Precision

| Stage | Precision |
|---|---|
| Weight storage | float64 |
| Forward matmuls | ~106-bit (double-double / Veltkamp split) |
| Backward pass | float64 |
Double-double arithmetic carries roughly 106 bits of significand precision, within 7 bits of IEEE binary128's 113-bit significand.