# nano-gpt2-fp128

A nano GPT-2-style causal language model trained on TinyStories, with double-double (~FP128) arithmetic in the forward pass.
## Architecture
| Hyper-parameter | Value |
|---|---|
| Embedding dim | 32 |
| Attention heads | 2 |
| Transformer layers | 2 |
| Context window | 64 |
| Vocabulary | 82 (char-level, TinyStories) |
| Parameters | 32,768 |
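The hyper-parameters above can be collected into a config object. A minimal sketch (the field names are illustrative, not the model's actual code; only the values come from the table):

```python
from dataclasses import dataclass

@dataclass
class NanoGPT2Config:
    # Hypothetical field names; values taken from the architecture table.
    vocab_size: int = 82   # char-level TinyStories vocabulary
    block_size: int = 64   # context window
    n_embd: int = 32       # embedding dimension
    n_head: int = 2        # attention heads
    n_layer: int = 2       # transformer layers

cfg = NanoGPT2Config()
# The embedding dim must divide evenly across heads: 32 / 2 = 16 dims per head.
assert cfg.n_embd % cfg.n_head == 0
```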
## Precision
| Stage | Precision |
|---|---|
| Weight storage | float64 |
| Forward matmuls | ~106-bit significand (double-double via Veltkamp splitting) |
| Backward pass | float64 |
| Equivalent to | IEEE binary128 (113-bit significand) to within 7 bits |