# JuliaGPT

An experimental 8,096-parameter character-level GPT written in pure Julia with a scalar autograd engine. It explores minimal vocabularies inspired by ancient Greek *scriptio continua* and has no external ML framework dependencies.
## Model Lineage
| Model | Params | Vocab | Context | Val Loss | Notes |
|---|---|---|---|---|---|
| MicroJulia | 4,992 | 27 chars | 64 | 2.43 | First proof-of-concept |
| JuliaGPT | 8,096 | 29 chars | 256 | 2.34 | Expanded context + vocab |
| JuliaGPT-v2 | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level |
## Architecture
| Parameter | Value |
|---|---|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a-z, space, period, + BOS) |
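The parameter count can be reproduced from the table above. A minimal sketch, assuming learned positional embeddings, no bias terms, and an untied output head (the combination under which the figures sum to exactly 8,096):

```julia
# Reconstructing the 8,096-parameter count from the architecture table.
# Assumptions: learned positional embeddings, no biases, untied output head.
vocab, d, ctx, ffn = 29, 16, 256, 64

tok_emb = vocab * d          # 29 × 16  = 464
pos_emb = ctx * d            # 256 × 16 = 4,096
attn    = 4 * d * d          # Q, K, V, O projections: 4 × 256 = 1,024
ffn_w   = d * ffn + ffn * d  # two FFN matrices: 1,024 + 1,024 = 2,048
head    = d * vocab          # output projection: 464

total = tok_emb + pos_emb + attn + ffn_w + head  # 8,096
```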
## Vocabulary

29 tokens: space + `.abcdefghijklmnopqrstuvwxyz` + BOS

Numerals are converted to words; all punctuation except the period is removed.
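A minimal sketch of what this character-level tokenizer might look like — the identifiers here are illustrative, not the repo's actual API:

```julia
# Illustrative 29-token vocabulary: space, period, a–z, plus BOS (id 29).
const VOCAB  = vcat([' ', '.'], collect('a':'z'))   # 28 printable characters
const BOS_ID = length(VOCAB) + 1                    # 29th id reserved for BOS

const CHAR_TO_ID = Dict(c => i for (i, c) in enumerate(VOCAB))

# Assumes input is already cleaned (lowercased, numerals spelled out,
# punctuation other than '.' stripped) as described above.
encode(s::AbstractString) = vcat(BOS_ID, [CHAR_TO_ID[c] for c in s])
decode(ids) = join(VOCAB[i] for i in ids if i != BOS_ID)
```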
## Training

| Parameter | Value |
|---|---|
| Dataset | Aristotle's *Rhetoric* + Euclid's *Elements* (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |
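For context, the reported validation loss can be compared against the uniform-guessing baseline for a 29-token vocabulary. This assumes the loss is mean cross-entropy in nats, which is the common convention:

```julia
# Assumption: the reported val loss is mean cross-entropy in nats.
uniform_baseline = log(29)          # ≈ 3.37 nats: guessing every token uniformly
val_loss = 2.34                     # reported best validation loss

bits_per_char = val_loss / log(2)   # ≈ 3.38 bits per character
```

Under that assumption, 2.34 nats sits well below the ≈3.37-nat uniform baseline, so the model has learned meaningful character-level structure.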
## Files

| File | Description |
|---|---|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array |
| `data/aristotle_rhetoric.txt` | Training data |
**Note:** The `.jld2` checkpoint files in this repo contain a different, larger model (384-dim / 6-layer / 38-token vocab). That model has been moved to JuliaGPT-v2. The original JuliaGPT is preserved in `best_model.json`.
## Inference Settings
| Parameter | Value |
|---|---|
| vocab_size | 29 |
| context_length | 256 |
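These settings constrain decoding: at each step the input must be cropped to the last 256 token ids. A hedged sketch of a greedy decoding loop — the `forward` argument is a hypothetical stand-in for the model's forward pass, assumed to map a token-id window to a length-29 logit vector:

```julia
softmax(x) = (e = exp.(x .- maximum(x)); e ./ sum(e))

# Greedy autoregressive decoding; `forward` is an assumed model interface
# returning one logit per vocabulary entry for the current window.
function generate(forward, ids::Vector{Int}; max_new_tokens = 50, context_length = 256)
    ids = copy(ids)
    for _ in 1:max_new_tokens
        window = ids[max(1, end - context_length + 1):end]  # crop to context
        probs  = softmax(forward(window))
        push!(ids, argmax(probs))  # greedy; temperature sampling is a common alternative
    end
    return ids
end
```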
## Provenance
- Author: LisaMegaWatts
- Source code: DavinciDreams/JuliaGPT
- Training data: LisaMegaWatts/juliagpt-data
## License
MIT
## Evaluation results

- Val Loss on `juliagpt-data` (self-reported): 2.340