JuliaGPT

An experimental 8,096-parameter character-level GPT written in pure Julia with a scalar autograd engine. It explores minimal vocabularies inspired by ancient Greek scriptio continua, with no external ML framework dependencies.

Model Lineage

| Model | Params | Vocab | Context | Val Loss | Notes |
|---|---|---|---|---|---|
| MicroJulia | 4,992 | 27 chars | 64 | 2.43 | First proof of concept |
| JuliaGPT | 8,096 | 29 chars | 256 | 2.34 | Expanded context + vocab |
| JuliaGPT-v2 | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level model |

Architecture

| Parameter | Value |
|---|---|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a–z, space, period, plus BOS) |
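The stated parameter count can be reproduced from the listed dimensions. This sketch assumes no bias terms, a learned positional embedding over the 256-position context, and an untied output head; the source states only the totals, so the per-component breakdown is an inference:

```julia
# Reconstruct the 8,096-parameter count from the table's dimensions.
# Assumptions: no biases, learned positional embeddings, untied LM head.
vocab, d, ctx, ffn = 29, 16, 256, 64

tok_emb  = vocab * d          # 464  token embedding
pos_emb  = ctx * d            # 4096 positional embedding
attn_qkv = 3 * d * d          # 768  Q, K, V projections (4 heads × head dim 4 = 16)
attn_out = d * d              # 256  attention output projection
ffn_w    = d * ffn + ffn * d  # 2048 feed-forward up + down projections
lm_head  = d * vocab          # 464  output head

total = tok_emb + pos_emb + attn_qkv + attn_out + ffn_w + lm_head
println(total)  # 8096
```

Under these assumptions the components sum exactly to 8,096, which suggests the model omits biases and does not tie the embedding and output weights.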

Vocabulary

29 tokens: space, `.`, and `a`–`z`, plus a BOS token.

Numerals are converted to words, and all punctuation except the period is removed.
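The vocabulary above admits a very small tokenizer. A minimal sketch, assuming illustrative token ids (space and period first, then a–z, with BOS as the last id) and folding the normalization rule (lowercase, drop everything except letters, spaces, and periods) into encoding:

```julia
# Sketch of the 29-token character vocabulary: space, period, a-z, plus BOS.
# Token id ordering is an assumption; the repo's vocab.json may differ.
const CHARS = vcat([' ', '.'], collect('a':'z'))   # 28 printable characters
const BOS   = length(CHARS) + 1                    # id 29 reserved for BOS
const C2I   = Dict(c => i for (i, c) in enumerate(CHARS))

normalize(s) = filter(c -> haskey(C2I, c), lowercase(s))
encode(s)    = vcat([BOS], [C2I[c] for c in normalize(s)])
decode(ids)  = String([CHARS[i] for i in ids if i != BOS])

ids = encode("Rhetoric, by Aristotle.")
println(decode(ids))  # "rhetoric by aristotle."
```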

Training

| Parameter | Value |
|---|---|
| Dataset | Aristotle's Rhetoric + Euclid's Elements (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |
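The scalar-autograd approach means every weight is a node in a reverse-mode graph rather than an entry in a framework tensor. The repo's implementation is not shown here, so the `Value` type and operator names below are illustrative (micrograd-style); backpropagation order is handled by hand for brevity, where a real engine would topologically sort the graph:

```julia
# Minimal sketch of scalar reverse-mode autograd; names are illustrative.
mutable struct Value
    data::Float64
    grad::Float64
    backward::Function
end
Value(x::Real) = Value(float(x), 0.0, () -> nothing)

function Base.:+(a::Value, b::Value)
    out = Value(a.data + b.data)
    out.backward = () -> (a.grad += out.grad; b.grad += out.grad)
    out
end

function Base.:*(a::Value, b::Value)
    out = Value(a.data * b.data)
    out.backward = () -> (a.grad += b.data * out.grad;
                          b.grad += a.data * out.grad)
    out
end

x, y = Value(3.0), Value(4.0)
m = x * y                    # intermediate product node
z = m + x                    # z = x*y + x, so dz/dx = y + 1, dz/dy = x
z.grad = 1.0
z.backward(); m.backward()   # reverse topological order, by hand
println((x.grad, y.grad))    # (5.0, 3.0)
```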

Files

| File | Description |
|---|---|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array |
| `data/aristotle_rhetoric.txt` | Training data |

Note: the `.jld2` checkpoint files in this repo contain a different, larger model (384-dim, 6-layer, 38-token vocab); that model has moved to JuliaGPT-v2. The original JuliaGPT is preserved in `best_model.json`.

Inference Settings

| Parameter | Value |
|---|---|
| `vocab_size` | 29 |
| `context_length` | 256 |
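At inference time these two settings bound the sampling loop: the prompt is cropped to the last 256 tokens, and each step samples one of the 29 ids from the model's output distribution. A hedged sketch, where `logits_fn` stands in for the trained model (not shown here) and the temperature parameter is an assumption:

```julia
# Sketch of autoregressive sampling with a 256-token context window.
# `logits_fn` is a stand-in for the trained model.
using Random

softmax(v) = (e = exp.(v .- maximum(v)); e ./ sum(e))

function generate(logits_fn, ids; steps=10, ctx=256, temp=1.0,
                  rng=Random.default_rng())
    ids = copy(ids)
    for _ in 1:steps
        window = ids[max(1, end - ctx + 1):end]   # crop to context length
        p = softmax(logits_fn(window) ./ temp)    # temperature-scaled probs
        r, c, next = rand(rng), 0.0, length(p)    # inverse-CDF sampling
        for (i, pi) in enumerate(p)
            c += pi
            if r <= c; next = i; break; end
        end
        push!(ids, next)
    end
    ids
end

toy(window) = zeros(29)          # toy stand-in: uniform logits over 29 tokens
out = generate(toy, [29]; steps=5)
println(length(out))  # 6
```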

License

MIT
