JuliaGPT

An experimental 8,096-parameter character-level GPT written in pure Julia with a scalar autograd engine. It explores minimal vocabularies inspired by ancient Greek scriptio continua, with no external ML framework dependencies.

Model Lineage

| Model | Params | Vocab | Context | Val Loss | Notes |
|---|---|---|---|---|---|
| MicroJulia | 4,992 | 27 chars | 64 | 2.43 | First proof of concept |
| JuliaGPT | 8,096 | 29 chars | 256 | 2.34 | Expanded context + vocab |
| JuliaGPT-v2 | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level model |

Architecture

| Parameter | Value |
|---|---|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a–z, space, period, plus BOS) |
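The stated parameter count can be reproduced from the listed dimensions. This sketch assumes no bias terms, a learned positional embedding over the 256-position context, and an untied output head; the source states only the totals, so the per-component breakdown is an inference:

```julia
# Reconstruct the 8,096-parameter count from the table's dimensions.
# Assumptions: no biases, learned positional embeddings, untied LM head.
vocab, d, ctx, ffn = 29, 16, 256, 64

tok_emb  = vocab * d          # 464  token embedding
pos_emb  = ctx * d            # 4096 positional embedding
attn_qkv = 3 * d * d          # 768  Q, K, V projections (4 heads × head dim 4 = 16)
attn_out = d * d              # 256  attention output projection
ffn_w    = d * ffn + ffn * d  # 2048 feed-forward up + down projections
lm_head  = d * vocab          # 464  output head

total = tok_emb + pos_emb + attn_qkv + attn_out + ffn_w + lm_head
println(total)  # 8096
```

Under these assumptions the components sum exactly to 8,096, which suggests the model omits biases and does not tie the embedding and output weights.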

Vocabulary

29 tokens: space, `.`, and `a`–`z`, plus a BOS token.

Numerals are converted to words, and all punctuation except the period is removed.
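The vocabulary above admits a very small tokenizer. A minimal sketch, assuming illustrative token ids (space and period first, then a–z, with BOS as the last id) and folding the normalization rule (lowercase, drop everything except letters, spaces, and periods) into encoding:

```julia
# Sketch of the 29-token character vocabulary: space, period, a-z, plus BOS.
# Token id ordering is an assumption; the repo's vocab.json may differ.
const CHARS = vcat([' ', '.'], collect('a':'z'))   # 28 printable characters
const BOS   = length(CHARS) + 1                    # id 29 reserved for BOS
const C2I   = Dict(c => i for (i, c) in enumerate(CHARS))

normalize(s) = filter(c -> haskey(C2I, c), lowercase(s))
encode(s)    = vcat([BOS], [C2I[c] for c in normalize(s)])
decode(ids)  = String([CHARS[i] for i in ids if i != BOS])

ids = encode("Rhetoric, by Aristotle.")
println(decode(ids))  # "rhetoric by aristotle."
```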

Training

| Parameter | Value |
|---|---|
| Dataset | Aristotle's Rhetoric + Euclid's Elements (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |
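The scalar-autograd approach means every weight is a node in a reverse-mode graph rather than an entry in a framework tensor. The repo's implementation is not shown here, so the `Value` type and operator names below are illustrative (micrograd-style); backpropagation order is handled by hand for brevity, where a real engine would topologically sort the graph:

```julia
# Minimal sketch of scalar reverse-mode autograd; names are illustrative.
mutable struct Value
    data::Float64
    grad::Float64
    backward::Function
end
Value(x::Real) = Value(float(x), 0.0, () -> nothing)

function Base.:+(a::Value, b::Value)
    out = Value(a.data + b.data)
    out.backward = () -> (a.grad += out.grad; b.grad += out.grad)
    out
end

function Base.:*(a::Value, b::Value)
    out = Value(a.data * b.data)
    out.backward = () -> (a.grad += b.data * out.grad;
                          b.grad += a.data * out.grad)
    out
end

x, y = Value(3.0), Value(4.0)
m = x * y                    # intermediate product node
z = m + x                    # z = x*y + x, so dz/dx = y + 1, dz/dy = x
z.grad = 1.0
z.backward(); m.backward()   # reverse topological order, by hand
println((x.grad, y.grad))    # (5.0, 3.0)
```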

Files

| File | Description |
|---|---|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array |
| `data/aristotle_rhetoric.txt` | Training data |

Note: the `.jld2` checkpoint files in this repo contain a different, larger model (384-dim, 6-layer, 38-token vocab); that model has moved to JuliaGPT-v2. The original JuliaGPT is preserved in `best_model.json`.

Inference Settings

| Parameter | Value |
|---|---|
| `vocab_size` | 29 |
| `context_length` | 256 |
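At inference time these two settings bound the sampling loop: the prompt is cropped to the last 256 tokens, and each step samples one of the 29 ids from the model's output distribution. A hedged sketch, where `logits_fn` stands in for the trained model (not shown here) and the temperature parameter is an assumption:

```julia
# Sketch of autoregressive sampling with a 256-token context window.
# `logits_fn` is a stand-in for the trained model.
using Random

softmax(v) = (e = exp.(v .- maximum(v)); e ./ sum(e))

function generate(logits_fn, ids; steps=10, ctx=256, temp=1.0,
                  rng=Random.default_rng())
    ids = copy(ids)
    for _ in 1:steps
        window = ids[max(1, end - ctx + 1):end]   # crop to context length
        p = softmax(logits_fn(window) ./ temp)    # temperature-scaled probs
        r, c, next = rand(rng), 0.0, length(p)    # inverse-CDF sampling
        for (i, pi) in enumerate(p)
            c += pi
            if r <= c; next = i; break; end
        end
        push!(ids, next)
    end
    ids
end

toy(window) = zeros(29)          # toy stand-in: uniform logits over 29 tokens
out = generate(toy, [29]; steps=5)
println(length(out))  # 6
```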

License

MIT
