A character-level GPT language model trained on Swahili news articles.
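"Character-level" means the model reads and predicts one character at a time, not words or subword pieces. A minimal sketch of that kind of tokenization follows. The names `stoi`, `itos`, `encode`, and `decode` are illustrative and not taken from the repository, and the vocabulary here is built from a single sample string, not the full corpus.

```python
# Illustrative sketch only: the repository's actual tokenizer may differ.
sample_text = "Rais wa Tanzania"

# The vocabulary is every distinct character seen in the text.
chars = sorted(set(sample_text))
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> character


def encode(s):
    """Map a string to a list of integer ids, one per character."""
    return [stoi[c] for c in s]


def decode(ids):
    """Map a list of integer ids back to a string."""
    return "".join(itos[i] for i in ids)


ids = encode("Tanzania")
print(ids)
print(decode(ids))
```

With this scheme, any character missing from the training vocabulary cannot be encoded, so prompts should stick to characters that appear in the corpus.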
```bash
# Clone the repo and run locally
git clone https://github.com/RamadhanAdam/swahili-gpt
cd swahili-gpt
pip install -r requirements.txt

# Download checkpoint from GitHub Releases
# Then generate interactively
python generate.py --prompt "Rais wa Tanzania" --tokens 400
```
The model was trained for 5,000 steps with AdamW (learning rate 3e-4) and a dropout rate of 0.2 on a Google Colab T4 GPU.
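For readers who want to see how these hyperparameters fit together, here is a rough sketch of one optimization step. It assumes PyTorch, which the card does not state explicitly. The tiny `model`, `VOCAB_SIZE`, and `EMBED_DIM` are hypothetical stand-ins, not the repository's GPT architecture. Only the optimizer, learning rate, and dropout value come from the card.

```python
import torch
from torch import nn

# Hypothetical sizes for illustration; the real model's sizes are not given here.
VOCAB_SIZE = 64
EMBED_DIM = 32

# Stand-in model: embedding -> dropout -> linear head. Not the actual GPT.
model = nn.Sequential(
    nn.Embedding(VOCAB_SIZE, EMBED_DIM),
    nn.Dropout(p=0.2),                    # dropout=0.2, as reported
    nn.Linear(EMBED_DIM, VOCAB_SIZE),
)

# AdamW with lr=3e-4, as reported; other settings are left at PyTorch defaults.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)


def train_step(inputs, targets):
    """Run one gradient step and return the cross-entropy loss (in nats)."""
    model.train()
    logits = model(inputs)  # shape: (batch, time, vocab)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1)
    )
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()


# Random token ids stand in for real (input, next-character) batches.
torch.manual_seed(0)
x = torch.randint(0, VOCAB_SIZE, (4, 8))
y = torch.randint(0, VOCAB_SIZE, (4, 8))
first_loss = train_step(x, y)
print(first_loss)
```

In a full run, `train_step` would be called 5,000 times on batches drawn from the encoded corpus, with validation loss checked periodically.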
| Step | Train Loss | Val Loss |
|---|---|---|
| 0 | 6.2798 | 6.2793 |
| 2500 | 1.2566 | 1.2548 |
| 4500 | 1.1350 | 1.1387 |
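The losses in the table can be easier to interpret as per-character perplexity or bits per character. The sketch below does that conversion, assuming the reported values are mean cross-entropy in nats per character (the usual convention with natural-log cross-entropy; the card does not say so explicitly). The helper names are illustrative.

```python
import math


def perplexity(loss_nats):
    """Per-character perplexity for a mean cross-entropy given in nats."""
    return math.exp(loss_nats)


def bits_per_char(loss_nats):
    """Convert a mean cross-entropy from nats to bits per character."""
    return loss_nats / math.log(2)


# Validation losses copied from the table above.
val_losses = {0: 6.2793, 2500: 1.2548, 4500: 1.1387}

for step, loss in val_losses.items():
    print(
        f"step {step}: perplexity {perplexity(loss):.2f}, "
        f"{bits_per_char(loss):.2f} bits/char"
    )
```

Under that assumption, the final validation loss of 1.1387 corresponds to a per-character perplexity of roughly 3.1. Train and validation loss also stay within about 0.004 of each other at every reported step, so the table shows no sign of overfitting.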
Trained on the swahili_news dataset (community-datasets/swahili_news), which contains 29,544 news articles from Tanzanian online platforms. License: CC BY 4.0.
Full code, training notebook, and results: github.com/RamadhanAdam/swahili-gpt
Inspired by Andrej Karpathy's work on character-level language models.