Kinyarwanda Whisper Evaluation

This repository evaluates Whisper model performance on Kinyarwanda, as described in the paper "How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu".

Model Description

The development of Automatic Speech Recognition (ASR) systems for low-resource African languages remains challenging due to limited transcribed speech data. This work addresses a fundamental question for practitioners: how much training data is needed for usable ASR? Through systematic data-scaling experiments on Kinyarwanda, with training sets ranging from 1 to roughly 1,400 hours, it shows that practical ASR performance (WER < 13%) is achievable with as little as 50 hours of training data, with substantial improvements continuing through 200 hours (WER < 10%).

For more details on the evaluation, training, and related models, visit the GitHub repository.

Training Configs

The following models, trained on increasing data volumes, were used in the Kinyarwanda Whisper evaluation. Explore the full collection: 👉 https://huggingface.co/collections/Sunbird/kinyarwanda-hackathon-68872541c41c5d166d9bffad

| Config             | Hours | Model ID on Hugging Face               |
|--------------------|-------|----------------------------------------|
| `baseline.yaml`    | 0     | `openai/whisper-large-v3`              |
| `train_1h.yaml`    | 1     | `akera/whisper-large-v3-kin-1h-v2`     |
| `train_50h.yaml`   | 50    | `akera/whisper-large-v3-kin-50h-v2`    |
| `train_100h.yaml`  | 100   | `akera/whisper-large-v3-kin-100h-v2`   |
| `train_150h.yaml`  | 150   | `akera/whisper-large-v3-kin-150h-v2`   |
| `train_200h.yaml`  | 200   | `akera/whisper-large-v3-kin-200h-v2`   |
| `train_500h.yaml`  | 500   | `akera/whisper-large-v3-kin-500h-v2`   |
| `train_1000h.yaml` | 1000  | `akera/whisper-large-v3-kin-1000h-v2`  |
| `train_full.yaml`  | ~1400 | `akera/whisper-large-v3-kin-full`      |
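Any of the checkpoints above can be loaded for inference with the `transformers` ASR pipeline; a minimal sketch (the checkpoint choice and the audio path `sample.wav` are illustrative, not prescribed by this repository):

```python
from functools import lru_cache

# 200 h checkpoint from the table above; any of the listed model IDs works.
MODEL_ID = "akera/whisper-large-v3-kin-200h-v2"

@lru_cache(maxsize=1)
def get_asr():
    # Deferred import and model download: the pipeline (and its ~2B-parameter
    # weights) is only fetched on first use.
    from transformers import pipeline
    return pipeline("automatic-speech-recognition", model=MODEL_ID, chunk_length_s=30)

# Usage (downloads the model weights on first call):
# print(get_asr()("sample.wav")["text"])
```

The `lru_cache` guard keeps the pipeline as a lazily-built singleton, so repeated calls reuse the loaded model.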

Evaluation Results

Evaluation on the `dev_test[:300]` subset:

| Model                                 | Hours | WER (%) | CER (%) | Score |
|---------------------------------------|-------|---------|---------|-------|
| `openai/whisper-large-v3`             | 0     | 33.10   | 9.80    | 0.861 |
| `akera/whisper-large-v3-kin-1h-v2`    | 1     | 47.63   | 16.97   | 0.754 |
| `akera/whisper-large-v3-kin-50h-v2`   | 50    | 12.51   | 3.31    | 0.932 |
| `akera/whisper-large-v3-kin-100h-v2`  | 100   | 10.90   | 2.84    | 0.943 |
| `akera/whisper-large-v3-kin-150h-v2`  | 150   | 10.21   | 2.64    | 0.948 |
| `akera/whisper-large-v3-kin-200h-v2`  | 200   | 9.82    | 2.56    | 0.951 |
| `akera/whisper-large-v3-kin-500h-v2`  | 500   | 8.24    | 2.15    | 0.963 |
| `akera/whisper-large-v3-kin-1000h-v2` | 1000  | 7.65    | 1.98    | 0.967 |
| `akera/whisper-large-v3-kin-full`     | ~1400 | 7.14    | 1.88    | 0.970 |

Score = 1 - (0.6 × CER + 0.4 × WER)
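The composite score is a weighted combination of the two error rates; a minimal sketch of the formula (the `score` helper is illustrative, not the repository's evaluation code), with WER and CER expressed as fractions rather than percentages:

```python
def score(wer: float, cer: float) -> float:
    """Composite score: 1 - (0.6 * CER + 0.4 * WER), rates given as fractions."""
    return 1.0 - (0.6 * cer + 0.4 * wer)

# Example: WER = 10%, CER = 2%  ->  1 - (0.6*0.02 + 0.4*0.10) = 0.948
print(round(score(0.10, 0.02), 3))  # prints 0.948
```

The weighting favors CER, so character-level accuracy contributes more to the score than word-level accuracy.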

