# Speech Emotion Classification – ONNX (INT8)
ONNX INT8-quantized version of prithivMLmods/Speech-Emotion-Classification for on-device inference in macOS apps via ONNX Runtime C API.
## Model Details
- Architecture: Wav2Vec2ForSequenceClassification (facebook/wav2vec2-base-960h fine-tuned)
- Format: ONNX INT8 quantized
- Size: ~91 MB (INT8), ~361 MB (FP32)
- Input: Raw audio waveform (16 kHz, mono), shape `[1, num_samples]`
- Output: 8-class emotion logits
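A minimal Python sketch of the input/output contract above: the input tensor holds `duration × 16000` raw samples, and the 8 output logits can be turned into probabilities with a softmax. `input_shape` and `softmax` are illustrative helpers written for this card, not part of the model package.

```python
import math

SAMPLE_RATE = 16_000  # the model expects 16 kHz mono audio

def input_shape(duration_s: float) -> list[int]:
    """Shape of the raw-waveform input tensor, [1, num_samples]."""
    return [1, int(duration_s * SAMPLE_RATE)]

def softmax(logits: list[float]) -> list[float]:
    """Convert the 8 emotion logits into a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(input_shape(2.0))  # a 2-second clip -> [1, 32000]
demo_logits = [0.1, -1.2, 0.3, 0.0, 2.5, 1.1, -0.4, 0.2]
probs = softmax(demo_logits)
print(max(range(8), key=probs.__getitem__))  # 4, i.e. HAP
```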
## Emotion Labels
| ID | Label | Full name |
|---|---|---|
| 0 | ANG | Anger |
| 1 | CAL | Calm |
| 2 | DIS | Disgust |
| 3 | FEA | Fear |
| 4 | HAP | Happy |
| 5 | NEU | Neutral |
| 6 | SAD | Sad |
| 7 | SUR | Surprised |
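The table above can be carried in code as a simple lookup; `ID2LABEL` and `label_for` below are hypothetical helpers for mapping a predicted class ID back to its emotion name, not identifiers shipped with the model.

```python
# Class-ID -> (abbreviation, full name), mirroring the label table
ID2LABEL = {
    0: ("ANG", "Anger"),   1: ("CAL", "Calm"),
    2: ("DIS", "Disgust"), 3: ("FEA", "Fear"),
    4: ("HAP", "Happy"),   5: ("NEU", "Neutral"),
    6: ("SAD", "Sad"),     7: ("SUR", "Surprised"),
}

def label_for(class_id: int) -> str:
    """Human-readable label for a predicted class ID."""
    abbrev, full = ID2LABEL[class_id]
    return f"{abbrev} ({full})"

print(label_for(4))  # HAP (Happy)
```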
## Usage
On-device, real-time speech emotion classification. Inference runs through the ONNX Runtime C API.
```swift
// Swift – load and run via OnnxRuntimeWrapper
let wrapper = OnnxRuntimeWrapper()
try wrapper.load(modelPath: "model_int8.onnx")

// audioBuffer: [Float] of 16 kHz mono samples
let logits = try wrapper.run(inputName: "input_values",
                             inputData: audioBuffer,
                             inputShape: [1, Int64(audioBuffer.count)])

// Index of the highest logit is the predicted emotion class ID
let emotionIdx = logits.indices.max(by: { logits[$0] < logits[$1] })!
```
## Files
- `model_int8.onnx` – INT8 quantized model (recommended for on-device use)
- `model.onnx` – FP32 full-precision model
- `config.json` – model configuration with label mappings
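Hugging Face-style configs conventionally expose the label mapping as an `id2label` object whose keys are JSON strings. A hedged sketch of reading it, using an inline excerpt standing in for the real `config.json` (the actual file may contain additional fields):

```python
import json

# Hypothetical excerpt of config.json; the real file ships with the model.
CONFIG_JSON = '''
{"id2label": {"0": "ANG", "1": "CAL", "2": "DIS", "3": "FEA",
              "4": "HAP", "5": "NEU", "6": "SAD", "7": "SUR"}}
'''

config = json.loads(CONFIG_JSON)
# JSON object keys are strings; cast to int so an argmax can index directly.
id2label = {int(k): v for k, v in config["id2label"].items()}
print(id2label[4])  # HAP
```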
## Attribution
Original model by prithivMLmods. Converted to ONNX by onnx-community. INT8 quantization and packaging for macOS by @smkrv.