Manglish NLP Sentiment Model (v3.2.0)

XLM-RoBERTa multi-task model fine-tuned on 14,384 labeled Manglish (Malaysian English) examples for sentiment, emotion, and intent classification.

Performance

Task	Accuracy	Labels
Sentiment	95.0%	positive, negative, neutral
Emotion	90.3%	happy, sad, angry, fear, surprise, disgust, love, neutral
Intent	97.5%	question, statement, request, complaint, greeting, opinion
Average	94.3%	-

Comparison with v3.1.0

Task	v3.1.0 (DistilBERT)	v3.2.0 (XLM-RoBERTa)	Improvement (percentage points)
Sentiment	88.5%	95.0%	+6.5
Emotion	83.6%	90.3%	+6.7
Intent	94.5%	97.5%	+3.0
Average	88.9%	94.3%	+5.4
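The Average rows in both tables match the unweighted mean of the three task accuracies, and the Improvement column is the plain difference between versions in percentage points. A small Python check of that arithmetic, using only the numbers in the tables (the function names are illustrative, not part of the package):

```python
# Accuracies (%) copied from the comparison table.
V310 = {"Sentiment": 88.5, "Emotion": 83.6, "Intent": 94.5}
V320 = {"Sentiment": 95.0, "Emotion": 90.3, "Intent": 97.5}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the per-task accuracies, rounded to one decimal."""
    return round(sum(scores.values()) / len(scores), 1)


def improvements(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Per-task difference in percentage points (new minus old)."""
    return {task: round(new[task] - old[task], 1) for task in old}


if __name__ == "__main__":
    print(macro_average(V310), macro_average(V320))
    print(improvements(V310, V320))
```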

Architecture

Base model: xlm-roberta-base
Architecture: Multi-task, with a shared encoder and one classification head per task; task losses are combined with uncertainty weighting
Training: Focal loss, cosine annealing, FP16, gradient accumulation (effective batch 32)
Ensemble: Confidence-based fallback (predictions with confidence below 60% use a rule-based classifier instead)
Model size: ~1.1 GB
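The training objective listed above (focal loss per task, combined across tasks with uncertainty weighting) can be sketched in plain Python for a single example per head. This is a minimal illustration under assumptions the card does not confirm: the focal-loss focusing parameter `gamma=2.0` is a common default, and the weighting follows one common formulation, `exp(-s) * L + s`, with a learnable log-variance `s` per task. In a real model the `s` values would be trainable tensors; here they are plain floats, and `focal_loss` and `uncertainty_weighted_loss` are hypothetical names.

```python
import math


def focal_loss(probs: list[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss for one example: -(1 - p_t)**gamma * log(p_t).

    probs are softmax probabilities; target is the index of the true class.
    gamma=2.0 is an assumed default, not a value stated in the model card.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def uncertainty_weighted_loss(task_losses: dict[str, float], log_vars: dict[str, float]) -> float:
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i (assumed formulation).

    s_i is a learnable log-variance per task: a larger s_i down-weights task i,
    while the + s_i term penalizes inflating it without limit.
    """
    return sum(math.exp(-log_vars[t]) * loss + log_vars[t] for t, loss in task_losses.items())


if __name__ == "__main__":
    # One example per task head: (softmax probabilities, true class index).
    heads = {
        "sentiment": ([0.90, 0.05, 0.05], 0),  # easy: confident and correct
        "emotion": ([0.60, 0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05], 0),
        "intent": ([0.30, 0.40, 0.10, 0.10, 0.05, 0.05], 0),  # hard: true class not top-1
    }
    losses = {task: focal_loss(p, y) for task, (p, y) in heads.items()}
    log_vars = {"sentiment": 0.0, "emotion": 0.0, "intent": 0.5}
    print(losses)
    print(uncertainty_weighted_loss(losses, log_vars))
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy; with `gamma=2` a confident correct prediction (p_t = 0.9) keeps only 1% of its cross-entropy, so training concentrates on the hard examples.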

Usage

# Install
# pip install "malaysian-manglish-nlp[transformers]"

from malaysian_manglish_nlp.transformers.manglish_model import load_model, predict

model = load_model()
result = predict("weh best gila makanan dia")
# {
#   "sentiment": {"label": "positive", "confidence": 0.95},
#   "emotion": {"label": "happy", "confidence": 0.91},
#   "intent": {"label": "opinion", "confidence": 0.89}
# }
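The confidence-based fallback listed under Architecture (predictions below 60% confidence are handed to a rule-based classifier) can be sketched as a small wrapper, shown here for one task. This is a hypothetical illustration, not the package's code: `predict_with_fallback`, `toy_model`, and `toy_rules` are made-up names, the rule set is a single invented keyword check, and returning the model's original confidence together with a `source` field is an assumption about the output.

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 0.60  # from the card: below 60% the rule-based path is used


def predict_with_fallback(
    text: str,
    model_predict: Callable[[str], tuple[str, float]],
    rule_predict: Callable[[str], str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> dict:
    """Return the model's label, or the rule-based label if confidence < threshold."""
    label, confidence = model_predict(text)
    if confidence < threshold:
        return {"label": rule_predict(text), "confidence": confidence, "source": "rules"}
    return {"label": label, "confidence": confidence, "source": "model"}


def toy_model(text: str) -> tuple[str, float]:
    # Hypothetical stand-in for the transformer's sentiment head.
    return ("neutral", 0.45) if "hmm" in text else ("positive", 0.95)


def toy_rules(text: str) -> str:
    # Hypothetical keyword rule; the card does not describe the real rule set.
    return "negative" if "teruk" in text else "neutral"


if __name__ == "__main__":
    print(predict_with_fallback("weh best gila makanan dia", toy_model, toy_rules))
    print(predict_with_fallback("hmm teruk lah service", toy_model, toy_rules))
```

A confidence of exactly 0.60 stays with the model, matching the strict "< 60%" condition in the card.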

Training Details

Dataset: 14,384 examples (7,884 original + 6,500 augmented)
Epochs: 8 (best at epoch 8)
Batch size: 4 per step × 8 gradient accumulation steps = 32 effective
Learning rate: 2e-5 with cosine annealing
Max sequence length: 96 tokens
Label smoothing: 0.1
Early stopping: patience 3
Mixed precision: FP16
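Three of the settings above can be illustrated without any framework: the cosine learning-rate curve starting at 2e-5, label-smoothed targets with epsilon = 0.1, and early stopping with patience 3. This is a sketch under assumptions the card does not state: no warmup, a minimum learning rate of 0, a per-step schedule, and early stopping on a higher-is-better validation metric such as accuracy. The names `cosine_lr`, `smoothed_targets`, and `EarlyStopping` are hypothetical.

```python
import math

PEAK_LR = 2e-5  # learning rate from the card
EPSILON = 0.1   # label smoothing from the card
PATIENCE = 3    # early-stopping patience from the card


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Cosine annealing from peak_lr at step 0 to min_lr at total_steps (no warmup assumed)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def smoothed_targets(target: int, num_classes: int, epsilon: float = EPSILON) -> list[float]:
    """(1 - epsilon) * one_hot + epsilon / num_classes; the result still sums to 1."""
    off = epsilon / num_classes
    return [1.0 - epsilon + off if k == target else off for k in range(num_classes)]


class EarlyStopping:
    """Signal a stop after `patience` consecutive evaluations without improvement."""

    def __init__(self, patience: int = PATIENCE) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.bad_evals = 0

    def step(self, metric: float) -> bool:
        """Record one evaluation (higher is better); return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


if __name__ == "__main__":
    print([cosine_lr(s, 100) for s in (0, 50, 100)])
    print(smoothed_targets(0, 3))  # sentiment has 3 classes
    stopper = EarlyStopping()
    for epoch, acc in enumerate([0.80, 0.85, 0.84, 0.84, 0.83], start=1):
        if stopper.step(acc):
            print("stop after epoch", epoch)
            break
```

In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR` follows the same cosine curve, and `torch.nn.CrossEntropyLoss(label_smoothing=0.1)` applies the same target mixing; how the card's focal loss and label smoothing were combined is not stated.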

Dataset

vexccz/manglish-nlp-dataset

License

MIT


Evaluation results

Self-reported accuracy on the Manglish NLP Dataset:

Task	Accuracy (%)
Sentiment	95.000
Emotion	90.300
Intent	97.500

Model repository: vexccz/manglish-nlp-sentiment
