Manglish NLP Sentiment Model (v3.2.0)
A multi-task XLM-RoBERTa model fine-tuned on 14,384 labeled Manglish examples for sentiment, emotion, and intent classification.
Performance
| Task | Accuracy | Labels |
|---|---|---|
| Sentiment | 95.0% | positive, negative, neutral |
| Emotion | 90.3% | happy, sad, angry, fear, surprise, disgust, love, neutral |
| Intent | 97.5% | question, statement, request, complaint, greeting, opinion |
| Average | 94.3% | - |
Comparison with v3.1.0
| Task | v3.1.0 (distilbert) | v3.2.0 (xlm-roberta) | Improvement (percentage points) |
|---|---|---|---|
| Sentiment | 88.5% | 95.0% | +6.5 |
| Emotion | 83.6% | 90.3% | +6.7 |
| Intent | 94.5% | 97.5% | +3.0 |
| Average | 88.9% | 94.3% | +5.4 |
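The Average row and the improvement column can be checked from the per-task numbers. The short sketch below assumes the average is an unweighted mean over the three tasks, which matches the reported figures; the variable and function names are illustrative only.

```python
# Per-task accuracies (in percent) copied from the comparison table.
v31 = {"sentiment": 88.5, "emotion": 83.6, "intent": 94.5}
v32 = {"sentiment": 95.0, "emotion": 90.3, "intent": 97.5}


def macro_average(scores):
    """Unweighted mean over tasks, rounded to one decimal place."""
    return round(sum(scores.values()) / len(scores), 1)


# Differences are absolute, i.e. percentage points rather than relative change.
deltas = {task: round(v32[task] - v31[task], 1) for task in v32}
avg_v31 = macro_average(v31)
avg_v32 = macro_average(v32)
avg_delta = round(avg_v32 - avg_v31, 1)

print(deltas, avg_v31, avg_v32, avg_delta)
```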
Architecture
- Base model: xlm-roberta-base
- Architecture: Multi-task, with one task-specific head per task on a shared encoder, trained with an uncertainty-weighted loss
- Training: Focal loss, cosine annealing, FP16, gradient accumulation (effective batch size 32)
- Ensemble: Confidence-based fallback (predictions with confidence below 60% use the rule-based component)
- Model size: ~1.1GB
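The card does not spell out the exact loss formulation. As a rough sketch of how focal loss and uncertainty weighting are commonly combined, the snippet below assumes the standard focal loss `-(1 - p_t)^gamma * log(p_t)` and a learned log-variance `s_i` per task, with the total loss `sum_i exp(-s_i) * L_i + s_i`. The value `gamma=2.0`, the function names, and the example numbers are assumptions, not values taken from this model.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for a single example.

    probs  : predicted class probabilities (should sum to 1)
    target : index of the true class
    gamma  : focusing parameter (assumed value; gamma=0 gives cross-entropy)
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    task_losses : dict mapping task name to its scalar loss
    log_vars    : dict mapping task name to its log-variance s_i
                  (a trainable parameter in a real model)
    """
    return sum(
        math.exp(-log_vars[task]) * loss + log_vars[task]
        for task, loss in task_losses.items()
    )


# Illustrative numbers only: one confident and two less confident predictions.
losses = {
    "sentiment": focal_loss([0.05, 0.05, 0.90], target=2),
    "emotion": focal_loss([0.60, 0.40], target=0),
    "intent": focal_loss([0.30, 0.70], target=1),
}
total = uncertainty_weighted_loss(losses, {"sentiment": 0.0, "emotion": 0.0, "intent": 0.0})
```

With all log-variances at zero the combined loss is simply the sum of the task losses; raising `s_i` for a task shrinks that task's contribution, while the `+ s_i` term stops the weights from collapsing to zero.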
Usage
```python
# Install
# pip install malaysian-manglish-nlp[transformers]
from malaysian_manglish_nlp.transformers.manglish_model import load_model, predict

model = load_model()
result = predict("weh best gila makanan dia")
# {
#   "sentiment": {"label": "positive", "confidence": 0.95},
#   "emotion": {"label": "happy", "confidence": 0.91},
#   "intent": {"label": "opinion", "confidence": 0.89}
# }
```
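The 60% confidence fallback can be pictured as a thin routing step over a result of this shape. This is a minimal sketch, not the package's own implementation: `apply_fallback` and `toy_rule_based` are hypothetical names, the toy rules only return fixed defaults, and the 0.55 confidence in the example is made up to trigger the fallback.

```python
CONFIDENCE_THRESHOLD = 0.60  # below this, the card says the rule-based path is used


def toy_rule_based(text, task):
    """Hypothetical stand-in for the rule-based component: fixed default labels."""
    defaults = {"sentiment": "neutral", "emotion": "neutral", "intent": "statement"}
    return defaults[task]


def apply_fallback(text, model_result, rule_based=toy_rule_based,
                   threshold=CONFIDENCE_THRESHOLD):
    """Keep confident model predictions; route the rest to the rule-based function."""
    final = {}
    for task, pred in model_result.items():
        if pred["confidence"] < threshold:
            final[task] = {"label": rule_based(text, task), "source": "rules"}
        else:
            final[task] = {"label": pred["label"], "source": "model"}
    return final


# Result dict in the same shape as the usage example; confidences are illustrative.
example = {
    "sentiment": {"label": "positive", "confidence": 0.95},
    "emotion": {"label": "happy", "confidence": 0.55},
    "intent": {"label": "opinion", "confidence": 0.89},
}
routed = apply_fallback("weh best gila makanan dia", example)
```

Here only the emotion prediction falls below the threshold, so it alone is replaced by the rule-based label; a prediction at exactly 0.60 stays with the model.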
Training Details
- Dataset: 14,384 examples (7,884 original + 6,500 augmented)
- Epochs: 8 (best checkpoint at epoch 8)
- Batch size: 4 per step x 8 gradient accumulation steps = 32 effective
- Learning rate: 2e-5 with cosine annealing
- Max sequence length: 96 tokens
- Label smoothing: 0.1
- Early stopping: patience 3
- Mixed precision: FP16
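To make the batch-size and learning-rate settings concrete, here is a small sketch of the effective batch size and a cosine annealing schedule starting at 2e-5. It assumes the schedule decays to a minimum of 0 with no warmup, and the total of 1,000 steps is arbitrary; the card states none of these, and the function names are illustrative.

```python
import math


def effective_batch_size(per_step, accumulation_steps):
    """Examples contributing to one optimizer update under gradient accumulation."""
    return per_step * accumulation_steps


def cosine_annealing_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Learning rate after `step` updates, following half a cosine wave.

    min_lr=0.0 and the absence of warmup are assumptions for illustration.
    """
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


batch = effective_batch_size(4, 8)
schedule = [cosine_annealing_lr(s, total_steps=1000) for s in (0, 250, 500, 750, 1000)]
print(batch, schedule)
```

The rate starts at the base value, passes through half of it at the midpoint, and reaches the minimum at the final step.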
Dataset
Links
- PyPI: pip install malaysian-manglish-nlp
- GitHub: https://github.com/ZafranYusof/malaysia-manglish-nlp
- Docs: https://manglish-nlp.readthedocs.io
- Demo: https://huggingface.co/spaces/vexccz/manglish-nlp-demo
License
MIT
Evaluation results
- Sentiment accuracy on Manglish NLP Dataset (self-reported): 95.0%
- Emotion accuracy on Manglish NLP Dataset (self-reported): 90.3%
- Intent accuracy on Manglish NLP Dataset (self-reported): 97.5%