Manglish NLP Sentiment Model (v3.2.0)

An XLM-RoBERTa multi-task model fine-tuned on 14,384 labeled Manglish examples for sentiment, emotion, and intent classification.

Performance

Task       Accuracy  Labels
Sentiment  95.0%     positive, negative, neutral
Emotion    90.3%     happy, sad, angry, fear, surprise, disgust, love, neutral
Intent     97.5%     question, statement, request, complaint, greeting, opinion
Average    94.3%     -

Comparison with v3.1.0

Task       v3.1.0 (distilbert)  v3.2.0 (xlm-roberta)  Improvement (percentage points)
Sentiment  88.5%                95.0%                 +6.5
Emotion    83.6%                90.3%                 +6.7
Intent     94.5%                97.5%                 +3.0
Average    88.9%                94.3%                 +5.4
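
The Average row and the Improvement column can be reproduced from the per-task accuracies. The sketch below assumes the average is an unweighted mean over the three tasks and that improvements are differences in percentage points; the helper names are illustrative and not part of the package.

```python
# Per-task accuracies (%) copied from the comparison table.
V3_1 = {"sentiment": 88.5, "emotion": 83.6, "intent": 94.5}
V3_2 = {"sentiment": 95.0, "emotion": 90.3, "intent": 97.5}


def macro_average(scores):
    """Unweighted mean over tasks, rounded to one decimal place."""
    return round(sum(scores.values()) / len(scores), 1)


def gains(old, new):
    """Per-task difference in percentage points (new minus old)."""
    return {task: round(new[task] - old[task], 1) for task in new}


avg_old = macro_average(V3_1)
avg_new = macro_average(V3_2)
task_gains = gains(V3_1, V3_2)
avg_gain = round(avg_new - avg_old, 1)

print(avg_old, avg_new, avg_gain, task_gains)
```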

Architecture

  • Base model: xlm-roberta-base
  • Architecture: Multi-task with task-specific heads + uncertainty-weighted loss
  • Training: Focal loss, cosine annealing, FP16, gradient accumulation (effective batch 32)
  • Ensemble: Confidence-based fallback (predictions with confidence below 60% fall back to a rule-based classifier)
  • Model size: ~1.1GB
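
To make the loss terms in the list above concrete, here is a minimal pure-Python sketch of a per-example focal loss and of one common uncertainty-weighting formulation (a learnable log-variance s_i per task, combined as the sum of exp(-s_i) * L_i + s_i). This is an illustration under assumptions, not the model's training code: the focusing parameter gamma = 2.0, the exact weighting formula, and the example logits are all hypothetical, since the card does not state them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed value; gamma=0 reduces to cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def uncertainty_weighted(task_losses, log_vars):
    """Combine task losses as sum_i exp(-s_i) * L_i + s_i.

    Each s_i is a per-task log-variance that would be learned in training.
    """
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


# Hypothetical logits for one example: 3 sentiment, 8 emotion, 6 intent classes.
losses = [
    focal_loss([2.0, 0.1, -1.0], target=0),
    focal_loss([0.5, 0.2, 0.1, 0.0, 0.3, -0.2, 1.5, 0.0], target=6),
    focal_loss([0.1, 0.3, 0.0, 0.2, -0.5, 3.0], target=5),
]

# With every log-variance at 0, the combined loss is the plain sum.
total = uncertainty_weighted(losses, log_vars=[0.0, 0.0, 0.0])
print(total)
```

Raising a task's log-variance shrinks that task's contribution to the total, while the additive s_i term keeps the variances from growing without bound.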

Usage

# Install
# pip install malaysian-manglish-nlp[transformers]

from malaysian_manglish_nlp.transformers.manglish_model import load_model, predict

model = load_model()
result = predict("weh best gila makanan dia")
# {
#   "sentiment": {"label": "positive", "confidence": 0.95},
#   "emotion": {"label": "happy", "confidence": 0.91},
#   "intent": {"label": "opinion", "confidence": 0.89}
# }
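
The confidence-based fallback described under Architecture can be pictured as a thin wrapper around a result of the shape shown above. The sketch below is an assumption about how such a wrapper might look, limited to the sentiment task: `rule_based_sentiment`, `with_fallback`, the keyword lists, and the `"source"` key are hypothetical and not part of the package API, and the input dicts are hand-written samples rather than real model outputs.

```python
CONFIDENCE_THRESHOLD = 0.60  # from the card: below 60%, the rule-based path is used

# Hypothetical keyword lists; the actual rule-based classifier is not described here.
POSITIVE_WORDS = {"best", "sedap", "power"}
NEGATIVE_WORDS = {"teruk", "lambat", "mahal"}


def rule_based_sentiment(text):
    """Toy stand-in for the rule-based classifier (assumption)."""
    words = set(text.lower().split())
    if words & POSITIVE_WORDS:
        return "positive"
    if words & NEGATIVE_WORDS:
        return "negative"
    return "neutral"


def with_fallback(text, result, threshold=CONFIDENCE_THRESHOLD):
    """Return a copy of `result` whose sentiment comes from the rules
    when the model's sentiment confidence is below `threshold`."""
    merged = {task: dict(pred) for task, pred in result.items()}
    if merged["sentiment"]["confidence"] < threshold:
        merged["sentiment"] = {"label": rule_based_sentiment(text), "source": "rules"}
    return merged


# Sample dicts in the shape shown above, not real model outputs.
confident = {"sentiment": {"label": "positive", "confidence": 0.95}}
unsure = {"sentiment": {"label": "neutral", "confidence": 0.41}}

kept = with_fallback("weh best gila makanan dia", confident)
replaced = with_fallback("service teruk gila", unsure)
print(kept, replaced)
```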

Training Details

  • Dataset: 14,384 examples (7,884 original + 6,500 augmented)
  • Epochs: 8 (best at epoch 8)
  • Batch size: 4 per step x 8 gradient accumulation steps = 32 effective
  • Learning rate: 2e-5 with cosine annealing
  • Max sequence length: 96 tokens
  • Label smoothing: 0.1
  • Early stopping: patience 3
  • Mixed precision: FP16
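
The batch-size and learning-rate settings above can be worked through in a few lines. This sketch assumes a cosine schedule that decays from 2e-5 to 0 with no warmup, and, for illustration only, that all 14,384 examples are in the training split; the card states neither, so the step counts are indicative rather than the actual training run.

```python
import math

BASE_LR = 2e-5         # learning rate from the card
MICRO_BATCH = 4        # examples per forward/backward pass
ACCUM_STEPS = 8        # gradient accumulation steps
EPOCHS = 8
DATASET_SIZE = 14_384  # assumption: every example is used for training


def cosine_lr(step, total_steps, base_lr=BASE_LR, min_lr=0.0):
    """Cosine annealing from base_lr at step 0 to min_lr at total_steps."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# One optimizer update happens after ACCUM_STEPS micro-batches.
effective_batch = MICRO_BATCH * ACCUM_STEPS
updates_per_epoch = math.ceil(DATASET_SIZE / effective_batch)
total_updates = updates_per_epoch * EPOCHS

# Learning rate at the start, midpoint, and end of the assumed schedule.
for step in (0, total_updates // 2, total_updates):
    print(step, cosine_lr(step, total_updates))
```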

Dataset

vexccz/manglish-nlp-dataset

License

MIT