# ToXiBERT: Hate Speech Detection Model
## Model Description
ToXiBERT is a fine-tuned model for detecting and classifying hate speech, offensive language, and normal content on X (formerly Twitter). It is built on BERTweet, a RoBERTa-based model pre-trained on English tweets, and performs three-class classification.
## Model Details
### Model Type
- Base Model: BERTweet (`vinai/bertweet-base`, RoBERTa architecture)
- Language: English
### Intended Use
- Primary Use: Hate speech detection and content moderation
- Intended Users: Researchers, content moderators, social media platforms
## Training Details
### Training Data
- Source: Custom dataset compiled from posts on the X (Twitter) platform
### Training Procedure
- Training Infrastructure: Google Colab
- Hyperparameter Tuning: Systematic optimization of training hyperparameters
- Fine-tuning Strategy: Multi-class classification fine-tuning of the pre-trained BERTweet model
## Classes
The model classifies text into three categories:
- Hate Speech: Content containing hateful language targeting individuals or groups
- Offensive Content: Language that is offensive but doesn't constitute hate speech
- Normal Content: Non-offensive, regular social media content
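The mapping from the model's output indices to these labels is not stated in the card; the sketch below uses a hypothetical ordering, so verify it against `id2label` in the model's `config.json` before relying on it.
```python
# Hypothetical output-index-to-label mapping -- verify against the
# model's config.json (id2label) before relying on it.
id2label = {0: "hate_speech", 1: "offensive", 2: "normal"}
label2id = {label: i for i, label in id2label.items()}
```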
## Performance
### Evaluation Results
The model achieves the following performance with optimized class-specific thresholds:
### Class-Specific Thresholds
- Hate Speech: 0.560
- Offensive: 0.420
- Normal: 0.320
### Performance Metrics
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Hate Speech | 0.624 | 0.818 | 0.708 | 1,625 |
| Offensive | 0.892 | 0.759 | 0.820 | 3,868 |
| Normal | 0.803 | 0.827 | 0.815 | 2,474 |
| Overall Accuracy | | | 0.792 | 7,967 |
| Macro Average | 0.773 | 0.801 | 0.781 | 7,967 |
| Weighted Average | 0.810 | 0.792 | 0.796 | 7,967 |
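As a consistency check, the macro and weighted averages follow directly from the per-class rows above; a quick recomputation in Python:
```python
# Per-class F1 scores and supports copied from the table above
f1 = {"hate_speech": 0.708, "offensive": 0.820, "normal": 0.815}
support = {"hate_speech": 1625, "offensive": 3868, "normal": 2474}
total = sum(support.values())  # 7,967

macro_f1 = sum(f1.values()) / len(f1)                      # ~0.781
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total  # ~0.796
print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
```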
## Usage
### Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer from the Hugging Face Hub
model_name = "QuincySorrentino/toXibert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
```
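A minimal inference sketch, continuing from the loading snippet above. The `max_length=128` truncation matches BERTweet's input limit; the plain argmax here ignores the class-specific thresholds described next.
```python
import torch

text = "example post to classify"

# Tokenize (BERTweet accepts at most 128 tokens) and run a forward pass
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities
probs = torch.softmax(logits, dim=-1).squeeze()
print(probs.tolist(), int(probs.argmax()))
```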
### Recommended Thresholds
For optimal performance, use the following class-specific thresholds:
```python
thresholds = {
    "hate_speech": 0.560,
    "offensive": 0.420,
    "normal": 0.320,
}
```
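The card does not specify how the three thresholds combine into a single decision. One plausible rule, sketched below under that assumption, picks the class whose probability clears its threshold by the widest margin and falls back to plain argmax when none do; it reuses `probs` and the hypothetical `id2label` mapping from the snippets above.
```python
def apply_thresholds(probs, id2label, thresholds):
    # Margin of each class's probability over its threshold
    margins = {
        label: probs[i].item() - thresholds[label]
        for i, label in id2label.items()
    }
    # Among classes that clear their threshold, take the widest margin;
    # this decision rule is an assumption, not documented in the card.
    passing = {label: m for label, m in margins.items() if m >= 0}
    if passing:
        return max(passing, key=passing.get)
    return id2label[int(probs.argmax())]

prediction = apply_thresholds(probs, id2label, thresholds)
print(prediction)
```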
## Limitations and Bias
### Limitations
- Context-dependent hate speech may be challenging to detect
- Performance may degrade on content significantly different from training data
- Requires careful threshold tuning for different use cases
### Bias Considerations
- Training data sourced from X platform may not represent all demographics equally
- Model may reflect biases present in the training data
- Regular evaluation for fairness across different groups is recommended
## Ethical Considerations
- Decisions based on model predictions should include human oversight
- Regular auditing for bias and fairness is essential
- Consider the impact of false positives and false negatives in your specific use case