ToXiBERT: Hate Speech Detection Model

Model Description

ToXiBERT is a fine-tuned BERTweet model (RoBERTa architecture) for detecting hate speech and offensive language on X (formerly Twitter). It classifies posts into three categories: hate speech, offensive content, and normal content.

Model Details

Model Type

  • Base Model: BERTweet (RoBERTa architecture)
  • Language: English
  • Parameters: ~0.1B (F32 weights, Safetensors format)

Intended Use

  • Primary Use: Hate speech detection and content moderation
  • Intended Users: Researchers, content moderators, social media platforms

Training Details

Training Data

  • Source: Custom dataset compiled from posts on the X (formerly Twitter) platform

Training Procedure

  • Training Infrastructure: Google Colab
  • Hyperparameter Tuning: Performed with systematic optimization
  • Fine-tuning Strategy: Multi-class classification on pre-trained BERTweet
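
The exact training setup is not published with this card. As a rough, hypothetical sketch only, fine-tuning BERTweet for three-class classification with the Hugging Face Trainer could look like the following; the checkpoint choice, dataset objects, and every hyperparameter below are placeholders, not the actual configuration:

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Hypothetical sketch only: the real dataset, splits, and hyperparameters
# used for ToXiBERT are not documented on this card.
base = "vinai/bertweet-base"  # assumed BERTweet checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=3)

def tokenize(batch):
    # Tweets are short; 128 tokens is a placeholder cap
    return tokenizer(batch["text"], truncation=True, max_length=128)

args = TrainingArguments(
    output_dir="toxibert-finetune",
    num_train_epochs=3,               # placeholder
    per_device_train_batch_size=32,   # placeholder
    learning_rate=2e-5,               # placeholder
)

# train_ds / eval_ds are assumed datasets.Dataset objects with "text"
# and integer "label" columns, already mapped through tokenize():
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```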

Classes

The model classifies text into three categories:

  1. Hate Speech: Content containing hateful language targeting individuals or groups
  2. Offensive Content: Language that is offensive but doesn't constitute hate speech
  3. Normal Content: Non-offensive, regular social media content
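
The mapping from output indices to these classes is not stated here; check the checkpoint's config.id2label before depending on it. A hypothetical mapping, assuming the order above, would be:

```python
# Assumed index -> class order (verify against model.config.id2label)
id2label = {0: "hate_speech", 1: "offensive", 2: "normal"}
label2id = {name: idx for idx, name in id2label.items()}
```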

Performance

Evaluation Results

The model achieves the following performance with optimized class-specific thresholds:

Class-Specific Thresholds

  • Hate Speech: 0.560
  • Offensive: 0.420
  • Normal: 0.320

Performance Metrics

| Class            | Precision | Recall | F1-Score | Support |
|------------------|-----------|--------|----------|---------|
| Hate Speech      | 0.624     | 0.818  | 0.708    | 1,625   |
| Offensive        | 0.892     | 0.759  | 0.820    | 3,868   |
| Normal           | 0.803     | 0.827  | 0.815    | 2,474   |
| Overall Accuracy |           |        | 0.792    | 7,967   |
| Macro Average    | 0.773     | 0.801  | 0.781    | 7,967   |
| Weighted Average | 0.810     | 0.792  | 0.796    | 7,967   |
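
The table follows the layout of scikit-learn's classification report; given gold and predicted labels from an evaluation run, an equivalent report can be generated as below (the two label arrays are illustrative placeholders):

```python
from sklearn.metrics import classification_report

# Illustrative placeholders; substitute the real evaluation outputs
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 2, 1, 1, 2]

print(classification_report(
    y_true, y_pred,
    target_names=["Hate Speech", "Offensive", "Normal"],
    digits=3,
))
```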

Usage

Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "QuincySorrentino/toXibert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
```

Recommended Thresholds

For optimal performance, use the following class-specific thresholds:

```python
thresholds = {
    "hate_speech": 0.560,
    "offensive": 0.420,
    "normal": 0.320,
}
```
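
Putting the pieces together, a minimal prediction helper that applies these thresholds to the model's softmax probabilities might look like the sketch below. It reuses tokenizer, model, and thresholds from the snippets above; the index-to-label order and the decision rule (highest-probability class among those clearing their threshold, with an argmax fallback) are assumptions, since the card does not specify them:

```python
import torch
import torch.nn.functional as F

# Assumed index -> label order; verify against model.config.id2label
labels = ["hate_speech", "offensive", "normal"]

def classify(text: str):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=-1).squeeze(0)

    # Hypothetical decision rule: among classes whose probability clears
    # its class-specific threshold, pick the most probable; if none
    # clears, fall back to the plain argmax.
    passing = [i for i, name in enumerate(labels) if probs[i] >= thresholds[name]]
    best = max(passing, key=lambda i: probs[i].item()) if passing else int(probs.argmax())
    return labels[best], probs[best].item()

print(classify("example post text"))
```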

Limitations and Bias

Limitations

  • Context-dependent hate speech may be challenging to detect
  • Performance may degrade on content significantly different from training data
  • Requires careful threshold tuning for different use cases

Bias Considerations

  • Training data sourced from X platform may not represent all demographics equally
  • Model may reflect biases present in the training data
  • Regular evaluation for fairness across different groups is recommended

Ethical Considerations

  • Decisions based on model predictions should include human oversight
  • Regular auditing for bias and fairness is essential
  • Consider the impact of false positives and false negatives in your specific use case