# ToXiBERT: Hate Speech Detection Model
## Model Description
ToXiBERT is a fine-tuned model for detecting and classifying hate speech, offensive language, and normal content on X (formerly Twitter). It is built on BERTweet, a RoBERTa-based model pre-trained on English tweets, and performs three-class classification.
## Model Details
### Model Type
- Base Model: BERTweet (`vinai/bertweet-base`, RoBERTa architecture)
- Language: English
### Intended Use
- Primary Use: Hate speech detection and content moderation
- Intended Users: Researchers, content moderators, social media platforms
## Training Details
### Training Data
- Source: Custom dataset compiled from posts on the X (Twitter) platform
### Training Procedure
- Training Infrastructure: Google Colab
- Hyperparameter Tuning: Systematic optimization of training hyperparameters
- Fine-tuning Strategy: Multi-class classification fine-tuning of the pre-trained BERTweet model
## Classes
The model classifies text into three categories:
- Hate Speech: Content containing hateful language targeting individuals or groups
- Offensive Content: Language that is offensive but doesn't constitute hate speech
- Normal Content: Non-offensive, regular social media content
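The mapping from the model's output indices to these labels is not stated in the card; the sketch below uses a hypothetical ordering, so verify it against `id2label` in the model's `config.json` before relying on it.
```python
# Hypothetical output-index-to-label mapping -- verify against the
# model's config.json (id2label) before relying on it.
id2label = {0: "hate_speech", 1: "offensive", 2: "normal"}
label2id = {label: i for i, label in id2label.items()}
```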
## Performance
### Evaluation Results
The model achieves the following performance with optimized class-specific thresholds:
### Class-Specific Thresholds
- Hate Speech: 0.560
- Offensive: 0.420
- Normal: 0.320
### Performance Metrics
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Hate Speech | 0.624 | 0.818 | 0.708 | 1,625 |
| Offensive | 0.892 | 0.759 | 0.820 | 3,868 |
| Normal | 0.803 | 0.827 | 0.815 | 2,474 |
| Overall Accuracy | | | 0.792 | 7,967 |
| Macro Average | 0.773 | 0.801 | 0.781 | 7,967 |
| Weighted Average | 0.810 | 0.792 | 0.796 | 7,967 |
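As a consistency check, the macro and weighted averages follow directly from the per-class rows above; a quick recomputation in Python:
```python
# Per-class F1 scores and supports copied from the table above
f1 = {"hate_speech": 0.708, "offensive": 0.820, "normal": 0.815}
support = {"hate_speech": 1625, "offensive": 3868, "normal": 2474}
total = sum(support.values())  # 7,967

macro_f1 = sum(f1.values()) / len(f1)                      # ~0.781
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total  # ~0.796
print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
```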
## Usage
### Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer from the Hugging Face Hub
model_name = "QuincySorrentino/toXibert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
```
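A minimal inference sketch, continuing from the loading snippet above. The `max_length=128` truncation matches BERTweet's input limit; the plain argmax here ignores the class-specific thresholds described next.
```python
import torch

text = "example post to classify"

# Tokenize (BERTweet accepts at most 128 tokens) and run a forward pass
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities
probs = torch.softmax(logits, dim=-1).squeeze()
print(probs.tolist(), int(probs.argmax()))
```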
### Recommended Thresholds
For optimal performance, use the following class-specific thresholds:
```python
thresholds = {
    "hate_speech": 0.560,
    "offensive": 0.420,
    "normal": 0.320,
}
```
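The card does not specify how the three thresholds combine into a single decision. One plausible rule, sketched below under that assumption, picks the class whose probability clears its threshold by the widest margin and falls back to plain argmax when none do; it reuses `probs` and the hypothetical `id2label` mapping from the snippets above.
```python
def apply_thresholds(probs, id2label, thresholds):
    # Margin of each class's probability over its threshold
    margins = {
        label: probs[i].item() - thresholds[label]
        for i, label in id2label.items()
    }
    # Among classes that clear their threshold, take the widest margin;
    # this decision rule is an assumption, not documented in the card.
    passing = {label: m for label, m in margins.items() if m >= 0}
    if passing:
        return max(passing, key=passing.get)
    return id2label[int(probs.argmax())]

prediction = apply_thresholds(probs, id2label, thresholds)
print(prediction)
```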
## Limitations and Bias
### Limitations
- Context-dependent hate speech may be challenging to detect
- Performance may degrade on content significantly different from training data
- Requires careful threshold tuning for different use cases
### Bias Considerations
- Training data sourced from X platform may not represent all demographics equally
- Model may reflect biases present in the training data
- Regular evaluation for fairness across different groups is recommended
## Ethical Considerations
- Decisions based on model predictions should include human oversight
- Regular auditing for bias and fairness is essential
- Consider the impact of false positives and false negatives in your specific use case