SMS Spam Detection with BERT
🎯 A high-performance SMS spam classifier built with BERT that achieves 99.16% accuracy.
Model Description
This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It can classify messages as either:
- HAM (legitimate message)
- SPAM (unwanted/spam message)
Performance Metrics
| Metric | Score |
|---|---|
| Accuracy | 99.16% |
| Precision | 97.30% |
| Recall | 96.43% |
| F1-Score | 96.86% |
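As a consistency check, the F1-score in the table should be the harmonic mean of the listed precision and recall. The card does not say which class these were computed for; SPAM is the usual choice, but that is an assumption. A minimal sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the table above
f1 = f1_score(0.9730, 0.9643)
print(f"F1 = {f1:.2%}")  # prints "F1 = 96.86%"
```

The result matches the reported 96.86%.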
Quick Start
Using Transformers Pipeline
from transformers import pipeline
# Load the model
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")
# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Example output (the exact score may differ):
# [{'label': 'SPAM', 'score': 0.9987}]
Using AutoModel and AutoTokenizer
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Prepare input (max_length=128 matches the training configuration)
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128, padding=True)
# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()
# Map class index to label (assumes index 0 = HAM and index 1 = SPAM)
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class].item():.4f})")
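The softmax-and-argmax step above can also be written without PyTorch. The sketch below also adds an optional decision threshold on the SPAM probability. Raising the threshold is one way to trade recall for fewer false positives (see Ethical Considerations). The helper names, the threshold parameter and the example logits are hypothetical and are not part of the model. Like the label list above, the sketch assumes index 0 = HAM and index 1 = SPAM.

```python
import math

LABELS = ["HAM", "SPAM"]  # assumed index order, matching the example above

def softmax(logits: list[float]) -> list[float]:
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits: list[float], spam_threshold: float = 0.5) -> tuple[str, float]:
    """Hypothetical helper: returns (label, P(SPAM)).
    Flags SPAM only when P(SPAM) is strictly above spam_threshold;
    with two classes, the default 0.5 corresponds to the argmax decision."""
    p_spam = softmax(logits)[1]
    label = LABELS[1] if p_spam > spam_threshold else LABELS[0]
    return label, p_spam

# Made-up logits for illustration: P(SPAM) is about 0.881 here
print(classify([-1.0, 1.0]))                       # label 'SPAM'
print(classify([-1.0, 1.0], spam_threshold=0.95))  # label 'HAM' (stricter threshold)
```

A stricter threshold only changes which messages are flagged. It does not retrain or recalibrate the model, so any chosen value should be validated on held-out data.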
Training Details
Dataset
- Source: SMS Spam Collection Dataset
- Total Messages: 5,574
- Ham Messages: 4,827 (86.6%)
- Spam Messages: 747 (13.4%)
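Because the classes are imbalanced, a trivial classifier that labels every message HAM would already reach about 86.6% accuracy on the full dataset. That figure is the natural reference point for the 99.16% accuracy above. The baseline on the validation split may differ slightly. The arithmetic, using the counts listed above:

```python
# Class counts from the SMS Spam Collection, as listed above
ham, spam = 4827, 747
total = ham + spam  # 5,574 messages

# Accuracy of always predicting the majority class (HAM)
majority_baseline = ham / total
spam_share = spam / total
print(f"Majority-class baseline: {majority_baseline:.1%}")  # 86.6%
print(f"Spam share: {spam_share:.1%}")                      # 13.4%
```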
Training Configuration
- Base Model: bert-base-uncased
- Max Sequence Length: 128 tokens
- Batch Size: 16
- Learning Rate: 2e-5
- Epochs: 3
- Optimizer: AdamW
Data Split
- Training: 80%
- Validation: 20%
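The card does not say whether the 80/20 split was stratified or which random seed was used. The following is a hypothetical sketch, not the author's procedure. It shows a stratified split that keeps the HAM/SPAM ratio in both parts, and the resulting number of optimizer steps for the batch size (16) and epoch count (3) listed above. The function name, the seed and the rounding rule are assumptions, so the exact counts are illustrative.

```python
import math
import random

def stratified_split(labels, train_frac=0.8, seed=42):
    """Hypothetical stratified split: returns (train_idx, val_idx).
    Each class is shuffled and split separately so class proportions are preserved."""
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train_idx += idx[:cut]
        val_idx += idx[cut:]
    return train_idx, val_idx

# Placeholder labels with the dataset's class counts (0 = HAM, 1 = SPAM)
labels = [0] * 4827 + [1] * 747
train_idx, val_idx = stratified_split(labels)

# Optimizer steps implied by the training configuration
batch_size, epochs = 16, 3
steps_per_epoch = math.ceil(len(train_idx) / batch_size)
print(len(train_idx), len(val_idx))              # 4460 1114
print(steps_per_epoch, steps_per_epoch * epochs)  # 279 837
```

Under this rounding, the validation set holds 149 SPAM messages. With so few positives, a handful of misclassified messages moves precision and recall by several tenths of a percentage point.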
Model Architecture
Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
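This assumes the model uses the standard Transformers `BertForSequenceClassification` head, which is what `AutoModelForSequenceClassification` builds for `bert-base-uncased`. In that head, the final [CLS] hidden state passes through BERT's pooler (a dense layer followed by tanh). Dropout follows, but it is inactive at inference. A linear layer then produces one logit per class. The toy sketch below uses 4 dimensions instead of 768 and made-up weights purely to show the data flow. The helper names are hypothetical.

```python
import math

def linear(x, weight, bias):
    # weight is a list of rows with shape (out_features x in_features)
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def classification_head(cls_hidden, pool_w, pool_b, cls_w, cls_b):
    """Toy BERT sequence-classification head:
    pooler (dense + tanh) on the [CLS] vector, then a linear layer to 2 logits.
    Dropout is omitted because it does nothing at inference time."""
    pooled = [math.tanh(v) for v in linear(cls_hidden, pool_w, pool_b)]
    return linear(pooled, cls_w, cls_b)  # [logit_HAM, logit_SPAM] (assumed order)

# Hypothetical 4-dim [CLS] vector; bert-base-uncased actually uses 768 dimensions
cls_hidden = [0.5, -0.2, 0.1, 0.3]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
logits = classification_head(
    cls_hidden,
    pool_w=identity, pool_b=[0.0] * 4,
    cls_w=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], cls_b=[0.0, 0.0],
)
print(logits)  # two logits: tanh(0.5) and tanh(0.3)
```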
Use Cases
✅ Spam Filtering: Automatically filter spam messages in messaging applications
✅ SMS Gateway Protection: Protect users from phishing and scam attempts
✅ Content Moderation: Pre-screen messages in communication platforms
✅ Fraud Detection: Identify suspicious messages in financial apps
Limitations
- Model is trained specifically on English SMS messages
- May not generalize well to other languages or message formats
- Performance may vary on messages with heavy slang or abbreviations
- Trained on historical data, so it may miss spam patterns that emerged after the dataset was collected
Ethical Considerations
⚠️ Privacy: Ensure compliance with data protection regulations when processing user messages
⚠️ False Positives: Important legitimate messages might be incorrectly flagged as spam
⚠️ Bias: Model may reflect biases present in training data
Citation
If you use this model, please cite:
@misc{sms_spam_detection_bert_2026,
title={SMS Spam Detection with BERT},
author={niru-nny},
year={2026},
url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}
License
MIT License
Contact
For questions or feedback, please open an issue on the model repository.
Model Card: For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.