SMS Spam Detection with BERT

🎯 An SMS spam classifier built on BERT that reaches 99.16% accuracy.

Model Description

This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It can classify messages as either:

  • HAM (legitimate message)
  • SPAM (unwanted/spam message)

Performance Metrics

Metric     Score
Accuracy   99.16%
Precision  97.30%
Recall     96.43%
F1-Score   96.86%
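
The F1-score is the harmonic mean of precision and recall, so the reported value can be checked from the precision and recall rows. A minimal sketch (the function name `f1_score` is illustrative and not part of the model's code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the metrics above.
f1 = f1_score(0.9730, 0.9643)
print(f"F1: {f1:.2%}")  # F1: 96.86%
```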

Quick Start

Using Transformers Pipeline

from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")

# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Example output: [{'label': 'SPAM', 'score': 0.9987}]

Using AutoModel and AutoTokenizer

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Hey, are we still meeting for lunch tomorrow?"
# max_length=128 matches the sequence length used during training
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map to label (assumes index 0 = HAM and 1 = SPAM; check model.config.id2label)
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
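
The softmax step in the snippet above turns the two raw logits into probabilities that sum to 1, and the larger one gives the predicted class. A minimal pure-Python sketch of that step, using made-up logit values rather than real model output:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["HAM", "SPAM"]      # same index order as the snippet above
example_logits = [3.2, -2.9]  # hypothetical values, for illustration only

probs = softmax(example_logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(labels[predicted], round(probs[predicted], 4))
```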

Training Details

Dataset

  • Source: SMS Spam Collection Dataset
  • Total Messages: 5,574
  • Ham Messages: 4,827 (86.6%)
  • Spam Messages: 747 (13.4%)
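
Because the classes are imbalanced, accuracy on its own is easy to over-read: a classifier that always predicts HAM would already be right on every ham message. A short check using the counts above (this assumes the evaluation data has roughly the same class mix as the full dataset, which this card does not state):

```python
ham, spam = 4827, 747  # counts from the dataset summary
total = ham + spam

# Accuracy of a trivial classifier that labels every message as HAM.
baseline_accuracy = ham / total
print(f"Total: {total}, majority-class baseline: {baseline_accuracy:.1%}")
# Total: 5574, majority-class baseline: 86.6%
```

Against that baseline, the precision, recall, and F1 figures are the more informative numbers.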

Training Configuration

  • Base Model: bert-base-uncased
  • Max Sequence Length: 128 tokens
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Epochs: 3
  • Optimizer: AdamW

Data Split

  • Training: 80%
  • Validation: 20%
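
Combining the split with the training configuration gives a rough idea of how long a run is. The sketch below assumes the 20% validation share is rounded up to a whole message and that the last partial batch is kept; neither detail is stated in this card.

```python
import math

total_messages = 5574
val_fraction = 0.20
batch_size = 16
epochs = 3

# Assumption: validation size is rounded up, the remainder is training data.
val_size = math.ceil(total_messages * val_fraction)
train_size = total_messages - val_size

# Assumption: the final partial batch is kept rather than dropped.
steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * epochs

print(train_size, val_size, steps_per_epoch, total_steps)
# 4459 1115 279 837
```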

Model Architecture

Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)

Use Cases

✅ Spam Filtering: Automatically filter spam messages in messaging applications
✅ SMS Gateway Protection: Protect users from phishing and scam attempts
✅ Content Moderation: Pre-screen messages in communication platforms
✅ Fraud Detection: Identify suspicious messages in financial apps

Limitations

  • Model is trained specifically on English SMS messages
  • May not generalize well to other languages or message formats
  • Performance may vary on messages with heavy slang or abbreviations
  • Trained on historical data, so it may miss spam patterns that emerged later

Ethical Considerations

⚠️ Privacy: Ensure compliance with data protection regulations when processing user messages
⚠️ False Positives: Important legitimate messages might be incorrectly flagged as spam
⚠️ Bias: Model may reflect biases present in training data
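
One common way to limit false positives is to act on a SPAM prediction only when its score clears a threshold and to treat everything else as legitimate. The sketch below works on the pipeline output format shown in Quick Start; the function name `decide` and the 0.95 threshold are illustrative assumptions, not values tuned for this model.

```python
def decide(prediction: dict, spam_threshold: float = 0.95) -> str:
    """Return "SPAM" only for confident spam predictions, else "HAM".

    `prediction` is one item of the pipeline output, e.g.
    {"label": "SPAM", "score": 0.9987}.
    """
    is_spam = prediction["label"] == "SPAM"
    if is_spam and prediction["score"] >= spam_threshold:
        return "SPAM"
    return "HAM"

print(decide({"label": "SPAM", "score": 0.9987}))  # SPAM
print(decide({"label": "SPAM", "score": 0.62}))    # HAM
print(decide({"label": "HAM", "score": 0.99}))     # HAM
```

Raising the threshold trades recall for precision, so the value is best chosen on held-out data.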

Citation

If you use this model, please cite:

@misc{sms_spam_detection_bert_2026,
  title={SMS Spam Detection with BERT},
  author={niru-nny},
  year={2026},
  url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}

License

MIT License

Contact

For questions or feedback, please open an issue on the model repository.


Model Card: For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.
