Fashion Category Classifier

DistilBERT fine-tuned to classify Indian fashion product titles into 11 categories.

Credits

Built on DistilBERT by Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf (Hugging Face), licensed under Apache 2.0.

Labels

Tops, Bottoms, Dresses, Outerwear, Footwear, Bags, Jewellery, Ethnicwear, Activewear, Innerwear, Headwear

Usage

from transformers import pipeline
clf = pipeline("text-classification", model="roaringguts/DistilBERT")
clf("Nike Dry Fit Running Tshirt")
# [{'label': 'Activewear', 'score': 0.99}]
# Batch inference: pass a list of titles to classify several at once
clf(["Lavie Women Tote Bag", "Malabar Gold Plated Necklace", "Clarks Oxford Shoes"])

Training

Base model: distilbert-base-uncased
Dataset: ~63k real product titles from Myntra, plus synthetic samples generated for underrepresented categories
Split: 80/10/10 train/val/test, stratified
Epochs: up to 5 (early stopping with patience 2)
Batch size: 64
Learning rate: 3e-5
Precision: fp16
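
The stratified 80/10/10 split listed above can be illustrated with a minimal, standard-library sketch. This is not the author's preprocessing code: the function name `stratified_split`, the seed, and the toy titles are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def stratified_split(titles, labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split (title, label) pairs into train/val/test, preserving label proportions.

    Hypothetical helper for illustration; not part of the released model.
    """
    # Group titles by category so each category is split independently.
    by_label = defaultdict(list)
    for title, label in zip(titles, labels):
        by_label[label].append(title)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_train = int(len(items) * fractions[0])
        n_val = int(len(items) * fractions[1])
        train += [(t, label) for t in items[:n_train]]
        val += [(t, label) for t in items[n_train:n_train + n_val]]
        # Whatever remains after train and val goes to the test set.
        test += [(t, label) for t in items[n_train + n_val:]]
    return train, val, test

# Toy data: 20 made-up titles for each of two of the 11 labels.
titles = [f"bag {i}" for i in range(20)] + [f"cap {i}" for i in range(20)]
labels = ["Bags"] * 20 + ["Headwear"] * 20
train, val, test = stratified_split(titles, labels)
print(len(train), len(val), len(test))
```

Splitting per category keeps small classes such as Bags and Headwear represented in all three subsets, which a purely random split does not guarantee.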

Evaluation

Evaluated on the held-out 10% test split, stratified by category.

Metric	Score
Accuracy	99.27%
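
A single accuracy figure can hide weak performance on small categories, so a per-category breakdown may be worth computing alongside it. The sketch below is an assumption about how that could be done, not the evaluation script behind the reported score. The helper `accuracy_report` and the toy labels are hypothetical.

```python
from collections import Counter

def accuracy_report(y_true, y_pred):
    """Return overall accuracy and per-label recall for two parallel label lists.

    Hypothetical helper for illustration; not the author's evaluation code.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # Count how often each true label occurs and how often it was predicted correctly.
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    per_label = {label: correct[label] / n for label, n in total.items()}
    return overall, per_label

# Toy labels, made up for this example.
y_true = ["Bags", "Bags", "Headwear", "Tops", "Tops", "Tops"]
y_pred = ["Bags", "Tops", "Headwear", "Tops", "Tops", "Tops"]
overall, per_label = accuracy_report(y_true, y_pred)
print(f"overall accuracy: {overall:.2%}")
for label, recall in sorted(per_label.items()):
    print(f"{label}: {recall:.2%}")
```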

Limitations

Trained on Indian e-commerce titles, so it may underperform on titles that follow Western brand naming conventions
Activewear and Innerwear have some overlap (e.g. sports bras vs regular bras)
Bags and Headwear had low sample counts in the original data and were only partially supplemented with synthetic titles
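
When categories overlap, one way to catch uncertain predictions is to compare the top two scores and flag titles where the margin is small. The sketch below assumes you have per-label scores for one title as a list of `{'label', 'score'}` dicts, which the `transformers` text-classification pipeline can return when called with `top_k=None`. The helper `flag_ambiguous`, the `margin` threshold of 0.2, and the scores shown are hypothetical.

```python
def flag_ambiguous(scores, margin=0.2):
    """Flag a prediction whose top two label scores are closer than `margin`.

    `scores` is a list of {'label': str, 'score': float} dicts for one title.
    Hypothetical helper; the default margin is an arbitrary starting point.
    """
    if len(scores) < 2:
        raise ValueError("need scores for at least two labels")
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return {
        "label": top["label"],
        "runner_up": runner_up["label"],
        "ambiguous": top["score"] - runner_up["score"] < margin,
    }

# Made-up scores for illustration; real values would come from the model,
# e.g. clf("some product title", top_k=None).
scores = [
    {"label": "Innerwear", "score": 0.55},
    {"label": "Activewear", "score": 0.41},
    {"label": "Tops", "score": 0.04},
]
result = flag_ambiguous(scores)
print(result)
```

Flagged titles can then be routed to manual review or a rule-based fallback. The right margin depends on your data and should be tuned on a validation set.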

License

This model is released under CC BY 4.0. The base DistilBERT weights it derives from are licensed under Apache 2.0.
