	---
	license: apache-2.0
	base_model: VietAI/vit5-base
	tags:
	- vietnamese
	- spam-detection
	- text-classification
	- e-commerce
	datasets:
	- ViSpamReviews
	metrics:
	- accuracy
	- macro-f1
	- macro-precision
	- macro-recall
	model-index:
	- name: vit5-spam-binary
	  results:
	  - task:
	      type: text-classification
	      name: Spam Review Detection
	    dataset:
	      name: ViSpamReviews
	      type: ViSpamReviews
	    metrics:
	    - type: accuracy
	      value: 0.9073
	    - type: macro-f1
	      value: 0.8815
	---
	# vit5-spam-binary: Spam Review Detection for Vietnamese Text

	This model is [VietAI/vit5-base](https://huggingface.co/VietAI/vit5-base) fine-tuned on the ViSpamReviews dataset to detect spam in Vietnamese e-commerce reviews.

	## Model Details

	* Base Model: `VietAI/vit5-base`
	* Description: ViT5 - Vietnamese T5 model for text generation and classification
	* Dataset: ViSpamReviews (Vietnamese Spam Review Dataset)
	* Fine-tuning Framework: HuggingFace Transformers
	* Task: Spam Review Detection (binary)
	* Number of Classes: 2

	### Hyperparameters

	* Max sequence length: `256`
	* Learning rate: `5e-5`
	* Batch size: `32`
	* Epochs: `100`
	* Early stopping patience: `5`
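
The epoch count and the patience value work together: with early stopping, `100` is an upper bound, and training ends once the validation metric has failed to improve for `5` evaluations in a row. The sketch below illustrates that rule in plain Python. It assumes one evaluation per epoch and a "higher is better" metric; this card does not state which metric was monitored, and the function name and score list are hypothetical.

```python
def early_stopping_epoch(val_scores, patience=5, max_epochs=100):
    """Return the 1-based epoch at which training would stop.

    val_scores: one validation score per epoch (higher is better).
    """
    best = float("-inf")
    stale = 0  # consecutive epochs without improvement
    last_epoch = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        last_epoch = epoch
        if score > best:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return last_epoch


# Made-up validation scores, for illustration only.
scores = [0.80, 0.85, 0.88, 0.87, 0.88, 0.86, 0.87, 0.85, 0.84]
stopped_at = early_stopping_epoch(scores)
print(f"Training would stop after epoch {stopped_at}")
```

In the HuggingFace Transformers `Trainer`, the same behaviour is usually obtained with `EarlyStoppingCallback(early_stopping_patience=5)`; whether this model was trained that way is not stated here.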

	## Dataset

	The model was trained on the ViSpamReviews dataset, which contains 19,860 Vietnamese e-commerce review samples. The dataset includes:

	* Train set: 14,299 samples (72%)
	* Validation set: 1,590 samples (8%)
	* Test set: 3,971 samples (20%)
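
As a quick sanity check, the split sizes above can be summed and converted to percentages. This only restates the numbers listed in this section.

```python
# Split sizes as reported in this model card.
splits = {"train": 14_299, "validation": 1_590, "test": 3_971}

total = sum(splits.values())
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)
print(shares)
```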

	### Label Distribution


	* Non-spam (0): Genuine product reviews
	* Spam (1): Fake or promotional reviews

	## Results

	The model was evaluated on the test set with the following metrics:

	* Accuracy: `0.9073`
	* Macro-F1: `0.8815`
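
Macro-F1 is the unweighted mean of the per-class F1 scores, so it can sit below accuracy when one class is both rarer and harder to predict. The sketch below shows the computation for a binary confusion matrix. The counts are invented for illustration and are not this model's results; the function names are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 score for one class from its true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1_binary(tn, fp, fn, tp):
    """Unweighted mean of the F1 scores for class 1 (spam) and class 0 (non-spam)."""
    f1_spam = f1(tp, fp, fn)
    # For class 0 the roles swap: its true positives are the true negatives,
    # its false positives are the false negatives, and vice versa.
    f1_non_spam = f1(tn, fn, fp)
    return (f1_spam + f1_non_spam) / 2


def accuracy(tn, fp, fn, tp):
    return (tn + tp) / (tn + fp + fn + tp)


# Illustrative counts only (NOT taken from the ViSpamReviews evaluation).
counts = dict(tn=80, fp=5, fn=5, tp=10)
acc = accuracy(**counts)
macro = macro_f1_binary(**counts)
print(f"accuracy={acc:.4f}, macro-F1={macro:.4f}")
```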


	## Usage

	The example below loads the model and tokenizer, then classifies a single Vietnamese review:

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load model and tokenizer
	model_name = "visolex/vit5-spam-binary"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# Example review text
	text = "Sản phẩm này rất tốt, shop giao hàng nhanh!"

	# Tokenize
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

	# Predict
	with torch.no_grad():
	    outputs = model(**inputs)
	    predicted_class = outputs.logits.argmax(dim=-1).item()
	    probabilities = torch.softmax(outputs.logits, dim=-1)


	# Map to label
	label_map = {0: "Non-spam", 1: "Spam"}
	predicted_label = label_map[predicted_class]
	confidence = probabilities[0][predicted_class].item()

	print(f"Text: {text}")
	print(f"Predicted: {predicted_label} (confidence: {confidence:.2%})")

	```
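
The example above takes the `argmax` of the logits, which for two classes is equivalent to a 0.5 threshold on the spam probability. If false positives are costly, you may prefer an explicit, stricter threshold. The sketch below shows one way to do that on a plain list of two logits (for example `outputs.logits[0].tolist()`). The helper name and the threshold values are hypothetical and not part of this model's release.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_threshold(logits, spam_threshold=0.5):
    """Return (label, spam_probability) for a [non_spam_logit, spam_logit] pair."""
    p_spam = softmax(logits)[1]
    label = "Spam" if p_spam >= spam_threshold else "Non-spam"
    return label, p_spam


# Made-up logits, for illustration only.
label, p_spam = label_with_threshold([0.0, 1.0], spam_threshold=0.9)
print(f"{label} (spam probability: {p_spam:.2%})")
```

A suitable threshold would need to be chosen on validation data; none is recommended here.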

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{vit5_spam_binary,
	  title={vit5-spam-binary: Spam Review Detection for Vietnamese Text},
	  author={ViSoLex Team},
	  year={2025},
	  howpublished={\url{https://huggingface.co/visolex/vit5-spam-binary}}
	}
	```

	## License

	This model is released under the Apache-2.0 license.

	## Acknowledgments

	* Base model: [VietAI/vit5-base](https://huggingface.co/VietAI/vit5-base)
	* Dataset: ViSpamReviews (Vietnamese Spam Review Dataset)
	* ViSoLex Toolkit for Vietnamese NLP