abhishek1005 committed · Commit 80dbb81 · verified · 1 Parent(s): c3b7839

Upload README.md with huggingface_hub

  1. README.md +448 -0
README.md ADDED
@@ -0,0 +1,448 @@
---
language: en
license: apache-2.0
tags:
- sentiment-analysis
- product-reviews
- smartphone-reviews
- aspect-based-sentiment-analysis
- distilroberta
- domain-adaptation
datasets:
- amazon-reviews
metrics:
- accuracy
- f1
widget:
- text: "Battery life is amazing! Best phone I ever had."
  example_title: "Positive Review"
- text: "Terrible phone. Broke after one week."
  example_title: "Negative Review"
- text: "It's okay, nothing special about it."
  example_title: "Neutral Review"
- text: "Camera quality is excellent but battery drains quickly."
  example_title: "Mixed Sentiment"
model-index:
- name: SmartReview DistilRoBERTa Sentiment
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      name: Amazon Smartphone Reviews
      type: amazon-reviews
    metrics:
    - type: accuracy
      value: 88.23
      name: Test Accuracy
    - type: f1
      value: 94.88
      name: F1 Score (Positive)
    - type: f1
      value: 85.82
      name: F1 Score (Negative)
    - type: f1
      value: 36.35
      name: F1 Score (Neutral)
---

# SmartReview: DistilRoBERTa for Smartphone Review Sentiment Analysis

[![Hugging Face Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-yellow)](https://huggingface.co/Abhishek86798/smartreview-distilroberta-sentiment)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

## Model Description

**SmartReview** is a domain-adapted DistilRoBERTa model fine-tuned for sentiment analysis of smartphone and electronics reviews.

The model achieves **88.23% accuracy** on 3-class sentiment classification (Positive, Neutral, Negative), measured on a held-out test set of 8,367 reviews. It was built from a dataset of 67,987 Amazon smartphone reviews, of which 39,044 form the fine-tuning training split.

### 🎯 Key Features

- ✅ **Domain-Adapted**: Further pretrained on 61,553 smartphone reviews via Masked Language Modeling
- ✅ **Efficient**: Only 82M parameters (34% fewer than RoBERTa-base)
- ✅ **Accurate**: 88.23% overall accuracy, 94.88% F1 on positive sentiment
- ✅ **Fast**: ~50ms inference time per review
- ✅ **Specialized**: Understands product review vocabulary and context

### 🏗️ Architecture

- **Base Model**: `distilroberta-base` (82M parameters)
- **Task**: 3-class sequence classification
- **Classes**:
  - `LABEL_0`: Positive
  - `LABEL_1`: Neutral
  - `LABEL_2`: Negative
- **Max Length**: 512 tokens

### 📊 Training Approach

**Two-Phase Training:**

1. **Phase 1 - Domain Adaptation (MLM)**
   - Task: Masked Language Modeling
   - Data: 61,553 smartphone reviews
   - Duration: 66 minutes
   - Result: 99.99% accuracy on domain vocabulary

2. **Phase 2 - Sentiment Fine-tuning**
   - Task: 3-class classification
   - Data: 39,044 training samples
   - Duration: 67 minutes
   - Optimizer: AdamW (lr=2e-5, weight_decay=0.01)
   - Hardware: NVIDIA RTX 3050 (4GB)

---

## 📈 Performance

### Overall Metrics (Test Set: 8,367 reviews)

| Metric | Score |
|--------|-------|
| **Accuracy** | **88.23%** |
| **Precision (Macro)** | 72.38% |
| **Recall (Macro)** | 72.39% |
| **F1 (Macro)** | 72.35% |
| **F1 (Weighted)** | 88.13% |

### Per-Class Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| **Positive** | 95.39% | 94.38% | **94.88%** ✅ | 5,481 |
| **Neutral** | 37.79% | 35.02% | **36.35%** ⚠️ | 614 |
| **Negative** | 83.96% | 87.76% | **85.82%** ✅ | 2,272 |

**Note:** Neutral class F1 is lower due to severe class imbalance (only 7.4% of training data). This is expected in product reviews where opinions are rarely truly neutral.

### Confusion Matrix

```
                PREDICTED
              Pos    Neu    Neg
ACTUAL
  Pos       5,173    175    133   (94.4% correct)
  Neu         151    215    248   (35.0% correct)
  Neg          99    179  1,994   (87.8% correct)
```
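
The headline numbers in the tables above can be re-derived from the confusion matrix alone. The following is a minimal sketch in plain Python; the helper names `per_class_metrics` and `summarize` are illustrative and are not part of the model's codebase.

```python
# Rows are actual classes, columns are predicted classes (Pos, Neu, Neg),
# copied from the confusion matrix in this card.
LABELS = ["Positive", "Neutral", "Negative"]
CONFUSION = [
    [5173, 175, 133],
    [151, 215, 248],
    [99, 179, 1994],
]


def per_class_metrics(matrix):
    """Return {class_index: (precision, recall, f1, support)}."""
    n = len(matrix)
    metrics = {}
    for i in range(n):
        true_pos = matrix[i][i]
        predicted = sum(matrix[row][i] for row in range(n))  # column sum
        support = sum(matrix[i])  # row sum
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[i] = (precision, recall, f1, support)
    return metrics


def summarize(matrix):
    """Return accuracy, macro F1, and support-weighted F1."""
    per_class = per_class_metrics(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return {
        "accuracy": correct / total,
        "macro_f1": sum(m[2] for m in per_class.values()) / len(per_class),
        "weighted_f1": sum(m[2] * m[3] for m in per_class.values()) / total,
    }


if __name__ == "__main__":
    for idx, (p, r, f1, support) in per_class_metrics(CONFUSION).items():
        print(f"{LABELS[idx]:<9} P={p:.2%} R={r:.2%} F1={f1:.2%} n={support}")
    print({name: f"{value:.2%}" for name, value in summarize(CONFUSION).items()})
```

Reading the Neutral row this way also shows where the errors go: 248 of the 614 neutral reviews are predicted Negative and 151 Positive.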

---

## 🚀 Usage

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Abhishek86798/smartreview-distilroberta-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example review
text = "Battery life is excellent but camera quality is poor"

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=-1)
    prediction = logits.argmax(-1).item()

# Map the class index to a label (LABEL_0, LABEL_1, LABEL_2)
labels = ["Positive", "Neutral", "Negative"]
sentiment = labels[prediction]
confidence = probabilities[0][prediction].item()

print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.2%}")
```

**Output:**
```
Sentiment: Positive
Confidence: 85.34%
```

### Using Pipeline

```python
from transformers import pipeline

# Create sentiment analysis pipeline
classifier = pipeline(
    "sentiment-analysis",
    model="Abhishek86798/smartreview-distilroberta-sentiment",
    tokenizer="Abhishek86798/smartreview-distilroberta-sentiment"
)

# Single prediction
result = classifier("Amazing phone! Battery lasts all day.")
print(result)
# [{'label': 'LABEL_0', 'score': 0.9876}]  # LABEL_0 = Positive

# Batch prediction
reviews = [
    "Amazing phone! Battery lasts all day.",
    "Terrible. Phone broke after one week.",
    "It's okay, nothing special."
]

results = classifier(reviews)
for review, result in zip(reviews, results):
    print(f"{review} → {result['label']} ({result['score']:.2%})")
```
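
The pipeline returns generic `LABEL_n` names, so downstream code usually needs the mapping listed under Architecture. A small sketch of that post-processing step follows; `ID2LABEL` and `readable` are hypothetical names, and the sample scores are made up for illustration rather than produced by the model.

```python
# Mapping taken from the Architecture section of this card.
ID2LABEL = {"LABEL_0": "Positive", "LABEL_1": "Neutral", "LABEL_2": "Negative"}


def readable(results):
    """Replace generic LABEL_n names in pipeline output with sentiment names.

    Labels that are not in the mapping are passed through unchanged.
    """
    return [
        {"label": ID2LABEL.get(item["label"], item["label"]), "score": item["score"]}
        for item in results
    ]


# Stand-in for `classifier(reviews)`; these scores are illustrative only.
sample_output = [
    {"label": "LABEL_0", "score": 0.98},
    {"label": "LABEL_2", "score": 0.95},
]
print(readable(sample_output))
```

In real use, `sample_output` would be replaced by the list returned from `classifier(reviews)`.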

### Detailed Prediction Function

```python
import torch


def predict_sentiment_detailed(text, model, tokenizer):
    """Get a detailed sentiment prediction with all class probabilities.

    Args: text (str), model, tokenizer
    Returns: dict with sentiment, confidence, and probabilities
    """
    # Tokenize
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
        padding=True
    )

    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probabilities = torch.softmax(logits, dim=-1)[0]

    # Get results
    labels = ["Positive", "Neutral", "Negative"]
    prediction_idx = logits.argmax(-1).item()

    return {
        "text": text,
        "sentiment": labels[prediction_idx],
        "confidence": probabilities[prediction_idx].item(),
        "probabilities": {
            "positive": probabilities[0].item(),
            "neutral": probabilities[1].item(),
            "negative": probabilities[2].item()
        }
    }


# Example (reuses `model` and `tokenizer` loaded in Quick Start)
result = predict_sentiment_detailed(
    "Screen is bright and clear, love the display!",
    model,
    tokenizer
)

print(f"Sentiment: {result['sentiment']}")
print(f"Confidence: {result['confidence']:.2%}")
print("Probabilities:")
for sentiment, prob in result['probabilities'].items():
    print(f"  {sentiment.capitalize()}: {prob:.2%}")
```

---

## 📊 Dataset

### Training Data

- **Source**: Amazon Cell Phones & Accessories Reviews (Kaggle)
- **Time Period**: 2015-2019
- **Total Reviews**: 67,987
- **Products**: 721 smartphone models

### Split Distribution

| Split | Reviews | Percentage |
|-------|---------|------------|
| Training | 39,044 | 57.4% |
| Validation | 8,367 | 12.3% |
| Test | 8,367 | 12.3% |

### Sentiment Distribution

| Sentiment | Count | Percentage | Rating Mapping |
|-----------|-------|------------|----------------|
| Positive | 32,615 | 57.5% | 4-5 stars |
| Neutral | 4,200 | 7.4% | 3 stars |
| Negative | 15,572 | 27.4% | 1-2 stars |
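
The Rating Mapping column describes how star ratings become class labels. A minimal sketch of that rule is shown below, assuming integer ratings from 1 to 5; the function name `rating_to_sentiment` is hypothetical, and the card does not show the preprocessing code that was actually used.

```python
def rating_to_sentiment(stars: int) -> str:
    """Map a 1-5 star rating to a sentiment label using the table's rule."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected an integer rating from 1 to 5, got {stars!r}")
    if stars >= 4:
        return "Positive"  # 4-5 stars
    if stars == 3:
        return "Neutral"   # 3 stars
    return "Negative"      # 1-2 stars


if __name__ == "__main__":
    for stars in range(1, 6):
        print(stars, rating_to_sentiment(stars))
```

Because only 3-star reviews map to Neutral, that class is narrow by construction, which fits the small Neutral share in the table.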

---

## 🎯 Intended Use

### ✅ Recommended Use Cases

- Sentiment analysis of smartphone/electronics reviews
- Product feedback analysis for e-commerce platforms
- Customer satisfaction monitoring
- Review summarization preprocessing
- Aspect-based sentiment analysis (as one component of an ABSA pipeline)

### ❌ Out-of-Scope Use

- Non-English reviews (model trained on English only)
- Non-product reviews (news articles, social media posts, etc.)
- Offensive content detection
- Sarcasm detection (known limitation)
- Real-time chat/conversation analysis

---

## ⚠️ Limitations

1. **Neutral Class Performance**: F1-score of 36.35% due to severe class imbalance (only 7.4% of training data). The model tends to classify neutral reviews as positive or negative: of the 614 neutral test reviews, 248 were predicted negative and 151 positive.

2. **Sarcasm Detection**: The model struggles with sarcastic language. Example: *"Great, another phone that breaks after a week"* may be classified as positive.

3. **Domain Specificity**: Trained specifically on smartphone reviews. Performance may degrade on other product categories without domain adaptation.

4. **Context-Free Predictions**: The model does not consider user expectations or product price range. *"Battery lasts 4 hours"* might be negative for smartphones but positive for smartwatches.

5. **Mixed Sentiments**: The model assigns a single label per review, so a review with conflicting opinions is reduced to whichever sentiment dominates and the opposing opinion is lost.
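
One way to work around these limitations in practice is to route uncertain predictions to manual review. The sketch below is an assumption about how a caller might do this and is not part of the model: it takes the `probabilities` dictionary returned by `predict_sentiment_detailed` and flags cases where the top two classes are close. The name `needs_review` and the 0.20 margin are illustrative choices that would need tuning on real data.

```python
def needs_review(probabilities, min_margin=0.20):
    """Return True when the two most likely classes are within `min_margin`.

    `probabilities` maps class names to probabilities, for example
    {"positive": 0.45, "neutral": 0.15, "negative": 0.40}.
    """
    ranked = sorted(probabilities.values(), reverse=True)
    if len(ranked) < 2:
        return False  # nothing to compare against
    return (ranked[0] - ranked[1]) < min_margin


if __name__ == "__main__":
    # Hand-written probabilities for illustration, not model output.
    mixed = {"positive": 0.45, "neutral": 0.15, "negative": 0.40}
    clear = {"positive": 0.90, "neutral": 0.03, "negative": 0.07}
    print(needs_review(mixed))  # close call between positive and negative
    print(needs_review(clear))  # confident positive
```

A small margin does not prove a review is mixed or sarcastic; it only marks predictions where the single label is least trustworthy.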

---

## 🔧 Training Details

### Hyperparameters

```yaml
Model:
  base_model: distilroberta-base
  num_labels: 3
  max_position_embeddings: 512
  hidden_size: 768
  num_hidden_layers: 6
  num_attention_heads: 12
  dropout: 0.1

Training:
  learning_rate: 2e-5
  batch_size: 4
  gradient_accumulation_steps: 4
  effective_batch_size: 16
  epochs: 5
  warmup_steps: 500
  weight_decay: 0.01
  optimizer: AdamW
  fp16: true
  max_grad_norm: 1.0

Hardware:
  gpu: NVIDIA RTX 3050 (4GB VRAM)
  memory_usage: ~2.5 GB
  training_time: 67 minutes
```
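
The batch settings above imply a rough optimizer-step budget. The arithmetic below is a back-of-the-envelope sketch using the 39,044 training samples reported in this card; the exact step count depends on how a given trainer handles the last partial batch, so the figures are approximate.

```python
import math

TRAIN_SAMPLES = 39_044   # training split size reported in this card
BATCH_SIZE = 4           # per-device batch size
GRAD_ACCUM_STEPS = 4     # gradient accumulation steps
EPOCHS = 5
WARMUP_STEPS = 500

# One optimizer update happens after GRAD_ACCUM_STEPS micro-batches.
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_fraction = WARMUP_STEPS / total_steps

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: ~{steps_per_epoch}")
print(f"total optimizer steps: ~{total_steps}")
print(f"warmup share of training: {warmup_fraction:.1%}")
```

Under these assumptions the 500 warmup steps cover roughly the first 4% of training.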

### Training Loss Progression

| Epoch | Train Loss | Val Loss | Val Accuracy |
|-------|------------|----------|--------------|
| 1 | 0.3832 | 0.3724 | 87.22% |
| 2 | 0.2833 | 0.3274 | 88.17% |
| 3 | 0.1935 | 0.3740 | 88.22% |
| 4 | 0.1661 | 0.4177 | 88.68% |
| 5 | 0.1328 | 0.4728 | 88.38% |

**Best Model**: Epoch 4 (highest validation accuracy)
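
Checkpoint selection from a table like this can be written down explicitly. The sketch below copies the five rows into a list and picks the best epoch by two criteria; the variable names are illustrative, and the card states only that validation accuracy was the criterion used.

```python
# Rows copied from the training loss table in this card.
HISTORY = [
    {"epoch": 1, "train_loss": 0.3832, "val_loss": 0.3724, "val_acc": 87.22},
    {"epoch": 2, "train_loss": 0.2833, "val_loss": 0.3274, "val_acc": 88.17},
    {"epoch": 3, "train_loss": 0.1935, "val_loss": 0.3740, "val_acc": 88.22},
    {"epoch": 4, "train_loss": 0.1661, "val_loss": 0.4177, "val_acc": 88.68},
    {"epoch": 5, "train_loss": 0.1328, "val_loss": 0.4728, "val_acc": 88.38},
]

# Criterion used by this card: highest validation accuracy.
best_by_accuracy = max(HISTORY, key=lambda row: row["val_acc"])

# Alternative criterion, shown for comparison: lowest validation loss.
best_by_loss = min(HISTORY, key=lambda row: row["val_loss"])

print(f"best by val accuracy: epoch {best_by_accuracy['epoch']}")
print(f"best by val loss:     epoch {best_by_loss['epoch']}")
```

The two criteria disagree here: validation loss is lowest at epoch 2 and rises afterwards while training loss keeps falling, whereas validation accuracy peaks at epoch 4.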

---

## 🌟 Comparison with Other Models

| Model | Parameters | Accuracy | Training Time | GPU Memory |
|-------|------------|----------|---------------|------------|
| SVM (TF-IDF) | - | 78.4% | <5 min | <1 GB |
| LSTM | 2M | 82.3% | ~45 min | ~1.5 GB |
| BERT-base | 110M | 85.7% | ~90 min | ~3.2 GB |
| **SmartReview (Ours)** | **82M** | **88.23%** | **67 min** | **2.5 GB** |
| RoBERTa-base | 125M | ~89-90% | ~120 min | ~3.8 GB |

**Key Advantage**: Achieves competitive accuracy with 34% fewer parameters and 44% less training time than RoBERTa-base.
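
The two percentages in the Key Advantage line follow directly from the table. A short check is sketched below; the helper `relative_reduction` is an illustrative name, and the RoBERTa-base training time is the approximate ~120 min figure from the table.

```python
def relative_reduction(ours: float, baseline: float) -> float:
    """Fraction by which `ours` is smaller than `baseline`."""
    return (baseline - ours) / baseline


# Values taken from the comparison table (RoBERTa-base time is approximate).
param_saving = relative_reduction(ours=82, baseline=125)   # millions of parameters
time_saving = relative_reduction(ours=67, baseline=120)    # minutes of training

print(f"parameters: {param_saving:.1%} fewer")
print(f"training time: {time_saving:.1%} less")
```

Both results round to the 34% and 44% quoted above.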

---

## 📝 Bias and Fairness

- Model trained on Amazon reviews from 2015-2019
- May reflect temporal biases (older smartphone features/expectations)
- Performance may vary across different price ranges and brands
- Dataset primarily contains English reviews from US market
- Recommended to validate on your specific use case and domain

---

## 📚 Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{smartreview2025,
  author = {Abhishek},
  title = {SmartReview: Efficient Aspect-Based Sentiment Analysis using Domain-Adapted DistilRoBERTa},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/Abhishek86798/smartreview-distilroberta-sentiment}}
}
```

---

## 🔗 Additional Resources

- **Project Repository**: [GitHub - SmartReview](https://github.com/Abhishek86798/smartAnalysis)
- **Full Technical Report**: Available in repository
- **Training Notebooks**: 6 complete Jupyter notebooks
- **ABSA Pipeline**: Complete aspect-based sentiment analysis system
- **Contact**: [Your Email]

---

## 👥 Model Card Authors

**Abhishek** ([Abhishek86798](https://github.com/Abhishek86798))

---

## 📄 License

This model is released under the **Apache License 2.0**.

---

## 🙏 Acknowledgments

- **Base Model**: `distilroberta-base` by Hugging Face
- **Dataset**: Amazon Reviews dataset (Kaggle)
- **Framework**: Hugging Face Transformers
- **Inspiration**: Research in domain adaptation and efficient NLP models

---

## 📞 Support

For issues, questions, or feedback:
- Open an issue on GitHub
- Contact: [Your Email]
- Hugging Face Discussions

---

**Model Version**: 1.0
**Last Updated**: November 10, 2025
**Status**: Production-Ready ✅

---

*Making advanced sentiment analysis accessible for everyone!* 🚀