---
sdk: streamlit
sdk_version: 1.50.0
---
# 🧪 Advanced ML Sentiment Lab
---
## 📌 Overview
Interactive **Streamlit + Plotly** app for **binary sentiment analysis**.
Upload any CSV with a **text column** and a **binary label**, then:
- Run quick EDA on text lengths, tokens, and class balance
- Build TF-IDF word + optional char features
- Train multiple classical models (LogReg / RF / GB / Naive Bayes)
- Tune the decision threshold with **FP/FN business costs**
- Inspect misclassified samples and test arbitrary texts live

It works well with the classic **IMDB 50K Reviews** dataset, but is generic enough for product reviews, support tickets, survey responses, and similar labelled text.

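
The cost-aware threshold step listed above can be illustrated with a minimal sketch. It is not taken from the app's source: the function name `pick_threshold`, the cost values, and the 0.01 sweep step are all assumptions chosen for illustration. The idea is to sweep candidate thresholds and keep the one that minimises `fp_cost * FP + fn_cost * FN` on validation data. When false negatives cost more than false positives, the chosen threshold tends to move below 0.5.

```python
def pick_threshold(y_true, y_prob, fp_cost=1.0, fn_cost=5.0, steps=101):
    """Return (threshold, total_cost) minimising fp_cost*FP + fn_cost*FN.

    y_true: 0/1 labels; y_prob: predicted probability of the positive class.
    A sample is predicted positive when its probability is >= threshold.
    """
    best_t, best_cost = 0.5, float("inf")
    for i in range(steps):
        t = i / (steps - 1)  # candidate thresholds 0.00, 0.01, ..., 1.00
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t)
        fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < t)
        cost = fp_cost * fp + fn_cost * fn
        if cost < best_cost:  # strict "<" keeps the lowest threshold on ties
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical validation labels and scores, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.65, 0.55]

threshold, cost = pick_threshold(y_true, y_prob, fp_cost=1.0, fn_cost=5.0)
print(f"threshold={threshold:.2f} total_cost={cost}")
```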
---
## 📊 Dashboard Preview
### EDA & KPIs

### Train & Validation

### Error Analysis

### Deploy & Interactive Prediction

---
## 🚀 How to use (in this Space)
1. **Load data**
   - Upload a CSV file
   - Or place `IMDB Dataset.csv` / `imdb.csv` in the Space and reload
2. **Map columns**
   - Choose the **text** column
   - Choose the **label** column and map which values are *positive* vs *negative*
3. **Train models**
   - Go to **“Train & Validation”**
   - Set TF-IDF options, pick models, click **Train models**
4. **Analyse & deploy**
   - Use **“Threshold & Cost”** to pick a business-aware threshold
   - Check **“Compare Models”** + **“Error Analysis”**
   - In **“Deploy”**, try any text and see the predicted sentiment + confidence bar

No data is stored server-side beyond the current session.

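
The column-mapping step (step 2) amounts to turning arbitrary label values into 1 (positive) and 0 (negative). The sketch below shows one way to do that with the standard library only. The helper `load_binary_dataset`, its parameters, and the sample rows are hypothetical and not part of the app. Skipping rows with an unmapped label or empty text is an assumption about sensible behaviour, not a description of what the app does.

```python
import csv
import io


def load_binary_dataset(csv_text, text_col, label_col, positive_values, negative_values):
    """Return (texts, labels) with labels encoded as 1 (positive) / 0 (negative).

    Rows with empty text or a label outside both value sets are skipped.
    Label matching is case-insensitive and ignores surrounding whitespace.
    """
    pos = {str(v).strip().lower() for v in positive_values}
    neg = {str(v).strip().lower() for v in negative_values}
    texts, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = (row.get(text_col) or "").strip()
        raw = (row.get(label_col) or "").strip().lower()
        if not text:
            continue
        if raw in pos:
            label = 1
        elif raw in neg:
            label = 0
        else:
            continue  # unmapped label value
        texts.append(text)
        labels.append(label)
    return texts, labels


# Illustrative rows only; column names here are chosen for the example.
sample = (
    "review,sentiment\n"
    "Great film,positive\n"
    "Dull plot,negative\n"
    "No usable label,unknown\n"
)
texts, labels = load_binary_dataset(
    sample, "review", "sentiment", {"positive"}, {"negative"}
)
print(texts, labels)
```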
---
## 🧠 Under the hood
- **Features**
  - Word TF-IDF (1–3 n-grams)
  - Optional char TF-IDF (3–6 n-grams)
- **Models**
  - Logistic Regression (balanced)
  - Random Forest
  - Gradient Boosting
  - Multinomial Naive Bayes
- **Artifacts**
  - Saved under `models_sentiment_lab/`:
    - `vectorizers.joblib`, `models.joblib`, `results.joblib`, `metadata.joblib`
  - Reused by Threshold, Compare, Error Analysis, and Deploy tabs
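
As a rough illustration of the feature and model setup listed above, the following scikit-learn sketch stacks word and char TF-IDF matrices and fits one of the listed models. It is an assumption-laden sketch, not the app's code: reading "balanced" as `class_weight="balanced"`, using `analyzer="char"` for the char features, `max_iter=1000`, the toy texts, and the helper `predict_proba_positive` are all illustrative choices.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data for illustration only (1 = positive, 0 = negative).
texts = [
    "a wonderful, moving film",
    "great acting and a great story",
    "I loved every minute",
    "dull, slow and boring",
    "terrible script and bad acting",
    "a complete waste of time",
]
labels = [1, 1, 1, 0, 0, 0]

# Word n-grams (1-3) plus optional char n-grams (3-6), as listed above.
word_vec = TfidfVectorizer(analyzer="word", ngram_range=(1, 3))
char_vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 6))

# Fit both vectorizers and stack their sparse matrices column-wise.
X = hstack([word_vec.fit_transform(texts), char_vec.fit_transform(texts)]).tocsr()

# Assumption: "balanced" refers to class_weight="balanced".
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, labels)


def predict_proba_positive(new_texts):
    """Probability of the positive class for each text in new_texts."""
    feats = hstack(
        [word_vec.transform(new_texts), char_vec.transform(new_texts)]
    ).tocsr()
    return clf.predict_proba(feats)[:, 1]


print(predict_proba_positive(["a great story", "boring and bad"]))
```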
---
## 🖥 Run locally
```bash
git clone https://github.com/tarekmasryo/advanced-ml-sentiment-lab.git
cd advanced-ml-sentiment-lab
python -m venv .venv
# Windows: .venv\Scripts\activate
source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
```
---
## 📄 License & credit
Code: **Apache 2.0**

Space & dashboard by **Tarek Masryo** 🚀