# Evaluation Summary for IndoHoaxDetector Space
## Metrics Overview
- **Model Architecture**: Logistic Regression trained on Indonesian news labeled as HOAX vs FAKTA.
- **Vectorizer**: TF-IDF vectorizer loaded from `tfidf_vectorizer.pkl` and applied after Indonesian-specific preprocessing.
- **Accuracy**: ~97.83% on the held-out validation split used during training (metadata stored in `model_metadata.txt`).
- **Precision & Recall**: Balanced across the style-based labels. Precision is the share of texts flagged as HOAX that actually carry the HOAX label; recall is the share of HOAX-labeled texts that the model flags.
- **Confidence Scores**: The Gradio app exposes probability values for both labels. Use the HOAX probability as a stylistic warning, not a verdict.
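
The metric definitions above can be made concrete with a small standard-library sketch. It treats HOAX as the positive class; the function name `binary_metrics` and the sample labels are illustrative assumptions, not part of the Space's code or its evaluation data.

```python
def binary_metrics(y_true, y_pred, positive="HOAX"):
    """Accuracy, precision, and recall with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: labeled HOAX and flagged HOAX.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: flagged HOAX but labeled otherwise.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: labeled HOAX but not flagged.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels, for illustration only.
y_true = ["HOAX", "HOAX", "FAKTA", "FAKTA", "HOAX"]
y_pred = ["HOAX", "FAKTA", "FAKTA", "HOAX", "HOAX"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these made-up labels, three of five predictions are correct, and both precision and recall come out to two thirds.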
## Testing Guidelines
1. Prepare a set of Indonesian news snippets (title + body) with known labels.
2. Run the preprocessing steps defined in `app.py` (lowercasing, URL stripping, non-letter removal, stop word removal, Sastrawi stemming, TF-IDF transform).
3. Use the loaded model to infer probabilities via `predict_news`.
4. Compare predictions with the labels and compute metrics with any evaluation script (e.g., the repository-level `evaluate.py`, after copying it into this folder).
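
The steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the code in `app.py`: the stop word list is a tiny hypothetical one, stemming is left as an optional hook for any object with a `.stem(text)` method (such as a Sastrawi stemmer), the TF-IDF transform is assumed to happen inside the predictor, and `predict_label` stands in for a wrapper around `predict_news`, whose exact signature is not shown here.

```python
import re

# Hypothetical, deliberately tiny stop word list; app.py may use a different one.
STOP_WORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}


def clean_text(text, stemmer=None):
    """Lowercase, strip URLs, drop non-letters, remove stop words, optionally stem."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URL stripping
    text = re.sub(r"[^a-z\s]", " ", text)               # non-letter removal
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    cleaned = " ".join(tokens)
    # `stemmer` is assumed to expose .stem(str); skipped when not provided.
    return stemmer.stem(cleaned) if stemmer is not None else cleaned


def evaluate(samples, predict_label):
    """Return (accuracy, misclassified samples) for a list of (text, label) pairs."""
    errors = []
    for text, label in samples:
        predicted = predict_label(clean_text(text))
        if predicted != label:
            errors.append((text, label, predicted))
    accuracy = 1 - len(errors) / len(samples) if samples else 0.0
    return accuracy, errors


def toy_predict(cleaned):
    """Stand-in predictor for demonstration only; not the trained model."""
    return "HOAX" if "viralkan" in cleaned else "FAKTA"


samples = [
    ("Viralkan!!! Info rahasia di http://contoh.id", "HOAX"),
    ("Pemerintah mengumumkan anggaran baru", "FAKTA"),
]
accuracy, errors = evaluate(samples, toy_predict)
print(accuracy, errors)
```

If `predict_news` already runs the preprocessing internally, pass the raw text to it instead of calling `clean_text` first, so the text is not cleaned twice.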
## Reporting
- Document any changes to the dataset or vectorizer.
- If you retrain the model, update this file with the new accuracy, precision, recall, and dataset description to keep the Space trustworthy.
## Caveats
- Metrics measure only how consistently the model reproduces the style-based HOAX/FAKTA labels; they do not verify facts.
- The evaluation set may not include every possible writing style; monitor drift over time and retrain as needed.
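
One simple way to watch for drift is to compare accuracy on a recent, freshly labeled batch against the reported validation accuracy. The sketch below assumes such a batch is available; the function names and the 0.05 tolerance are hypothetical choices, not settings used by this Space.

```python
def accuracy_drop(baseline_accuracy, y_true, y_pred):
    """Baseline accuracy minus accuracy on a recent labeled batch."""
    if not y_true:
        raise ValueError("the recent batch must contain at least one sample")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return baseline_accuracy - correct / len(y_true)


def needs_retraining(baseline_accuracy, y_true, y_pred, tolerance=0.05):
    """Flag the model when accuracy falls more than `tolerance` below the baseline."""
    return accuracy_drop(baseline_accuracy, y_true, y_pred) > tolerance


# Baseline taken from the accuracy reported above; the batch itself is made up.
baseline = 0.9783
recent_true = ["HOAX"] * 5 + ["FAKTA"] * 5
recent_pred = ["HOAX"] * 3 + ["FAKTA"] * 7  # two HOAX items missed
flag = needs_retraining(baseline, recent_true, recent_pred)
print(flag)
```

A small batch gives a noisy estimate, so a single flagged batch is better treated as a prompt to collect more labeled samples than as proof of drift.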