Evaluation Summary for IndoHoaxDetector Space
Metrics Overview
- Model Architecture: Logistic Regression trained on Indonesian news labeled as HOAX vs FAKTA.
- Vectorizer: TF-IDF transform created with `tfidf_vectorizer.pkl` after applying Indonesian-specific preprocessing.
- Accuracy: ~97.83% on the held-out validation split used during training (metadata stored in `model_metadata.txt`).
- Precision & Recall: Balanced on the styled labels. Precision indicates how often the model flags hoax-style text correctly; recall shows how many hoax-like examples are captured.
- Confidence Scores: The Gradio app exposes probability values for both labels. Use the HOAX probability as a stylistic warning, not a verdict.
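As a minimal sketch of how the precision and recall figures above are defined, the following computes them from plain label lists with HOAX as the positive class, and maps a HOAX probability to a cautious message. The names `precision_recall` and `hoax_warning`, the 0.5 threshold, and the message wording are illustrative assumptions, not part of the Space's code.

```python
def precision_recall(y_true, y_pred, positive="HOAX"):
    """Return (precision, recall) for the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def hoax_warning(hoax_probability, threshold=0.5):
    """Map the HOAX probability to a stylistic warning, not a verdict.

    The threshold is an assumed default; tune it against your own labeled data.
    """
    if hoax_probability >= threshold:
        return "Hoax-like writing style detected; verify with trusted sources."
    return "No strong hoax-style signal; this does not confirm the content is factual."
```

Keeping the message separate from the raw probability makes it easier to show both in an interface without implying a factual ruling.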
Testing Guidelines
- Prepare a set of Indonesian news snippets (title + body) with known labels.
- Run the preprocessing steps defined in `app.py` (lowercasing, URL stripping, non-letter removal, stop word removal, Sastrawi stemming, TF-IDF transform).
- Use the loaded model to infer probabilities via `predict_news`.
- Compare predictions with labels and compute metrics with any evaluation script (e.g., run the repository-level `evaluate.py` if you copy it inside this folder).
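The steps above can be sketched roughly as follows. This is not the code from `app.py`: `preprocess`, `predict_labels`, `accuracy`, and the tiny `DEMO_STOPWORDS` set are hypothetical stand-ins, the stemmer is passed in as an optional callable (for example the `stem` method of a Sastrawi stemmer), and `predict_proba` stands for whatever function returns the HOAX probability for cleaned text, since the exact signature of `predict_news` is not shown here.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# Hypothetical, deliberately tiny stop word list for demonstration only.
DEMO_STOPWORDS = frozenset({"yang", "dan", "di", "ke", "dari", "ini", "itu"})


def preprocess(text, stopwords=DEMO_STOPWORDS, stem=None):
    """Lowercase, strip URLs and non-letters, drop stop words, optionally stem."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in stopwords]
    if stem is not None:
        # `stem` is any callable mapping a token to its stem.
        tokens = [stem(tok) for tok in tokens]
    return " ".join(tokens)


def predict_labels(samples, predict_proba, threshold=0.5):
    """Run a predictor over (text, label) pairs and return (y_true, y_pred).

    `predict_proba` is an assumed callable: cleaned text -> HOAX probability.
    """
    y_true, y_pred = [], []
    for text, label in samples:
        prob = predict_proba(preprocess(text))
        y_true.append(label)
        y_pred.append("HOAX" if prob >= threshold else "FAKTA")
    return y_true, y_pred


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the known labels."""
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In a real run, `predict_proba` would wrap the TF-IDF transform and the logistic regression model loaded by the app; a stub predictor is enough to check that the harness itself behaves as expected before plugging in the model.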
Reporting
- Document any changes to the dataset or vectorizer.
- If you retrain the model, update this file with the new accuracy, precision, recall, and dataset description to keep the Space trustworthy.
Caveats
- Metrics refer only to stylistic label consistency, not factual verification.
- The evaluation set may not include every possible writing style; monitor drift over time and retrain as needed.
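One simple way to watch for drift, sketched here under assumptions, is to re-score a fresh labeled batch periodically and compare its accuracy with the reported baseline. The function name `needs_retraining` and the 0.05 tolerance are hypothetical choices, not values taken from this Space.

```python
def needs_retraining(baseline_accuracy, y_true, y_pred, tolerance=0.05):
    """Return (flag, current_accuracy) for a fresh labeled batch.

    flag is True when accuracy falls more than `tolerance` below the baseline.
    """
    if not y_true:
        raise ValueError("need at least one labeled example")
    current = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return current < baseline_accuracy - tolerance, current
```

For example, calling it with a baseline of 0.9783 and a batch where three of four predictions match the labels returns a flag of `True` and a current accuracy of 0.75. A small batch gives a noisy estimate, so treat a single flag as a prompt to collect more labeled examples rather than as proof of drift.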