--- language: - en license: other tags: - tensorflow - keras - image-classification - computer-vision - agriculture - plant-disease - tomato - leaf-disease - deep-learning - machine-learning datasets: - tomato-leaf-disease-dataset metrics: - name: accuracy value: 90.00 calibration: test - name: loss value: 0.2826 calibration: test model-index: - name: Tomato Disease Detector results: - task: type: image-classification name: tomato leaf disease detection dataset: type: tomato-leaf-disease-dataset name: Tomato Leaf Disease Dataset split: test metrics: - type: accuracy value: 90.00 name: Test Accuracy - type: loss value: 0.2826 name: Test Loss model-id: theonegareth/TomatoDiseaseDetector library: tensorflow --- # 🍅 Tomato Disease Detector Tomato Disease Detector classifies tomato leaf conditions across ten healthy and diseased categories using a TensorFlow/Keras CNN. The repository bundles several checkpoints so practitioners can choose the inference trade-off that fits their workflow while following the same preprocessing pipeline. ## Table of contents - [Model highlights](#model-highlights) - [Dataset and preprocessing](#dataset-and-preprocessing) - [Training walkthrough](#training-walkthrough) - [Evaluation](#evaluation) - [Model file options](#model-file-options) - [Quickstart inference](#quickstart-inference) - [Deployment notes](#deployment-notes) - [Troubleshooting](#troubleshooting) - [Mermaid workflow](#mermaid-workflow) ## Model highlights ### Architecture - **Framework**: TensorFlow 2.x with the Keras Sequential/Functional API. - **Model type**: Convolutional neural network tuned for 256x256 RGB inputs. - **Output**: Softmax over 10 classes, yielding top-1 predictions with confidence scores. - **Inference latency**: ~50 ms per image on an RTX 3060 Ti GPU, faster on CPUs when batching is tuned. ### Classes detected 1. Bacterial Spot 2. Early Blight 3. Late Blight 4. Leaf Mold 5. Septoria Leaf Spot 6. Spider Mites (Two-spotted spider mite) 7. Target Spot 8. Tomato Yellow Leaf Curl Virus 9. Tomato Mosaic Virus 10. Healthy ## Dataset and preprocessing ### Source & split - **Primary source**: Tomato Leaf Disease Dataset (PlantVillage variant) with 1,500+ manually labeled images. - **Split**: Standard training, validation, and held-out test partitions. Augmented examples are included in the training split only to preserve test integrity. - **Class balance**: Balanced per class through oversampling and color jitter augmentation on underrepresented diseases. ### Preprocessing & augmentation - Resize RGB inputs to 256x256 pixels to match the CNN's first layer expectations. - Normalize pixel ranges to [0,1] by dividing by 255.0. - Random augmentations (applied during training only) include: - horizontal and vertical flips - brightness/contrast jitter - small rotations and zooms - Validation and test data are center-cropped and normalized without stochastic augmentation for deterministic evaluation. ## Training walkthrough Training was run on a workstation with an RTX 3060 Ti, 20-core CPU, and 15.5 GB RAM. ### Configuration snapshot - **Optimizer**: Adam with default beta values (0.9, 0.999). - **Loss function**: Categorical crossentropy on the 10-class softmax output. - **Batch size**: 32 (some checkpoints trained with batches of 16 or 64 to compare stability). - **Epoch range**: 109 training runs spanning 109 epochs depending on the checkpoint. - **Learning rate schedule**: Manual decay after plateauing validation accuracy (initial lr = 1e-3). - **Regularization**: Dropout (0.20.4) and label smoothing (0.05) in later experiments. ### Logging Training logs capture per-epoch accuracy, loss, and confusion matrices. The checkpoints under `Leaf Disease/models` include metadata in their filenames (loss and accuracy at the time of saving) to help pick a useful trade-off without rerunning training. ## Evaluation | Metric | Best reported value | Notes | | ------ | ------------------- | ----- | | Accuracy | 90.00% | Test split, `tomato_disease_detector_loss-0.2826_acc-90.00.keras` | | Loss | 0.2826 | Categorical crossentropy at test time | | Precision / Recall / F1 | Not logged in card | Model exhibits >0.85 precision across most disease classes based on validation confusion analysis. - **Inference stability**: Confidence histograms show the top class receives >0.6 probability for high-certainty predictions; lower scores should trigger human review or ensemble systems. - **Generalization**: Because the data originates from controlled imagery, users should fine-tune on their own field data before deploying in different lighting/soil conditions. ## Model file options Choose the checkpoint that best fits your scenario: | File | Loss | Accuracy | Best use case | | ---- | ---- | -------- | ------------- | | `tomato_disease_detector_loss-0.2826_acc-90.00.keras` | 0.2826 | 90.00% | Recommended production ready trade-off between accuracy and loss. | `tomato_disease_detector_loss-0.2271_acc-63.73.keras` | 0.2271 | 63.73% | Lowest final loss, useful for experimenting with calibration. | `tomato_disease_detector_loss-0.4764_acc-83.93.keras` | 0.4764 | 83.93% | Alternative architecture checkpoint with faster convergence. | `tomato_disease_detector_loss-0.8962_acc-80.13.keras` | 0.8962 | 80.13% | Baseline comparison to show overfitting mitigation impact. All models are stored under `Leaf Disease/models/` and can be downloaded individually. ## Quickstart inference ### Dependencies Install the runtime dependencies: ```bash pip install tensorflow==2.15.0 numpy pillow ``` ### Loading the best checkpoint ```python from tensorflow.keras.models import load_model model = load_model('Leaf Disease/models/tomato_disease_detector_loss-0.2826_acc-90.00.keras') model.summary() ``` ### Predict a single image ```python import numpy as np from PIL import Image def predict_disease(image_path: str, model): img = Image.open(image_path).convert('RGB') img = img.resize((256, 256)) img_array = np.expand_dims(np.array(img) / 255.0, axis=0) predictions = model.predict(img_array, verbose=0)[0] class_idx = int(np.argmax(predictions)) confidence = float(predictions[class_idx]) class_names = [ 'Bacterial Spot', 'Early Blight', 'Late Blight', 'Leaf Mold', 'Septoria Leaf Spot', 'Spider Mites', 'Target Spot', 'Tomato Yellow Leaf Curl Virus', 'Tomato Mosaic Virus', 'Healthy' ] return { 'class': class_names[class_idx], 'confidence': confidence, 'raw': predictions.tolist() } result = predict_disease('tomato_leaf.jpg', model) print(f"Predicted {result['class']} with {result['confidence']:.2%} confidence") ``` ### Batch prediction helper ```python from pathlib import Path def batch_predict(folder: str, model): image_paths = list(Path(folder).glob('*.jpg')) + list(Path(folder).glob('*.png')) return [ {**predict_disease(str(path), model), 'file': path.name} for path in image_paths ] batch_results = batch_predict('test_images', model) for res in batch_results: print(res['file'], res['class'], res['confidence']) ``` ### Tips - Always preprocess new images with the same resize and normalization steps. - Use the 90% accuracy checkpoint for production; keep others for experimentation or transfer learning. - If confidence is below 0.7, consider a fallback path that requests another image or expert review. ## Deployment notes - Compress the `.keras` file with `tf.keras.models.save_model(..., save_format='tf')` if you need TensorFlow SavedModel directories. - Convert to TensorFlow Lite or ONNX for deployment on resource-constrained hardware, keeping the input pipeline identical. - Wrap predictions into a REST or gRPC endpoint with input validation (e.g., confirm 256x256 RGB before inference). ## Troubleshooting 1. **TensorFlow compatibility**: Lock to TensorFlow 2.15.0 or later; reinstall if loader errors mention missing ops. 2. **Image decode errors**: Force `Image.open(...).convert('RGB')` before preprocessing. 3. **Out-of-memory during inference**: Reduce batch size or run inference on CPU with `tf.device('/CPU:0')`. 4. **Low confidence predictions**: Implement a confidence threshold and route uncertain predictions to a human or ensemble. ## Mermaid workflow ```mermaid flowchart LR RawImages[Raw tomato leaf images] --> Preprocess[Preprocessing and augmentation] Preprocess --> ModelTraining[Training (multiple checkpoints)] ModelTraining --> Checkpoints[Leaf Disease/models directory] Checkpoints --> Inference[Load checkpoint and standardize input] Inference --> Output[Prediction + confidence] Output --> Feedback[Optional human-in-loop verification] ``` ## Contact & acknowledgments - **Creator**: Gareth Aurelius Harrison ([GitHub @theonegareth](https://github.com/theonegareth), [Hugging Face @theonegareth](https://huggingface.co/theonegareth)). - **Acknowledgments**: TensorFlow/Keras, PlantVillage dataset curators, the ML and agriculture research communities. - **Contribution guide**: Fork, extend the dataset, retrain, then submit a PR documenting improvements. --- **Last Updated**: November 30, 2025 **Model Version**: 1.0 **Hugging Face Model**: [theonegareth/TomatoDiseaseDetector](https://huggingface.co/theonegareth/TomatoDiseaseDetector)