🍅 Tomato Disease Detector

Tomato Disease Detector classifies tomato leaf images into ten categories (nine diseased conditions plus healthy) using a TensorFlow/Keras CNN. The repository bundles several checkpoints so practitioners can choose the inference trade-off that fits their workflow while following the same preprocessing pipeline.

Model highlights
Dataset and preprocessing
Training walkthrough
Evaluation
Model file options
Quickstart inference
Deployment notes
Troubleshooting
Mermaid workflow

Model highlights

Architecture

Framework: TensorFlow 2.x with the Keras Sequential/Functional API.
Model type: Convolutional neural network tuned for 256x256 RGB inputs.
Output: Softmax over 10 classes, yielding top-1 predictions with confidence scores.
Inference latency: ~50 ms per image on an RTX 3060 Ti GPU; CPU latency is typically higher, though throughput improves when batching is tuned.
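
To make the output format concrete, here is a minimal NumPy sketch of how a 10-way softmax turns raw scores into a top-1 prediction with a confidence value. The logits are made-up numbers for illustration, not outputs of the released checkpoints, and the helper names `softmax` and `top1` are hypothetical.

```python
from typing import Tuple

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def top1(probabilities: np.ndarray) -> Tuple[int, float]:
    """Return (class index, confidence) for one probability vector."""
    idx = int(np.argmax(probabilities))
    return idx, float(probabilities[idx])


# Hypothetical logits for one image across the 10 classes.
example_logits = np.array([0.1, 0.3, 2.5, 0.0, -1.0, 0.2, 0.4, 0.1, -0.5, 1.0])
probs = softmax(example_logits)
class_idx, confidence = top1(probs)
print(class_idx, round(confidence, 3))
```

The confidence is simply the largest softmax probability, so it always lies between 0 and 1 and the ten probabilities sum to 1.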

Classes detected

Bacterial Spot
Early Blight
Late Blight
Leaf Mold
Septoria Leaf Spot
Spider Mites (Two-spotted spider mite)
Target Spot
Tomato Yellow Leaf Curl Virus
Tomato Mosaic Virus
Healthy

Dataset and preprocessing

Source & split

Primary source: Tomato Leaf Disease Dataset (PlantVillage variant) with 1,500+ manually labeled images.
Split: Standard training, validation, and held-out test partitions. Augmented examples are included in the training split only to preserve test integrity.
Class balance: Balanced per class through oversampling and color jitter augmentation on underrepresented diseases.
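
One way to picture the oversampling step is the sketch below, which resamples indices with replacement until every class matches the largest one. It is an illustration under assumptions, not the script used to build this dataset: the function name `oversample_indices`, the seed, and the toy label counts are all hypothetical.

```python
import random
from collections import Counter


def oversample_indices(labels, seed=0):
    """Return sample indices in which every class appears as often as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    target = max(len(indices) for indices in by_class.values())
    balanced = []
    for indices in by_class.values():
        balanced.extend(indices)
        # Draw extra samples with replacement until the class reaches the target count.
        balanced.extend(rng.choices(indices, k=target - len(indices)))
    rng.shuffle(balanced)
    return balanced


# Toy label list (hypothetical counts, not the real dataset distribution).
labels = ["Healthy"] * 5 + ["Leaf Mold"] * 2 + ["Target Spot"] * 3
balanced = oversample_indices(labels)
counts = Counter(labels[i] for i in balanced)
print(counts)
```

In practice the duplicated indices would be passed through the training augmentations so that repeated images do not look identical to the network.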

Preprocessing & augmentation

Resize RGB inputs to 256x256 pixels to match the CNN's first layer expectations.
Normalize pixel ranges to [0,1] by dividing by 255.0.
Random augmentations (applied during training only) include:
- horizontal and vertical flips
- brightness/contrast jitter
- small rotations and zooms
Validation and test data are center-cropped and normalized without stochastic augmentation for deterministic evaluation.
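
The split between stochastic training transforms and deterministic evaluation transforms can be sketched in plain NumPy as follows. This is a simplified stand-in rather than the repository's pipeline: it covers normalization, flips, brightness jitter, and center cropping only (rotations and zooms are omitted), and the jitter range of 0.8 to 1.2 is an assumed value.

```python
import numpy as np


def normalize(image_uint8: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values to float32 in [0, 1]."""
    return image_uint8.astype(np.float32) / 255.0


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random flips and brightness jitter for a float HxWx3 image in [0, 1]."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :, :]  # vertical flip
    factor = rng.uniform(0.8, 1.2)  # assumed brightness range
    return np.clip(image * factor, 0.0, 1.0)


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Cut a size x size window from the middle of the image."""
    height, width = image.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]


rng = np.random.default_rng(0)
# Synthetic stand-ins for decoded RGB images.
raw_train = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
raw_eval = rng.integers(0, 256, size=(300, 280, 3), dtype=np.uint8)

train_image = augment(normalize(raw_train), rng)    # stochastic, training only
eval_image = normalize(center_crop(raw_eval, 256))  # deterministic
print(train_image.shape, eval_image.shape)
```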

Training walkthrough

Training was run on a workstation with an RTX 3060 Ti, 20-core CPU, and 15.5 GB RAM.

Configuration snapshot

Optimizer: Adam with default beta values (0.9, 0.999).
Loss function: Categorical crossentropy on the 10-class softmax output.
Batch size: 32 (some checkpoints trained with batches of 16 or 64 to compare stability).
Epoch range: 109 training runs spanning 109 epochs depending on the checkpoint.
Learning rate schedule: Manual decay after plateauing validation accuracy (initial lr = 1e-3).
Regularization: Dropout (0.2 to 0.4) and label smoothing (0.05) in later experiments.
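
Two items in this snapshot are easy to show with numbers: label smoothing at 0.05 and decaying the learning rate once validation accuracy plateaus. The sketch below uses the standard smoothing formula (the same one Keras applies for its `label_smoothing` argument). The plateau rule is only an approximation of the manual decay described above: `patience=3`, `factor=0.5`, and the accuracy histories are assumptions, not values from the training logs.

```python
import numpy as np

NUM_CLASSES = 10
LABEL_SMOOTHING = 0.05


def smooth_labels(one_hot: np.ndarray, smoothing: float = LABEL_SMOOTHING) -> np.ndarray:
    """Move a little probability mass from the true class to all classes."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - smoothing) + smoothing / num_classes


def categorical_crossentropy(targets: np.ndarray, probabilities: np.ndarray,
                             eps: float = 1e-7) -> float:
    """Mean cross-entropy between target distributions and predicted probabilities."""
    clipped = np.clip(probabilities, eps, 1.0)
    return float(np.mean(-np.sum(targets * np.log(clipped), axis=-1)))


def decay_on_plateau(lr: float, val_acc_history, patience: int = 3,
                     factor: float = 0.5) -> float:
    """Reduce lr if the last `patience` epochs did not beat the earlier best accuracy."""
    if len(val_acc_history) <= patience:
        return lr
    best_before = max(val_acc_history[:-patience])
    recent_best = max(val_acc_history[-patience:])
    return lr * factor if recent_best <= best_before else lr


one_hot = np.eye(NUM_CLASSES)[[2]]            # true class is index 2
smoothed = smooth_labels(one_hot)             # 0.955 on the true class, 0.005 elsewhere
uniform = np.full((1, NUM_CLASSES), 0.1)      # an uninformed prediction
loss = categorical_crossentropy(smoothed, uniform)

stalled_lr = decay_on_plateau(1e-3, [0.50, 0.60, 0.70, 0.70, 0.69, 0.70])
improving_lr = decay_on_plateau(1e-3, [0.50, 0.60, 0.70, 0.75, 0.80, 0.82])
print(smoothed[0, 2], loss, stalled_lr, improving_lr)
```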

Logging

Training logs capture per-epoch accuracy, loss, and confusion matrices. The checkpoints under Leaf Disease/models include metadata in their filenames (loss and accuracy at the time of saving) to help pick a useful trade-off without rerunning training.
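
Because loss and accuracy are encoded in each filename, checkpoints can be compared without loading them. The sketch below parses the four filenames listed in this card with a regular expression; the pattern and the `CheckpointInfo` and `parse_checkpoint` names are illustrative assumptions based on those filenames, not a utility shipped with the repository.

```python
import re
from typing import NamedTuple, Optional

# Matches the "loss-<float>_acc-<float>.keras" suffix used by the checkpoints in this card.
CHECKPOINT_PATTERN = re.compile(r"loss-(?P<loss>\d+\.\d+)_acc-(?P<acc>\d+\.\d+)\.keras$")


class CheckpointInfo(NamedTuple):
    filename: str
    loss: float
    accuracy: float


def parse_checkpoint(filename: str) -> Optional[CheckpointInfo]:
    """Extract loss and accuracy from a checkpoint filename, or None if it does not match."""
    match = CHECKPOINT_PATTERN.search(filename)
    if match is None:
        return None
    return CheckpointInfo(filename, float(match["loss"]), float(match["acc"]))


filenames = [
    "tomato_disease_detector_loss-0.2826_acc-90.00.keras",
    "tomato_disease_detector_loss-0.2271_acc-63.73.keras",
    "tomato_disease_detector_loss-0.4764_acc-83.93.keras",
    "tomato_disease_detector_loss-0.8962_acc-80.13.keras",
]

parsed = []
for name in filenames:
    info = parse_checkpoint(name)
    if info is not None:
        parsed.append(info)

most_accurate = max(parsed, key=lambda item: item.accuracy)
lowest_loss = min(parsed, key=lambda item: item.loss)
print(most_accurate.filename)
print(lowest_loss.filename)
```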

Evaluation

Metric	Best reported value	Notes
Accuracy	90.00%	Test split, `tomato_disease_detector_loss-0.2826_acc-90.00.keras`
Loss	0.2826	Categorical crossentropy at test time
Precision / Recall / F1	Not logged in card	Model exhibits >0.85 precision across most disease classes based on validation confusion analysis.

Inference stability: Confidence histograms show the top class receives >0.6 probability for high-certainty predictions; predictions below that level should go to human review or an ensemble.
Generalization: Because the data originates from controlled imagery, users should fine-tune on their own field data before deploying in different lighting/soil conditions.
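
Since precision, recall, and F1 are not logged in this card, the following sketch shows how they could be derived per class from a confusion matrix. The 3x3 matrix holds invented counts purely for illustration (the real matrix would be 10x10), and the convention of rows as true classes and columns as predicted classes is an assumption.

```python
import numpy as np


def per_class_metrics(confusion):
    """Per-class precision, recall, and F1 from a confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    true_positive = np.diag(confusion).copy()
    predicted_total = confusion.sum(axis=0)  # column sums: everything predicted as the class
    actual_total = confusion.sum(axis=1)     # row sums: everything that truly is the class

    precision = np.divide(true_positive, predicted_total,
                          out=np.zeros_like(true_positive), where=predicted_total > 0)
    recall = np.divide(true_positive, actual_total,
                       out=np.zeros_like(true_positive), where=actual_total > 0)
    denominator = precision + recall
    f1 = np.divide(2 * precision * recall, denominator,
                   out=np.zeros_like(denominator), where=denominator > 0)
    return precision, recall, f1


# Invented counts for three classes, not results from this model.
toy_confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
precision, recall, f1 = per_class_metrics(toy_confusion)
print(precision, recall, f1)
```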

Model file options

Choose the checkpoint that best fits your scenario:

File	Loss	Accuracy	Best use case
`tomato_disease_detector_loss-0.2826_acc-90.00.keras`	0.2826	90.00%	Recommended production-ready trade-off between accuracy and loss.
`tomato_disease_detector_loss-0.2271_acc-63.73.keras`	0.2271	63.73%	Lowest final loss, useful for experimenting with calibration.
`tomato_disease_detector_loss-0.4764_acc-83.93.keras`	0.4764	83.93%	Alternative architecture checkpoint with faster convergence.
`tomato_disease_detector_loss-0.8962_acc-80.13.keras`	0.8962	80.13%	Baseline comparison to show overfitting mitigation impact.

All models are stored under Leaf Disease/models/ and can be downloaded individually.

Quickstart inference

Dependencies

Install the runtime dependencies:

pip install tensorflow==2.15.0 numpy pillow

Loading the best checkpoint

from tensorflow.keras.models import load_model

model = load_model('Leaf Disease/models/tomato_disease_detector_loss-0.2826_acc-90.00.keras')
model.summary()

Predict a single image

import numpy as np
from PIL import Image

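def predict_disease(image_path: str, model):
    # Load as RGB and resize to the 256x256 input the model expects.
    img = Image.open(image_path).convert('RGB')
    img = img.resize((256, 256))
    # Scale to [0, 1] as float32 and add a batch dimension: (1, 256, 256, 3).
    img_array = np.expand_dims(np.asarray(img, dtype=np.float32) / 255.0, axis=0)

    # Softmax probabilities for the single image in the batch.
    predictions = model.predict(img_array, verbose=0)[0]
    class_idx = int(np.argmax(predictions))
    confidence = float(predictions[class_idx])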

    # Order must match the label indices used during training.
    class_names = [
        'Bacterial Spot',
        'Early Blight',
        'Late Blight',
        'Leaf Mold',
        'Septoria Leaf Spot',
        'Spider Mites',
        'Target Spot',
        'Tomato Yellow Leaf Curl Virus',
        'Tomato Mosaic Virus',
        'Healthy'
    ]

    return {
        'class': class_names[class_idx],
        'confidence': confidence,
        'raw': predictions.tolist()
    }

result = predict_disease('tomato_leaf.jpg', model)
print(f"Predicted {result['class']} with {result['confidence']:.2%} confidence")

Batch prediction helper

from pathlib import Path

def batch_predict(folder: str, model):
    image_paths = list(Path(folder).glob('*.jpg')) + list(Path(folder).glob('*.png'))
    return [
        {**predict_disease(str(path), model), 'file': path.name}
        for path in image_paths
    ]

batch_results = batch_predict('test_images', model)
for res in batch_results:
    print(res['file'], res['class'], res['confidence'])

Tips

Always preprocess new images with the same resize and normalization steps.
Use the 90% accuracy checkpoint for production; keep others for experimentation or transfer learning.
If confidence is below 0.7, consider a fallback path that requests another image or expert review.
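
The fallback tip can be sketched as a small routing step on top of the dictionary returned by `predict_disease`. The 0.7 threshold comes from the tip above; the `route_prediction` name, the `needs_review` and `action` keys, and the sample results are hypothetical.

```python
REVIEW_THRESHOLD = 0.7  # threshold suggested in the tips above


def route_prediction(result: dict, threshold: float = REVIEW_THRESHOLD) -> dict:
    """Flag low-confidence predictions for another image or expert review."""
    needs_review = result["confidence"] < threshold
    return {
        **result,
        "needs_review": needs_review,
        "action": "request_review" if needs_review else "accept",
    }


# Hand-written results in the shape returned by predict_disease (values are made up).
confident = route_prediction({"class": "Late Blight", "confidence": 0.93})
uncertain = route_prediction({"class": "Target Spot", "confidence": 0.55})
print(confident["action"], uncertain["action"])
```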

Deployment notes

Re-export the .keras file with tf.keras.models.save_model(..., save_format='tf') if you need a TensorFlow SavedModel directory.
Convert to TensorFlow Lite or ONNX for deployment on resource-constrained hardware, keeping the input pipeline identical.
Wrap predictions into a REST or gRPC endpoint with input validation (e.g., confirm 256x256 RGB before inference).
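
For the input validation mentioned above, an endpoint could check the decoded image before it reaches the model. This is a minimal sketch under the assumption that the service already decodes uploads into 8-bit NumPy arrays; `validate_input` and its error messages are hypothetical names, not part of the repository.

```python
import numpy as np

EXPECTED_SHAPE = (256, 256, 3)  # height, width, RGB channels


def validate_input(image: np.ndarray) -> np.ndarray:
    """Check shape and dtype, then return a normalized batch of one image."""
    if image.shape != EXPECTED_SHAPE:
        raise ValueError(f"expected shape {EXPECTED_SHAPE}, got {image.shape}")
    if image.dtype != np.uint8:
        # Assumption: callers pass decoded 8-bit RGB pixels.
        raise ValueError(f"expected uint8 pixels, got {image.dtype}")
    return np.expand_dims(image.astype(np.float32) / 255.0, axis=0)


valid_batch = validate_input(np.zeros((256, 256, 3), dtype=np.uint8))
print(valid_batch.shape)

try:
    validate_input(np.zeros((128, 128, 3), dtype=np.uint8))
except ValueError as error:
    print("rejected:", error)
```

Rejecting malformed inputs up front keeps shape errors out of the model call and gives API clients a clearer message.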

Troubleshooting

TensorFlow compatibility: Use TensorFlow 2.15.0 (the version pinned in the install command) or later; reinstall if loader errors mention missing ops.
Image decode errors: Force Image.open(...).convert('RGB') before preprocessing.
Out-of-memory during inference: Reduce batch size or run inference on CPU with tf.device('/CPU:0').
Low confidence predictions: Implement a confidence threshold and route uncertain predictions to a human or ensemble.
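
Reducing the batch size for the out-of-memory case usually means feeding the model fixed-size chunks instead of everything at once. The helper below is a generic sketch of that idea; `chunked` and the batch size of 4 are illustrative, and each chunk would then be preprocessed and passed to the model separately.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical file list; lower batch_size further if memory errors persist.
image_files = [f"leaf_{i}.jpg" for i in range(10)]
batches = list(chunked(image_files, batch_size=4))
print([len(batch) for batch in batches])
```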

Mermaid workflow

flowchart LR
  RawImages[Raw tomato leaf images] --> Preprocess[Preprocessing and augmentation]
  Preprocess --> ModelTraining["Training (multiple checkpoints)"]
  ModelTraining --> Checkpoints[Leaf Disease/models directory]
  Checkpoints --> Inference[Load checkpoint and standardize input]
  Inference --> Output[Prediction + confidence]
  Output --> Feedback[Optional human-in-loop verification]

Contact & acknowledgments

Creator: Gareth Aurelius Harrison (GitHub @theonegareth, Hugging Face @theonegareth).
Acknowledgments: TensorFlow/Keras, PlantVillage dataset curators, the ML and agriculture research communities.
Contribution guide: Fork, extend the dataset, retrain, then submit a PR documenting improvements.

Last Updated: November 30, 2025
Model Version: 1.0
Hugging Face Model: theonegareth/TomatoDiseaseDetector
