---
language:
- en
license: other
tags:
- tensorflow
- keras
- image-classification
- computer-vision
- agriculture
- plant-disease
- tomato
- leaf-disease
- deep-learning
- machine-learning
datasets:
- tomato-leaf-disease-dataset
metrics:
- name: accuracy
  value: 90.00
  calibration: test
- name: loss
  value: 0.2826
  calibration: test
model-index:
- name: Tomato Disease Detector
  results:
  - task:
      type: image-classification
      name: tomato leaf disease detection
    dataset:
      type: tomato-leaf-disease-dataset
      name: Tomato Leaf Disease Dataset
      split: test
    metrics:
    - type: accuracy
      value: 90.00
      name: Test Accuracy
    - type: loss
      value: 0.2826
      name: Test Loss
    model-id: theonegareth/TomatoDiseaseDetector
library: tensorflow
---

# 🍅 Tomato Disease Detector

Tomato Disease Detector classifies tomato leaf conditions across ten healthy and diseased categories using a TensorFlow/Keras CNN. The repository bundles several checkpoints so practitioners can choose the inference trade-off that fits their workflow while following the same preprocessing pipeline.

## Table of contents
- [Model highlights](#model-highlights)
- [Dataset and preprocessing](#dataset-and-preprocessing)
- [Training walkthrough](#training-walkthrough)
- [Evaluation](#evaluation)
- [Model file options](#model-file-options)
- [Quickstart inference](#quickstart-inference)
- [Deployment notes](#deployment-notes)
- [Troubleshooting](#troubleshooting)
- [Mermaid workflow](#mermaid-workflow)

## Model highlights

### Architecture
- **Framework**: TensorFlow 2.x with the Keras Sequential/Functional API.
- **Model type**: Convolutional neural network tuned for 256x256 RGB inputs.
- **Output**: Softmax over 10 classes, yielding top-1 predictions with confidence scores.
- **Inference latency**: ~50 ms per image on an RTX 3060 Ti GPU; CPU throughput depends on how batching is tuned.

### Classes detected
1. Bacterial Spot
2. Early Blight
3. Late Blight
4. Leaf Mold
5. Septoria Leaf Spot
6. Spider Mites (Two-spotted spider mite)
7. Target Spot
8. Tomato Yellow Leaf Curl Virus
9. Tomato Mosaic Virus
10. Healthy

## Dataset and preprocessing

### Source & split
- **Primary source**: Tomato Leaf Disease Dataset (PlantVillage variant) with 1,500+ manually labeled images.
- **Split**: Standard training, validation, and held-out test partitions. Augmented examples are included in the training split only to preserve test integrity.
- **Class balance**: Balanced per class through oversampling and color jitter augmentation on underrepresented diseases.

### Preprocessing & augmentation
- Resize RGB inputs to 256x256 pixels to match the CNN's first layer expectations.
- Normalize pixel ranges to [0,1] by dividing by 255.0.
- Random augmentations (applied during training only) include:
  - horizontal and vertical flips
  - brightness/contrast jitter
  - small rotations and zooms
- Validation and test data are center-cropped and normalized without stochastic augmentation for deterministic evaluation.
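
The normalization and augmentation steps above can be sketched in plain NumPy. This is a minimal illustration, not the training code behind these checkpoints: the helper names `normalize` and `augment` and the 0.1 brightness range are assumptions, and rotations and zooms are left out to keep the example dependency-free.

```python
import numpy as np

def normalize(image: np.ndarray) -> np.ndarray:
    """Scale uint8 pixel values to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0

def augment(image: np.ndarray, rng: np.random.Generator,
            max_brightness_delta: float = 0.1) -> np.ndarray:
    """Randomly flip and brightness-jitter a normalized HxWx3 image.

    The 0.1 jitter range is an illustrative assumption, not a documented setting.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip (reverse the width axis)
    if rng.random() < 0.5:
        image = image[::-1, :, :]  # vertical flip (reverse the height axis)
    delta = rng.uniform(-max_brightness_delta, max_brightness_delta)
    # Clip so the jittered image stays inside the normalized range.
    return np.clip(image + delta, 0.0, 1.0).astype(np.float32)

rng = np.random.default_rng(seed=0)
# Random pixels stand in for a leaf photo already resized to 256x256.
raw = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

train_sample = augment(normalize(raw), rng)  # training path: stochastic
eval_sample = normalize(raw)                 # validation/test path: deterministic
```

Keeping the stochastic step in a separate function makes it easy to guarantee that evaluation data only ever passes through `normalize`.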

## Training walkthrough

Training was run on a workstation with an RTX 3060 Ti, 20-core CPU, and 15.5 GB RAM.

### Configuration snapshot
- **Optimizer**: Adam with default beta values (0.9, 0.999).
- **Loss function**: Categorical crossentropy on the 10-class softmax output.
- **Batch size**: 32 (some checkpoints trained with batches of 16 or 64 to compare stability).
- **Epoch range**: Up to 109 epochs per training run, depending on the checkpoint.
- **Learning rate schedule**: Manual decay after plateauing validation accuracy (initial lr = 1e-3).
- **Regularization**: Dropout (0.2–0.4) and label smoothing (0.05) in later experiments.
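
To make the label smoothing setting concrete, the sketch below applies the usual formula (scale the one-hot target by `1 - smoothing`, then spread `smoothing` evenly over all classes) to a 10-class target with the 0.05 value listed above. The function names are hypothetical, and the code illustrates the arithmetic only; it is not the training loop used for these checkpoints.

```python
import numpy as np

NUM_CLASSES = 10

def smooth_labels(one_hot: np.ndarray, smoothing: float = 0.05) -> np.ndarray:
    """Blend one-hot targets with a uniform distribution over the classes."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - smoothing) + smoothing / num_classes

def categorical_crossentropy(targets: np.ndarray, probs: np.ndarray,
                             eps: float = 1e-7) -> np.ndarray:
    """Per-sample cross-entropy between target and predicted distributions."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(targets * np.log(probs), axis=-1)

one_hot = np.eye(NUM_CLASSES)[[2]]  # a single sample labeled with class index 2
smoothed = smooth_labels(one_hot)   # true class -> 0.955, every other class -> 0.005

uniform_probs = np.full((1, NUM_CLASSES), 1.0 / NUM_CLASSES)
loss = categorical_crossentropy(smoothed, uniform_probs)
```

Because the smoothed target never reaches exactly 1.0, the loss keeps penalizing overconfident predictions, which is the regularizing effect the setting is meant to provide.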

### Logging
Training logs capture per-epoch accuracy, loss, and confusion matrices. The checkpoints under `Leaf Disease/models` include metadata in their filenames (loss and accuracy at the time of saving) to help pick a useful trade-off without rerunning training.
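
Because the loss and accuracy are encoded in each filename, a checkpoint can be chosen programmatically. The sketch below assumes the `loss-<value>_acc-<value>.keras` naming pattern seen in this card's checkpoint filenames; `CheckpointInfo` and `parse_checkpoint_name` are hypothetical helpers, not part of the repository.

```python
import re
from typing import NamedTuple, Optional

class CheckpointInfo(NamedTuple):
    filename: str
    loss: float
    accuracy: float

# Matches the "..._loss-0.2826_acc-90.00.keras" suffix used by the checkpoints.
_PATTERN = re.compile(r"loss-(?P<loss>\d+\.\d+)_acc-(?P<acc>\d+\.\d+)\.keras$")

def parse_checkpoint_name(filename: str) -> Optional[CheckpointInfo]:
    """Extract loss and accuracy from a checkpoint filename, or return None."""
    match = _PATTERN.search(filename)
    if match is None:
        return None
    return CheckpointInfo(filename, float(match["loss"]), float(match["acc"]))

filenames = [
    "tomato_disease_detector_loss-0.2826_acc-90.00.keras",
    "tomato_disease_detector_loss-0.2271_acc-63.73.keras",
    "tomato_disease_detector_loss-0.4764_acc-83.93.keras",
    "tomato_disease_detector_loss-0.8962_acc-80.13.keras",
]

# Keep only the names that follow the pattern, then pick the most accurate one.
checkpoints = [info for info in map(parse_checkpoint_name, filenames) if info is not None]
best = max(checkpoints, key=lambda info: info.accuracy)
print(best.filename)
```

Swapping the key to `info.loss` with `min` would select the lowest-loss checkpoint instead.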

## Evaluation

| Metric | Best reported value | Notes |
| ------ | ------------------- | ----- |
| Accuracy | 90.00% | Test split, `tomato_disease_detector_loss-0.2826_acc-90.00.keras` |
| Loss | 0.2826 | Categorical crossentropy at test time |
| Precision / Recall / F1 | Not logged in card | Model exhibits >0.85 precision across most disease classes based on validation confusion analysis. |

- **Inference stability**: Confidence histograms show the top class receives >0.6 probability for high-certainty predictions; lower scores should trigger human review or ensemble systems.
- **Generalization**: Because the data originates from controlled imagery, users should fine-tune on their own field data before deploying in different lighting/soil conditions.
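
Since per-class precision, recall, and F1 are not logged in this card, they can be derived from a confusion matrix when re-evaluating on your own data. The sketch below assumes rows are true classes and columns are predicted classes; the 3x3 matrix holds toy values for illustration and is not a result from this model.

```python
import numpy as np

def per_class_metrics(confusion):
    """Return (precision, recall, f1) arrays from a square confusion matrix.

    Assumes confusion[i, j] counts samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    true_pos = np.diag(confusion)
    predicted = confusion.sum(axis=0)  # column sums: how often each class was predicted
    actual = confusion.sum(axis=1)     # row sums: how often each class truly occurred

    # Guard against division by zero for classes that never appear.
    precision = np.divide(true_pos, predicted,
                          out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, actual,
                       out=np.zeros_like(true_pos), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    return precision, recall, f1

# Toy 3-class matrix (illustrative values only).
toy_confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
precision, recall, f1 = per_class_metrics(toy_confusion)
```

For the 10-class model, the same function applies unchanged to a 10x10 matrix built from test-set predictions.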

## Model file options

Choose the checkpoint that best fits your scenario:

| File | Loss | Accuracy | Best use case |
| ---- | ---- | -------- | ------------- |
| `tomato_disease_detector_loss-0.2826_acc-90.00.keras` | 0.2826 | 90.00% | Recommended production-ready trade-off between accuracy and loss. |
| `tomato_disease_detector_loss-0.2271_acc-63.73.keras` | 0.2271 | 63.73% | Lowest final loss, useful for experimenting with calibration. |
| `tomato_disease_detector_loss-0.4764_acc-83.93.keras` | 0.4764 | 83.93% | Alternative architecture checkpoint with faster convergence. |
| `tomato_disease_detector_loss-0.8962_acc-80.13.keras` | 0.8962 | 80.13% | Baseline comparison to show overfitting mitigation impact. |

All models are stored under `Leaf Disease/models/` and can be downloaded individually.

## Quickstart inference

### Dependencies
Install the runtime dependencies:
```bash
pip install tensorflow==2.15.0 numpy pillow
```

### Loading the best checkpoint
```python
from tensorflow.keras.models import load_model

model = load_model('Leaf Disease/models/tomato_disease_detector_loss-0.2826_acc-90.00.keras')
model.summary()
```

### Predict a single image
```python
import numpy as np
from PIL import Image

def predict_disease(image_path: str, model):
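    # Load as 3-channel RGB and resize to the 256x256 input the model expects.
    img = Image.open(image_path).convert('RGB')
    img = img.resize((256, 256))
    # Scale to [0, 1] as float32 and add a leading batch dimension.
    img_array = np.expand_dims(np.asarray(img, dtype=np.float32) / 255.0, axis=0)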

    predictions = model.predict(img_array, verbose=0)[0]
    class_idx = int(np.argmax(predictions))
    confidence = float(predictions[class_idx])

    class_names = [
        'Bacterial Spot',
        'Early Blight',
        'Late Blight',
        'Leaf Mold',
        'Septoria Leaf Spot',
        'Spider Mites',
        'Target Spot',
        'Tomato Yellow Leaf Curl Virus',
        'Tomato Mosaic Virus',
        'Healthy'
    ]

    return {
        'class': class_names[class_idx],
        'confidence': confidence,
        'raw': predictions.tolist()
    }

result = predict_disease('tomato_leaf.jpg', model)
print(f"Predicted {result['class']} with {result['confidence']:.2%} confidence")
```

### Batch prediction helper
```python
from pathlib import Path

def batch_predict(folder: str, model):
    image_paths = list(Path(folder).glob('*.jpg')) + list(Path(folder).glob('*.png'))
    return [
        {**predict_disease(str(path), model), 'file': path.name}
        for path in image_paths
    ]

batch_results = batch_predict('test_images', model)
for res in batch_results:
    print(res['file'], res['class'], res['confidence'])
```

### Tips
- Always preprocess new images with the same resize and normalization steps.
- Use the 90% accuracy checkpoint for production; keep others for experimentation or transfer learning.
- If confidence is below 0.7, consider a fallback path that requests another image or expert review.
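
The fallback path from the last tip can be sketched as a small routing step applied to a prediction dictionary with `class` and `confidence` keys. The function name `route_prediction`, the `needs_review` and `action` fields, and the sample inputs are hypothetical; only the 0.7 cut-off comes from the tip above.

```python
from typing import Any, Dict

REVIEW_THRESHOLD = 0.7  # cut-off suggested in the tips; tune it for your own data

def route_prediction(result: Dict[str, Any],
                     threshold: float = REVIEW_THRESHOLD) -> Dict[str, Any]:
    """Flag low-confidence predictions for another image or expert review."""
    needs_review = result["confidence"] < threshold
    return {
        **result,
        "needs_review": needs_review,
        "action": "request_review" if needs_review else "accept",
    }

# Hand-written sample inputs, not real model outputs.
confident = route_prediction({"class": "Early Blight", "confidence": 0.93})
uncertain = route_prediction({"class": "Target Spot", "confidence": 0.55})
print(confident["action"], uncertain["action"])
```

Returning a new dictionary rather than mutating the input keeps the original prediction available for logging.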

## Deployment notes
- Re-save the loaded model with `tf.keras.models.save_model(..., save_format='tf')` if you need a TensorFlow SavedModel directory instead of a single `.keras` file.
- Convert to TensorFlow Lite or ONNX for deployment on resource-constrained hardware, keeping the input pipeline identical.
- Wrap predictions into a REST or gRPC endpoint with input validation (e.g., confirm 256x256 RGB before inference).
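
As one possible shape for the input validation mentioned above, the sketch below checks a decoded image array before it reaches the model. It assumes the endpoint has already decoded the upload into a `uint8` NumPy array; `validate_and_prepare` is a hypothetical helper, and rejecting (rather than resizing) mismatched inputs is a design choice, not a requirement of the model.

```python
import numpy as np

EXPECTED_SHAPE = (256, 256, 3)  # height, width, RGB channels

def validate_and_prepare(image) -> np.ndarray:
    """Validate a decoded image and return a normalized batch of size 1.

    Raises ValueError when the input is not a 256x256 RGB uint8 array.
    """
    image = np.asarray(image)
    if image.shape != EXPECTED_SHAPE:
        raise ValueError(f"expected shape {EXPECTED_SHAPE}, got {image.shape}")
    if image.dtype != np.uint8:
        raise ValueError(f"expected uint8 pixels, got {image.dtype}")
    # Same normalization as the quickstart: scale to [0, 1], add batch axis.
    return np.expand_dims(image.astype(np.float32) / 255.0, axis=0)

valid_batch = validate_and_prepare(np.zeros((256, 256, 3), dtype=np.uint8))

try:
    validate_and_prepare(np.zeros((256, 256), dtype=np.uint8))  # grayscale input
    rejected = False
except ValueError:
    rejected = True
```

In a REST handler, the `ValueError` would typically map to a 4xx response so clients can correct the upload.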

## Troubleshooting
1. **TensorFlow compatibility**: Use TensorFlow 2.15.0 (the version pinned in the install command) or later; reinstall if loader errors mention missing ops.
2. **Image decode errors**: Force `Image.open(...).convert('RGB')` before preprocessing.
3. **Out-of-memory during inference**: Reduce the batch size, or run inference on the CPU inside a `with tf.device('/CPU:0'):` block.
4. **Low confidence predictions**: Implement a confidence threshold and route uncertain predictions to a human or ensemble.
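
For the out-of-memory case, one way to reduce the batch size is to feed images to the model in fixed-size chunks and concatenate the results. The sketch below is framework-agnostic: `predict_fn` stands for any callable that maps a batch to class probabilities (for example `model.predict`), and `fake_predict` is a stand-in used only so the example runs without a model. Keras's `model.predict` also accepts a `batch_size` argument, which may be all you need.

```python
from typing import Callable, Iterator
import numpy as np

def iter_chunks(images: np.ndarray, chunk_size: int) -> Iterator[np.ndarray]:
    """Yield consecutive slices of at most chunk_size images."""
    for start in range(0, len(images), chunk_size):
        yield images[start:start + chunk_size]

def predict_in_chunks(images: np.ndarray,
                      predict_fn: Callable[[np.ndarray], np.ndarray],
                      chunk_size: int = 8) -> np.ndarray:
    """Run predict_fn chunk by chunk and stack the outputs in input order."""
    outputs = [predict_fn(chunk) for chunk in iter_chunks(images, chunk_size)]
    return np.concatenate(outputs, axis=0)

def fake_predict(batch: np.ndarray) -> np.ndarray:
    """Stand-in for model.predict: uniform probabilities over 10 classes."""
    return np.full((len(batch), 10), 0.1, dtype=np.float32)

images = np.zeros((20, 256, 256, 3), dtype=np.float32)  # 20 blank placeholder images
probs = predict_in_chunks(images, fake_predict, chunk_size=8)  # chunks of 8, 8, 4
```

Smaller chunks lower peak memory at the cost of more calls, so the chunk size is worth tuning per device.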

## Mermaid workflow

```mermaid
flowchart LR
  RawImages[Raw tomato leaf images] --> Preprocess[Preprocessing and augmentation]
  Preprocess --> ModelTraining["Training (multiple checkpoints)"]
  ModelTraining --> Checkpoints[Leaf Disease/models directory]
  Checkpoints --> Inference[Load checkpoint and standardize input]
  Inference --> Output[Prediction + confidence]
  Output --> Feedback[Optional human-in-loop verification]
```

## Contact & acknowledgments
- **Creator**: Gareth Aurelius Harrison ([GitHub @theonegareth](https://github.com/theonegareth), [Hugging Face @theonegareth](https://huggingface.co/theonegareth)).
- **Acknowledgments**: TensorFlow/Keras, PlantVillage dataset curators, the ML and agriculture research communities.
- **Contribution guide**: Fork, extend the dataset, retrain, then submit a PR documenting improvements.

---

- **Last Updated**: November 30, 2025
- **Model Version**: 1.0
- **Hugging Face Model**: [theonegareth/TomatoDiseaseDetector](https://huggingface.co/theonegareth/TomatoDiseaseDetector)