Fine-tuned VGG-16 Model for Gunshot Detection
This is a fine-tuned VGG-16 model for detecting gunshots in audio recordings. The model was trained on a dataset of audio clips labeled as either "gunshot" or "background".
Model Details
- Trained by: Ranabir Saha
- Fine-tuned on: Tropical forest gunshot classification training audio dataset from "Automated detection of gunshots in tropical forests using convolutional neural networks" (Katsis et al. 2022)
- Dataset Source: https://doi.org/10.17632/x48cwz364j.3
- Input: Preprocessed mel-spectrograms (224x224x3) loaded from .npy files, generated from 4-second audio clips
- Output: Binary classification (Gunshot/Background)
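The card does not describe how the .npy inputs were produced, so the following is only a minimal sketch of how a 4-second clip could be turned into a mel-spectrogram array. The sample rate, FFT size, hop length, number of mel bins, and the function name `clip_to_mel_db` are all assumptions chosen for illustration, not the settings used for this model.

```python
import math

import numpy as np
import tensorflow as tf


def clip_to_mel_db(waveform, sample_rate=8000, n_fft=1024, hop=256, n_mels=128):
    """Convert a mono waveform to a log-mel spectrogram of shape (n_mels, frames).

    All default parameter values are hypothetical; the model card does not
    state the settings used to build the training data.
    """
    waveform = tf.convert_to_tensor(waveform, dtype=tf.float32)
    # Short-time Fourier transform: (frames, n_fft // 2 + 1), complex-valued
    stft = tf.signal.stft(waveform, frame_length=n_fft, frame_step=hop, fft_length=n_fft)
    power = tf.abs(stft) ** 2
    # Matrix that projects linear-frequency bins onto mel bands
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=n_mels,
        num_spectrogram_bins=n_fft // 2 + 1,
        sample_rate=sample_rate,
        lower_edge_hertz=0.0,
        upper_edge_hertz=sample_rate / 2,
    )
    mel = tf.matmul(power, mel_matrix)  # (frames, n_mels)
    # Convert power to decibels; the small offset avoids log(0)
    mel_db = 10.0 * tf.math.log(mel + 1e-10) / math.log(10.0)
    return tf.transpose(mel_db).numpy()  # (n_mels, frames)
```

A clip processed this way could then be stored with `np.save("clip_0001.npy", clip_to_mel_db(waveform))` (hypothetical file name) and passed to the loading function in the Usage section, which handles normalization, resizing, and channel replication.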
Training
The model was trained using the following parameters:
- Base Model: VGG-16 pre-trained on ImageNet
- Optimizer: Adam (initial learning rate=1e-4, fine-tuning learning rate=1e-5)
- Loss Function: Categorical cross-entropy
- Metrics: Accuracy, Precision, Recall
- Batch Size: 32
- Initial Training: Up to 25 epochs with early stopping (patience=5) on validation loss
- Fine-tuning: Last 8 layers unfrozen, up to 10 epochs with early stopping (patience=5)
- Class Weights: Balanced to handle class imbalance
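The parameters above could be wired together roughly as follows. This is a sketch, not the actual training script: the classification head (global average pooling plus a 2-unit softmax layer), the choice to count the "last 8 layers" across the whole model, and the scikit-learn style "balanced" weight formula are assumptions, and `train_ds`, `val_ds`, and `train_labels` are hypothetical names.

```python
import numpy as np
import tensorflow as tf


def balanced_class_weights(labels, n_classes=2):
    """Weight each class by n_samples / (n_classes * class_count) (assumed formula)."""
    counts = np.bincount(labels, minlength=n_classes)
    return {c: float(len(labels) / (n_classes * counts[c])) for c in range(n_classes)}


def build_model(weights="imagenet"):
    """VGG-16 base with a small softmax head; the head design is an assumption."""
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=(224, 224, 3)
    )
    base.trainable = False  # freeze the convolutional base for initial training
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
    return tf.keras.Model(base.input, outputs)


def compile_model(model, learning_rate):
    """Compile with the optimizer, loss, and metrics listed in the card."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
    )


def unfreeze_last_layers(model, n=8):
    """Make only the last n layers of the model trainable."""
    for layer in model.layers[:-n]:
        layer.trainable = False
    for layer in model.layers[-n:]:
        layer.trainable = True


# Early stopping on validation loss, as described in the card
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)

# Two-phase schedule (train_ds, val_ds, and train_labels are hypothetical):
# model = build_model()
# class_weight = balanced_class_weights(train_labels)
# compile_model(model, 1e-4)
# model.fit(train_ds, validation_data=val_ds, epochs=25,
#           class_weight=class_weight, callbacks=[early_stop])
# unfreeze_last_layers(model, 8)
# compile_model(model, 1e-5)  # recompile so the trainable change takes effect
# model.fit(train_ds, validation_data=val_ds, epochs=10,
#           class_weight=class_weight, callbacks=[early_stop])
```

Recompiling after unfreezing matters because Keras fixes the set of trainable weights at compile time; the lower learning rate in the second phase reduces the risk of overwriting the pre-trained ImageNet features.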
Usage
To run inference, load the model from the Hugging Face Hub and pass preprocessed mel-spectrograms as input.
Example
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(repo_id="ranvir-not-found/vgg16-sda_gunshot-detection", filename="vgg16_model.keras")
model = tf.keras.models.load_model(model_path)

# Function to load and preprocess .npy file
def load_and_preprocess_npy(file_path):
    mel_spectrogram = np.load(file_path)
    # Min-max normalization to the [0, 255] range
    spec_min = np.min(mel_spectrogram)
    spec_max = np.max(mel_spectrogram)
    if spec_max > spec_min:
        mel_spectrogram = 255 * (mel_spectrogram - spec_min) / (spec_max - spec_min)
    else:
        # Constant input: avoid division by zero
        mel_spectrogram = np.zeros_like(mel_spectrogram)
    mel_spectrogram = mel_spectrogram.astype(np.float32)
    # Resize to 224x224 (adds a channel axis first; assumes a 2-D array)
    mel = tf.image.resize(mel_spectrogram[..., np.newaxis], (224, 224))
    # Repeat to create 3 channels
    mel = tf.repeat(mel, 3, axis=-1)
    # Apply VGG-16 preprocessing
    mel = tf.keras.applications.vgg16.preprocess_input(mel)
    return mel

# Example usage
npy_path = "path/to/your/spectrogram.npy"
input_data = load_and_preprocess_npy(npy_path)
input_data = tf.expand_dims(input_data, axis=0)  # Add batch dimension
predictions = model.predict(input_data)
# The index order must match the label encoding used during training
class_names = ['gunshot', 'background']
predicted_class = class_names[np.argmax(predictions[0])]
print(f"Predicted class: {predicted_class}, Probabilities: {predictions[0]}")
Evaluation
The model was evaluated on a validation set, and the following metrics were computed:
- Confusion Matrix
- ROC Curve
- Precision-Recall Curve
- Classification Report
The evaluation results are saved in the evaluation_results directory. The model was optimized for recall on the 'gunshot' class.
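To illustrate what prioritizing recall on the 'gunshot' class can look like, here is a small NumPy sketch that computes a confusion matrix, precision, and recall at a chosen decision threshold. The card does not say how recall was optimized; lowering the threshold on the gunshot probability is one common option and is only an assumption here. The function name `gunshot_metrics` and the sample values are made up for illustration.

```python
import numpy as np


def gunshot_metrics(y_true, gunshot_prob, threshold=0.5):
    """Confusion matrix, precision, and recall for the 'gunshot' class.

    y_true: 1 for gunshot, 0 for background.
    gunshot_prob: predicted probability of the gunshot class.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(gunshot_prob) >= threshold
    tp = int(np.sum(y_true & y_pred))    # gunshots correctly detected
    fp = int(np.sum(~y_true & y_pred))   # background flagged as gunshot
    fn = int(np.sum(y_true & ~y_pred))   # gunshots missed
    tn = int(np.sum(~y_true & ~y_pred))  # background correctly rejected
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Rows are true classes, columns are predictions, ordered [background, gunshot]
    return {"confusion_matrix": [[tn, fp], [fn, tp]], "precision": precision, "recall": recall}


# Made-up labels and probabilities, for illustration only
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.3, 0.4, 0.2, 0.7]
default = gunshot_metrics(y_true, probs, threshold=0.5)
lenient = gunshot_metrics(y_true, probs, threshold=0.3)
print(default["recall"], lenient["recall"])
```

In this toy example, lowering the threshold from 0.5 to 0.3 catches the previously missed gunshot (recall rises to 1.0) at the cost of one extra false alarm, which is the usual trade-off when missed detections are more costly than false positives.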
For more details, please refer to the training script and logs.