# Samid drone detector

Audio Spectrogram Transformer fine-tuned for binary acoustic drone detection.
## Quick use
```python
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
import torch
import soundfile as sf

model = AutoModelForAudioClassification.from_pretrained("Rashidbm/samid-drone-detector")
fe = AutoFeatureExtractor.from_pretrained("Rashidbm/samid-drone-detector")

# clip.wav must be 16 kHz mono; downmix/resample first if it is not
audio, sr = sf.read("clip.wav")
inputs = fe(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(f"p(drone) = {torch.softmax(logits, dim=-1)[0, 1].item():.4f}")
```
## Training
- Backbone: MIT/ast-finetuned-audioset-10-10-0.4593
- Datasets: geronimobasso/drone-audio-detection-samples, ahlab-drone-project/DroneAudioSet (splits 1-20)
- Augmentations applied symmetrically to both classes: codec round-trip, synthetic RIR, random EQ, FilterAugment, Patchout, SpecAugment, Mixup
- Asymmetric augmentation: urban-noise overlay onto the drone class only
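Of the listed augmentations, Mixup is the simplest to illustrate. The sketch below is a generic waveform-level version for intuition only; the card does not state whether Mixup was applied to waveforms or spectrograms, and `alpha` here is an assumed hyperparameter, not the one used in training.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their one-hot labels with a Beta-sampled weight.

    x1, x2: waveform arrays of equal length; y1, y2: one-hot label arrays.
    alpha: Beta distribution concentration (illustrative default).
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient in (0, 1)
    x_mixed = lam * x1 + (1.0 - lam) * x2
    y_mixed = lam * y1 + (1.0 - lam) * y2  # soft label, still sums to 1
    return x_mixed, y_mixed
```

Because the mixed label is a convex combination of the two one-hot labels, the model is trained against soft targets, which tends to smooth decision boundaries between the drone and no-drone classes.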
## Performance
| Test | Result |
|---|---|
| NUS DroneAudioSet held-out (48 clips) | 100% detection |
| Geronimobasso sanity (50 random clips) | 24/25 drone, 25/25 no-drone |
## Recommended inference
For long audio, slide a 1-second window with a 0.5 s hop, median-filter the per-window probabilities, and require N consecutive windows above threshold before reporting a detection. See `scripts/standalone_inference.py` in the repo.
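The windowed post-processing described above can be sketched as follows. This is a standalone illustration, not the actual `scripts/standalone_inference.py`; the median kernel size, threshold, and consecutive-window count are assumed defaults.

```python
import numpy as np

def detect_drone(probs, hop_s=0.5, median_k=5, threshold=0.5, n_consec=3):
    """Decide drone presence from per-window p(drone) values.

    probs: sequence of per-window probabilities (one per 0.5 s hop).
    Returns (detected, time_s) where time_s is the hop position (in seconds)
    at which n_consec consecutive smoothed windows first exceeded threshold.
    """
    probs = np.asarray(probs, dtype=float)
    # Median-filter the probability track to suppress one-off spikes.
    pad = median_k // 2
    padded = np.pad(probs, pad, mode="edge")
    smoothed = np.array(
        [np.median(padded[i:i + median_k]) for i in range(len(probs))]
    )
    # Require n_consec consecutive windows above threshold.
    run = 0
    for i, above in enumerate(smoothed > threshold):
        run = run + 1 if above else 0
        if run >= n_consec:
            return True, i * hop_s
    return False, None
```

A single spiky window is absorbed by the median filter, while a sustained run of high-probability windows triggers a detection; tune `threshold` and `n_consec` against your tolerated false-alarm rate.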
The model was trained on 1.0 s windows at 16 kHz mono. It may produce false positives on other rotor-like sounds.
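Since the model expects 16 kHz mono input, clips recorded at other rates or in stereo need to be prepared first. A minimal NumPy-only sketch (linear interpolation for brevity; a polyphase resampler such as `scipy.signal.resample_poly` is preferable in production):

```python
import numpy as np

def to_16k_mono(audio, sr, target_sr=16000):
    """Downmix to mono and resample to target_sr by linear interpolation."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:  # (frames, channels), as returned by soundfile.read
        audio = audio.mean(axis=1)
    if sr != target_sr:
        n_out = int(round(len(audio) * target_sr / sr))
        t_in = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
        t_out = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
        audio = np.interp(t_out, t_in, audio).astype(np.float32)
    return audio
```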