# Speech Truncation Detection (12M)

Frame-wise speech truncation detector for identifying whether audio ends in a likely cutoff.

The model evaluates the tail of each input waveform and produces per-frame truncation probabilities. A clip-level score is then computed from the final portion of the sequence and thresholded to return a boolean `is_truncated` prediction.
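To make the decision rule above concrete, here is a minimal sketch of how per-frame probabilities might be pooled into a clip-level score and thresholded. The pooling window (`TAIL_FRAMES`) and threshold value are assumptions for illustration only; the model's internal values may differ.

```python
import numpy as np

# Hypothetical illustration of the decision rule described above; the
# actual pooling window and threshold inside the model may differ.
TAIL_FRAMES = 10   # assumed number of trailing frames pooled
THRESHOLD = 0.5    # assumed decision threshold

def clip_decision(frame_probs: np.ndarray) -> tuple[float, bool]:
    """Reduce per-frame truncation probabilities to a clip-level score and decision."""
    confidence = float(frame_probs[-TAIL_FRAMES:].mean())
    return confidence, confidence >= THRESHOLD

# A clip whose trailing frames look truncated
probs = np.concatenate([np.full(40, 0.05), np.full(10, 0.9)])
score, is_truncated = clip_decision(probs)
print(score, is_truncated)
```

Because only the tail is pooled, a clip with low probabilities everywhere except its final frames is still flagged as truncated.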
## Intended Use

This model is intended for speech recordings where you want to decide whether the spoken audio ends abruptly (truncated) or ends naturally.

Use it in speech-oriented pipelines such as dataset quality checks, ingestion validation, or gating before downstream ASR or transcript-correction steps.

This model is not intended for non-speech audio classification tasks (music, environmental sounds, general acoustic event detection).
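As one example of the gating use case, a pipeline might drop flagged clips before they reach ASR. The sketch below uses a hypothetical `predict_is_truncated` stub in place of an actual model call, purely to show the shape of such a filter:

```python
# Hypothetical gating step for an ingestion pipeline; predict_is_truncated
# is a stand-in for calling model.predict_from_audio on each clip.
def predict_is_truncated(clip_id: str) -> bool:
    # Stub: pretend clips whose id ends in "_cut" were flagged as truncated.
    return clip_id.endswith("_cut")

def gate_clips(clip_ids: list[str]) -> list[str]:
    """Keep only clips that were not flagged as truncated."""
    return [c for c in clip_ids if not predict_is_truncated(c)]

print(gate_clips(["utt_001", "utt_002_cut", "utt_003"]))  # ['utt_001', 'utt_003']
```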
## Quick Start

```python
import numpy as np
from transformers import AutoModel

repo_id = "mythicinfinity/speech-truncation-detection-12M"
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# 2 s of silence at 16 kHz as a placeholder input
sr = 16000
wav = np.zeros(sr * 2, dtype=np.float32)

out = model.predict_from_audio(audio=wav, sampling_rate=sr)
is_truncated = bool(out.is_truncated[0].item())
confidence = float(out.confidence_score[0].item())
print(is_truncated, confidence)
```
## Usage Examples

### Single waveform

```python
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("mythicinfinity/speech-truncation-detection-12M", trust_remote_code=True)
waveform = np.random.randn(32000).astype('float32')  # 2 s at 16 kHz
out = model.predict_from_audio(audio=waveform, sampling_rate=16000)
print(float(out.confidence_score[0]), bool(out.is_truncated[0]))
```
### Batched tensor of waveforms (`[batch, time]`)

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("mythicinfinity/speech-truncation-detection-12M", trust_remote_code=True)
audio_batch = torch.randn(4, 32000)  # 4 clips, each 2 s at 16 kHz
out = model.predict_from_audio(audio=audio_batch, sampling_rate=16000)
print(out.confidence_score.shape)  # torch.Size([4])
print(out.is_truncated.shape)      # torch.Size([4])
```
### List of waveforms (variable lengths) with per-item sample rates

```python
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("mythicinfinity/speech-truncation-detection-12M", trust_remote_code=True)
audio_list = [
    np.random.randn(24000).astype('float32'),  # 1.5 s @ 16 kHz
    np.random.randn(8000).astype('float32'),   # 1.0 s @ 8 kHz
]
sample_rates = [16000, 8000]
out = model.predict_from_audio(audio=audio_list, sampling_rate=sample_rates)
print(out.confidence_score)
print(out.is_truncated)
```

Accepted input forms: a single tensor/ndarray waveform, a batched tensor (`[B, T]`), or a list/tuple of waveforms.
## Outputs

- `truncation_probabilities`: per-frame probability of truncation (`[batch, time]`)
- `confidence_score`: clip-level truncation score (`[batch]`)
- `is_truncated`: clip-level boolean decision (`[batch]`)
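To illustrate how these fields relate to one another, the sketch below fabricates arrays with the documented shapes; the tail-pooling window and 0.5 threshold are assumptions, not the model's actual internals.

```python
import numpy as np

# Illustrative only: fabricated arrays with the documented output shapes,
# showing how per-frame probabilities relate to the clip-level fields.
rng = np.random.default_rng(0)
truncation_probabilities = rng.random((4, 50))  # [batch, time]

# Assumed reduction: pool the trailing frames, then threshold at 0.5
confidence_score = truncation_probabilities[:, -10:].mean(axis=1)  # [batch]
is_truncated = confidence_score >= 0.5                             # [batch]

print(truncation_probabilities.shape)  # (4, 50)
print(confidence_score.shape)          # (4,)
print(is_truncated.shape)              # (4,)
```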
## Non-Remote-Code Fallback

If you prefer not to use `trust_remote_code=True`, download the artifact files (`config.json`, `model.safetensors`, `configuration_*.py`, `modeling_*.py`, `processing_*.py`) into a local directory and load the model from those local files:

```python
import json
import sys
from pathlib import Path

from safetensors.torch import load_file

# Make the downloaded configuration/modeling modules importable
artifact_dir = Path('/path/to/downloaded-artifact')
sys.path.insert(0, str(artifact_dir))

from configuration_speech_truncation_detection import SpeechTruncationDetectionConfig
from modeling_speech_truncation_detection import SpeechTruncationDetectionModel

# Build the config and model, then load the safetensors weights
cfg_payload = json.loads((artifact_dir / 'config.json').read_text(encoding='utf-8'))
cfg = SpeechTruncationDetectionConfig(**cfg_payload)
model = SpeechTruncationDetectionModel(cfg)
state = load_file(str(artifact_dir / 'model.safetensors'), device='cpu')
model.load_state_dict(state, strict=True)
model.eval()
```