How to use marcosremar2/iaratts-sft-v4 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-to-speech", model="marcosremar2/iaratts-sft-v4", trust_remote_code=True)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("marcosremar2/iaratts-sft-v4", trust_remote_code=True, dtype="auto")

| Model | WER | Δ vs v3 |
|---|---|---|
| SFT v3 (10 epochs) | 0.1928 | baseline |
| SFT v4 (this) | 0.1646 | −14.6% relative |
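
The relative figure in the last column can be checked directly from the two WER values in the table. A minimal sketch of that arithmetic; the function name `relative_wer_change` is chosen here for illustration and is not part of the model's code:

```python
def relative_wer_change(baseline_wer: float, new_wer: float) -> float:
    """Return the change in WER as a signed fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (new_wer - baseline_wer) / baseline_wer


# WER values copied from the table above.
v3_wer = 0.1928
v4_wer = 0.1646

change = relative_wer_change(v3_wer, v4_wer)
print(f"{change:+.1%}")  # -14.6%
```

The absolute drop is 0.0282 WER; dividing by the v3 baseline gives the relative reduction reported in the table.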
<laugh>, <sigh>, <yawn>, <gasp>, <groan>, <chuckle>, <cough>, <sniffle>

from transformers import AutoModel, AutoTokenizer
m = AutoModel.from_pretrained("marcosremar2/iaratts-sft-v4", trust_remote_code=True)
t = AutoTokenizer.from_pretrained("marcosremar2/iaratts-sft-v4", trust_remote_code=True)
# Tags as single tokens:
ids = t.encode("<laugh> Que dia! Estou exausto.", add_special_tokens=False)
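
To confirm that each tag really maps to a single token, one option is to encode the tags one by one and report any that split into several ids. The sketch below is an assumption-laden helper, not part of the released code: `find_multi_token_tags` is a hypothetical name, and it takes any `encode` callable so it can be tried without downloading the checkpoint.

```python
from typing import Callable, Dict, Iterable, List

# Tag list copied from the model card.
TAGS = ["<laugh>", "<sigh>", "<yawn>", "<gasp>",
        "<groan>", "<chuckle>", "<cough>", "<sniffle>"]


def find_multi_token_tags(
    encode: Callable[[str], List[int]],
    tags: Iterable[str] = TAGS,
) -> Dict[str, List[int]]:
    """Return every tag that does NOT encode to exactly one token id."""
    offenders = {}
    for tag in tags:
        ids = encode(tag)
        if len(ids) != 1:
            offenders[tag] = ids
    return offenders


# Usage with the real tokenizer (hypothetical wiring; downloads the checkpoint):
# from transformers import AutoTokenizer
# t = AutoTokenizer.from_pretrained("marcosremar2/iaratts-sft-v4", trust_remote_code=True)
# print(find_multi_token_tags(lambda s: t.encode(s, add_special_tokens=False)))
```

An empty dictionary means every tag is a single token; any entry shows a tag that was split and the ids it was split into.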
CLI via MOSS infer.py:
python infer.py --checkpoint marcosremar2/iaratts-sft-v4 \
--audio-tokenizer-pretrained-name-or-path OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano \
--text "<sigh> Encontrei um erro no código." \
--output-audio-path out.wav --mode continuation --seed 42
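
For scripted use, the same command can be assembled as an argument list in Python. This is a sketch only: it reuses exactly the flags shown in the command above, assumes `infer.py` sits in the current working directory, and `build_infer_command` is a hypothetical helper name.

```python
import shlex
from typing import List

CHECKPOINT = "marcosremar2/iaratts-sft-v4"
AUDIO_TOKENIZER = "OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano"


def build_infer_command(text: str, output_path: str, seed: int = 42) -> List[str]:
    """Build the infer.py argument list using only the flags from the CLI example."""
    return [
        "python", "infer.py",
        "--checkpoint", CHECKPOINT,
        "--audio-tokenizer-pretrained-name-or-path", AUDIO_TOKENIZER,
        "--text", text,
        "--output-audio-path", output_path,
        "--mode", "continuation",
        "--seed", str(seed),
    ]


cmd = build_infer_command("<sigh> Encontrei um erro no código.", "out.wav")
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal

# To actually run it (assumption: infer.py is in the working directory):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with the angle brackets in the tags.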
Phase 4 wave complete. Next: Phase 4.4 (IndexTTS2 instruction LM) or Phase 5.5 (CosyVoice-2 distill to 150M streaming).
License: MIT (same as upstream MOSS-TTS-Nano).
Base model: OpenMOSS-Team/MOSS-TTS-Nano-100M