Bhili TTS

A text-to-speech model for the Dehvali dialect of Bhili (भीली), an Indo-Aryan language spoken by the Bhil community in western India.

This model is ai4bharat/indic-parler-tts fine-tuned on 10 hours of studio-quality Bhili speech data. It takes a Bhili sentence in Devanagari script and a natural-language voice description, and produces 44.1 kHz speech.

Quick start

Install

pip install git+https://github.com/huggingface/parler-tts.git
pip install transformers soundfile
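
Before loading the model, it can help to confirm that the packages resolved correctly. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the list of import names is an assumption based on the imports used in the generation example.

```python
from importlib.util import find_spec

# Import names (not pip package names) that the quick start relies on.
# This list is an assumption drawn from the imports in the example script.
REQUIRED_MODULES = ["torch", "parler_tts", "transformers", "soundfile"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for large packages such as torch.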

Generate speech

import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"

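# Load the fine-tuned model and the tokenizer used for the Bhili prompt.
model = ParlerTTSForConditionalGeneration.from_pretrained("ai4bharat/bhili-tts").to(device)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/bhili-tts")
# The voice description is tokenized with the text encoder's own tokenizer.
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)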

prompt = "अजैविक ताण व्यवस्थापन खातुर शेतकरी मल्चिंगु वापर केएते."
description = "A male speaker with a clear, moderate-paced voice. The recording is clean with no background noise."

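# Tokenize both inputs and keep the attention masks so they can be passed to generate().
description_inputs = description_tokenizer(description, return_tensors="pt").to(device)
prompt_inputs = tokenizer(prompt, return_tensors="pt").to(device)

generation = model.generate(
    input_ids=description_inputs.input_ids,
    attention_mask=description_inputs.attention_mask,
    prompt_input_ids=prompt_inputs.input_ids,
    prompt_attention_mask=prompt_inputs.attention_mask,
)
# Move the generated waveform to the CPU and flatten it to a 1-D NumPy array.
audio = generation.cpu().numpy().squeeze()
sf.write("bhili_out.wav", audio, model.config.sampling_rate)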
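
To sanity-check the result, the written file can be inspected with the standard library. This is a minimal sketch, not part of the model's tooling: the helper name `wav_info` is hypothetical, and it assumes the file is an integer PCM WAV, which should be soundfile's default subtype for .wav output.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        n_frames = wav_file.getnframes()
    # Duration is the number of frames divided by frames per second.
    return sample_rate, n_frames / sample_rate
```

Calling `wav_info("bhili_out.wav")` after the script above should report the same sampling rate as `model.config.sampling_rate`, along with the clip length in seconds.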

Voice control via descriptions

You can steer the output voice by changing the description. The base Indic-Parler model understands attributes like:

  • Gender: male / female
  • Pace: slow / moderate / fast
  • Pitch: low / moderate / high
  • Expressiveness: monotone / neutral / expressive
  • Recording: clean studio / slight background noise / reverberant
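
The attributes above can also be combined programmatically. The sketch below is illustrative only: `build_description`, the sentence template, and the recording phrases are hypothetical, and how closely the model follows this exact wording has not been verified here.

```python
# Allowed values, copied from the attribute list above.
ALLOWED_VALUES = {
    "gender": ("male", "female"),
    "pace": ("slow", "moderate", "fast"),
    "pitch": ("low", "moderate", "high"),
    "expressiveness": ("monotone", "neutral", "expressive"),
}

# Hypothetical sentence for each recording condition.
RECORDING_SENTENCES = {
    "clean studio": "The recording is clean studio quality.",
    "slight background noise": "The recording has slight background noise.",
    "reverberant": "The recording is reverberant.",
}


def build_description(gender="male", pace="moderate", pitch="moderate",
                      expressiveness="neutral", recording="clean studio"):
    """Compose a voice description string from validated attribute values."""
    chosen = {"gender": gender, "pace": pace, "pitch": pitch,
              "expressiveness": expressiveness}
    for name, value in chosen.items():
        if value not in ALLOWED_VALUES[name]:
            raise ValueError(
                f"{name} must be one of {ALLOWED_VALUES[name]}, got {value!r}"
            )
    if recording not in RECORDING_SENTENCES:
        raise ValueError(
            f"recording must be one of {tuple(RECORDING_SENTENCES)}, got {recording!r}"
        )
    return (
        f"A {gender} speaker with a {pitch} pitch and {expressiveness} delivery, "
        f"speaking at a {pace} pace. {RECORDING_SENTENCES[recording]}"
    )


description = build_description(gender="female", pace="fast",
                                expressiveness="expressive")
print(description)
```

Validating the values up front keeps typos such as "medium" from silently producing a description the model may not interpret as intended.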

Example descriptions:

# Calm, clear male narrator
"A male speaker with a moderate, calm voice and clean studio recording."

# Energetic female speaker
"A young female speaker with expressive, fast-paced delivery in a clean recording."

# Older male, slow and deliberate
"An older male speaker with a low pitch, speaking slowly and clearly."