UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

This repository contains UniVoice, a unified Large Language Model (LLM) framework that integrates Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) within a single model through continuous representations. It combines autoregressive modeling for speech recognition with flow matching for high-quality speech generation, and enables high-fidelity zero-shot voice cloning.
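For readers new to the two modeling paradigms named above, their standard training objectives are written out below. This is the textbook formulation, not taken from the UniVoice paper: it assumes the linear (optimal-transport) interpolation path used by flow-matching TTS models such as F5-TTS, and UniVoice's actual conditioning, masking, and loss weighting may differ. Here s denotes speech features, y a transcript of N tokens, x_1 a target speech representation, x_0 Gaussian noise, and c the conditioning (for example, text plus a speech prompt).

```latex
% Autoregressive ASR: next-token cross-entropy over transcript tokens given speech s
\mathcal{L}_{\mathrm{ASR}} = -\sum_{i=1}^{N} \log p_\theta\left(y_i \mid y_{<i},\, s\right)

% Flow matching (linear path): interpolate from noise x_0 at t = 0 to data x_1 at t = 1
x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I), \quad t \sim \mathcal{U}[0, 1]

% Regress the velocity field v_\theta onto the constant target velocity x_1 - x_0
\mathcal{L}_{\mathrm{FM}} = \mathbb{E}_{t,\, x_0,\, x_1}\left\lVert v_\theta(x_t, t \mid c) - (x_1 - x_0) \right\rVert_2^2

% Inference: integrate dx/dt = v_\theta from t = 0 to t = 1, e.g. with Euler steps of size \Delta t
x_{t + \Delta t} = x_t + \Delta t \, v_\theta(x_t, t \mid c)
```

The ASR side is trained like ordinary language modeling, while the TTS side learns a continuous velocity field, which is why no discrete speech tokenizer is needed for generation.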

Paper: UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
Authors: Wenhao Guan, Zhikang Niu, Ziyue Jiang, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li, Xie Chen
Project Page: https://univoice-demo.github.io/UniVoice
Code: https://github.com/gwh22/UniVoice

Quick Start

Installation

With a Python >= 3.10 environment, install the necessary dependencies by running the following commands:

git clone https://github.com/gwh22/UniVoice
cd UniVoice
# We recommend using conda to create a new environment.
conda create -n UniVoice python=3.10
conda activate UniVoice
# install cuda >= 11.8
conda install cudatoolkit=11.8 -c nvidia

pip install -r requirements.txt
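Before running the scripts, it can help to confirm that the activated environment meets the stated requirements (Python >= 3.10, CUDA >= 11.8). The helpers below are a hypothetical sketch, not part of the UniVoice repository: `version_ge`, `check_python`, and `check_torch_cuda` are illustrative names, `sort -V` assumes GNU coreutils, and the CUDA check assumes PyTorch is installed by `requirements.txt`.

```shell
# Hypothetical helpers, not shipped with UniVoice.
# version_ge A B: succeed if version A >= version B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check the interpreter of the currently activated environment.
check_python() {
  py_ver="$(python -c 'import platform; print(platform.python_version())')" || return 1
  if version_ge "$py_ver" 3.10; then
    echo "Python $py_ver OK"
  else
    echo "Python $py_ver is older than 3.10" >&2
    return 1
  fi
}

# Assumes PyTorch is installed; torch.version.cuda is None for CPU-only builds.
check_torch_cuda() {
  cuda_ver="$(python -c 'import torch; print(torch.version.cuda or "")')" || return 1
  if [ -n "$cuda_ver" ] && version_ge "$cuda_ver" 11.8; then
    echo "PyTorch built with CUDA $cuda_ver OK"
  else
    echo "PyTorch CUDA build missing or older than 11.8: '${cuda_ver}'" >&2
    return 1
  fi
}
```

After `conda activate UniVoice`, source these helpers and run `check_python && check_torch_cuda`. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, not the installed driver version.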

Inference

cd UniVoice
# For the ASR task
sh scripts/infer_asr.sh
# For the TTS task
sh scripts/infer_tts.sh

Training

cd UniVoice
sh scripts/train_all.sh

Citation

Our code is released under the MIT License. If you find our work and codebase useful, please cite:

@article{guan2025univoice,
  title={UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models},
  author={Guan, Wenhao and Niu, Zhikang and Jiang, Ziyue and Wang, Kaidi and Chen, Peijie and Hong, Qingyang and Li, Lin and Chen, Xie},
  journal={arXiv preprint arXiv:2510.04593},
  year={2025}
}

Acknowledgments

This codebase borrows from DiT, SmolLM2-360M, F5-TTS, Monoformer, LLaVA, and Transformers. We thank the authors for their great work.


Paper for guanwenhao/univoice-all: arXiv 2510.04593, published Oct 6, 2025.