How to use AMAImedia/VoxCPM2-KZ-Darwin-NOESIS-BF16 with VoxCPM:
```python
import soundfile as sf
from voxcpm import VoxCPM

model = VoxCPM.from_pretrained("AMAImedia/VoxCPM2-KZ-Darwin-NOESIS-BF16")

wav = model.generate(
    text="VoxCPM is an innovative end-to-end TTS model from ModelBest, designed to generate highly expressive speech.",
    prompt_wav_path=None,       # optional: path to a prompt speech clip for voice cloning
    prompt_text=None,           # optional: reference text
    cfg_value=2.0,              # LM guidance on LocDiT; higher values follow the prompt more closely but may reduce quality
    inference_timesteps=10,     # LocDiT inference timesteps; higher gives better results, lower is faster
    normalize=True,             # enable the external text-normalization (TN) tool
    denoise=True,               # enable the external denoising tool
    retry_badcase=True,         # retry on some bad cases (e.g. generation that does not stop)
    retry_badcase_max_times=3,  # maximum number of retries
    retry_badcase_ratio_threshold=6.0,  # maximum length ratio for bad-case detection; may need adjusting for slow-paced speech
)

# Note: the model summary lists 48 kHz output; confirm the sample rate
# exposed by your installed voxcpm version before relying on 16000 here.
sf.write("output.wav", wav, 16000)
print("saved: output.wav")
```
VoxCPM2-KZ-Darwin-NOESIS-BF16
The VoxCPM2 base TTS model with the Kazakh LoRA weights (voxcpm_kaz_lora) fused in, giving a ready-to-deploy Kazakh TTS model that needs no adapter loading at runtime.
Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).
- Founder: Ilia Bolotnikov
- Organization: AMAImedia.com
- X (Twitter): @AMAImediacom
- LinkedIn: Ilia Bolotnikov
- Telegram: @djbionicl
- NOESIS version: v14.7
- Release date: 2026-04
⚠️ License notice
This model is a derivative of sozkz/VoxCPM2, released under the Apache License 2.0. The LoRA-fused derivative is distributed under the same license.
By downloading or using this model you agree to the Apache 2.0 license terms; see the LICENSE file in this repository for the full text.
Model summary
| Property | Value |
|---|---|
| Base model | sozkz/VoxCPM2 |
| Architecture | VoxCPM2 (LM + Encoder + DiT + AudioVAE) |
| LM backbone | 28 layers, hidden=2048, GQA (16/2 heads), LongRoPE, vocab=73 448 |
| Encoder | 12 layers, hidden=1024, 16 heads |
| DiT (diffusion) | 12 layers, hidden=1024, CFM euler solver |
| Audio VAE | 16 kHz input → 48 kHz output |
| Format | BF16 safetensors |
| Merge method | LoRA fusion (W = W_base + lora_B @ lora_A) |
| LoRA rank | 32, scale=1.0 |
| Fused layers | 160 (self_attn q/k/v/o_proj across all LM layers) |
| Primary language | Kazakh (KK) |
| Secondary language | Russian (RU) |
LoRA fusion details
W_merged = W_base + (lora_B @ lora_A) * scale
scale = lora_alpha / lora_rank = 32 / 32 = 1.0
Fusing the LoRA eliminates adapter loading at inference time, reducing memory overhead and simplifying deployment.
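The fusion formula above can be checked on toy matrices. The following is a minimal NumPy sketch, not the script used to build this model: the function name `fuse_lora`, the toy dimensions, and the random weights are all assumptions made for illustration. It shows that one matmul with the merged weight gives the same output as the base weight plus a runtime low-rank adapter.

```python
import numpy as np


def fuse_lora(w_base, lora_a, lora_b, lora_alpha, lora_rank):
    """Return W_base + (lora_B @ lora_A) * (lora_alpha / lora_rank)."""
    scale = lora_alpha / lora_rank
    return w_base + (lora_b @ lora_a) * scale


rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 4          # toy sizes, not the model's real dimensions
lora_alpha = 4                       # alpha == rank, so scale = 1.0 as in the card
w_base = rng.standard_normal((d_out, d_in))
lora_a = rng.standard_normal((rank, d_in))    # down-projection
lora_b = rng.standard_normal((d_out, rank))   # up-projection
x = rng.standard_normal((3, d_in))            # a small batch of inputs

w_merged = fuse_lora(w_base, lora_a, lora_b, lora_alpha, rank)

# Runtime-adapter path: base output plus the scaled low-rank correction.
scale = lora_alpha / rank
y_adapter = x @ w_base.T + ((x @ lora_a.T) @ lora_b.T) * scale

# Fused path: a single matmul with the merged weight.
y_fused = x @ w_merged.T

print("outputs match:", np.allclose(y_adapter, y_fused))
```

Because the two paths agree up to floating-point error, the fused checkpoint can drop the adapter tensors and the extra matmuls entirely.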
Source models
| Model | Role |
|---|---|
| sozkz/VoxCPM2 | Base multilingual TTS |
| voxcpm_kaz_lora | Kazakh language adapter (r=32) |
NOESIS context
In NOESIS, this model serves as the Kazakh TTS teacher for knowledge distillation (KD) into the TTS-10B specialist. A ×10 domain boost is applied to Kazakh (KK) samples in the soft-label weighting during KD.
| NOESIS Stage | Role |
|---|---|
| Phase 1 → Stage 11 (TTS) | KK TTS teacher (KD stream A, text-side) |
| KD data generation | TTS teacher: w=0.25 (KK domain) |
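To make the idea of a domain boost in soft-label weighting concrete, here is a generic sketch of a domain-weighted distillation loss. It is not the NOESIS implementation: the function `weighted_kd_loss`, the per-sample weighting scheme, and the toy logits are hypothetical, and only the KK×10 factor is taken from the text above. Each sample's KL divergence between teacher and student distributions is multiplied by its domain weight before averaging.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def weighted_kd_loss(student_logits, teacher_logits, domains, boost=None):
    """Domain-weighted mean of KL(teacher || student) over samples.

    `boost` maps a domain tag to its weight; unlisted domains get weight 1.0.
    The default (hypothetical) mapping gives Kazakh samples a x10 weight.
    """
    if boost is None:
        boost = {"KK": 10.0}
    log_p = log_softmax(teacher_logits)   # teacher soft labels (log-probs)
    log_q = log_softmax(student_logits)   # student predictions (log-probs)
    kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    weights = np.array([boost.get(d, 1.0) for d in domains])
    return float((weights * kl).sum() / weights.sum())


# Toy batch: one Kazakh sample and one Russian sample, three classes each.
teacher = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
student = np.array([[3.0, 2.0, 1.0], [0.5, 0.5, 0.5]])
print(weighted_kd_loss(student, teacher, ["KK", "RU"]))
```

With this weighting, a mismatch on a KK sample contributes ten times as much to the loss as the same mismatch on a sample from another domain.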
Provenance
The full merge trace, including the fused layer count, is recorded in merge_provenance.json.
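The schema of merge_provenance.json is not described here, so the sketch below makes no assumptions about field names. It only assumes the file is a JSON object at the top level and sits in the working directory; the helper `summarize_provenance` is a hypothetical name. It lists each top-level key with the type of its value as a first step in inspecting the trace.

```python
import json
from pathlib import Path


def summarize_provenance(path):
    """Map each top-level key in the JSON file to the type name of its value."""
    with Path(path).open(encoding="utf-8") as f:
        record = json.load(f)
    return {key: type(value).__name__ for key, value in record.items()}


# Assumed location: the file downloaded next to this script.
provenance_path = Path("merge_provenance.json")
if provenance_path.exists():
    for key, type_name in summarize_provenance(provenance_path).items():
        print(f"{key}: {type_name}")
```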
Acknowledgements & citation
Base model: VoxCPM2 by sozkz.
@misc{noesis_voxcpm2_kaz_darwin,
title = {VoxCPM2-KZ-Darwin-NOESIS-BF16},
author = {Bolotnikov, Ilia},
year = {2026},
publisher = {AMAImedia},
url = {https://amaimedia.com}
}
@misc{noesis_v14,
title = {NOESIS v14.7: DHCF-FNO Multilingual Dubbing Platform},
author = {Bolotnikov, Ilia},
year = {2026},
publisher = {AMAImedia},
url = {https://amaimedia.com}
}