MuseTalk V1.5 Installer

This repository contains an automated installation script for MuseTalk V1.5, a real-time lip-sync model from Tencent.

Features

Automated setup: Installs Miniconda and creates a Python 3.10 environment
CUDA 12.x support: PyTorch 2.1.0 with CUDA 12.1
All dependencies: OpenMMLab stack (mmcv, mmdet, mmpose)
Model downloads: DWPose, SD-VAE, Whisper models
Version compatibility: Pins library versions that were tested together

Tested Environment

GPU: NVIDIA RTX 4090 (24GB VRAM)
CUDA: 12.x
OS: Ubuntu 22.04
Python: 3.10 (via Miniconda)
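
To compare a machine against the tested environment before running the installer, a small pre-flight check along these lines may help. This is only a sketch and not part of the installer: `have` and `preflight` are hypothetical helper names, and the GPU query assumes the NVIDIA driver (and therefore `nvidia-smi`) is already installed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of install_musetalk.sh.

# have CMD: succeed if CMD is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# preflight: print the OS release and GPU details for manual comparison
# with the tested environment (Ubuntu 22.04, RTX 4090 with 24GB VRAM).
preflight() {
  if [ -r /etc/os-release ]; then
    # Source in a subshell so the file's variables do not leak out.
    ( . /etc/os-release && echo "OS:  ${PRETTY_NAME:-unknown}" )
  else
    echo "OS:  unknown (/etc/os-release not readable)"
  fi

  if have nvidia-smi; then
    # Name, total VRAM, and driver version, one line per GPU.
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
  else
    echo "GPU: nvidia-smi not found; is the NVIDIA driver installed?"
  fi
}
```

Calling `preflight` prints the values; it does not enforce anything, so a mismatch is a hint rather than a hard failure.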

Quick Start

# Download the installer
wget https://huggingface.co/dumont/musetalk-2024-12-24/raw/main/install_musetalk.sh

# Make it executable
chmod +x install_musetalk.sh

# Run as root (required for system packages)
sudo ./install_musetalk.sh

What the script installs

System packages: wget, git, ffmpeg, espeak-ng
Miniconda: Python 3.10 environment
PyTorch: 2.1.0 with CUDA 12.1
OpenMMLab: mmengine 0.10.4, mmcv 2.1.0, mmdet 3.3.0, mmpose 1.3.1
Compatible versions:
- numpy==1.26.4
- transformers==4.42.0
- diffusers==0.28.0
- huggingface_hub==0.23.2
- opencv-python==4.8.0.74
Models:
- DWPose (dw-ll_ucoco_384.pth)
- SD-VAE-FT-MSE
- OpenAI Whisper Tiny
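
After installation, the pinned versions above can be checked against what pip actually reports. The following is a minimal sketch, assuming the musetalk environment is active so that `python3 -m pip` resolves inside it. `check_pin`, `installed_version`, and `verify_pins` are hypothetical helper names, not functions provided by the installer.

```shell
#!/usr/bin/env bash
# Hypothetical verification helpers; not part of install_musetalk.sh.

# Pins copied from the "Compatible versions" list above.
PINS="numpy==1.26.4
transformers==4.42.0
diffusers==0.28.0
huggingface_hub==0.23.2
opencv-python==4.8.0.74"

# check_pin NAME EXPECTED ACTUAL: print a one-line verdict.
check_pin() {
  if [ "$2" = "$3" ]; then
    echo "OK: $1 $3"
  else
    echo "MISMATCH: $1 expected $2, found ${3:-none}"
  fi
}

# installed_version NAME: print the version pip reports, or nothing.
installed_version() {
  python3 -m pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}'
}

# verify_pins: check every pin in PINS against the active environment.
verify_pins() {
  echo "$PINS" | while IFS= read -r pin; do
    name="${pin%%==*}"
    expected="${pin##*==}"
    check_pin "$name" "$expected" "$(installed_version "$name")"
  done
}
```

Running `verify_pins` prints one line per package. The OpenMMLab pins can be appended to `PINS` in the same `name==version` form.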

Usage

After installation:

# Activate environment
source /opt/miniconda/bin/activate musetalk

# Go to MuseTalk directory
cd /workspace/MuseTalk

# Create a config file
cat > configs/inference/my_test.yaml << EOF
task_0:
  video_path: "data/video/sun.mp4"
  audio_path: "path/to/your_audio.wav"
EOF

# Run inference (V1.5)
PYTHONPATH=. python3 scripts/inference.py --version v15 --inference_config configs/inference/my_test.yaml

# Run inference (V1.0 original)
PYTHONPATH=. python3 scripts/inference.py --version v1 --inference_config configs/inference/my_test.yaml

Output videos are saved to ./results/v15/ or ./results/v1/, depending on the --version value.
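
When several audio clips should drive the same video, the config can be generated rather than written by hand. The sketch below assumes the inference script accepts further `task_1`, `task_2`, ... entries in the same shape as `task_0`; that numbering is extrapolated from the example above and should be verified against the MuseTalk repository. `write_tasks` and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical config generator; not part of install_musetalk.sh.

# write_tasks VIDEO AUDIO [AUDIO...]
# Print one task_N entry per audio file, all using the same video.
write_tasks() {
  video="$1"
  shift
  i=0
  for audio in "$@"; do
    printf 'task_%d:\n  video_path: "%s"\n  audio_path: "%s"\n' \
      "$i" "$video" "$audio"
    i=$((i + 1))
  done
}

# Example (paths are placeholders):
#   write_tasks data/video/sun.mp4 audio/a.wav audio/b.wav \
#     > configs/inference/my_batch.yaml
```

The resulting file is passed to `--inference_config` exactly like the hand-written one.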

Notes

The MuseTalk model weights (~3.4 GB) are downloaded automatically via git-lfs from the official repository.
For custom avatars, prepare a video file with a clear, unobstructed view of the face.
Audio should be in WAV format (mono recommended).
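
Since the installer already pulls in ffmpeg, audio in other formats can be converted to mono WAV with it. This is a sketch: `mono_name` and `to_mono_wav` are hypothetical helpers, and the 16 kHz sample rate is an assumption (it is the rate Whisper models operate on), not a documented requirement of this installer.

```shell
#!/usr/bin/env bash
# Hypothetical audio helpers; not part of install_musetalk.sh.

# mono_name INPUT: derive the output path, e.g. voice.mp3 -> voice_mono.wav.
# The "_mono" suffix keeps the output distinct from a .wav input.
mono_name() { printf '%s_mono.wav\n' "${1%.*}"; }

# to_mono_wav INPUT: convert INPUT to a mono WAV file and print its path.
to_mono_wav() {
  out="$(mono_name "$1")"
  # -n      refuse to overwrite an existing output file
  # -ac 1   downmix to a single channel
  # -ar     resample (16000 Hz is an assumption, see above)
  ffmpeg -n -i "$1" -ac 1 -ar 16000 "$out" && echo "$out"
}
```

The printed path can then be used as the `audio_path` value in the inference config.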

License

This installer script is provided as-is. MuseTalk itself remains under the license of the original TMElyralab repository.

Credits

MuseTalk by TMElyralab/Tencent
Installer script by Dumont AI Team
