SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder • Paper • 2512.11749 • Published 19 days ago • 36
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement • Paper • 2511.23475 • Published Nov 28 • 41
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis • Paper • 2509.09595 • Published Sep 11 • 48
MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation • Paper • 2508.19320 • Published Aug 26 • 29
Saire2023/wav2vec2-base-finetuned-Speaker-Classification • Audio Classification • 94.6M • Updated Apr 16, 2024 • 10 • 2
harshit345/xlsr-wav2vec-speech-emotion-recognition • Audio Classification • Updated Dec 12, 2021 • 990 • 62
ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition • Audio Classification • 0.3B • Updated Oct 24, 2024 • 54k • 237
speechbrain/emotion-recognition-wav2vec2-IEMOCAP • Audio Classification • Updated Jul 23, 2024 • 682k • 168