SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder Paper • 2512.11749 • Published 19 days ago • 36
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement Paper • 2511.23475 • Published Nov 28 • 41
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11 • 48
MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation Paper • 2508.19320 • Published Aug 26 • 29