LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 6 days ago • 127
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published 8 days ago • 48
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published 8 days ago • 151
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 11 days ago • 120
Versatile Editing of Video Content, Actions, and Dynamics without Training Paper • 2603.17989 • Published 16 days ago • 16
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing Paper • 2603.19228 • Published 15 days ago • 67
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation Paper • 2603.16871 • Published 17 days ago • 60
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 18 days ago • 152
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation Paper • 2603.11647 • Published 22 days ago • 31
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation Paper • 2603.11421 • Published 23 days ago • 34
EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation Paper • 2603.06014 • Published 28 days ago • 9
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 28 days ago • 117
Helios: Real Real-Time Long Video Generation Model Paper • 2603.04379 • Published about 1 month ago • 178
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control Paper • 2602.18422 • Published Feb 20 • 30
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions Paper • 2602.08711 • Published Feb 9 • 28
Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers Paper • 2602.03510 • Published Feb 3 • 27
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 64