X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again • Paper • 2507.22058 • Published Jul 29, 2025 • 39
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels • Paper • 2507.21809 • Published Jul 29, 2025 • 136
VideoPrism: A Foundational Visual Encoder for Video Understanding • Paper • 2402.13217 • Published Feb 20, 2024 • 38
Training-Free Efficient Video Generation via Dynamic Token Carving • Paper • 2505.16864 • Published May 22, 2025 • 24
Emerging Properties in Unified Multimodal Pretraining • Paper • 2505.14683 • Published May 20, 2025 • 133
Modifying Large Language Model Post-Training for Diverse Creative Writing • Paper • 2503.17126 • Published Mar 21, 2025 • 36
LLM Reasoning Papers • Collection • Papers to improve reasoning capabilities of LLMs • 20 items • Updated Jan 15, 2025 • 123
[MASK] is All You Need • Collection • Code, dataset, and pretrained model • 6 items • Updated Feb 6, 2025 • 9
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge • Paper • 2412.00176 • Published Nov 29, 2024 • 9
Artist: Aesthetically Controllable Text-Driven Stylization without Training • Paper • 2407.15842 • Published Jul 22, 2024 • 14
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis • Paper • 2411.17769 • Published Nov 26, 2024 • 8
LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Paper • 2411.10440 • Published Nov 15, 2024 • 129
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use • Paper • 2411.10323 • Published Nov 15, 2024 • 34