OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 14 days ago • 70
TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification Paper • 2604.14531 • Published 11 days ago • 7
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 12 days ago • 114
MOSS-Audio Collection An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 5 items • Updated 7 days ago • 43
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 12 days ago • 153
Geometric Context Transformer for Streaming 3D Reconstruction Paper • 2604.14141 • Published 12 days ago • 18
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published 14 days ago • 141
ERNIE-Image Collection A series of image generation models, including text2img and img2img. • 2 items • Updated 13 days ago • 23
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 18 days ago • 260
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 19 days ago • 187
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning Paper • 2505.22019 • Published May 28, 2025 • 12