ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published 3 days ago • 124
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published 9 days ago • 114
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 10 days ago • 232
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models Paper • 2603.11896 • Published Mar 12 • 10
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning Paper • 2603.02024 • Published Mar 2 • 47
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind Paper • 2601.15715 • Published Jan 22 • 14
Can We Predict Before Executing Machine Learning Agents? Paper • 2601.05930 • Published Jan 9 • 28
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published Dec 22, 2025 • 66
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025 • 159
MiMo-Embodied: X-Embodied Foundation Model Technical Report Paper • 2511.16518 • Published Nov 20, 2025 • 26
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025 • 134
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 107
RoboOmni: Proactive Robot Manipulation in Omni-modal Context Paper • 2510.23763 • Published Oct 27, 2025 • 61
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents Paper • 2510.23691 • Published Oct 27, 2025 • 56
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published Oct 27, 2025 • 28
DeepPrune: Parallel Scaling without Inter-trace Redundancy Paper • 2510.08483 • Published Oct 9, 2025 • 24
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning Paper • 2506.05523 • Published Jun 5, 2025 • 34
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos Paper • 2506.04141 • Published Jun 4, 2025 • 30