Native Active Perception as Reasoning for Omni-Modal Understanding Paper • 2606.19341 • Published 7 days ago • 17
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining Paper • 2606.17200 • Published 9 days ago • 49
MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold Paper • 2606.13376 • Published 13 days ago • 14
World Model Self-Distillation: Training World Models to Solve General Tasks Paper • 2606.12072 • Published 14 days ago • 14
OpenHA Collection A Series of Open-Source Hierarchical Agentic Models & Datasets in Minecraft • 10 items • Updated Sep 21, 2025 • 3
Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language Paper • 2604.19667 • Published Apr 21 • 23
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch Paper • 2512.02395 • Published Dec 2, 2025 • 52
Article Preference Optimization for Vision Language Models +2 qgallouedec, vwxyzjn, merve, kashif • Jul 10, 2024 • 93
Remote Sensing Referring Expression Understanding Collection REU task for RS. • 5 items • Updated Oct 2, 2025 • 1
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement Paper • 2503.06520 • Published Mar 9, 2025 • 11
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge NormalUhr • Feb 7, 2025 • 295
Article Fine tuning CLIP with Remote Sensing (Satellite) images and captions +4 arampacha, devv, goutham794, cataluna84, ritog, sujitpal • Oct 13, 2021 • 8