Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 13 days ago • 141
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 12 days ago • 82
Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking Paper • 2601.02669 • Published 21 days ago • 3
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs Paper • 2601.03559 • Published 20 days ago • 13
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs Paper • 2601.03559 • Published 20 days ago • 13
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents Paper • 2512.22047 • Published Dec 26, 2025 • 28
view article Article Building the Open Agent Ecosystem Together: Introducing OpenEnv +8 Oct 23, 2025 • 145
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Paper • 2509.19736 • Published Sep 24, 2025 • 12
GTA1 Collection A collection of GUI grounding models trained with GRPO. • 5 items • Updated Oct 31, 2025 • 4
FLEX: Continuous Agent Evolution via Forward Learning from Experience Paper • 2511.06449 • Published Nov 9, 2025 • 14
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction Paper • 2511.07327 • Published Nov 10, 2025 • 78
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11, 2025 • 34
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 106
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published Nov 12, 2025 • 18
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models Paper • 2511.09515 • Published Nov 12, 2025 • 19