World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning Paper • 2606.03603 • Published about 1 month ago • 29
DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes Paper • 2605.28421 • Published May 27 • 48
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published May 19 • 190
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 274
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published May 7 • 237
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published May 6 • 106
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers Paper • 2604.02648 • Published Apr 3 • 48