GR00T N1: An Open Foundation Model for Generalist Humanoid Robots Paper • 2503.14734 • Published Mar 18, 2025 • 5
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025 • 147
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19, 2025 • 88
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published Jul 24, 2025 • 40
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25, 2025 • 31
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving Paper • 2507.17596 • Published Jul 23, 2025 • 6
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement Paper • 2507.18742 • Published Jul 24, 2025 • 5
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI Paper • 2507.10510 • Published Jul 14, 2025 • 4
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning Paper • 2507.19457 • Published Jul 25, 2025 • 28
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report Paper • 2507.16534 • Published Jul 22, 2025 • 7
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17, 2025 • 259
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025 • 248
GUI-G^2: Gaussian Reward Modeling for GUI Grounding Paper • 2507.15846 • Published Jul 21, 2025 • 133
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22, 2025 • 122
T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published Jul 8, 2025 • 119
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 134
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 14
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Paper • 2506.11928 • Published Jun 13, 2025 • 24
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents Paper • 2505.22954 • Published May 29, 2025 • 14
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis Paper • 2505.11581 • Published May 16, 2025 • 3
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12, 2024 • 127
Gorilla: Large Language Model Connected with Massive APIs Paper • 2305.15334 • Published May 24, 2023 • 5
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace Paper • 2303.17580 • Published Mar 30, 2023 • 15
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework Paper • 2308.08155 • Published Aug 16, 2023 • 10
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11, 2025 • 34
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 106
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 188
Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published Apr 3, 2025 • 57
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues Paper • 2501.10836 • Published Jan 18, 2025 • 1
DynaSaur: Large Language Agents Beyond Predefined Actions Paper • 2411.01747 • Published Nov 4, 2024 • 37
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents Paper • 2401.00812 • Published Jan 1, 2024 • 11
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents Paper • 2510.24702 • Published Oct 28, 2025 • 28
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM Paper • 2509.18058 • Published Sep 22, 2025 • 12
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs Paper • 2508.10029 • Published Aug 8, 2025
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs Paper • 2508.10031 • Published Aug 9, 2025
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs Paper • 2508.20333 • Published Aug 28, 2025
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models Paper • 2509.17938 • Published Sep 22, 2025 • 4
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness Paper • 2509.14297 • Published Sep 17, 2025
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 501
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Paper • 2412.21199 • Published Dec 30, 2024 • 13
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization Paper • 2510.24592 • Published Oct 28, 2025 • 16
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 75
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance Paper • 2506.03828 • Published Jun 4, 2025 • 15
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 17 days ago • 82