daily paper
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper
•
2501.04519
•
Published
•
288
Transformer^2: Self-adaptive LLMs
Paper
•
2501.06252
•
Published
•
54
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper
•
2501.09012
•
Published
•
10
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper
•
2501.09747
•
Published
•
28
Evolving Deeper LLM Thinking
Paper
•
2501.09891
•
Published
•
115
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Paper
•
2501.12895
•
Published
•
61
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
Paper
•
2501.13629
•
Published
•
48
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
Paper
•
2501.13926
•
Published
•
43
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
Paper
•
2502.04306
•
Published
•
20
ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution
Paper
•
2502.00989
•
Published
•
8
PILAF: Optimal Human Preference Sampling for Reward Modeling
Paper
•
2502.04270
•
Published
•
12
Paper
•
2502.06786
•
Published
•
32
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
Paper
•
2502.05415
•
Published
•
20
Region-Adaptive Sampling for Diffusion Transformers
Paper
•
2502.10389
•
Published
•
53
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Paper
•
2502.10391
•
Published
•
34
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Paper
•
2502.09411
•
Published
•
22
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
Paper
•
2502.13943
•
Published
•
8
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper
•
2502.14499
•
Published
•
194
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Paper
•
2502.14786
•
Published
•
157
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper
•
2502.14502
•
Published
•
91
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Paper
•
2502.14739
•
Published
•
107
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Paper
•
2502.14768
•
Published
•
47
Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
Paper
•
2502.14372
•
Published
•
36
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Paper
•
2502.12853
•
Published
•
29
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Paper
•
2502.16033
•
Published
•
18
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper
•
2502.18449
•
Published
•
75
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper
•
2508.03680
•
Published
•
133
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper
•
2510.14528
•
Published
•
113