Ai-general
Guided Self-Evolving LLMs with Minimal Human Supervision
Paper
• 2512.02472
• Published • 55
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published • 148
Video Reasoning without Training
Paper
• 2510.17045
• Published • 8
Agent Learning via Early Experience
Paper
• 2510.08558
• Published • 276
RLP: Reinforcement as a Pretraining Objective
Paper
• 2510.01265
• Published • 45
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
• 2510.00938
• Published • 60
LiveTradeBench: Seeking Real-World Alpha with Large Language Models
Paper
• 2511.03628
• Published • 13
PromptBridge: Cross-Model Prompt Transfer for Large Language Models
Paper
• 2512.01420
• Published • 11
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
• 2510.09577
• Published • 8
Diversity Has Always Been There in Your Visual Autoregressive Models
Paper
• 2511.17074
• Published • 8
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
Paper
• 2511.13254
• Published • 139
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
Paper
• 2510.18821
• Published • 19
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper
• 2510.03259
• Published • 57
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Paper
• 2510.19338
• Published • 117
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper
• 2511.16043
• Published • 110
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper
• 2510.03561
• Published • 25
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Paper
• 2509.08755
• Published • 57
gpt-oss-120b & gpt-oss-20b Model Card
Paper
• 2508.10925
• Published • 17
Paper
• 2412.16720
• Published • 37
Self-Improving VLM Judges Without Human Annotations
Paper
• 2512.05145
• Published • 20
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique
Paper
• 2511.09067
• Published • 2
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
Paper
• 2510.23038
• Published • 1
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
Paper
• 2511.06805
• Published • 13
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
Paper
• 2511.15958
• Published • 1
VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering
Paper
• 2511.19899
• Published
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Paper
• 2512.05150
• Published • 76
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
Paper
• 2512.03000
• Published • 37
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
Paper
• 2512.04926
• Published • 42
Voxify3D: Pixel Art Meets Volumetric Rendering
Paper
• 2512.07834
• Published • 45
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
Paper
• 2512.07461
• Published • 79
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Paper
• 2512.13586
• Published • 93
RePo: Language Models with Context Re-Positioning
Paper
• 2512.14391
• Published • 12
Universal Reasoning Model
Paper
• 2512.14693
• Published • 44
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
• Published • 121
Next-Embedding Prediction Makes Strong Vision Learners
Paper
• 2512.16922
• Published • 89
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Paper
• 2512.17351
• Published • 28
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
Paper
• 2512.14052
• Published • 42
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
Paper
• 2512.19535
• Published • 12
SemanticGen: Video Generation in Semantic Space
Paper
• 2512.20619
• Published • 94
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Paper
• 2512.20618
• Published • 56
Multi-hop Reasoning via Early Knowledge Alignment
Paper
• 2512.20144
• Published • 7
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
Paper
• 2512.21004
• Published • 13
TimeBill: Time-Budgeted Inference for Large Language Models
Paper
• 2512.21859
• Published • 25
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
Paper
• 2512.22322
• Published • 39
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper
• 2512.24618
• Published • 153
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
Paper
• 2601.03193
• Published • 49
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models
Paper
• 2601.01321
• Published • 20
LLM-in-Sandbox Elicits General Agentic Intelligence
Paper
• 2601.16206
• Published • 86
Learning to Discover at Test Time
Paper
• 2601.16175
• Published • 44
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
Paper
• 2601.17058
• Published • 190