- LTX-2: Efficient Joint Audio-Visual Foundation Model
  Paper • 2601.03233 • Published • 178
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  Paper • 2601.07832 • Published • 53
- Motion Attribution for Video Generation
  Paper • 2601.08828 • Published • 71
- Post-LayerNorm Is Back: Stable, Expressive, and Deep
  Paper • 2601.19895 • Published • 27
Collections
Discover the best community collections!
Collections including paper arxiv:2602.05400
- Heterogeneous Agent Collaborative Reinforcement Learning
  Paper • 2603.02604 • Published • 195
- Beyond Language Modeling: An Exploration of Multimodal Pretraining
  Paper • 2603.03276 • Published • 104
- OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
  Paper • 2602.05400 • Published • 353
- The Trinity of Consistency as a Defining Principle for General World Models
  Paper • 2602.23152 • Published • 201
- From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
  Paper • 2602.22859 • Published • 151
- OmniGAIA: Towards Native Omni-Modal AI Agents
  Paper • 2602.22897 • Published • 53
- Imagination Helps Visual Reasoning, But Not Yet in Latent Space
  Paper • 2602.22766 • Published • 44
- OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
  Paper • 2602.05400 • Published • 353
- PaperBanana: Automating Academic Illustration for AI Scientists
  Paper • 2601.23265 • Published • 227
- FASA: Frequency-aware Sparse Attention
  Paper • 2602.03152 • Published • 154
- mHC: Manifold-Constrained Hyper-Connections
  Paper • 2512.24880 • Published • 326
- Qwen/Qwen3.6-35B-A3B
  Image-Text-to-Text • 36B • Updated • 5.26M • 1.78k
- deepseek-ai/DeepSeek-V4-Pro
  Text Generation • 862B • Updated • 2.97M • 3.99k
- moonshotai/Kimi-K2.6
  Image-Text-to-Text • 1.1T • Updated • 2.15M • 1.29k
- openai/privacy-filter
  Token Classification • 1B • Updated • 239k • 1.45k
- GLM-5: from Vibe Coding to Agentic Engineering
  Paper • 2602.15763 • Published • 149
- Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
  Paper • 2602.07845 • Published • 71
- LLaDA2.1: Speeding Up Text Diffusion via Token Editing
  Paper • 2602.08676 • Published • 72
- MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
  Paper • 2602.02474 • Published • 63