leondawn666's Collections
Multimodality
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
Paper
•
2506.23918
•
Published
•
89
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Paper
•
2504.16030
•
Published
•
36
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Paper
•
2505.24867
•
Published
•
80
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper
•
2507.01006
•
Published
•
250
Scaling RL to Long Videos
Paper
•
2507.07966
•
Published
•
159
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper
•
2508.09736
•
Published
•
57
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper
•
2508.10433
•
Published
•
144
Thyme: Think Beyond Images
Paper
•
2508.11630
•
Published
•
81
Paper
•
2508.10104
•
Published
•
291
Paper
•
2508.11737
•
Published
•
111
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper
•
2509.26507
•
Published
•
542
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
Paper
•
2511.15065
•
Published
•
75
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper
•
2511.04570
•
Published
•
211