From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Paranioar
AI & ML interests
Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Model
Recent Activity
Upvoted a paper (5 days ago): MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Upvoted a paper (5 days ago): DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Upvoted a paper (6 days ago): LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling