Kaicheng Yang

Kaichengalex

https://kaichengyang0828.github.io/Kaicheng-Yang0828.github.io/

Kaicheng-Yang0828

AI & ML interests

Multimodal Representation Learning / Vision-Language Pretraining / DeepResearch

Recent Activity

upvoted a paper 3 days ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published 11 days ago • 106

upvoted a paper 4 days ago

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published 5 days ago • 169

upvoted a paper 5 days ago

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

Paper • 2512.01342 • Published 6 days ago • 14

upvoted an article 5 days ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • 6 days ago • 224

upvoted a paper 11 days ago

HunyuanOCR Technical Report

Paper • 2511.19575 • Published 13 days ago • 19

upvoted a paper 19 days ago

Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

Paper • 2511.12609 • Published 21 days ago • 102

upvoted a paper 27 days ago

DeepEyesV2: Toward Agentic Multimodal Model

Paper • 2511.05271 • Published 30 days ago • 42

upvoted a paper about 1 month ago

Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

Paper • 2510.13795 • Published Oct 15 • 56

upvoted 2 articles about 1 month ago

ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases

Article • Nov 5

What makes good reasoning data

Article • Oct 30

upvoted 4 papers about 1 month ago

Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum

Paper • 2510.27571 • Published Oct 31 • 17

upvoted 4 papers about 2 months ago

ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

Paper • 2510.18795 • Published Oct 21 • 10

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20 • 68

RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published Oct 18 • 48

UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Paper • 2510.13515 • Published Oct 15 • 11

upvoted 2 papers 2 months ago

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training

Paper • 2509.23661 • Published Sep 28 • 46

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26 • 136
