arxiv:2603.25040
Bin Wang
wanderkid
AI & ML interests
Computer Vision, Multimodal Large Language Model
Recent Activity
authored a paper 2 days ago
TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition
authored a paper 2 days ago
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
authored a paper 2 days ago
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding