Collections
Discover the best community collections!
Collections including paper arxiv:2508.18265

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  Paper • 2504.10479 • Published • 303
- Qwen3 Technical Report
  Paper • 2505.09388 • Published • 317
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
  Paper • 2509.18905 • Published • 29

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- SmolVLM: Redefining small and efficient multimodal models
  Paper • 2504.05299 • Published • 200
- Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
  Paper • 2504.15271 • Published • 67

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
  Paper • 2508.05748 • Published • 140
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
  Paper • 2508.16153 • Published • 158
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
  Paper • 2508.13167 • Published • 129

- Apriel-1.5-15b-Thinker
  Paper • 2510.01141 • Published • 117
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
  Paper • 2509.21268 • Published • 103
- LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
  Paper • 2509.00676 • Published • 84
- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83

- rStar2-Agent: Agentic Reasoning Technical Report
  Paper • 2508.20722 • Published • 116
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
  Paper • 2508.16153 • Published • 158
- Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
  Paper • 2508.14029 • Published • 118
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- OpenGVLab/InternVL3_5-241B-A28B-HF
  Image-Text-to-Text • 241B • Updated • 93 • 11
- OpenGVLab/InternVL3_5-38B-HF
  Image-Text-to-Text • 38B • Updated • 893 • 6
- OpenGVLab/InternVL3_5-30B-A3B-HF
  Image-Text-to-Text • 31B • Updated • 256 • 5
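
The three model entries above carry the -HF suffix, which typically marks transformers-native checkpoints, and are tagged with the Image-Text-to-Text task. A minimal loading sketch follows, assuming a recent transformers release that includes the image-text-to-text pipeline, an installed accelerate package for device_map="auto", and hardware large enough for the chosen checkpoint; the image URL is a placeholder, not from this page.

```python
# Minimal sketch, not from the papers above: assumes a recent transformers
# release with the "image-text-to-text" pipeline task and enough accelerator
# memory for a 38B checkpoint.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="OpenGVLab/InternVL3_5-38B-HF",  # one of the checkpoints listed above
    device_map="auto",   # requires accelerate; shards the model across devices
    torch_dtype="auto",  # load in the dtype stored in the checkpoint
)

# Chat-style input: one user turn containing an image plus a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])
```

The same call should work for the other two listed checkpoints by swapping the model id; the mixture-of-experts variants (241B-A28B, 30B-A3B) activate only a subset of parameters per token but still need the full weights in memory.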

- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 259
- GUI-G^2: Gaussian Reward Modeling for GUI Grounding
  Paper • 2507.15846 • Published • 133
- ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
  Paper • 2507.22827 • Published • 99
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208