-
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text • 1.0B • Updated • 16.3k • 1.55k -
PaddleOCR-VL Online Demo
📈 232 • Parse images to extract text, tables, formulas, and charts
-
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 114 -
PaddlePaddle/PP-DocLayoutV2
Object Detection • Updated • 19.4k • 23
Collections
Collections including paper arxiv:2510.14528
-
Zep: A Temporal Knowledge Graph Architecture for Agent Memory
Paper • 2501.13956 • Published • 9 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 114 -
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
Paper • 2510.16872 • Published • 109 -
Unveiling User Perceptions in the Generative AI Era: A Sentiment-Driven Evaluation of AI Educational Apps' Role in Digital Transformation of e-Teaching
Paper • 2512.11934 • Published • 1
-
MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
Paper • 2511.18373 • Published • 6 -
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper • 2511.13288 • Published • 18 -
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Paper • 2511.19418 • Published • 29 -
SAM 3: Segment Anything with Concepts
Paper • 2511.16719 • Published • 129
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43
-
tencent/HunyuanOCR
Image-Text-to-Text • 1.0B • Updated • 1.35M • 552 -
HunyuanOCR Technical Report
Paper • 2511.19575 • Published • 22 -
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text • 1.0B • Updated • 16.3k • 1.55k -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 114
-
PubTables-1M: Towards comprehensive table extraction from unstructured documents
Paper • 2110.00061 • Published • 3 -
Optimized Table Tokenization for Table Structure Recognition
Paper • 2305.03393 • Published • 1 -
Qwen3-VL Technical Report
Paper • 2511.21631 • Published • 153 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 114