Image-Text-to-Text
PaddleOCR
Safetensors
English
Chinese
multilingual
paddleocr_vl
ERNIE4.5
PaddlePaddle
image-to-text
ocr
document-parse
layout
table
formula
chart
conversational
custom_code
Eval Results
Instructions to use PaddlePaddle/PaddleOCR-VL with libraries, inference providers, notebooks, and local apps:
- Libraries
- PaddleOCR
How to use PaddlePaddle/PaddleOCR-VL with PaddleOCR:
```python
# See https://www.paddleocr.ai/latest/version3.x/pipeline_usage/PaddleOCR-VL.html for installation.
from paddleocr import PaddleOCRVL

# Build the PaddleOCR-VL document-parsing pipeline.
pipeline = PaddleOCRVL(pipeline_version="v1")
# Run the pipeline on a single document image.
output = pipeline.predict("path/to/document_image.png")
for res in output:
    res.print()  # print the parsed result
    res.save_to_json(save_path="output")  # write the result as JSON
    res.save_to_markdown(save_path="output")  # write the result as Markdown
```
- Notebooks
- Google Colab
- Kaggle
To rename for future compatibility with transformers
#71
by xiaohei66 - opened
Our vision encoder is a heavily modified version of SigLIP, featuring a dynamic resolution mechanism and 2D RoPE instead of the original’s fixed resolution and learnable absolute position embeddings.
This makes our implementation fundamentally different from the standard SigLIP implementation in libraries such as Transformers. To avoid future naming conflicts and confusion, we need to move away from the Siglip* naming.
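To make the position-encoding difference concrete, here is a minimal pure-Python sketch of one common 2D RoPE variant (axial: half of each patch vector is rotated by the patch's row index, the other half by its column index), applied to a patch grid whose size follows the input image. It is an illustration under stated assumptions, not PaddleOCR-VL's actual encoder code: the patch size of 14, the rotary base of 10000, the crop-to-a-multiple rule, and the helper names `patch_grid`, `rope_1d`, and `rope_2d` are all hypothetical. What it shows is that rotary angles are computed from (row, col) on the fly, so any grid size works, whereas a learnable absolute position table has a fixed number of entries tied to the training resolution.

```python
import math


def patch_grid(height, width, patch_size=14):
    """Patch rows/cols for an image kept at its native size.

    Hypothetical rule: the image is cropped down to a multiple of
    patch_size. Real pipelines may resize or pad instead.
    """
    return height // patch_size, width // patch_size


def rope_1d(x, pos, base=10000.0):
    """Standard 1D rotary embedding: rotate each pair (x[2k], x[2k+1])
    by the angle pos * base**(-2k / len(x))."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i == 2k, so this is base**(-2k/d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def rope_2d(x, row, col, base=10000.0):
    """Axial 2D RoPE (one common variant, assumed here): the first half
    of the vector encodes the row index, the second half the column."""
    half = len(x) // 2
    assert half % 2 == 0, "each half needs an even dimension"
    return rope_1d(x[:half], row, base) + rope_1d(x[half:], col, base)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    # A 448x672 image yields a 32x48 patch grid; no fixed-size table is needed.
    rows, cols = patch_grid(448, 672)
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    print(rows, cols, len(positions))  # 32 48 1536

    q = [1.0, 0.5, -0.3, 0.8, 0.2, -0.7, 0.4, 0.9]
    k = [0.3, -0.2, 0.6, 0.1, -0.5, 0.4, 0.7, -0.1]
    # Same (row, col) offset of (2, 3) at two different absolute locations.
    s1 = dot(rope_2d(q, 3, 5), rope_2d(k, 1, 2))
    s2 = dot(rope_2d(q, 10, 20), rope_2d(k, 8, 17))
    print(math.isclose(s1, s2, abs_tol=1e-9))  # True
```

Because both the query and the key are rotated, their dot product depends only on the row/column offset between the two patches, which is the relative-position property that makes rotary embeddings a natural fit for variable-size patch grids.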
xiaohei66 changed pull request title from "rename" to "To rename for future compatibility with transformers"
xiaohei66 changed pull request status to open
xiaohei66 changed pull request status to merged