No output(?)
Tried it this way:
```python
#!/usr/bin/env python3
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Load a handwritten (Sütterlin) line image from the Heidelberg University Library
url = "https://digi.ub.uni-heidelberg.de/diglitData/v/suetterlin/0001_page_370214_line_r1l5_docId_17239.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load processor and model from the same repository
processor = TrOCRProcessor.from_pretrained("dh-unibe/trocr-kurrent")
model = VisionEncoderDecoderModel.from_pretrained("dh-unibe/trocr-kurrent")

# Preprocess the image, generate token ids, and decode them to text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
This is a minimally modified version of the trocr-base-handwritten example, but the output is empty (just a blank line).
The trocr-base-handwritten example works.
PS: Ubuntu 24.04; tried Python 3.11 to 3.14. The GPU is an RTX 5060 Ti (16 GB).
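One possible cause of a blank line after decoding with `skip_special_tokens=True` is that the model generated nothing but special tokens (e.g. start and end-of-sequence), which are then stripped. A minimal sketch for checking this, assuming you pass in the raw ids (e.g. `generated_ids[0].tolist()`) and the tokenizer's special ids; the helper names `only_special_tokens` and `visible_ids` are hypothetical:

```python
def only_special_tokens(token_ids, special_ids):
    """Return True if every generated id is a special token.

    Hypothetical helper: if this is True, decoding with
    skip_special_tokens=True yields an empty string.
    """
    special = set(special_ids)
    return all(t in special for t in token_ids)


def visible_ids(token_ids, special_ids):
    """Return the ids that survive skip_special_tokens=True."""
    special = set(special_ids)
    return [t for t in token_ids if t not in special]


if __name__ == "__main__":
    # Illustrative ids only; real values come from the model and tokenizer.
    specials = [0, 1, 2]
    print(only_special_tokens([2, 0, 2], specials))  # True
    print(visible_ids([2, 713, 44, 2], specials))    # [713, 44]
```

With the real objects this would be `only_special_tokens(generated_ids[0].tolist(), processor.tokenizer.all_special_ids)`; printing `processor.batch_decode(generated_ids, skip_special_tokens=False)` shows the same thing directly.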
I suspect the problem is the processor. As described in issue 3, you can load the original processor instead and feed its output to the model:
```python
#!/usr/bin/env python3
import logging

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Log debug messages
logging.basicConfig(level=logging.DEBUG)

# Load a handwritten (Sütterlin) line image from the Heidelberg University Library
url = "https://digi.ub.uni-heidelberg.de/diglitData/v/suetterlin/0001_page_370214_line_r1l5_docId_17239.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load the original processor instead of the one from the model repository,
# as described in issue 3 (huggingface.co/dh-unibe/trocr-kurrent/discussions/3)
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("dh-unibe/trocr-kurrent")

pixel_values = processor(images=image, return_tensors="pt").pixel_values
print(pixel_values)  # inspect the preprocessed input tensor

generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_text[0])
```
The script above outputs `Vogelschutz vom 17 . September`. I'm not sure how much the different processors influence the result, though, and I suspect the output will not be exactly as expected...
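To see what actually differs between the two processors, one option is to compare their image-processor settings key by key. A minimal sketch; `diff_configs` is a hypothetical helper, and the dictionaries in the example are made-up values, not the real configs of either repository:

```python
def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    Keys present in only one dict are reported with None for the missing side.
    """
    missing = object()  # sentinel so that a stored None still counts as present
    diffs = {}
    for key in sorted(set(a) | set(b)):
        va = a.get(key, missing)
        vb = b.get(key, missing)
        if va != vb:
            diffs[key] = (None if va is missing else va,
                          None if vb is missing else vb)
    return diffs


if __name__ == "__main__":
    # Illustrative values only, not the real configs of either repository.
    kurrent = {"size": 384, "image_mean": [0.5, 0.5, 0.5], "do_normalize": False}
    base = {"size": 384, "image_mean": [0.5, 0.5, 0.5], "do_normalize": True}
    print(diff_configs(kurrent, base))  # {'do_normalize': (False, True)}
```

With the real processors you would pass `TrOCRProcessor.from_pretrained(repo).image_processor.to_dict()` for each of the two repositories.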
By the way, I'm not on the Digital Humanities Bern team, just a member of this HF group.
Thanks! For segmentation we could use kraken (https://github.com/mittagessen/kraken): `kraken -i image.tif lines.json segment -bl`
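TrOCR recognizes single text lines, so each line found by the segmenter has to be cropped out before recognition. A minimal sketch that turns a line polygon into a padded crop box, assuming each polygon is a list of (x, y) pairs; how you read them out of `lines.json` depends on your kraken version and is left out. `polygon_to_box` is a hypothetical helper:

```python
def polygon_to_box(polygon, image_width, image_height, pad=0):
    """Axis-aligned crop box (left, top, right, bottom) around a polygon.

    The box is padded by `pad` pixels and clamped to the image bounds,
    so it can be passed directly to PIL's Image.crop().
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    left = max(0, min(xs) - pad)
    top = max(0, min(ys) - pad)
    right = min(image_width, max(xs) + pad)
    bottom = min(image_height, max(ys) + pad)
    return (left, top, right, bottom)


if __name__ == "__main__":
    # Illustrative line polygon on a 1000 x 1400 page.
    line = [(120, 40), (900, 38), (905, 95), (118, 97)]
    print(polygon_to_box(line, 1000, 1400, pad=5))  # (113, 33, 910, 102)
```

With a PIL image, `image.crop(polygon_to_box(poly, *image.size, pad=5))` gives a line image that can go straight into the processor.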
For segmentation you could also use SAM with prompts; this may give even better results.
There are several approaches that use SAM to segment text in images. You might want to look at Hi-SAM: the basic idea is described in the paper https://arxiv.org/abs/2401.17904, and the weights for the pre-trained model are on GitHub. Yukinori Yamamoto applied this approach to 18th-century Spanish handwriting with good results. I guess this would also work for other languages and historical domains. There is a blog post about this on Medium.
Or PP-OCRv5 for segmentation. It seems relatively reliable, but its "polygons" only have four corners, and I don't know to what extent text from the lines above or below affects the OCR.
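One way to gauge whether neighbouring lines bleed into a crop is to measure how much each four-corner box overlaps vertically with the next one down. A minimal sketch, assuming each detection is a list of four (x, y) corners, the boxes are roughly horizontal, and the page is a single column; `vertical_overlaps` is a hypothetical helper and the 0.1 threshold is arbitrary:

```python
def y_span(quad):
    """Return (top, bottom) of a box given as four (x, y) corners."""
    ys = [y for _, y in quad]
    return min(ys), max(ys)


def vertical_overlaps(quads, min_ratio=0.1):
    """Return (i, j, ratio) for vertically adjacent boxes that overlap.

    Boxes are sorted top to bottom (single column assumed); ratio is the
    overlap height divided by the height of the smaller of the two boxes.
    """
    spans = sorted((y_span(q), idx) for idx, q in enumerate(quads))
    result = []
    for ((top_a, bot_a), i), ((top_b, bot_b), j) in zip(spans, spans[1:]):
        overlap = min(bot_a, bot_b) - max(top_a, top_b)
        smaller = min(bot_a - top_a, bot_b - top_b)
        if smaller > 0 and overlap / smaller >= min_ratio:
            result.append((i, j, round(overlap / smaller, 3)))
    return result


if __name__ == "__main__":
    quads = [
        [(10, 100), (800, 100), (800, 150), (10, 150)],  # line 0
        [(10, 140), (800, 140), (800, 190), (10, 190)],  # line 1, overlaps line 0
        [(10, 260), (800, 260), (800, 310), (10, 310)],  # line 2, clear gap
    ]
    print(vertical_overlaps(quads))  # [(0, 1, 0.2)]
```

Pairs reported here are candidates for trimming or masking before recognition; whether that actually improves the OCR would need to be tested.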