Libraries

How to use WueNLP/centurio_qwen with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="WueNLP/centurio_qwen", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("WueNLP/centurio_qwen", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use WueNLP/centurio_qwen with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WueNLP/centurio_qwen"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WueNLP/centurio_qwen",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
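
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is reachable at http://localhost:8000; the helper names `build_chat_payload` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_payload(model, prompt, image_url):
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(base_url, payload, timeout=120):
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(
    "WueNLP/centurio_qwen",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# Uncomment once the server is running (assumed address: http://localhost:8000):
# reply = send_chat_request("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])
```

The payload mirrors the curl body above, so the same helper should work against any server exposing the same OpenAI-compatible route; only the base URL changes.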

Use Docker

docker model run hf.co/WueNLP/centurio_qwen

SGLang

How to use WueNLP/centurio_qwen with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WueNLP/centurio_qwen" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WueNLP/centurio_qwen",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WueNLP/centurio_qwen" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use WueNLP/centurio_qwen with Docker Model Runner:
```
docker model run hf.co/WueNLP/centurio_qwen
```

Centurio Qwen

Model Details

Model Description

Model type: Centurio is an open-source multilingual large vision-language model.
Training Data: COMING SOON
Languages: The model was trained with the following 100 languages: af, am, ar, ar-eg, as, azb, be, bg, bm, bn, bo, bs, ca, ceb, cs, cy, da, de, du, el, en, eo, es, et, eu, fa, fi, fr, ga, gd, gl, ha, hi, hr, ht, hu, id, ig, is, it, iw, ja, jv, ka, ki, kk, km, ko, la, lb, ln, lo, lt, lv, mi, mr, ms, mt, my, no, oc, pa, pl, pt, qu, ro, ru, sa, sc, sd, sg, sk, sl, sm, so, sq, sr, ss, sv, sw, ta, te, th, ti, tl, tn, tpi, tr, ts, tw, uk, ur, uz, vi, war, wo, xh, yo, zh, zu
License: This work is released under the Apache 2.0 license.

Model Sources

Repository: gregor-ge.github.io/Centurio
Paper: arXiv:2501.05122 (https://arxiv.org/abs/2501.05122)

Uses

Direct Use

The model can be used directly through the transformers library with our custom code.

from transformers import AutoModelForCausalLM, AutoProcessor
import timm  # not used directly below; presumably required by the model's custom code
from PIL import Image
import requests

url = "https://upload.wikimedia.org/wikipedia/commons/b/bd/Golden_Retriever_Dukedestiny01_drvd.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model_name = "WueNLP/centurio_qwen"

processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# The position of each image in the prompt is indicated with '<image_placeholder>'.
prompt = "<image_placeholder>\nBriefly describe the image in German."

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # This is the system prompt used during our training.
    {"role": "user", "content": prompt}
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True
)

model_inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
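
The slicing step before decoding relies on each generated sequence starting with its own prompt tokens, so cutting off the first `len(input_ids)` entries leaves only the newly generated tokens. A toy illustration with plain lists follows; the token IDs are made up and the helper name `strip_prompt_tokens` is hypothetical.

```python
def strip_prompt_tokens(input_ids_batch, output_ids_batch):
    """Drop the leading prompt tokens from each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Toy token IDs (made up for illustration): each output repeats its prompt first.
prompt_ids = [[101, 7, 8], [101, 9, 10]]
generated = [[101, 7, 8, 42, 43], [101, 9, 10, 44, 45]]

new_tokens = strip_prompt_tokens(prompt_ids, generated)
print(new_tokens)  # [[42, 43], [44, 45]]
```

Without this step, the decoded response would also contain the system prompt and the user prompt.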

Multiple Images

Multi-image inputs are natively supported. You only have to (1) include one <image_placeholder> per image in the prompt and (2) pass all images of the entire batch as a single flat list:

[...]
# Variables reused from above.

processor.tokenizer.padding_side = "left" # default is 'right' but has to be 'left' for batched generation to work correctly!

image_multi_1, image_multi_2 = [...] # prepare additional images

prompt_multi = "What is the difference between the following images?\n<image_placeholder><image_placeholder>\nAnswer in German."

messages_multi = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt_multi}
]

text_multi = processor.apply_chat_template(
    messages_multi,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = processor(text=[text, text_multi], images=[image, image_multi_1, image_multi_2], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)

[...]
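
Because the images of the whole batch are passed as one flat list, a mismatch between the number of placeholders and the number of images is easy to introduce. The sketch below is a small sanity check that can be run before calling the processor. It assumes that images are matched to placeholders in order of appearance; the helper name `group_images_by_prompt` is hypothetical, and strings stand in for the actual image objects.

```python
IMAGE_PLACEHOLDER = "<image_placeholder>"


def group_images_by_prompt(prompts, images):
    """Split a flat image list into one group per prompt, based on placeholder counts."""
    counts = [prompt.count(IMAGE_PLACEHOLDER) for prompt in prompts]
    if sum(counts) != len(images):
        raise ValueError(
            f"Prompts contain {sum(counts)} placeholders but {len(images)} images were given."
        )
    groups, start = [], 0
    for count in counts:
        groups.append(images[start:start + count])
        start += count
    return groups


prompts = [
    "<image_placeholder>\nBriefly describe the image in German.",
    "What is the difference between the following images?\n"
    "<image_placeholder><image_placeholder>\nAnswer in German.",
]
# Strings stand in for PIL images here; only the count and order matter.
images = ["image", "image_multi_1", "image_multi_2"]

groups = group_images_by_prompt(prompts, images)
print(groups)  # [['image'], ['image_multi_1', 'image_multi_2']]
```

The check only inspects the raw prompt strings, so it works equally well on the output of `apply_chat_template`, provided the template leaves the placeholders unchanged.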

Bias, Risks, and Limitations

The general biases, risks, and limitations of large vision-language models, such as hallucinations and biases inherited from the training data, apply.
This is a research project and is not recommended for production use.
Multilingual: Performance and generation quality can differ widely between languages.
OCR: The model struggles with both small text and text in non-Latin scripts.

Citation

BibTeX:

@article{centurio2025,
  author       = {Gregor Geigle and
                  Florian Schneider and
                  Carolin Holtermann and
                  Chris Biemann and
                  Radu Timofte and
                  Anne Lauscher and
                  Goran Glava\v{s}},
  title        = {Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model},
  journal      = {arXiv},
  volume       = {abs/2501.05122},
  year         = {2025},
  url          = {https://arxiv.org/abs/2501.05122},
  eprinttype   = {arXiv},
  eprint       = {2501.05122},
}

Model tree for WueNLP/centurio_qwen

Base model: Qwen/Qwen2.5-7B
Finetuned: Qwen/Qwen2.5-7B-Instruct
Finetuned: this model

Collection including WueNLP/centurio_qwen

Centurio: Artifacts of the paper "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model" (6 items, updated Feb 4, 2025)

Paper for WueNLP/centurio_qwen

Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model (arXiv 2501.05122, published Jan 9, 2025)