Jawi collection: models for historical documents in Jawi (an adaptation of the Perso-Arabic script for the Malay language)
This model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct specialized for Optical Character Recognition (OCR) of historical Malay texts written in the Jawi script (the Arabic script as adapted for the Malay language).
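Jawi adapts Arabic script to Malay by adding letters for Malay sounds that Arabic does not have. The small helper below is not part of the model. It counts those letters in a transcription, which can serve as a quick sanity check that OCR output is actually Jawi. The letter list is the commonly cited set of Jawi additions. Orthographies vary, so "ga" is mapped from two different code points. The name `count_jawi_letters` is hypothetical.

```python
from collections import Counter

# Commonly cited Jawi additions to the Arabic alphabet, mapped to the Malay
# sound each one writes. Orthographies differ, so "ga" appears under two
# code points.
JAWI_EXTRA_LETTERS = {
    "\u0686": "ca",   # چ ARABIC LETTER TCHEH
    "\u06A0": "nga",  # ڠ ARABIC LETTER AIN WITH THREE DOTS ABOVE
    "\u06A4": "pa",   # ڤ ARABIC LETTER VEH
    "\u0762": "ga",   # ݢ ARABIC LETTER KEHEH WITH DOT ABOVE
    "\u06AC": "ga",   # ڬ ARABIC LETTER KAF WITH DOT ABOVE
    "\u06CF": "va",   # ۏ ARABIC LETTER WAW WITH DOT ABOVE
    "\u06BD": "nya",  # ڽ ARABIC LETTER NOON WITH THREE DOTS ABOVE
}


def count_jawi_letters(text: str) -> Counter:
    """Count occurrences of Jawi-specific letters in a transcription (hypothetical helper)."""
    return Counter(JAWI_EXTRA_LETTERS[ch] for ch in text if ch in JAWI_EXTRA_LETTERS)


if __name__ == "__main__":
    # ڤ ا ڤ : two occurrences of the Jawi letter "pa" around an alif.
    print(count_jawi_letters("\u06A4\u0627\u06A4"))
```

Applied to a decoded string, such as an element of `output_text` from the example code, an empty counter on a long passage suggests the text may contain no Jawi-specific letters at all.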
This model was trained and evaluated using
We compared this model with Surya (https://github.com/VikParuchuri/surya), an OCR toolkit that reports high accuracy rates for Arabic but performs poorly on our Jawi data.
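The card does not state which metric underlies this comparison. Character error rate (CER) is a common choice for OCR. It is the edit distance between the reference transcription and the model output, divided by the length of the reference.

The sketch below computes CER in pure Python. It is an assumption, not the evaluation code used for this model. NFC normalization is applied so that composed and decomposed forms of the same letter compare equal.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("abcd", "abxd"))  # one substitution in four characters: 0.25
```

Lower is better, and CER can exceed 1.0 when the output is much longer than the reference. Comparing two OCR systems on the same pages then means averaging (or summing edits over total reference length) across the test set.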
# Example code for loading and using the model
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils
from PIL import Image
model_name = 'mevsg/qwen-for-jawi-v1'
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # Use the appropriate torch dtype if needed
    device_map='auto'  # Optional: automatically allocate model layers across devices
)
# Load the processor from Hugging Face Hub
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
# Load the page image to transcribe (replace with the path to your own scan)
image_path = 'path/to/image'
image = Image.open(image_path).convert('RGB')
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Convert this image to text"},
        ],
    }
]
# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)  # Move inputs to the model's device instead of hard-coding "cuda"
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)  # Output stops at this limit; raise it for long pages
# Keep only the newly generated tokens (generate() returns the prompt followed by the output)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
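For decoder-only models such as Qwen2-VL, `model.generate` returns each prompt followed by its continuation. That is why the example slices every output sequence before decoding.

The sketch below reproduces that trimming step on plain Python lists with toy token ids, so it can be inspected without loading the model. It also adds a simple truncation check, since the output stops at `max_new_tokens`. Both helper names, `trim_prompt_tokens` and `probably_truncated`, are hypothetical.

```python
from typing import List, Sequence


def trim_prompt_tokens(
    input_ids: Sequence[Sequence[int]],
    generated_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Keep only the tokens produced after each prompt."""
    # generate() echoes the prompt, so skip the first len(prompt) tokens of each row.
    return [list(out[len(inp):]) for inp, out in zip(input_ids, generated_ids)]


def probably_truncated(new_ids: Sequence[int], max_new_tokens: int) -> bool:
    """Heuristic: a continuation that used the whole token budget was likely cut off."""
    return len(new_ids) >= max_new_tokens


# Toy token ids, purely illustrative (not real Qwen2-VL vocabulary ids).
prompts = [[101, 102, 103]]
outputs = [[101, 102, 103, 7, 8, 9]]
new_tokens = trim_prompt_tokens(prompts, outputs)
print(new_tokens)                              # [[7, 8, 9]]
print(probably_truncated(new_tokens[0], 3))    # True
print(probably_truncated(new_tokens[0], 128))  # False
```

The truncation check is only reliable for a batch of one. In a padded batch, every row is returned at the same length.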
@misc{qwen-for-jawi-v1,
  title = {Qwen for Jawi v1: a model for Jawi OCR},
  author = {Escobar Varela, Miguel},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/mevsg/qwen-for-Jawi-v1},
  note = {Model created at National University of Singapore}
}
Special thanks to William Mattingly, whose fine-tuning script served as the basis for our fine-tuning approach: https://github.com/wjbmattingly/qwen2-vl-finetune-huggingface