# olmOCR Arabic LoRA v2

A LoRA fine-tuned adapter for Arabic manuscript OCR, built on allenai/olmOCR-2-7B-1025.
## Training Details
- Base Model: allenai/olmOCR-2-7B-1025
- LoRA Rank: 64
- LoRA Alpha: 128
- Training Data: 1,222 full-page Arabic manuscript images from hastyle/arabic-manuscript-ocr
- Epochs: 10
- Final Loss: ~7.2
## Usage
```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

# Load the base model
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "allenai/olmOCR-2-7B-1025",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "hastyle/olmOCR-arabic-lora-v2")
processor = AutoProcessor.from_pretrained("allenai/olmOCR-2-7B-1025")
```
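With the model and processor loaded, inference follows the standard Qwen2.5-VL chat template. A minimal sketch; the image path and instruction text are placeholders, and olmOCR's own prompting pipeline may format requests differently:

```python
import torch
from PIL import Image

# Placeholder path; any full-page manuscript scan works here.
image = Image.open("manuscript_page.png")

# Standard Qwen2.5-VL chat-style message; the instruction text is illustrative.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe the Arabic text on this page."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024)

# Strip the prompt tokens before decoding the transcription.
generated = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```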
## Improvements over v1
- Trained on full-page manuscripts instead of text-line images
- Higher LoRA capacity (rank 64 vs 16)
- Better word boundary preservation
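If you prefer to deploy a single standalone checkpoint instead of loading the adapter at runtime, PEFT's `merge_and_unload` can fold the LoRA weights into the base model. A sketch, not part of this repo's documented workflow; the output directory is a placeholder:

```python
# Fold the LoRA deltas into the base weights and save a standalone model.
merged = model.merge_and_unload()
merged.save_pretrained("olmOCR-arabic-merged")     # placeholder output directory
processor.save_pretrained("olmOCR-arabic-merged")
```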