Fanar-2-Oryx-IG (Image Generation)
Fanar-2-Oryx-IG is a culturally-aligned text-to-image generation model developed by Qatar Computing Research Institute (QCRI) at Hamad Bin Khalifa University (HBKU), a member of Qatar Foundation for Education, Science, and Community Development. It is part of the Fanar 2.0 release, a comprehensive Arabic-centric multimodal generative AI platform that also includes text generation, image understanding and poetry generation.
Fanar-2-Oryx-IG addresses a critical gap in general-purpose image generation models: the systematic underrepresentation of Arabic, Islamic, and regional visual concepts. Through taxonomy-driven data collection and cultural preference optimization, Fanar-2-Oryx-IG achieves best-in-class cultural alignment (85.49) while maintaining high visual quality (93.52), outperforming both its base model and commercial alternatives on culturally-sensitive content.
We have published a report with full details of the Fanar 2.0 GenAI platform. We also provide a chat interface, mobile apps for iOS and Android, and API access to our models and the GenAI platform (request access here).
Model Details
| Attribute | Value |
|---|---|
| Developed by | QCRI at HBKU |
| Sponsored by | Ministry of Communications and Information Technology, State of Qatar |
| Model Type | Text-to-Image Diffusion Model |
| Base Model | FLUX.1-schnell |
| Fine-tuning Method | LoRA adapters on denoising network |
| Training Resolution | 1024×1024 |
| Input | Text |
| Output | Images (1024×1024) |
| Training Framework | Community FLUX implementation + DDP |
| Training Data | 480K culturally-aligned images |
| Training Steps | 200K |
| Languages | English |
| License | Apache 2.0 |
Model Training
Taxonomy-Driven Data Collection
Fanar-2-Oryx-IG training data was systematically curated using a taxonomy-driven approach spanning 23,000+ search terms organized across cultural categories:
Taxonomy Categories:
- Landmarks & Architecture: Regional landmarks (Museum of Islamic Art, Souq Waqif), traditional and modern buildings
- Traditional Clothing: Thobe, abaya, hijab, ghutra, regional dress variations
- Food & Hospitality: Machboos, karak chai, Arabic coffee, traditional dishes
- Religious Settings: Mosques, prayer scenes, Islamic calligraphy
- Ceremonies & Celebrations: Weddings, Eid celebrations, traditional gatherings
- Daily Life: Majlis settings, family gatherings, markets, social interactions
- Geographical Coverage: 22 Arab countries with balanced representation
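As a rough illustration of how a taxonomy can be expanded into search terms, the sketch below crosses per-category terms with country names. It is not the authors' collection code: `TAXONOMY`, `COUNTRIES`, and `expand_queries` are hypothetical names, the entries are a tiny sample, and the real taxonomy spans 23,000+ terms.

```python
from itertools import product

# Hypothetical, heavily reduced sample of the taxonomy (assumption)
TAXONOMY = {
    "Landmarks & Architecture": ["traditional souq", "mosque architecture"],
    "Traditional Clothing": ["thobe", "abaya"],
    "Food & Hospitality": ["Arabic coffee", "machboos"],
}
# Three of the 22 Arab countries, chosen only for illustration
COUNTRIES = ["Qatar", "Morocco", "Egypt"]


def expand_queries(taxonomy, countries):
    """Return (category, query) pairs, skipping case-insensitive duplicates."""
    seen = set()
    queries = []
    for category, terms in taxonomy.items():
        for term, country in product(terms, countries):
            query = f"{term} {country}"
            key = query.casefold()
            if key not in seen:
                seen.add(key)
                queries.append((category, query))
    return queries


for category, query in expand_queries(TAXONOMY, COUNTRIES)[:3]:
    print(f"{category}: {query}")
```

Keeping the category next to each query makes it possible to check later that every category and country remains represented after filtering.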
Data Sources:
- Google Images & Flickr: ~2M raw images
- Retention rate: 37% after quality filtering → 480K high-quality images
Quality Filtering & Enhancement
Filtering Criteria:
- Visual quality and resolution standards
- Relevance to cultural taxonomy
- NSFW content removal (nudity, explicit content, violence)
- Watermark and logo detection
- Cultural appropriateness verification
Image Processing:
- Standardization to 1024×1024 resolution
- Super-resolution upscaling for low-resolution sources
- Inpainting for aspect ratio correction
- Photometric adjustments (exposure, white balance, contrast)
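The standardization step can be pictured as a resize-then-pad computation. The sketch below is an illustration under stated assumptions, not the authors' pipeline: `fit_to_square` is a hypothetical helper that scales the longer side to 1024 pixels, centers the result on a square canvas, and reports the margins that inpainting would need to fill. It also flags sources smaller than the target, the case where super-resolution upscaling would apply.

```python
from dataclasses import dataclass

TARGET = 1024  # training resolution stated in the model card


@dataclass(frozen=True)
class SquareFit:
    scale: float           # factor applied to the source image
    new_width: int         # width after scaling
    new_height: int        # height after scaling
    pad_left: int          # empty margins on the square canvas;
    pad_top: int           # these are the regions inpainting would fill
    pad_right: int
    pad_bottom: int
    needs_upscaling: bool  # True when the source is smaller than the target


def fit_to_square(width: int, height: int, target: int = TARGET) -> SquareFit:
    """Scale the longer side to `target` and center the image on a square canvas."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target / max(width, height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))
    pad_x, pad_y = target - new_w, target - new_h
    left, top = pad_x // 2, pad_y // 2
    return SquareFit(scale, new_w, new_h, left, top,
                     pad_x - left, pad_y - top, scale > 1.0)


# A 16:9 source leaves horizontal bands above and below the image
print(fit_to_square(1600, 900))
```

Centering is one possible choice; a pipeline could equally anchor the image to one edge so that only a single region needs inpainting.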
Final selection criteria:
- Visual quality consistency
- Cultural alignment strength
- Stability across diverse prompts
Rich Metadata Annotation
Each image is annotated with comprehensive metadata generated by a multimodal model (Gemini 2.5 Flash) that analyzes both the image content and contextual signals:
- Intrinsic: Resolution, format
- Adjunct: Source, query term, licensing
- Visual: Descriptions, cultural elements, objects, people, places
- Captions: 10 diverse caption variants per image
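One way to picture a per-image annotation is as a small record with the four metadata groups listed above. The field names below are assumptions made for illustration, not the released schema, and picking a random caption per training step is likewise an assumed use of the caption variants.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    # Intrinsic: properties of the file itself
    width: int
    height: int
    file_format: str
    # Adjunct: where the image came from
    source: str
    query_term: str
    license: str
    # Visual: annotations produced by the multimodal model
    description: str
    cultural_elements: list = field(default_factory=list)
    # Captions: several variants describing the same image
    captions: list = field(default_factory=list)

    def sample_caption(self, rng: random.Random) -> str:
        """Pick one caption variant, e.g. to pair with the image for a training step."""
        if not self.captions:
            raise ValueError("record has no captions")
        return rng.choice(self.captions)


# Hypothetical example values
record = ImageRecord(
    width=1024, height=1024, file_format="png",
    source="Flickr", query_term="Souq Waqif", license="unknown",
    description="A covered market alley with lanterns",
    cultural_elements=["souq", "lanterns"],
    captions=["A lantern-lit alley in Souq Waqif", "Shoppers in a traditional market"],
)
print(record.sample_caption(random.Random(0)))
```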
Fine-tuning Configuration
- Optimizer: AdamW
- Learning rate: 5×10⁻⁵ (constant schedule)
- Batch size: 4 (global)
- Training steps: 200K
- Hardware: Multi-GPU with DistributedDataParallel
- Precision: Mixed (bf16/fp16)
- Ablations: 60+ configurations tested
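The listed hyperparameters imply a rough training budget. The sketch below collects them in a hypothetical `FineTuneConfig` dataclass and derives the number of samples seen, assuming each step consumes exactly one global batch (no gradient accumulation) drawn from the 480K-image set. The number of GPUs is not stated, so `per_device_batch` takes it as an argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    optimizer: str = "AdamW"
    learning_rate: float = 5e-5
    lr_schedule: str = "constant"
    global_batch_size: int = 4
    training_steps: int = 200_000
    dataset_size: int = 480_000

    def samples_seen(self) -> int:
        """Total image-caption pairs consumed, one global batch per step."""
        return self.global_batch_size * self.training_steps

    def approx_epochs(self) -> float:
        """Approximate number of passes over the training set."""
        return self.samples_seen() / self.dataset_size

    def per_device_batch(self, world_size: int) -> int:
        """Batch size per GPU under DDP for an assumed number of GPUs."""
        if world_size <= 0 or self.global_batch_size % world_size:
            raise ValueError("global batch must divide evenly across GPUs")
        return self.global_batch_size // world_size


cfg = FineTuneConfig()
print(f"{cfg.samples_seen():,} samples, about {cfg.approx_epochs():.2f} passes over the data")
# prints "800,000 samples, about 1.67 passes over the data"
```

Under these assumptions the model sees each training image fewer than two times on average.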
Visual Gallery
Below are examples of culturally-aligned images generated by Fanar-2-Oryx-IG across different scenarios:
Getting Started
Using Diffusers Library
Tested using diffusers v0.37.1 and peft v0.18.1.
import torch
from diffusers import FluxPipeline

# Base model and the Fanar LoRA adapter that is applied on top of it
model_name = "black-forest-labs/FLUX.1-schnell"
lora_path = "QCRI/Fanar-2-Oryx-IG"

# Load the base pipeline in bfloat16, then attach the LoRA weights
pipe = FluxPipeline.from_pretrained(model_name, torch_dtype=torch.bfloat16)
pipe.load_lora_weights(lora_path)

prompt = "A falconer at the Falcon Souq in Doha holding a peregrine falcon on a leather glove"
out = pipe(
    prompt=prompt,
    guidance_scale=0.0,
    height=1024,
    width=1024,
    num_inference_steps=4,  # generally between 2 and 6
).images[0]
out.save("image.png")
Prompt Engineering for Cultural Content
Effective Prompts:
✅ "Museum of Islamic Art in Doha at sunset, architectural photography"
✅ "A Qatari woman wearing hijab and abaya shopping in Souq Waqif, traditional market atmosphere"
✅ "Traditional Gulf wedding ceremony with guests in cultural attire, celebration scene"
Generic Prompts (less culturally specific):
❌ "Woman shopping"
❌ "Wedding ceremony"
❌ "Museum building"
Tips:
- Include cultural specifics: clothing items, cultural context
- Specify regional details: "Gulf", "Qatari", "Arabic"
- Add atmospheric details: "modest", "cultural", "ceremonial"
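The tips above can be applied mechanically with a small helper. `build_prompt` below is a hypothetical convenience function, not part of any Fanar or diffusers API; it simply joins a subject with regional, cultural, and atmospheric details into one English prompt string.

```python
from typing import Sequence


def build_prompt(
    subject: str,
    region: str = "",
    details: Sequence[str] = (),
    atmosphere: Sequence[str] = (),
) -> str:
    """Compose a culturally specific prompt from its parts.

    subject:    what the image shows, e.g. "wedding ceremony"
    region:     regional qualifier placed before the subject, e.g. "Traditional Gulf"
    details:    cultural specifics such as clothing items or locations
    atmosphere: mood or style descriptors
    """
    if not subject.strip():
        raise ValueError("subject must not be empty")
    parts = [f"{region} {subject}".strip()]
    parts.extend(details)
    parts.extend(atmosphere)
    # Drop empty fragments so stray commas never reach the model
    return ", ".join(p.strip() for p in parts if p.strip())


print(build_prompt(
    "wedding ceremony",
    region="Traditional Gulf",
    details=["guests in cultural attire"],
    atmosphere=["celebration scene"],
))
# prints "Traditional Gulf wedding ceremony, guests in cultural attire, celebration scene"
```

The resulting string can be passed as the `prompt` argument of the pipeline call shown in Getting Started.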
Evaluation
Cultural Alignment Benchmark
Fanar-2-Oryx-IG was evaluated on a custom benchmark of 1,000 culturally-relevant prompts covering landmarks, clothing, food, religious settings, ceremonies, and daily life across the Arab world.
Automated Scoring: Gemini 2.5 Flash judge with 12 criteria aggregated into 5 dimensions:
- Instruction Following: Prompt adherence and semantic constraint satisfaction
- Visual Accuracy: People accuracy, scene accuracy, visual consistency
- Cultural Alignment: Clothing/modesty correctness, Islamic context, Arabic cultural fidelity
- Text Quality: Correctness and readability of rendered text (English/Arabic)
- Perceptual Quality: Detail richness, sharpness, overall visual quality
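The grouping of criteria into dimensions can be sketched as follows. Both the criterion names and the unweighted mean are assumptions made for illustration: the card does not spell out the exact 12 criteria or how they are aggregated, so `DIMENSIONS` and `aggregate` should be read as one plausible arrangement rather than the benchmark's definition.

```python
from statistics import mean

# Assumed mapping of 12 judge criteria to the 5 reported dimensions
DIMENSIONS = {
    "Instruction Following": ["prompt_adherence", "semantic_constraints"],
    "Visual Accuracy": ["people_accuracy", "scene_accuracy", "visual_consistency"],
    "Cultural Alignment": ["clothing_modesty", "islamic_context", "arabic_cultural_fidelity"],
    "Text Quality": ["text_correctness_readability"],
    "Perceptual Quality": ["detail_richness", "sharpness", "overall_quality"],
}


def aggregate(criterion_scores: dict) -> dict:
    """Average the per-criterion scores (0-100) within each dimension."""
    missing = [c for crits in DIMENSIONS.values() for c in crits
               if c not in criterion_scores]
    if missing:
        raise KeyError(f"missing criterion scores: {missing}")
    return {
        dim: mean(criterion_scores[c] for c in crits)
        for dim, crits in DIMENSIONS.items()
    }


# Hypothetical judge output for a single generated image
scores = {c: 80.0 for crits in DIMENSIONS.values() for c in crits}
scores["text_correctness_readability"] = 40.0
print(aggregate(scores))
```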
Performance Results
| Model | #Params | Latency | Overall | Instruction Following | Quality | Accuracy | Cultural Compliance | Text |
|---|---|---|---|---|---|---|---|---|
| Fanar-2-Oryx-IG | 12B | 1.43s | 83.76 | 78.35 | 93.52 | 85.71 | 85.49 | 43.60 |
| OpenAI ChatGPT | undisclosed (estimated >1T) | 50.76s | 92.56 | 96.94 | 95.87 | 94.92 | 85.15 | 79.35 |
| Alibaba Qwen | 20B | 36.65s | 84.08 | 83.52 | 93.24 | 87.82 | 78.59 | 49.85 |
| Flux-schnell (base) | 12B | 1.43s | 78.32 | 72.70 | 90.50 | 80.70 | 78.90 | 30.80 |
| Fanar-1-IG | 4B | 3.05s | 75.77 | 74.40 | 80.70 | 80.30 | 76.60 | 31.10 |
Key Findings:
- Best Cultural Compliance (85.49) among all evaluated models, including commercial systems
- Second-best Quality (93.52), behind only OpenAI ChatGPT
- Lowest latency at 1.43 seconds, tied with the Flux-schnell base model and about 35 times faster than OpenAI ChatGPT (50.76 seconds)
- Significant improvement over the Flux-schnell base model (+6.59 Cultural Compliance, +3.02 Quality)
- Strong performance relative to model size and training data scale
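The headline numbers can be re-derived directly from the results table. The short script below copies the relevant columns and recomputes the ranking, the gains over the base model, and the latency ratio; only the dictionary layout and variable names are introduced here.

```python
# Latency (seconds), Quality, and Cultural Compliance copied from the results table
RESULTS = {
    "Fanar-2-Oryx-IG": {"latency_s": 1.43, "quality": 93.52, "cultural": 85.49},
    "OpenAI ChatGPT": {"latency_s": 50.76, "quality": 95.87, "cultural": 85.15},
    "Alibaba Qwen": {"latency_s": 36.65, "quality": 93.24, "cultural": 78.59},
    "Flux-schnell (base)": {"latency_s": 1.43, "quality": 90.50, "cultural": 78.90},
    "Fanar-1-IG": {"latency_s": 3.05, "quality": 80.70, "cultural": 76.60},
}

ours = RESULTS["Fanar-2-Oryx-IG"]
base = RESULTS["Flux-schnell (base)"]

# Model with the highest Cultural Compliance score
best_cultural = max(RESULTS, key=lambda name: RESULTS[name]["cultural"])
# Models ordered by Quality, best first
quality_ranking = sorted(RESULTS, key=lambda name: RESULTS[name]["quality"], reverse=True)

cultural_gain = ours["cultural"] - base["cultural"]
quality_gain = ours["quality"] - base["quality"]
speedup = RESULTS["OpenAI ChatGPT"]["latency_s"] / ours["latency_s"]

print(f"Best cultural compliance: {best_cultural}")
print(f"Quality ranking: {quality_ranking[:2]}")
print(f"Gain over base: +{cultural_gain:.2f} cultural, +{quality_gain:.2f} quality")
print(f"Latency ratio vs OpenAI ChatGPT: {speedup:.1f}x")
```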
Qualitative Comparison
Visual inspection reveals that Fanar-2-Oryx-IG consistently generates:
- More culturally appropriate clothing (thobe, ghutra, abaya, hijab)
- Better recognition of regional landmarks and architecture
- Appropriate social contexts and gatherings
- Respectful depictions of religious and ceremonial settings
While larger commercial models may achieve higher overall scores, Fanar-2-Oryx-IG excels specifically in cultural alignment for Arabic and Islamic content.
Intended Use, Ethical Considerations & Limitations
Fanar-2-Oryx-IG is built for:
- Culturally-appropriate visual content generation for Arabic and Islamic contexts
- Marketing and advertising targeting Arab audiences
- Educational materials about Arabic culture, history, and traditions
- Media production requiring culturally-sensitive imagery
- Social media content respecting local norms and values
- Cultural preservation and documentation projects
- Research on culturally-aligned image generation
Developers are encouraged to:
- Implement content moderation for production deployments
- Respect cultural sensitivities and local norms
- Provide clear disclaimers about AI-generated content
- Monitor outputs for appropriateness in target contexts
- Consider domain-specific fine-tuning for critical applications
- Add watermarks to AI-generated images
Fanar-2-Oryx-IG should not be used to generate harmful, illegal, misleading, or culturally insensitive content. Although the model demonstrates strong cultural alignment, users should be aware of the following limitations:
Potential Issues:
- May occasionally generate culturally inappropriate content despite training
- Text rendering in images (especially Arabic) remains challenging
- Cannot guarantee perfect adherence to all cultural norms in every generation
- Subject to biases present in training data and base model
Not Suitable For:
- Generating realistic images of specific individuals
- Creating misleading or deceptive imagery
- High-stakes decisions requiring perfect cultural accuracy
- Situations where errors could cause significant harm
Kindly refer to our Terms of Service and Privacy Policy.
The output generated by this model is not considered a statement of QCRI, HBKU, Qatar Foundation, MCIT, or any other organization or individual.
Fanar Platform
While Fanar-2-Oryx-IG is a powerful standalone model, it is part of the broader Fanar Platform, an integrated Arabic-centric multimodal AI ecosystem that provides enhanced capabilities and continuous updates. The platform includes:
Core Capabilities:
- Text Generation: Multiple conversational models optimized for different tasks
- Speech (Aura): Speech-to-text (short-form and long-form) and text-to-speech synthesis with Arabic dialect support and bilingual Arabic-English capabilities
- Image Understanding (Oryx-IVU): Vision-language model for culturally-grounded image and video understanding including Arabic calligraphy recognition
- Image Generation (Oryx-IG): Culturally-aligned text-to-image generation trained on taxonomy-driven data across 23,000+ cultural search terms
- Machine Translation (FanarShaheen): High-quality bilingual Arabic↔English translation across diverse domains (e.g., news, STEM, and medical)
- Poetry Generation (Diwan): Classical Arabic poetry generation respecting prosodic meters (Buhur) and maintaining diacritization accuracy
Specialized Systems:
- Fanar-Sadiq: Multi-agent Islamic question-answering system with 9 specialized tools (Fiqh reasoning, Quran/Hadith retrieval, zakat/inheritance calculation, prayer times, and Hijri calendar). Deployed in production on IslamWeb and IslamOnline platforms.
- Safety & Moderation: Fanar-Guard and culturally-informed content filtering trained on 468K annotated Arabic-English safety examples
Access Points:
- Fanar Chat: Web conversational interface integrating all modalities
- iOS and Android apps: Mobile apps for on-the-go access to the Fanar Platform
- Fanar API: Programmatic access to models and specialized capabilities
The Fanar Platform continuously evolves with model updates, new capabilities, and improved safety mechanisms. For production deployments requiring the latest features, multimodal integration, cross-model orchestration, and ongoing support, we recommend using the Fanar Platform rather than the standalone models published here.
Citation
If you use Fanar-2-Oryx-IG or the Fanar 2.0 GenAI platform in your research or applications, please cite:
@misc{fanarteam2026fanar20arabicgenerative,
title={Fanar 2.0: Arabic Generative AI Stack},
author={FANAR TEAM and Ummar Abbas and Mohammad Shahmeer Ahmad and Minhaj Ahmad and Abdulaziz Al-Homaid and Anas Al-Nuaimi and Enes Altinisik and Ehsaneddin Asgari and Sanjay Chawla and Shammur Chowdhury and Fahim Dalvi and Kareem Darwish and Nadir Durrani and Mohamed Elfeky and Ahmed Elmagarmid and Mohamed Eltabakh and Asim Ersoy and Masoomali Fatehkia and Mohammed Qusay Hashim and Majd Hawasly and Mohamed Hefeeda and Mus'ab Husaini and Keivin Isufaj and Soon-Gyo Jung and Houssam Lachemat and Ji Kim Lucas and Abubakr Mohamed and Tasnim Mohiuddin and Basel Mousi and Hamdy Mubarak and Ahmad Musleh and Mourad Ouzzani and Amin Sadeghi and Husrev Taha Sencar and Mohammed Shinoy and Omar Sinan and Yifan Zhang},
year={2026},
eprint={2603.16397},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.16397},
}
Acknowledgements
This project is from Qatar Computing Research Institute (QCRI) at Hamad Bin Khalifa University (HBKU), a member of Qatar Foundation. We thank our engineers, researchers, and support team for their efforts in advancing Arabic-centric large language models.
Special thanks to the Ministry of Communications and Information Technology, State of Qatar for their continued support by providing the compute infrastructure needed to develop and serve the platform through the Google Cloud Platform.
License
This model is licensed under the Apache 2.0 License.