Fanar-2-Oryx-IG (Image Generation)

Fanar-2-Oryx-IG is a culturally-aligned text-to-image generation model developed by Qatar Computing Research Institute (QCRI) at Hamad Bin Khalifa University (HBKU), a member of Qatar Foundation for Education, Science, and Community Development. It is part of the Fanar 2.0 release, a comprehensive Arabic-centric multimodal generative AI platform that also includes text generation, image understanding and poetry generation.

Fanar-2-Oryx-IG addresses a critical gap in general-purpose image generation models: the systematic underrepresentation of Arabic, Islamic, and regional visual concepts. Through taxonomy-driven data collection and cultural preference optimization, Fanar-2-Oryx-IG achieves best-in-class cultural alignment (85.49) while maintaining high visual quality (93.52), outperforming both its base model and commercial alternatives on culturally-sensitive content.

We have published a report with full details on the Fanar 2.0 GenAI platform. We also provide a chat interface, mobile apps for iOS and Android, and API access to our models and the GenAI platform (request access here).

Model Details

Attribute	Value
Developed by	QCRI at HBKU
Sponsored by	Ministry of Communications and Information Technology, State of Qatar
Model Type	Text-to-Image Diffusion Model
Base Model	FLUX.1-schnell
Fine-tuning Method	LoRA adapters on denoising network
Training Resolution	1024×1024
Input	Text
Output	Images (1024×1024)
Training Framework	Community FLUX implementation + DDP
Training Data	480K culturally-aligned images
Training Steps	200K
Languages	English
License	Apache 2.0

Model Training

Taxonomy-Driven Data Collection

Fanar-2-Oryx-IG training data was systematically curated using a taxonomy-driven approach spanning 23,000+ search terms organized across cultural categories:

Taxonomy Categories:

Landmarks & Architecture: Regional landmarks (Museum of Islamic Art, Souq Waqif), traditional and modern buildings
Traditional Clothing: Thobe, abaya, hijab, ghutra, regional dress variations
Food & Hospitality: Machboos, karak chai, Arabic coffee, traditional dishes
Religious Settings: Mosques, prayer scenes, Islamic calligraphy
Ceremonies & Celebrations: Weddings, Eid celebrations, traditional gatherings
Daily Life: Majlis settings, family gatherings, markets, social interactions
Geographical Coverage: 22 Arab countries with balanced representation
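
One way to picture how a taxonomy expands into search terms is as a cross product of category terms and regions. The sketch below is illustrative only: the category terms are copied from the list above, while the `COUNTRIES` subset, the `build_queries` helper, and the query template are assumptions and not the actual collection pipeline.

```python
from itertools import product

# Category terms copied from the taxonomy list above (a small subset).
TAXONOMY = {
    "Traditional Clothing": ["thobe", "abaya", "ghutra"],
    "Food & Hospitality": ["machboos", "karak chai", "Arabic coffee"],
}

# Hypothetical subset of the 22 Arab countries, for illustration only.
COUNTRIES = ["Qatar", "Egypt", "Morocco"]


def build_queries(taxonomy, countries):
    """Cross every taxonomy term with every country to form search queries."""
    queries = []
    for category, terms in taxonomy.items():
        for term, country in product(terms, countries):
            queries.append({"category": category, "query": f"{term} {country}"})
    return queries


queries = build_queries(TAXONOMY, COUNTRIES)
print(len(queries))         # 2 categories x 3 terms x 3 countries = 18
print(queries[0]["query"])  # thobe Qatar
```

Keeping the category next to each query makes it possible to check per-category and per-country balance after downloading.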

Data Sources:

Google Images & Flickr: ~2M raw images
Retention rate: 37% after quality filtering → 480K high-quality images

Quality Filtering & Enhancement

Filtering Criteria:

Visual quality and resolution standards
Relevance to cultural taxonomy
NSFW content removal (nudity, explicit content, violence)
Watermark and logo detection
Cultural appropriateness verification

Image Processing:

Standardization to 1024×1024 resolution
Super-resolution upscaling for low-resolution sources
Inpainting for aspect ratio correction
Photometric adjustments (exposure, white balance, contrast)
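
The geometry behind standardizing an image to 1024×1024 can be sketched without any image library. The helper below is hypothetical: it assumes the longer side is scaled to the target and that the leftover border on the shorter side is what an inpainting model would fill. The actual pipeline may crop or pad differently.

```python
TARGET = 1024  # training resolution stated in this card


def plan_standardization(width, height, target=TARGET):
    """Return the scale factor and the border that would need filling.

    The image is scaled so its longer side equals `target`; the remaining
    border on the shorter side is the region an inpainting model would fill.
    """
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    return {
        "scale": scale,
        "needs_upscaling": scale > 1.0,  # candidate for super-resolution
        "resized": (new_w, new_h),
        "inpaint_border": (target - new_w, target - new_h),  # pixels per axis
    }


plan = plan_standardization(800, 600)
print(plan["resized"])         # (1024, 768)
print(plan["inpaint_border"])  # (0, 256)
```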

Final selection criteria:

Visual quality consistency
Cultural alignment strength
Stability across diverse prompts

Rich Metadata Annotation

Each image is annotated with comprehensive metadata generated by a multimodal model (Gemini 2.5 Flash) that analyzes both the image content and contextual signals:

Intrinsic: Resolution, format
Adjunct: Source, query term, licensing
Visual: Descriptions, cultural elements, objects, people, places
Captions: 10 diverse caption variants per image
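
The four metadata groups above could be held in a single record per image. The sketch below is a hypothetical layout: the field names, the `ImageRecord` class, and the idea of sampling one caption variant at a time are assumptions for illustration, not the schema used in training.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    # Intrinsic
    width: int
    height: int
    fmt: str
    # Adjunct
    source: str
    query_term: str
    license: str
    # Visual
    description: str
    cultural_elements: list = field(default_factory=list)
    # Captions (the card states 10 variants per image)
    captions: list = field(default_factory=list)

    def sample_caption(self, rng):
        """Pick one caption variant, for example once per training step."""
        return rng.choice(self.captions)


record = ImageRecord(
    width=1024, height=1024, fmt="png",
    source="Flickr", query_term="Souq Waqif", license="unknown",
    description="A covered market alley with lanterns",
    cultural_elements=["souq", "lanterns"],
    captions=[f"caption variant {i}" for i in range(10)],
)
print(record.sample_caption(random.Random(0)))
```

Sampling among several captions per image is a common way to reduce overfitting to one phrasing; whether Fanar-2-Oryx-IG training does exactly this is not stated here.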

Fine-tuning Configuration

Optimizer: AdamW
Learning rate: 5×10⁻⁵ (constant schedule)
Batch size: 4 (global)
Training steps: 200K
Hardware: Multi-GPU with DistributedDataParallel
Precision: Mixed (bf16/fp16)
Ablations: 60+ configurations tested
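
As a quick sanity check on scale, the configuration above implies how many samples the model saw. The sketch below assumes each training step consumes exactly one global batch (no gradient accumulation); the `FinetuneConfig` class is a hypothetical container, not part of the training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    optimizer: str = "AdamW"
    learning_rate: float = 5e-5   # constant schedule
    global_batch_size: int = 4
    training_steps: int = 200_000
    dataset_size: int = 480_000   # from "Training Data" in Model Details


cfg = FinetuneConfig()
samples_seen = cfg.global_batch_size * cfg.training_steps
approx_epochs = samples_seen / cfg.dataset_size

print(samples_seen)             # 800000
print(round(approx_epochs, 2))  # 1.67
```

Under that assumption, training amounts to fewer than two passes over the 480K images.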

Visual Gallery

[Image gallery: examples of culturally-aligned images generated by Fanar-2-Oryx-IG across different scenarios]

Getting Started

Using Diffusers Library

Tested using diffusers v0.37.1 and peft v0.18.1.

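from diffusers import FluxPipeline
import torch

# Base model and the Fanar LoRA adapter repository
model_name = "black-forest-labs/FLUX.1-schnell"
lora_path = "QCRI/Fanar-2-Oryx-IG"

# Load the base pipeline in bfloat16, then attach the LoRA weights
pipe = FluxPipeline.from_pretrained(model_name, torch_dtype=torch.bfloat16)
pipe.load_lora_weights(lora_path)
# Optional, depending on hardware: pipe.to("cuda") or pipe.enable_model_cpu_offload()

prompt = "A falconer at the Falcon Souq in Doha holding a peregrine falcon on a leather glove"

out = pipe(
    prompt=prompt,
    guidance_scale=0.0,  # schnell is typically run with guidance disabled
    height=1024,
    width=1024,
    num_inference_steps=4,  # generally between 2 and 6
).images[0]

out.save("image.png")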
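
When generating several images, it can help to build the call arguments in one place so the step range and guidance setting stay consistent. The helper below is a hypothetical convenience wrapper, not part of diffusers; it only assembles the keyword arguments used in the example above and rejects step counts outside the suggested 2 to 6 range.

```python
def make_generation_kwargs(prompt, steps=4, size=1024):
    """Build keyword arguments for the pipeline call shown above."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 2 <= steps <= 6:
        raise ValueError("num_inference_steps is expected to be between 2 and 6")
    return {
        "prompt": prompt,
        "guidance_scale": 0.0,  # same value as in the example above
        "height": size,
        "width": size,
        "num_inference_steps": steps,
    }


prompts = [
    "Museum of Islamic Art in Doha at sunset, architectural photography",
    "Traditional Gulf wedding ceremony with guests in cultural attire",
]
batch = [make_generation_kwargs(p) for p in prompts]
print(batch[0]["num_inference_steps"])  # 4
```

With the pipeline loaded as above, each entry could then be passed as `pipe(**kwargs).images[0]`.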

Prompt Engineering for Cultural Content

Effective Prompts:

✅ "Museum of Islamic Art in Doha at sunset, architectural photography"

✅ "A Qatari woman wearing hijab and abaya shopping in Souq Waqif, traditional market atmosphere"

✅ "Traditional Gulf wedding ceremony with guests in cultural attire, celebration scene"

Generic Prompts (less culturally specific):

❌ "Woman shopping"

❌ "Wedding ceremony"

❌ "Museum building"

Tips:

Include cultural specifics: clothing items, cultural context
Specify regional details: "Gulf", "Qatari", "Arabic"
Add atmospheric details: "modest", "cultural", "ceremonial"
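
The tips above can be applied mechanically with a small prompt builder. This is a minimal sketch: `build_prompt` and its parameter names are hypothetical, and the template simply mirrors the structure of the effective prompts listed earlier.

```python
def build_prompt(subject, region=None, clothing=(), activity=None,
                 setting=None, atmosphere=None):
    """Compose a prompt from regional, clothing, and atmospheric details."""
    head = f"A {region} {subject}" if region else f"A {subject}"
    if clothing:
        head += " wearing " + " and ".join(clothing)
    if activity:
        head += f" {activity}"
    if setting:
        head += f" in {setting}"
    return f"{head}, {atmosphere}" if atmosphere else head


generic = build_prompt("woman", activity="shopping")
specific = build_prompt(
    "woman",
    region="Qatari",
    clothing=["hijab", "abaya"],
    activity="shopping",
    setting="Souq Waqif",
    atmosphere="traditional market atmosphere",
)
print(generic)   # A woman shopping
print(specific)
# A Qatari woman wearing hijab and abaya shopping in Souq Waqif, traditional market atmosphere
```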

Evaluation

Cultural Alignment Benchmark

Fanar-2-Oryx-IG was evaluated on a custom benchmark of 1,000 culturally-relevant prompts covering landmarks, clothing, food, religious settings, ceremonies, and daily life across the Arab world.

Automated Scoring: Gemini 2.5 Flash judge with 12 criteria aggregated into 5 dimensions:

Instruction Following: Prompt adherence and semantic constraint satisfaction
Visual Accuracy: People accuracy, scene accuracy, visual consistency
Cultural Alignment: Clothing/modesty correctness, Islamic context, Arabic cultural fidelity
Text Quality: Correctness and readability of rendered text (English/Arabic)
Perceptual Quality: Detail richness, sharpness, overall visual quality
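
The aggregation from criteria to dimensions can be sketched as a simple grouping. Several things here are assumptions: the criterion names are paraphrased from the list above, text correctness and readability are treated as one criterion so that the total is 12, scores are taken to be on a 0 to 100 scale, and an unweighted mean is used. The actual judge prompt and weighting are not described in this card.

```python
from statistics import mean

# Grouping read from the list above (2 + 3 + 3 + 1 + 3 = 12 criteria).
DIMENSIONS = {
    "Instruction Following": ["prompt_adherence", "semantic_constraints"],
    "Visual Accuracy": ["people_accuracy", "scene_accuracy", "visual_consistency"],
    "Cultural Alignment": ["clothing_modesty", "islamic_context", "arabic_cultural_fidelity"],
    "Text Quality": ["rendered_text"],
    "Perceptual Quality": ["detail_richness", "sharpness", "overall_visual_quality"],
}


def aggregate(criterion_scores):
    """Average per-criterion judge scores into per-dimension scores."""
    return {
        dim: mean(criterion_scores[name] for name in names)
        for dim, names in DIMENSIONS.items()
    }


# Made-up scores for a single image, for illustration only.
scores = {name: 80.0 for names in DIMENSIONS.values() for name in names}
scores["rendered_text"] = 40.0

result = aggregate(scores)
print(result["Text Quality"])        # 40.0
print(result["Cultural Alignment"])  # 80.0
```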

Performance Results

Model	#Params	Latency	Overall	Instruction Following	Quality	Accuracy	Cultural Compliance	Text
Fanar-2-Oryx-IG	12B	1.43s	83.76	78.35	93.52	85.71	85.49	43.60
OpenAI ChatGPT	undisclosed (estimated >1T)	50.76s	92.56	96.94	95.87	94.92	85.15	79.35
Alibaba Qwen	20B	36.65s	84.08	83.52	93.24	87.82	78.59	49.85
Flux-schnell (base)	12B	1.43s	78.32	72.70	90.50	80.70	78.90	30.80
Fanar-1-IG	4B	3.05s	75.77	74.40	80.70	80.30	76.60	31.10

Key Findings:

Best Cultural Compliance (85.49) among all evaluated models, including commercial systems
Second-best Quality (93.52), behind only OpenAI ChatGPT
Fastest inference time at 1.43 seconds, matching the base model (about 35 times faster than OpenAI ChatGPT)
Significant improvement over base model Flux-schnell (+6.59 cultural, +3.02 quality)
Strong performance relative to model size and training data scale
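
The deltas and speedup quoted above can be reproduced directly from the results table. The snippet below only copies values from that table; the dictionary layout and key names are illustrative.

```python
# Values copied from the Performance Results table above.
RESULTS = {
    "Fanar-2-Oryx-IG": {"latency_s": 1.43, "quality": 93.52, "cultural": 85.49},
    "OpenAI ChatGPT": {"latency_s": 50.76, "quality": 95.87, "cultural": 85.15},
    "Flux-schnell (base)": {"latency_s": 1.43, "quality": 90.50, "cultural": 78.90},
}

fanar = RESULTS["Fanar-2-Oryx-IG"]
base = RESULTS["Flux-schnell (base)"]
chatgpt = RESULTS["OpenAI ChatGPT"]

cultural_gain = round(fanar["cultural"] - base["cultural"], 2)
quality_gain = round(fanar["quality"] - base["quality"], 2)
speedup = round(chatgpt["latency_s"] / fanar["latency_s"], 1)

print(cultural_gain)  # 6.59
print(quality_gain)   # 3.02
print(speedup)        # 35.5
```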

Qualitative Comparison

Visual inspection reveals that Fanar-2-Oryx-IG consistently generates:

More culturally appropriate clothing (thobe, ghutra, abaya, hijab)
Better recognition of regional landmarks and architecture
Appropriate social contexts and gatherings
Respectful depictions of religious and ceremonial settings

While larger commercial models may achieve higher overall scores, Fanar-2-Oryx-IG excels specifically in cultural alignment for Arabic and Islamic content.

Intended Use, Ethical Considerations & Limitations

Fanar-2-Oryx-IG is built for:

Culturally-appropriate visual content generation for Arabic and Islamic contexts
Marketing and advertising targeting Arab audiences
Educational materials about Arabic culture, history, and traditions
Media production requiring culturally-sensitive imagery
Social media content respecting local norms and values
Cultural preservation and documentation projects
Research on culturally-aligned image generation

Developers are encouraged to:

Implement content moderation for production deployments
Respect cultural sensitivities and local norms
Provide clear disclaimers about AI-generated content
Monitor outputs for appropriateness in target contexts
Consider domain-specific fine-tuning for critical applications
Add watermarks to AI-generated content

Fanar-2-Oryx-IG should not be used to generate harmful, illegal, misleading, or culturally insensitive content. Although the model demonstrates strong cultural alignment, users should be aware of the following limitations:

Potential Issues:

May occasionally generate culturally inappropriate content despite training
Text rendering in images (especially Arabic) remains challenging
Cannot guarantee perfect adherence to all cultural norms in every generation
Subject to biases present in training data and base model

Not Suitable For:

Generating realistic images of specific individuals
Creating misleading or deceptive imagery
High-stakes decisions requiring perfect cultural accuracy
Situations where errors could cause significant harm

Kindly refer to our Terms of Service and Privacy Policy.

The output generated by this model is not considered a statement of QCRI, HBKU, Qatar Foundation, MCIT, or any other organization or individual.

Fanar Platform

While Fanar-2-Oryx-IG is a powerful standalone model, it is part of the broader Fanar Platform, an integrated Arabic-centric multimodal AI ecosystem that provides enhanced capabilities and continuous updates. The platform includes:

Core Capabilities:

Text Generation: Multiple conversational models optimized for different tasks
Speech (Aura): Speech-to-text (short-form and long-form) and text-to-speech synthesis with Arabic dialect support and bilingual Arabic-English capabilities
Image Understanding (Oryx-IVU): Vision-language model for culturally-grounded image and video understanding including Arabic calligraphy recognition
Image Generation (Oryx-IG): Culturally-aligned text-to-image generation trained on taxonomy-driven data across 23,000+ cultural search terms
Machine Translation (FanarShaheen): High-quality bilingual Arabic↔English translation across diverse domains (e.g., news, STEM, and medical)
Poetry Generation (Diwan): Classical Arabic poetry generation respecting prosodic meters (Buhur) and maintaining diacritization accuracy

Specialized Systems:

Fanar-Sadiq: Multi-agent Islamic question-answering system with 9 specialized tools (Fiqh reasoning, Quran/Hadith retrieval, zakat/inheritance calculation, prayer times, and Hijri calendar). Deployed in production on IslamWeb and IslamOnline platforms.
Safety & Moderation: Fanar-Guard and culturally-informed content filtering trained on 468K annotated Arabic-English safety examples

Access Points:

Fanar Chat: Web conversational interface integrating all modalities
iOS and Android apps: Mobile apps for on-the-go access to the Fanar Platform
Fanar API: Programmatic access to models and specialized capabilities

The Fanar Platform continuously evolves with model updates, new capabilities, and improved safety mechanisms. For production deployments requiring the latest features, multimodal integration, cross-model orchestration, and ongoing support, we recommend using the Fanar Platform rather than the standalone models published here.

Citation

If you use Fanar-2-Oryx-IG or the Fanar 2.0 GenAI platform in your research or applications, please cite:

@misc{fanarteam2026fanar20arabicgenerative,
      title={Fanar 2.0: Arabic Generative AI Stack}, 
      author={FANAR TEAM and Ummar Abbas and Mohammad Shahmeer Ahmad and Minhaj Ahmad and Abdulaziz Al-Homaid and Anas Al-Nuaimi and Enes Altinisik and Ehsaneddin Asgari and Sanjay Chawla and Shammur Chowdhury and Fahim Dalvi and Kareem Darwish and Nadir Durrani and Mohamed Elfeky and Ahmed Elmagarmid and Mohamed Eltabakh and Asim Ersoy and Masoomali Fatehkia and Mohammed Qusay Hashim and Majd Hawasly and Mohamed Hefeeda and Mus'ab Husaini and Keivin Isufaj and Soon-Gyo Jung and Houssam Lachemat and Ji Kim Lucas and Abubakr Mohamed and Tasnim Mohiuddin and Basel Mousi and Hamdy Mubarak and Ahmad Musleh and Mourad Ouzzani and Amin Sadeghi and Husrev Taha Sencar and Mohammed Shinoy and Omar Sinan and Yifan Zhang},
      year={2026},
      eprint={2603.16397},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.16397}, 
}

Acknowledgements

This project is from Qatar Computing Research Institute (QCRI) at Hamad Bin Khalifa University (HBKU), a member of Qatar Foundation. We thank our engineers, researchers, and support team for their efforts in advancing Arabic-centric large language models.

Special thanks to the Ministry of Communications and Information Technology, State of Qatar for their continued support by providing the compute infrastructure needed to develop and serve the platform through the Google Cloud Platform.

License

This model is licensed under the Apache 2.0 License.
