Sungur-3x9B-T1
Sungur-3x9B-T1 is a Domain-Specialized Sparse Mixture of Experts (MoE) model designed for high-performance Turkish text generation. It is built on `ytu-ce-cosmos/Turkish-Gemma-9b-v0.1` as its base model.
This model was constructed using Gemma-2-MoE, a custom library I developed to build Gemma 2-based MoE models. Unlike traditional merges that rely purely on random router initialization, this model uses Semantic Router Initialization followed by a targeted fine-tuning stage.
It combines the strengths of three 9B models—including my own Sungur-9B—into a single efficient inference engine.
Turkish Evaluation Benchmark Results (via malhajar17/lm-evaluation-harness_turkish)
| Metric | suayptalha/Sungur-3x9B-T1 | suayptalha/Sungur-3x9B-Cosmos | suayptalha/Sungur-3x9B-Gemma | suayptalha/Sungur-9B | ytu-ce-cosmos/Turkish-Gemma-9b-v0.1 | Qwen/Qwen2.5-7B-Instruct | google/gemma-2-9b-it | google/gemma-3-12b-it | Qwen/Qwen2.5-14B-Instruct | Qwen/Qwen2.5-32B-Instruct | google/gemma-2-27b-it | google/gemma-3-27b-it | Qwen/Qwen2.5-72B-Instruct | meta-llama/Llama-3.1-70B-Instruct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMLU | 60.12 | 61.22 | 58.49 | 61.19 | 63.85 | 56.31 | 61.07 | 63.92 | 65.28 | 70.93 | 66.49 | 70.20 | 77.28 | 74.00 |
| TruthfulQA | 54.74 | 55.30 | 57.75 | 55.21 | 54.21 | 55.99 | 55.77 | 57.16 | 59.00 | 57.87 | 57.45 | 57.06 | 59.86 | 51.41 |
| ARC (acc_norm) | 58.70 | 59.04 | 56.40 | 55.03 | 59.64 | 42.06 | 56.31 | 60.67 | 50.00 | 57.00 | 63.65 | 66.98 | 61.52 | 59.64 |
| HellaSwag | 64.29 | 64.10 | 60.48 | 64.36 | 64.19 | 44.71 | 56.48 | 62.00 | 52.22 | 57.04 | 63.86 | 66.58 | 61.98 | 64.31 |
| GSM8K (strict) | 73.12 | 72.89 | 70.16 | 74.49 | 73.42 | 64.16 | 63.10 | 72.06 | 76.77 | 77.83 | 76.54 | 77.52 | 83.60 | 66.13 |
| Winogrande | 63.67 | 63.67 | 62.40 | 63.43 | 64.53 | 59.66 | 62.09 | 61.77 | 58.77 | 61.77 | 65.40 | 65.80 | 61.92 | 66.90 |
Note on Benchmark Results: The evaluation results show performance roughly on par with the strongest individual expert models, rather than the significant leap often seen in fully trained MoE models. This is primarily because the expert weights were frozen during the brief fine-tuning stage; only the router was trained to send tokens to the correct specialist. A full fine-tuning pass across all expert weights and the router is planned for future iterations; this will allow the distinct experts to learn to cooperate and complement each other, yielding a substantial gain in overall benchmark performance and cross-domain synergy.
Expert Composition (Top-2 Routing)
The model dynamically routes each token to the top 2 experts, combining logic, academic knowledge, and creative nuance.
| Expert Slot | Role | Source Model | Specialization |
|---|---|---|---|
| Expert 0 | The Reasoner | ytu-ce-cosmos/Turkish-Gemma-9b-T1 | Logic, Math, Python/JS Coding, Algorithms, Chain-of-Thought |
| Expert 1 | The Scholar | ytu-ce-cosmos/Turkish-Gemma-9b-v0.1 | History, Biology, Physics, Academic Knowledge, Encyclopedic Facts |
| Expert 2 | The Creative | suayptalha/Sungur-9B | Chat, Roleplay, Storytelling, Poetry, Daily Conversation |
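As a rough illustration of what top-2 routing does at inference time, the sketch below shows a generic Mixtral-style MoE forward pass: the router scores each token, the two highest-scoring experts process it, and their outputs are mixed with the renormalized router weights. Module and tensor names here are illustrative, not the exact Gemma-2-MoE implementation.

```python
import torch
import torch.nn.functional as F

def top2_moe_forward(hidden_states, router, experts, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    hidden_states: (batch, seq, hidden) activations entering the MoE block
    router:        nn.Linear(hidden, num_experts) gating network
    experts:       list of per-expert MLP modules (hidden -> hidden)
    """
    logits = router(hidden_states)                       # (batch, seq, num_experts)
    weights, indices = torch.topk(logits, top_k, dim=-1)  # keep only the top-k experts
    weights = F.softmax(weights, dim=-1)                  # renormalize over the chosen experts

    output = torch.zeros_like(hidden_states)
    for slot in range(top_k):
        for e, expert in enumerate(experts):
            mask = (indices[..., slot] == e)               # tokens whose slot-th choice is expert e
            if mask.any():
                output[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(hidden_states[mask])
    return output
```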
Build & Training Methodology
1. Semantic Router Initialization
The initial router weights were computed with my Gemma-2-MoE builder using Concept Algebra on the embedding space.
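The sketch below illustrates one plausible way such a concept-algebra initialization can be implemented: embed each expert's positive and negative prompts with the base model, take the difference of their mean hidden states, and use the normalized result as that expert's row of the router weight matrix. This is a minimal sketch under those assumptions, not the exact Gemma-2-MoE code.

```python
import torch

@torch.no_grad()
def semantic_router_init(model, tokenizer, expert_prompt_sets):
    """Build initial router weights from positive/negative prompt embeddings.

    expert_prompt_sets: list of (positive_prompts, negative_prompts) per expert.
    Returns a (num_experts, hidden_size) tensor used to initialize each router.
    """
    def mean_hidden(prompts):
        # Average the last-layer hidden states over all tokens of all prompts.
        states = []
        for text in prompts:
            ids = tokenizer(text, return_tensors="pt").to(model.device)
            out = model(**ids, output_hidden_states=True)
            states.append(out.hidden_states[-1].mean(dim=1).squeeze(0))
        return torch.stack(states).mean(dim=0)

    rows = []
    for positives, negatives in expert_prompt_sets:
        concept = mean_hidden(positives) - mean_hidden(negatives)  # "concept algebra"
        rows.append(concept / concept.norm())                      # unit-normalize each row
    return torch.stack(rows)  # copied into every layer's router weight
```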
2. Router Fine-Tuning
After construction, the base model and expert weights were frozen. Only the router (gating) network was fine-tuned for 2 epochs using a synthetic dataset of 1,500 samples. This dataset contained prompt-expert pairs (e.g., a Python question paired with the Reasoner expert) to sharpen the router's decision-making and ensure prompts are sent to the correct domain specialist.
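A hedged sketch of this stage is shown below: every parameter is frozen except the gating networks, and the router logits are supervised with the labeled expert index for each prompt. The `"gate"` parameter naming, the `output_router_logits` keyword, and the `(batch, seq, num_experts)` logit shape are assumptions borrowed from Mixtral-style models in `transformers`, not confirmed details of the Gemma-2-MoE training script.

```python
import torch

def train_router_only(model, dataloader, epochs=2, lr=1e-4):
    """Fine-tune only the MoE gating networks on (prompt, expert_id) pairs."""
    # 1. Freeze everything, then re-enable gradients on the router/gate weights only
    #    (assumed naming of the gating modules).
    for name, param in model.named_parameters():
        param.requires_grad = "gate" in name

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for batch in dataloader:
            # batch["input_ids"]: tokenized prompt, batch["expert_id"]: target expert index
            out = model(input_ids=batch["input_ids"], output_router_logits=True)
            # Assume per-layer router logits of shape (batch, seq, num_experts);
            # average over layers and sequence positions, then supervise toward the label.
            router_logits = torch.stack(out.router_logits).mean(dim=(0, 2))
            loss = loss_fn(router_logits, batch["expert_id"])
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```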
Build Configuration
This model was built using the following YAML configuration, which defines the experts and semantic routing logic:
See MoE config
```yaml
base_model: ytu-ce-cosmos/Turkish-Gemma-9b-v0.1
dtype: bfloat16
num_experts_per_tok: 2
experts:
  # EXPERT 0: THE REASONER (Logic & Code)
  - source_model: ytu-ce-cosmos/Turkish-Gemma-9b-T1
    positive_prompts:
      - "matematiksel çözüm"
      - "python kodu yaz"
      - "algoritma kur"
      - "türev hesapla"
      - "javascript fonksiyonu"
      - "mantık sorusu"
      - "zincirleme düşünce"
      - "write python code"
      - "solve this equation"
      - "calculate derivative"
    negative_prompts:
      - "şiir yaz"
      - "roman anlat"
      - "günlük sohbet"
      - "write a poem"
      - "tell me a story"
  # EXPERT 1: THE SCHOLAR (Knowledge & Facts)
  - source_model: ytu-ce-cosmos/Turkish-Gemma-9b-v0.1
    positive_prompts:
      - "tarihsel olay nedir"
      - "biyolojik süreç"
      - "coğrafi bilgi"
      - "fizik kanunları"
      - "akademik makale özeti"
      - "what is the history of"
      - "scientific facts"
      - "laws of physics"
    negative_prompts:
      - "python kodu"
      - "hayal et"
      - "masal anlat"
      - "write code"
  # EXPERT 2: THE CREATIVE CHATBOT (Chat & Creative)
  - source_model: suayptalha/Sungur-9B
    positive_prompts:
      - "merhaba nasılsın"
      - "bana bir hikaye anlat"
      - "duygusal bir şiir yaz"
      - "rol yapma oyunu"
      - "arkadaşça sohbet"
      - "hello how are you"
      - "tell me a bedtime story"
      - "write a poem about love"
    negative_prompts:
      - "integral hesapla"
      - "c++ kodu"
      - "calculate integral"
      - "compile error"
```
Usage
To use this model, you need to set `trust_remote_code=True`.
AutoModelForCausalLM
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "suayptalha/Sungur-3x9B-T1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Example: Reasoning (Routes to Expert 0 - Reasoner)
messages = [
    {"role": "user", "content": "5x + 1 = 16. x'i bul ve adımları açıkla."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7
)

print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
Expert Selection Examples
Prompt (Logic): "Bir Python listesini sıralayan bubble sort algoritmasını yaz." (Write a bubble sort algorithm that sorts a Python list.)
Activated Expert: The Reasoner (Expert 0)
Prompt (Knowledge): "Sir Isaac Newton kimdir?" (Who is Sir Isaac Newton?)
Activated Expert: The Scholar (Expert 1)
Prompt (Creative): "Yüzüklerin Efendisi evreninde geçen kısa, duygusal bir şiir yaz." (Write a short, emotional poem set in the Lord of the Rings universe.)
Activated Expert: The Creative (Expert 2 - Sungur-9B)
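If you want to verify the routing yourself, one rough approach is to hook the gating modules and count which expert wins top-1 for each token. The assumptions that the gate submodules' names end in `"gate"` and that they output raw logits over the experts may not match the actual Gemma-2-MoE implementation.

```python
import torch
from collections import Counter

def trace_experts(model, tokenizer, prompt):
    """Count which expert wins top-1 routing for each token of a prompt."""
    counts = Counter()

    def hook(module, inputs, output):
        # Assumes the gate outputs raw logits of shape (..., num_experts).
        counts.update(output.argmax(dim=-1).flatten().tolist())

    handles = [m.register_forward_hook(hook)
               for name, m in model.named_modules() if name.endswith("gate")]
    with torch.no_grad():
        ids = tokenizer(prompt, return_tensors="pt").to(model.device)
        model(**ids)
    for h in handles:
        h.remove()
    return counts  # e.g. Counter({expert_index: token_count, ...}) across all layers
```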
Acknowledgments
This model is a collaborative effort utilizing the best open-source Turkish models:
- Thanks to ytu-ce-cosmos for their amazing Turkish-Gemma-9b-v0.1 and Turkish-Gemma-9b-T1 models.
- Thanks to Google for the original Gemma 2 architecture and models.
- Gemma-2-MoE: The custom library I built to make this architecture possible.
- Thanks to the entire Turkish open-source AI community.
Citation
```bibtex
@misc{sungur_collection_2025,
  title        = {Sungur (Hugging Face Collection)},
  author       = {Şuayp Talha Kocabay},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/collections/suayptalha/sungur-68dcd094da7f8976cdc5898e}},
  note         = {Turkish LLM family and dataset collection}
}
```
License: gemma2