Sungur-3x9B-T1

Sungur-3x9B-T1 is a Domain-Specialized Sparse Mixture of Experts (MoE) model designed for high-performance Turkish text generation. It is built on `ytu-ce-cosmos/Turkish-Gemma-9b-v0.1` as its base model.

This model was constructed using Gemma-2-MoE, a custom library I developed for building Gemma-2-based MoE models. Unlike traditional merges that rely purely on random router initialization, this model uses Semantic Router Initialization followed by a targeted fine-tuning stage.

It combines the strengths of three 9B models—including my own Sungur-9B—into a single efficient inference engine.

Turkish Evaluation Benchmark Results (via malhajar17/lm-evaluation-harness_turkish)

| Metric | suayptalha/Sungur-3x9B-T1 | suayptalha/Sungur-3x9B-Cosmos | suayptalha/Sungur-3x9B-Gemma | suayptalha/Sungur-9B | ytu-ce-cosmos/Turkish-Gemma-9b-v0.1 | Qwen/Qwen2.5-7B-Instruct | google/gemma-2-9b-it | google/gemma-3-12b-it | Qwen/Qwen2.5-14B-it | Qwen/Qwen2.5-32B-Instruct | google/gemma-2-27b-it | google/gemma-3-27b-it | Qwen/Qwen2.5-72B-Instruct | meta-llama/Llama-3-1-70B-Instruct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMLU | 60.12 | 61.22 | 58.49 | 61.19 | 63.85 | 56.31 | 61.07 | 63.92 | 65.28 | 70.93 | 66.49 | 70.20 | 77.28 | 74.00 |
| TruthfulQA | 54.74 | 55.30 | 57.75 | 55.21 | 54.21 | 55.99 | 55.77 | 57.16 | 59.00 | 57.87 | 57.45 | 57.06 | 59.86 | 51.41 |
| ARC (acc_norm) | 58.70 | 59.04 | 56.40 | 55.03 | 59.64 | 42.06 | 56.31 | 60.67 | 50.00 | 57.00 | 63.65 | 66.98 | 61.52 | 59.64 |
| HellaSwag | 64.29 | 64.10 | 60.48 | 64.36 | 64.19 | 44.71 | 56.48 | 62.00 | 52.22 | 57.04 | 63.86 | 66.58 | 61.98 | 64.31 |
| GSM8K (strict) | 73.12 | 72.89 | 70.16 | 74.49 | 73.42 | 64.16 | 63.10 | 72.06 | 76.77 | 77.83 | 76.54 | 77.52 | 83.60 | 66.13 |
| Winogrande | 63.67 | 63.67 | 62.40 | 63.43 | 64.53 | 59.66 | 62.09 | 61.77 | 58.77 | 61.77 | 65.40 | 65.80 | 61.92 | 66.90 |

Note on Benchmark Results: The evaluation results show a performance level that is broadly in line with the strongest individual expert models, rather than the significant leap forward often seen in fully trained MoE models. This is primarily because the model's core expert weights were frozen during the brief fine-tuning stage; only the router was trained to send tokens to the correct specialist. A full fine-tuning pass over all expert weights and the router is planned for future iterations. This will allow the distinct experts to learn to cooperate and complement each other, leading to a substantial gain in overall benchmark performance and cross-domain synergy.

Expert Composition (Top-2 Routing)

The model dynamically routes each token to the top 2 experts, combining logic, academic knowledge, and creative nuance.

| Expert Slot | Role | Source Model | Specialization |
|---|---|---|---|
| Expert 0 | The Reasoner | ytu-ce-cosmos/Turkish-Gemma-9b-T1 | Logic, Math, Python/JS Coding, Algorithms, Chain-of-Thought. |
| Expert 1 | The Scholar | ytu-ce-cosmos/Turkish-Gemma-9b-v0.1 | History, Biology, Physics, Academic Knowledge, Encyclopedic Facts. |
| Expert 2 | The Creative | suayptalha/Sungur-9B | Chat, Roleplay, Storytelling, Poetry, Daily Conversation. |
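
To make the routing mechanics concrete, here is a minimal, self-contained sketch of top-2 gating over three experts. It is purely illustrative: the tensor shapes and module layout are generic and do not mirror the internal names or numerics of this repository.

import torch
import torch.nn.functional as F

def top2_moe_forward(hidden, router_weight, experts):
    """Toy top-2 MoE forward pass for a batch of token hidden states.

    hidden:        (num_tokens, hidden_dim) token representations
    router_weight: (num_experts, hidden_dim) gating matrix
    experts:       list of callables, one feed-forward block per expert
    """
    logits = hidden @ router_weight.t()            # one logit per expert and token
    top2_vals, top2_idx = logits.topk(2, dim=-1)   # keep the two best experts per token
    gates = F.softmax(top2_vals, dim=-1)           # renormalize over the chosen two

    out = torch.zeros_like(hidden)
    for slot in range(2):                          # first and second choice
        for e, expert in enumerate(experts):
            mask = top2_idx[:, slot] == e          # tokens routed to expert e in this slot
            if mask.any():
                out[mask] += gates[mask, slot].unsqueeze(-1) * expert(hidden[mask])
    return out

# Toy usage: 3 experts, hidden size 16, 5 tokens
experts = [torch.nn.Linear(16, 16) for _ in range(3)]
mixed = top2_moe_forward(torch.randn(5, 16), torch.randn(3, 16), experts)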

Build & Training Methodology

1. Semantic Router Initialization

Using my Gemma-2-MoE builder, the initial router weights were calculated using Concept Algebra on the embedding space:

$$\text{Expert Vector} = \text{Mean}(\text{Positive Prompts}) - 0.5 \times \text{Mean}(\text{Negative Prompts})$$
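
As an illustration of this formula (not the actual Gemma-2-MoE implementation), the sketch below computes one expert's routing vector from its positive and negative prompts. Treating mean-pooled last hidden states as the "embedding space", and the helper `mean_prompt_embedding` itself, are my assumptions.

import torch

def mean_prompt_embedding(prompts, tokenizer, model):
    """Average representation of a list of prompts (hypothetical helper)."""
    vecs = []
    with torch.no_grad():
        for prompt in prompts:
            ids = tokenizer(prompt, return_tensors="pt").to(model.device)
            hidden = model(**ids, output_hidden_states=True).hidden_states[-1]
            vecs.append(hidden.mean(dim=1).squeeze(0))  # average over tokens
    return torch.stack(vecs).mean(dim=0)                # average over prompts

def expert_routing_vector(positive_prompts, negative_prompts, tokenizer, model):
    # Expert Vector = Mean(Positive Prompts) - 0.5 * Mean(Negative Prompts)
    pos = mean_prompt_embedding(positive_prompts, tokenizer, model)
    neg = mean_prompt_embedding(negative_prompts, tokenizer, model)
    return pos - 0.5 * neg

Stacking one such vector per expert would give an initial (num_experts, hidden_dim) gating matrix for each MoE layer.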

2. Router Fine-Tuning

After construction, the base model and expert weights were frozen. Only the router (gating) network was fine-tuned for 2 epochs using a synthetic dataset of 1,500 samples. This dataset contained prompt-expert pairs (e.g., a Python question paired with the Reasoner expert) to sharpen the router's decision-making and ensure prompts are sent to the correct domain specialist.
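
A minimal sketch of that freezing scheme is shown below. The assumption that router parameters can be selected by a name fragment such as "gate", and the optimizer settings, are illustrative and may not match the actual modules or hyperparameters used for this model.

import torch

def freeze_all_but_router(model, router_keyword="gate"):
    """Freeze every parameter except the routing (gating) networks."""
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = router_keyword in name
        if param.requires_grad:
            trainable += param.numel()
    return trainable  # number of trainable (router) parameters

# After loading the constructed MoE (see Usage below):
# freeze_all_but_router(model)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad),
#     lr=1e-4,  # illustrative value
# )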

Build Configuration

This model was built using the following YAML configuration, which defines the experts and semantic routing logic:

base_model: ytu-ce-cosmos/Turkish-Gemma-9b-v0.1
dtype: bfloat16
num_experts_per_tok: 2

experts:
  # EXPERT 0: THE REASONER (Logic & Code)
  - source_model: ytu-ce-cosmos/Turkish-Gemma-9b-T1
    positive_prompts:
      - "matematiksel çözüm"
      - "python kodu yaz"
      - "algoritma kur"
      - "türev hesapla"
      - "javascript fonksiyonu"
      - "mantık sorusu"
      - "zincirleme düşünce"
      - "write python code"
      - "solve this equation"
      - "calculate derivative"
    negative_prompts:
      - "şiir yaz"
      - "roman anlat"
      - "günlük sohbet"
      - "write a poem"
      - "tell me a story"

  # EXPERT 1: THE SCHOLAR (Knowledge & Facts)
  - source_model: ytu-ce-cosmos/Turkish-Gemma-9b-v0.1
    positive_prompts:
      - "tarihsel olay nedir"
      - "biyolojik süreç"
      - "coğrafi bilgi"
      - "fizik kanunları"
      - "akademik makale özeti"
      - "what is the history of"
      - "scientific facts"
      - "laws of physics"
    negative_prompts:
      - "python kodu"
      - "hayal et"
      - "masal anlat"
      - "write code"

  # EXPERT 2: THE CREATIVE CHATBOT (Chat & Creative)
  - source_model: suayptalha/Sungur-9B
    positive_prompts:
      - "merhaba nasılsın"
      - "bana bir hikaye anlat"
      - "duygusal bir şiir yaz"
      - "rol yapma oyunu"
      - "arkadaşça sohbet"
      - "hello how are you"
      - "tell me a bedtime story"
      - "write a poem about love"
    negative_prompts:
      - "integral hesapla"
      - "c++ kodu"
      - "calculate integral"
      - "compile error"

Usage

To use this model, you need to set `trust_remote_code=True`.

AutoModelForCausalLM

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "suayptalha/Turkish-Gemma-2-MoE-3x9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Example: Reasoning (Routes to Expert 0 - Reasoner)
messages = [
    {"role": "user", "content": "5x + 1 = 16. x'i bul ve adımları açıkla."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7
)

print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
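
For quick experiments, the high-level transformers pipeline API should also work; this is a sketch assuming a recent transformers version that accepts chat-style message lists.

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="suayptalha/Sungur-3x9B-T1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# "How do black holes form? Explain briefly."
messages = [{"role": "user", "content": "Kara delikler nasıl oluşur? Kısaca açıkla."}]
result = pipe(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # the final turn is the model's reply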

Expert Selection Examples

Prompt (Logic): "Bir Python listesini sıralayan bubble sort algoritmasını yaz."

Activated Expert: The Reasoner (Expert 0)

Prompt (Knowledge): "Sir Isaac Newton kimdir?"

Activated Expert: The Scholar (Expert 1)

Prompt (Creative): "Yüzüklerin Efendisi evreninde geçen kısa, duygusal bir şiir yaz."

Activated Expert: The Creative (Expert 2 - Sungur-9B)

Acknowledgments

This model is a collaborative effort utilizing the best open-source Turkish models:

  • Thanks to ytu-ce-cosmos for their amazing Turkish-Gemma-9b-v0.1 and Turkish-Gemma-9b-T1 models.
  • Thanks to Google for the original Gemma 2 architecture and models.
  • Gemma-2-MoE: The custom library I built to make this architecture possible.
  • Thanks to the entire Turkish open-source AI community.

Citation

@misc{sungur_collection_2025,
  title        = {Sungur (Hugging Face Collection)},
  author       = {Şuayp Talha Kocabay},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/collections/suayptalha/sungur-68dcd094da7f8976cdc5898e}},
  note         = {Turkish LLM family and dataset collection}
}

Support

Buy Me A Coffee


license: gemma2
