---
license: apache-2.0
language:
- en
- ko
tags:
- text-generation-inference
- conversational
- custom_code
- text-generation
- Motif
library_name: transformers
---

Last update: 12 Nov. 2025

# Introduction

We are pleased to announce **Motif-2-12.7B-Base**, a 12.7-billion-parameter language model. Details are available in the technical report: [https://arxiv.org/abs/2511.07464](https://arxiv.org/abs/2511.07464).

# Evaluation

All models listed in the table below are **base models**.

*The results of Qwen3 and Gemma 3 are sourced directly from their technical reports.*

|Benchmark|Evaluation setting|Motif-2-12.7B|Qwen3-14B|Qwen3-32B|Qwen3-30B-A3B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|---|---|---|---|
|MMLU|5-shot|78.1|81.05|83.61|81.38|74.5|78.6|
|MMLU-Redux|5-shot|78.68|79.88|83.41|81.17|-|-|
|MMLU-Pro|5-shot, CoT|66.38|61.03|65.54|61.49|45.3|52.2|
|SuperGPQA|5-shot, CoT|32.68|34.27|39.78|35.72|-|-|
|BBH|3-shot, CoT|81.34|81.07|87.38|81.54|-|-|
|GPQA|5-shot, CoT|42.18|39.9|49.49|43.94|-|-|
|GPQA-Diamond|5-shot, CoT|42.92|-|-|-|25.4|24.3|
|GSM8K|4-shot, CoT|93.85|92.49|93.4|91.81|-|-|
|GSM8K|8-shot, CoT|94.92|-|-|-|71|82.6|
|MATH|4-shot, CoT|73.62|62.02|61.62|59.04|43.3|50|
|EvalPlus|0-shot|72.22|72.23|72.05|71.45|-|-|
|MBPP|3-shot|81.5|73.4|78.2|74.4|60.4|65.6|
|CRUX-O|1-shot|63.1|68.6|72.5|67.2|-|-|
|HumanEval|0-shot|65.9|-|-|-|45.7|48.8|
|DROP|1-shot|69.9|-|-|-|72.2|77.2|
|HellaSwag|10-shot|84|-|-|-|84.2|85.6|
|BoolQ|0-shot|78.5|-|-|-|78.8|82.4|
|PIQA|0-shot|81.6|-|-|-|81.8|83.3|
|SIQA|0-shot|53.8|-|-|-|53.4|54.9|
|TriviaQA|5-shot|72.2|-|-|-|78.2|85.5|
|Natural Questions|5-shot|29.6|-|-|-|31.4|36.1|
|ARC-C|25-shot|69.6|-|-|-|68.9|70.6|
|ARC-E|0-shot|84.1|-|-|-|88.3|89|
|WinoGrande|5-shot|79.6|-|-|-|74.3|78.8|
|BBH|few-shot|81.3|-|-|-|72.6|77.7|

## Averages and improvements of the corresponding benchmark scores

### vs. Gemma 3-Base

||Motif-2-12.7B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|
|**Average**|71.53|63.87|67.96|
|**Improvement**||+11.99%|+5.26%|

### vs. Qwen3-Base

||Motif-2-12.7B|Qwen3-14B|Qwen3-32B|Qwen3-30B-A3B|
|---|---|---|---|---|
|**Average**|69.42|67.81|71.54|68.10|
|**Improvement**||+2.37%|-2.96%|+1.94%|
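
The model card does not spell out how the averages and improvements are derived. The sketch below is one reading that reproduces the reported figures: it assumes each average is a plain arithmetic mean over the rows where the compared family has a score, and that the improvement is Motif-2-12.7B's relative gain over each model's average. The helper name `summarize` and the list layout are illustrative only, not part of any released tooling.

```python
from statistics import mean

REFERENCE = "Motif-2-12.7B"

# Scores copied from the evaluation table. Each list keeps only the rows
# where the compared family reports a result (assumed selection rule).
# Rows: MMLU, MMLU-Redux, MMLU-Pro, SuperGPQA, BBH (3-shot), GPQA,
#       GSM8K (4-shot), MATH, EvalPlus, MBPP, CRUX-O
QWEN_ROWS = {
    "Motif-2-12.7B": [78.1, 78.68, 66.38, 32.68, 81.34, 42.18, 93.85, 73.62, 72.22, 81.5, 63.1],
    "Qwen3-14B": [81.05, 79.88, 61.03, 34.27, 81.07, 39.9, 92.49, 62.02, 72.23, 73.4, 68.6],
    "Qwen3-32B": [83.61, 83.41, 65.54, 39.78, 87.38, 49.49, 93.4, 61.62, 72.05, 78.2, 72.5],
    "Qwen3-30B-A3B": [81.38, 81.17, 61.49, 35.72, 81.54, 43.94, 91.81, 59.04, 71.45, 74.4, 67.2],
}

# Rows: MMLU, MMLU-Pro, GPQA-Diamond, GSM8K (8-shot), MATH, MBPP, HumanEval,
#       DROP, HellaSwag, BoolQ, PIQA, SIQA, TriviaQA, Natural Questions,
#       ARC-C, ARC-E, WinoGrande, BBH (few-shot)
GEMMA_ROWS = {
    "Motif-2-12.7B": [78.1, 66.38, 42.92, 94.92, 73.62, 81.5, 65.9, 69.9, 84.0,
                      78.5, 81.6, 53.8, 72.2, 29.6, 69.6, 84.1, 79.6, 81.3],
    "Gemma-3-12B": [74.5, 45.3, 25.4, 71.0, 43.3, 60.4, 45.7, 72.2, 84.2,
                    78.8, 81.8, 53.4, 78.2, 31.4, 68.9, 88.3, 74.3, 72.6],
    "Gemma-3-27B": [78.6, 52.2, 24.3, 82.6, 50.0, 65.6, 48.8, 77.2, 85.6,
                    82.4, 83.3, 54.9, 85.5, 36.1, 70.6, 89.0, 78.8, 77.7],
}


def summarize(scores, reference=REFERENCE):
    """Return {model: (average, relative gain of `reference` over model, in %)}."""
    averages = {model: mean(values) for model, values in scores.items()}
    ref_avg = averages[reference]
    return {
        model: (round(avg, 2), round((ref_avg / avg - 1) * 100, 2))
        for model, avg in averages.items()
    }


qwen_summary = summarize(QWEN_ROWS)
gemma_summary = summarize(GEMMA_ROWS)

for name, summary in (("Qwen3-Base", qwen_summary), ("Gemma 3-Base", gemma_summary)):
    print(f"vs. {name}")
    for model, (avg, gain) in summary.items():
        print(f"  {model}: average={avg:.2f}, improvement={gain:+.2f}%")
```

Because the two comparisons use different row subsets, Motif-2-12.7B's own average differs between them (69.42 on the Qwen3 rows, 71.53 on the Gemma 3 rows), so the two summary tables should not be compared against each other directly.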