A.X K1

Model Summary

A.X K1 is a large-scale Mixture-of-Experts (MoE) language model designed for efficient high-capacity reasoning and instruction following. The model contains 519 billion total parameters, with 33 billion active parameters, enabling strong performance while maintaining practical inference efficiency.

Its hybrid reasoning design, detailed below, lets users choose between in-depth reasoning and low response latency depending on task requirements.

Model Details

  • Architecture: Decoder-only Transformer with Mixture-of-Experts (MoE)
  • Total parameters: 519B (192 experts + 1 shared expert)
  • Active parameters: 33B per token (8 experts + 1 shared expert)
  • Number of layers: 61 (1 dense + 60 MoE)
  • Number of attention heads: 64
  • Intermediate size: 7168
  • Expert intermediate size: 2048
  • Normalization: RMSNorm applied both before and after the MLP block
  • Vocab size: 163,840
  • Context length: 131,072 tokens
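
The sketch below restates the details above as an illustrative Python configuration. The key names follow common MoE config conventions and are assumptions, not the released checkpoint's actual fields.

# Illustrative restatement of the configuration above; key names are
# assumptions following common MoE conventions.
config = {
    "num_hidden_layers": 61,        # 1 dense + 60 MoE layers
    "num_attention_heads": 64,
    "intermediate_size": 7168,      # dense MLP width
    "moe_intermediate_size": 2048,  # per-expert MLP width
    "num_routed_experts": 192,      # routed experts per MoE layer
    "num_shared_experts": 1,        # always-active shared expert
    "num_experts_per_tok": 8,       # routed experts selected per token
    "vocab_size": 163_840,
    "max_position_embeddings": 131_072,
}

# Roughly 9 of 193 expert paths run per token (~4.7%); together with the
# dense attention and embedding parameters, this is how 519B total
# parameters reduce to about 33B active parameters per token.
active = config["num_experts_per_tok"] + config["num_shared_experts"]
total = config["num_routed_experts"] + config["num_shared_experts"]
print(f"expert paths active per token: {active}/{total} = {active / total:.1%}")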

Architecture Highlights

Mixture-of-Experts Design

A.X K1 follows a sparse Mixture-of-Experts architecture in which only a subset of experts is activated per token. This design substantially increases model capacity while keeping the computational cost comparable to dense models with much smaller parameter counts.

From a scalability and efficiency perspective, MoE architectures enable model capacity to grow primarily by adding experts, with substantially slower growth in compute compared to dense models. Expert parallelism allows experts to be distributed across devices, supporting large-scale training and serving without activating all parameters on every forward pass. Recent MoE scaling-law studies provide guidance for selecting the number of experts and activation ratios under fixed compute and memory budgets.
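
As a concrete illustration of sparse routing, the following is a minimal PyTorch sketch of a top-k softmax router with one always-active shared expert. The dimensions, activation choice, and routing details are assumptions for illustration; this is not the A.X K1 implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Top-k routed experts plus one always-active shared expert."""

    def __init__(self, d_model=1024, d_expert=2048, num_experts=192, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        def make_expert():
            return nn.Sequential(
                nn.Linear(d_model, d_expert), nn.SiLU(), nn.Linear(d_expert, d_model)
            )
        self.experts = nn.ModuleList(make_expert() for _ in range(num_experts))
        self.shared_expert = make_expert()

    def forward(self, x):                      # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over top-k
        routed = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e       # tokens routed to expert e in this slot
                routed[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        # The shared expert processes every token; only top_k routed experts fire.
        return self.shared_expert(x) + routed

Because each token touches only top_k of the routed experts, compute per token stays close to that of a much smaller dense model even as the expert count (and total capacity) grows.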

Hybrid Reasoning Fusion (Think / Non-Think)

A.X K1 is a single model in which reasoning before the final answer can be enabled or disabled per request, supporting a controlled trade-off between reasoning depth and response latency. A usage sketch follows the list below.

  • Think mode: Generates reasoning steps before producing the answer for complex problem solving and multi-step inferences.
  • Non-Think mode: Generates concise, direct responses optimized for low-latency usage.
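
Hybrid-reasoning models typically expose this switch through the chat template. The enable_thinking keyword below is an assumption borrowed from similar hybrid models and is not confirmed for A.X K1; check the released tokenizer configuration for the actual mechanism.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("skt/A.X-K1")
messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]

# Think mode: the rendered prompt asks the model to emit reasoning first.
prompt_think = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-Think mode: concise direct answer, optimized for latency.
prompt_direct = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)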

Post-MLP RMSNorm

A.X K1 incorporates an additional RMSNorm applied after the MLP (MoE) block in each Transformer layer. This design choice improves training stability in large-scale sparse MoE settings and enhances robustness for both reasoning-intensive and long-context generations.
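
A minimal sketch of this placement, assuming a standard pre-norm residual block; the module structure is illustrative, not the A.X K1 source, and nn.RMSNorm requires PyTorch 2.4 or later.

import torch.nn as nn

class PostMLPNormBlock(nn.Module):
    """Pre-norm residual block with an extra RMSNorm on the MLP/MoE output."""

    def __init__(self, d_model, attn, mlp):
        super().__init__()
        self.attn_norm = nn.RMSNorm(d_model)      # norm before attention
        self.mlp_norm = nn.RMSNorm(d_model)       # norm before the MLP/MoE block
        self.post_mlp_norm = nn.RMSNorm(d_model)  # extra norm after the MLP/MoE block
        self.attn, self.mlp = attn, mlp

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))
        # Normalizing the MLP output before the residual add bounds its scale,
        # the stated stabilizer for large-scale sparse MoE training.
        return x + self.post_mlp_norm(self.mlp(self.mlp_norm(x)))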

Evaluation

Model evaluation results and checkpoints are scheduled for public release on January 4, 2026.

Running Locally

A.X K1 can be served with SGLang and vLLM. The optimal configuration depends on the runtime and its version, the GPU type and memory, and system-level factors such as networking and infrastructure. Validated configurations will be shared as upstream support and benchmarks mature.
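
As an illustrative starting point, the sketch below uses vLLM's offline Python API; the parallelism and loading flags are assumptions to adjust for your hardware, not a validated configuration. A model of this size requires multi-GPU (and likely multi-node) tensor/expert parallelism. An SGLang deployment would use its own server launcher with analogous settings.

from vllm import LLM, SamplingParams

llm = LLM(
    model="skt/A.X-K1",
    tensor_parallel_size=8,   # assumption: match your GPU topology
    trust_remote_code=True,   # assumption: may be required for custom MoE code
)

out = llm.generate(
    ["Explain the trade-off between Think and Non-Think modes in one paragraph."],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(out[0].outputs[0].text)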

Limitations

  • A.X K1 may generate incorrect or misleading information due to its stochastic nature.
  • Reasoning outputs in Think mode should not be interpreted as faithful representations of the model’s internal decision process.
  • Performance may vary across domains and languages depending on data coverage.

Citation

If you use A.X K1 in your research, please cite the technical report:

@techreport{axk1,
  title       = {A.X K1 Technical Report},
  author      = {{SK Telecom}},
  institution = {SK Telecom},
  year        = {2025},
  note        = {Technical report to be released}
}