Instructions for using aciklab/kubernetes-ai-GGUF with libraries and local apps.

Libraries

How to use aciklab/kubernetes-ai-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="aciklab/kubernetes-ai-GGUF",
	filename="kubernetes-ai-IQ3_M.gguf",
)

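response = llm.create_chat_completion(
	messages=[
		# Turkish: "How do I create a deployment with 3 replicas in Kubernetes?"
		{"role": "user", "content": "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"}
	]
)
# The reply text is in the OpenAI-style response dictionary
print(response["choices"][0]["message"]["content"])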

Local Apps

llama.cpp

How to use aciklab/kubernetes-ai-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf aciklab/kubernetes-ai-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf aciklab/kubernetes-ai-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf aciklab/kubernetes-ai-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf aciklab/kubernetes-ai-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf aciklab/kubernetes-ai-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf aciklab/kubernetes-ai-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf aciklab/kubernetes-ai-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf aciklab/kubernetes-ai-GGUF:Q4_K_M

Use Docker

docker model run hf.co/aciklab/kubernetes-ai-GGUF:Q4_K_M
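Because `llama-server` exposes an OpenAI-compatible API, any HTTP client can query it once it is running. The Python sketch below uses only the standard library and assumes the server's default address, `http://localhost:8080`, and its `/v1/chat/completions` endpoint. The helper names `build_chat_request` and `ask` are hypothetical, and the sampling values are copied from the Ollama Modelfile further down this card.

```python
import json
import urllib.request

# Assumption: llama-server is running on its default address.
BASE_URL = "http://localhost:8080"


def build_chat_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completion request."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        # Sampling values taken from the Modelfile parameters in this card
        "temperature": 1.0,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(question, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (needs a running llama-server):
# print(ask("Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"))
```

Building the request separately from sending it keeps the payload easy to inspect before a server is available.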

Ollama
How to use aciklab/kubernetes-ai-GGUF with Ollama:
```
ollama run hf.co/aciklab/kubernetes-ai-GGUF:Q4_K_M
```

Unsloth Studio

How to use aciklab/kubernetes-ai-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for aciklab/kubernetes-ai-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for aciklab/kubernetes-ai-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for aciklab/kubernetes-ai-GGUF to start chatting

Lemonade

How to use aciklab/kubernetes-ai-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull aciklab/kubernetes-ai-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.kubernetes-ai-GGUF-Q4_K_M

List all available models

lemonade list

Kubernetes AI - GGUF Quantized Models

Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to GGUF format for efficient local inference.

Model Description

This repository contains GGUF quantized versions of the Kubernetes AI model, intended to run on consumer hardware without requiring a GPU. The model was created by fine-tuning LoRA adapters on unsloth/gemma-3-12b-it-qat-bnb-4bit, merging them into the base model, and converting the result to GGUF format for llama.cpp compatibility.

Primary Purpose: Answer Kubernetes-related questions in Turkish on local machines.

Available Models

| Quantization | File size | File |
|---|---|---|
| Unquantized | 22.0 GB | kubernetes-ai.gguf |
| Q8_0 | 12.5 GB | kubernetes-ai-Q8_0.gguf |
| Q5_K_M | 8.45 GB | kubernetes-ai-Q5_K_M.gguf |
| Q4_K_M | 7.3 GB | kubernetes-ai-Q4_K_M.gguf |
| Q4_K_S | 6.9 GB | kubernetes-ai-Q4_K_S.gguf |
| Q3_K_M | 6.0 GB | kubernetes-ai-Q3_K_M.gguf |
| IQ3_M | 5.6 GB | kubernetes-ai-IQ3_M.gguf |

Recommended: Q4_K_M for best balance of quality and size, or IQ3_M for low-end systems.
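If you prefer to fetch a file from Python rather than with `wget`, a small helper can map a quantization label from the table to its file name. This is a sketch that assumes the file naming shown in the table; `gguf_filename`, `download_url`, and `download` are hypothetical helpers, and `download` relies on the `huggingface_hub` package (its `hf_hub_download` function) being installed.

```python
from typing import Optional

REPO_ID = "aciklab/kubernetes-ai-GGUF"
# Quantization labels from the "Available Models" table; None selects the unquantized file.
QUANTS = ("Q8_0", "Q5_K_M", "Q4_K_M", "Q4_K_S", "Q3_K_M", "IQ3_M")


def gguf_filename(quant: Optional[str]) -> str:
    """Map a quantization label to the file name used in this repository."""
    if quant is None:
        return "kubernetes-ai.gguf"
    if quant not in QUANTS:
        raise ValueError(f"unknown quantization: {quant!r}")
    return f"kubernetes-ai-{quant}.gguf"


def download_url(quant: Optional[str]) -> str:
    """Build a direct 'resolve/main' download URL for the chosen file."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{gguf_filename(quant)}"


def download(quant: Optional[str]) -> str:
    """Download the file with huggingface_hub and return its local cache path."""
    # Imported here so the helpers above work even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
```

For example, `download("Q4_K_M")` would fetch the recommended file into the local Hugging Face cache and return its path (network access required).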

Quick Start

Using Ollama (Recommended)

Ollama provides the easiest way to run GGUF models locally.

1. Install Ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Windows - Download from https://ollama.com/download

2. Download Model

# Download your preferred quantization
wget https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_M.gguf

3. Create Modelfile

cat > Modelfile << 'EOF'
# Path to the GGUF file downloaded in step 2 (adjust if you chose another quantization)
FROM ./kubernetes-ai-Q4_K_M.gguf

TEMPLATE """{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>
"""

# Model parameters
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"

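# System prompt (Turkish): "You are an AI assistant specialized in Kubernetes. You answer Kubernetes-related questions in Turkish."
SYSTEM """Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."""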
EOF
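To see what the model actually receives, the sketch below rebuilds the text produced by the `TEMPLATE` above for a single turn, plus the trimming implied by the two `stop` parameters. It assumes Ollama stops rendering at `{{ .Response }}` when it asks the model to generate, so the prompt ends with the `<start_of_turn>model` line. `render_prompt` and `cut_at_stop` are hypothetical names; this is an illustration, not Ollama's own rendering code.

```python
# Stop sequences from the Modelfile's PARAMETER stop lines.
STOP_SEQUENCES = ("<start_of_turn>", "<end_of_turn>")


def render_prompt(prompt: str, system: str = "") -> str:
    """Mirror the Modelfile TEMPLATE for one turn, up to where the model starts writing."""
    text = ""
    if system:  # {{ if .System }} ... {{ end }}
        text += f"<start_of_turn>system\n{system}<end_of_turn>\n"
    if prompt:  # {{ if .Prompt }} ... {{ end }}
        text += f"<start_of_turn>user\n{prompt}<end_of_turn>\n"
    # .Response is still empty here, so the text ends after the model header.
    return text + "<start_of_turn>model\n"


def cut_at_stop(generated: str) -> str:
    """Trim generated text at the earliest stop sequence, as the stop parameters do."""
    hits = [pos for pos in (generated.find(s) for s in STOP_SEQUENCES) if pos != -1]
    return generated[: min(hits)] if hits else generated
```

Because the Modelfile sets `SYSTEM`, every prompt rendered with this template starts with the system turn.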

4. Create and Run Model

# Create model
ollama create kubernetes-ai -f Modelfile

# Run interactive chat
ollama run kubernetes-ai

# Example query (Turkish for "How do I create a deployment with 3 replicas in Kubernetes?")
ollama run kubernetes-ai "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"
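Once the model has been created, it can also be queried programmatically through Ollama's local REST API. The sketch below assumes Ollama's default address (`http://localhost:11434`) and its `/api/generate` endpoint with streaming disabled, so the whole answer arrives in the `response` field of a single JSON object. `build_generate_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama serves on its default address and the model was
# created under the name "kubernetes-ai" as in step 4.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "kubernetes-ai") -> urllib.request.Request:
    """Build (without sending) a non-streaming /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "kubernetes-ai") -> str:
    """Send the request and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.load(resp)["response"]


# Usage (needs a running Ollama server):
# print(generate("Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"))
```

The system prompt and sampling parameters come from the Modelfile, so the request only needs the model name and the question.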

Training Details

This model is based on the aciklab/kubernetes-ai-lora LoRA adapters:

Base Model: unsloth/gemma-3-12b-it-qat-bnb-4bit
Training Method: LoRA (Low-Rank Adaptation)
LoRA Rank: 8
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training Dataset: ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
Training Time: 28 hours on NVIDIA RTX 5070 12GB
Max Sequence Length: 1024 tokens
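To give a sense of scale for "LoRA Rank: 8": LoRA freezes each target weight matrix W and trains a low-rank update ΔW = BA, so a layer with input width d_in and output width d_out gains only r(d_in + d_out) trainable values. The sketch below computes this for a hypothetical 3840 × 3840 projection; the width is an illustrative assumption, not a figure from this card.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable values LoRA adds to one frozen linear layer W of shape (d_out, d_in).

    The update is delta_W = B @ A with A of shape (rank, d_in) and B of shape
    (d_out, rank), so only rank * (d_in + d_out) values are trained.
    """
    return rank * (d_in + d_out)


def dense_params(d_in: int, d_out: int) -> int:
    """Parameters in the full (frozen) weight matrix W."""
    return d_in * d_out


# Hypothetical 3840 x 3840 projection, using the card's LoRA rank of 8.
width = 3840
added = lora_params(width, width, rank=8)
frozen = dense_params(width, width)
print(f"LoRA trains {added:,} values vs {frozen:,} frozen ({added / frozen:.2%})")
```

At rank 8 the adapter trains well under 1% of such a layer's parameters, which, together with the 4-bit base model, helps explain how training fits on a single 12GB GPU.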

Training Dataset Summary

| Dataset category | Examples | Description |
|---|---|---|
| Kubernetes Official Docs | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| Stack Overflow | 52,000 | Kubernetes Q&A from the community |
| DevOps Datasets | 62,500 | General DevOps and Kubernetes content |
| Configurations & CLI | 36,800 | Kubernetes configs, kubectl examples, operators |
| Total | ~157,210 | Comprehensive Kubernetes knowledge base |

Quantization Details

All models were quantized using llama.cpp with importance matrix optimization:

Source: Merged LoRA adapters with base model
Quantization Tool: llama.cpp (latest)
Method: K-quant and IQ-quant mixtures
Optimization: Importance matrix for better quality
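The basic idea behind block quantization, and how an importance matrix can influence it, can be sketched in a few lines of plain Python. This is a simplified illustration, not llama.cpp's implementation: `quantize_block` follows the shape of the Q8_0 format (blocks of 32 weights sharing one scale, with integer codes in [-127, 127]) but keeps the scale as a Python float, while `quantize_block_weighted` tries a few hypothetical candidate scales and keeps the one with the lowest importance-weighted error. That search is roughly how importance weights steer the more elaborate K-quant and IQ-quant routines.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_block(values):
    """Symmetric 8-bit quantization: one scale per block plus codes in [-127, 127]."""
    amax = max(abs(v) for v in values)
    scale = amax / 127 if amax > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate weights from a scale and its codes."""
    return [scale * c for c in codes]


def quantize(weights):
    """Split a flat list of weights into blocks and quantize each block independently."""
    return [quantize_block(weights[i:i + BLOCK_SIZE]) for i in range(0, len(weights), BLOCK_SIZE)]


def weighted_error(values, importance, scale, codes):
    """Squared reconstruction error, with each weight scaled by its importance."""
    return sum(w * (v - scale * c) ** 2 for v, w, c in zip(values, importance, codes))


def quantize_block_weighted(values, importance, factors=(0.9, 0.95, 1.0, 1.05, 1.1)):
    """Try several candidate scales and keep the one with the lowest weighted error.

    Including factor 1.0 guarantees the result is never worse than quantize_block
    under the same importance weights.
    """
    amax = max(abs(v) for v in values)
    if amax == 0:
        return 1.0, [0] * len(values)
    best = None
    for factor in factors:
        scale = factor * amax / 127
        codes = [max(-127, min(127, round(v / scale))) for v in values]
        err = weighted_error(values, importance, scale, codes)
        if best is None or err < best[0]:
            best = (err, scale, codes)
    return best[1], best[2]
```

Lower-bit formats follow the same pattern with fewer code levels, which is why the choice of scale, and therefore importance information, matters more for Q3 and IQ3 files.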

Quantization Quality

Q4_K_M: Best balance of quality and size; recommended for most users
Q4_K_S: Slightly smaller with minimal quality loss
Q3_K_M: Good for memory-constrained systems
IQ3_M: Advanced 3-bit quantization for laptops
Unquantized: Original F16/F32 precision

Hardware Requirements

Minimum

CPU: 4+ cores
RAM: 8GB (for IQ3_M/Q3_K_M quantizations)
Storage: 6-8GB free space
GPU: Not required (CPU inference)
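The table sizes and the minimum RAM figure can be combined into a rough rule of thumb: pick the largest file that still leaves some headroom in memory. In this sketch, the `overhead_gb` default of 1.5 GB for the context cache, runtime, and operating system is a hypothetical assumption, and `pick_quant` is an illustrative helper; actual memory use depends on context length and the inference backend.

```python
from typing import Optional

# File sizes in GB, copied from the "Available Models" table.
FILE_SIZES_GB = {
    "Q8_0": 12.5,
    "Q5_K_M": 8.45,
    "Q4_K_M": 7.3,
    "Q4_K_S": 6.9,
    "Q3_K_M": 6.0,
    "IQ3_M": 5.6,
}


def pick_quant(ram_gb: float, overhead_gb: float = 1.5) -> Optional[str]:
    """Return the largest quantization whose file plus overhead fits in RAM, or None.

    overhead_gb is a hypothetical allowance, not a measured value.
    """
    fitting = [q for q, size in FILE_SIZES_GB.items() if size + overhead_gb <= ram_gb]
    if not fitting:
        return None
    return max(fitting, key=FILE_SIZES_GB.get)
```

With 8 GB of RAM it selects Q3_K_M, in line with the minimum requirements above; when memory allows, the card recommends Q4_K_M as the best overall balance.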

License

This model is released under the MIT License. Free to use in commercial and open-source projects.

Contact

Produced by: HAVELSAN/Açıklab

For questions or feedback, please open an issue on the model repository.

Note: These are GGUF quantized versions with the LoRA adapters already merged, ready for immediate use. No separate base model download or adapter merging is required.


Format: GGUF
Model size: 12B params
Architecture: gemma3


Model tree for aciklab/kubernetes-ai-GGUF (each entry derives from the one above it)

Base model: google/gemma-3-12b-pt
Fine-tuned: google/gemma-3-12b-it
Fine-tuned: google/gemma-3-12b-it-qat-q4_0-unquantized
Quantized: unsloth/gemma-3-12b-it-qat-bnb-4bit
Adapter: aciklab/kubernetes-ai-lora
Quantized: this model (aciklab/kubernetes-ai-GGUF)