Kubernetes AI - GGUF Quantized Models
Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to GGUF format for efficient local inference.
Model Description
This repository contains GGUF-quantized versions of the Kubernetes AI model, intended to run on consumer hardware without a GPU. The model was produced by fine-tuning LoRA adapters on unsloth/gemma-3-12b-it-qat-bnb-4bit, merging them into the base model, and converting the result to GGUF format for llama.cpp compatibility.
Primary Purpose: Answer Kubernetes-related questions in Turkish on local machines.
Available Models
| Model | Size | Download |
|---|---|---|
| Unquantized | 22.0 GB | kubernetes-ai.gguf |
| Q8_0 | 12.5 GB | kubernetes-ai-Q8_0.gguf |
| Q5_K_M | 8.45 GB | kubernetes-ai-Q5_K_M.gguf |
| Q4_K_M | 7.3 GB | kubernetes-ai-Q4_K_M.gguf |
| Q4_K_S | 6.9 GB | kubernetes-ai-Q4_K_S.gguf |
| Q3_K_M | 6.0 GB | kubernetes-ai-Q3_K_M.gguf |
| IQ3_M | 5.6 GB | kubernetes-ai-IQ3_M.gguf |
Recommended: Q4_K_M for the best balance of quality and size, or IQ3_M for low-end systems.
Quick Start
Using Ollama (Recommended)
Ollama provides the easiest way to run GGUF models locally.
1. Install Ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# macOS
brew install ollama
# Windows - Download from https://ollama.com/download
2. Download Model
# Download your preferred quantization
wget https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_M.gguf
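The download URL follows the same pattern for every quantization in the table above. As a minimal sketch, the mapping can be written out in Python; `QUANT_FILES`, `REPO_URL`, and `download_url` are hypothetical names introduced here for illustration, while the filenames and URL pattern are taken from this card.

```python
# Filenames copied from the "Available Models" table.
QUANT_FILES = {
    "Unquantized": "kubernetes-ai.gguf",
    "Q8_0": "kubernetes-ai-Q8_0.gguf",
    "Q5_K_M": "kubernetes-ai-Q5_K_M.gguf",
    "Q4_K_M": "kubernetes-ai-Q4_K_M.gguf",
    "Q4_K_S": "kubernetes-ai-Q4_K_S.gguf",
    "Q3_K_M": "kubernetes-ai-Q3_K_M.gguf",
    "IQ3_M": "kubernetes-ai-IQ3_M.gguf",
}

REPO_URL = "https://huggingface.co/aciklab/kubernetes-ai-GGUF"


def download_url(quant: str) -> str:
    """Return the direct download URL for a quantization name (hypothetical helper)."""
    try:
        filename = QUANT_FILES[quant]
    except KeyError:
        raise ValueError(f"unknown quantization: {quant!r}") from None
    # Same URL pattern as the wget command above: <repo>/resolve/main/<file>
    return f"{REPO_URL}/resolve/main/{filename}"


print(download_url("IQ3_M"))
```

The printed URL can be passed to `wget` in place of the Q4_K_M link.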
3. Create Modelfile
cat > Modelfile << 'EOF'
FROM <path-to-model>/kubernetes-ai-Q4_K_M.gguf
TEMPLATE """{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>
"""
# Model parameters
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"
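# System prompt, in English: "You are an AI assistant specialized in Kubernetes. You answer Kubernetes-related questions in Turkish."
SYSTEM """Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."""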
EOF
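To make the TEMPLATE block easier to read, the sketch below approximates in Python the prompt string that the template would expand to before the model starts generating. Ollama performs the real rendering with Go templates; `render_prompt` is a hypothetical helper written only to illustrate the turn markers, not part of Ollama.

```python
def render_prompt(prompt: str, system: str = "") -> str:
    """Approximate the Modelfile TEMPLATE expansion for a single turn."""
    parts = []
    if system:
        # {{ if .System }} branch of the template
        parts.append(f"<start_of_turn>system\n{system}<end_of_turn>\n")
    if prompt:
        # {{ if .Prompt }} branch of the template
        parts.append(f"<start_of_turn>user\n{prompt}<end_of_turn>\n")
    # The model's turn is opened and left unfinished; generation continues
    # from here until one of the configured stop strings appears.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(render_prompt("Pod nedir?", system="Sen bir Kubernetes asistanısın."))
```

The two `PARAMETER stop` lines correspond to the markers used here: generation halts as soon as the model emits `<end_of_turn>` or tries to open a new turn.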
4. Create and Run Model
# Create model
ollama create kubernetes-ai -f Modelfile
# Run interactive chat
ollama run kubernetes-ai
# Example query: "How do I create a deployment with 3 replicas in Kubernetes?"
ollama run kubernetes-ai "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"
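Once the model has been created, it can also be queried programmatically. The following is a minimal sketch that uses only the Python standard library and assumes Ollama is serving on its default local address (`http://localhost:11434`) and that the model was created under the name `kubernetes-ai` as above. `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(question: str, model: str = "kubernetes-ai") -> dict:
    """Build the JSON payload for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # ask for one complete JSON response
    }


def ask(question: str, model: str = "kubernetes-ai") -> str:
    """Send the question to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(question, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["message"]["content"]
```

Calling `ask("Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?")` requires a running Ollama server; the system prompt from the Modelfile is applied by Ollama, so it does not need to be repeated in the request.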
Training Details
This model is based on the aciklab/kubernetes-ai LoRA adapters:
- Base Model: unsloth/gemma-3-12b-it-qat-bnb-4bit
- Training Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 8
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training Dataset: ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- Training Time: 28 hours on an NVIDIA RTX 5070 (12 GB)
- Max Sequence Length: 1024 tokens
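For readers unfamiliar with LoRA, the rank listed above has a simple meaning. In the standard LoRA formulation, which is assumed here since the card does not give its exact configuration beyond rank and target modules, each targeted weight matrix stays frozen and only a low-rank update is trained. The scaling factor α is part of the standard formulation and is not stated in this card.

```latex
% Standard LoRA update for one target matrix W (e.g. q_proj), with rank r = 8
W' = W + \frac{\alpha}{r}\, B A,
\qquad W \in \mathbb{R}^{d \times k},\quad
B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad r = 8

% Trainable parameters per target matrix, compared with full fine-tuning
\underbrace{r\,(d + k)}_{\text{LoRA}} \;\ll\; \underbrace{d\,k}_{\text{full}}
\qquad \text{when } r \ll \min(d, k)
```

With r = 8, each adapted matrix adds only 8(d + k) trainable parameters, which is what makes training a 12B model feasible on a single 12 GB GPU.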
Training Dataset Summary
| Dataset Category | Count | Description |
|---|---|---|
| Kubernetes Official Docs | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| Stack Overflow | 52,000 | Kubernetes Q&A from community |
| DevOps Datasets | 62,500 | General DevOps and Kubernetes content |
| Configurations & CLI | 36,800 | Kubernetes configs, kubectl examples, operators |
| Total | ~157,210 | Comprehensive Kubernetes knowledge base |
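The category counts can be tallied with a short script to see each source's share of the data. Note that the listed counts sum to 160,210, slightly more than the stated total of ~157,210, so the per-category figures are best read as approximate.

```python
# Counts copied from the table above.
category_counts = {
    "Kubernetes Official Docs": 8_910,
    "Stack Overflow": 52_000,
    "DevOps Datasets": 62_500,
    "Configurations & CLI": 36_800,
}

total = sum(category_counts.values())

# Print each category with its count and share of the summed total.
for name, count in category_counts.items():
    print(f"{name:<26} {count:>7,} {count / total:6.1%}")
print(f"{'Sum of listed counts':<26} {total:>7,}")
```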
Quantization Details
All models were quantized using llama.cpp with importance matrix optimization:
- Source: LoRA adapters merged into the base model
- Quantization Tool: llama.cpp (latest)
- Method: K-quant and IQ-quant mixtures
- Optimization: Importance matrix for better quality
Quantization Quality
- Q4_K_M: Best balance - recommended for most users
- Q4_K_S: Slightly smaller with minimal quality loss
- Q3_K_M: Good for memory-constrained systems
- IQ3_M: Advanced 3-bit quantization for laptops
- Unquantized: Original F16/F32 precision
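To put the trade-offs above in numbers, the sketch below computes each file's size relative to the unquantized model, using only the sizes from the Available Models table. `SIZES_GB` and `relative_size` are hypothetical names used for illustration.

```python
# File sizes in GB, copied from the "Available Models" table.
SIZES_GB = {
    "Unquantized": 22.0,
    "Q8_0": 12.5,
    "Q5_K_M": 8.45,
    "Q4_K_M": 7.3,
    "Q4_K_S": 6.9,
    "Q3_K_M": 6.0,
    "IQ3_M": 5.6,
}


def relative_size(quant: str) -> float:
    """Return a file's size as a fraction of the unquantized model's size."""
    return SIZES_GB[quant] / SIZES_GB["Unquantized"]


for quant in SIZES_GB:
    print(f"{quant:<12} {SIZES_GB[quant]:>6.2f} GB  {relative_size(quant):6.1%}")
```

By this measure the recommended Q4_K_M file is roughly a third of the unquantized size, and IQ3_M roughly a quarter.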
Hardware Requirements
Minimum
- CPU: 4+ cores
- RAM: 8GB (for IQ3_M/Q3_K_M quantizations)
- Storage: 6-8GB free space
- GPU: Not required (CPU inference)
Recommended
- CPU: 8+ cores
- RAM: 16GB (for Q4_K_M/Q4_K_S quantizations)
- Storage: 10GB free space
- GPU: Optional (can accelerate inference)
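The RAM tiers above can be turned into a small selection rule. This is a sketch under the assumptions stated in this card only: 16 GB or more maps to the recommended Q4_K_M, 8 GB or more maps to IQ3_M, and anything below the stated minimum is rejected. `recommend_quant` is a hypothetical helper, and actual memory use also depends on context length and other running software.

```python
def recommend_quant(ram_gb: float) -> str:
    """Pick a quantization from the card's RAM guidance (hypothetical helper)."""
    if ram_gb >= 16:
        # "Recommended" tier: 16GB for Q4_K_M/Q4_K_S
        return "Q4_K_M"
    if ram_gb >= 8:
        # "Minimum" tier: 8GB for IQ3_M/Q3_K_M
        return "IQ3_M"
    raise ValueError(f"{ram_gb} GB is below the 8 GB minimum listed for this model")


for ram in (8, 16, 32):
    print(f"{ram:>2} GB RAM -> {recommend_quant(ram)}")
```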
License
This model is released under the MIT License and is free to use in commercial and open-source projects.
Contact
Produced by: HAVELSAN/Açıklab
For questions or feedback, please open an issue on the model repository.
Note: These are GGUF quantized versions ready for immediate use. No additional model loading or merging is required.