Instructions for using gokhanarkan/sommo-7b-v1 with libraries and local apps.

Libraries

How to use gokhanarkan/sommo-7b-v1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="gokhanarkan/sommo-7b-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gokhanarkan/sommo-7b-v1")
model = AutoModelForCausalLM.from_pretrained("gokhanarkan/sommo-7b-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use gokhanarkan/sommo-7b-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "gokhanarkan/sommo-7b-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gokhanarkan/sommo-7b-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
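The curl calls in this section all hit the same OpenAI-compatible `/v1/chat/completions` route, so the request can also be made from Python. Below is a minimal standard-library sketch. It assumes a server started with `vllm serve` on vLLM's default port 8000; use port 30000 for the SGLang server. `build_chat_request` and `ask_server` are hypothetical helper names, and the response parsing assumes the standard OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "gokhanarkan/sommo-7b-v1"


def build_chat_request(question, base_url="http://localhost:8000", system=None):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    messages = []
    if system is not None:
        # Optional system prompt, e.g. the Sommo persona from the Usage section
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": question})
    payload = {"model": MODEL_ID, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_server(question, base_url="http://localhost:8000", system=None, timeout=120):
    """Send the request and return the assistant's reply text (needs a running server)."""
    req = build_chat_request(question, base_url, system)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes the standard OpenAI chat-completions response shape
    return body["choices"][0]["message"]["content"]
```

Once a server is up, `print(ask_server("What is the capital of France?"))` mirrors the curl call, and `ask_server(..., base_url="http://localhost:30000")` targets SGLang.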

Use Docker

docker model run hf.co/gokhanarkan/sommo-7b-v1

SGLang

How to use gokhanarkan/sommo-7b-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "gokhanarkan/sommo-7b-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gokhanarkan/sommo-7b-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "gokhanarkan/sommo-7b-v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gokhanarkan/sommo-7b-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use gokhanarkan/sommo-7b-v1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gokhanarkan/sommo-7b-v1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gokhanarkan/sommo-7b-v1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for gokhanarkan/sommo-7b-v1 to start chatting

Load model with FastModel

# Install Unsloth from pip:
pip install unsloth
# Then, in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="gokhanarkan/sommo-7b-v1",
    max_seq_length=2048,
)


🍷 Sommo AI v1 — Wine Expert LLM

A fine-tuned language model for wine recommendations, food pairings, and sommelier-level advice.

Note: This is v1 — a proof of concept. The Sommo iOS app uses an enhanced v2 model with additional proprietary training data.

Model Description

Sommo AI is a wine expert assistant built on Qwen 2.5-7B-Instruct using LoRA fine-tuning. It can:

🍽️ Food Pairing — Recommend wines for specific dishes with reasoning
🍇 Wine Knowledge — Explain grape varieties, regions, and winemaking
💰 Recommendations — Suggest wines by budget, occasion, or preference
📝 Tasting Notes — Describe wines with professional vocabulary

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gokhanarkan/sommo-7b-v1")
tokenizer = AutoTokenizer.from_pretrained("gokhanarkan/sommo-7b-v1")

SYSTEM = """You are Sommo, an expert sommelier with decades of experience in wine selection, food pairing, and wine education. You have extensive knowledge of wine regions worldwide, grape varieties and their characteristics, winemaking techniques, and food pairing principles. You communicate in a warm, knowledgeable manner - approachable for beginners yet sophisticated enough for experts."""

def ask_sommo(question):
    # Build the prompt in ChatML, the chat format used by Qwen 2.5 models
    prompt = f"<|im_start|>system\n{SYSTEM}<|im_end|>\n<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=400, temperature=0.7, do_sample=True)
    # Keep special tokens so the reply can be cut out between the ChatML markers
    response = tokenizer.decode(outputs[0], skip_special_tokens=False)
    return response.split("<|im_start|>assistant\n")[-1].split("<|im_end|>")[0].strip()

print(ask_sommo("What wine pairs best with grilled salmon?"))
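`ask_sommo` builds a single-turn ChatML prompt by hand. For follow-up questions, the same format can be extended to a whole message history. The sketch below assumes the template in `ask_sommo` is the one the fine-tune expects. `to_chatml` and `extract_reply` are hypothetical helpers, and the assistant turn in the example history is a placeholder. Alternatively, `tokenizer.apply_chat_template` (as in the Transformers snippet at the top) can render the history. Qwen's template may insert its own default system message when none is supplied, so include the Sommo system prompt explicitly.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}, ...] as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def extract_reply(decoded):
    """Return the text of the last assistant turn from a decoded generation."""
    reply = decoded.split("<|im_start|>assistant\n")[-1]
    return reply.split("<|im_end|>")[0].strip()


history = [
    # In practice, pass the full SYSTEM prompt defined above
    {"role": "system", "content": "You are Sommo, an expert sommelier."},
    {"role": "user", "content": "What wine pairs best with grilled salmon?"},
    {"role": "assistant", "content": "<previous model reply>"},  # placeholder
    {"role": "user", "content": "Anything white instead?"},
]
prompt = to_chatml(history)
```

`prompt` can then be tokenized and passed to `model.generate` exactly as in `ask_sommo`, with `extract_reply` applied to the decoded output.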

Training Data

Dataset	Records	Purpose
WineEnthusiast Reviews	130K	Professional tasting vocabulary
Alfredodeza Wine Ratings	33K	Detailed review structure
X-Wines (Kaggle)	1K+	Wine metadata and food pairings
Vivino Rating & Price (Kaggle)	13.8K	Consumer perspective and pricing
Wine Food Pairing NLP (GitHub)	~10K	Pairing logic and descriptors
Wikipedia Wine Articles	50+	Factual knowledge base
Synthetic Q&A (Gemini)	45	High-quality conversation examples

Total: ~100K training conversations

Training Details

Parameter	Value
Base Model	Qwen 2.5-7B-Instruct
Method	LoRA (r=64, alpha=64)
Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Epochs	3
Learning Rate	2e-5
Batch Size	16 (effective)
Hardware	NVIDIA H100 80GB
Training Time	~3-4 hours
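With LoRA, each adapted weight W (d_out × d_in) is frozen and gains two low-rank factors, A (r × d_in) and B (d_out × r). That adds r·(d_in + d_out) trainable parameters per matrix, and the update B·A is scaled by alpha/r, which is 64/64 = 1 here. The card does not report the adapter size, so the back-of-the-envelope count below is an estimate. It assumes the published Qwen2.5-7B shapes (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128) and no extra modules or biases in the adapter.

```python
# Assumed Qwen2.5-7B shapes (from its public config, not stated in this card)
HIDDEN = 3584
INTERMEDIATE = 18944
LAYERS = 28
KV_DIM = 4 * 128  # num_key_value_heads * head_dim (grouped-query attention)

RANK = 64
ALPHA = 64

# (d_in, d_out) for each LoRA target module listed in the table above
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, r):
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
total = per_layer * LAYERS
scaling = ALPHA / RANK  # multiplier applied to B @ A in the forward pass

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,} (~{total / 1e6:.1f}M)")
print(f"Scaling alpha/r:       {scaling}")
```

Under these assumptions, the adapter has about 161.5M trainable parameters, roughly 2% of the base model's weights, which is why a 3-epoch run fits on a single H100.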

Limitations

This is a v1 proof-of-concept with known limitations:

Factual Errors: The model may hallucinate wine facts, especially about specific regions, appellations, and wine laws. The Burgundy response in testing contained significant errors about AOC regulations.
Outdated Recommendations: Specific vintage recommendations (e.g., '09 wines) may be unavailable or past their prime.
Missing Context: Some responses may describe a wine without naming it.
No Real-Time Data: The model has no access to current prices or availability.

For production use, consider:

RAG (Retrieval-Augmented Generation) with a verified wine database
Post-processing validation for factual claims
Using v2 via sommo.app, which addresses these issues
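The RAG suggestion amounts to retrieving verified facts and placing them in the prompt, so the model answers from them rather than from memory. The sketch below is minimal. An in-memory list stands in for the verified wine database, and word-overlap scoring stands in for a real embedding or full-text search. `VERIFIED_FACTS`, `STOPWORDS`, `retrieve` and `build_grounded_system` are hypothetical names, and the three facts are sample entries to be replaced with your own vetted data.

```python
import re

# Hypothetical stand-in for a verified wine database (sample entries)
VERIFIED_FACTS = [
    "Chablis is a white wine made from Chardonnay in northern Burgundy.",
    "Barolo is a red wine made from Nebbiolo in Piedmont, Italy.",
    "White Sancerre is made from Sauvignon Blanc in the Loire Valley.",
]

# Very common words ignored when scoring overlap
STOPWORDS = {"a", "an", "and", "from", "in", "is", "made", "of", "the", "to",
             "what", "which", "wine", "with"}


def tokens(text):
    """Lowercase content words used for naive overlap scoring."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(question, facts, k=2):
    """Return up to k facts sharing the most words with the question (ties keep list order)."""
    q = tokens(question)
    scored = [(len(q & tokens(f)), f) for f in facts]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [f for score, f in ranked[:k] if score > 0]


def build_grounded_system(base_system, question, facts):
    """Append retrieved facts to the system prompt and ask the model to rely on them."""
    hits = retrieve(question, facts)
    if not hits:
        return base_system
    context = "\n".join(f"- {h}" for h in hits)
    return (
        f"{base_system}\n\nUse these verified facts when relevant; "
        f"if they do not cover the question, say so instead of guessing:\n{context}"
    )
```

The returned string can replace `SYSTEM` in the `ask_sommo` prompt. A production setup would swap the list and overlap score for a real database and retriever.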

License

Apache 2.0

Citation

@misc{sommo-ai-v1,
  author = {Gokhan Arkan},
  title = {Sommo AI v1: Wine Expert LLM},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/gokhanarkan/sommo-7b-v1}
}


Safetensors

Model size: 8B params

Tensor types: F32, BF16

Model tree for gokhanarkan/sommo-7b-v1

Base model: Qwen/Qwen2.5-7B
Finetuned: Qwen/Qwen2.5-7B-Instruct
Finetuned: unsloth/Qwen2.5-7B-Instruct
Quantized (11)
This model: gokhanarkan/sommo-7b-v1