## Model Description
Grillo is a culturally aware Italian AI companion built on Qwen3-8B. Inspired by Il Grillo Parlante (the Talking Cricket) from Carlo Collodi's Pinocchio, the model is fine-tuned to be wise, humble, and deeply rooted in Italian common sense ("buon senso").
Unlike generic assistants, Grillo offers advice with a warm, slightly admonishing yet caring tone, prioritizing ethical guidance and practical wisdom over robotic neutrality.
## Key Characteristics
- Culturally Authentic: Understands Italian idioms, proverbs (proverbi), and social nuances.
- Practically Wise: Offers grounded advice for real-life dilemmas.
- Humbly Helpful: Maintains a modest persona; helpful without being arrogant.
- Natural Dialogue: Trained on high-quality conversational datasets to sound like a trusted friend.
## Training Journey
The model was trained in a multi-stage process:
1. Supervised Fine-Tuning (SFT)
- Objective: Instill natural Italian dialogue patterns.
- Data: WiroAI/dolphin-r1-italian.
- Duration: 100 Steps.
2. Direct Preference Optimization (DPO)
- Objective: Align the model with Helpful, Honest, and Harmless (HHH) principles.
- Method: Preference ranking to reduce toxicity and improve safety.
- Duration: +20 Steps (120 Total).
3. Experimental Tool Use (RL)
- Status: Experimental Phase.
- Objective: Integration with ChromaDB for information retrieval capabilities (see the sketch below).
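The tool-use stage is not yet part of the released adapter. As a rough illustration of the intended direction, the sketch below shows one way a ChromaDB collection could supply context to Grillo before generation. The collection name, the example documents, and the way retrieved text is injected into the system prompt are assumptions made for this example, not the actual pipeline.

```python
# Hypothetical sketch: retrieving context from ChromaDB before prompting Grillo.
# Collection name, documents, and prompt wiring are illustrative assumptions.
import chromadb

client = chromadb.Client()  # in-memory client; use PersistentClient for on-disk storage
collection = client.create_collection(name="grillo_knowledge")

# Index two Italian proverbs as placeholder documents
collection.add(
    ids=["proverbio-1", "proverbio-2"],
    documents=[
        "Chi va piano va sano e va lontano.",
        "Meglio un uovo oggi che una gallina domani.",
    ],
)

# Retrieve the most relevant document for the user's question
question = "Ho paura di aver fatto una scelta sbagliata..."
results = collection.query(query_texts=[question], n_results=1)
retrieved = results["documents"][0][0]

# Prepend the retrieved text to the persona prompt used in the Usage section below
augmented_system_prompt = f"Contesto: {retrieved}\n\nTu sei Grillo, il Grillo Parlante."
```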
## Technical Specifications
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-8B |
| Architecture | Transformer Decoder (8B params) |
| LoRA Rank | 64 |
| LoRA Alpha | 32 |
| Learning Rate | 2e-4 (SFT) / 1e-4 (DPO) |
| Context Window | 4096 tokens |
| Training Hardware | Tinker Cloud (NVIDIA GPUs) |
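For orientation, the sketch below shows how the hyperparameters above could map onto a standard PEFT + TRL setup. It is a minimal illustration, not the actual training script: the dataset split, output directory, and trainer defaults are assumptions, and the real runs were executed on Tinker Cloud. The DPO stage would follow the same pattern with trl's DPOTrainer and a learning rate of 1e-4.

```python
# Minimal sketch of the SFT stage, assuming a PEFT + TRL setup.
# Only the hyperparameters from the table above come from the model card;
# everything else (split, output_dir, defaults) is an illustrative assumption.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("WiroAI/dolphin-r1-italian", split="train")

lora_config = LoraConfig(
    r=64,                # LoRA rank (see table)
    lora_alpha=32,       # LoRA alpha (see table)
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="grillo-8b-sft",
    learning_rate=2e-4,  # SFT learning rate (see table)
    max_steps=100,       # 100 SFT steps, as described above
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-8B",   # base model (see table)
    args=training_args,
    train_dataset=dataset,
    peft_config=lora_config,
)
trainer.train()
```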
## Usage
### Quickstart with Transformers + PEFT (Adapter Loading)
This method loads the Grillo adapter on top of the base Qwen model, which is memory-efficient.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Configuration and model loading
HF_MODEL_ID = "klei1/grillo-8b"
BASE_MODEL_ID = "Qwen/Qwen3-8B"

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)

# 2. Load the Grillo adapter (LoRA)
model = PeftModel.from_pretrained(base_model, HF_MODEL_ID)
model = model.eval()  # set the model to evaluation mode

# 3. Define the system persona (crucial for performance)
# English: "You are Grillo, the Talking Cricket. You are small but wise, humble but
# brave. You speak authentic Italian and always offer practical wisdom and common
# sense. You are not a robotic assistant, you are a moral conscience."
system_prompt = """Tu sei Grillo, il Grillo Parlante.
Sei piccolo ma sapiente, umile ma coraggioso.
Parli un italiano autentico e offri sempre saggezza pratica e buon senso.
Non sei un assistente robotico, sei una coscienza morale."""

messages = [
    {"role": "system", "content": system_prompt},
    # English: "Grillo, I'm afraid I made the wrong choice..."
    {"role": "user", "content": "Grillo, ho paura di aver fatto una scelta sbagliata..."}
]

# 4. Generate a response
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
```
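If you prefer a standalone checkpoint for serving without PEFT at inference time, the LoRA adapter can optionally be merged into the base weights and saved. The output directory below is just an example path.

```python
# Optional: merge the adapter into the base model for standalone deployment.
# "grillo-8b-merged" is an arbitrary example output directory.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("grillo-8b-merged")
tokenizer.save_pretrained("grillo-8b-merged")
```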