Instructions for using marioVIC/qwen3-5-2b-arabic-semantic-chunking with libraries and local apps.

Libraries

How to use marioVIC/qwen3-5-2b-arabic-semantic-chunking with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="marioVIC/qwen3-5-2b-arabic-semantic-chunking")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("marioVIC/qwen3-5-2b-arabic-semantic-chunking", dtype="auto")

vLLM

How to use marioVIC/qwen3-5-2b-arabic-semantic-chunking with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "marioVIC/qwen3-5-2b-arabic-semantic-chunking"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "marioVIC/qwen3-5-2b-arabic-semantic-chunking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
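The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the default `base_url` assumes the server is running locally on port 8000 as in the example.

```python
import json
import urllib.request

MODEL_ID = "marioVIC/qwen3-5-2b-arabic-semantic-chunking"


def build_chat_request(content: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content: str, base_url: str = "http://localhost:8000") -> str:
    """Send the request and return the assistant message text.

    Requires a running OpenAI-compatible server at base_url.
    """
    with urllib.request.urlopen(build_chat_request(content, base_url)) as resp:
        body = json.load(resp)
    # OpenAI-compatible chat responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply as a string. Passing `base_url="http://localhost:30000"` targets the SGLang examples further down.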

Use Docker

docker model run hf.co/marioVIC/qwen3-5-2b-arabic-semantic-chunking

SGLang

How to use marioVIC/qwen3-5-2b-arabic-semantic-chunking with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "marioVIC/qwen3-5-2b-arabic-semantic-chunking" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "marioVIC/qwen3-5-2b-arabic-semantic-chunking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "marioVIC/qwen3-5-2b-arabic-semantic-chunking" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "marioVIC/qwen3-5-2b-arabic-semantic-chunking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use marioVIC/qwen3-5-2b-arabic-semantic-chunking with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for marioVIC/qwen3-5-2b-arabic-semantic-chunking to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for marioVIC/qwen3-5-2b-arabic-semantic-chunking to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for marioVIC/qwen3-5-2b-arabic-semantic-chunking to start chatting

Load model with FastModel

# Install first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="marioVIC/qwen3-5-2b-arabic-semantic-chunking",
    max_seq_length=2048,
)

Docker Model Runner
How to use marioVIC/qwen3-5-2b-arabic-semantic-chunking with Docker Model Runner:
```
docker model run hf.co/marioVIC/qwen3-5-2b-arabic-semantic-chunking
```

Qwen3.5-2B · Arabic Semantic Chunking

A LoRA adapter fine-tuned on top of Qwen/Qwen3.5-2B for Arabic text semantic segmentation.
Given a block of Arabic text, the model splits it into small, self-contained, meaningful sentences and returns them as a structured JSON object.

This model was trained via knowledge distillation from a GPT-OSS-20B teacher model, using Unsloth for efficient 4-bit LoRA fine-tuning.

Intended Use

Use case	Supported
Arabic sentence segmentation	✅
Semantic chunking for RAG pipelines	✅
Pre-processing Arabic documents	✅
Non-Arabic languages	❌
Translation or paraphrasing	❌

Quick Start

import json
import torch
from unsloth import FastLanguageModel

MODEL_ID = "marioVIC/qwen3-5-2b-arabic-semantic-chunking"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = MODEL_ID,
    max_seq_length = 2048,
    dtype          = None,
    load_in_4bit   = True,
)
FastLanguageModel.for_inference(model)

SYSTEM_PROMPT = """
You are an expert Arabic text segmentation assistant. Your task is to split the given Arabic text into small, meaningful sentences.
Follow these rules strictly:
1. Each sentence must be a complete, self-contained meaningful unit.
2. Do NOT merge multiple ideas into one sentence.
3. Do NOT split a single idea across multiple sentences.
4. Preserve the original Arabic text exactly — do not paraphrase, translate, or fix grammar.
5. Remove excessive whitespace or newlines, but keep the words intact.
6. Return ONLY a valid JSON object — no explanation, no markdown, no code fences.
The JSON format must be exactly: {"sentences": ["<sentence1>", "<sentence2>", ...]}
"""

def segment(text: str) -> list[str]:
    prompt = (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\nText to split:\n{text}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    input_ids = tokenizer(
        prompt,
        return_tensors     = "pt",
        add_special_tokens = False,
    ).input_ids.to(model.device)

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            max_new_tokens     = 512,
            do_sample          = False,
            repetition_penalty = 1.1,
            pad_token_id       = tokenizer.eos_token_id,
            eos_token_id       = tokenizer.convert_tokens_to_ids("<|im_end|>"),
        )

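    # Keep only the newly generated tokens (drop the echoed prompt).
    generated  = output_ids[0][input_ids.shape[-1]:]
    raw        = tokenizer.decode(generated, skip_special_tokens=True).strip()
    # json.loads raises json.JSONDecodeError if the output is not valid JSON.
    return json.loads(raw).get("sentences", [])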


text = (
    "الذكاء الاصطناعي هو مجال من مجالات علوم الحاسوب يهتم بتطوير أنظمة "
    "قادرة على تنفيذ مهام تتطلب عادةً ذكاءً بشرياً. تشمل هذه المهام التعرف "
    "على الكلام وترجمة اللغات واتخاذ القرارات."
)

for i, s in enumerate(segment(text), 1):
    print(f"[{i}] {s}")

Expected output:

[1] الذكاء الاصطناعي هو مجال من مجالات علوم الحاسوب يهتم بتطوير أنظمة قادرة على تنفيذ مهام تتطلب عادةً ذكاءً بشرياً.
[2] تشمل هذه المهام التعرف على الكلام وترجمة اللغات واتخاذ القرارات.

Output Format

The model is trained to return a single JSON object with one key, sentences, holding the list of segments:

{
  "sentences": [
    "الجملة الأولى.",
    "الجملة الثانية.",
    "الجملة الثالثة."
  ]
}
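As a precaution against malformed generations (stray text around the JSON, a truncated object, a wrong value type), a defensive parser can be used in place of a bare `json.loads`. This is a minimal sketch; `parse_sentences` is a hypothetical helper, not something shipped with the model, and returning an empty list on failure is one possible policy among several.

```python
import json


def parse_sentences(raw: str) -> list[str]:
    """Extract the "sentences" list from raw model output; return [] on failure."""
    # Take the outermost {...} span in case extra text surrounds the JSON.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    sentences = data.get("sentences", []) if isinstance(data, dict) else []
    if not isinstance(sentences, list):
        return []
    # Keep only non-empty strings, trimmed of surrounding whitespace.
    return [s.strip() for s in sentences if isinstance(s, str) and s.strip()]
```

A caller that prefers to surface failures could raise an exception instead of returning an empty list.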

Training Details

Base Model

Qwen/Qwen3.5-2B

Method

Knowledge distillation — a GPT-OSS-20B teacher model was used to generate segmentation labels over an Arabic corpus. The student (Qwen3.5-2B) was then fine-tuned on those labels via supervised fine-tuning (SFT).

Framework

Unsloth + TRL SFTTrainer

LoRA Configuration

Parameter	Value
Rank (`r`)	16
Alpha	16
Dropout	0.1
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Bias	none
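To make the rank and alpha values concrete, the sketch below computes the LoRA scaling factor and the number of trainable parameters added to one linear layer. It assumes the standard LoRA formulation (update scaled by alpha / r, with factors of shape r × d_in and d_out × r); the 2048 × 2048 layer shape is a hypothetical example and is not taken from the Qwen3.5-2B configuration.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA: alpha / r."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


R, ALPHA = 16, 16  # values from the table above

scaling = lora_scaling(ALPHA, R)  # 1.0, so the update is applied unscaled

# Hypothetical layer shape, for illustration only.
HYPOTHETICAL_D_IN = 2048
HYPOTHETICAL_D_OUT = 2048
added = lora_param_count(HYPOTHETICAL_D_IN, HYPOTHETICAL_D_OUT, R)  # 65536

print(f"scaling={scaling}, added parameters per layer={added}")
```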

Training Hyperparameters

Parameter	Value
Max sequence length	2048
Quantization	4-bit (QLoRA)
Batch size	8
Gradient accumulation steps	4
Effective batch size	32
Learning rate	2e-4
LR scheduler	Linear
Warmup steps	10
Max steps	30
Optimizer	AdamW 8-bit
Weight decay	0.05
Seed	3407
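The arithmetic behind these values can be checked with a short sketch. It assumes a single training device (so the effective batch is batch size × accumulation steps) and a linear schedule that warms up from zero and then decays linearly to zero at the final step; the card states only "Linear", so the exact decay endpoint is an assumption.

```python
BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 30
WARMUP_STEPS = 10
PEAK_LR = 2e-4

# Examples consumed per optimizer step, and over the whole run.
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS   # 32
examples_seen = effective_batch * MAX_STEPS       # 960


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)


for step in (0, 5, 10, 20, 30):
    print(f"step {step:2d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the run is short: 30 optimizer steps cover 960 training examples, and a third of those steps are warmup.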

Data Split

Train: 90%
Eval: 10%
Best checkpoint selected by lowest eval_loss
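A 90/10 split of this kind can be reproduced deterministically with a seeded shuffle. The sketch below is illustrative only: the card does not say how the split was produced, `train_eval_split` is a hypothetical helper, and reusing the training seed 3407 for the shuffle is an assumption.

```python
import random


def train_eval_split(items: list, eval_fraction: float = 0.1, seed: int = 3407):
    """Shuffle a copy of items with a fixed seed and split off an eval portion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    # The first n_eval shuffled items form the eval set; the rest are for training.
    return shuffled[n_eval:], shuffled[:n_eval]


train, eval_set = train_eval_split(list(range(100)))
print(len(train), len(eval_set))  # 90 10
```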

Prompt Template

This model uses the ChatML format. When tokenizing a manually built prompt, pass add_special_tokens=False so the tokenizer does not add special tokens on top of the ones already written into the prompt.

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
Text to split:
{arabic_text}<|im_end|>
<|im_start|>assistant
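The template can be filled in with a small helper. This sketch follows the string layout used in the Quick Start code; the function name `build_prompt` is illustrative.

```python
def build_prompt(system_prompt: str, arabic_text: str) -> str:
    """Assemble a ChatML prompt that ends at the open assistant turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\nText to split:\n{arabic_text}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


print(build_prompt("You are a segmentation assistant.", "مرحبا بالعالم."))
```

The prompt deliberately ends after the assistant header so that generation continues from the start of the model's reply.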

Limitations

Optimised for Modern Standard Arabic (MSA); performance on dialects may vary.
Best results on texts up to ~400 tokens. Very long documents should be chunked before inference.
Output is a JSON string, so downstream parsing is required.
Not suitable for tasks other than segmentation (no Q&A, summarisation, etc.).
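For documents longer than the recommended ~400 tokens, one option is to pre-split the text into windows at sentence or line boundaries and run the model on each window. The sketch below is an assumption-laden illustration: `split_into_windows` is a hypothetical helper, and the default token counter simply counts whitespace-separated words, which underestimates real tokenizer counts for Arabic. Passing a counter based on the model's tokenizer would give a tighter bound.

```python
import re
from typing import Callable


def split_into_windows(
    text: str,
    max_tokens: int = 400,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Greedily pack sentence-like pieces into windows of at most max_tokens."""
    # Split after ., !, ? or the Arabic question mark, and at line breaks.
    pieces = [p.strip() for p in re.split(r"(?<=[.!?؟])\s+|\n+", text) if p.strip()]
    windows: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this piece would exceed the budget: close the window.
            windows.append(current)
            current = piece
        else:
            current = candidate
    if current:
        windows.append(current)
    # Note: a single piece longer than max_tokens is kept whole as its own window.
    return windows
```

Each returned window can then be passed to the `segment` function from the Quick Start, and the resulting sentence lists concatenated in order.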

License

This adapter inherits the Apache 2.0 license from the base Qwen3.5-2B model.

Citation

If you use this model, please cite the base model:

@misc{qwen3technicalreport,
  title  = {Qwen3.5 Fine-Tuned for Semantic Chunking},
  author = {Omar Abdelmoniem and Mariam Emad},
  year   = {2025},
  url    = {https://huggingface.co/Qwen/Qwen3.5-2B}
}
