Instructions to use moonshotai/Kimi-K2.5 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use moonshotai/Kimi-K2.5 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="moonshotai/Kimi-K2.5", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("moonshotai/Kimi-K2.5", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use moonshotai/Kimi-K2.5 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "moonshotai/Kimi-K2.5"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/moonshotai/Kimi-K2.5

SGLang

How to use moonshotai/Kimi-K2.5 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "moonshotai/Kimi-K2.5" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "moonshotai/Kimi-K2.5" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use moonshotai/Kimi-K2.5 with Docker Model Runner:
```
docker model run hf.co/moonshotai/Kimi-K2.5
```

The content of response from Kimi-K2.5 is empty when deploying with vllm-0.15.0.

#40

by tuo02 - opened Feb 3

Discussion

tuo02

Feb 3

I'm running Kimi-K2.5 with vllm-0.15.0, and I find that when "max_tokens" in a request is less than 330, the content of the response is empty, which is strange.

This is my command to start the engine:
```
vllm serve moonshotai/Kimi-K2.5 \
    -tp 8 \
    --mm-encoder-tp-mode data \
    --trust-remote-code \
    --tool-call-parser kimi_k2 \
    --reasoning-parser kimi_k2 \
    --max-parallel-loading-workers 8 \
    --disable-log-requests
```

And this is my test request:
```
curl localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {
                "role": "system",
                "content": "You are Kimi, an AI assistant created by Moonshot AI."
            },
            {
                "role": "user",
                "content": "which one is bigger, 9.11 or 9.9? think carefully."
            }
        ],
        "temperature": 0.7,
        "max_tokens": 329
    }'
```

The response is:
```
{"id":"chatcmpl-93a3c11a1f87fb5c","object":"chat.completion","created":1770098358,"model":"moonshotai/Kimi-K2.5","choices":[{"index":0,"message":{"role":"assistant","content":null,"refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":" The user is asking which number is bigger: 9.11 or 9.9. This is a classic \"trick\" question that plays on how people sometimes mistakenly compare decimal numbers as if they were whole numbers or version numbers.\n\nLet me think about this carefully:\n\n1. If we treat these as decimal numbers:\n - 9.11 = 9 + 11/100 = 9.11\n - 9.9 = 9 + 9/10 = 9.90\n \n Comparing 9.11 and 9.90:\n - The integer parts are both 9.\n - The tenths place: 1 vs 9.\n - Since 9 > 1, we have 9.90 > 9.11.\n \n So 9.9 is bigger.\n\n2. Common mistake: People might think 9.11 is bigger because:\n - They compare 11 and 9 as whole numbers (11 > 9)\n - They think of these as version numbers (where 9.11 might come after 9.9)\n - They don't align the decimal places properly\n\n3. Correct alignment:\n 9.11\n 9.90 (or 9.9)\n \n Comparing digit by digit from left to right:\n - 9 = 9\n - 1 < 9\n \n Therefore 9.9 > 9.11.\n\nSo the answer is clearly 9.9.\n\nI should explain this clearly, showing the alignment of decimal places and explaining why the common","reasoning_content":" The user is asking which number is bigger: 9.11 or 9.9. This is a classic \"trick\" question that plays on how people sometimes mistakenly compare decimal numbers as if they were whole numbers or version numbers.\n\nLet me think about this carefully:\n\n1. If we treat these as decimal numbers:\n - 9.11 = 9 + 11/100 = 9.11\n - 9.9 = 9 + 9/10 = 9.90\n \n Comparing 9.11 and 9.90:\n - The integer parts are both 9.\n - The tenths place: 1 vs 9.\n - Since 9 > 1, we have 9.90 > 9.11.\n \n So 9.9 is bigger.\n\n2. Common mistake: People might think 9.11 is bigger because:\n - They compare 11 and 9 as whole numbers (11 > 9)\n - They think of these as version numbers (where 9.11 might come after 9.9)\n - They don't align the decimal places properly\n\n3. Correct alignment:\n 9.11\n 9.90 (or 9.9)\n \n Comparing digit by digit from left to right:\n - 9 = 9\n - 1 < 9\n \n Therefore 9.9 > 9.11.\n\nSo the answer is clearly 9.9.\n\nI should explain this clearly, showing the alignment of decimal places and explaining why the common"},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":44,"total_tokens":373,"completion_tokens":329,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
```

courage17340

Moonshot AI org Feb 3

It's because the thinking is not finished yet, and you can check reasoning_content.
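
One way to confirm this from the client side is to look at `finish_reason` together with the reasoning field. The sketch below is only an illustration: the helper name `diagnose_empty_content` and its return labels are made up for this example, and the sample dict is a trimmed copy of the response posted above with the reasoning text shortened.

```python
def diagnose_empty_content(response: dict) -> str:
    """Classify the first choice of an OpenAI-style chat completion dict.

    Hypothetical helper for illustration; the labels it returns are not
    part of any API.
    """
    choice = response["choices"][0]
    message = choice["message"]
    content = message.get("content")
    # The posted response carries the same text under both keys.
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    if content:
        return "ok"
    if reasoning and choice.get("finish_reason") == "length":
        # The token budget ran out while the model was still thinking.
        return "truncated_during_reasoning"
    return "empty"


# Trimmed copy of the posted response (reasoning text shortened).
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "reasoning_content": " The user is asking which number is bigger: 9.11 or 9.9.",
            },
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 44, "total_tokens": 373, "completion_tokens": 329},
}

print(diagnose_empty_content(sample))
```

For the posted response this reports a truncation during reasoning: all 329 completion tokens were spent before any answer text was produced.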

CHNtentes

Feb 3

Why do you set max_tokens=329? Change it to something like 8192 or higher.
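
As a rough sketch of that change, the same test request can be built with a larger budget using only the Python standard library. The function name `build_chat_request` and the default `base_url` are assumptions for this example; the `model` field is added here and was not in the original curl command. Sending the request needs a running server, so that part is left as a comment.

```python
import json
import urllib.request


def build_chat_request(prompt, max_tokens=8192, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat endpoint.

    Hypothetical helper; base_url assumes the server from this thread
    is listening on localhost:8000.
    """
    payload = {
        "model": "moonshotai/Kimi-K2.5",
        "messages": [
            {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
        # Leave room for the reasoning and the final answer.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("which one is bigger, 9.11 or 9.9? think carefully.")

# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```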

shivamashtikar

Feb 4

We are also facing the same issue with Kimi-K2.5, where the model generates the entire response in the reasoning block and the content block remains empty.

shivamashtikar

Feb 4 (edited Feb 4)

Response Body:
[No content extracted from stream - Raw response saved for debugging]
event: message_start
data: {"type": "message_start", "message": {"id": "msg_ddb0a49a-a3b9-4f59-a391-2dfba4a68e66", "type": "message", "role": "assistant", "content": [], "model": "kimi-k2-5", "stop_reason": null, "stop_sequence": null, "usage": {"input_tokens": 0, "output_tokens": 0}}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"input_tokens": 44832, "output_tokens": 639}}

event: message_stop
data: {"type": "message_stop"}
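
A saved raw stream like this one can be checked mechanically for whether any text ever arrived. The sketch below is an assumption-laden illustration: `summarize_sse` is a made-up helper, and the `content_block_delta` event type with `text` and `thinking` delta fields is assumed from the Anthropic-style stream format, since no such event appears in the dump above.

```python
import json


def summarize_sse(raw: str) -> dict:
    """Count streamed characters and read the reported output token count.

    Hypothetical helper. Assumes deltas arrive as `content_block_delta`
    events whose `delta` holds either a `text` or a `thinking` field.
    """
    text_chars = 0
    thinking_chars = 0
    output_tokens = None
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event:" lines, blanks, and log prefixes
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            text_chars += len(delta.get("text", ""))
            thinking_chars += len(delta.get("thinking", ""))
        elif event.get("type") == "message_delta":
            output_tokens = event.get("usage", {}).get("output_tokens", output_tokens)
    return {
        "text_chars": text_chars,
        "thinking_chars": thinking_chars,
        "output_tokens": output_tokens,
    }


# The stream from the post above, copied as-is.
raw = """event: message_start
data: {"type": "message_start", "message": {"id": "msg_ddb0a49a-a3b9-4f59-a391-2dfba4a68e66", "type": "message", "role": "assistant", "content": [], "model": "kimi-k2-5", "stop_reason": null, "stop_sequence": null, "usage": {"input_tokens": 0, "output_tokens": 0}}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"input_tokens": 44832, "output_tokens": 639}}

event: message_stop
data: {"type": "message_stop"}
"""

print(summarize_sse(raw))
```

On this dump the summary shows 639 reported output tokens but zero streamed characters, which matches the description that the generated tokens never reached the content block.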

pratiknarola

Feb 4

Same here @shivamashtikar

I am also facing a similar issue in https://huggingface.co/moonshotai/Kimi-K2.5/discussions/52
