The content of response from Kimi-K2.5 is empty when deploying with vllm-0.15.0.
I'm running Kimi-K2.5 with vllm-0.15.0, and I find that when "max_tokens" in a request is less than 330, the content of the response is empty, which is strange.
This is my command to start the engine:
'''
vllm serve moonshotai/Kimi-K2.5 \
  -tp 8 \
  --mm-encoder-tp-mode data \
  --trust-remote-code \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --max-parallel-loading-workers 8 \
  --disable-log-requests
'''
And this is my test request:
'''
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are Kimi, an AI assistant created by Moonshot AI."
      },
      {
        "role": "user",
        "content": "which one is bigger, 9.11 or 9.9? think carefully."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 329
  }'
'''
And this is the response:
'''
{"id":"chatcmpl-93a3c11a1f87fb5c","object":"chat.completion","created":1770098358,"model":"moonshotai/Kimi-K2.5","choices":[{"index":0,"message":{"role":"assistant","content":null,"refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":" The user is asking which number is bigger: 9.11 or 9.9. This is a classic \"trick\" question that plays on how people sometimes mistakenly compare decimal numbers as if they were whole numbers or version numbers.\n\nLet me think about this carefully:\n\n1. If we treat these as decimal numbers:\n - 9.11 = 9 + 11/100 = 9.11\n - 9.9 = 9 + 9/10 = 9.90\n \n Comparing 9.11 and 9.90:\n - The integer parts are both 9.\n - The tenths place: 1 vs 9.\n - Since 9 > 1, we have 9.90 > 9.11.\n \n So 9.9 is bigger.\n\n2. Common mistake: People might think 9.11 is bigger because:\n - They compare 11 and 9 as whole numbers (11 > 9)\n - They think of these as version numbers (where 9.11 might come after 9.9)\n - They don't align the decimal places properly\n\n3. Correct alignment:\n 9.11\n 9.90 (or 9.9)\n \n Comparing digit by digit from left to right:\n - 9 = 9\n - 1 < 9\n \n Therefore 9.9 > 9.11.\n\nSo the answer is clearly 9.9.\n\nI should explain this clearly, showing the alignment of decimal places and explaining why the common","reasoning_content":" The user is asking which number is bigger: 9.11 or 9.9. This is a classic \"trick\" question that plays on how people sometimes mistakenly compare decimal numbers as if they were whole numbers or version numbers.\n\nLet me think about this carefully:\n\n1. If we treat these as decimal numbers:\n - 9.11 = 9 + 11/100 = 9.11\n - 9.9 = 9 + 9/10 = 9.90\n \n Comparing 9.11 and 9.90:\n - The integer parts are both 9.\n - The tenths place: 1 vs 9.\n - Since 9 > 1, we have 9.90 > 9.11.\n \n So 9.9 is bigger.\n\n2. Common mistake: People might think 9.11 is bigger because:\n - They compare 11 and 9 as whole numbers (11 > 9)\n - They think of these as version numbers (where 9.11 might come after 9.9)\n - They don't align the decimal places properly\n\n3. Correct alignment:\n 9.11\n 9.90 (or 9.9)\n \n Comparing digit by digit from left to right:\n - 9 = 9\n - 1 < 9\n \n Therefore 9.9 > 9.11.\n\nSo the answer is clearly 9.9.\n\nI should explain this clearly, showing the alignment of decimal places and explaining why the common"},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":44,"total_tokens":373,"completion_tokens":329,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
'''
It's because the thinking is not finished yet, and you can check reasoning_content.
Why do you set max_tokens=329? Change it to something like 8192 or higher.
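For anyone who wants to check this programmatically, here is a minimal sketch that classifies a chat completion response as a plain Python dict. The field names (`content`, `reasoning_content`, `reasoning`, `finish_reason`) are taken from the response posted above; the function name `diagnose_empty_content` and its return labels are made up for illustration, and the sample dict is a shortened copy of that response, not a new measurement.

```python
def diagnose_empty_content(response: dict) -> str:
    """Classify why the first choice of a chat completion has no content.

    The return labels are illustrative and not part of any API.
    """
    choice = response["choices"][0]
    message = choice["message"]
    content = message.get("content")
    # The posted response carries the same text under both keys.
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    finish_reason = choice.get("finish_reason")

    if content:
        return "ok"
    if reasoning and finish_reason == "length":
        # The token budget ran out while the model was still thinking.
        return "truncated_during_reasoning"
    if reasoning:
        return "reasoning_only"
    return "empty"


# Shortened copy of the response from the original post.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": None,
                "reasoning_content": " The user is asking which number is bigger: 9.11 or 9.9.",
            },
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 44, "total_tokens": 373, "completion_tokens": 329},
}

print(diagnose_empty_content(sample))
```

If this reports `truncated_during_reasoning`, raising `max_tokens` is the first thing to try, since all 329 completion tokens were spent on reasoning.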
We are also facing the same issue with Kimi-K2.5: the model generates the entire response in the reasoning block, and the content block remains empty.
Response Body:
[No content extracted from stream - Raw response saved for debugging]
event: message_start
data: {"type": "message_start", "message": {"id": "msg_ddb0a49a-a3b9-4f59-a391-2dfba4a68e66", "type": "message", "role": "assistant", "content": [], "model": "kimi-k2-5", "stop_reason": null, "stop_sequence": null, "usage": {"input_tokens": 0, "output_tokens": 0}}}
event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}
event: content_block_stop
data: {"type": "content_block_stop", "index": 0}
event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"input_tokens": 44832, "output_tokens": 639}}
event: message_stop
data: {"type": "message_stop"}
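To make a dump like this easier to read, a small sketch that walks the `data:` lines and reports whether any text actually arrived. The event names `content_block_start`, `content_block_stop`, `message_delta` and `message_stop` come from the stream above. The `content_block_delta` event with `text_delta` / `thinking_delta` payloads is an assumption about how text would be forwarded if there were any, because the posted stream contains no delta events at all. `summarize_stream` is a hypothetical helper name, and the `message_start` event is left out of the sample for brevity.

```python
import json


def summarize_stream(raw: str) -> dict:
    """Collect text, thinking, stop reason and token usage from an SSE dump."""
    text_parts = []
    thinking_parts = []
    stop_reason = None
    output_tokens = None

    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank lines
        event = json.loads(line[len("data:"):].strip())
        kind = event.get("type")
        if kind == "content_block_delta":
            # Assumed delta shapes; none appear in the posted stream.
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text_parts.append(delta.get("text", ""))
            elif delta.get("type") == "thinking_delta":
                thinking_parts.append(delta.get("thinking", ""))
        elif kind == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason", stop_reason)
            output_tokens = event.get("usage", {}).get("output_tokens", output_tokens)

    return {
        "text": "".join(text_parts),
        "thinking": "".join(thinking_parts),
        "stop_reason": stop_reason,
        "output_tokens": output_tokens,
    }


# Events copied from the stream above (message_start omitted).
RAW_STREAM = """
event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}
event: content_block_stop
data: {"type": "content_block_stop", "index": 0}
event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"input_tokens": 44832, "output_tokens": 639}}
event: message_stop
data: {"type": "message_stop"}
"""

summary = summarize_stream(RAW_STREAM)
print(summary)
```

On this dump the summary shows 639 output tokens and a stop reason of `end_turn` but no text, so the tokens were generated and then not surfaced as a text block. That is a different situation from the `finish_reason: "length"` truncation earlier in the thread.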
Same here @shivamashtikar
I am also facing a similar issue, reported in https://huggingface.co/moonshotai/Kimi-K2.5/discussions/52