moonshotai/kimi-k2.5 (image, number, and text inputs; text output)

Official model · 17.3K runs · Jan 2026 · Cog 0.16.9
Tags: image-analysis, image-to-text, text-generation

About

Moonshot AI's latest open model. It unifies vision and text, thinking and non-thinking modes, and single-agent and multi-agent execution in a single model.

Example Output

Prompt:

"What is happening in this image?"

Output

This is a promotional graphic for Kimi K2.5, an AI model developed by Moonshot AI (月之暗面). The image is showcasing the three key capabilities of this version:

  1. Vision (top left): Indicates multimodal capabilities— the ability to understand and process images, with the box showing binary/matrix-like data representing visual processing.

  2. Coding (top right): Shows programming capabilities, with the box containing HTML/code snippets suggesting the model can write, understand, and debug code.

  3. Agentic (bottom): Represents autonomous agent capabilities— the AI's ability to perform tasks independently, plan, and execute actions on behalf of users.

The central text "Kimi K2.5" is overlaid on a motion-blurred image of what appears to be a kiwi fruit (likely a visual pun on "Kimi" or symbolizing speed/agility). The overall dark, sleek aesthetic with motion blur effects suggests high performance, speed, and cutting-edge technology.

This appears to be marketing material announcing or highlighting the features of the K2.5 model update, positioning it as a multimodal AI assistant capable of vision tasks, coding, and autonomous agentic behavior.

Performance Metrics

Prediction time: 4.30 s
Total time: 4.31 s
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/OUAMSshKIobhnzYKMaAzmzUPJIJHd6IWZ9ROwtT1CnMyQtIS/tmpp0cja6hm.png",
  "top_p": 1,
  "prompt": "What is happening in this image?",
  "max_tokens": 1024,
  "temperature": 0.1,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
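The parameters above form the JSON `input` object of a prediction. As a minimal sketch, assuming Replicate's HTTP endpoint for official models (`POST https://api.replicate.com/v1/models/{owner}/{name}/predictions` with a bearer token), the same request could be assembled with the Python standard library. `REPLICATE_API_TOKEN` is a placeholder and `build_prediction_request` is a hypothetical helper; the request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint for official models on Replicate's HTTP API.
ENDPOINT = "https://api.replicate.com/v1/models/moonshotai/kimi-k2.5/predictions"


def build_prediction_request(inputs: dict, token: str) -> urllib.request.Request:
    """Wrap the model inputs in a POST request (hypothetical helper; nothing is sent)."""
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same values as the example run above; the token is a placeholder.
example_inputs = {
    "image": "https://replicate.delivery/pbxt/OUAMSshKIobhnzYKMaAzmzUPJIJHd6IWZ9ROwtT1CnMyQtIS/tmpp0cja6hm.png",
    "top_p": 1,
    "prompt": "What is happening in this image?",
    "max_tokens": 1024,
    "temperature": 0.1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
}
request = build_prediction_request(example_inputs, token="REPLICATE_API_TOKEN")
# Submitting would require a real token: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes the payload easy to inspect or log before spending a run.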
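Input Parameters

image
Type: string (optional)
Image file to analyze. Images larger than 1024 px are resized before processing.

top_p
Type: number. Default: 1. Range: 0 to 1.
Top-p (nucleus) sampling: tokens are sampled only from the smallest set whose cumulative probability reaches top_p.

prompt
Type: string. Default: empty.
Text prompt.

max_tokens
Type: integer. Default: 1024. Range: 1 to 131072.
Maximum number of tokens to generate.

temperature
Type: number. Default: 0.1. Range: 0 to 2.
Sampling temperature. Lower values make the output more deterministic.

presence_penalty
Type: number. Default: 0. Range: -2 to 2.
Presence penalty. Positive values penalize tokens that have already appeared, encouraging new content.

frequency_penalty
Type: number. Default: 0. Range: -2 to 2.
Frequency penalty. Positive values penalize tokens in proportion to how often they have already appeared, reducing repetition.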
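The parameter list above gives each input's type, default, and range. A client can catch out-of-range values before submitting a run; the sketch below is a hypothetical client-side check (not part of the model or any Replicate library) that merges user values over the documented defaults and enforces the documented ranges. It assumes the empty `prompt` default means an empty string.

```python
# Hypothetical client-side validation mirroring the documented parameters.
# name -> (expected kind, minimum, maximum); values copied from the list above.
LIMITS = {
    "top_p": ("number", 0, 1),
    "max_tokens": ("integer", 1, 131072),
    "temperature": ("number", 0, 2),
    "presence_penalty": ("number", -2, 2),
    "frequency_penalty": ("number", -2, 2),
}

# Documented defaults; "image" is optional and has no default.
DEFAULTS = {
    "prompt": "",
    "top_p": 1,
    "max_tokens": 1024,
    "temperature": 0.1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
}


def validate_input(user_input: dict) -> dict:
    """Merge user values over the defaults and check types and ranges."""
    merged = {**DEFAULTS, **user_input}
    for name, (kind, low, high) in LIMITS.items():
        value = merged[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool):
            raise TypeError(f"{name} must be a {kind}, got {value!r}")
        if kind == "integer" and not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {value!r}")
        if kind == "number" and not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number, got {value!r}")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    if not isinstance(merged["prompt"], str):
        raise TypeError("prompt must be a string")
    if "image" in merged and not isinstance(merged["image"], str):
        raise TypeError("image must be a string")
    return merged
```

For example, `validate_input({"prompt": "Describe this chart", "temperature": 0.7})` returns the full input with the remaining defaults filled in, while `temperature=2.5` raises `ValueError`.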
Output Schema

Output

Type: array (items: string)
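Because the output is an array of strings rather than a single string, a client has to combine the items to get the full response. A minimal sketch, assuming the items are consecutive text fragments (as is common for streamed language-model output on Replicate) that join with no separator; `collect_text` is a hypothetical helper:

```python
from typing import Iterable


def collect_text(output: Iterable[str]) -> str:
    """Concatenate output fragments into the full response text."""
    parts = []
    for item in output:
        if not isinstance(item, str):
            raise TypeError(f"expected string items, got {type(item).__name__}")
        parts.append(item)
    # Assumption: fragments already carry their own spacing.
    return "".join(parts)
```

Joining with `""` rather than `" "` avoids inserting spaces inside words when a fragment boundary falls mid-word.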

Example Execution Logs
Prompt: What is happening in this image?
Image provided: /tmp/tmp70865u7_tmpp0cja6hm.png
Resizing image from 2279x1455 to 1024x653
Input token count: 921
Output token count: 596
TTFT: 0.63s
Tokens per second: 146.73
Total time: 4.06s
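The resize line in the log is consistent with scaling the longer side down to 1024 px while preserving the aspect ratio. The sketch below is an assumption about the preprocessing, not the model's published code: `resized_dims` is a hypothetical helper, and it floors the shorter side, which reproduces the logged 2279x1455 to 1024x653 (1455 × 1024 / 2279 ≈ 653.8, so rounding to nearest would have given 654).

```python
def resized_dims(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale so the longer side equals max_side, keeping the aspect ratio.

    Assumptions: only images whose longer side exceeds max_side are resized,
    and the scaled sides are floored rather than rounded.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic avoids floating-point error; // floors the result.
    return width * max_side // longest, height * max_side // longest


print(resized_dims(2279, 1455))  # (1024, 653), matching the log line above
```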
Version Details
Version ID
b4d8427a98a2de294f719d281c5218daebd44895b308ace34792d0746f6670ba
Version Created
January 27, 2026