microsoft/phi-4-multimodal-instruct 📝🖼️🔢 → 📝

▶️ 13.2K runs 📅 Feb 2025 ⚙️ Cog 0.13.7
image-to-text speech-to-text text-generation

About

Phi-4-multimodal-instruct is a lightweight open multimodal foundation model that builds on the language, vision, and speech research and datasets used for the Phi-3.5 and 4.0 models. It accepts text, image, and audio inputs and produces text output.

Example Output

Output

A stop sign in front of a building with Chinese writing on it.

Performance Metrics

1.30s Prediction Time
109.66s Total Time
All Input Parameters
{
  "text": "What is shown in this image?",
  "images": [
    "https://replicate.delivery/pbxt/Ma2T6ufUKuLsiP3VzC9r1owPT0ObFQ4LTgG6f7LJ6Shg0Aez/australia.jpg"
  ],
  "max_tokens": 1000,
  "temperature": 0.7,
  "system_prompt": "You are a helpful assistant."
}
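
The input above can be assembled and submitted from Python. The following is a minimal sketch, not an official client: `build_input` and `run_prediction` are hypothetical helper names, the model reference assumes the owner/name in the page title combined with the version ID listed under Version Details, and the call relies on the `replicate` package being installed with an API token configured.

```python
import json

# Assumption: owner/name taken from the page title, version hash from "Version Details".
MODEL_REF = (
    "microsoft/phi-4-multimodal-instruct:"
    "40c8f5c03ce250441855e776528bafd11cdb302c6677613acc0942c58dbd0afa"
)


def build_input(text, images=None, audio=None, max_tokens=1000,
                temperature=0.7, system_prompt="You are a helpful assistant."):
    """Build an input dict using the defaults listed on this page (hypothetical helper)."""
    payload = {
        "text": text,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "system_prompt": system_prompt,
    }
    # Only include the array inputs when the caller actually supplies them.
    if images:
        payload["images"] = list(images)
    if audio:
        payload["audio"] = list(audio)
    return payload


def run_prediction(payload):
    """Submit the payload to Replicate (needs network access and an API token)."""
    import replicate  # imported lazily so the rest of the sketch runs without it
    return replicate.run(MODEL_REF, input=payload)


example = build_input(
    "What is shown in this image?",
    images=[
        "https://replicate.delivery/pbxt/Ma2T6ufUKuLsiP3VzC9r1owPT0ObFQ4LTgG6f7LJ6Shg0Aez/australia.jpg"
    ],
)
print(json.dumps(example, indent=2))
```

Calling `run_prediction(example)` would send the request. Per the Output Schema the result is a string, though the exact return type may depend on the client version.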
Input Parameters

task
Type: array
Available tasks: transcribe, summarize, describe

text
Type: string
Default: (empty)
Input text

audio
Type: array
Audio files

images
Type: array
Image URLs

max_tokens
Type: integer
Default: 1000
Max tokens

temperature
Type: number
Default: 0.7
Temperature

system_prompt
Type: string
Default: You are a helpful assistant.
System prompt

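
The parameter types listed above can be checked on the client before a request is sent. This is a sketch under assumptions: `validate_input` is a hypothetical helper, and the rules only mirror what this page states (types and the three task names); the model's own validation may differ.

```python
ALLOWED_TASKS = {"transcribe", "summarize", "describe"}


def validate_input(payload):
    """Return a list of problems found in an input dict (empty list means none)."""
    errors = []

    # task, audio, and images are documented as arrays.
    for key in ("task", "audio", "images"):
        value = payload.get(key)
        if value is not None and not isinstance(value, list):
            errors.append(f"{key} must be an array")

    # Each task entry must be one of the documented task names.
    tasks = payload.get("task")
    if isinstance(tasks, list):
        for task in tasks:
            if not isinstance(task, str) or task not in ALLOWED_TASKS:
                errors.append(f"unknown task: {task!r}")

    # text and system_prompt are documented as strings.
    for key in ("text", "system_prompt"):
        if key in payload and not isinstance(payload[key], str):
            errors.append(f"{key} must be a string")

    # max_tokens is an integer; bool is excluded because it subclasses int.
    max_tokens = payload.get("max_tokens")
    if max_tokens is not None and (
        isinstance(max_tokens, bool) or not isinstance(max_tokens, int)
    ):
        errors.append("max_tokens must be an integer")

    # temperature is a number (int or float).
    temperature = payload.get("temperature")
    if temperature is not None and (
        isinstance(temperature, bool) or not isinstance(temperature, (int, float))
    ):
        errors.append("temperature must be a number")

    return errors


good = {"text": "Describe the clip.", "task": ["describe"], "max_tokens": 1000}
bad = {"task": ["translate"], "max_tokens": "1000"}
print(validate_input(good))
print(validate_input(bad))
```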
Output Schema

Output

Type: string

Example Execution Logs
/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:628: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:628: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
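
The warning above says that `temperature` only takes effect when sampling is enabled. The toy sketch below illustrates why. It is not the transformers implementation, and `softmax_with_temperature` and `pick_token` are hypothetical names. With `do_sample` off, the highest-scoring token is always chosen, so temperature is ignored; with it on, temperature rescales the distribution the token is drawn from.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, do_sample, temperature=1.0, rng=None):
    """Pick a token index greedily, or by sampling when do_sample is True."""
    if not do_sample:
        # Greedy decoding: argmax over the logits; temperature is never consulted.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]

# Greedy picks are identical regardless of the temperature passed in.
greedy_low = pick_token(logits, do_sample=False, temperature=0.7)
greedy_high = pick_token(logits, do_sample=False, temperature=1.5)

# With sampling, a lower temperature concentrates probability on the top token.
probs_low = softmax_with_temperature(logits, 0.7)
probs_high = softmax_with_temperature(logits, 1.5)
print(greedy_low, greedy_high, probs_low[0] > probs_high[0])
```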

Version Details

Version ID: 40c8f5c03ce250441855e776528bafd11cdb302c6677613acc0942c58dbd0afa
Version Created: February 28, 2025