microsoft/phi-4-multimodal-instruct 📝🖼️🔢 → 📝
About
Phi-4-multimodal-instruct is a lightweight open multimodal foundation model that leverages the language, vision, and speech research and datasets used for Phi-3.5 and 4.0 models.
Example Output
Output
A stop sign in front of a building with Chinese writing on it.
Performance Metrics
- Prediction Time: 1.30s
- Total Time: 109.66s
All Input Parameters
{
"text": "What is shown in this image?",
"images": [
"https://replicate.delivery/pbxt/Ma2T6ufUKuLsiP3VzC9r1owPT0ObFQ4LTgG6f7LJ6Shg0Aez/australia.jpg"
],
"max_tokens": 1000,
"temperature": 0.7,
"system_prompt": "You are a helpful assistant."
}
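The parameters above could be sent to the model with the Replicate Python client, roughly as sketched below. This is a minimal sketch, not the page's official snippet: it assumes the `replicate` package is installed, that a `REPLICATE_API_TOKEN` environment variable is set, and that the model is addressable as `microsoft/phi-4-multimodal-instruct` pinned to the version ID listed under Version Details. The names `MODEL_REF`, `EXAMPLE_INPUT`, and `describe_image` are hypothetical.

```python
# Assumption: owner/name comes from the page title, version from "Version Details".
MODEL_REF = (
    "microsoft/phi-4-multimodal-instruct:"
    "40c8f5c03ce250441855e776528bafd11cdb302c6677613acc0942c58dbd0afa"
)

# The same payload shown under "All Input Parameters".
EXAMPLE_INPUT = {
    "text": "What is shown in this image?",
    "images": [
        "https://replicate.delivery/pbxt/Ma2T6ufUKuLsiP3VzC9r1owPT0ObFQ4LTgG6f7LJ6Shg0Aez/australia.jpg"
    ],
    "max_tokens": 1000,
    "temperature": 0.7,
    "system_prompt": "You are a helpful assistant.",
}


def describe_image(model_input=EXAMPLE_INPUT):
    """Run the model once and return whatever the client returns."""
    # Imported lazily so this module loads even without the client installed.
    import replicate

    return replicate.run(MODEL_REF, input=model_input)
```

Calling `describe_image()` makes a network request and is billed like any other prediction. The return type depends on the model's output schema, so treat it as opaque until you have inspected one result.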
Input Parameters
- task: Available tasks: transcribe, summarize, describe
- text: Input text
- audio: Audio files
- images: Image URLs
- max_tokens: Max tokens
- temperature: Temperature
- system_prompt: System prompt
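A small helper can assemble a payload from these parameters and reject an unknown `task` before a request is sent. This is a sketch under assumptions: the page does not state types, defaults, or which fields are required, so the helper treats every field as optional, assumes `audio` and `images` are lists, and simply omits anything left unset. `ALLOWED_TASKS` and `build_input` are hypothetical names.

```python
# Task names as listed on the page.
ALLOWED_TASKS = {"transcribe", "summarize", "describe"}


def build_input(text=None, task=None, audio=None, images=None,
                max_tokens=None, temperature=None, system_prompt=None):
    """Return a payload dict containing only the fields that were set."""
    if task is not None and task not in ALLOWED_TASKS:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(ALLOWED_TASKS)}"
        )
    candidate = {
        "task": task,
        "text": text,
        "audio": audio,      # assumed: list of audio file URLs or handles
        "images": images,    # assumed: list of image URLs
        "max_tokens": max_tokens,
        "temperature": temperature,
        "system_prompt": system_prompt,
    }
    # Drop unset fields so the server-side defaults apply.
    return {key: value for key, value in candidate.items() if value is not None}
```

For example, `build_input(text="What is shown in this image?", images=["https://example.com/a.jpg"])` yields a dict with just the `text` and `images` keys.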
Output Schema
Output
Example Execution Logs
/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:628: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:628: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
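The warning in these logs says that `temperature` is only used when sampling is enabled, so with `do_sample=False` the value `0.7` has no effect on generation. One way to keep the two settings consistent is to derive `do_sample` from the temperature. The sketch below is not the deployed model's code: `sampling_kwargs` is a hypothetical helper, and mapping the page's `max_tokens` onto the `max_new_tokens` argument of Transformers' `generate` is an assumption.

```python
def sampling_kwargs(temperature, max_new_tokens):
    """Build generate() keyword arguments with consistent sampling flags."""
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature is not None and temperature > 0:
        # Sampling on: temperature is meaningful and is passed through.
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    else:
        # Greedy decoding: leave temperature unset so no warning is raised.
        kwargs["do_sample"] = False
    return kwargs
```

The resulting dict could be unpacked into a call such as `model.generate(**inputs, **sampling_kwargs(0.7, 1000))`, where `model` and `inputs` are whatever the caller has already prepared.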
Version Details
- Version ID: 40c8f5c03ce250441855e776528bafd11cdb302c6677613acc0942c58dbd0afa
- Version Created: February 28, 2025