lucataco/magma-8b 🖼️📝✓🔢 → 📝

▶️ 28 runs 📅 Mar 2025 ⚙️ Cog 0.13.7 📄 Paper ⚖️ License
image-captioning image-to-text visual-understanding

About

Microsoft Magma: A Foundation Model for Multimodal AI Agents

Example Output

Prompt:

"The figure represents a 3x3 grid containing various animals where each one by one square is considered a block and each block contains an animal from bird, tiger, parrot, mouse. What is the animal of the block located at the first row third column of the grid?"

Output:

The animal in the first row, third column of the grid is a parrot.

Performance Metrics

0.83s Prediction Time
0.84s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/McWFm9sGqiPzDUxsLW5T9NuReWvYe8Z343emGIbiFIgEdfyr/replicate-prediction-3sp1e3c2v1rme0cneb5b8k5h6c.jpg",
  "prompt": "The figure represents a 3x3 grid containing various animals where each one by one square is considered a block and each block contains an animal from bird, tiger, parrot, mouse. What is the animal of the block located at the first row third column of the grid?",
  "do_sample": false,
  "num_beams": 1,
  "temperature": 0,
  "system_prompt": "You are agent that can see, talk and act.",
  "max_new_tokens": 128
}
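The payload above can be sent from Python. The following is a minimal sketch, assuming the official `replicate` client package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input`, `model_ref`, and `run_magma` are hypothetical and used only for illustration. The version ID is the one listed under Version Details on this page.

```python
MODEL_NAME = "lucataco/magma-8b"
VERSION_ID = "11a0822d2a06ac641c3d2baa92b885d1309c9e6fa50472295e55eaf4e3a4b3d3"


def build_input(image_url, prompt="What is in this image?"):
    """Return an input payload that uses the defaults listed on this page."""
    return {
        "image": image_url,
        "prompt": prompt,
        "do_sample": False,
        "num_beams": 1,
        "temperature": 0,
        # Default system prompt copied verbatim from the page.
        "system_prompt": "You are agent that can see, talk and act.",
        "max_new_tokens": 128,
    }


def model_ref():
    """Return the pinned 'owner/name:version' reference for this model."""
    return f"{MODEL_NAME}:{VERSION_ID}"


def run_magma(image_url, prompt):
    """Call the hosted model; needs network access and an API token."""
    # Imported lazily so the helpers above work without the client installed.
    import replicate

    return replicate.run(model_ref(), input=build_input(image_url, prompt))
```

Calling `run_magma(image_url, prompt)` with the image URL and prompt from the example above should return text, since the Output Schema lists the output type as string. Pinning the version ID keeps results tied to the March 7, 2025 build rather than whatever version is latest.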
Input Parameters
image (required) Type: string
Input image
prompt Type: string, Default: What is in this image?
Text prompt to guide the model's response
do_sample Type: boolean, Default: false
Whether to use sampling or greedy decoding
num_beams Type: integer, Default: 1, Range: 1 - 5
Number of beams for beam search
temperature Type: number, Default: 0, Range: 0 - 2
Sampling temperature
system_prompt Type: string, Default: You are agent that can see, talk and act.
System prompt to set the context
max_new_tokens Type: integer, Default: 128, Range: 1 - 1024
Maximum number of tokens to generate
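The ranges listed above can be checked on the client before a request is sent. This is a small sketch of such a check, not part of the model or of any Replicate library. The names `RANGES` and `validate_input` are hypothetical, and the service is assumed to perform its own validation as well.

```python
# Inclusive numeric ranges copied from the parameter list on this page.
RANGES = {
    "num_beams": (1, 5),
    "temperature": (0, 2),
    "max_new_tokens": (1, 1024),
}


def validate_input(payload):
    """Return a list of problems found in payload; an empty list means OK."""
    problems = []
    # image is the only parameter marked as required.
    if not payload.get("image"):
        problems.append("image is required")
    # Only check parameters that are present; omitted ones use the defaults.
    for name, (low, high) in RANGES.items():
        if name in payload and not (low <= payload[name] <= high):
            problems.append(f"{name}={payload[name]} is outside {low} - {high}")
    return problems
```

For example, `validate_input({"image": "https://example.com/grid.jpg", "num_beams": 9})` reports that `num_beams` is out of range, while a payload that only sets `image` passes because every other parameter has a default.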
Output Schema

Output

Type: string

Version Details
Version ID
11a0822d2a06ac641c3d2baa92b885d1309c9e6fa50472295e55eaf4e3a4b3d3
Version Created
March 7, 2025