lucataco/magma-8b 🖼️📝✓🔢 → 📝
About
Microsoft Magma: A Foundation Model for Multimodal AI Agents

Example Output
Prompt:
"The figure represents a 3x3 grid containing various animals where each one by one square is considered a block and each block contains an animal from bird, tiger, parrot, mouse. What is the animal of the block located at the first row third column of the grid?"
Output:
The animal in the first row, third column of the grid is a parrot.
Performance Metrics
- Prediction Time: 0.83s
- Total Time: 0.84s
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/McWFm9sGqiPzDUxsLW5T9NuReWvYe8Z343emGIbiFIgEdfyr/replicate-prediction-3sp1e3c2v1rme0cneb5b8k5h6c.jpg",
  "prompt": "The figure represents a 3x3 grid containing various animals where each one by one square is considered a block and each block contains an animal from bird, tiger, parrot, mouse. What is the animal of the block located at the first row third column of the grid?",
  "do_sample": false,
  "num_beams": 1,
  "temperature": 0,
  "system_prompt": "You are agent that can see, talk and act.",
  "max_new_tokens": 128
}
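The parameters above could be sent from Python roughly as follows. This is a minimal sketch, not an official snippet: it assumes the `replicate` Python client is installed and that `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_magma` are hypothetical, and the model reference is assembled from the model name and the version ID listed on this page.

```python
import json

# Assumption: "owner/name:version" built from the page title and the listed version ID.
MODEL_REF = (
    "lucataco/magma-8b:"
    "11a0822d2a06ac641c3d2baa92b885d1309c9e6fa50472295e55eaf4e3a4b3d3"
)


def build_input(image_url, prompt, max_new_tokens=128):
    """Hypothetical helper: return an input dict mirroring the example parameters."""
    return {
        "image": image_url,
        "prompt": prompt,
        "do_sample": False,
        "num_beams": 1,
        "temperature": 0,
        # System prompt copied verbatim from the example input.
        "system_prompt": "You are agent that can see, talk and act.",
        "max_new_tokens": max_new_tokens,
    }


def run_magma(payload):
    """Hypothetical helper: send the payload to Replicate and return the model output."""
    # Imported lazily so build_input works even without the client installed.
    import replicate

    return replicate.run(MODEL_REF, input=payload)


if __name__ == "__main__":
    payload = build_input(
        "https://example.com/grid.jpg",  # placeholder image URL, not from the page
        "What is the animal in the first row, third column of the grid?",
    )
    print(json.dumps(payload, indent=2))
    # Uncomment to make a real, billable API call:
    # print(run_magma(payload))
```

Pinning the version ID keeps results reproducible if the model is later updated. The shape of the returned output (a plain string or a sequence of chunks) is not stated on this page, so it is left unprocessed here.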
Input Parameters
- image (required): Input image
- prompt: Text prompt to guide the model's response
- do_sample: Whether to use sampling or greedy decoding
- num_beams: Number of beams for beam search
- temperature: Sampling temperature
- system_prompt: System prompt to set the context
- max_new_tokens: Maximum number of tokens to generate
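To illustrate how do_sample and temperature typically interact, here is a toy sketch of choosing one next token from a list of scores. It is not Magma's decoding code. The function name `pick_next_token` is hypothetical, and treating a non-positive temperature as greedy decoding is an assumption made for this example only.

```python
import math
import random


def pick_next_token(logits, do_sample=False, temperature=1.0, rng=None):
    """Return an index into `logits`: argmax when greedy, a weighted draw when sampling."""
    if not do_sample or temperature <= 0:
        # Greedy decoding: always take the highest-scoring token.
        # Assumption: temperature <= 0 falls back to greedy in this sketch.
        return max(range(len(logits)), key=lambda i: logits[i])

    rng = rng or random.Random()
    # Lower temperature sharpens the distribution, higher temperature flattens it.
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    scores = [0.1, 2.0, 0.5]
    print(pick_next_token(scores))  # greedy pick
    print(pick_next_token(scores, do_sample=True, temperature=0.7))  # sampled pick
```

With do_sample set to false, as in the example input, the choice is deterministic and the temperature value has no effect in this sketch.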
Output Schema
Output
Version Details
- Version ID: 11a0822d2a06ac641c3d2baa92b885d1309c9e6fa50472295e55eaf4e3a4b3d3
- Version Created: March 7, 2025