andreasjansson/blip-2 🖼️✓📝🔢 → 📝

▶️ 32.0M runs 📅 Feb 2023 ⚙️ Cog 0.8.3 🔗 GitHub 📄 Paper

image-captioning image-to-text visual-question-answering visual-understanding vqa

Performance

0.9s Typical run time

32.0M Total runs

About

Answers questions about images

Example Output

Output

san francisco bay

Performance Metrics

0.95s Prediction Time

1.01s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/IJEPmgAlL2zNBNDoRRKFegTEcxnlRhoQxlNjPHSZEy0pSIKn/gg_bridge.jpeg",
  "caption": false,
  "question": "what body of water does this bridge cross?",
  "temperature": 1
}
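The same input can be sent from a script. The following is a minimal sketch, assuming the third-party `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set; the helper name `ask_blip2` is hypothetical, and the version hash is the one listed under Version Details on this page.

```python
import os

# Model reference in "owner/name:version" form, using the version ID from this page.
MODEL_REF = (
    "andreasjansson/blip-2:"
    "f677695e5e89f8b236e52ecd1d3f01beb44c34606419bcc19345e046d8f786f9"
)

# The example input shown above, as a Python dict.
EXAMPLE_INPUT = {
    "image": "https://replicate.delivery/pbxt/IJEPmgAlL2zNBNDoRRKFegTEcxnlRhoQxlNjPHSZEy0pSIKn/gg_bridge.jpeg",
    "caption": False,
    "question": "what body of water does this bridge cross?",
    "temperature": 1,
}


def ask_blip2(model_input):
    """Hypothetical helper: run one prediction and return the model output."""
    # Imported lazily so the module loads even without the client installed.
    import replicate  # third-party: pip install replicate

    return replicate.run(MODEL_REF, input=model_input)


# Only call the API when a token is available (assumption: token is read from the environment).
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    print(ask_blip2(EXAMPLE_INPUT))
```

According to the output schema on this page, the result is a single string.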

Input Parameters

image (required) (Type: string): Input image to query or caption
caption (Type: boolean, Default: false): Select if you want to generate image captions instead of asking questions
context (Type: string): Optional - previous questions and answers to be used as context for answering current question
question (Type: string, Default: What is this a picture of?): Question to ask about this image. Leave blank for captioning
temperature (Type: number, Default: 1, Range: 0.5 - 1): Temperature for use with nucleus sampling
use_nucleus_sampling (Type: boolean, Default: false): Toggles the model using nucleus sampling to generate responses
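The parameter list above can be checked on the client side before a request is sent. This is a sketch only: `build_input` is a hypothetical helper, not part of the model or of any client library, and it simply mirrors the defaults and the temperature range documented here.

```python
def build_input(
    image,
    question="What is this a picture of?",
    caption=False,
    context=None,
    temperature=1.0,
    use_nucleus_sampling=False,
):
    """Hypothetical helper: assemble an input dict using the documented defaults."""
    if not image:
        raise ValueError("image is required")
    # Documented range for temperature is 0.5 - 1.
    if not 0.5 <= temperature <= 1:
        raise ValueError("temperature must be between 0.5 and 1")

    payload = {
        "image": image,
        "caption": caption,
        "question": question,
        "temperature": temperature,
        "use_nucleus_sampling": use_nucleus_sampling,
    }
    # context is optional, so it is only included when provided.
    if context:
        payload["context"] = context
    return payload


# Usage: ask a question about an image (the URL here is a placeholder).
example = build_input(
    "https://example.com/bridge.jpeg",
    question="what body of water does this bridge cross?",
)
```

Rejecting an out-of-range temperature locally avoids a round trip that the service would presumably refuse anyway.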

Output Schema

Output

Type: string

Example Execution Logs

input for question answering: Question: what body of water does this bridge cross? Answer:
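The log line above suggests the question is wrapped in a "Question: ... Answer:" template before it reaches the model. The sketch below reproduces that template; how the optional context is combined with it is not shown on this page, so prepending it with a single space is an assumption, and `build_prompt` is a hypothetical name.

```python
def build_prompt(question, context=None):
    """Hypothetical helper: format a question the way the execution log shows."""
    prompt = f"Question: {question} Answer:"
    if context:
        # Assumption: earlier Q&A turns are prepended to the current prompt.
        prompt = f"{context} {prompt}"
    return prompt


# Matches the text after "input for question answering:" in the log above.
print(build_prompt("what body of water does this bridge cross?"))
```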

Version Details

Version ID: f677695e5e89f8b236e52ecd1d3f01beb44c34606419bcc19345e046d8f786f9
Version Created: November 20, 2023
