lucataco/blip3-phi3-mini-instruct-r-v1 🖼️📝🔢 → 📝

▶️ 398 runs 📅 May 2024 ⚙️ Cog 0.9.6
image-to-text visual-question-answering

About

BLIP3 (XGen-MM) is a series of foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research.

Example Output

Output

There is one dog in the picture.

Performance Metrics

1.40s Prediction Time
1.41s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KtacIzXNav6KQBhoK4XorduzIZpxPWLjnMgayp07TPS0oS6T/blip-demo.jpg",
  "question": "how many dogs are in the picture?",
  "system_prompt": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions",
  "max_new_tokens": 768
}
Input Parameters
image (required) | Type: string
Input image
question | Type: string | Default: how many dogs are in the picture?
Question to ask about this image
system_prompt | Type: string | Default: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions
System prompt
max_new_tokens | Type: integer | Default: 768 | Range: 512 - 2047
Maximum number of tokens to generate
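
The parameters above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not an official client for this model. The helper names build_input and ask are hypothetical. The sketch assumes that the 512 - 2047 range is inclusive, that the replicate Python package is installed, and that REPLICATE_API_TOKEN is set in the environment. The version hash is the one listed under Version Details on this page.

```python
from typing import Any, Dict

# Model reference: owner/name from the page title, hash from "Version Details".
MODEL_REF = (
    "lucataco/blip3-phi3-mini-instruct-r-v1:"
    "01188b5c7796f5cbd4a301135932d5294fe10bd75fdca9520e7e92fd4b73321d"
)

# Defaults copied from the "Input Parameters" section.
DEFAULT_QUESTION = "how many dogs are in the picture?"
DEFAULT_SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions"
)
DEFAULT_MAX_NEW_TOKENS = 768
# Assumption: the documented range "512 - 2047" is inclusive on both ends.
MIN_NEW_TOKENS, MAX_NEW_TOKENS = 512, 2047


def build_input(
    image: str,
    question: str = DEFAULT_QUESTION,
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
    max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
) -> Dict[str, Any]:
    """Hypothetical helper: build the input payload and validate it locally."""
    if not image:
        raise ValueError("image is required")
    if not MIN_NEW_TOKENS <= max_new_tokens <= MAX_NEW_TOKENS:
        raise ValueError(
            f"max_new_tokens must be between {MIN_NEW_TOKENS} and {MAX_NEW_TOKENS}"
        )
    return {
        "image": image,
        "question": question,
        "system_prompt": system_prompt,
        "max_new_tokens": max_new_tokens,
    }


def ask(image: str, **kwargs: Any) -> Any:
    """Hypothetical helper: send the payload with the replicate client.

    The import is local so that build_input stays usable without the package.
    """
    import replicate  # third-party: pip install replicate

    return replicate.run(MODEL_REF, input=build_input(image, **kwargs))
```

Calling ask with the image URL from the example above and the default question would be expected to return a string such as the one shown under Example Output, although the exact wording may vary between runs.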
Output Schema

Output

Type: string

Example Execution Logs
prediction took:  1.2730693817138672
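
A log line like the one above can be turned into a number for monitoring. This is a small sketch with a hypothetical helper name, parse_prediction_seconds. It assumes the value is in seconds, which is consistent with the prediction time listed under Performance Metrics but is not stated in the logs themselves.

```python
import re
from typing import Optional

# Matches lines such as "prediction took:  1.2730693817138672".
_PREDICTION_RE = re.compile(r"prediction took:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_prediction_seconds(logs: str) -> Optional[float]:
    """Return the first reported prediction time, or None if no line matches."""
    match = _PREDICTION_RE.search(logs)
    return float(match.group(1)) if match else None
```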
Version Details
Version ID
01188b5c7796f5cbd4a301135932d5294fe10bd75fdca9520e7e92fd4b73321d
Version Created
May 10, 2024