lucataco/blip3-phi3-mini-instruct-r-v1 🖼️📝🔢 → 📝

▶️ 403 runs 📅 May 2024 ⚙️ Cog 0.9.6

Tags: image-to-text, ocr, visual-question-answering

Performance

Typical run time: 1.4s

Total runs: 403

About

BLIP3 (XGen-MM) is a series of foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research.

Example Output

Output

There is one dog in the picture.

Performance Metrics

1.40s Prediction Time

1.41s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KtacIzXNav6KQBhoK4XorduzIZpxPWLjnMgayp07TPS0oS6T/blip-demo.jpg",
  "question": "how many dogs are in the picture?",
  "system_prompt": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions",
  "max_new_tokens": 768
}

Input Parameters

image (required)
  Type: string
  Description: Input image

question
  Type: string
  Default: how many dogs are in the picture?
  Description: Question to ask about this image

system_prompt
  Type: string
  Default: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions
  Description: System prompt

max_new_tokens
  Type: integer
  Default: 768
  Range: 512 - 2047
  Description: Maximum number of tokens to generate
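
The parameters listed above can be assembled and checked on the client side before a request is sent. The following is a minimal sketch in Python, not part of the model's own code: the helper name build_input and its error messages are hypothetical, while the field names, defaults, and the 512 - 2047 range are taken from this page.

```python
from typing import Any, Dict

# Defaults and range copied from the Input Parameters listed on this page.
DEFAULT_QUESTION = "how many dogs are in the picture?"
DEFAULT_SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions"
)
MAX_NEW_TOKENS_MIN = 512
MAX_NEW_TOKENS_MAX = 2047


def build_input(
    image: str,
    question: str = DEFAULT_QUESTION,
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
    max_new_tokens: int = 768,
) -> Dict[str, Any]:
    """Hypothetical helper: return an input payload shaped like the example above."""
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int):
        raise TypeError("max_new_tokens must be an integer")
    if not MAX_NEW_TOKENS_MIN <= max_new_tokens <= MAX_NEW_TOKENS_MAX:
        raise ValueError(
            f"max_new_tokens must be between {MAX_NEW_TOKENS_MIN} and {MAX_NEW_TOKENS_MAX}"
        )
    return {
        "image": image,
        "question": question,
        "system_prompt": system_prompt,
        "max_new_tokens": max_new_tokens,
    }
```

Checking the range locally is a design choice for faster feedback; the service is still expected to enforce its own limits.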

Output Schema

Output

Type: string

Example Execution Logs

prediction took:  1.2730693817138672

Version Details

Version ID: 01188b5c7796f5cbd4a301135932d5294fe10bd75fdca9520e7e92fd4b73321d
Version Created: May 10, 2024
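
Pinning the version ID keeps results reproducible if the model is later updated. The sketch below shows one way this might look with the Replicate Python client, assuming the replicate package is installed and a REPLICATE_API_TOKEN environment variable is set. The helper names model_ref and ask are hypothetical; the model name and version ID are the ones shown on this page.

```python
import re

MODEL = "lucataco/blip3-phi3-mini-instruct-r-v1"
VERSION = "01188b5c7796f5cbd4a301135932d5294fe10bd75fdca9520e7e92fd4b73321d"


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Hypothetical helper: build an 'owner/name:version' reference string."""
    # Assumption: version IDs are 64 lowercase hex characters, as in the ID above.
    if not re.fullmatch(r"[0-9a-f]{64}", version):
        raise ValueError("version must be a 64-character lowercase hex string")
    return f"{model}:{version}"


def ask(image_url: str, question: str):
    """Hypothetical helper: run one prediction against the pinned version."""
    # Imported lazily so the rest of this sketch works without the package.
    import replicate  # third-party: pip install replicate

    # The Output Schema on this page describes the result as a string; the
    # exact Python type returned may depend on the client version.
    return replicate.run(
        model_ref(),
        input={"image": image_url, "question": question},
    )
```

A call such as ask(image_url, "how many dogs are in the picture?") would send a request over the network, so it is left out of the sketch itself.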
