zsxkib/blip-3 🖼️🔢✓📝 → 📝

▶️ 1.3M runs 📅 May 2024 ⚙️ Cog v0.9.5+dev
image-captioning image-to-text visual-question-answering

About

BLIP-3 / XGen-MM answers questions about images ({blip3,xgen-mm}-phi3-mini-base-r-v1).

Example Output

Output

The meme is a humorous representation of the varying levels of effort put into different sections of a handwritten exam. The top section, labeled "First two pages", shows a neatly written page with a clear and legible handwriting. The middle section, labeled "Middle pages", shows a page with messy and illegible handwriting, suggesting that the student may have rushed through these pages. The bottom section, labeled "Last two pages", shows a page with a heartbeat graph, indicating that the student may have been in a state of panic or stress during the exam, leading to a hurried and messy handwriting. The meme is a light-hearted way to comment on the common experience of students who may not put in the same level of effort throughout an exam.

Performance Metrics

8.91s Prediction Time
205.63s Total Time
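The gap between the two figures above can be worked out directly. The short sketch below does only the arithmetic; reading the difference as queueing or model start-up time is an assumption, since the page does not say what the extra time covers.

```python
# Figures copied from the Performance Metrics above.
prediction_time_s = 8.91
total_time_s = 205.63

# Time spent outside the prediction itself (what it consists of is not
# stated on the page; queueing/cold start is only an assumption).
overhead_s = round(total_time_s - prediction_time_s, 2)
overhead_share = overhead_s / total_time_s

print(f"{overhead_s:.2f}s outside prediction ({overhead_share:.1%} of total)")
# prints: 196.72s outside prediction (95.7% of total)
```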
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KtaXKzjetYsIqKFZoJPiX9SP8IVJEiCTIAXn1DZbmLp5iQCJ/image.png",
  "query": "Can you explain this meme?",
  "do_sample": false,
  "num_beams": 1,
  "max_new_tokens": 768
}
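The example input above could be submitted programmatically. What follows is a minimal sketch, assuming the `replicate` Python client is installed and a `REPLICATE_API_TOKEN` is set in the environment. The helper names `model_ref` and `run_example` are hypothetical. The payload is copied from the example as recorded, including its `query` key, even though the parameter list on this page names the field `question`; which key the model actually reads is not confirmed here.

```python
# Model name and version ID as shown on this page.
MODEL = "zsxkib/blip-3"
VERSION = "499bec581d8f64060fd695ec0c34d7595c6824c4118259aa8b0788e0d2d903e1"

# Payload copied verbatim from "All Input Parameters".
EXAMPLE_INPUT = {
    "image": "https://replicate.delivery/pbxt/KtaXKzjetYsIqKFZoJPiX9SP8IVJEiCTIAXn1DZbmLp5iQCJ/image.png",
    "query": "Can you explain this meme?",
    "do_sample": False,
    "num_beams": 1,
    "max_new_tokens": 768,
}


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Build the 'owner/name:version' reference string (hypothetical helper)."""
    return f"{model}:{version}"


def run_example(payload: dict = EXAMPLE_INPUT) -> str:
    """Run one prediction; needs network access and an API token."""
    import replicate  # imported lazily so the module loads without the client

    output = replicate.run(model_ref(), input=payload)
    # The output schema is a string; join defensively if chunks are returned.
    return output if isinstance(output, str) else "".join(output)
```

Calling `run_example()` would start a billed prediction, so the sketch keeps the network call inside a function rather than running it on import.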
Input Parameters
image (required) Type: string
Input image
top_k Type: integer, Default: 50, Range: 1 - ∞
The number of highest probability vocabulary tokens to keep for top-k sampling
top_p Type: number, Default: 1, Range: 0 - 1
The cumulative probability threshold for top-p sampling
caption Type: boolean, Default: false
Select if you want to generate image captions instead of asking questions
context Type: string
Optional - previous questions and answers to be used as context for answering current question
question Type: string, Default: What is shown in the image?
Question to ask about this image
do_sample Type: boolean, Default: false
Whether to use sampling or not
num_beams Type: integer, Default: 1, Range: 1 - 10
Number of beams for beam search
temperature Type: number, Default: 1, Range: 0.5 - 1
Temperature for use with nucleus sampling
system_prompt Type: string, Default: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
System prompt
length_penalty Type: number, Default: 1, Range: 0 - ∞
The parameter for length penalty
max_new_tokens Type: integer, Default: 768, Range: 1 - 2048
Maximum number of new tokens to generate
repetition_penalty Type: number, Default: 1, Range: 0 - ∞
The parameter for repetition penalty
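The ranges and defaults listed above can be checked on the client before a request is sent. The following is a rough sketch of that idea, not part of the model's own code: `RANGES`, `DEFAULTS`, and `build_input` are hypothetical names, the values are transcribed from the list above, and the server may enforce the limits differently.

```python
# Documented (min, max) ranges; None means no upper bound.
RANGES = {
    "top_k": (1, None),
    "top_p": (0, 1),
    "num_beams": (1, 10),
    "temperature": (0.5, 1),
    "length_penalty": (0, None),
    "max_new_tokens": (1, 2048),
    "repetition_penalty": (0, None),
}

# Documented defaults (system_prompt and context are left to the server).
DEFAULTS = {
    "top_k": 50, "top_p": 1, "caption": False,
    "question": "What is shown in the image?",
    "do_sample": False, "num_beams": 1, "temperature": 1,
    "length_penalty": 1, "max_new_tokens": 768, "repetition_penalty": 1,
}


def build_input(image: str, **overrides) -> dict:
    """Merge overrides into the defaults and enforce the documented ranges."""
    unknown = set(overrides) - set(DEFAULTS) - {"context", "system_prompt"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"image": image, **DEFAULTS, **overrides}
    for name, (low, high) in RANGES.items():
        value = payload[name]
        if value < low or (high is not None and value > high):
            raise ValueError(f"{name}={value} is outside the documented range")
    return payload
```

For example, `build_input("https://example.com/cat.png", num_beams=3)` returns a payload with `num_beams` set to 3 and every other field at its default, while `temperature=0.2` raises `ValueError` because the documented minimum is 0.5. The image URL here is a placeholder.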
Output Schema

Output

Type: string

Example Execution Logs
<class 'transformers_modules.Salesforce.blip3-phi3-mini-instruct-r-v1.0e928237baf450032f063e295be64238fc9d6fd5.modeling_blip_3.Blip3ModelForConditionalGeneration'>
You are not running the flash-attention implementation, expect numerical differences.
Version Details
Version ID
499bec581d8f64060fd695ec0c34d7595c6824c4118259aa8b0788e0d2d903e1
Version Created
May 13, 2024