zsxkib/blip-3 🖼️🔢✓📝 → 📝

▶️ 1.3M runs 📅 May 2024 ⚙️ Cog v0.9.5+dev 🔗 GitHub ⚖️ License

image-captioning image-to-text visual-question-answering

About

BLIP-3 / XGen-MM answers questions about images ({blip3,xgen-mm}-phi3-mini-base-r-v1)

Example Output

The meme is a humorous representation of the varying levels of effort put into different sections of a handwritten exam. The top section, labeled "First two pages", shows a neatly written page with a clear and legible handwriting. The middle section, labeled "Middle pages", shows a page with messy and illegible handwriting, suggesting that the student may have rushed through these pages. The bottom section, labeled "Last two pages", shows a page with a heartbeat graph, indicating that the student may have been in a state of panic or stress during the exam, leading to a hurried and messy handwriting. The meme is a light-hearted way to comment on the common experience of students who may not put in the same level of effort throughout an exam.

Performance Metrics

8.91s Prediction Time

205.63s Total Time
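
The gap between the two figures is time spent outside the prediction itself. The following is a minimal sketch of that arithmetic. It assumes, which this page does not state, that everything other than prediction time is overhead such as queueing and model startup.

```python
# Timings copied from the metrics above, in seconds.
prediction_time = 8.91
total_time = 205.63

# Assumption: whatever is not prediction time counts as overhead
# (for example queueing or startup); the page does not break it down.
overhead = total_time - prediction_time
overhead_share = overhead / total_time

print(f"overhead: {overhead:.2f}s ({overhead_share:.1%} of total time)")
```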

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KtaXKzjetYsIqKFZoJPiX9SP8IVJEiCTIAXn1DZbmLp5iQCJ/image.png",
  "query": "Can you explain this meme?",
  "do_sample": false,
  "num_beams": 1,
  "max_new_tokens": 768
}
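
In the example inputs above, do_sample is false and num_beams is 1. Under the Hugging Face transformers generation conventions, that combination means greedy decoding. The sketch below assumes the model forwards these two parameters to the generation call unchanged, which this page does not confirm. decoding_mode is a hypothetical helper written for illustration only.

```python
def decoding_mode(do_sample: bool, num_beams: int) -> str:
    """Name the decoding strategy implied by do_sample and num_beams.

    Hypothetical helper; mirrors how transformers-style generation
    selects a strategy from these two settings.
    """
    if num_beams < 1:
        raise ValueError("num_beams must be at least 1")
    if do_sample:
        # Sampling draws tokens at random from the (filtered) distribution.
        if num_beams > 1:
            return "beam-search multinomial sampling"
        return "multinomial sampling"
    # Without sampling, the most likely continuation is chosen deterministically.
    if num_beams > 1:
        return "beam search"
    return "greedy decoding"


# Settings from the example inputs above.
print(decoding_mode(do_sample=False, num_beams=1))
```

If that assumption holds, top_k, top_p and temperature only affect the result when do_sample is true, so they play no part in this example.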

Input Parameters

image (required) - Type: string. Input image
top_k - Type: integer, Default: 50, Range: 1 - ∞. The number of highest probability vocabulary tokens to keep for top-k sampling
top_p - Type: number, Default: 1, Range: 0 - 1. The cumulative probability threshold for top-p sampling
caption - Type: boolean, Default: false. Select if you want to generate image captions instead of asking questions
context - Type: string. Optional: previous questions and answers to be used as context for answering the current question
question - Type: string, Default: "What is shown in the image?". Question to ask about this image
do_sample - Type: boolean, Default: false. Whether to use sampling or not
num_beams - Type: integer, Default: 1, Range: 1 - 10. Number of beams for beam search
temperature - Type: number, Default: 1, Range: 0.5 - 1. Temperature for use with nucleus sampling
system_prompt - Type: string, Default: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.". System prompt
length_penalty - Type: number, Default: 1, Range: 0 - ∞. The parameter for length penalty
max_new_tokens - Type: integer, Default: 768, Range: 1 - 2048. Maximum number of new tokens to generate
repetition_penalty - Type: number, Default: 1, Range: 0 - ∞. The parameter for repetition penalty
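
The parameters above can be checked locally before a request is sent. The following is a minimal sketch and not an official client. build_input and ask are hypothetical helpers, and the ranges are copied from the list above. The call itself uses replicate.run from the third-party replicate Python package, which needs a REPLICATE_API_TOKEN environment variable. The version ID is the one listed under Version Details.

```python
import math

# (min, max) ranges copied from the parameter list; math.inf stands in for ∞.
RANGES = {
    "top_k": (1, math.inf),
    "top_p": (0, 1),
    "num_beams": (1, 10),
    "temperature": (0.5, 1),
    "length_penalty": (0, math.inf),
    "max_new_tokens": (1, 2048),
    "repetition_penalty": (0, math.inf),
}

MODEL_VERSION = (
    "zsxkib/blip-3:"
    "499bec581d8f64060fd695ec0c34d7595c6824c4118259aa8b0788e0d2d903e1"
)


def build_input(image, question="What is shown in the image?", **overrides):
    """Hypothetical helper: assemble an input dict and range-check numbers."""
    payload = {"image": image, "question": question}
    for name, value in overrides.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} - {high}")
        payload[name] = value
    return payload


def ask(image, question, **overrides):
    """Hypothetical helper: send one request (requires network and a token)."""
    import replicate  # imported here so build_input works without the package

    return replicate.run(
        MODEL_VERSION,
        input=build_input(image, question, **overrides),
    )


# Build a payload without sending anything; the image URL is a placeholder.
print(build_input("https://example.com/cat.png", max_new_tokens=64))
```

Parameters left out of the payload fall back to the defaults listed above. The output schema says the result is a string, though the exact Python type returned by replicate.run can vary with the client version.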

Output Schema

Output

Type: string

Example Execution Logs

<class 'transformers_modules.Salesforce.blip3-phi3-mini-instruct-r-v1.0e928237baf450032f063e295be64238fc9d6fd5.modeling_blip_3.Blip3ModelForConditionalGeneration'>
You are not running the flash-attention implementation, expect numerical differences.

Version Details

Version ID: 499bec581d8f64060fd695ec0c34d7595c6824c4118259aa8b0788e0d2d903e1
Version Created: May 13, 2024
