zsxkib/idefics3 📝🖼️🔢❓ → 📝

▶️ 2.5K runs 📅 Aug 2024 ⚙️ Cog 0.9.14
image-captioning image-to-text visual-question-answering

About

Idefics3-8B-Llama3 answers questions about images and generates captions for them.

Example Output

Output

A white dog is sitting on the bench. The background of the image is blurred, but we can still see trees and dry grass in the background. There are clouds visible in the sky.

Performance Metrics

3.45s Prediction Time
161.23s Total Time
All Input Parameters
{
  "text": "What do you see? Give me a detailed answer",
  "image": "https://replicate.delivery/pbxt/LRy82RONNFuqeS0JjwoxJQVxJMkxQ73xdshWr9mhXmRPJWjy/dogonbench.png",
  "top_p": 0.8,
  "temperature": 0.4,
  "max_new_tokens": 512,
  "assistant_prefix": "Let's think step by step.",
  "decoding_strategy": "top-p-sampling",
  "repetition_penalty": 1.2
}
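The request above could be sent from Python roughly as follows. This is a minimal sketch, not the model author's code: it assumes the third-party `replicate` client is installed and that `REPLICATE_API_TOKEN` is set in the environment. The names `MODEL_REF`, `EXAMPLE_INPUT`, and `run_example` are illustrative; the version hash and input values are copied from this page.

```python
# Model reference in "owner/name:version" form, using the Version ID listed on this page.
MODEL_REF = (
    "zsxkib/idefics3:"
    "b06f5f6b6249b27d0b00d1b794240e5641190d1582ad68c40ef53778459bb593"
)

# The example input shown above, copied verbatim.
EXAMPLE_INPUT = {
    "text": "What do you see? Give me a detailed answer",
    "image": "https://replicate.delivery/pbxt/LRy82RONNFuqeS0JjwoxJQVxJMkxQ73xdshWr9mhXmRPJWjy/dogonbench.png",
    "top_p": 0.8,
    "temperature": 0.4,
    "max_new_tokens": 512,
    "assistant_prefix": "Let's think step by step.",
    "decoding_strategy": "top-p-sampling",
    "repetition_penalty": 1.2,
}


def run_example(model_ref=MODEL_REF, inputs=EXAMPLE_INPUT):
    """Send the example request and return the answer as one string."""
    # Imported lazily so the constants above can be inspected without the
    # client installed. Requires network access and an API token.
    import replicate

    output = replicate.run(model_ref, input=dict(inputs))
    # The page lists the output type as string; joining also covers the
    # case where the client yields the text in chunks (an assumption).
    return output if isinstance(output, str) else "".join(str(c) for c in output)
```

Calling `print(run_example())` would run one prediction; the first call may take much longer than later ones, as the gap between Prediction Time and Total Time above suggests.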
Input Parameters
text (required) Type: string
Text query
image (required) Type: string
Input image (the example above passes a URL)
top_p Type: number, Default: 0.8, Range: 0.01 - 0.99
Top P for sampling
temperature Type: number, Default: 0.4, Range: 0 - 5
Temperature for sampling
max_new_tokens Type: integer, Default: 512, Range: 8 - 1024
Maximum number of new tokens
assistant_prefix Type: string, Default: Let's think step by step.
Assistant prefix
decoding_strategy Default: greedy
Decoding strategy
repetition_penalty Type: number, Default: 1.2, Range: 0.01 - 5
Repetition penalty
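The defaults and ranges listed above can be checked locally before a request is sent. The following is a minimal sketch: the helper `build_inputs` and the `DEFAULTS` and `RANGES` tables are hypothetical names introduced here, with values taken from the parameter list. It does not validate `decoding_strategy`, because this page does not enumerate the allowed values.

```python
# Defaults as listed in the parameter list above.
DEFAULTS = {
    "top_p": 0.8,
    "temperature": 0.4,
    "max_new_tokens": 512,
    "assistant_prefix": "Let's think step by step.",
    "decoding_strategy": "greedy",
    "repetition_penalty": 1.2,
}

# Inclusive (min, max) ranges as listed above.
RANGES = {
    "top_p": (0.01, 0.99),
    "temperature": (0, 5),
    "max_new_tokens": (8, 1024),
    "repetition_penalty": (0.01, 5),
}


def build_inputs(text, image, **overrides):
    """Merge overrides into the defaults and reject out-of-range values."""
    if not text or not image:
        raise ValueError("both 'text' and 'image' are required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = {**DEFAULTS, **overrides}
    if not isinstance(params["max_new_tokens"], int):
        raise ValueError("max_new_tokens must be an integer")
    for name, (low, high) in RANGES.items():
        if not low <= params[name] <= high:
            raise ValueError(f"{name}={params[name]} is outside {low} - {high}")
    return {"text": text, "image": image, **params}
```

For example, `build_inputs("Describe the image", image_url, temperature=0.2)` returns a complete input dictionary with the remaining parameters at their defaults, while `max_new_tokens=4` raises `ValueError`.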
Output Schema

Output

Type: string

Version Details
Version ID
b06f5f6b6249b27d0b00d1b794240e5641190d1582ad68c40ef53778459bb593
Version Created
August 15, 2024