lucataco/idefics-8b 🖼️📝🔢 → 📝

▶️ 1.1K runs 📅 Apr 2024 ⚙️ Cog 0.9.6 🔗 GitHub 📄 Paper ⚖️ License
image-to-text ocr visual-question-answering

About

Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs.

Example Output

Prompt:

"Where is this pastry from?"

Output:

Turkey.

Performance Metrics

3.99s Prediction Time
158.63s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KnG23ICcKFDi6YLBeGt9N3pncNTShrG6oxiekeG7KwlgQugr/baklava.png",
  "prompt": "Where is this pastry from?",
  "max_new_tokens": 512,
  "repetition_penalty": 1.2
}
Input Parameters
image (required) Type: string
Grayscale input image
prompt Type: string Default: What is this?
Input prompt
max_new_tokens Type: integer Default: 512 Range: 8 - 1024
Maximum number of tokens to generate
repetition_penalty Type: number Default: 1.2 Range: 0.01 - 5
Repetition penalty
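As a rough sketch of how the parameters above fit together, the Python helper below builds an input payload and checks it against the documented defaults and ranges. The function name `build_input` and its error messages are illustrative assumptions, not part of the model's API; only the parameter names, defaults, and ranges come from the listing above.

```python
def build_input(image, prompt="What is this?", max_new_tokens=512,
                repetition_penalty=1.2):
    """Hypothetical helper: return an input dict matching the listed schema."""
    # `image` is the only required parameter; the page types it as a string (URL).
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    # Documented range for max_new_tokens: 8 - 1024 (integer).
    if not isinstance(max_new_tokens, int) or not 8 <= max_new_tokens <= 1024:
        raise ValueError("max_new_tokens must be an integer between 8 and 1024")
    # Documented range for repetition_penalty: 0.01 - 5 (number).
    if not 0.01 <= repetition_penalty <= 5:
        raise ValueError("repetition_penalty must be between 0.01 and 5")
    return {
        "image": image,
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": repetition_penalty,
    }
```

The resulting dict could then be passed as the `input` argument of `replicate.run` from the Replicate Python client, assuming the client is installed and an API token is configured.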
Output Schema

Output

Type: string

Example Execution Logs
No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
INPUT: User:<image>Where is this pastry from?<end_of_utterance>
Assistant: |OUTPUT: ['Turkey.']
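The INPUT line in the log suggests that the user prompt is wrapped in a simple chat layout before generation. The sketch below reproduces that string as it appears in the log. It is inferred from the log output only: `format_prompt` is a hypothetical name, the model's actual preprocessing code is not shown on this page, and whether the original string carries trailing whitespace after `Assistant:` is an assumption.

```python
def format_prompt(question: str) -> str:
    """Hypothetical helper mirroring the prompt layout seen in the log."""
    # One user turn containing an image placeholder and the question,
    # closed by <end_of_utterance>, then the start of the assistant turn.
    return f"User:<image>{question}<end_of_utterance>\nAssistant:"


text = format_prompt("Where is this pastry from?")
print(text)
```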
Version Details
Version ID
7ab312514f213130c4a2db68b93a1719f5cc7c3246c408ba91d507b212a24303
Version Created
April 22, 2024
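To pin a request to this exact version, Replicate model references take the form `owner/name:version`. A minimal sketch of assembling that reference from the values on this page follows; the helper name `model_ref` and the 64-character hexadecimal check are illustrative assumptions.

```python
import re

MODEL = "lucataco/idefics-8b"
VERSION = "7ab312514f213130c4a2db68b93a1719f5cc7c3246c408ba91d507b212a24303"


def model_ref(model: str, version: str) -> str:
    """Hypothetical helper: join a model name and version ID as owner/name:version."""
    # Assumption: version IDs are 64 lowercase hexadecimal characters,
    # as the one listed on this page is.
    if not re.fullmatch(r"[0-9a-f]{64}", version):
        raise ValueError("version does not look like a 64-character hex ID")
    return f"{model}:{version}"


print(model_ref(MODEL, VERSION))
```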