lucataco/idefics-8b 🖼️📝🔢 → 📝

▶️ 1.2K runs 📅 Apr 2024 ⚙️ Cog 0.9.6 🔗 GitHub 📄 Paper ⚖️ License

image-to-text ocr visual-question-answering

About

Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs.

Example Output

Prompt:

"Where is this pastry from?"

Output:

Turkey.

Performance Metrics

3.99s Prediction Time

158.63s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KnG23ICcKFDi6YLBeGt9N3pncNTShrG6oxiekeG7KwlgQugr/baklava.png",
  "prompt": "Where is this pastry from?",
  "max_new_tokens": 512,
  "repetition_penalty": 1.2
}

Input Parameters

image (required) Type: string: Grayscale input image
prompt Type: string, Default: "What is this?": Input prompt
max_new_tokens Type: integer, Default: 512, Range: 8 - 1024: Maximum number of tokens to generate
repetition_penalty Type: number, Default: 1.2, Range: 0.01 - 5: Repetition penalty
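The parameters above can be assembled and range-checked before a prediction is submitted. The following is a minimal sketch, not the model author's code: the helper names `build_input` and `run_model` are hypothetical, the model reference is assumed to be the page's model name joined to the Version ID listed under Version Details, and `run_model` assumes the third-party `replicate` Python client is installed with `REPLICATE_API_TOKEN` set.

```python
from typing import Any, Dict

# Assumption: model name from this page plus the Version ID under "Version Details".
MODEL_REF = (
    "lucataco/idefics-8b:"
    "7ab312514f213130c4a2db68b93a1719f5cc7c3246c408ba91d507b212a24303"
)


def build_input(
    image: str,
    prompt: str = "What is this?",
    max_new_tokens: int = 512,
    repetition_penalty: float = 1.2,
) -> Dict[str, Any]:
    """Build the input payload, enforcing the documented defaults and ranges."""
    if not image:
        raise ValueError("image is required")
    if not 8 <= max_new_tokens <= 1024:
        raise ValueError("max_new_tokens must be in the range 8 - 1024")
    if not 0.01 <= repetition_penalty <= 5:
        raise ValueError("repetition_penalty must be in the range 0.01 - 5")
    return {
        "image": image,
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": repetition_penalty,
    }


def run_model(payload: Dict[str, Any]) -> Any:
    """Submit the payload; requires network access and an API token."""
    import replicate  # third-party client, imported lazily on purpose

    return replicate.run(MODEL_REF, input=payload)
```

Calling `build_input` with the baklava image URL and the prompt "Where is this pastry from?" yields the same payload shown under All Input Parameters; passing that payload to `run_model` should return a string, per the Output Schema.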

Output Schema

Output

Type: string

Example Execution Logs

No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
INPUT: User:<image>Where is this pastry from?<end_of_utterance>
Assistant: |OUTPUT: ['Turkey.']
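The INPUT line in the log suggests how the user prompt is wrapped before generation. The sketch below reproduces that string for a single image and a single user turn. The function name `format_prompt` is hypothetical, and the layout is inferred only from the log line above, so the model's actual preprocessing may differ.

```python
def format_prompt(prompt: str) -> str:
    """Wrap a user prompt in the single-turn layout seen in the execution log.

    Assumption: one <image> placeholder precedes the text, the turn ends with
    <end_of_utterance>, and generation starts after "Assistant:".
    """
    return f"User:<image>{prompt}<end_of_utterance>\nAssistant:"


if __name__ == "__main__":
    print(format_prompt("Where is this pastry from?"))
```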

Version Details

Version ID: 7ab312514f213130c4a2db68b93a1719f5cc7c3246c408ba91d507b212a24303
Version Created: April 22, 2024
