lucataco/llama-3-vision-alpha 🖼️📝 → 📝

▶️ 6.8K runs 📅 Apr 2024 ⚙️ Cog 0.12.0 🔗 GitHub 📄 Paper ⚖️ License

image-analysis image-captioning image-to-text ocr text-generation visual-question-answering visual-understanding

Performance

7.2s Typical run time

~214s Cold start (first call)

6.8K Total runs

About

Projection module trained to add vision capabilities to Llama 3 using SigLIP

Example Output

Prompt:

"Describe the image"

Output

The image is of a young girl with short, curly hair and bright blue eyes. She has a sweet and cheerful face, with a hint of mischief in her eyes. She is holding a giant hamburger in her hand, with a big bite taken out of it. She looks like she's enjoying every bite of her meal, with a satisfied expression on her face. She is sitting at a table in a medieval-style tavern, surrounded by wooden tables and chairs, with a fireplace crackling in the background. The atmosphere is cozy and inviting, with a warm glow emanating from the fireplace. The girl looks like she's in her element, enjoying a hearty meal with friends and family.

Performance Metrics

7.22s Prediction Time

213.54s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/Kq17Ws2RLIXdeFeep2N56psrMVq57TPssPrffeF8HawmOhvD/frieren.jpg",
  "prompt": "Describe the image"
}

Input Parameters

image (required) • Type: string • Input image
prompt • Type: string • Default: "Describe the image" • Input prompt
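
As a minimal sketch of how the two parameters above fit together, the helper below assembles an input payload. The function name `build_input` is hypothetical and not part of any Replicate API; it only mirrors the listed rule that `image` is required and `prompt` falls back to "Describe the image".

```python
DEFAULT_PROMPT = "Describe the image"  # default taken from the parameter list above


def build_input(image: str, prompt: str = DEFAULT_PROMPT) -> dict:
    """Assemble the input payload (hypothetical helper, for illustration only)."""
    # `image` is the only required parameter, so reject empty values early.
    if not isinstance(image, str) or not image.strip():
        raise ValueError("image is required and must be a non-empty string")
    return {"image": image, "prompt": prompt}


payload = build_input(
    "https://replicate.delivery/pbxt/Kq17Ws2RLIXdeFeep2N56psrMVq57TPssPrffeF8HawmOhvD/frieren.jpg"
)
```

Here `payload` has the same shape as the example under "All Input Parameters".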

Output Schema

Output

Type: array • Items Type: string
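
Since the output is an array of strings, a caller will usually want a single piece of text. The sketch below assumes the items are consecutive fragments of the generated description that can be concatenated in order; that is common for text models but is not stated on this page. `join_output` is a hypothetical helper name.

```python
from typing import Iterable


def join_output(chunks: Iterable[str]) -> str:
    """Concatenate output items in order (assumes each item is a text fragment)."""
    return "".join(chunks)


# Illustrative fragments only, not real model output.
text = join_output(["The image is of ", "a young girl."])
```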

Version Details

Version ID: 79f9f3418f65fb5d180978a969d5646e33f8ca67430a2fe903c9a4be82565925
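
The version ID can be used to pin calls to this exact version. The following is a sketch, assuming the third-party `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `versioned_ref` and `describe_image` are hypothetical, and joining the output assumes the array items are text fragments.

```python
MODEL = "lucataco/llama-3-vision-alpha"
VERSION = "79f9f3418f65fb5d180978a969d5646e33f8ca67430a2fe903c9a4be82565925"


def versioned_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Build the 'owner/name:version' reference used to pin a model version."""
    return f"{model}:{version}"


def describe_image(image_url: str, prompt: str = "Describe the image") -> str:
    """Run one prediction; requires network access and an API token."""
    # Imported lazily so the rest of this sketch works without the client installed.
    import replicate

    output = replicate.run(
        versioned_ref(),
        input={"image": image_url, "prompt": prompt},
    )
    # Output schema is an array of strings; join them into one description.
    return "".join(output)
```

Given the cold start of roughly 214s reported above, a first call may take several minutes before returning.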
Version Created: November 5, 2024
