lucataco/llama-3-vision-alpha 🖼️📝 → 📝

▶️ 5.8K runs 📅 Apr 2024 ⚙️ Cog 0.12.0 🔗 GitHub 📄 Paper ⚖️ License
image-analysis image-captioning image-to-text ocr visual-question-answering visual-understanding

About

Projection module trained to add vision capabilties to Llama 3 using SigLIP

Example Output

Prompt:

"Describe the image"

Output

The image is of a young girl with short, curly hair and bright blue eyes. She has a sweet and cheerful face, with a hint of mischief in her eyes. She is holding a giant hamburger in her hand, with a big bite taken out of it. She looks like she's enjoying every bite of her meal, with a satisfied expression on her face. She is sitting at a table in a medieval - style tavern, surrounded by wooden tables and chairs, with a fireplace crackling in the background. The atmosphere is cozy and inviting, with a warm glow emanating from the fireplace. The girl looks like she's in her element, enjoying a hearty meal with friends and family.

Performance Metrics

7.22s Prediction Time
213.54s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/Kq17Ws2RLIXdeFeep2N56psrMVq57TPssPrffeF8HawmOhvD/frieren.jpg",
  "prompt": "Describe the image"
}
Input Parameters
image (required) Type: string
Input image
prompt Type: stringDefault: Describe the image
Input prompt
Output Schema

Output

Type: arrayItems Type: string

Version Details
Version ID
79f9f3418f65fb5d180978a969d5646e33f8ca67430a2fe903c9a4be82565925
Version Created
November 5, 2024
Run on Replicate →