zsxkib/uform-gen 🖼️📝 → 📝

▶️ 2.4K runs 📅 Feb 2024 ⚙️ Cog 0.8.6
image-captioning image-to-text visual-question-answering

About

🖼️ Super fast 1.5B Image Captioning/VQA Multimodal LLM (Image-to-Text) 🖋️

Example Output

Prompt:

"Describe the image in great detail"

Output:

A cat in a suit and bow tie stands in front of a gray background, looking at the camera with a curious expression. The suit and bow tie convey a formal and stylish appearance. The cat's attire and the suit's design imply it may be a pet or a formal event.

Performance Metrics

17.16s Prediction Time
117.24s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KKRtJipem0i7snNIhIMyzqBDKkib8ze6WbxBC2Lqrd73hgpN/tux.png",
  "prompt": "Describe the image in great detail"
}
Input Parameters
image (required) Type: string
Input image
prompt Type: string Default: Describe the image in great detail
Prompt to guide the caption generation
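
The two inputs above can be assembled into a payload as in the following minimal sketch. The helper name build_input and its validation rules are hypothetical and not part of the model. Only the field names, the types, and the default prompt come from this page.

```python
from typing import Dict

# Default prompt as listed in the Input Parameters on this page.
DEFAULT_PROMPT = "Describe the image in great detail"


def build_input(image: str, prompt: str = DEFAULT_PROMPT) -> Dict[str, str]:
    """Build an input payload with the two documented fields.

    Hypothetical helper: the checks below are an illustrative assumption,
    not validation performed by the model itself.
    """
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    return {"image": image, "prompt": prompt}


# Reproduces the example input shown under "All Input Parameters".
payload = build_input(
    "https://replicate.delivery/pbxt/KKRtJipem0i7snNIhIMyzqBDKkib8ze6WbxBC2Lqrd73hgpN/tux.png"
)
```

Omitting prompt falls back to the documented default, so the payload matches the example input on this page.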
Output Schema

Output

Type: string

Version Details
Version ID
e6fa8e2d076907b45a0b535a14ddb22402548c2e478310cd18daa1c4c01f422b
Version Created
February 1, 2024
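
A minimal sketch of running this exact version from Python, assuming the third-party replicate client package is installed and a REPLICATE_API_TOKEN environment variable is set. The helper names model_ref and caption are hypothetical. The model name, version ID, input fields, and string output type are taken from this page.

```python
MODEL = "zsxkib/uform-gen"
VERSION = "e6fa8e2d076907b45a0b535a14ddb22402548c2e478310cd18daa1c4c01f422b"
DEFAULT_PROMPT = "Describe the image in great detail"


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Return the "owner/name:version" reference that pins one version."""
    return f"{model}:{version}"


def caption(image_url: str, prompt: str = DEFAULT_PROMPT) -> str:
    """Request a caption for an image URL (hypothetical helper).

    The import is deferred so this module loads without the client installed.
    Assumes: pip install replicate, and REPLICATE_API_TOKEN is set.
    """
    import replicate  # third-party client, not part of the standard library

    output = replicate.run(
        model_ref(),
        input={"image": image_url, "prompt": prompt},
    )
    # The output schema on this page is a string; join chunks defensively
    # in case the client hands back an iterator of text pieces instead.
    return output if isinstance(output, str) else "".join(str(p) for p in output)
```

Pinning the version ID keeps results reproducible if the model is updated later. Calling caption(...) performs a network request and is billed to the account that owns the token.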