zsxkib/uform-gen 🖼️📝 → 📝

▶️ 2.4K runs 📅 Feb 2024 ⚙️ Cog 0.8.6

image-captioning image-to-text text-generation visual-question-answering visual-understanding

Performance

17.2s Typical run time

~117s Cold start (first call)

2.4K Total runs

About

🖼️ Super fast 1.5B Image Captioning/VQA Multimodal LLM (Image-to-Text) 🖋️

Example Output

Prompt:

"Describe the image in great detail"

Output

A cat in a suit and bow tie stands in front of a gray background, looking at the camera with a curious expression. The suit and bow tie convey a formal and stylish appearance. The cat's attire and the suit's design imply it may be a pet or a formal event.

Performance Metrics

17.16s Prediction Time

117.24s Total Time
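
The gap between the two figures above can be worked out directly. The sketch below assumes, without confirmation from the page, that "Total Time" covers queueing and container boot in addition to "Prediction Time"; the function name `startup_overhead` is hypothetical and used only for illustration.

```python
def startup_overhead(prediction_s: float, total_s: float) -> tuple[float, float]:
    """Return (overhead in seconds, overhead as a share of total time).

    Assumption: total time = prediction time + queue/boot overhead.
    """
    overhead = round(total_s - prediction_s, 2)
    share = round(overhead / total_s, 3)
    return overhead, share


# Figures reported for the example run on this page.
overhead_s, share = startup_overhead(prediction_s=17.16, total_s=117.24)
print(f"overhead: {overhead_s}s ({share:.1%} of total)")
```

Under that assumption, roughly 100 seconds of the example run was spent outside the prediction itself, which is in line with the ~117s cold-start figure reported above.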

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KKRtJipem0i7snNIhIMyzqBDKkib8ze6WbxBC2Lqrd73hgpN/tux.png",
  "prompt": "Describe the image in great detail"
}

Input Parameters

image (required). Type: string. Input image.
prompt. Type: string. Default: "Describe the image in great detail". Prompt to guide the caption generation.
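
As a minimal sketch of how these two parameters might be assembled and sent, the snippet below builds the input payload and passes it to the Replicate Python client. It assumes the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set; the helper names `build_input` and `caption` are hypothetical, and the model reference is simply the owner/name and Version ID listed on this page joined with a colon.

```python
from typing import Any, Dict

# Model reference assembled from this page's owner/name and Version ID (assumption).
MODEL_REF = (
    "zsxkib/uform-gen:"
    "e6fa8e2d076907b45a0b535a14ddb22402548c2e478310cd18daa1c4c01f422b"
)
DEFAULT_PROMPT = "Describe the image in great detail"


def build_input(image: str, prompt: str = DEFAULT_PROMPT) -> Dict[str, Any]:
    """Build the input payload: `image` is required, `prompt` falls back to the default."""
    if not isinstance(image, str) or not image.strip():
        raise ValueError("image is required and must be a non-empty string")
    return {"image": image, "prompt": prompt}


def caption(image: str, prompt: str = DEFAULT_PROMPT) -> str:
    """Run the model via the Replicate client (needs network access and an API token)."""
    import replicate  # third-party; imported lazily so build_input works without it

    output = replicate.run(MODEL_REF, input=build_input(image, prompt))
    # The output schema is a string; join defensively in case the client yields chunks.
    return output if isinstance(output, str) else "".join(output)
```

Calling `caption("https://replicate.delivery/pbxt/KKRtJipem0i7snNIhIMyzqBDKkib8ze6WbxBC2Lqrd73hgpN/tux.png")` would send the same inputs as the example run above; expect the first call to be slow if the model is cold.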

Output Schema

Output

Type: string

Version Details

Version ID: e6fa8e2d076907b45a0b535a14ddb22402548c2e478310cd18daa1c4c01f422b
Version Created: February 1, 2024
