zsxkib/uform-gen 🖼️📝 → 📝
About
🖼️ A fast 1.5B-parameter multimodal LLM for image captioning and visual question answering (image-to-text) 🖋️
Example Output
Prompt:
"Describe the image in great detail"
Output:
A cat in a suit and bow tie stands in front of a gray background, looking at the camera with a curious expression. The suit and bow tie convey a formal and stylish appearance. The cat's attire and the suit's design imply it may be a pet or a formal event.
Performance Metrics
- Prediction Time: 17.16s
- Total Time: 117.24s
All Input Parameters
{
"image": "https://replicate.delivery/pbxt/KKRtJipem0i7snNIhIMyzqBDKkib8ze6WbxBC2Lqrd73hgpN/tux.png",
"prompt": "Describe the image in great detail"
}
Input Parameters
- image (required): Input image
- prompt: Prompt to guide the caption generation
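The parameter list above can be sketched as a small helper that assembles the input object. This is only an illustration: `build_input` is a hypothetical name, not part of the model or any client library, and the only rules it encodes are the ones listed here (`image` is required, `prompt` is optional).

```python
def build_input(image, prompt=None):
    """Assemble the input dict for the model (hypothetical helper).

    `image` is required; `prompt` is optional and left out when not given.
    """
    if not image:
        raise ValueError("image is required")
    payload = {"image": image}
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


# Rebuilds the example input shown under "All Input Parameters".
example = build_input(
    "https://replicate.delivery/pbxt/KKRtJipem0i7snNIhIMyzqBDKkib8ze6WbxBC2Lqrd73hgpN/tux.png",
    prompt="Describe the image in great detail",
)
```

Omitting the prompt key instead of sending an empty string leaves the choice of default to the model.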
Output Schema
Output
Version Details
- Version ID: e6fa8e2d076907b45a0b535a14ddb22402548c2e478310cd18daa1c4c01f422b
- Version Created: February 1, 2024
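Pinning the version ID keeps results reproducible if the model is updated later. The sketch below is a minimal example, assuming the `replicate` Python package is installed and a `REPLICATE_API_TOKEN` environment variable is set; `run_caption` is a hypothetical wrapper name, and the shape of the returned value is not specified on this page.

```python
import os

MODEL = "zsxkib/uform-gen"
VERSION = "e6fa8e2d076907b45a0b535a14ddb22402548c2e478310cd18daa1c4c01f422b"
# "owner/name:version" pins the exact version listed above.
MODEL_REF = f"{MODEL}:{VERSION}"


def run_caption(image_url, prompt):
    """Run the pinned model version (hypothetical wrapper)."""
    # Imported lazily so this file loads even without the package installed.
    import replicate

    return replicate.run(MODEL_REF, input={"image": image_url, "prompt": prompt})


# Only calls the API when run as a script with a token available.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    print(
        run_caption(
            "https://replicate.delivery/pbxt/KKRtJipem0i7snNIhIMyzqBDKkib8ze6WbxBC2Lqrd73hgpN/tux.png",
            "Describe the image in great detail",
        )
    )
```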