cjwbw/uform-gen2-qwen-500m 🖼️📝🔢 → 📝

▶️ 399 runs 📅 Feb 2024 ⚙️ Cog 0.9.4 ⚖️ License
image-captioning image-to-text visual-question-answering

About

Pocket-Sized Multimodal AI For Content Understanding and Generation

Example Output

Prompt:

"Describe the image in three sentences."

Output

A white and orange cat stands on its hind legs, reaching for a white teapot on a wooden table in a garden. The teapot is on a white tablecloth, and a basket of red raspberries is nearby. The cat's position and actions create a playful and charming scene.<|im_end|>
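The example output above ends with a literal `<|im_end|>` marker. If that marker is unwanted downstream, it can be trimmed after the prediction returns. The following is a minimal sketch, not part of the model's API: the helper name `strip_end_token` is hypothetical, and it assumes the marker only ever appears at the very end of the returned string.

```python
# Hypothetical post-processing helper; not provided by the model or by Replicate.
END_TOKEN = "<|im_end|>"


def strip_end_token(text: str, token: str = END_TOKEN) -> str:
    """Remove one trailing end-of-turn marker and surrounding whitespace."""
    text = text.rstrip()
    if text.endswith(token):
        # Drop the marker itself, keeping everything before it.
        text = text[: -len(token)]
    return text.rstrip()


if __name__ == "__main__":
    raw = "The cat's position and actions create a playful and charming scene.<|im_end|>"
    print(strip_end_token(raw))
```

Text that does not end with the marker is returned unchanged apart from trailing whitespace.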

Performance Metrics

10.03s Prediction Time
352.47s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KPrmoP0t3TNpwsHNV5TmwJjcK1xQb0Vhw2AAtu9P7x7Sca4F/cat.jpg",
  "prompt": "Describe the image in three sentences.",
  "max_new_tokens": 256
}
Input Parameters
image (required) Type: string
Input image.
prompt Type: string, Default: Describe the image in three sentences.
Question or instruction for the model.
max_new_tokens Type: integer, Default: 256
Maximum number of tokens to generate.
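The parameters listed here map onto a single input dictionary. The sketch below shows one way to assemble that dictionary and send it with the `replicate` Python client. It assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `caption` are hypothetical, and the version hash is the one listed under Version Details on this page. The return type is assumed to be a string, per the output schema.

```python
# Sketch only: helper names are illustrative, not part of any official API.
MODEL = "cjwbw/uform-gen2-qwen-500m"
VERSION = "9b09566caa6585d066ae5006e587e4f8de4c4a72881459ae1cb21b65229f0d57"
DEFAULT_PROMPT = "Describe the image in three sentences."


def build_input(image: str, prompt: str = DEFAULT_PROMPT, max_new_tokens: int = 256) -> dict:
    """Build the input payload using the defaults documented on this page."""
    if not image:
        raise ValueError("image is required")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be a positive integer")
    return {
        "image": image,
        "prompt": prompt,
        "max_new_tokens": int(max_new_tokens),
    }


def caption(image: str, **kwargs):
    """Run one prediction. Needs network access and REPLICATE_API_TOKEN."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(f"{MODEL}:{VERSION}", input=build_input(image, **kwargs))


if __name__ == "__main__":
    # Prints the payload only; call caption(...) to actually run the model.
    print(build_input("https://replicate.delivery/pbxt/KPrmoP0t3TNpwsHNV5TmwJjcK1xQb0Vhw2AAtu9P7x7Sca4F/cat.jpg"))
```

Keeping payload construction separate from the network call makes the defaults easy to check without spending a prediction.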
Output Schema

Output

Type: string

Version Details
Version ID
9b09566caa6585d066ae5006e587e4f8de4c4a72881459ae1cb21b65229f0d57
Version Created
February 16, 2024