cjwbw/uform-gen2-qwen-500m 🖼️📝🔢 → 📝

▶️ 413 runs 📅 Feb 2024 ⚙️ Cog 0.9.4

image-analysis image-captioning image-to-text visual-question-answering visual-understanding

Performance

Typical run time: 10.0s

Cold start (first call): ~352s

Total runs: 413

About

Pocket-Sized Multimodal AI For Content Understanding and Generation

Example Output

Prompt:

"Describe the image in three sentences."

Output

A white and orange cat stands on its hind legs, reaching for a white teapot on a wooden table in a garden. The teapot is on a white tablecloth, and a basket of red raspberries is nearby. The cat's position and actions create a playful and charming scene.<|im_end|>
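
The example output above ends with the literal marker `<|im_end|>`. A minimal sketch of stripping it before display, assuming the marker only ever appears as a trailing token; `clean_output` is a hypothetical helper, not part of the model or of any Replicate client.

```python
END_TOKEN = "<|im_end|>"  # marker seen at the end of the example output


def clean_output(text: str) -> str:
    """Remove a trailing end-of-turn marker and surrounding whitespace."""
    text = text.strip()
    if text.endswith(END_TOKEN):
        text = text[: -len(END_TOKEN)]
    return text.rstrip()


raw = "The cat's position and actions create a playful and charming scene.<|im_end|>"
print(clean_output(raw))
```

Text without the marker passes through unchanged, so the helper is safe to apply to every response.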

Performance Metrics

10.03s Prediction Time

352.47s Total Time
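
The gap between the two figures can be worked out directly. This is a rough sketch that assumes Total Time covers queueing and cold-start setup in addition to the prediction itself; the page does not break that down, so the variable names and the interpretation are illustrative.

```python
prediction_time_s = 10.03  # "Prediction Time" reported above
total_time_s = 352.47      # "Total Time" reported above

# Assumption: everything outside the prediction itself is startup/queue overhead.
overhead_s = round(total_time_s - prediction_time_s, 2)
overhead_share = overhead_s / total_time_s

print(f"overhead: {overhead_s} s ({overhead_share:.1%} of total)")
```

Under that assumption, nearly all of the wall-clock time in this example run was spent before the model started generating, which matches the cold-start figure listed under Performance.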

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KPrmoP0t3TNpwsHNV5TmwJjcK1xQb0Vhw2AAtu9P7x7Sca4F/cat.jpg",
  "prompt": "Describe the image in three sentences.",
  "max_new_tokens": 256
}

Input Parameters

image (required) Type: string. Input image.
prompt Type: string. Default: "Describe the image in three sentences." Question or instruction for the model.
max_new_tokens Type: integer. Default: 256. Maximum number of tokens to generate.
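
The parameters above can be assembled into a request as follows. This is a hedged sketch, not an official snippet: it assumes the `replicate` Python package is installed and `REPLICATE_API_TOKEN` is set, and it pins the version ID listed under Version Details. `build_input` and `describe` are hypothetical helper names, and the positive-value check on `max_new_tokens` is an assumption rather than a documented constraint.

```python
MODEL_REF = (
    "cjwbw/uform-gen2-qwen-500m:"
    "9b09566caa6585d066ae5006e587e4f8de4c4a72881459ae1cb21b65229f0d57"
)
DEFAULT_PROMPT = "Describe the image in three sentences."


def build_input(image: str, prompt: str = DEFAULT_PROMPT, max_new_tokens: int = 256) -> dict:
    """Assemble the input payload using the documented defaults."""
    if not image:
        raise ValueError("image is required")
    if max_new_tokens <= 0:  # assumed sanity check, not a documented limit
        raise ValueError("max_new_tokens must be positive")
    return {"image": image, "prompt": prompt, "max_new_tokens": max_new_tokens}


def describe(image: str, **kwargs) -> str:
    """Run the model; needs the `replicate` package and an API token."""
    import replicate  # imported lazily so build_input works without it

    output = replicate.run(MODEL_REF, input=build_input(image, **kwargs))
    # The output schema is a string; join defensively in case chunks are returned.
    return output if isinstance(output, str) else "".join(output)


payload = build_input(
    "https://replicate.delivery/pbxt/KPrmoP0t3TNpwsHNV5TmwJjcK1xQb0Vhw2AAtu9P7x7Sca4F/cat.jpg"
)
print(payload)
```

Calling `describe(url, prompt="What is on the table?")` would send a visual question instead of the default captioning prompt; the first call may take several minutes given the cold-start time listed under Performance.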

Output Schema

Output

Type: string

Version Details

Version ID: 9b09566caa6585d066ae5006e587e4f8de4c4a72881459ae1cb21b65229f0d57
Version Created: February 16, 2024
