lucataco/qwen-vl-chat 🖼️📝 → 📝

▶️ 826.7K runs 📅 Sep 2023 ⚙️ Cog 0.8.6 🔗 GitHub 📄 Paper

image-analysis image-captioning image-to-text ocr text-generation visual-question-answering visual-understanding

Performance

3.3s Typical run time

826.7K Total runs

About

Qwen-VL-Chat is a multimodal, LLM-based AI assistant trained with alignment techniques. It supports flexible interaction, including multi-round question answering, as well as creative capabilities.

Example Output

Prompt:

"How much would I pay if I want to order two Salmon Burger and three Meat Lover\'s Pizza? Think carefully step by step."

Output

If you want to order two Salmon Burgers and three Meat Lover's Pizzas, the total cost would depend on the price of each item on the menu.

Let's assume that the price of a Salmon Burger is $10 and the price of a Meat Lover's Pizza is $12. In this case, the total cost for two Salmon Burgers would be $20 and the total cost for three Meat Lover's Pizzas would be $36.

So, the total cost for two Salmon Burgers and three Meat Lover's Pizzas would be $56.
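
The arithmetic in this output can be checked with a short sketch. The prices ($10 and $12) are the ones the model assumed in its answer, not values confirmed from the menu image, and the names `assumed_prices`, `order`, and `order_total` are illustrative only.

```python
# Prices the model assumed in the example output; they were not read from the menu.
assumed_prices = {"Salmon Burger": 10, "Meat Lover's Pizza": 12}

# Quantities requested in the prompt.
order = {"Salmon Burger": 2, "Meat Lover's Pizza": 3}


def order_total(prices, quantities):
    """Sum price * quantity over every ordered item."""
    return sum(prices[item] * qty for item, qty in quantities.items())


total = order_total(assumed_prices, order)
print(f"${total}")  # prints "$56" (2 * 10 + 3 * 12)
```

The total is internally consistent with the assumed prices, but it only answers the question if those prices match the menu in the input image.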

Performance Metrics

3.29s Prediction Time

3.30s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/JSwt0WCMKtolbjYYo6WYIE01Iemz3etQD6ugKxxeiVVlMgjF/Menu.jpeg",
  "prompt": "How much would I pay if I want to order two Salmon Burger and three Meat Lover\\'s Pizza? Think carefully step by step."
}

Input Parameters

image (required)
  Type: string
  Description: Input image

prompt
  Type: string
  Default: What is the name of the movie in the poster?
  Description: Question
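
A minimal sketch of assembling an input payload that matches the parameters listed here. The helper `build_input` and the constant `DEFAULT_PROMPT` are hypothetical names introduced for illustration; only the field names, types, and the default prompt come from this page.

```python
# Default prompt as listed for the `prompt` parameter.
DEFAULT_PROMPT = "What is the name of the movie in the poster?"


def build_input(image, prompt=DEFAULT_PROMPT):
    """Return an input dict for a prediction.

    `image` is required and is passed as a string (for example, a URL).
    `prompt` is optional and falls back to the listed default.
    """
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    return {"image": image, "prompt": prompt}


# Example: only the required field is supplied, so the default prompt is used.
payload = build_input("https://example.com/poster.jpg")
print(payload["prompt"])
```

Validating locally like this is a design choice, not something the model requires; it simply surfaces a missing image before a request is sent.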

Output Schema

Output

Type: string

Version Details

Version ID: 50881b153b4d5f72b3db697e2bbad23bb1277ab741c5b52d80cd6ee17ea660e9
Version Created: October 15, 2023
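
One way to pin a call to this exact version is sketched below. It assumes the third-party `replicate` Python client is installed and that a `REPLICATE_API_TOKEN` environment variable is set; the helpers `model_ref` and `ask` are hypothetical names, not part of the model or the client.

```python
MODEL = "lucataco/qwen-vl-chat"
VERSION = "50881b153b4d5f72b3db697e2bbad23bb1277ab741c5b52d80cd6ee17ea660e9"


def model_ref(model=MODEL, version=VERSION):
    """Build the "owner/name:version" reference string for a pinned version."""
    return f"{model}:{version}"


def ask(image_url, prompt):
    """Run one prediction and return the answer as a string.

    Assumes the `replicate` package and an API token are available;
    the import is local so the rest of this sketch works without them.
    """
    import replicate  # third-party client, assumed installed

    output = replicate.run(
        model_ref(),
        input={"image": image_url, "prompt": prompt},
    )
    # The output schema is a string; join defensively in case chunks are returned.
    return output if isinstance(output, str) else "".join(output)


print(model_ref())
```

Pinning the version ID keeps results reproducible if a newer version of the model is published later.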
