lucataco/qwen-vl-chat 🖼️📝 → 📝

▶️ 825.5K runs 📅 Sep 2023 ⚙️ Cog 0.8.6 🔗 GitHub 📄 Paper
image-analysis image-captioning image-to-text visual-understanding

About

Qwen-VL-Chat is a multimodal, LLM-based AI assistant trained with alignment techniques. It supports flexible interaction, such as multi-round question answering, as well as creative tasks.

Example Output

Prompt:

"How much would I pay if I want to order two Salmon Burger and three Meat Lover's Pizza? Think carefully step by step."

Output:

If you want to order two Salmon Burgers and three Meat Lover's Pizzas, the total cost would depend on the price of each item on the menu.

Let's assume that the price of a Salmon Burger is $10 and the price of a Meat Lover's Pizza is $12. In this case, the total cost for two Salmon Burgers would be $20 and the total cost for three Meat Lover's Pizzas would be $36.

So, the total cost for two Salmon Burgers and three Meat Lover's Pizzas would be $56.

Performance Metrics

Prediction Time: 3.29s
Total Time: 3.30s

All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/JSwt0WCMKtolbjYYo6WYIE01Iemz3etQD6ugKxxeiVVlMgjF/Menu.jpeg",
  "prompt": "How much would I pay if I want to order two Salmon Burger and three Meat Lover\\'s Pizza? Think carefully step by step."
}

Input Parameters

image (required)
Type: string
Input image

prompt
Type: string
Default: What is the name of the movie in the poster?
Question
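
The two parameters above can be assembled and sent to the model roughly as follows. This is a minimal sketch, not an official client example: it assumes the third-party `replicate` Python package is installed and that a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `ask` are hypothetical and exist only for illustration; the model name, version ID, default prompt, and example inputs are taken from this page.

```python
import os

# Model name and version ID as listed on this page.
MODEL = "lucataco/qwen-vl-chat"
VERSION = "50881b153b4d5f72b3db697e2bbad23bb1277ab741c5b52d80cd6ee17ea660e9"
# Default value of the `prompt` parameter as listed on this page.
DEFAULT_PROMPT = "What is the name of the movie in the poster?"


def build_input(image, prompt=DEFAULT_PROMPT):
    """Build the input payload (hypothetical helper).

    `image` is required and is passed as a string (here, a URL);
    `prompt` falls back to the documented default.
    """
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    return {"image": image, "prompt": prompt}


def ask(image, prompt=DEFAULT_PROMPT):
    """Run the model and return its answer as one string (hypothetical helper)."""
    # Imported lazily so build_input() works without the package installed.
    import replicate

    output = replicate.run(f"{MODEL}:{VERSION}", input=build_input(image, prompt))
    # The output schema is a string; join defensively in case chunks are returned.
    return output if isinstance(output, str) else "".join(output)


# Only attempt a real prediction when an API token is available (assumption).
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    answer = ask(
        "https://replicate.delivery/pbxt/JSwt0WCMKtolbjYYo6WYIE01Iemz3etQD6ugKxxeiVVlMgjF/Menu.jpeg",
        "How much would I pay if I want to order two Salmon Burger and "
        "three Meat Lover's Pizza? Think carefully step by step.",
    )
    print(answer)
```

Pinning the version ID keeps results tied to the exact build described on this page; omitting `prompt` sends the default question about a movie poster, which is unlikely to suit other images.
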
Output Schema

Output

Type: string

Version Details
Version ID
50881b153b4d5f72b3db697e2bbad23bb1277ab741c5b52d80cd6ee17ea660e9
Version Created
October 15, 2023