nomagick/qwen-vl-chat 🔢🖼️📝 → 📝

▶️ 1.1K runs 📅 Oct 2023 ⚙️ Cog 0.8.6
image-analysis image-captioning image-object-detection image-to-text ocr visual-understanding

About

Qwen-VL-Chat, exposed through a raw ChatML prompt interface with streaming output

Example Output

Prompt:

"<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Given this image: <img>image1</img>, point out where the dog is<|im_end|>
<|im_start|>assistant"
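Because the model takes a raw ChatML string rather than a list of messages, the prompt has to be assembled by hand. The following is a minimal sketch of one way to do that; `build_chatml` is a hypothetical helper, not part of this model or of any library, and the layout it produces is inferred from the example prompt shown in the input parameters on this page.

```python
def build_chatml(messages, add_generation_prompt=True):
    """Render (role, content) pairs as a raw ChatML string.

    Hypothetical helper: the layout mirrors the example prompt on this page.
    """
    parts = []
    for role, content in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model completes it
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml([
    ("system", "You are a helpful assistant"),
    ("user", "Given this image: <img>image1</img>, point out where the dog is"),
])
```

Images are referenced by input name inside `<img>` tags, so a second image would presumably appear as `<img>image2</img>` in the user turn.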

Output

<ref> the dog</ref><box>(215,422),(578,892)</box>
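The grounding output is plain text, so the label and box coordinates have to be extracted before they can be used. Below is a rough sketch using a regular expression; `parse_boxes` and `to_pixels` are hypothetical names. The pixel conversion assumes the coordinates are normalized to a 0 to 1000 grid, which is how the upstream Qwen-VL project describes its boxes but is not stated on this page, so treat that scale as an assumption.

```python
import re

# Matches <ref>label</ref><box>(x1,y1),(x2,y2)</box>
BOX_RE = re.compile(
    r"<ref>(.*?)</ref><box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
)


def parse_boxes(text):
    """Return a list of (label, (x1, y1, x2, y2)) tuples found in text."""
    results = []
    for m in BOX_RE.finditer(text):
        label = m.group(1).strip()
        x1, y1, x2, y2 = (int(g) for g in m.groups()[1:])
        results.append((label, (x1, y1, x2, y2)))
    return results


def to_pixels(box, width, height, scale=1000):
    """Convert a box to pixel coordinates.

    ASSUMPTION: coordinates are normalized to a 0..scale grid.
    """
    x1, y1, x2, y2 = box
    return (
        x1 * width // scale,
        y1 * height // scale,
        x2 * width // scale,
        y2 * height // scale,
    )


boxes = parse_boxes("<ref> the dog</ref><box>(215,422),(578,892)</box>")
```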

Performance Metrics

Prediction Time: 3.17s
Total Time: 282.25s

All Input Parameters

{
  "top_p": 0.8,
  "image1": "https://replicate.delivery/pbxt/JfWlCzhD5GBoiDRSmNhMpyYCrmd78lKkLU2JFgr1imbJZIIN/demo.jpeg",
  "prompt": "<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nGiven this image: <img>image1</img>, point out where the dog is<|im_end|>\n<|im_start|>assistant\n",
  "max_tokens": 2048,
  "temperature": 0.75
}
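As a sketch of how these inputs might be sent from code, the snippet below builds the same payload and passes it to the `replicate` Python client. It assumes the client is installed and that `REPLICATE_API_TOKEN` is set in the environment; `make_input` and `run_model` are hypothetical helpers, and the exact return type of `replicate.run` can vary between client versions, so the join is a best guess for a model that streams strings.

```python
# Model reference built from the version ID listed on this page
MODEL = (
    "nomagick/qwen-vl-chat:"
    "e48b42673e1ad0adb765d2a2783338960c0f8a2453ca5aab1445b4afc69749dd"
)


def make_input(prompt, image1=None, top_p=0.8, max_tokens=2048, temperature=0.75):
    """Build the input payload, using the defaults documented on this page."""
    payload = {
        "prompt": prompt,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if image1 is not None:
        payload["image1"] = image1  # URL of the image referenced in the prompt
    return payload


def run_model(payload):
    """Call the model and return the full completion as one string."""
    import replicate  # third-party client, imported lazily; needs an API token

    # ASSUMPTION: the call yields string chunks that can be concatenated
    return "".join(replicate.run(MODEL, input=payload))
```

Calling `run_model(make_input(prompt, image1=url))` would then perform the request; nothing is sent until `run_model` is invoked.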
Input Parameters

top_p
Type: number, Default: 0.8, Range: 0 - 1
Top-p (nucleus) sampling threshold

image1
Type: string
Optional image that you can reference in your prompt as image1

image2
Type: string
Optional image that you can reference in your prompt as image2

image3
Type: string
Optional image that you can reference in your prompt as image3

prompt
Type: string
Default:
"<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Given this image: <img>image1</img>, point out where the dog is<|im_end|>
<|im_start|>assistant"
Prompt for completion, in ChatML format

max_tokens
Type: integer, Default: 2048, Range: 1 - 8192
Maximum number of new tokens to generate

temperature
Type: number, Default: 0.75, Range: 0 - 5
Sampling temperature

files_archive
Type: string
An archive of the files mentioned in your prompt, if more images are needed
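The documented ranges can be checked on the client side before a request is made. This is a small sketch only: `check_ranges` is a hypothetical helper, and treating both bounds as inclusive is an assumption, since the page lists the ranges without saying whether the endpoints are allowed.

```python
# Ranges as listed in the input parameters on this page
RANGES = {
    "top_p": (0, 1),
    "max_tokens": (1, 8192),
    "temperature": (0, 5),
}


def check_ranges(params):
    """Return a list of error messages for out-of-range values.

    ASSUMPTION: bounds are inclusive on both ends.
    """
    errors = []
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            errors.append(f"{name}={params[name]} is outside {low} - {high}")
    return errors


problems = check_ranges({"top_p": 1.5, "max_tokens": 2048, "temperature": 0.75})
```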
Output Schema

Output

Type: array, Items Type: string
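Since the output is an array of strings, a caller will usually want to show chunks as they arrive and still end up with the complete text. A minimal sketch follows; `collect` is a hypothetical helper, and the chunk boundaries in the example are invented for illustration, since the page does not show how the model splits its output.

```python
def collect(chunks, on_chunk=None):
    """Concatenate streamed string chunks, optionally reporting each one."""
    pieces = []
    for chunk in chunks:
        if on_chunk is not None:
            on_chunk(chunk)  # e.g. print(chunk, end="") for live display
        pieces.append(chunk)
    return "".join(pieces)


# Hypothetical chunking of the example output shown on this page
seen = []
text = collect(
    ["<ref> the dog</ref>", "<box>(215,422)", ",(578,892)</box>"],
    on_chunk=seen.append,
)
```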

Version Details

Version ID: e48b42673e1ad0adb765d2a2783338960c0f8a2453ca5aab1445b4afc69749dd
Version Created: October 22, 2023