yimi81/yi-vl-6b 🖼️📝 → 🖼️

▶️ 309 runs 📅 Jan 2024 ⚙️ Cog 0.9.4 🔗 GitHub
image-analysis image-captioning image-to-text visual-question-answering

About

Yi-VL-34B is the first open-source 34B vision-language (VL) model worldwide. At the time of its release it ranked first among open-source models on benchmarks including MMMU and CMMMU. This page hosts the smaller 6B variant of the same family, yi-vl-6b.

Example Output

Output

In the heart of a bustling city, a man in a vibrant yellow shirt and a blue baseball cap is engrossed in the task of ironing his clothes on the back of a taxi. The taxi, a striking yellow with a contrasting red stripe, is parked on the side of a busy street. The man, standing on the back of the taxi, is holding a blue ironing board, which is holding a pile of clothes. The backdrop of this scene is a cityscape of tall buildings and a street lined with pink flags. The image captures a unique moment of everyday life in the city.

Performance Metrics

6.49s Prediction Time
725.65s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KKavLA9ZqUtUBRxlS8FpprmzydSzYnfTfYGbAkcTCCwMxy14/extreme_ironing.jpg",
  "query": "Describe this image."
}
Input Parameters
image (required) Type: string
Input image.
query Type: string Default: Describe this image.
Input query.
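
As a rough illustration of how the two parameters above fit together, here is a minimal sketch that builds (but does not send) a request to create a prediction. It assumes Replicate's public HTTP endpoint `https://api.replicate.com/v1/predictions` with a bearer token read from a `REPLICATE_API_TOKEN` environment variable. The helper names `build_input` and `build_request` are hypothetical and not part of this model or of any Replicate client.

```python
import json
import os
import urllib.request

# Version ID copied from the "Version Details" section of this page.
VERSION_ID = "601601e6b24714022046c21c2124b4f8c67541ef78ff298cfee49efcb700e579"
# Assumed endpoint: Replicate's public predictions API.
API_URL = "https://api.replicate.com/v1/predictions"
# Default taken from the "query" parameter description.
DEFAULT_QUERY = "Describe this image."


def build_input(image, query=DEFAULT_QUERY):
    """Return the model input dict; `image` is required, `query` is optional."""
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    return {"image": image, "query": query}


def build_request(image, query=DEFAULT_QUERY, token=""):
    """Build, without sending, a POST request that would create a prediction."""
    body = json.dumps({"version": VERSION_ID, "input": build_input(image, query)})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Example input from the "All Input Parameters" block on this page.
    req = build_request(
        "https://replicate.delivery/pbxt/KKavLA9ZqUtUBRxlS8FpprmzydSzYnfTfYGbAkcTCCwMxy14/extreme_ironing.jpg",
        token=os.environ.get("REPLICATE_API_TOKEN", ""),
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(req)` would submit it. The sketch stops short of that so it can be inspected without a token or network access.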
Output Schema

Output

Type: string Format: uri
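
The schema above declares the output as a URI string, while the example output on this page is plain caption text. A cautious caller may want to handle both cases. The sketch below is one way to do that under this assumption. The function names `looks_like_uri` and `classify_output` are hypothetical helpers, not part of the model's interface.

```python
from urllib.parse import urlparse


def looks_like_uri(value):
    """Return True if `value` parses as an http(s) URI with a host."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def classify_output(value):
    """Tag the model output as ("uri", value) or ("text", value)."""
    kind = "uri" if looks_like_uri(value) else "text"
    return kind, value
```

A caller could then fetch the content when the tag is `"uri"` and use the string directly when it is `"text"`.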

Version Details
Version ID
601601e6b24714022046c21c2124b4f8c67541ef78ff298cfee49efcb700e579
Version Created
February 1, 2024