nelsonjchen/minigpt-4_vicuna-7b 🖼️📝🔢 → 📝

▶️ 9.9K runs 📅 Apr 2023 ⚙️ Cog 0.6.1

image-captioning image-to-text visual-question-answering

About

MiniGPT-4 with Vicuna-7B, for image question answering and captioning.

Example Output

Output

The image shows a group of people standing at a train station platform, waiting for a train to arrive. The platform is made of concrete and has a metal railing on one side. There are several people in the image, all wearing different types of clothing and carrying various items such as bags and umbrellas. Some of the people are looking at their phones or talking to each other, while others are just standing around waiting. The sky is overcast and there are a few clouds visible in the distance.

Performance Metrics

9.72s Prediction Time

9.88s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/Ii9Eo0VGYLq2KIfgz3zKMr2QEQRl9n4a45E910Ctu4btAoxY/pexels-alexandra-folster-6307862.jpg",
  "message": "Please describe the image.",
  "num_beams": 10,
  "temperature": 1,
  "max_new_tokens": 500
}
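
The run above can be reproduced programmatically. The following is a minimal sketch, assuming the official `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set; the helper names `model_ref` and `describe_image` are hypothetical and are not part of the model or the client.

```python
import os

# Model name and version ID as listed on this page.
MODEL = "nelsonjchen/minigpt-4_vicuna-7b"
VERSION = "4786ec1df0917116b1f58996d93a25b06e8d4dc1641da94a7ac6b97e26cd6394"

# The exact inputs used for the example output above.
EXAMPLE_INPUT = {
    "image": "https://replicate.delivery/pbxt/Ii9Eo0VGYLq2KIfgz3zKMr2QEQRl9n4a45E910Ctu4btAoxY/pexels-alexandra-folster-6307862.jpg",
    "message": "Please describe the image.",
    "num_beams": 10,
    "temperature": 1,
    "max_new_tokens": 500,
}


def model_ref():
    """Build the 'owner/name:version' reference (hypothetical helper)."""
    return f"{MODEL}:{VERSION}"


def describe_image(inputs=EXAMPLE_INPUT):
    """Send one prediction request and return the text (hypothetical helper)."""
    import replicate  # imported lazily so the module loads without the client

    output = replicate.run(model_ref(), input=inputs)
    # The output schema is a string; joining also covers the case where the
    # client hands back an iterator of string chunks (an assumption).
    return output if isinstance(output, str) else "".join(output)


if __name__ == "__main__":
    # Only call the API when a token is available.
    if os.environ.get("REPLICATE_API_TOKEN"):
        print(describe_image())
```

Because sampling is involved, the returned caption may differ from the example output even with identical inputs.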

Input Parameters

image (required)
Type: string
Input image to discuss.

message
Type: string, Default: Please describe the image.
Message to send to MiniGPT-4.

num_beams
Type: integer, Default: 1, Range: 1 - 10
Number of beams for beam search. More beams require more VRAM.

temperature
Type: number, Default: 0.75, Range: 0.1 - 2
Adjusts the randomness of outputs. Values above 1 are more random and values near the minimum are closer to deterministic; 0.75 is a good starting value.

max_new_tokens
Type: integer, Default: 500, Range: 1 - ∞
Maximum number of tokens to generate. A word is generally 2-3 tokens.
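
The defaults and ranges listed above can be checked on the client side before a request is sent. This is a minimal sketch of one way to do that; the function `build_input` is a hypothetical helper, not part of the model or of any Replicate library, and the checks simply mirror the documented ranges.

```python
DEFAULT_MESSAGE = "Please describe the image."


def build_input(image, message=DEFAULT_MESSAGE, num_beams=1,
                temperature=0.75, max_new_tokens=500):
    """Return an input dict, raising ValueError on out-of-range values.

    Hypothetical helper: defaults and ranges are copied from the
    parameter list on this page.
    """
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    if isinstance(num_beams, bool) or not isinstance(num_beams, int):
        raise ValueError("num_beams must be an integer")
    if not 1 <= num_beams <= 10:
        raise ValueError("num_beams must be between 1 and 10")
    if not 0.1 <= temperature <= 2:
        raise ValueError("temperature must be between 0.1 and 2")
    if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int):
        raise ValueError("max_new_tokens must be an integer")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be at least 1")
    return {
        "image": image,
        "message": message,
        "num_beams": num_beams,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }
```

Calling `build_input` with only an image URL yields a payload that uses the documented defaults for every other field.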

Output Schema

Output

Type: string

Version Details

Version ID: 4786ec1df0917116b1f58996d93a25b06e8d4dc1641da94a7ac6b97e26cd6394
Version Created: April 25, 2023
