yorickvp/llava-13b (inputs: image, number, text → output: text)

31.6M runs · Oct 2023 · Cog 0.9.5
Tags: image-analysis, image-captioning, image-to-text, visual-question-answering, visual-understanding

About

Visual instruction tuning towards large language and vision models with GPT-4-level capabilities

Example Output

Prompt:

"Are you allowed to swim here?"

Output:

Yes, you are allowed to swim in the lake near the pier. The image shows a pier extending out into the water, and the water appears to be calm, making it a suitable spot for swimming. However, it is always important to be cautious and aware of any potential hazards or regulations in the area before swimming.

Performance Metrics

Prediction time: 5.33 s
Total time: 108.06 s

All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KRULC43USWlEx4ZNkXltJqvYaHpEx2uJ4IyUQPRPwYb8SzPf/view.jpg",
  "top_p": 1,
  "prompt": "Are you allowed to swim here?",
  "max_tokens": 1024,
  "temperature": 0.2
}
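
The parameter set above can be assembled and sanity-checked before it is sent. The sketch below is a minimal, hypothetical helper (`build_llava_input` is not part of Replicate's client or of this model); it fills in the documented defaults and enforces only the ranges listed under Input Parameters.

```python
from typing import Any, Dict


def build_llava_input(
    image: str,
    prompt: str,
    top_p: float = 1.0,
    max_tokens: int = 1024,
    temperature: float = 0.2,
) -> Dict[str, Any]:
    """Hypothetical helper: build the input dict and check the documented ranges."""
    # Both image and prompt are marked (required) on this page.
    if not image:
        raise ValueError("image is required")
    if not prompt:
        raise ValueError("prompt is required")
    # Ranges as documented: top_p in [0, 1], max_tokens >= 0, temperature >= 0.
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    return {
        "image": image,
        "top_p": top_p,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
```

Calling `build_llava_input` with the example image URL and the prompt "Are you allowed to swim here?" reproduces the JSON shown above, since its values are all the defaults.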

Input Parameters

image (required)
Type: string
Input image, given as a URL in the example above.

top_p
Type: number · Default: 1 · Range: 0 to 1
When decoding text, samples only from the smallest set of most likely tokens whose cumulative probability reaches p; lower it to ignore less likely tokens.

prompt (required)
Type: string
Prompt to use for text generation.

max_tokens
Type: integer · Default: 1024 · Range: 0 to ∞
Maximum number of tokens to generate. A word is typically one to a few tokens.

temperature
Type: number · Default: 0.2 · Range: 0 to ∞
Adjusts the randomness of outputs: 0 is deterministic, and values above 1 make output increasingly random.

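To make `temperature` and `top_p` concrete, here is a generic sketch of how the two settings are commonly applied to a single decoding step: logits are divided by the temperature, converted to probabilities, and then restricted to the "nucleus" before sampling. It illustrates the standard technique on an assumed, hypothetical token-to-logit dictionary; it is not the decoding code this model actually runs.

```python
import math
import random
from typing import Dict, Optional


def sample_next_token(
    logits: Dict[str, float],
    temperature: float = 0.2,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one token using temperature scaling followed by top-p (nucleus) filtering."""
    rng = rng or random.Random()
    if temperature == 0:
        # Temperature 0 is treated as greedy decoding: always take the most likely token.
        return max(logits, key=logits.get)
    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    probs = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)
    # Keep the smallest set of most likely tokens whose cumulative probability reaches top_p.
    nucleus = []
    cumulative = 0.0
    for p, tok in probs:
        nucleus.append((p, tok))
        cumulative += p
        if cumulative >= top_p:
            break
    # Sample within the nucleus, renormalising by its total probability mass.
    r = rng.random() * sum(p for p, _ in nucleus)
    for p, tok in nucleus:
        r -= p
        if r <= 0:
            return tok
    return nucleus[-1][1]
```

Dividing by a small temperature such as the default 0.2 multiplies every gap between logits by 5, so the distribution becomes sharply peaked toward the top token; lowering `top_p` then cuts off the remaining tail entirely.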
Output Schema

Output

Type: array · Items type: string
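
The sketch below calls the model through Replicate's Python client (`replicate.run` with an `owner/name:version` reference built from the Version ID listed on this page) and concatenates the returned strings. It assumes, as is typical for streaming language models on Replicate, that the array items are text fragments meant to be joined in order. `ask_llava` and `join_output` are hypothetical helper names, and running `ask_llava` requires the `replicate` package and a `REPLICATE_API_TOKEN` environment variable.

```python
from typing import Any, Dict, Iterable

# Model reference in "owner/name:version" form, using this page's Version ID.
LLAVA_13B = (
    "yorickvp/llava-13b:"
    "80537f9eead1a5bfa72d5ac6ea6414379be41d4d4f6679fd776e9535d1eb58bb"
)


def join_output(chunks: Iterable[str]) -> str:
    """Concatenate the array-of-strings output into a single answer (assumed fragments)."""
    return "".join(chunks)


def ask_llava(image_url: str, prompt: str, **params: Any) -> str:
    """Run one prediction; extra keyword arguments become model inputs (top_p, etc.)."""
    # Imported here so join_output can be used without the client installed.
    import replicate

    model_input: Dict[str, Any] = {"image": image_url, "prompt": prompt, **params}
    return join_output(replicate.run(LLAVA_13B, input=model_input))
```

For example, `ask_llava(image_url, "Are you allowed to swim here?", temperature=0.2, max_tokens=1024)` mirrors the example run above, where `image_url` is the example image URL.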

Version Details
Version ID: 80537f9eead1a5bfa72d5ac6ea6414379be41d4d4f6679fd776e9535d1eb58bb
Version created: July 17, 2024