yorickvp/llava-13b 🖼️🔢📝 → 📝

▶️ 34.1M runs 📅 Oct 2023 ⚙️ Cog 0.9.5 🔗 GitHub 📄 Paper ⚖️ License

image-analysis image-captioning image-to-text visual-question-answering visual-understanding

About

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities

Example Output

Prompt:

"Are you allowed to swim here?"

Output:

Yes, you are allowed to swim in the lake near the pier. The image shows a pier extending out into the water, and the water appears to be calm, making it a suitable spot for swimming. However, it is always important to be cautious and aware of any potential hazards or regulations in the area before swimming.

Performance Metrics

5.33s Prediction Time

108.06s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KRULC43USWlEx4ZNkXltJqvYaHpEx2uJ4IyUQPRPwYb8SzPf/view.jpg",
  "top_p": 1,
  "prompt": "Are you allowed to swim here?",
  "max_tokens": 1024,
  "temperature": 0.2
}
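As a rough sketch, the inputs above could be sent with the `replicate` Python client. This assumes the package is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_llava` are hypothetical, and the version hash is the one listed under Version Details.

```python
MODEL = "yorickvp/llava-13b"
# Version hash copied from the Version Details section of this page.
VERSION = "80537f9eead1a5bfa72d5ac6ea6414379be41d4d4f6679fd776e9535d1eb58bb"


def build_input(image_url, prompt, top_p=1, max_tokens=1024, temperature=0.2):
    """Assemble the payload using the same keys and defaults shown above."""
    return {
        "image": image_url,
        "top_p": top_p,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def run_llava(payload):
    """Send the payload to Replicate and return the generated text.

    Hypothetical helper: needs network access and an API token.
    """
    import replicate  # third-party client, imported lazily

    output = replicate.run(f"{MODEL}:{VERSION}", input=payload)
    # The output schema is an array of strings, so the pieces are concatenated.
    return "".join(output)


# Example call (not executed here because it needs credentials):
# text = run_llava(build_input("https://example.com/pier.jpg",
#                              "Are you allowed to swim here?"))
```

Pinning the version hash keeps results tied to the version documented here; `https://example.com/pier.jpg` is a placeholder, not a real input.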

Input Parameters

image (required) Type: string: Input image
top_p Type: number • Default: 1 • Range: 0 - 1: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens
prompt (required) Type: string: Prompt to use for text generation
max_tokens Type: integer • Default: 1024 • Range: 0 - ∞: Maximum number of tokens to generate. A word is generally 2-3 tokens
temperature Type: number • Default: 0.2 • Range: 0 - ∞: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic
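The documented types and ranges can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; `validate_input` is a hypothetical helper, not part of any Replicate library, and the service may apply its own, different validation.

```python
def validate_input(payload):
    """Return a list of problems; an empty list means the payload fits the documented ranges."""
    problems = []

    # image and prompt are the two required string fields.
    for key in ("image", "prompt"):
        if not isinstance(payload.get(key), str) or not payload[key]:
            problems.append(f"{key} is required and must be a non-empty string")

    # top_p: number in [0, 1], default 1.
    top_p = payload.get("top_p", 1)
    if not isinstance(top_p, (int, float)) or not 0 <= top_p <= 1:
        problems.append("top_p must be a number between 0 and 1")

    # max_tokens: integer >= 0, default 1024.
    max_tokens = payload.get("max_tokens", 1024)
    if not isinstance(max_tokens, int) or max_tokens < 0:
        problems.append("max_tokens must be a non-negative integer")

    # temperature: number >= 0, default 0.2.
    temperature = payload.get("temperature", 0.2)
    if not isinstance(temperature, (int, float)) or temperature < 0:
        problems.append("temperature must be a non-negative number")

    return problems
```

Returning a list instead of raising on the first failure lets a caller report every out-of-range field at once.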

Output Schema

Output

Type: array • Items Type: string
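Because the output is an array of strings, a caller usually wants a single string. The sketch below assumes the items are consecutive pieces of one response, in order; the schema itself states only the types. `join_output` is a hypothetical helper name.

```python
def join_output(chunks):
    """Concatenate the string items of the output array into one response.

    Assumes the items are ordered fragments of a single answer.
    """
    return "".join(chunks).strip()


# Illustrative fragments, not real model output.
pieces = ["Yes, ", "you are ", "allowed "]
print(join_output(pieces))
```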

Version Details

Version ID: 80537f9eead1a5bfa72d5ac6ea6414379be41d4d4f6679fd776e9535d1eb58bb
Version Created: July 17, 2024
