yorickvp/llava-v1.6-mistral-7b 🖼️🔢📝 → 📝
About
LLaVA v1.6: Large Language and Vision Assistant (Mistral-7B)

Example Output
Prompt:
"What is unusual about this image?"
Output:
The unusual aspect of this image is that a man is standing on the back of a yellow SUV, ironing clothes. This is not a typical scene, as one would expect to see the man either inside the vehicle or on the ground, rather than standing on the back of the SUV. The act of ironing clothes while standing on the back of a moving vehicle is both unusual and potentially dangerous.
Performance Metrics
- Prediction Time: 4.45s
- Total Time: 144.84s
All Input Parameters
{ "image": "https://replicate.delivery/pbxt/KKNB7w6pjN79j5pHDSyYXa5EwaQE9FL5fx6Qa83XMn1HYuKm/extreme_ironing.jpg", "top_p": 1, "prompt": "What is unusual about this image?", "max_tokens": 1024, "temperature": 0.2 }
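The input object above can be assembled and sanity-checked in code before it is sent. The following is a minimal sketch, not an official client: `build_input` is a hypothetical helper, the range checks are assumptions rather than documented limits, and the default values simply mirror the example on this page. Treating `history` as a list of strings is also an assumption.

```python
from typing import List, Optional

# Model reference built from the model name and the version ID listed on this page.
MODEL_REF = (
    "yorickvp/llava-v1.6-mistral-7b:"
    "19be067b589d0c46689ffa7cc3ff321447a441986a7694c01225973c2eafc874"
)


def build_input(
    prompt: str,
    image: Optional[str] = None,
    history: Optional[List[str]] = None,
    top_p: float = 1.0,          # example value from this page, not a documented default
    max_tokens: int = 1024,      # example value from this page, not a documented default
    temperature: float = 0.2,    # example value from this page, not a documented default
) -> dict:
    """Hypothetical helper: build the input payload described on this page."""
    if not prompt:
        raise ValueError("prompt is required")
    # The checks below are assumed sanity limits, not limits stated by the model.
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p should be between 0 and 1")
    if temperature < 0:
        raise ValueError("temperature should not be negative")
    if max_tokens < 1:
        raise ValueError("max_tokens should be at least 1")

    payload = {
        "prompt": prompt,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Optional fields are only included when provided.
    if image is not None:
        payload["image"] = image
    if history:
        payload["history"] = list(history)
    return payload


payload = build_input(
    prompt="What is unusual about this image?",
    image="https://replicate.delivery/pbxt/KKNB7w6pjN79j5pHDSyYXa5EwaQE9FL5fx6Qa83XMn1HYuKm/extreme_ironing.jpg",
)
```

With the `replicate` Python client installed and an API token configured, a payload like this would typically be passed as `replicate.run(MODEL_REF, input=payload)`; that call is not executed in the sketch.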
Input Parameters
- image: Input image
- top_p: When decoding text, samples from the smallest set of most likely tokens whose cumulative probability reaches p; lower it to ignore less likely tokens
- prompt (required): Prompt to use for text generation
- history: List of earlier chat messages, alternating roles, starting with user input. Include <image> in the message the image should be attached to.
- max_tokens: Maximum number of tokens to generate. An English word is typically 1-2 tokens.
- temperature: Adjusts the randomness of outputs; 0 is deterministic, and values greater than 1 are increasingly random
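To illustrate how `temperature` and `top_p` interact during decoding, here is a generic sketch of temperature scaling followed by top-p (nucleus) filtering over a toy list of logits. This page does not document the model's actual decoding code, so the function names (`apply_temperature`, `top_p_filter`, `sample_token`) are hypothetical, and treating a temperature of 0 as a pure argmax is an assumption.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Assumed convention: temperature 0 puts all probability on the best token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalize over the kept tokens
    return filtered


def sample_token(logits, temperature=0.2, top_p=1.0, rng=random):
    """Sample one token index after temperature scaling and top-p filtering."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `top_p=1.0` every token stays eligible, so only `temperature` shapes the result; lowering `top_p` removes the low-probability tail before sampling.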
Output Schema
Output
Example Execution Logs
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results. Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Version Details
- Version ID: 19be067b589d0c46689ffa7cc3ff321447a441986a7694c01225973c2eafc874
- Version Created: February 2, 2024