adirik/vila-2.7b 🖼️🔢📝 → 📝
About
[Non-commercial] A multi-image visual language model

Example Output
Prompt:
"Can you describe this image?"
Output:
The image captures a moment of creativity and playfulness. A person's hand, adorned with a rainbow of colors, is the central focus. The hand is painted with a vibrant array of colors, including hues of red, orange, yellow, green, blue, and purple. The colors are not just confined to the hand, but also extend to the person's wrist, creating a striking contrast against the white background. The image is taken from a low angle, adding a sense of grandeur to the hand and its colors. The overall composition of the image suggests a celebration of individuality and the joy of self-expression through art.
Performance Metrics
- Prediction Time: 5.64s
- Total Time: 85.54s
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KYxCJBjNTIOz189qT2R55a8otIFVM1igj8jiDcO8qMx39WXV/3.jpg",
  "top_p": 1,
  "prompt": "Can you describe this image?",
  "num_beams": 1,
  "max_tokens": 512,
  "temperature": 0.2
}
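The same request could be sent from Python. The following is a minimal sketch, assuming the third-party `replicate` client is installed and a `REPLICATE_API_TOKEN` is set in the environment. The parameter names and the version ID are taken from this page; the helper names `build_input` and `describe_image` are hypothetical, and the type of the returned value is not documented here, so the sketch simply prints it.

```python
import os

# Model identifier and version ID as listed on this page.
MODEL_VERSION = (
    "adirik/vila-2.7b:"
    "f8a85ee8d0ab7c84433c5f6c00855142edf4783b52a1b66626d47ef260028750"
)


def build_input(image, prompt, top_p=1, num_beams=1, max_tokens=512, temperature=0.2):
    """Assemble the input dict; defaults mirror the example request on this page."""
    return {
        "image": image,
        "top_p": top_p,
        "prompt": prompt,
        "num_beams": num_beams,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def describe_image(image, prompt):
    """Hypothetical helper: run the model once and return whatever the client returns."""
    import replicate  # third-party client, assumed installed: pip install replicate

    return replicate.run(MODEL_VERSION, input=build_input(image, prompt))


# Only call the API when a token is available, so the sketch is safe to import.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    image_url = (
        "https://replicate.delivery/pbxt/"
        "KYxCJBjNTIOz189qT2R55a8otIFVM1igj8jiDcO8qMx39WXV/3.jpg"
    )
    print(describe_image(image_url, "Can you describe this image?"))
```

Keeping the payload construction separate from the network call makes it easy to check the inputs before spending prediction time.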
Input Parameters
- image (required): Image 1 to discuss
- top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens
- prompt (required): Query to ask the model about the image
- num_beams: Number of beams to use when decoding text; higher values are slower but more accurate
- max_tokens: Maximum number of tokens to generate
- temperature: When decoding text, higher values make the model more creative
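To make the effect of `temperature` and `top_p` concrete, here is an illustrative sketch of temperature scaling followed by top-p (nucleus) filtering over a toy set of token scores. It is not the model's actual decoding code, which this page does not show; the function names and the example logits are assumptions for illustration only.

```python
import math
import random


def nucleus(logits, temperature=0.2, top_p=1.0):
    """Return the (token, probability) pairs kept after temperature and top-p.

    logits: dict mapping token -> raw score (toy values, not real model output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    return kept


def sample_token(logits, temperature=0.2, top_p=1.0, rng=random):
    """Draw one token from the filtered distribution."""
    tokens, weights = zip(*nucleus(logits, temperature, top_p))
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = {"hand": 2.0, "paint": 1.0, "sky": 0.0}
print(nucleus(toy_logits, temperature=1.0, top_p=1.0))  # all three tokens kept
print(nucleus(toy_logits, temperature=1.0, top_p=0.5))  # only the top token kept
print(sample_token(toy_logits, temperature=0.2, top_p=0.9))
```

With a low temperature such as 0.2, almost all probability mass lands on the highest-scoring token, which is why low values give more deterministic output.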
Output Schema
Output
Version Details
- Version ID: f8a85ee8d0ab7c84433c5f6c00855142edf4783b52a1b66626d47ef260028750
- Version Created: March 13, 2024