lucataco/nemotron-nano-vl-8b-v1 🖼️📝🔢 → 📝
About
A document-intelligence vision language model (VLM) that can answer questions about, and summarize, images from the physical or virtual world.

Example Output
The image captures a serene scene of a wooden boardwalk meandering through a lush, green field. The boardwalk, constructed from wooden planks, is the central focus of the image, leading the viewer's eye towards the horizon. It's surrounded by tall grass and wildflowers, adding a sense of depth and texture to the scene. The sky above is a clear blue, dotted with fluffy white clouds, suggesting a calm, sunny day. The perspective of the image is from ground level, looking down the boardwalk, which enhances the sense of depth and invites the viewer to imagine walking down this peaceful path.
Performance Metrics
- Prediction Time: 4.40s
- Total Time: 148.31s
All Input Parameters
{ "image": "https://replicate.delivery/pbxt/NtGUwxfmzk4qfbMgLKK6MzkrCFF8TQZyONYXEt09cUpLMEab/field.jpeg", "question": "Describe the image.", "temperature": 0, "max_new_tokens": 1024 }
Input Parameters
- image (required): A single input image (the example above passes a URL).
- question: The question or prompt about the image.
- temperature: Sampling temperature. Set to 0 for greedy decoding.
- max_new_tokens: Maximum number of new tokens to generate.
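The effect of the temperature setting can be illustrated with a small, self-contained Python sketch. It is a generic illustration of greedy versus temperature sampling for a single decoding step, not this model's actual decoding code; the function name `sample_next_token` and the toy logits are hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature, rng=None):
    """Pick the index of the next token from raw logits (hypothetical helper).

    temperature == 0 -> greedy decoding: always the highest-scoring token.
    temperature > 0  -> sample from softmax(logits / temperature).
    """
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Greedy: deterministic argmax (first index wins on ties).
        return max(range(len(logits)), key=lambda i: logits[i])
    if rng is None:
        rng = random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

For example, `sample_next_token([1.0, 3.0, 2.0], 0)` returns 1 on every call, which is why temperature 0 gives repeatable output. A positive temperature can return any index; higher values flatten the distribution and lower values concentrate it on the top token.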
Output Schema
- Output: generated text answering the question about the image
Example Execution Logs
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
Version Details
- Version ID: 0bbc267266cd70dfb9fc411b3918869d34588adbfa6527b02112648a75681891
- Version Created: October 15, 2025
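To call this exact version programmatically, one option is the Replicate Python client. The sketch below assumes the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set. `build_input` and `run_pinned` are hypothetical helpers. The input checks (non-negative temperature, positive integer `max_new_tokens`) are assumptions drawn from the parameter descriptions, not documented limits. No server-side defaults are assumed, so only the parameters you pass are sent.

```python
# Minimal sketch, assuming the official `replicate` Python client and a
# REPLICATE_API_TOKEN environment variable. Helper names are hypothetical.
MODEL = "lucataco/nemotron-nano-vl-8b-v1"
VERSION_ID = "0bbc267266cd70dfb9fc411b3918869d34588adbfa6527b02112648a75681891"


def build_input(image, question=None, temperature=None, max_new_tokens=None):
    """Assemble the input dict, including only the parameters that are set."""
    if not image:
        raise ValueError("image is required")
    payload = {"image": image}
    if question is not None:
        payload["question"] = question
    if temperature is not None:
        # Assumption: negative temperatures are rejected; 0 means greedy.
        if temperature < 0:
            raise ValueError("temperature must be >= 0")
        payload["temperature"] = temperature
    if max_new_tokens is not None:
        # Assumption: must be a positive integer.
        if not isinstance(max_new_tokens, int) or max_new_tokens < 1:
            raise ValueError("max_new_tokens must be a positive integer")
        payload["max_new_tokens"] = max_new_tokens
    return payload


def run_pinned(payload):
    """Run the pinned version. Needs network access and an API token."""
    import replicate  # imported lazily so build_input works without it

    output = replicate.run(f"{MODEL}:{VERSION_ID}", input=payload)
    # Assumption: output is a string, or an iterable of text chunks if the
    # model streams; join chunks into one string either way.
    return output if isinstance(output, str) else "".join(output)
```

Calling `build_input(image_url, question="Describe the image.", temperature=0, max_new_tokens=1024)` reproduces the example input parameters shown earlier. Pinning the full version ID, rather than the bare model name, keeps results tied to this release.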