lucataco/nemotron-nano-vl-8b-v1 🖼️📝🔢 → 📝

▶️ 9 runs 📅 Oct 2025 ⚙️ Cog 0.16.8 ⚖️ License
image-analysis image-captioning image-to-text visual-question-answering visual-understanding

About

A document intelligence vision language model (VLM) that can query and summarize images from the physical or virtual world.

Example Output

Output

The image captures a serene scene of a wooden boardwalk meandering through a lush, green field. The boardwalk, constructed from wooden planks, is the central focus of the image, leading the viewer's eye towards the horizon. It's surrounded by tall grass and wildflowers, adding a sense of depth and texture to the scene. The sky above is a clear blue, dotted with fluffy white clouds, suggesting a calm, sunny day. The perspective of the image is from ground level, looking down the boardwalk, which enhances the sense of depth and invites the viewer to imagine walking down this peaceful path.

Performance Metrics

4.40s Prediction Time
148.31s Total Time
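The two figures above differ by a wide margin. The page does not say what accounts for the gap; a plausible reading, stated here as an assumption, is that total time also covers queueing and model start-up, while prediction time covers only the inference call. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def overhead_seconds(prediction_time: float, total_time: float) -> float:
    """Time spent outside the prediction itself, in seconds.

    Assumption: total_time includes prediction_time plus any queue and
    start-up time. The page does not state this breakdown explicitly.
    """
    if total_time < prediction_time:
        raise ValueError("total_time should not be smaller than prediction_time")
    return round(total_time - prediction_time, 2)


# Figures taken from the Performance Metrics section of this page.
overhead = overhead_seconds(prediction_time=4.40, total_time=148.31)
print(f"Overhead: {overhead}s")
```

Under that assumption, nearly all of the total time for this example run was spent outside the prediction itself.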
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/NtGUwxfmzk4qfbMgLKK6MzkrCFF8TQZyONYXEt09cUpLMEab/field.jpeg",
  "question": "Describe the image.",
  "temperature": 0,
  "max_new_tokens": 1024
}
Input Parameters
image (required) Type: string
Single input image
question Type: string; Default: Describe the image.
Question or prompt about the image
temperature Type: number; Default: 0; Range: 0 - 2
Temperature for sampling. Set to 0 for greedy decoding.
max_new_tokens Type: integer; Default: 1024; Range: 1 - 4096
Maximum number of tokens to generate
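The parameters listed above can be assembled and range-checked on the client before a request is sent. The following is a minimal sketch, not the model author's code: build_input and run_prediction are hypothetical helper names, the defaults and ranges are copied from this page, and run_prediction assumes the replicate Python package is installed and a REPLICATE_API_TOKEN environment variable is set.

```python
from typing import Any, Dict

# Model name and version ID as shown on this page.
MODEL_REF = (
    "lucataco/nemotron-nano-vl-8b-v1:"
    "0bbc267266cd70dfb9fc411b3918869d34588adbfa6527b02112648a75681891"
)


def build_input(
    image: str,
    question: str = "Describe the image.",
    temperature: float = 0,
    max_new_tokens: int = 1024,
) -> Dict[str, Any]:
    """Build an input payload, enforcing the documented ranges.

    Hypothetical helper: the checks mirror the parameter list on this page.
    """
    if not image:
        raise ValueError("image is required")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int):
        raise TypeError("max_new_tokens must be an integer")
    if not 1 <= max_new_tokens <= 4096:
        raise ValueError("max_new_tokens must be between 1 and 4096")
    return {
        "image": image,
        "question": question,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }


def run_prediction(payload: Dict[str, Any]) -> Any:
    """Send the payload to Replicate.

    Assumption: the `replicate` package is installed and
    REPLICATE_API_TOKEN is set. Imported lazily so that build_input
    can be used without it.
    """
    import replicate

    return replicate.run(MODEL_REF, input=payload)


if __name__ == "__main__":
    payload = build_input(
        "https://replicate.delivery/pbxt/NtGUwxfmzk4qfbMgLKK6MzkrCFF8TQZyONYXEt09cUpLMEab/field.jpeg"
    )
    print(payload)
```

Leaving temperature at 0 selects greedy decoding, as noted in the parameter description, so repeated runs on the same image and question should be more reproducible than sampled runs.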
Output Schema

Output

Type: string

Example Execution Logs
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
Version Details
Version ID
0bbc267266cd70dfb9fc411b3918869d34588adbfa6527b02112648a75681891
Version Created
October 15, 2025