lucataco/nemotron-nano-vl-8b-v1 🖼️📝🔢 → 📝

▶️ 17 runs 📅 Oct 2025 ⚙️ Cog 0.16.8 ⚖️ License

document-to-json document-understanding image-analysis image-captioning image-to-text ocr visual-question-answering visual-understanding

Performance

4.4s Typical run time

~148s Cold start (first call)

17 Total runs

About

A document intelligence vision language model (VLM) that lets users query and summarize images from the physical or virtual world.

Example Output

The image captures a serene scene of a wooden boardwalk meandering through a lush, green field. The boardwalk, constructed from wooden planks, is the central focus of the image, leading the viewer's eye towards the horizon. It's surrounded by tall grass and wildflowers, adding a sense of depth and texture to the scene. The sky above is a clear blue, dotted with fluffy white clouds, suggesting a calm, sunny day. The perspective of the image is from ground level, looking down the boardwalk, which enhances the sense of depth and invites the viewer to imagine walking down this peaceful path.

Performance Metrics

4.40s Prediction Time

148.31s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/NtGUwxfmzk4qfbMgLKK6MzkrCFF8TQZyONYXEt09cUpLMEab/field.jpeg",
  "question": "Describe the image.",
  "temperature": 0,
  "max_new_tokens": 1024
}

Input Parameters

image (required; Type: string): Single input image
question (Type: string; Default: "Describe the image."): Question or prompt about the image
temperature (Type: number; Default: 0; Range: 0 - 2): Temperature for sampling. Set to 0 for greedy decoding.
max_new_tokens (Type: integer; Default: 1024; Range: 1 - 4096): Maximum number of tokens to generate
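
The defaults and ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not an official helper: `build_input` is a hypothetical name introduced here for illustration, and it only mirrors the parameter names, defaults, and ranges documented on this page.

```python
from typing import Any, Dict


def build_input(
    image: str,
    question: str = "Describe the image.",
    temperature: float = 0.0,
    max_new_tokens: int = 1024,
) -> Dict[str, Any]:
    """Hypothetical helper: build an input payload from the documented parameters."""
    # `image` is the only required parameter.
    if not image:
        raise ValueError("image is required")
    # Documented range for temperature is 0 - 2 (0 means greedy decoding).
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in the range 0 - 2")
    # Documented range for max_new_tokens is 1 - 4096.
    if not 1 <= max_new_tokens <= 4096:
        raise ValueError("max_new_tokens must be in the range 1 - 4096")
    return {
        "image": image,
        "question": question,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }


if __name__ == "__main__":
    # Same image URL as the example input shown on this page.
    payload = build_input(
        "https://replicate.delivery/pbxt/NtGUwxfmzk4qfbMgLKK6MzkrCFF8TQZyONYXEt09cUpLMEab/field.jpeg"
    )
    print(payload)
```

Validating locally is a design choice rather than a requirement: it reports an out-of-range value immediately instead of after a request has been made.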

Output Schema

Output

Type: string

Example Execution Logs

Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.

Version Details

Version ID: 0bbc267266cd70dfb9fc411b3918869d34588adbfa6527b02112648a75681891
Version Created: October 15, 2025
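
Pinning the version ID above keeps results reproducible if a newer version is published later. The sketch below shows one way this might look with the `replicate` Python client. It assumes that the package is installed and that a `REPLICATE_API_TOKEN` environment variable is set; `describe_image` is a hypothetical wrapper name, not part of the client. Given the ~148s cold start, the first call may take far longer than a typical run.

```python
# Owner/name and version ID as listed on this page.
MODEL_NAME = "lucataco/nemotron-nano-vl-8b-v1"
VERSION_ID = "0bbc267266cd70dfb9fc411b3918869d34588adbfa6527b02112648a75681891"
MODEL_REF = f"{MODEL_NAME}:{VERSION_ID}"


def describe_image(image_url: str, question: str = "Describe the image.") -> str:
    """Hypothetical wrapper: run the pinned version and return the text answer."""
    # Imported lazily so this module loads even where the client is not installed.
    import replicate  # third-party client; reads REPLICATE_API_TOKEN from the environment

    output = replicate.run(
        MODEL_REF,
        input={
            "image": image_url,
            "question": question,
            "temperature": 0,
            "max_new_tokens": 1024,
        },
    )
    # The output schema is a string; joining covers the case (an assumption)
    # where the client yields the text in chunks instead.
    return output if isinstance(output, str) else "".join(str(chunk) for chunk in output)
```

Calling `describe_image` with the example image URL from this page should return a caption similar to the example output, although the exact wording is not guaranteed.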
