lucataco/internvl3_5-30b 🔢🖼️📝✓ → 📝
About
An advanced open-source multimodal large language model from the InternVL3.5 family, specializing in versatile vision-language tasks and enhanced reasoning.
Example Output
Prompt:
"Please describe this video in detail"
Output
In a sunlit park with lush greenery and people in the background, a monkey is seen riding a miniature skateboard. The skateboard has a vibrant design with red wheels and a black deck. The monkey appears focused, skillfully balancing and propelling itself forward. The environment is lively yet relaxed, with people sitting under umbrellas and enjoying the park. The light is bright, indicating daytime, casting soft shadows on the pavement. The monkey's fur is a mix of light brown and beige, and it seems to be enjoying the ride, navigating the skateboard with apparent ease. The path is clear, surrounded by trees and benches, creating a serene atmosphere. The colors are vivid, with the green of the trees contrasting against the warm tones of the monkey and the colorful skateboard.
Performance Metrics
Prediction Time: 28.18s
Total Time: 28.19s
All Input Parameters
{
  "top_k": 50,
  "top_p": 0.95,
  "video": "https://replicate.delivery/pbxt/NviKBcXqpiFp2k6G0h8LALACiuE6vkw9T7oXb2ycpvQ6Bsma/monkey_on_skates.mp4",
  "prompt": "Please describe this video in detail",
  "num_beams": 1,
  "max_frames": 8,
  "temperature": 0.2,
  "thinking_mode": false,
  "max_new_tokens": 1024,
  "repetition_penalty": 1
}
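As a rough sketch, the input above could be assembled and sent through the Replicate Python client as follows. The helper names `build_input` and `run_prediction` and the `EXAMPLE_SETTINGS` dict are hypothetical and only for illustration. The settings are copied from this example run and are not necessarily the model's defaults. Rejecting a call that passes both `video` and `images` is an assumption based on the parameter description, not documented behavior.

```python
# Hypothetical helpers for illustration; not part of the model or the client.
MODEL = "lucataco/internvl3_5-30b"
VERSION = "92713bb56e2e4827477501fc512d80d7287bc34e4ccb7c306afd9bdaef8b7eeb"

# Values copied from the example run above (assumed, not confirmed defaults).
EXAMPLE_SETTINGS = {
    "top_k": 50,
    "top_p": 0.95,
    "num_beams": 1,
    "max_frames": 8,
    "temperature": 0.2,
    "thinking_mode": False,
    "max_new_tokens": 1024,
    "repetition_penalty": 1,
}


def build_input(prompt, video=None, images=None, **overrides):
    """Build the input dict for one prediction."""
    # Assumption: `video` is an alternative to `images`, so passing both is rejected.
    if video is not None and images is not None:
        raise ValueError("pass either `video` or `images`, not both")
    unknown = set(overrides) - set(EXAMPLE_SETTINGS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {**EXAMPLE_SETTINGS, **overrides, "prompt": prompt}
    if video is not None:
        payload["video"] = video
    if images is not None:
        payload["images"] = images
    return payload


def run_prediction(payload):
    """Send the payload to Replicate (needs network access and an API token)."""
    import replicate  # third-party client; reads REPLICATE_API_TOKEN from the environment

    return replicate.run(f"{MODEL}:{VERSION}", input=payload)
```

Building the payload separately from the network call makes it possible to inspect or log the parameters before spending a prediction.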
Input Parameters
- top_k: Top-k sampling parameter (number of highest-probability tokens considered at each step)
- top_p: Top-p (nucleus) sampling parameter (cumulative probability cutoff)
- video: Input video file for video understanding (alternative to images)
- images: Input image(s); can be a single image or a list of images (for multi-image understanding)
- prompt: Text prompt or question about the image(s) or video
- num_beams: Number of beams for beam search (1 = no beam search)
- max_frames: Maximum number of frames to extract from the video (evenly spaced)
- temperature: Temperature for sampling (higher = more random)
- thinking_mode: Enable thinking mode for step-by-step reasoning
- max_new_tokens: Maximum number of tokens to generate
- repetition_penalty: Repetition penalty (1.0 = no penalty)
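To make the interaction between temperature, top_k and top_p concrete, here is a minimal sketch of one common formulation of these filters applied to a toy list of logits. The function `filter_distribution` is hypothetical, and the order of operations (temperature, then top-k, then top-p) is an assumption; the model's actual decoding code is not shown on this page.

```python
import math


def filter_distribution(logits, temperature=0.2, top_k=50, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering.

    Illustrative only; assumes temperature > 0 and a non-empty list of logits.
    """
    # Temperature scaling: lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens, then renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_total = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_total
        if cumulative >= top_p:
            break

    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


# With the low temperature used in the example run, almost all of the
# probability mass lands on the top token, so top-p keeps only that one.
print(filter_distribution([2.0, 1.0, 0.0], temperature=0.2))
```

A token is then sampled from the returned distribution. With temperature 0.2, as in the example run, the output is close to deterministic even though sampling is enabled.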
Output Schema
Output
Example Execution Logs
Processing video: /tmp/tmpqy2x4p3imonkey_on_skates.mp4
Extracted 8 frames from video
Processing 8 image(s)...
Generating response with max_new_tokens=1024, temperature=0.2...
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
Generated response...
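The "Extracted 8 frames from video" line reflects max_frames=8 in the example input. One plausible way to pick evenly spaced frames is sketched below. The function `evenly_spaced_indices` is hypothetical, and the endpoint-inclusive spacing is an assumption; the model's actual extraction code is not shown on this page.

```python
def evenly_spaced_indices(total_frames, max_frames=8):
    """Pick up to max_frames frame indices spread evenly over a video.

    Illustrative sketch: includes the first and last frame, which is an
    assumption about how "evenly spaced" is interpreted.
    """
    if total_frames <= 0 or max_frames <= 0:
        return []
    count = min(total_frames, max_frames)
    if count == 1:
        return [0]
    step = (total_frames - 1) / (count - 1)
    return [round(i * step) for i in range(count)]


# A 240-frame clip sampled down to 8 frames.
print(evenly_spaced_indices(240, max_frames=8))
```

Under this scheme, a video shorter than max_frames simply yields every frame, so the number of images passed to the model never exceeds max_frames.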
Version Details
- Version ID: 92713bb56e2e4827477501fc512d80d7287bc34e4ccb7c306afd9bdaef8b7eeb
- Version Created: October 22, 2025