hexiaochun/minicpm_v26 📝🖼️❓ → 📝
About
minicpm video understanding
Example Output
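The video shows several scenes that emphasize the importance of learning and practice. First, a person wearing an apron is marking something with blue tape, suggesting a renovation or art project. Next, a group holds a meeting in an office, highlighting teamwork and knowledge sharing. Then the video shows a person shooting a basketball on a court, symbolizing learning and growth through practice. Finally, a person sprinkles powdered sugar in a kitchen, possibly while baking, emphasizing putting knowledge into practice. Together, these scenes convey the importance of continually learning and applying knowledge, whether for personal growth or career development.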
Performance Metrics
Prediction Time: 24.42s
Total Time: 24.43s
All Input Parameters
{
"prompt": "",
"file_url": "https://replicate.delivery/pbxt/LajEeMwVnZJlbenXHUvj2MJzl5fnmwjr5J5LjPYMn9reMqqj/1712390716301.mp4",
"file_type": "video"
}
Input Parameters
- prompt: prompt text
- file_url (required): URL of the input file (image or video)
- file_type: input type
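The parameters above map directly onto a prediction request. The following is a minimal sketch. It assumes the official `replicate` Python client (`replicate.run(ref, input=...)`) and a `REPLICATE_API_TOKEN` environment variable. `build_input` and `describe` are hypothetical helper names. The page only shows "video" as a `file_type` value, so accepting "image" is an assumption based on the file_url description.

```python
from typing import Any

# Model reference built from the Version ID listed under Version Details.
MODEL_REF = (
    "hexiaochun/minicpm_v26:"
    "448756a7ca13d59b63384116ce755a677c3b5526173b5585bb58ab221f844d7f"
)

def build_input(file_url: str, file_type: str = "video", prompt: str = "") -> dict:
    """Return an input dict shaped like the example's "All Input Parameters"."""
    if not file_url:
        raise ValueError("file_url is required")
    # Assumption: file_type is "image" or "video"; the page only demonstrates "video".
    if file_type not in ("image", "video"):
        raise ValueError(f"unexpected file_type: {file_type!r}")
    return {"prompt": prompt, "file_url": file_url, "file_type": file_type}

def describe(file_url: str, file_type: str = "video", prompt: str = "") -> Any:
    """Submit a prediction (hypothetical helper; needs network access and an API token)."""
    import replicate  # imported lazily so build_input works without the client installed
    # The Output Schema is just "Output"; its exact Python type is not documented here.
    return replicate.run(MODEL_REF, input=build_input(file_url, file_type, prompt))
```

Leaving `prompt` empty, as in the example, appears to fall back to a built-in instruction. The execution log shows a default Chinese request to describe the image/video.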
Output Schema
Output
Example Execution Logs
video_duration 25.166667
frame_rate 0.47682118573746773
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-001.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-002.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-003.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-004.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-005.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-006.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-007.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-008.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-009.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-010.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-011.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-012.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-013.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-014.jpg
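The two header lines of the log can be cross-checked against the keyframe list. Multiplying video_duration by frame_rate gives 12.0 to within floating-point noise. The logged rate therefore looks like 12 frames spread over the clip, though that reading is an inference. The list above nonetheless names 14 keyframe files, so the extractor produced two more frames than the product implies. The variable names below are only labels for the logged values.

```python
video_duration = 25.166667          # seconds, from "video_duration"
frame_rate = 0.47682118573746773    # frames per second, from "frame_rate"
keyframes_logged = 14               # keyframe-001.jpg ... keyframe-014.jpg

implied_frames = video_duration * frame_rate   # frames the rate alone would yield
spacing_s = 1 / frame_rate                     # average gap between sampled frames

print(f"duration x rate = {implied_frames:.6f}")
print(f"mean spacing    = {spacing_s:.3f} s")
print(f"extra frames    = {keyframes_logged - round(implied_frames)}")
```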
<|im_start|>system
You are an assistant who perfectly describes videos.<|im_end|>
<|im_start|>user
file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-001.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-002.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-003.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-004.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-005.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-006.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-007.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-008.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-009.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-010.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-011.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-012.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-013.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-014.jpg请描述图片/视频的主要内容。<|im_end|>
<|im_start|>assistant
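The prompt above uses the ChatML layout: each turn opens with `<|im_start|>role`, ends with `<|im_end|>`, and a trailing `<|im_start|>assistant` invites the reply. The user turn holds the 14 keyframe paths followed by 请描述图片/视频的主要内容。, which means "Please describe the main content of the image/video." This is presumably a default, since `prompt` was empty in the input. The sketch below rebuilds that text with a hypothetical `to_chatml` helper. In the real pipeline the paths presumably stand in for image embeddings, so the sketch only illustrates the layout.

```python
RUN_DIR = "file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399"
SYSTEM = "You are an assistant who perfectly describes videos."
DEFAULT_PROMPT = "请描述图片/视频的主要内容。"  # "Please describe the main content of the image/video."

def to_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render each message as one <|im_start|>role ... <|im_end|> block (hypothetical helper)."""
    text = "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # open the assistant turn for generation
    return text

# The log concatenates all keyframe paths with no separator, then appends the instruction.
frames = [f"{RUN_DIR}/keyframe-{i:03d}.jpg" for i in range(1, 15)]
prompt = to_chatml([
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "".join(frames) + DEFAULT_PROMPT},
])
print(prompt)
```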
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 76.18 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 68.48 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 68.30 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 67.74 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 67.90 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 70.24 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 69.58 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 67.44 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 72.91 ms
encode_image_with_clip: all 9 segments encoded in 629.70 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 635.01 ms by CLIP ( 1.10 ms per image patch)
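Each 1080x1920 keyframe is encoded as one downscaled overview image (336x602) plus a 2x4 grid of 476x420 slices, for 9 encoder passes. The sketch below reproduces those sizes. It is modeled on the slicing heuristic that the log's `uhd_slice_image` step is assumed to follow:
- Resize to about 448x448 pixels' worth of area while keeping the aspect ratio.
- Round sides to multiples of a 14-pixel patch.
- Choose the (columns, rows) grid whose shape best matches the image's aspect ratio.

The constants 448, 14 and 9 are assumptions, chosen because they reproduce every size in the block above. The function names are illustrative.

```python
import math

SCALE_RES = 448   # assumed base resolution of the vision encoder
PATCH = 14        # assumed ViT patch size
MAX_SLICES = 9    # matches "uhd_slice_image: multiple 9"

def ensure_divide(length: int, patch: int = PATCH) -> int:
    # Round to the nearest multiple of `patch` (halves round up), but at least one patch.
    return max(int(length / patch + 0.5) * patch, patch)

def find_best_resize(w: int, h: int, allow_upscale: bool = False) -> tuple[int, int]:
    # Rescale to roughly SCALE_RES**2 pixels, keeping the aspect ratio.
    if w * h > SCALE_RES * SCALE_RES or allow_upscale:
        r = w / h
        h = int(SCALE_RES / math.sqrt(r))
        w = int(h * r)
    return ensure_divide(w), ensure_divide(h)

def best_grid(w: int, h: int) -> tuple[int, tuple[int, int]]:
    multiple = min(math.ceil(w * h / (SCALE_RES * SCALE_RES)), MAX_SLICES)
    if multiple <= 1:
        return multiple, (1, 1)  # small images are not sliced
    log_ratio = math.log(w / h)
    grids = []
    for n in (multiple - 1, multiple, multiple + 1):
        if n == 1 or n > MAX_SLICES:
            continue
        grids += [(cols, n // cols) for cols in range(1, n + 1) if n % cols == 0]
    # Pick the (cols, rows) grid whose shape is closest to the image's aspect ratio.
    return multiple, min(grids, key=lambda g: abs(log_ratio - math.log(g[0] / g[1])))

def refine_size(w: int, h: int, grid: tuple[int, int]):
    cols, rows = grid
    cell_w = ensure_divide(w, cols) // cols
    cell_h = ensure_divide(h, rows) // rows
    slice_w, slice_h = find_best_resize(cell_w, cell_h, allow_upscale=True)
    return (slice_w * cols, slice_h * rows), (slice_w, slice_h)

W, H = 1080, 1920                       # "image_size: 1080 1920"
print(find_best_resize(W, H))           # overview ("source_image size")
multiple, grid = best_grid(W, H)
print(multiple, grid)                   # "multiple" and "best_grid"
print(refine_size(W, H, grid))          # "refine_size" and per-slice size
```

Under these assumptions, a 2x4 grid wins because its 1:2 shape is closest to the frame's 9:16 aspect ratio. A 3x3 grid would have matched the 9-slice budget exactly but fits the shape worse.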
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 75.07 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 67.31 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 67.79 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 67.09 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 68.31 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 68.71 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 68.09 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 69.19 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 68.31 ms
encode_image_with_clip: all 9 segments encoded in 620.75 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 623.90 ms by CLIP ( 1.08 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 75.80 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 68.15 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 67.69 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 67.95 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 67.96 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 66.99 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 68.83 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 72.49 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 69.42 ms
encode_image_with_clip: all 9 segments encoded in 626.20 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 630.69 ms by CLIP ( 1.09 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 75.24 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 67.14 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 67.72 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 67.78 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 67.05 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 69.41 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 67.53 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 68.79 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 68.12 ms
encode_image_with_clip: all 9 segments encoded in 619.69 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 622.69 ms by CLIP ( 1.08 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 76.58 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 72.30 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 69.15 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 68.02 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 68.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 69.00 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 68.73 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 69.57 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 72.13 ms
encode_image_with_clip: all 9 segments encoded in 635.34 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 638.89 ms by CLIP ( 1.11 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 75.07 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 67.20 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 66.82 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 67.00 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 68.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 67.78 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 69.76 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 68.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 69.41 ms
encode_image_with_clip: all 9 segments encoded in 621.76 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 624.50 ms by CLIP ( 1.08 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 82.98 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 70.69 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 69.41 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 68.73 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 69.15 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 71.77 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 68.08 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 67.70 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 67.07 ms
encode_image_with_clip: all 9 segments encoded in 636.33 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 639.31 ms by CLIP ( 1.11 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 75.25 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 67.76 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 66.55 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 66.54 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 67.33 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 67.30 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 66.60 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 67.22 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 66.98 ms
encode_image_with_clip: all 9 segments encoded in 612.47 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 615.20 ms by CLIP ( 1.07 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 75.58 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 67.99 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 69.02 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 67.96 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 67.24 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 66.92 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 67.82 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 68.33 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 74.76 ms
encode_image_with_clip: all 9 segments encoded in 626.42 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 630.41 ms by CLIP ( 1.09 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 76.98 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 68.00 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 68.49 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 67.54 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 68.20 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 68.70 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 69.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 70.58 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 69.32 ms
encode_image_with_clip: all 9 segments encoded in 628.41 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 631.18 ms by CLIP ( 1.10 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 77.13 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 68.67 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 69.19 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 68.17 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 68.60 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 68.87 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 68.22 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 68.45 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 68.14 ms
encode_image_with_clip: all 9 segments encoded in 626.37 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 629.28 ms by CLIP ( 1.09 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 76.45 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 68.84 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 68.41 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 69.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 68.76 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 69.66 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 68.41 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 68.61 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 68.33 ms
encode_image_with_clip: all 9 segments encoded in 628.53 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 632.16 ms by CLIP ( 1.10 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 75.76 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 66.59 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 67.18 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 67.11 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 67.03 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 66.82 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 66.88 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 67.33 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 66.89 ms
encode_image_with_clip: all 9 segments encoded in 612.42 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 615.14 ms by CLIP ( 1.07 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in 74.95 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in 66.29 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in 67.42 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in 66.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in 66.38 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in 66.38 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in 66.28 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in 66.59 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in 69.87 ms
encode_image_with_clip: all 9 segments encoded in 612.04 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in 615.48 ms by CLIP ( 1.07 ms per image patch)
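The per-frame blocks are identical in geometry, so the prompt size can be accounted for from the logged numbers alone. 576 tokens over 9 slices is 64 tokens per slice. Fourteen frames give 8064 image tokens, which leaves 30 of the 8094 prompt tokens reported in the final JSON. Attributing those 30 to the chat template and instruction text is an assumption, as is the usage count including image-embedding positions. The 14 CLIP encode times also add up to about 8.8 s.

```python
# Values copied from the log above.
frames = 14
slices_per_frame = 1 + 2 * 4        # overview image + 2x4 grid ("multiple 9")
tokens_per_frame = 576              # "image embedding created: 576 tokens"
prompt_tokens = 8094                # usage.prompt_tokens in the JSON response
clip_ms = [635.01, 623.90, 630.69, 622.69, 638.89, 624.50, 639.31,
           615.20, 630.41, 631.18, 629.28, 632.16, 615.14, 615.48]

tokens_per_slice = tokens_per_frame // slices_per_frame   # 64
image_tokens = frames * tokens_per_frame                  # 8064
text_tokens = prompt_tokens - image_tokens                # 30, assumed template + instruction
clip_total_s = sum(clip_ms) / 1000                        # total CLIP encoding time

print(tokens_per_slice, image_tokens, text_tokens, round(clip_total_s, 2))
```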
Llama.generate: 8093 prefix-match hit, remaining 1 prompt tokens to eval
llama_print_timings: load time = 20518.66 ms
llama_print_timings: sample time = 11.62 ms / 114 runs ( 0.10 ms per token, 9807.30 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 0 tokens ( -nan ms per token, -nan tokens per second)
llama_print_timings: eval time = 1460.56 ms / 114 runs ( 12.81 ms per token, 78.05 tokens per second)
llama_print_timings: total time = 1563.14 ms / 114 tokens
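The `Llama.generate: ... prefix-match hit` line matches what llama-cpp-python prints when most of the prompt is already in its cache. Here 8093 of 8094 tokens were cached, which explains the empty prompt-eval line. The decode figures can be re-derived from the raw counts. They show that token generation took about 1.6 s, roughly 6% of the 24.42 s prediction time reported under Performance Metrics.

```python
runs = 114                # "eval time = 1460.56 ms / 114 runs"
eval_ms = 1460.56
total_ms = 1563.14        # "total time = 1563.14 ms / 114 tokens"
prediction_s = 24.42      # "Prediction Time" from Performance Metrics

ms_per_token = eval_ms / runs                   # decode latency per token
tokens_per_s = runs / (eval_ms / 1000)          # decode throughput
generation_share = (total_ms / 1000) / prediction_s

print(f"{ms_per_token:.2f} ms/token, {tokens_per_s:.2f} tokens/s, "
      f"{generation_share:.1%} of prediction time")
```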
{'id': 'chatcmpl-d42d4cad-fda8-421c-bfa6-aabe064ddd65', 'object': 'chat.completion', 'created': 1725860551, 'model': './models/MiniCPM-V-2_6/ggml-model-Q4_K_M.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '视频展示了几个场景,强调了学习和实践的重要性。首先,一个穿着围裙的人正在用蓝色胶带做标记,暗示着一个装修或艺术项目。接着,一个小组在办公室环境中开会,强调了团队合作和知识分享。然后,视频展示了一个人在篮球场上投篮,象征着通过实践学习和成长。最后,展示了一个人在厨房里撒糖粉,可能是在烘焙,强调了将知识应用于实践。这些场景共同传达了不断学习和实践知识的重要性,无论是在个人成长还是职业发展中。'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 8094, 'completion_tokens': 113, 'total_tokens': 8207}}
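The response line is a Python dict repr, with single quotes and `None`, rather than JSON, so `json.loads` would reject it. `ast.literal_eval` parses it safely because it evaluates literals only. In the sketch below, "content" is shortened to its first sentence and every other field is copied from the log. The `created` timestamp decodes to 2024-09-09 UTC, matching the version date below.

```python
import ast
from datetime import datetime, timezone

raw = (
    "{'id': 'chatcmpl-d42d4cad-fda8-421c-bfa6-aabe064ddd65', 'object': 'chat.completion', "
    "'created': 1725860551, 'model': './models/MiniCPM-V-2_6/ggml-model-Q4_K_M.gguf', "
    "'choices': [{'index': 0, 'message': {'role': 'assistant', "
    "'content': '视频展示了几个场景,强调了学习和实践的重要性。'}, "
    "'logprobs': None, 'finish_reason': 'stop'}], "
    "'usage': {'prompt_tokens': 8094, 'completion_tokens': 113, 'total_tokens': 8207}}"
)

resp = ast.literal_eval(raw)          # literals only; no arbitrary code is executed
choice = resp["choices"][0]
usage = resp["usage"]
created = datetime.fromtimestamp(resp["created"], tz=timezone.utc)

print(choice["finish_reason"], usage["total_tokens"], created.isoformat())
print(choice["message"]["content"])
```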
Version Details
- Version ID: 448756a7ca13d59b63384116ce755a677c3b5526173b5585bb58ab221f844d7f
- Version Created: September 9, 2024