hexiaochun/minicpm_v26 📝🖼️❓ → 📝
About
MiniCPM video understanding
Example Output
Output
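The video shows several scenes that emphasize the importance of learning and practice. First, a person wearing an apron is marking something with blue tape, suggesting a renovation or art project. Next, a group holds a meeting in an office setting, emphasizing teamwork and knowledge sharing. Then the video shows a person shooting a basketball on a court, symbolizing learning and growth through practice. Finally, a person is shown sprinkling powdered sugar in a kitchen, possibly baking, emphasizing the application of knowledge in practice. Together, these scenes convey the importance of continually learning and practicing knowledge, whether in personal growth or professional development.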
Performance Metrics
- Prediction Time: 24.42s
- Total Time: 24.43s
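As a rough cross-check, the prediction time can be compared with the stages that the execution log on this page itemises. The sketch below sums the fourteen per-keyframe "image encoded in ... ms by CLIP" values and the `llama_print_timings` total time, all copied from that log. The variable names are illustrative, and whatever remains is simply time the log excerpt does not break down; attributing it to any particular step would be a guess.

```python
# Per-keyframe "image encoded in ... ms by CLIP" values, copied from the execution log.
clip_encode_ms = [
    635.01, 623.90, 630.69, 622.69, 638.89, 624.50, 639.31,
    615.20, 630.41, 631.18, 629.28, 632.16, 615.14, 615.48,
]
llama_total_ms = 1563.14  # "llama_print_timings: total time" from the log
prediction_s = 24.42      # "Prediction Time" reported on the page

clip_total_s = sum(clip_encode_ms) / 1000.0
generation_s = llama_total_ms / 1000.0
# Time not itemised by these two log entries (cause not stated in the log).
unaccounted_s = prediction_s - clip_total_s - generation_s

print(f"CLIP encoding of {len(clip_encode_ms)} keyframes: {clip_total_s:.2f} s")
print(f"Text generation: {generation_s:.2f} s")
print(f"Not itemised in the log: {unaccounted_s:.2f} s")
```

On these numbers, image encoding takes about 8.78 s and generation about 1.56 s, leaving roughly 14 s of the 24.42 s prediction outside the itemised stages.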
All Input Parameters
{ "prompt": "", "file_url": "https://replicate.delivery/pbxt/LajEeMwVnZJlbenXHUvj2MJzl5fnmwjr5J5LjPYMn9reMqqj/1712390716301.mp4", "file_type": "video" }
Input Parameters
- prompt: Prompt text
- file_url (required): URL of the input file (image or video)
- file_type: Input type
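A minimal sketch of how these parameters might be sent from Python, assuming the third-party `replicate` client is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `describe` are hypothetical, and the `"video"` and empty-prompt defaults only mirror the example input on this page; they are not the model's documented defaults.

```python
MODEL = "hexiaochun/minicpm_v26"
# Version ID taken from the "Version Details" section of this page.
VERSION = "448756a7ca13d59b63384116ce755a677c3b5526173b5585bb58ab221f844d7f"


def build_input(file_url, file_type="video", prompt=""):
    """Build the input payload; only file_url is marked as required."""
    if not file_url:
        raise ValueError("file_url is required")
    return {"prompt": prompt, "file_url": file_url, "file_type": file_type}


def describe(file_url, file_type="video", prompt=""):
    """Run the model remotely (hypothetical helper; needs network access)."""
    # Imported lazily so build_input can be used without the client installed.
    import replicate  # assumption: installed via `pip install replicate`

    return replicate.run(
        f"{MODEL}:{VERSION}",
        input=build_input(file_url, file_type, prompt),
    )


# Building the payload needs no network access; the URL is a placeholder.
payload = build_input("https://example.com/clip.mp4")
print(payload)
```

Calling `describe(...)` with a reachable video URL would submit a prediction; the exact shape of the returned value is not documented on this page, so treat it as opaque until inspected.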
Output Schema
Output
Example Execution Logs
video_duration 25.166667 frame_rate 0.47682118573746773 url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-001.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-002.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-003.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-004.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-005.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-006.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-007.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-008.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-009.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-010.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-011.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-012.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-013.jpg url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-014.jpg <|im_start|>system You are an assistant who perfectly describes videos.<|im_end|> <|im_start|>user 
file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-001.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-002.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-003.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-004.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-005.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-006.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-007.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-008.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-009.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-010.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-011.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-012.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-013.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-014.jpg请描述图片/视频的主要内容。<|im_end|> <|im_start|>assistant uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 76.18 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 68.48 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 68.30 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 67.74 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 67.90 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 70.24 ms 
clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 69.58 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 67.44 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 72.91 ms encode_image_with_clip: all 9 segments encoded in 629.70 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 635.01 ms by CLIP ( 1.10 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 75.07 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 67.31 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 67.79 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 67.09 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 68.31 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 68.71 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 68.09 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 69.19 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 68.31 ms encode_image_with_clip: all 9 segments encoded in 620.75 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens 
encode_image_with_clip: image encoded in 623.90 ms by CLIP ( 1.08 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 75.80 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 68.15 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 67.69 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 67.95 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 67.96 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 66.99 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 68.83 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 72.49 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 69.42 ms encode_image_with_clip: all 9 segments encoded in 626.20 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 630.69 ms by CLIP ( 1.09 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 
420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 75.24 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 67.14 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 67.72 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 67.78 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 67.05 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 69.41 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 67.53 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 68.79 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 68.12 ms encode_image_with_clip: all 9 segments encoded in 619.69 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 622.69 ms by CLIP ( 1.08 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 76.58 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 72.30 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 69.15 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 
of 9 encoded in 68.02 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 68.91 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 69.00 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 68.73 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 69.57 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 72.13 ms encode_image_with_clip: all 9 segments encoded in 635.34 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 638.89 ms by CLIP ( 1.11 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 75.07 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 67.20 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 66.82 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 67.00 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 68.91 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 67.78 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 69.76 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 68.91 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded 
in 69.41 ms encode_image_with_clip: all 9 segments encoded in 621.76 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 624.50 ms by CLIP ( 1.08 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 82.98 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 70.69 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 69.41 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 68.73 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 69.15 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 71.77 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 68.08 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 67.70 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 67.07 ms encode_image_with_clip: all 9 segments encoded in 636.33 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 639.31 ms by CLIP ( 1.11 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 
952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 75.25 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 67.76 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 66.55 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 66.54 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 67.33 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 67.30 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 66.60 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 67.22 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 66.98 ms encode_image_with_clip: all 9 segments encoded in 612.47 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 615.20 ms by CLIP ( 1.07 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 75.58 ms clip_image_build_graph: 476 420 encode_image_with_clip: 
step 2 of 9 encoded in 67.99 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 69.02 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 67.96 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 67.24 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 66.92 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 67.82 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 68.33 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 74.76 ms encode_image_with_clip: all 9 segments encoded in 626.42 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 630.41 ms by CLIP ( 1.09 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 76.98 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 68.00 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 68.49 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 67.54 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 68.20 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 68.70 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 
encoded in 69.91 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 70.58 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 69.32 ms encode_image_with_clip: all 9 segments encoded in 628.41 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 631.18 ms by CLIP ( 1.10 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 77.13 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 68.67 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 69.19 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 68.17 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 68.60 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 68.87 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 68.22 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 68.45 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 68.14 ms encode_image_with_clip: all 9 segments encoded in 626.37 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 629.28 ms by CLIP ( 1.09 ms per image patch) 
uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 76.45 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 68.84 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 68.41 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 69.91 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 68.76 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 69.66 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 68.41 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 68.61 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 68.33 ms encode_image_with_clip: all 9 segments encoded in 628.53 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 632.16 ms by CLIP ( 1.10 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 
clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 75.76 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 66.59 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 67.18 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 67.11 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 5 of 9 encoded in 67.03 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 66.82 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 66.88 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 67.33 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 66.89 ms encode_image_with_clip: all 9 segments encoded in 612.42 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 615.14 ms by CLIP ( 1.07 ms per image patch) uhd_slice_image: multiple 9 uhd_slice_image: image_size: 1080 1920; source_image size: 336 602 uhd_slice_image: image_size: 1080 1920; best_grid: 2 4 uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680 clip_image_preprocess: 336 602 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_preprocess: 476 420 clip_image_build_graph: 336 602 encode_image_with_clip: step 1 of 9 encoded in 74.95 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 2 of 9 encoded in 66.29 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 3 of 9 encoded in 67.42 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 4 of 9 encoded in 66.91 ms clip_image_build_graph: 476 420 
encode_image_with_clip: step 5 of 9 encoded in 66.38 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 6 of 9 encoded in 66.38 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 7 of 9 encoded in 66.28 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 8 of 9 encoded in 66.59 ms clip_image_build_graph: 476 420 encode_image_with_clip: step 9 of 9 encoded in 69.87 ms encode_image_with_clip: all 9 segments encoded in 612.04 ms encode_image_with_clip: load_image_size 1080 1920 encode_image_with_clip: image embedding created: 576 tokens encode_image_with_clip: image encoded in 615.48 ms by CLIP ( 1.07 ms per image patch) Llama.generate: 8093 prefix-match hit, remaining 1 prompt tokens to eval llama_print_timings: load time = 20518.66 ms llama_print_timings: sample time = 11.62 ms / 114 runs ( 0.10 ms per token, 9807.30 tokens per second) llama_print_timings: prompt eval time = 0.00 ms / 0 tokens ( -nan ms per token, -nan tokens per second) llama_print_timings: eval time = 1460.56 ms / 114 runs ( 12.81 ms per token, 78.05 tokens per second) llama_print_timings: total time = 1563.14 ms / 114 tokens {'id': 'chatcmpl-d42d4cad-fda8-421c-bfa6-aabe064ddd65', 'object': 'chat.completion', 'created': 1725860551, 'model': './models/MiniCPM-V-2_6/ggml-model-Q4_K_M.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '视频展示了几个场景,强调了学习和实践的重要性。首先,一个穿着围裙的人正在用蓝色胶带做标记,暗示着一个装修或艺术项目。接着,一个小组在办公室环境中开会,强调了团队合作和知识分享。然后,视频展示了一个人在篮球场上投篮,象征着通过实践学习和成长。最后,展示了一个人在厨房里撒糖粉,可能是在烘焙,强调了将知识应用于实践。这些场景共同传达了不断学习和实践知识的重要性,无论是在个人成长还是职业发展中。'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 8094, 'completion_tokens': 113, 'total_tokens': 8207}}
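The numbers in this log are internally consistent, and a few lines of arithmetic make that visible. The sketch below uses only values copied from the log; the variable names and the interpretation in the comments (for example, that the 30 leftover prompt tokens are text and template tokens, or that tokens are split evenly across segments) are assumptions, not something the log states.

```python
# --- Frame sampling ---
video_duration = 25.166667
frame_rate = 0.47682118573746773
target_frames = video_duration * frame_rate  # very close to 12
keyframes_in_log = 14  # keyframe-001.jpg through keyframe-014.jpg

# --- Image slicing (per keyframe) ---
refine_size = (952, 1680)  # "refine_image_size"
best_grid = (2, 4)         # "best_grid"
slice_size = (refine_size[0] // best_grid[0], refine_size[1] // best_grid[1])
# 8 grid slices plus the 336 x 602 downscaled overview give the
# 9 segments reported by "all 9 segments encoded".
segments = best_grid[0] * best_grid[1] + 1

# --- Token accounting ---
tokens_per_image = 576  # "image embedding created: 576 tokens"
tokens_per_segment = tokens_per_image // segments  # assumes an even split
image_tokens = keyframes_in_log * tokens_per_image
prompt_tokens, completion_tokens, total_tokens = 8094, 113, 8207
non_image_tokens = prompt_tokens - image_tokens  # presumably text and template

print(f"target frames ~ {target_frames:.3f}, keyframes extracted: {keyframes_in_log}")
print(f"slice size: {slice_size}, segments per keyframe: {segments}")
print(f"image tokens: {image_tokens}, other prompt tokens: {non_image_tokens}")
```

The slice size works out to 476 x 420, matching the `clip_image_preprocess` lines, and 14 x 576 = 8064 image tokens account for all but 30 of the 8094 prompt tokens. The product of duration and frame rate is about 12, while the log lists 14 keyframes; the log does not explain the difference.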
Version Details
- Version ID: 448756a7ca13d59b63384116ce755a677c3b5526173b5585bb58ab221f844d7f
- Version Created: September 9, 2024