hexiaochun/minicpm_v26 📝🖼️❓ → 📝

▶️ 518 runs 📅 Sep 2024 ⚙️ Cog 0.9.20
image-to-text video-to-text

About

MiniCPM video understanding

Example Output

Output

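The video shows several scenes that emphasize the importance of learning and practice. First, a person wearing an apron is marking something with blue tape, suggesting a renovation or art project. Next, a group holds a meeting in an office setting, emphasizing teamwork and knowledge sharing. Then the video shows a person shooting a basketball on a court, symbolizing learning and growth through practice. Finally, a person is shown sprinkling powdered sugar in a kitchen, possibly baking, emphasizing the application of knowledge in practice. Together, these scenes convey the importance of continually learning and putting knowledge into practice, whether in personal growth or professional development.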

Performance Metrics

24.42s Prediction Time
24.43s Total Time
All Input Parameters
{
  "prompt": "",
  "file_url": "https://replicate.delivery/pbxt/LajEeMwVnZJlbenXHUvj2MJzl5fnmwjr5J5LjPYMn9reMqqj/1712390716301.mp4",
  "file_type": "video"
}
Input Parameters
prompt Type: string Default: (empty string)
Prompt text
file_url (required) Type: string
URL of the input file (image or video)
file_type Default: image
Input type
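A minimal sketch of how these three parameters might be assembled and sent with the `replicate` Python client. The helper names `build_input` and `describe` are hypothetical, and the set of accepted `file_type` values (`image`, `video`) is an assumption based on the parameter descriptions and the example input on this page; the page does not list them explicitly. The version string combines the model name with the Version ID shown under Version Details.

```python
# Assumption: file_type accepts only these two values (inferred from the page).
ALLOWED_FILE_TYPES = ("image", "video")

# Model reference built from the model name and the Version ID on this page.
MODEL_VERSION = (
    "hexiaochun/minicpm_v26:"
    "448756a7ca13d59b63384116ce755a677c3b5526173b5585bb58ab221f844d7f"
)


def build_input(file_url: str, file_type: str = "image", prompt: str = "") -> dict:
    """Build the input payload; hypothetical helper, not part of any library."""
    if not file_url:
        raise ValueError("file_url is required")
    if file_type not in ALLOWED_FILE_TYPES:
        raise ValueError(f"file_type must be one of {ALLOWED_FILE_TYPES}")
    return {"prompt": prompt, "file_url": file_url, "file_type": file_type}


def describe(file_url: str, file_type: str = "image", prompt: str = ""):
    """Run the model remotely; needs `pip install replicate` and REPLICATE_API_TOKEN."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_VERSION, input=build_input(file_url, file_type, prompt))


if __name__ == "__main__":
    payload = build_input(
        "https://replicate.delivery/pbxt/LajEeMwVnZJlbenXHUvj2MJzl5fnmwjr5J5LjPYMn9reMqqj/1712390716301.mp4",
        file_type="video",
    )
    print(payload)
```

Validating `file_type` locally is a design choice to fail before a remote prediction is started; if the model accepts other values, the tuple would need to be widened.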
Output Schema

Output

Type: string

Example Execution Logs
video_duration 25.166667
frame_rate 0.47682118573746773
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-001.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-002.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-003.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-004.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-005.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-006.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-007.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-008.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-009.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-010.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-011.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-012.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-013.jpg
url file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-014.jpg
<|im_start|>system
You are an assistant who perfectly describes videos.<|im_end|>
<|im_start|>user
file:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-001.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-002.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-003.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-004.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-005.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-006.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-007.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-008.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-009.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-010.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-011.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-012.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-013.jpgfile:///tmp/10dc9111-b11f-47dc-9423-e82b3d253399/keyframe-014.jpg请描述图片/视频的主要内容。<|im_end|>
<|im_start|>assistant
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    76.18 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    68.48 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    68.30 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    67.74 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    67.90 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    70.24 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    69.58 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    67.44 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    72.91 ms
encode_image_with_clip: all 9 segments encoded in   629.70 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   635.01 ms by CLIP (    1.10 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    75.07 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    67.31 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    67.79 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    67.09 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    68.31 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    68.71 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    68.09 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    69.19 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    68.31 ms
encode_image_with_clip: all 9 segments encoded in   620.75 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   623.90 ms by CLIP (    1.08 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    75.80 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    68.15 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    67.69 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    67.95 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    67.96 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    66.99 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    68.83 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    72.49 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    69.42 ms
encode_image_with_clip: all 9 segments encoded in   626.20 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   630.69 ms by CLIP (    1.09 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    75.24 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    67.14 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    67.72 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    67.78 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    67.05 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    69.41 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    67.53 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    68.79 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    68.12 ms
encode_image_with_clip: all 9 segments encoded in   619.69 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   622.69 ms by CLIP (    1.08 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    76.58 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    72.30 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    69.15 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    68.02 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    68.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    69.00 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    68.73 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    69.57 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    72.13 ms
encode_image_with_clip: all 9 segments encoded in   635.34 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   638.89 ms by CLIP (    1.11 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    75.07 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    67.20 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    66.82 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    67.00 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    68.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    67.78 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    69.76 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    68.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    69.41 ms
encode_image_with_clip: all 9 segments encoded in   621.76 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   624.50 ms by CLIP (    1.08 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    82.98 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    70.69 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    69.41 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    68.73 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    69.15 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    71.77 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    68.08 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    67.70 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    67.07 ms
encode_image_with_clip: all 9 segments encoded in   636.33 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   639.31 ms by CLIP (    1.11 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    75.25 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    67.76 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    66.55 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    66.54 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    67.33 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    67.30 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    66.60 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    67.22 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    66.98 ms
encode_image_with_clip: all 9 segments encoded in   612.47 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   615.20 ms by CLIP (    1.07 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    75.58 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    67.99 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    69.02 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    67.96 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    67.24 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    66.92 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    67.82 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    68.33 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    74.76 ms
encode_image_with_clip: all 9 segments encoded in   626.42 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   630.41 ms by CLIP (    1.09 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    76.98 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    68.00 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    68.49 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    67.54 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    68.20 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    68.70 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    69.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    70.58 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    69.32 ms
encode_image_with_clip: all 9 segments encoded in   628.41 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   631.18 ms by CLIP (    1.10 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    77.13 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    68.67 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    69.19 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    68.17 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    68.60 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    68.87 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    68.22 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    68.45 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    68.14 ms
encode_image_with_clip: all 9 segments encoded in   626.37 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   629.28 ms by CLIP (    1.09 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    76.45 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    68.84 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    68.41 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    69.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    68.76 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    69.66 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    68.41 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    68.61 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    68.33 ms
encode_image_with_clip: all 9 segments encoded in   628.53 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   632.16 ms by CLIP (    1.10 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    75.76 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    66.59 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    67.18 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    67.11 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    67.03 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    66.82 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    66.88 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    67.33 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    66.89 ms
encode_image_with_clip: all 9 segments encoded in   612.42 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   615.14 ms by CLIP (    1.07 ms per image patch)
uhd_slice_image: multiple 9
uhd_slice_image: image_size: 1080 1920; source_image size: 336 602
uhd_slice_image: image_size: 1080 1920; best_grid: 2 4
uhd_slice_image: refine_image_size: 952 1680; refine_size: 952 1680
clip_image_preprocess: 336 602
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_preprocess: 476 420
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 9 encoded in    74.95 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 2 of 9 encoded in    66.29 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 3 of 9 encoded in    67.42 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 4 of 9 encoded in    66.91 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 5 of 9 encoded in    66.38 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 6 of 9 encoded in    66.38 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 7 of 9 encoded in    66.28 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 8 of 9 encoded in    66.59 ms
clip_image_build_graph: 476 420
encode_image_with_clip: step 9 of 9 encoded in    69.87 ms
encode_image_with_clip: all 9 segments encoded in   612.04 ms
encode_image_with_clip: load_image_size 1080 1920
encode_image_with_clip: image embedding created: 576 tokens
encode_image_with_clip: image encoded in   615.48 ms by CLIP (    1.07 ms per image patch)
Llama.generate: 8093 prefix-match hit, remaining 1 prompt tokens to eval
llama_print_timings:        load time =   20518.66 ms
llama_print_timings:      sample time =      11.62 ms /   114 runs   (    0.10 ms per token,  9807.30 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1460.56 ms /   114 runs   (   12.81 ms per token,    78.05 tokens per second)
llama_print_timings:       total time =    1563.14 ms /   114 tokens
{'id': 'chatcmpl-d42d4cad-fda8-421c-bfa6-aabe064ddd65', 'object': 'chat.completion', 'created': 1725860551, 'model': './models/MiniCPM-V-2_6/ggml-model-Q4_K_M.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '视频展示了几个场景,强调了学习和实践的重要性。首先,一个穿着围裙的人正在用蓝色胶带做标记,暗示着一个装修或艺术项目。接着,一个小组在办公室环境中开会,强调了团队合作和知识分享。然后,视频展示了一个人在篮球场上投篮,象征着通过实践学习和成长。最后,展示了一个人在厨房里撒糖粉,可能是在烘焙,强调了将知识应用于实践。这些场景共同传达了不断学习和实践知识的重要性,无论是在个人成长还是职业发展中。'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 8094, 'completion_tokens': 113, 'total_tokens': 8207}}
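The figures in the log above can be cross-checked with a little arithmetic. The sketch below is one reading of the log, not the model's own code. It assumes that each keyframe is split into one downscaled overview plus a `best_grid` of 2 x 4 tiles (which matches the 9 encoding steps per image), that the sampler targets about 12 frames (inferred from `frame_rate * video_duration`, even though 14 keyframes were written), and that the prompt tokens left over after the image embeddings are text and template tokens. All constants are copied from the log; the function and variable names are hypothetical.

```python
# Values copied from the execution log above.
VIDEO_DURATION_S = 25.166667
FRAME_RATE = 0.47682118573746773
KEYFRAMES = 14
BEST_GRID = (2, 4)          # uhd_slice_image: best_grid
TOKENS_PER_IMAGE = 576      # "image embedding created: 576 tokens"
PROMPT_TOKENS = 8094        # usage.prompt_tokens
EVAL_MS, EVAL_RUNS = 1460.56, 114
# "image encoded in ... ms by CLIP", one entry per keyframe.
CLIP_MS = [635.01, 623.90, 630.69, 622.69, 638.89, 624.50, 639.31,
           615.20, 630.41, 631.18, 629.28, 632.16, 615.14, 615.48]


def slices_per_image(grid):
    """One overview image plus rows * cols tiles (assumed slicing scheme)."""
    rows, cols = grid
    return 1 + rows * cols


slices = slices_per_image(BEST_GRID)
tokens_per_slice = TOKENS_PER_IMAGE // slices
image_tokens = KEYFRAMES * TOKENS_PER_IMAGE
text_tokens = PROMPT_TOKENS - image_tokens       # assumed: prompt text + chat template
target_frames = FRAME_RATE * VIDEO_DURATION_S    # frames implied by the sampling rate
clip_total_ms = sum(CLIP_MS)
ms_per_generated_token = EVAL_MS / EVAL_RUNS

print(f"slices per image:        {slices}")
print(f"tokens per slice:        {tokens_per_slice}")
print(f"image tokens in prompt:  {image_tokens}")
print(f"non-image prompt tokens: {text_tokens}")
print(f"implied target frames:   {target_frames:.3f}")
print(f"total CLIP encode time:  {clip_total_ms / 1000:.2f} s")
print(f"generation speed:        {ms_per_generated_token:.2f} ms/token")
```

Under these assumptions the image embeddings account for nearly all of the prompt tokens, and image encoding plus token generation together cover only part of the 24.42 s prediction time; the log does not say how the remainder is spent.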
Version Details
Version ID
448756a7ca13d59b63384116ce755a677c3b5526173b5585bb58ab221f844d7f
Version Created
September 9, 2024