aodianyun/minicpm-v-26 🖼️📝 → 📝

▶️ 15 runs 📅 Sep 2024 ⚙️ Cog 0.9.21
image-to-text video-to-text

Example Output

Prompt:

"Describe the video in great detail."

Output

The video depicts a a cooking process involving a b. Here is the detailed step-by-step description:

  1. Initial Preparation: The video begins with an empty pan on a and a utensils placed next to it.

  2. Adding Ingredients: A person starts adding ingredients into the pan, which appears to be some form of liquid or sauce.

  3. Cooking Process Begins: As more ingredients are added, the pan starts heating up, indicating the commencement of the cooking process.

  4. Monitoring Cooking: The person continues to monitor the cooking process, ensuring that everything is being cooked evenly.

  5. Adding Vegetables: At one point, green vegetables are added to the pan, suggesting a multi-ingredient dish is being prepared.

  6. Stirring and Mixing: Throughout the cooking process, the individual uses a utensil to stir and mix the ingredients, ensuring they cook uniformly.

  7. Final Stages: Near the end of the video, the contents in the pan appears well-cooked, with steam rising from it.

  8. Plating: The final part of the video shows the pan being lifted off the heat source, and the cooked mixture being transferred into a container, likely for serving.

  9. Conclusion: The video concludes with the pan being removed completely, leaving only the now container filled with the cooked mixture.

This sequence provides a visual representation of how a can be used for cooking various dishes, emphasizing the importance of monitoring and stirring during the cooking process.

Performance Metrics

15.14s Prediction Time
186.94s Total Time
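The gap between the two figures above can be worked out directly. The short sketch below only does the arithmetic; reading the difference as queueing and cold-start overhead is an assumption, since the page does not say what the total includes.

```python
# Figures copied from the Performance Metrics above (seconds).
prediction_time_s = 15.14
total_time_s = 186.94

# Time not spent in the prediction itself. Attributing it to queueing
# or model start-up is an assumption; the page does not break it down.
overhead_s = round(total_time_s - prediction_time_s, 2)

# Fraction of the total wall-clock time spent on the prediction.
prediction_share = prediction_time_s / total_time_s

print(f"overhead: {overhead_s:.2f}s, prediction share: {prediction_share:.1%}")
```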
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/LYg9UV9q67McivPuwvpQPjClOZ7KqaOHH5F2DgbfCZspIhOQ/input.mp4",
  "prompt": "Describe the video in great detail."
}
Input Parameters
image (required) Type: string
Input image or video
prompt Type: string Default:
Prompt
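As a minimal sketch of how these two parameters could be sent to the model, the example below uses the `replicate` Python client. It assumes the client is installed and that `REPLICATE_API_TOKEN` is set in the environment; `build_input` and `describe` are hypothetical helper names, not part of the model or the client.

```python
# Model reference assembled from this page: owner/name plus the Version ID.
MODEL_REF = (
    "aodianyun/minicpm-v-26:"
    "4e42bfc1891e0899109f0bd64e58948dabd872d9c5ebd7ecc1cc290a4b7e8f51"
)


def build_input(media_url, prompt=None):
    """Build the input dict: `image` is required, `prompt` is optional."""
    if not media_url:
        raise ValueError("'image' is required (URL of an image or video)")
    payload = {"image": media_url}
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


def describe(media_url, prompt=None):
    """Run the model and return its raw output (hypothetical helper)."""
    # Imported lazily so build_input works without the client installed.
    # Assumption: `pip install replicate` and REPLICATE_API_TOKEN is set.
    import replicate

    return replicate.run(MODEL_REF, input=build_input(media_url, prompt))
```

Calling `describe(url, "Describe the video in great detail.")` with the video URL from the example input would mirror the request shown on this page.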
Output Schema

Output

Type: array
Items Type: string
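Because the output is an array of strings rather than a single string, a caller usually needs to join the pieces. The sketch below assumes the items are consecutive text fragments that concatenate without a separator, which the schema itself does not state; `join_output` is a hypothetical helper name.

```python
def join_output(chunks):
    """Concatenate an array-of-strings output into one text.

    Assumption: items are consecutive fragments of the same answer,
    so they are joined with no separator.
    """
    return "".join(str(chunk) for chunk in chunks)


# Illustrative fragments, not actual model output.
example = ["The video ", "depicts ", "a cooking process."]
text = join_output(example)
```

Accepting any iterable means the same helper works whether the client returns a list or yields items one at a time.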

Example Execution Logs
video_duration 20.099
frame_rate 0.49753719090502013
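The two logged values can be combined to estimate how many frames were sampled from the video. The sketch below parses `name value` lines and multiplies duration by frame rate; treating the units as seconds and frames per second is an assumption, and `parse_metrics` is a hypothetical helper.

```python
def parse_metrics(log_text):
    """Collect `name value` log lines whose value parses as a float."""
    metrics = {}
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip free-text lines such as warnings
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return metrics


# Lines copied from the execution logs on this page.
LOG = """video_duration 20.099
frame_rate 0.49753719090502013"""

metrics = parse_metrics(LOG)

# Assumption: duration is in seconds and frame_rate in frames per second,
# so their product approximates the number of sampled frames.
sampled_frames = metrics["video_duration"] * metrics["frame_rate"]
print(round(sampled_frames, 3))
```

The product comes out at roughly 10, which suggests the frame rate was chosen to yield a fixed number of frames for this clip, though the logs do not confirm that.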
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Version Details
Version ID
4e42bfc1891e0899109f0bd64e58948dabd872d9c5ebd7ecc1cc290a4b7e8f51
Version Created
September 13, 2024