cjwbw/unival
About
Unified Model for Image, Video, Audio and Language Tasks
Example Output
Output
{"answer":null,"output":"https://replicate.delivery/pbxt/Yd2zn7zhfM1kcSEPox8xdt9tejoaGc8nRypYBp6yJc49cGZRA/out.png"}
Performance Metrics
- Prediction Time
- 2.32s
- Total Time
- 57.74s
All Input Parameters
{
  "task_type": "Visual Grounding",
  "input_image": "https://replicate.delivery/pbxt/JKlg8Ru5XgAnr03vzxqrP5V9ztpyfYuaNpUm0hVXs0XEKmu1/1.png",
  "instruction": "detached banana"
}
Input Parameters
- task_type
- Choose a task.
- input_audio
- Input audio.
- input_image
- Input image.
- input_video
- Input video.
- instruction
- Provide a question for the VQA task, a region for the Visual Grounding task, and an instruction for General tasks. The default instruction for the Captioning task is "What does the image/video/audio describe?"
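The parameters listed above can be assembled into an input payload like the one shown under All Input Parameters. The sketch below is illustrative only: `build_input` is a hypothetical helper, and omitting unused media fields instead of sending them as null is an assumption about what the model accepts. Only the task name "Visual Grounding" is taken from this page; other task names should be checked against the model's schema.

```python
import json

def build_input(task_type, instruction=None, input_image=None,
                input_video=None, input_audio=None):
    """Build an input dict, leaving out any parameter that was not set.

    Hypothetical helper; the parameter names match the list above.
    """
    payload = {"task_type": task_type}
    media = {
        "input_image": input_image,
        "input_video": input_video,
        "input_audio": input_audio,
    }
    # Assumption: unset media fields can simply be left out of the payload.
    payload.update({k: v for k, v in media.items() if v is not None})
    if instruction is not None:
        payload["instruction"] = instruction
    return payload

payload = build_input(
    "Visual Grounding",
    instruction="detached banana",
    input_image="https://replicate.delivery/pbxt/"
                "JKlg8Ru5XgAnr03vzxqrP5V9ztpyfYuaNpUm0hVXs0XEKmu1/1.png",
)
print(json.dumps(payload, indent=2))

# With the Replicate Python client installed and an API token configured
# (both assumptions, not shown on this page), the payload could be sent as:
#   import replicate
#   replicate.run("cjwbw/unival:<version id>", input=payload)
```

Leaving `instruction` unset mirrors the Captioning case, where the page says a default instruction is applied.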
Output Schema
Version Details
- Version ID
- 00a9af2b0889db2a73f5002b3a3000ba4eec8c5fed0cb4fd842a3fc75ab0e98f
- Version Created
- August 11, 2023