character-ai/ovi-i2v 🔢🖼️📝🔊🎥 → 🖼️

⭐ Official ▶️ 3.6K runs 📅 Oct 2025 ⚙️ Cog 0.16.1 🔗 GitHub 📄 Paper ⚖️ License
image-to-video-with-audio

About

Ovi: generate videos with audio from image and text inputs

Example Output

Prompt:

"A bearded man wearing large dark sunglasses and a blue patterned cardigan sits in a studio, actively speaking into a large, suspended microphone. He has headphones on and gestures with his hands, displaying rings on his fingers. Behind him, a wall is covered with red, textured sound-dampening foam on the left, and a white banner on the right features the "CHOICE FM" logo and various social media handles like "@ilovechoicefm" with "RALEIGH" below it. The man intently addresses the microphone, articulating, <S>is talent. It's all about authenticity. You gotta be who you really are, especially if you're working<E>. He leans forward slightly as he speaks, maintaining a serious expression behind his sunglasses.. <AUDCAP>Clear male voice speaking into a microphone, a low background hum.<ENDAUDCAP>"

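The example prompt mixes three kinds of text: a visual description, spoken words wrapped in `<S>…<E>`, and an audio description wrapped in `<AUDCAP>…<ENDAUDCAP>`. The meaning of these tags is inferred from the example and is not documented on this page. Under that assumption, a minimal sketch for splitting a prompt into its parts might look like this (`split_prompt` is a hypothetical helper, and the string below is a shortened excerpt of the prompt):

```python
import re

# Tag names are copied from the example prompt; their meaning is inferred, not documented here.
SPEECH_RE = re.compile(r"<S>(.*?)<E>", re.DOTALL)
AUDIO_RE = re.compile(r"<AUDCAP>(.*?)<ENDAUDCAP>", re.DOTALL)


def split_prompt(prompt: str) -> dict:
    """Separate spoken lines, audio captions, and the remaining visual description."""
    speech = [s.strip() for s in SPEECH_RE.findall(prompt)]
    audio = [a.strip() for a in AUDIO_RE.findall(prompt)]
    # Remove both tagged spans, then collapse the leftover whitespace.
    visual = AUDIO_RE.sub("", SPEECH_RE.sub("", prompt))
    visual = re.sub(r"\s+", " ", visual).strip()
    return {"speech": speech, "audio": audio, "visual": visual}


# Shortened excerpt of the example prompt, for illustration only.
example = (
    "The man intently addresses the microphone, articulating, "
    "<S>is talent. It's all about authenticity.<E>. "
    "<AUDCAP>Clear male voice speaking into a microphone, a low background hum.<ENDAUDCAP>"
)
parts = split_prompt(example)
print(parts["speech"])
print(parts["audio"])
```

A check like this can catch an unclosed tag before a prediction is submitted: if a tag appears in the raw prompt but the matching list comes back empty, the closing tag is probably missing.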
Output

Performance Metrics

36.14s Prediction Time
43.03s Total Time
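The gap between the two figures can be worked out directly. The short sketch below computes it. Reading the difference as queueing and setup overhead is an assumption, since the page does not say what the extra time covers.

```python
# Figures copied from the metrics above.
prediction_time_s = 36.14
total_time_s = 43.03

# Time not spent in the prediction itself (assumed to be queueing and setup).
overhead_s = round(total_time_s - prediction_time_s, 2)
overhead_share = overhead_s / total_time_s

print(f"overhead: {overhead_s:.2f}s ({overhead_share:.0%} of total)")
```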
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/Nq2nAZ74W84SJcREj08DJryQe6JZpBG0cp2FDJYvrLJqA4pk/podcast.png",
  "prompt": "A bearded man wearing large dark sunglasses and a blue patterned cardigan sits in a studio, actively speaking into a large, suspended microphone. He has headphones on and gestures with his hands, displaying rings on his fingers. Behind him, a wall is covered with red, textured sound-dampening foam on the left, and a white banner on the right features the \"CHOICE FM\" logo and various social media handles like \"@ilovechoicefm\" with \"RALEIGH\" below it. The man intently addresses the microphone, articulating, <S>is talent. It's all about authenticity. You gotta be who you really are, especially if you're working<E>. He leans forward slightly as he speaks, maintaining a serious expression behind his sunglasses.. <AUDCAP>Clear male voice speaking into a microphone, a low background hum.<ENDAUDCAP>",
  "audio_negative_prompt": "robotic, muffled, echo, distorted",
  "video_negative_prompt": "jitter, bad hands, blur, distortion"
}
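As a rough illustration of how a payload like the one above could be submitted, the sketch below wraps it in an HTTP request using only the Python standard library. The endpoint URL and the `Bearer` token header are assumptions based on Replicate's public HTTP API for official models and are not stated on this page. `build_prediction_request` is a hypothetical helper, and the prompt is a shortened stand-in for the full text.

```python
import json
import urllib.request

# Assumed endpoint, based on Replicate's public HTTP API for official models.
API_URL = "https://api.replicate.com/v1/models/character-ai/ovi-i2v/predictions"


def build_prediction_request(api_token: str, inputs: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the model inputs."""
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


inputs = {
    "image": "https://replicate.delivery/pbxt/Nq2nAZ74W84SJcREj08DJryQe6JZpBG0cp2FDJYvrLJqA4pk/podcast.png",
    # Shortened stand-in for the full example prompt.
    "prompt": "A man speaks into a studio microphone, <S>It's all about authenticity.<E> "
              "<AUDCAP>Clear male voice, a low background hum.<ENDAUDCAP>",
    "audio_negative_prompt": "robotic, muffled, echo, distorted",
    "video_negative_prompt": "jitter, bad hands, blur, distortion",
}

request = build_prediction_request("YOUR_API_TOKEN", inputs)
# urllib.request.urlopen(request) would send it; that step is left out of this sketch.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are spent.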
Input Parameters
seed Type: integer
Random seed. Set for reproducible generation.
image (required) Type: string
Input image to generate video from.
prompt (required) Type: string
Prompt for generated video.
audio_negative_prompt Type: string Default: robotic, muffled, echo, distorted
Negative prompt for audio generation.
video_negative_prompt Type: string Default: jitter, bad hands, blur, distortion
Negative prompt for video generation.
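The parameter list above can be mirrored in a small client-side helper that enforces the two required fields and fills in the listed defaults. This is a sketch only: `make_inputs` is a hypothetical name, and the checks reflect the types shown above, not any validation the model itself performs.

```python
from typing import Optional

# Defaults copied from the parameter list above.
DEFAULT_AUDIO_NEGATIVE = "robotic, muffled, echo, distorted"
DEFAULT_VIDEO_NEGATIVE = "jitter, bad hands, blur, distortion"


def make_inputs(
    image: str,
    prompt: str,
    seed: Optional[int] = None,
    audio_negative_prompt: str = DEFAULT_AUDIO_NEGATIVE,
    video_negative_prompt: str = DEFAULT_VIDEO_NEGATIVE,
) -> dict:
    """Return an input dict, rejecting missing required fields and non-integer seeds."""
    if not image or not prompt:
        raise ValueError("'image' and 'prompt' are required")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        raise TypeError("'seed' must be an integer")
    inputs = {
        "image": image,
        "prompt": prompt,
        "audio_negative_prompt": audio_negative_prompt,
        "video_negative_prompt": video_negative_prompt,
    }
    if seed is not None:
        inputs["seed"] = seed  # Only sent when set, so unseeded runs stay random.
    return inputs
```

Omitting `seed` when it is `None` leaves the choice of seed to the model, while passing a fixed integer is the documented way to make a generation reproducible.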
Output Schema

Output

Type: string Format: uri
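Because the output is a single URI string, a caller usually needs to check it and pick a local filename before downloading. The sketch below shows one way to do that with the standard library. `output_filename` is a hypothetical helper, and the `output.mp4` fallback assumes the result is an MP4 file, which this page does not confirm.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filename(output_uri: str, fallback: str = "output.mp4") -> str:
    """Validate an http(s) output URI and derive a local filename from its path."""
    parsed = urlparse(output_uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URI: {output_uri!r}")
    # Use the last path segment, or the fallback when the path has none.
    name = PurePosixPath(parsed.path).name
    return name or fallback
```

The returned name can then be passed to whatever download routine the caller prefers.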

Example Execution Logs
Using seed: 968675
Time taken: 34.31402373313904 seconds
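The seed printed in the logs is the value to pass back as `seed` when a run needs to be reproduced. A minimal sketch for extracting it, along with the reported time, is shown below. `parse_logs` is a hypothetical helper, and the patterns assume the log lines keep the exact wording shown above.

```python
import re


def parse_logs(logs: str) -> dict:
    """Extract the seed and the reported generation time from execution logs."""
    seed = re.search(r"Using seed:\s*(\d+)", logs)
    taken = re.search(r"Time taken:\s*([\d.]+)\s*seconds", logs)
    return {
        "seed": int(seed.group(1)) if seed else None,
        "time_taken_s": float(taken.group(1)) if taken else None,
    }


logs = "Using seed: 968675\nTime taken: 34.31402373313904 seconds"
info = parse_logs(logs)
print(info["seed"], info["time_taken_s"])
```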
Version Details
Version ID
8a2539bdbc0721aba314c6eed8481ca48ede9f25c37535669526e5e64fddd966
Version Created
October 6, 2025