character-ai/ovi-i2v 🔢🖼️📝🔊🎥 → 🖼️

⭐ Official ▶️ 14.6K runs 📅 Oct 2025 ⚙️ Cog 0.16.1 🔗 GitHub 📄 Paper ⚖️ License

image-to-video-with-audio

Performance

36.1s Typical run time

14.6K Total runs

About

Ovi: generate videos with audio from image and text inputs

Example Output

Prompt:

"A bearded man wearing large dark sunglasses and a blue patterned cardigan sits in a studio, actively speaking into a large, suspended microphone. He has headphones on and gestures with his hands, displaying rings on his fingers. Behind him, a wall is covered with red, textured sound-dampening foam on the left, and a white banner on the right features the "CHOICE FM" logo and various social media handles like "@ilovechoicefm" with "RALEIGH" below it. The man intently addresses the microphone, articulating, <S>is talent. It's all about authenticity. You gotta be who you really are, especially if you're working<E>. He leans forward slightly as he speaks, maintaining a serious expression behind his sunglasses.. <AUDCAP>Clear male voice speaking into a microphone, a low background hum.<ENDAUDCAP>"


Performance Metrics

36.14s Prediction Time

43.03s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/Nq2nAZ74W84SJcREj08DJryQe6JZpBG0cp2FDJYvrLJqA4pk/podcast.png",
  "prompt": "A bearded man wearing large dark sunglasses and a blue patterned cardigan sits in a studio, actively speaking into a large, suspended microphone. He has headphones on and gestures with his hands, displaying rings on his fingers. Behind him, a wall is covered with red, textured sound-dampening foam on the left, and a white banner on the right features the \"CHOICE FM\" logo and various social media handles like \"@ilovechoicefm\" with \"RALEIGH\" below it. The man intently addresses the microphone, articulating, <S>is talent. It's all about authenticity. You gotta be who you really are, especially if you're working<E>. He leans forward slightly as he speaks, maintaining a serious expression behind his sunglasses.. <AUDCAP>Clear male voice speaking into a microphone, a low background hum.<ENDAUDCAP>",
  "audio_negative_prompt": "robotic, muffled, echo, distorted",
  "video_negative_prompt": "jitter, bad hands, blur, distortion"
}
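The example prompt embeds two kinds of tags in otherwise plain scene text: `<S>…<E>` around the words being spoken and `<AUDCAP>…<ENDAUDCAP>` around a description of the soundtrack. This page does not document the tags, so the reading above is inferred from the example alone. Under that assumption, a minimal sketch for composing and inspecting such prompts could look like this (`build_prompt` and `split_prompt` are hypothetical helper names, not part of the model's API):

```python
import re

# Tag names are copied from the example prompt; their meaning is inferred,
# not documented on this page.
SPEECH_RE = re.compile(r"<S>(.*?)<E>", re.DOTALL)
AUDIO_RE = re.compile(r"<AUDCAP>(.*?)<ENDAUDCAP>", re.DOTALL)


def build_prompt(scene: str, speech: str, audio_caption: str) -> str:
    """Compose a prompt: scene text, tagged speech, then a tagged audio caption."""
    return f"{scene} <S>{speech}<E> <AUDCAP>{audio_caption}<ENDAUDCAP>"


def split_prompt(prompt: str) -> dict:
    """Extract every tagged speech segment and the audio caption, if present."""
    audio = AUDIO_RE.search(prompt)
    return {
        "speech": [s.strip() for s in SPEECH_RE.findall(prompt)],
        "audio_caption": audio.group(1).strip() if audio else None,
    }
```

In the example prompt the speech tag sits mid-description, with more scene text after it; `build_prompt` places it at the end of the scene only to keep the sketch short.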

Input Parameters

seed Type: integer. Random seed. Set for reproducible generation.
image (required) Type: string. Input image to generate video from.
prompt (required) Type: string. Prompt for generated video.
audio_negative_prompt Type: string. Default: "robotic, muffled, echo, distorted". Negative prompt for audio generation.
video_negative_prompt Type: string. Default: "jitter, bad hands, blur, distortion". Negative prompt for video generation.
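The parameter list above maps naturally onto a small typed container. The sketch below is one possible way to hold the inputs client-side; `OviInput` and `to_payload` are hypothetical names introduced for illustration, and the defaults are copied from the list. Dropping an unset `seed` assumes the service then picks one itself, which the example logs suggest but this page does not state outright.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OviInput:
    """Client-side mirror of the documented input parameters."""

    image: str                      # required: URL of the input image
    prompt: str                     # required: text prompt for the video
    seed: Optional[int] = None      # optional: fix for reproducible runs
    audio_negative_prompt: str = "robotic, muffled, echo, distorted"
    video_negative_prompt: str = "jitter, bad hands, blur, distortion"

    def to_payload(self) -> dict:
        """Return a dict suitable for use as the model input."""
        if not self.image or not self.prompt:
            raise ValueError("image and prompt are required")
        payload = asdict(self)
        if payload["seed"] is None:
            # Assumption: omitting the seed lets the service choose one.
            del payload["seed"]
        return payload
```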

Output Schema

Output

Type: string • Format: uri
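Since the output is a single URI string, saving the result comes down to fetching that URI. A minimal sketch using only the Python standard library follows; `save_output` is a hypothetical helper, and the caller chooses the destination file name because this page does not state the container format of the result.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri: str, dest: str) -> Path:
    """Stream the resource at `uri` to `dest` and return the written path."""
    target = Path(dest)
    # Copy in chunks so a large video is never held fully in memory.
    with urllib.request.urlopen(uri) as response, target.open("wb") as fh:
        shutil.copyfileobj(response, fh)
    return target
```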

Example Execution Logs

Using seed: 968675
Time taken: 34.31402373313904 seconds
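The two log lines above carry the seed that was actually used and the generation time. If the logs keep this shape (an assumption based on this single example), both values can be recovered with a short parser; `parse_logs` is a hypothetical helper name.

```python
import re

# Patterns follow the two log lines shown in the example; other log
# formats are not covered.
SEED_RE = re.compile(r"^Using seed:\s*(\d+)\s*$", re.MULTILINE)
TIME_RE = re.compile(r"^Time taken:\s*([0-9.]+)\s*seconds\s*$", re.MULTILINE)


def parse_logs(logs: str) -> dict:
    """Return the seed and elapsed seconds found in `logs`, or None for each."""
    seed = SEED_RE.search(logs)
    taken = TIME_RE.search(logs)
    return {
        "seed": int(seed.group(1)) if seed else None,
        "seconds": float(taken.group(1)) if taken else None,
    }
```

Passing the recovered seed back as the `seed` input is the documented way to make a run reproducible.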

Version Details

Version ID: 8a2539bdbc0721aba314c6eed8481ca48ede9f25c37535669526e5e64fddd966
Version Created: October 6, 2025
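Pinning the version ID keeps results tied to the exact build listed above. The sketch below assumes the `replicate` Python client, whose `replicate.run(ref, input=...)` call accepts a reference of the form `owner/name:version`, and an API token configured in the environment. `run_pinned` and `pinned_ref` are hypothetical helpers; the `client` argument exists only so a stand-in object can replace the real client during testing.

```python
MODEL = "character-ai/ovi-i2v"
VERSION = "8a2539bdbc0721aba314c6eed8481ca48ede9f25c37535669526e5e64fddd966"


def pinned_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Build an `owner/name:version` reference."""
    return f"{model}:{version}"


def run_pinned(image_url: str, prompt: str, client=None, **optional):
    """Run the pinned version once and return whatever the client returns.

    `optional` may carry seed, audio_negative_prompt or video_negative_prompt.
    """
    if client is None:
        # Imported lazily: needs `pip install replicate` and an API token.
        import replicate
        client = replicate
    payload = {"image": image_url, "prompt": prompt, **optional}
    return client.run(pinned_ref(), input=payload)
```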
