bytedance/omni-human 🖼️ → 🖼️

⭐ Official ▶️ 161.7K runs 📅 Jul 2025 ⚙️ Cog 0.16.8

audio-to-video image-to-video image-to-video-with-audio lipsync

Performance

157.0s Typical run time

161.7K Total runs

About

Turns your audio/video/images into professional-quality animated videos

Example Output

Performance Metrics

157.05s Prediction Time

157.06s Total Time

All Input Parameters

{
  "audio": "https://replicate.delivery/pbxt/NSVCaWKzBxsoYSGGDvXNbyYAANdTI1LtQPGpHxyKTCxtaMo3/ElevenLabs_2025-08-01T09_30_34_Laura_pre_sp100_s50_sb75_v3.mp3",
  "image": "https://replicate.delivery/pbxt/NSVCZtHBKfQIQnoDEhoehnlyvDxrbZIyN4bUlpB4YiKCzl5e/image.png"
}
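
As a rough sketch of how the input above could be submitted programmatically, the snippet below builds an HTTP request with the Python standard library. It assumes Replicate's HTTP API endpoint for official models (`/v1/models/{owner}/{name}/predictions`) and a bearer token; the function names `build_prediction_request` and `create_prediction` are hypothetical helpers, not part of any library.

```python
import json
import urllib.request

# Assumption: official models are invoked through the model-scoped
# predictions endpoint of Replicate's HTTP API.
API_URL = "https://api.replicate.com/v1/models/bytedance/omni-human/predictions"


def build_prediction_request(audio_url: str, image_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with the two required inputs."""
    payload = {"input": {"audio": audio_url, "image": image_url}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_prediction(req: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response (needs network access)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A caller would typically read the token from an environment variable such as `REPLICATE_API_TOKEN`, pass the request to `create_prediction`, and then poll the returned prediction until it finishes; with a typical run time of about 157 seconds, the result is unlikely to be available immediately.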

Input Parameters

audio (required) Type: string. Input audio file (MP3, WAV, etc.). For best results, keep the audio to 15 seconds or less; beyond 15 seconds, video quality begins to degrade. To process longer audio, split it into 15-second chunks.
image (required) Type: string. Input image containing a human subject, face, or character.
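
The 15-second recommendation for `audio` can be illustrated with a small planning helper. This is only a sketch: `plan_chunks` is a hypothetical function that computes the start and end times of each chunk, and it assumes the total duration is already known. Actually cutting the audio file would need a separate tool.

```python
import math


def plan_chunks(total_seconds: float, max_chunk: float = 15.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds, each chunk at most max_chunk long."""
    if total_seconds <= 0 or max_chunk <= 0:
        raise ValueError("durations must be positive")
    total = float(total_seconds)
    count = math.ceil(total / max_chunk)
    # Compute each boundary from the index to avoid accumulating float error.
    return [
        (i * max_chunk, min((i + 1) * max_chunk, total))
        for i in range(count)
    ]


# A 40-second clip becomes two full chunks and one 10-second remainder.
print(plan_chunks(40))  # [(0.0, 15.0), (15.0, 30.0), (30.0, 40.0)]
```

Each chunk would then be submitted as its own prediction with the same `image`, and the resulting videos joined afterwards.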

Output Schema

Output

Type: string • Format: uri
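
Because the output is a single URI string, a client only needs to validate it and fetch the file. The sketch below shows one way to do that with the standard library; `output_filename` and `download_output` are hypothetical helpers, and the fallback name `output.mp4` is an assumption, since the schema does not state the file format.

```python
import os
import posixpath
import urllib.parse
import urllib.request


def output_filename(output_uri: str, default: str = "output.mp4") -> str:
    """Derive a local file name from the output URI, rejecting non-HTTP(S) values."""
    parsed = urllib.parse.urlparse(output_uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URI: {output_uri!r}")
    # URL paths always use forward slashes, so use posixpath rather than os.path.
    return posixpath.basename(parsed.path) or default


def download_output(output_uri: str, dest_dir: str = ".") -> str:
    """Download the generated file into dest_dir and return its local path (needs network access)."""
    path = os.path.join(dest_dir, output_filename(output_uri))
    urllib.request.urlretrieve(output_uri, path)
    return path
```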

Example Execution Logs

Checking for human subject...
Generating video...
Generated video in 153.5sec
Downloading 2384667 bytes
Downloaded 2.27MB in 2.82sec

Version Details

Version ID: 566f1b03016969ac39e242c1ae4a39034686ca8850fc3dba83dceaceb96f74b2
Version Created: November 10, 2025
