bytedance/omni-human-1.5

Official model | 9.8K runs | Oct 2025 | Cog 0.16.9
Tags: image-to-video, image-to-video-with-audio, lipsync, video-consistent-character-generation

About

A film-grade digital human model that generates realistic video from a single image and an audio clip, with an optional text prompt for additional control.

Example Output

Prompt:

"A woman sings and strums her guitar"


Performance Metrics

Prediction time: 196.65 s
Total time: 196.66 s
All Input Parameters
{
  "audio": "https://replicate.delivery/pbxt/Nw6ta10DEp6Q1RMCjtGBosVhDah8fN0JulfN95bBonZ1DiCx/replicate-prediction-gpnzzkjeghrme0ct2a59v9h038.wav",
  "image": "https://replicate.delivery/pbxt/Nw6tZuQCrJRv1wgMAkQaJhlNQJNLRX60UKJBrzCYW5RvmShk/replicate-prediction-vkwqkpxagnrm80ct2a4b0bye7g.webp",
  "prompt": "A woman sings and strums her guitar",
  "fast_mode": true
}
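The same input can be assembled in code. The sketch below assumes the Replicate Python client (`pip install replicate`) and its `replicate.run(model, input=...)` call, with an API token in the `REPLICATE_API_TOKEN` environment variable. The helper names `build_input` and `run_prediction` and the example.com URLs are hypothetical; the exact return type of `replicate.run` depends on the client version.

```python
import json
from typing import Any, Optional

MODEL = "bytedance/omni-human-1.5"


def build_input(
    image_url: str,
    audio_url: str,
    prompt: Optional[str] = None,
    fast_mode: bool = False,
    seed: Optional[int] = None,
) -> dict:
    """Assemble an input payload shaped like the example parameters above."""
    payload: dict = {"image": image_url, "audio": audio_url, "fast_mode": fast_mode}
    # Optional fields are only included when the caller sets them.
    if prompt is not None:
        payload["prompt"] = prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


def run_prediction(payload: dict) -> Any:
    """Run the model via the Replicate client (requires REPLICATE_API_TOKEN)."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL, input=payload)


if __name__ == "__main__":
    # Hypothetical URLs; substitute your own hosted image and audio files.
    example = build_input(
        image_url="https://example.com/singer.webp",
        audio_url="https://example.com/song.wav",
        prompt="A woman sings and strums her guitar",
        fast_mode=True,
    )
    print(json.dumps(example, indent=2))
```

Keeping payload construction separate from the network call makes the input easy to inspect or log before spending a roughly three-minute prediction on it.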
Input Parameters
seed Type: integer
Random seed for reproducible generation.
audio (required) Type: string
Input audio file (MP3, WAV, etc.). Duration must be less than 35 seconds; longer audio causes the prediction to fail with an error.
image (required) Type: string
Input image containing a human subject, face, or character.
prompt Type: string
Optional text prompt for precise control of the scene, subject movements, camera movements, and so on. Supported languages: Chinese, English, Japanese, Korean, Spanish, and Indonesian.
fast_mode Type: boolean, Default: false
Enable fast mode to speed up generation at the cost of some output quality.
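Because over-length audio makes the whole prediction fail, it can be worth checking the clip locally before uploading it. The following is a minimal sketch for uncompressed PCM WAV files using only Python's standard `wave` module; MP3 and other compressed formats would need a different decoder. The 35-second limit comes from the `audio` description above, and treating exactly 35.0 s as too long is an assumption based on the "less than 35 seconds" wording. `check_audio_length`, `wav_duration_seconds`, and `make_silent_wav` are hypothetical helpers.

```python
import io
import wave
from typing import BinaryIO, Union

# Limit taken from the `audio` parameter description: clips must be shorter
# than 35 seconds. Rejecting exactly 35.0 s is an assumption.
MAX_AUDIO_SECONDS = 35.0


def wav_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Return the duration in seconds of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio_length(source: Union[str, BinaryIO]) -> float:
    """Return the clip duration, or raise ValueError if it is at or over the limit."""
    duration = wav_duration_seconds(source)
    if duration >= MAX_AUDIO_SECONDS:
        raise ValueError(
            f"audio is {duration:.2f}s long; it must be under {MAX_AUDIO_SECONDS:.0f}s"
        )
    return duration


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (hypothetical test helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(check_audio_length(make_silent_wav(10.0)))  # a 10 s clip passes
    try:
        check_audio_length(make_silent_wav(40.0))
    except ValueError as err:
        print(f"rejected: {err}")
```

The duration is computed from the frame count and sample rate in the WAV header, so the check is fast and does not decode the audio samples.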
Output Schema

Output Type: string, Format: uri
URL of the generated video file.
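Since the output is a URI rather than the file contents, a client typically downloads the video afterward. The following is a minimal standard-library sketch; `download_output` is a hypothetical helper, and it assumes the URI can be fetched directly without extra authentication headers. The `.mp4` extension in the usage note is also an assumption, since the schema does not state the container format.

```python
import shutil
import urllib.request
from pathlib import Path


def download_output(uri: str, dest: Path) -> int:
    """Stream the resource at `uri` into `dest`; return the number of bytes written."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # urlopen handles http(s):// URLs as well as file:// URIs (useful for local tests).
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks instead of loading it all
    return dest.stat().st_size
```

For example, `download_output(output_uri, Path("result.mp4"))` saves the prediction's output URL to disk. Streaming with `shutil.copyfileobj` keeps memory use flat even for multi-megabyte videos like the one in the logs.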

Example Execution Logs
Using seed: 912089419
Generating video...
Generated video in 192.6sec
Downloading 6074126 bytes
Downloaded 5.79MB in 2.25sec
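The log reports the same download in bytes and in megabytes. The two figures agree only if "MB" here means binary megabytes (MiB, 2^20 bytes). This is an inference from the arithmetic, not something the log states:

```latex
\begin{aligned}
\frac{6\,074\,126\ \text{bytes}}{2^{20}\ \text{bytes/MiB}} &\approx 5.79\ \text{MiB}
  \qquad\left(\text{vs. } \frac{6\,074\,126}{10^{6}} \approx 6.07\ \text{MB}\right)\\
\frac{5.79\ \text{MiB}}{2.25\ \text{s}} &\approx 2.57\ \text{MiB/s}
\end{aligned}
```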
Version Details
Version ID: b0f93aebf8c36a35da4ba2e7a5ce22c443df7e9e9c3bc1e8b170d55bc05f9b62
Version Created: November 14, 2025