bytedance/omni-human-1.5

⭐ Official ▶️ 9.8K runs 📅 Oct 2025 ⚙️ Cog 0.16.9

About

A film-grade digital human model that generates realistic video from a single image, audio clip, and optional text prompt.

Example Output

Prompt:

"A woman sings and strums her guitar"

Performance Metrics

196.65s Prediction Time

196.66s Total Time

All Input Parameters

{
  "audio": "https://replicate.delivery/pbxt/Nw6ta10DEp6Q1RMCjtGBosVhDah8fN0JulfN95bBonZ1DiCx/replicate-prediction-gpnzzkjeghrme0ct2a59v9h038.wav",
  "image": "https://replicate.delivery/pbxt/Nw6tZuQCrJRv1wgMAkQaJhlNQJNLRX60UKJBrzCYW5RvmShk/replicate-prediction-vkwqkpxagnrm80ct2a4b0bye7g.webp",
  "prompt": "A woman sings and strums her guitar",
  "fast_mode": true
}
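The parameters above map directly onto a prediction request. The following is a minimal sketch, assuming the `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. `build_input` and `run_omni_human` are hypothetical helper names used for illustration, not part of any library.

```python
from typing import Any, Dict, Optional

MODEL = "bytedance/omni-human-1.5"


def build_input(
    audio: str,
    image: str,
    prompt: Optional[str] = None,
    fast_mode: bool = False,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the input dict, leaving out optional fields that are unset."""
    payload: Dict[str, Any] = {
        "audio": audio,
        "image": image,
        "fast_mode": fast_mode,
    }
    if prompt:
        payload["prompt"] = prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


def run_omni_human(payload: Dict[str, Any]):
    """Submit the prediction and block until it finishes (hypothetical wrapper)."""
    # Imported lazily so build_input can be used without the client installed.
    import replicate

    return replicate.run(MODEL, input=payload)
```

Depending on the client version, the value returned by `replicate.run` may be a URL string or a file-like object wrapping the output, so treat the return type as something to check rather than assume.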

Input Parameters

seed (Type: integer): Random seed for reproducible generation.
audio (required, Type: string): Input audio file (MP3, WAV, etc.). Duration must be less than 35 seconds; longer audio causes the generation to fail with an error.
image (required, Type: string): Input image containing a human subject, face, or character.
prompt (Type: string): Optional prompt for precise control of the scene, subject movement, camera movement, etc. Supports Chinese, English, Japanese, Korean, Spanish, and Indonesian.
fast_mode (Type: boolean, Default: false): Enable fast mode to speed up generation at some cost to output quality.
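Because over-length audio fails the whole prediction, it can be worth checking the duration locally before submitting. The sketch below uses only the standard-library `wave` module, so it covers WAV input only; MP3 and other formats would need a third-party decoder. `check_audio_length` is a hypothetical helper, and the 35-second constant is taken from the `audio` parameter description.

```python
import wave

MAX_AUDIO_SECONDS = 35.0  # limit stated in the audio parameter description


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path string or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio_length(source) -> float:
    """Raise ValueError if the audio is not strictly shorter than the limit."""
    duration = wav_duration_seconds(source)
    if duration >= MAX_AUDIO_SECONDS:
        raise ValueError(
            f"audio is {duration:.2f}s; must be under {MAX_AUDIO_SECONDS:.0f}s"
        )
    return duration
```

The check rejects audio of exactly 35 seconds, reading "less than 35 seconds" literally; the service's own boundary behavior is not documented here.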

Output Schema

Output

Type: string • Format: uri
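Since the output is a URI string, the generated video has to be fetched separately. A minimal sketch using only the standard library follows; `save_output` is a hypothetical helper, and the destination filename is chosen by the caller because the output container format is not stated on this page.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri: str, dest: str) -> int:
    """Stream the file at `uri` to `dest` and return the number of bytes written."""
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return Path(dest).stat().st_size
```

Streaming with `shutil.copyfileobj` avoids holding the whole video in memory, which matters little for a file of a few megabytes but keeps the helper safe for larger outputs.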

Example Execution Logs

Using seed: 912089419
Generating video...
Generated video in 192.6sec
Downloading 6074126 bytes
Downloaded 5.79MB in 2.25sec
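The logs carry the seed actually used, which is needed to reproduce a run that was submitted without an explicit `seed`. The sketch below pulls the key values out with regular expressions. It assumes the log format matches the single example above, which may change between versions; `parse_logs` is a hypothetical helper.

```python
import re
from typing import Any, Dict


def parse_logs(logs: str) -> Dict[str, Any]:
    """Extract seed, generation time, and output size from prediction logs."""
    seed = re.search(r"Using seed: (\d+)", logs)
    generated = re.search(r"Generated video in ([\d.]+)sec", logs)
    size = re.search(r"Downloading (\d+) bytes", logs)
    return {
        "seed": int(seed.group(1)) if seed else None,
        "generation_seconds": float(generated.group(1)) if generated else None,
        "output_bytes": int(size.group(1)) if size else None,
    }
```

Dividing the reported byte count by 1024² gives the 5.79 figure in the last log line, so the "MB" there is a binary megabyte.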

Version Details

Version ID: b0f93aebf8c36a35da4ba2e7a5ce22c443df7e9e9c3bc1e8b170d55bc05f9b62
Version Created: November 14, 2025
