bytedance/omni-human-1.5
About
A film-grade digital human model that generates realistic video from a single image, an audio clip, and an optional text prompt.
Example Output
Prompt:
"A woman sings and strums her guitar"
Output
Performance Metrics
- Prediction Time: 196.65s
- Total Time: 196.66s
All Input Parameters
{
"audio": "https://replicate.delivery/pbxt/Nw6ta10DEp6Q1RMCjtGBosVhDah8fN0JulfN95bBonZ1DiCx/replicate-prediction-gpnzzkjeghrme0ct2a59v9h038.wav",
"image": "https://replicate.delivery/pbxt/Nw6tZuQCrJRv1wgMAkQaJhlNQJNLRX60UKJBrzCYW5RvmShk/replicate-prediction-vkwqkpxagnrm80ct2a4b0bye7g.webp",
"prompt": "A woman sings and strums her guitar",
"fast_mode": true
}
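The same request can be assembled and sent from Python. This is a minimal sketch, assuming the official `replicate` Python client is installed (`pip install replicate`) and `REPLICATE_API_TOKEN` is set in the environment. The helpers `build_input` and `run_prediction` are hypothetical names, and the exact return type of `replicate.run` (a URL string or a file-like output object) depends on the client version.

```python
from typing import Optional

MODEL = "bytedance/omni-human-1.5"


def build_input(audio: str, image: str, prompt: Optional[str] = None,
                fast_mode: Optional[bool] = None, seed: Optional[int] = None) -> dict:
    """Hypothetical helper: assemble the input payload.

    Optional fields left as None are omitted so the model's own defaults apply.
    """
    payload = {"audio": audio, "image": image}
    optional = {"prompt": prompt, "fast_mode": fast_mode, "seed": seed}
    # Keep False/0 values; drop only fields that were not set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def run_prediction(payload: dict):
    """Hypothetical helper: run the model through the Replicate client.

    Assumes `pip install replicate` and REPLICATE_API_TOKEN in the environment.
    The import is local so build_input works without the client installed.
    """
    import replicate
    return replicate.run(MODEL, input=payload)


# Rebuild the example request shown above.
example = build_input(
    audio="https://replicate.delivery/pbxt/Nw6ta10DEp6Q1RMCjtGBosVhDah8fN0JulfN95bBonZ1DiCx/replicate-prediction-gpnzzkjeghrme0ct2a59v9h038.wav",
    image="https://replicate.delivery/pbxt/Nw6tZuQCrJRv1wgMAkQaJhlNQJNLRX60UKJBrzCYW5RvmShk/replicate-prediction-vkwqkpxagnrm80ct2a4b0bye7g.webp",
    prompt="A woman sings and strums her guitar",
    fast_mode=True,
)
# output = run_prediction(example)  # network call: uncomment to actually run
```

Passing `seed=` to `build_input` pins the random seed. Omitting unset fields, rather than sending explicit defaults, avoids guessing at default values the page does not document.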
Input Parameters
- seed: Random seed for reproducible generation.
- audio (required): Input audio file (MP3, WAV, etc.). Duration must be less than 35 seconds; longer audio causes the generation to fail with an error.
- image (required): Input image containing a human subject, face, or character.
- prompt: Optional prompt for precise control of the scene, subject movements, camera movements, and so on. Supports Chinese, English, Japanese, Korean, Spanish, and Indonesian.
- fast_mode: Enable fast mode to speed up generation at the cost of some effects.
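Because audio of 35 seconds or more makes the generation fail, it can be worth checking the duration locally before uploading. This is a minimal sketch for WAV files that uses only Python's standard `wave` module. Other formats such as MP3 would need a separate decoder library, and the function names are hypothetical.

```python
import wave

MAX_AUDIO_SECONDS = 35.0  # documented limit: duration must be less than 35 s


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio_length(path: str) -> float:
    """Return the WAV duration, or raise ValueError if it is 35 s or longer."""
    duration = wav_duration_seconds(path)
    if duration >= MAX_AUDIO_SECONDS:
        raise ValueError(
            f"{path}: audio is {duration:.2f}s; it must be under {MAX_AUDIO_SECONDS:.0f}s"
        )
    return duration
```

The check treats exactly 35 seconds as too long, following the "less than 35 seconds" wording. Trimming or splitting longer clips before upload avoids a failed prediction.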
Output Schema
Output
Example Execution Logs
Using seed: 912089419
Generating video...
Generated video in 192.6sec
Downloading 6074126 bytes
Downloaded 5.79MB in 2.25sec
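The seed printed in the logs can be passed back through the `seed` input, which should make a generation reproducible. The sketch below pulls the seed, generation time, and download size out of a log like the example one using regular expressions. The patterns are assumptions based only on this single example log, and `parse_log` is a hypothetical name. The log's "5.79MB" corresponds to 6,074,126 bytes only when MB means 2^20 bytes (MiB).

```python
import re

LOG = "\n".join([
    "Using seed: 912089419",
    "Generating video...",
    "Generated video in 192.6sec",
    "Downloading 6074126 bytes",
    "Downloaded 5.79MB in 2.25sec",
])


def parse_log(text: str) -> dict:
    """Extract seed, generation time, and download size (patterns assumed from the example log)."""
    seed = re.search(r"Using seed: (\d+)", text)
    gen = re.search(r"Generated video in ([\d.]+)sec", text)
    size = re.search(r"Downloading (\d+) bytes", text)
    return {
        "seed": int(seed.group(1)) if seed else None,
        "generation_seconds": float(gen.group(1)) if gen else None,
        "download_bytes": int(size.group(1)) if size else None,
    }


info = parse_log(LOG)
# 6074126 / 2**20 is about 5.79, matching the "5.79MB" in the log (binary megabytes).
size_mib = info["download_bytes"] / 2**20
# Reuse the logged seed together with the original audio/image/prompt inputs.
rerun_input = {"seed": info["seed"]}
```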
Version Details
- Version ID: b0f93aebf8c36a35da4ba2e7a5ce22c443df7e9e9c3bc1e8b170d55bc05f9b62
- Version Created: November 14, 2025