zsxkib/humo 🔢🖼️❓📝 → 🖼️

▶️ 58 runs 📅 Sep 2025 ⚙️ Cog 0.16.2
audio-to-video image-to-video lipsync text-to-video video-consistent-character-generation

Example Output

Prompt:

"A person walking confidently down a busy street"

Performance Metrics

1419.74s Prediction Time
1465.44s Total Time
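
The two timings above can be turned into a couple of derived figures. This is only a rough sketch: it assumes the gap between total and prediction time is overhead outside the model (queueing, setup), and that prediction time is dominated by the 50 denoising steps listed in the example inputs. Neither assumption is confirmed by this page.

```python
# Timings reported for the example run.
prediction_time_s = 1419.74
total_time_s = 1465.44
num_inference_steps = 50  # from the example's input parameters

# Assumption: total minus prediction is time spent outside the model.
overhead_s = round(total_time_s - prediction_time_s, 2)

# Assumption: nearly all prediction time is denoising, so this is an
# upper bound on the cost of one step at 1280x720 with 49 frames.
seconds_per_step = round(prediction_time_s / num_inference_steps, 2)

print(f"overhead: {overhead_s}s, at most about {seconds_per_step}s per step")
```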
All Input Parameters
{
  "width": 1280,
  "height": 720,
  "prompt": "A person walking confidently down a busy street",
  "num_frames": 49,
  "guidance_scale": 5,
  "negative_prompt": "blurry, low quality, distorted, bad anatomy",
  "num_inference_steps": 50,
  "audio_guidance_scale": 5.5
}
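
The inputs above could be submitted from Python roughly as follows. This is a sketch, not an official snippet: it assumes the `replicate` client package is installed and `REPLICATE_API_TOKEN` is set, and it builds the model reference from the owner/name and the version ID shown on this page. `EXAMPLE_INPUT`, `MODEL_REF`, and `run_example` are illustrative names.

```python
# Inputs copied from the example run on this page.
EXAMPLE_INPUT = {
    "width": 1280,
    "height": 720,
    "prompt": "A person walking confidently down a busy street",
    "num_frames": 49,
    "guidance_scale": 5,
    "negative_prompt": "blurry, low quality, distorted, bad anatomy",
    "num_inference_steps": 50,
    "audio_guidance_scale": 5.5,
}

# "owner/name:version", using the version ID listed under Version Details.
MODEL_REF = (
    "zsxkib/humo:"
    "d9b5555b1e87f11ef46b96834ecc379fabdaff97006b48564fe3d841561ab4ef"
)


def run_example(model_ref=MODEL_REF, inputs=EXAMPLE_INPUT):
    """Submit the inputs and return whatever the client returns.

    The output schema describes a URI string; the exact Python type
    returned depends on the client version (assumption).
    """
    import replicate  # imported lazily so the dict can be inspected offline

    return replicate.run(model_ref, input=inputs)
```

Calling `run_example()` starts a real prediction; the example run on this page took roughly 24 minutes, so expect a long wait.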
Input Parameters
seed Type: integer, Range: 0 - 2147483647
Random seed for reproducible generation
audio Type: string
Audio file for lip-sync and movement synchronization (optional)
width Default: 1280
Video width in pixels
height Default: 720
Video height in pixels
prompt Type: string, Default: A person walking confidently down a busy street
Text description of the video. Be detailed about the person, actions, and scene.
num_frames Type: integer, Default: 49, Range: 9 - 97
Number of frames (25 fps, so 25 frames = 1 second). Model trained on up to 97 frames.
guidance_scale Type: number, Default: 5, Range: 2 - 15
Text guidance strength. Research default is 5.0. Lower values (3-5) often produce more natural lighting.
negative_prompt Type: string, Default: blurry, low quality, distorted, bad anatomy
What to avoid in the video
reference_image Type: string
Reference image to control the person's appearance (optional)
num_inference_steps Type: integer, Default: 50, Range: 10 - 100
Denoising steps. More steps = higher quality but slower. Research default is 50.
audio_guidance_scale Type: number, Default: 5.5, Range: 2 - 15
Audio guidance strength (when audio provided). Higher = better sync. Research default is 5.5.
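
The documented ranges can be checked locally before a long-running prediction is submitted. The sketch below transcribes the ranges listed above and adds the clip-length arithmetic from the num_frames description (25 fps). `check_inputs` and `clip_seconds` are hypothetical helper names, and how the server itself treats out-of-range values is not stated on this page.

```python
# Ranges transcribed from the parameter list on this page.
RANGES = {
    "seed": (0, 2147483647),
    "num_frames": (9, 97),
    "guidance_scale": (2, 15),
    "num_inference_steps": (10, 100),
    "audio_guidance_scale": (2, 15),
}
FPS = 25  # stated in the num_frames description


def check_inputs(inputs):
    """Return a list of problems; an empty list means every ranged value fits."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in inputs and not low <= inputs[name] <= high:
            problems.append(f"{name}={inputs[name]} is outside {low} - {high}")
    return problems


def clip_seconds(num_frames):
    """Clip length implied by the frame count at 25 fps."""
    return num_frames / FPS


print(check_inputs({"num_frames": 120, "guidance_scale": 5}))
print(clip_seconds(49))  # 1.96, shown as 2.0s in the example logs
```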
Output Schema

Output

Type: string, Format: uri

Example Execution Logs
🎬 Generating 2.0s video (1280x720, 49 frames)
📝 Mode: T (engine=TA) | Steps: 50 | Seed: 25775
🎥 Generation complete!
🎬 Saved video
✅ Success: 2.0s video at 1280x720
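
The example inputs did not set a seed, yet the logs report `Seed: 25775`, so the logs are the place to recover the seed for a later rerun. A minimal parsing sketch follows; it assumes the log format matches the single example above, which this page does not guarantee. `parse_run_log` is a hypothetical helper name.

```python
import re

# Log lines copied from the example execution logs on this page.
LOG = """\
🎬 Generating 2.0s video (1280x720, 49 frames)
📝 Mode: T (engine=TA) | Steps: 50 | Seed: 25775
🎥 Generation complete!
"""


def parse_run_log(log):
    """Extract clip settings and the seed from log text, if present."""
    info = {}
    m = re.search(r"Generating ([\d.]+)s video \((\d+)x(\d+), (\d+) frames\)", log)
    if m:
        info["seconds"] = float(m.group(1))
        info["width"] = int(m.group(2))
        info["height"] = int(m.group(3))
        info["num_frames"] = int(m.group(4))
    m = re.search(r"Mode: (\w+) \(engine=(\w+)\) \| Steps: (\d+) \| Seed: (\d+)", log)
    if m:
        info["mode"] = m.group(1)
        info["engine"] = m.group(2)
        info["steps"] = int(m.group(3))
        info["seed"] = int(m.group(4))
    return info


info = parse_run_log(LOG)
print(info["seed"])  # 25775
```

Passing the recovered value back as the `seed` input is the documented route to reproducible generation.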
Version Details
Version ID
d9b5555b1e87f11ef46b96834ecc379fabdaff97006b48564fe3d841561ab4ef
Version Created
September 18, 2025