zsxkib/humo 🔢🖼️❓📝 → 🖼️

▶️ 117 runs 📅 Sep 2025 ⚙️ Cog 0.16.2

audio-to-video, image-to-video, lipsync, text-to-video, video-consistent-character-generation

Performance

Typical run time: 1419.7 s

Total runs: 117

Example Output

Prompt:

"A person walking confidently down a busy street"

Output

Performance Metrics

Prediction time: 1419.74 s

Total time: 1465.44 s

All Input Parameters

{
  "width": 1280,
  "height": 720,
  "prompt": "A person walking confidently down a busy street",
  "num_frames": 49,
  "guidance_scale": 5,
  "negative_prompt": "blurry, low quality, distorted, bad anatomy",
  "num_inference_steps": 50,
  "audio_guidance_scale": 5.5
}
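The parameters above could be submitted from Python roughly as follows. This is a minimal sketch, assuming the official `replicate` client package is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_humo` are hypothetical, and the version ID is the one listed under Version Details on this page.

```python
# Model reference pinned to the version ID listed on this page.
MODEL_REF = (
    "zsxkib/humo:"
    "d9b5555b1e87f11ef46b96834ecc379fabdaff97006b48564fe3d841561ab4ef"
)


def build_input(prompt, **overrides):
    """Return an input dict that starts from the example parameters above.

    Hypothetical helper: keyword overrides replace individual values.
    """
    params = {
        "width": 1280,
        "height": 720,
        "prompt": prompt,
        "num_frames": 49,
        "guidance_scale": 5,
        "negative_prompt": "blurry, low quality, distorted, bad anatomy",
        "num_inference_steps": 50,
        "audio_guidance_scale": 5.5,
    }
    params.update(overrides)
    return params


def run_humo(prompt, **overrides):
    """Submit one prediction and return whatever the client returns.

    Assumes the `replicate` package and a REPLICATE_API_TOKEN variable.
    The import is lazy so build_input() works without the package.
    """
    import replicate

    return replicate.run(MODEL_REF, input=build_input(prompt, **overrides))
```

Calling `run_humo("A person walking confidently down a busy street")` would block until the prediction finishes, which took about 24 minutes in the example run shown here.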

Input Parameters

seed (integer, range 0 - 2147483647): Random seed for reproducible generation.
audio (string, optional): Audio file for lip-sync and movement synchronization.
width (default 1280): Video width in pixels.
height (default 720): Video height in pixels.
prompt (string, default "A person walking confidently down a busy street"): Text description of the video. Be detailed about the person, actions, and scene.
num_frames (integer, default 49, range 9 - 97): Number of frames. Output is 25 fps, so 25 frames = 1 second. The model was trained on up to 97 frames.
guidance_scale (number, default 5, range 2 - 15): Text guidance strength. The research default is 5.0. Lower values (3-5) often produce more natural lighting.
negative_prompt (string, default "blurry, low quality, distorted, bad anatomy"): What to avoid in the video.
reference_image (string, optional): Reference image to control the person's appearance.
num_inference_steps (integer, default 50, range 10 - 100): Denoising steps. More steps give higher quality but run slower. The research default is 50.
audio_guidance_scale (number, default 5.5, range 2 - 15): Audio guidance strength, used when audio is provided. Higher values give better sync. The research default is 5.5.
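Because a run can take many minutes, it may be worth checking numeric inputs against the documented ranges before submitting. The sketch below copies the ranges from the list above. `RANGES` and `check_ranges` are hypothetical names, and the check is only a client-side convenience, not the model's own validation.

```python
# Ranges copied from the parameter list above: name -> (min, max).
RANGES = {
    "seed": (0, 2147483647),
    "num_frames": (9, 97),
    "guidance_scale": (2, 15),
    "num_inference_steps": (10, 100),
    "audio_guidance_scale": (2, 15),
}


def check_ranges(params):
    """Return a list of problems; an empty list means every value is in range.

    Parameters missing from `params` are skipped, since all of them
    are optional or have defaults.
    """
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in params:
            continue
        value = params[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} - {high}")
    return problems
```

For example, `check_ranges({"num_frames": 120})` reports one problem, while the example parameters on this page pass cleanly.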

Output Schema

Output

Type: string • Format: uri
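Since the output is a URI string, saving the result locally is a plain HTTP download. The following is a sketch using only the Python standard library. The function names and the `output.mp4` fallback filename are assumptions, as the page does not state the video container format.

```python
import shutil
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_uri(uri, fallback="output.mp4"):
    """Take the last path segment of the URI as a filename.

    The fallback name is an assumption used when the path has no segment.
    """
    name = PurePosixPath(urlparse(uri).path).name
    return name or fallback


def save_output(uri, dest=None):
    """Stream the file at `uri` to disk and return the destination path."""
    dest = dest or filename_from_uri(uri)
    with urlopen(uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```

Some client versions may wrap the output in an object instead of returning a bare string. In that case, convert it to its URL string before passing it to `save_output`.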

Example Execution Logs

🎬 Generating 2.0s video (1280x720, 49 frames)
📝 Mode: T (engine=TA) | Steps: 50 | Seed: 25775
🎥 Generation complete!
🎬 Saved video
✅ Success: 2.0s video at 1280x720
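The 2.0s figure in the log follows from the frame count and the 25 fps rate stated in the parameter list: 49 / 25 = 1.96 s, which rounds to 2.0. A small sketch of that arithmetic, plus the inverse, is shown below. The function names are hypothetical, and whether the model accepts every frame count in the 9 - 97 range is not stated on this page.

```python
FPS = 25  # frame rate stated in the num_frames description


def duration_seconds(num_frames, fps=FPS):
    """Clip length in seconds for a given frame count."""
    return num_frames / fps


def frames_for(seconds, fps=FPS, low=9, high=97):
    """Nearest frame count for a target duration, clamped to 9 - 97."""
    return max(low, min(high, round(seconds * fps)))


# The example run: 49 frames at 25 fps.
label = f"{duration_seconds(49):.1f}s"
```

At the upper limit of 97 frames, the clip is 3.88 seconds long, so longer targets are clamped by `frames_for`.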

Version Details

Version ID: d9b5555b1e87f11ef46b96834ecc379fabdaff97006b48564fe3d841561ab4ef
Version Created: September 18, 2025
