tencent/hunyuanvideo-foley

▶️ 6.1K runs 📅 Sep 2025 ⚙️ Cog 0.16.2
foley sound-effect-generation video-to-audio

About

(Research & Non-commercial use only) Text-Video-to-Audio Synthesis: Generate realistic audio from video and text descriptions

Example Output

Prompt:

"splash of water and loud thud as person hits the surface"

Performance Metrics

Prediction Time: 15.46s
Total Time: 15.47s
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/Ng6XsciCNoSaPKYexAwkVvruQl3uGkMqgXoB5sLcwKNd9Vqs/8_video.mp4",
  "prompt": "splash of water and loud thud as person hits the surface",
  "return_audio": false,
  "guidance_scale": 4.5,
  "num_inference_steps": 50
}
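As a minimal sketch, the inputs above could be wrapped in a request to Replicate's HTTP predictions endpoint. The endpoint URL, the Bearer-token header, and the `version`/`input` body layout are assumptions about the general Replicate API rather than something stated on this page; `build_prediction_request` and the placeholder token are hypothetical names used only for illustration. The request is built but not sent.

```python
import json
import urllib.request

# Version ID copied from the "Version Details" section of this page.
VERSION = "88045928bb97971cffefabfc05a4e55e5bb1c96d475ad4ecc3d229d9169758ae"

# Assumed endpoint for creating predictions over HTTP.
PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(inputs: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the model inputs."""
    body = json.dumps({"version": VERSION, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        PREDICTIONS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The same inputs as the example run shown above.
example_inputs = {
    "video": "https://replicate.delivery/pbxt/Ng6XsciCNoSaPKYexAwkVvruQl3uGkMqgXoB5sLcwKNd9Vqs/8_video.mp4",
    "prompt": "splash of water and loud thud as person hits the surface",
    "return_audio": False,
    "guidance_scale": 4.5,
    "num_inference_steps": 50,
}

request = build_prediction_request(example_inputs, token="YOUR_API_TOKEN")
```

Passing `request` to `urllib.request.urlopen` with a real token would submit the job; the response format is not covered here.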
Input Parameters
video (required) Type: string
Input video file (e.g., .mp4, .mov)
prompt Type: string, Default: (empty)
Optional text prompt describing the scene
neg_prompt Type: string
Negative prompt to avoid unwanted sounds
return_audio Type: boolean, Default: false
Return audio only
guidance_scale Type: number, Default: 4.5, Range: 0 - 20
Guidance strength
num_inference_steps Type: integer, Default: 50, Range: 1 - 200
Denoising steps
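The parameter list above can be checked on the client before a job is submitted. The sketch below is one possible way to do that: `build_inputs` is a hypothetical helper, the defaults and ranges are taken from the listing, and treating the blank prompt default as an empty string is an assumption.

```python
def build_inputs(video, prompt="", neg_prompt=None, return_audio=False,
                 guidance_scale=4.5, num_inference_steps=50):
    """Return an input dict after checking the documented types and ranges."""
    if not isinstance(video, str) or not video:
        raise ValueError("video is required and must be a non-empty string")
    if not 0 <= guidance_scale <= 20:
        raise ValueError("guidance_scale must be between 0 and 20")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(num_inference_steps, bool) or not isinstance(num_inference_steps, int):
        raise ValueError("num_inference_steps must be an integer")
    if not 1 <= num_inference_steps <= 200:
        raise ValueError("num_inference_steps must be between 1 and 200")

    inputs = {
        "video": video,
        "prompt": prompt,
        "return_audio": bool(return_audio),
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
    }
    # neg_prompt has no listed default, so it is only sent when provided.
    if neg_prompt is not None:
        inputs["neg_prompt"] = neg_prompt
    return inputs


defaults = build_inputs("clip.mp4")
```

Out-of-range values such as `guidance_scale=25` raise `ValueError` here instead of failing later on the server.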
Output Schema

Output

Type: string, Format: uri

Example Execution Logs
Denoising steps:   0%|          | 0/50 [00:00<?, ?it/s]
Denoising steps:   2%|▏         | 1/50 [00:00<00:07,  6.57it/s]
Denoising steps:   4%|▍         | 2/50 [00:00<00:07,  6.58it/s]
Denoising steps:   6%|▌         | 3/50 [00:00<00:07,  6.57it/s]
Denoising steps:   8%|▊         | 4/50 [00:00<00:06,  6.58it/s]
Denoising steps:  10%|█         | 5/50 [00:00<00:06,  6.58it/s]
Denoising steps:  12%|█▏        | 6/50 [00:00<00:06,  6.57it/s]
Denoising steps:  14%|█▍        | 7/50 [00:01<00:06,  6.58it/s]
Denoising steps:  16%|█▌        | 8/50 [00:01<00:06,  6.56it/s]
Denoising steps:  18%|█▊        | 9/50 [00:01<00:06,  6.56it/s]
Denoising steps:  20%|██        | 10/50 [00:01<00:06,  6.56it/s]
Denoising steps:  22%|██▏       | 11/50 [00:01<00:05,  6.56it/s]
Denoising steps:  24%|██▍       | 12/50 [00:01<00:05,  6.56it/s]
Denoising steps:  26%|██▌       | 13/50 [00:01<00:05,  6.56it/s]
Denoising steps:  28%|██▊       | 14/50 [00:02<00:05,  6.55it/s]
Denoising steps:  30%|███       | 15/50 [00:02<00:05,  6.56it/s]
Denoising steps:  32%|███▏      | 16/50 [00:02<00:05,  6.56it/s]
Denoising steps:  34%|███▍      | 17/50 [00:02<00:05,  6.56it/s]
Denoising steps:  36%|███▌      | 18/50 [00:02<00:04,  6.57it/s]
Denoising steps:  38%|███▊      | 19/50 [00:02<00:04,  6.55it/s]
Denoising steps:  40%|████      | 20/50 [00:03<00:04,  6.55it/s]
Denoising steps:  42%|████▏     | 21/50 [00:03<00:04,  6.54it/s]
Denoising steps:  44%|████▍     | 22/50 [00:03<00:04,  6.55it/s]
Denoising steps:  46%|████▌     | 23/50 [00:03<00:04,  6.54it/s]
Denoising steps:  48%|████▊     | 24/50 [00:03<00:03,  6.53it/s]
Denoising steps:  50%|█████     | 25/50 [00:03<00:03,  6.54it/s]
Denoising steps:  52%|█████▏    | 26/50 [00:03<00:03,  6.55it/s]
Denoising steps:  54%|█████▍    | 27/50 [00:04<00:03,  6.56it/s]
Denoising steps:  56%|█████▌    | 28/50 [00:04<00:03,  6.57it/s]
Denoising steps:  58%|█████▊    | 29/50 [00:04<00:03,  6.57it/s]
Denoising steps:  60%|██████    | 30/50 [00:04<00:03,  6.58it/s]
Denoising steps:  62%|██████▏   | 31/50 [00:04<00:02,  6.59it/s]
Denoising steps:  64%|██████▍   | 32/50 [00:04<00:02,  6.59it/s]
Denoising steps:  66%|██████▌   | 33/50 [00:05<00:02,  6.58it/s]
Denoising steps:  68%|██████▊   | 34/50 [00:05<00:02,  6.55it/s]
Denoising steps:  70%|███████   | 35/50 [00:05<00:02,  6.56it/s]
Denoising steps:  72%|███████▏  | 36/50 [00:05<00:02,  6.58it/s]
Denoising steps:  74%|███████▍  | 37/50 [00:05<00:01,  6.58it/s]
Denoising steps:  76%|███████▌  | 38/50 [00:05<00:01,  6.58it/s]
Denoising steps:  78%|███████▊  | 39/50 [00:05<00:01,  6.58it/s]
Denoising steps:  80%|████████  | 40/50 [00:06<00:01,  6.58it/s]
Denoising steps:  82%|████████▏ | 41/50 [00:06<00:01,  6.57it/s]
Denoising steps:  84%|████████▍ | 42/50 [00:06<00:01,  6.57it/s]
Denoising steps:  86%|████████▌ | 43/50 [00:06<00:01,  6.57it/s]
Denoising steps:  88%|████████▊ | 44/50 [00:06<00:00,  6.58it/s]
Denoising steps:  90%|█████████ | 45/50 [00:06<00:00,  6.55it/s]
Denoising steps:  92%|█████████▏| 46/50 [00:07<00:00,  6.55it/s]
Denoising steps:  94%|█████████▍| 47/50 [00:07<00:00,  6.55it/s]
Denoising steps:  96%|█████████▌| 48/50 [00:07<00:00,  6.55it/s]
Denoising steps:  98%|█████████▊| 49/50 [00:07<00:00,  6.56it/s]
Denoising steps: 100%|██████████| 50/50 [00:07<00:00,  6.56it/s]
2025-09-08 17:18:55.608 | INFO     | hunyuanvideo_foley.utils.media_utils:merge_audio_video:77 - Merging audio '/tmp/output.wav' with video '/tmp/tmp07d6a94i8_video.mp4'
2025-09-08 17:18:55.801 | INFO     | hunyuanvideo_foley.utils.media_utils:merge_audio_video:91 - Successfully merged video saved to: /tmp/output.mp4
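The progress lines above carry enough information to estimate how long the denoising loop took. The following sketch parses one such line with a regular expression and divides the step count by the reported rate; `parse_progress` is a hypothetical helper, and the pattern assumes the `mm:ss` elapsed format seen in these logs.

```python
import re

# Matches e.g. "50/50 [00:07<00:00,  6.56it/s]" from a progress line.
PROGRESS = re.compile(r"(\d+)/(\d+) \[(\d+):(\d+)<[^,]*,\s*([\d.]+)it/s\]")


def parse_progress(line):
    """Extract step counts, elapsed seconds, and rate; None if no rate is reported."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    done, total, minutes, seconds, rate = match.groups()
    return {
        "done": int(done),
        "total": int(total),
        "elapsed_s": int(minutes) * 60 + int(seconds),
        "rate": float(rate),
    }


final_line = "Denoising steps: 100%|██████████| 50/50 [00:07<00:00,  6.56it/s]"
info = parse_progress(final_line)

# 50 steps at 6.56 it/s is roughly 7.6 seconds of denoising.
denoise_seconds = info["total"] / info["rate"]
```

At about 7.6 seconds, the denoising loop accounts for roughly half of the 15.46s prediction time reported above; the logs do not break down the remainder beyond the merge step, which took about 0.2 seconds according to its timestamps.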
Version Details
Version ID
88045928bb97971cffefabfc05a4e55e5bb1c96d475ad4ecc3d229d9169758ae
Version Created
September 8, 2025