zsxkib/mmaudio 🔢🖼️📝 → 🖼️

▶️ 3.6M runs 📅 Dec 2024 ⚙️ Cog 0.14.3
sound-effect-generation video-to-audio

About

Add sound to video using the MMAudio V2 model, which synthesizes audio directly from video content and can be steered with a text prompt and a negative prompt.

Example Output

Prompt:

"galloping"

Output

Performance Metrics

3.77s Prediction Time
78.31s Total Time
All Input Parameters
{
  "seed": -1,
  "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4",
  "prompt": "galloping",
  "duration": 8,
  "num_steps": 25,
  "cfg_strength": 4.5,
  "negative_prompt": "music"
}
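The inputs above could be submitted through Replicate's HTTP API. The following is a minimal sketch, assuming the standard `POST /v1/predictions` endpoint with a bearer token. The helper name `build_prediction_request` and the placeholder token are illustrative assumptions, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Version ID as listed under "Version Details" on this page.
MMAUDIO_VERSION = "62871fb59889b2d7c13777f08deb3b36bdff88f7e1d53a50ad7694548a41b484"

# Inputs copied from the example above.
EXAMPLE_INPUT = {
    "seed": -1,
    "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4",
    "prompt": "galloping",
    "duration": 8,
    "num_steps": 25,
    "cfg_strength": 4.5,
    "negative_prompt": "music",
}


def build_prediction_request(api_token, model_input):
    """Build (but do not send) a prediction request. Hypothetical helper."""
    body = json.dumps({"version": MMAUDIO_VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        "https://api.replicate.com/v1/predictions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "YOUR_API_TOKEN" is a placeholder, not a real credential.
request = build_prediction_request("YOUR_API_TOKEN", EXAMPLE_INPUT)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit the job. The response would then need to be polled until the prediction finishes.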
Input Parameters
seed (Type: integer; Range: -1 to ∞)
Random seed. Use -1 or leave blank to randomize the seed.
image (Type: string)
Optional image file for image-to-audio generation (experimental).
video (Type: string)
Optional video file for video-to-audio generation.
prompt (Type: string; Default: empty)
Text prompt for generated audio.
duration (Type: number; Default: 8; Range: 1 to ∞)
Duration of output in seconds.
num_steps (Type: integer; Default: 25)
Number of inference steps.
cfg_strength (Type: number; Default: 4.5; Range: 1 to ∞)
Guidance strength (CFG).
negative_prompt (Type: string; Default: music)
Negative prompt to avoid certain sounds.
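The documented defaults and ranges can be checked on the client before a request is made. This is a sketch only: the class `MMAudioInput` and its methods are hypothetical names, the `seed` default of -1 and the empty `prompt` default are assumptions based on the descriptions above, and the model may enforce its own rules differently.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MMAudioInput:
    """Client-side mirror of the documented inputs (hypothetical helper)."""

    seed: int = -1                  # assumed default: -1 means "randomize"
    image: Optional[str] = None     # optional, experimental image-to-audio
    video: Optional[str] = None     # optional video-to-audio source
    prompt: str = ""                # assumed empty default
    duration: float = 8
    num_steps: int = 25
    cfg_strength: float = 4.5
    negative_prompt: str = "music"

    def validate(self):
        """Raise ValueError if a value falls outside the documented range."""
        if self.seed < -1:
            raise ValueError("seed must be -1 (random) or greater")
        if self.duration < 1:
            raise ValueError("duration must be at least 1 second")
        if self.cfg_strength < 1:
            raise ValueError("cfg_strength must be at least 1")
        return self

    def to_payload(self):
        """Return a dict of inputs, omitting optional files that are unset."""
        return {key: value for key, value in asdict(self).items() if value is not None}


payload = MMAudioInput(video="clip.mp4", prompt="galloping").validate().to_payload()
print(payload)
```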
Output Schema

Output

Type: string; Format: uri
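Because the output is a URI string, a caller typically has to fetch the file it points to. The sketch below uses only the Python standard library. The function names are illustrative, the fallback filename `output.bin` is an assumption, and nothing here is specific to how Replicate serves its files.

```python
import os
import posixpath
import shutil
import urllib.parse
import urllib.request


def filename_from_uri(uri, fallback="output.bin"):
    """Take the last path segment of the URI as a local filename."""
    path = urllib.parse.urlparse(uri).path
    return posixpath.basename(path) or fallback


def download_output(uri, dest_dir="."):
    """Stream the file at `uri` into dest_dir and return the local path."""
    dest = os.path.join(dest_dir, filename_from_uri(uri))
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as handle:
        shutil.copyfileobj(response, handle)
    return dest


# Hypothetical URI, used only to show how the filename is derived.
print(filename_from_uri("https://example.com/outputs/result.mp4"))
```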

Example Execution Logs
Using seed: 47716
Processing video: /tmp/tmp5skuo71gsora_galloping.mp4
[WARNING ]: Clip video is too short: 5.00 < 8.00
[WARNING ]: Truncating to 5.00 sec
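The warnings in this log show that a requested duration of 8 seconds was cut to the 5-second length of the clip. A sketch of that clamping rule follows. The function `effective_duration` is a hypothetical illustration that reproduces the log messages, not the model's actual code.

```python
def effective_duration(requested_seconds, clip_seconds):
    """Return the duration actually used and any warning messages.

    If the clip is shorter than the requested duration, the output is
    limited to the clip length, matching the log lines above.
    """
    warnings = []
    if clip_seconds < requested_seconds:
        warnings.append(
            f"Clip video is too short: {clip_seconds:.2f} < {requested_seconds:.2f}"
        )
        warnings.append(f"Truncating to {clip_seconds:.2f} sec")
        return clip_seconds, warnings
    return requested_seconds, warnings


used, messages = effective_duration(8, 5)
for line in messages:
    print(f"[WARNING ]: {line}")
```

In practice this means setting `duration` above the clip length has no effect on the output length for video inputs.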
Version Details
Version ID
62871fb59889b2d7c13777f08deb3b36bdff88f7e1d53a50ad7694548a41b484
Version Created
April 2, 2025