zsxkib/mmaudio
About
Adds sound to video using MMAudio V2, an AI model that synthesizes high-quality audio from video content. A text prompt can steer the generated audio.

Example Output
Prompt: "galloping"
Performance Metrics
- Prediction time: 3.77 s
- Total time: 78.31 s
All Input Parameters
{
  "seed": -1,
  "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4",
  "prompt": "galloping",
  "duration": 8,
  "num_steps": 25,
  "cfg_strength": 4.5,
  "negative_prompt": "music"
}
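The same run can be reproduced programmatically. The sketch below assumes the official `replicate` Python client (`pip install replicate`) with a `REPLICATE_API_TOKEN` set in the environment. The `owner/model:version` reference is assembled from the model name and the version ID listed on this page; the names `MODEL_REF`, `EXAMPLE_INPUT` and `run_example` are illustrative only.

```python
# Assumption: the `replicate` client is installed and REPLICATE_API_TOKEN is set.
# Model reference built from the model name and version ID shown on this page.
MODEL_REF = (
    "zsxkib/mmaudio:"
    "62871fb59889b2d7c13777f08deb3b36bdff88f7e1d53a50ad7694548a41b484"
)

# Inputs copied verbatim from the example run above.
EXAMPLE_INPUT = {
    "seed": -1,
    "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4",
    "prompt": "galloping",
    "duration": 8,
    "num_steps": 25,
    "cfg_strength": 4.5,
    "negative_prompt": "music",
}


def run_example():
    """Send the example inputs to the model and return its output (hypothetical helper)."""
    # Imported lazily so the constants above can be used without the client installed.
    import replicate

    return replicate.run(MODEL_REF, input=EXAMPLE_INPUT)
```

Calling `run_example()` makes a billable network request. The type of the returned value depends on the client version (recent versions may return file-like output objects rather than plain URLs), so inspect it before assuming a specific format.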
Input Parameters
- seed: Random seed. Use -1 or leave blank to randomize the seed.
- image: Optional image file for image-to-audio generation (experimental).
- video: Optional video file for video-to-audio generation.
- prompt: Text prompt describing the audio to generate.
- duration: Duration of the output in seconds.
- num_steps: Number of inference steps.
- cfg_strength: Classifier-free guidance (CFG) strength.
- negative_prompt: Negative prompt describing sounds to avoid.
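A client-side helper that assembles and sanity-checks these parameters might look like the following. It is a hypothetical sketch: `build_mmaudio_input` is an illustrative name, the default values are copied from the example run rather than from a documented schema, and the range checks (seed is -1 or non-negative, duration is positive, at least one inference step) are assumptions, not constraints confirmed by the model.

```python
from typing import Any, Dict, Optional


def build_mmaudio_input(
    prompt: str,
    video: Optional[str] = None,
    image: Optional[str] = None,
    seed: Optional[int] = -1,
    duration: float = 8,
    num_steps: int = 25,
    cfg_strength: float = 4.5,
    negative_prompt: str = "music",
) -> Dict[str, Any]:
    """Assemble an input dict for zsxkib/mmaudio (hypothetical helper).

    Defaults mirror the example run on this page; they are not confirmed
    to be the model's server-side defaults.
    """
    # Assumed rule: -1 (or None, i.e. "leave blank") randomizes; other seeds must be >= 0.
    if seed is not None and seed != -1 and seed < 0:
        raise ValueError("seed must be -1 (randomize) or a non-negative integer")
    if duration <= 0:
        raise ValueError("duration must be a positive number of seconds")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")

    params: Dict[str, Any] = {
        "prompt": prompt,
        "duration": duration,
        "num_steps": num_steps,
        "cfg_strength": cfg_strength,
        "negative_prompt": negative_prompt,
    }
    # Leaving the seed blank is modeled here as omitting the key entirely.
    if seed is not None:
        params["seed"] = seed
    # video and image are both optional; include only what was supplied.
    if video is not None:
        params["video"] = video
    if image is not None:
        params["image"] = image
    return params
```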
Example Execution Logs
Using seed: 47716
Processing video: /tmp/tmp5skuo71gsora_galloping.mp4
[WARNING]: Clip video is too short: 5.00 < 8.00
[WARNING]: Truncating to 5.00 sec
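The two warnings indicate that when the clip is shorter than the requested `duration`, the output is shortened to the clip length (here 5.00 s instead of 8 s). The hypothetical function below reproduces that behavior as inferred from the logs; it is not taken from the model's source, and `effective_duration` is an illustrative name. How clips longer than `duration` are handled is not shown in this log, so that branch is an assumption.

```python
import logging

logger = logging.getLogger("mmaudio_sketch")


def effective_duration(requested: float, clip_length: float) -> float:
    """Clamp the requested output duration to the clip length (inferred from the logs)."""
    if clip_length < requested:
        # Mirrors the two warnings seen in the example execution logs.
        logger.warning("Clip video is too short: %.2f < %.2f", clip_length, requested)
        logger.warning("Truncating to %.2f sec", clip_length)
        return clip_length
    # Assumption: a clip at least as long as requested keeps the requested duration.
    return requested
```

With the example inputs, `effective_duration(8, 5.0)` yields 5.0, matching the truncation reported above.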
Version Details
- Version ID: 62871fb59889b2d7c13777f08deb3b36bdff88f7e1d53a50ad7694548a41b484
- Version created: April 2, 2025