zsxkib/mmaudio 🔢🖼️📝 → 🖼️

▶️ 5.4M runs 📅 Dec 2024 ⚙️ Cog 0.14.3

image-to-audio sound-effect-generation video-to-audio

Performance

Typical run time: 3.8 s

Cold start (first call): ~78 s

Total runs: 5.4M

About

Add sound to video using the MMAudio V2 model, which synthesizes audio that matches the content of an input video. A text prompt can steer the result, and a still image can be used instead of a video (experimental).

Example Output

Prompt:

"galloping"


Performance Metrics

Prediction time: 3.77 s

Total time: 78.31 s

All Input Parameters

{
  "seed": -1,
  "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4",
  "prompt": "galloping",
  "duration": 8,
  "num_steps": 25,
  "cfg_strength": 4.5,
  "negative_prompt": "music"
}
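The example above sets every parameter explicitly. A minimal sketch of filling in defaults and checking the lower bounds listed under Input Parameters before submitting is shown below. The helper name `build_mmaudio_input` is hypothetical, the seed default of -1 is an assumption taken from this example (the page only says -1 or blank randomizes), and the checks mirror the documented ranges rather than any server-side validation.

```python
from typing import Any

# Defaults as listed on this page; the seed default (-1) is assumed from the example.
DEFAULTS: dict[str, Any] = {
    "seed": -1,
    "prompt": "",
    "duration": 8,
    "num_steps": 25,
    "cfg_strength": 4.5,
    "negative_prompt": "music",
}
# Documented lower bounds; every upper bound is listed as unbounded.
MINIMUMS: dict[str, float] = {"seed": -1, "duration": 1, "cfg_strength": 1}
OPTIONAL_MEDIA = {"video", "image"}


def build_mmaudio_input(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto the defaults and check documented ranges."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_MEDIA
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    for name, minimum in MINIMUMS.items():
        value = params[name]
        # None stands in for "left blank"; skip the range check in that case.
        if value is not None and value < minimum:
            raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return params
```

Calling `build_mmaudio_input(video="https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4", prompt="galloping")` reproduces the JSON shown above, while a value such as `duration=0.5` is rejected before any request is made.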

Input Parameters

seed (integer; range: -1 to ∞): Random seed. Use -1 or leave blank to randomize the seed.
image (string): Optional image file for image-to-audio generation (experimental).
video (string): Optional video file for video-to-audio generation.
prompt (string; default: empty): Text prompt for the generated audio.
duration (number; default: 8; range: 1 to ∞): Duration of the output in seconds.
num_steps (integer; default: 25): Number of inference steps.
cfg_strength (number; default: 4.5; range: 1 to ∞): Classifier-free guidance (CFG) strength.
negative_prompt (string; default: music): Negative prompt describing sounds to avoid.
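The page describes cfg_strength only as the guidance strength. Classifier-free guidance conventionally blends a conditional prediction with an unconditional one as guided = uncond + w * (cond - uncond), so w = 1 returns the conditional prediction unchanged, which fits the documented lower bound of 1. The sketch below shows only that arithmetic on plain lists. It assumes, without confirmation from this page, that MMAudio uses this standard formulation and that the negative prompt supplies the "unconditional" branch; `cfg_combine` is a hypothetical name.

```python
def cfg_combine(cond: list[float], uncond: list[float], w: float) -> list[float]:
    """Standard classifier-free guidance on two same-shaped predictions.

    cond:   prediction conditioned on the video/prompt
    uncond: prediction for the unconditional branch (assumed here to be
            conditioned on negative_prompt)
    w:      guidance strength, i.e. cfg_strength; w = 1 means no extra guidance
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    # Start from the unconditional prediction and move w times the gap toward cond.
    return [u + w * (c - u) for c, u in zip(cond, uncond)]
```

With cond = [1.0, 2.0] and uncond = [0.0, 1.0], w = 1 returns cond itself, while the default w = 4.5 extrapolates past it to [4.5, 5.5], pushing the output further away from whatever the unconditional branch describes.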

Output Schema

Output

Type: string • Format: uri

Example Execution Logs

Using seed: 47716
Processing video: /tmp/tmp5skuo71gsora_galloping.mp4
[WARNING]: Clip video is too short: 5.00 < 8.00
[WARNING]: Truncating to 5.00 sec
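These logs show a randomized seed being chosen for seed = -1 and the requested 8-second duration being cut to the 5-second length of the source clip. A minimal sketch of that behavior, inferred only from these log lines and the parameter descriptions, follows. The function names, the seed range, and the logger name are assumptions for illustration, not the model's actual code.

```python
import logging
import random
from typing import Optional

log = logging.getLogger("mmaudio-sketch")  # hypothetical logger name


def resolve_seed(seed: Optional[int]) -> int:
    """Hypothetical: -1 or blank (None) means pick a random seed."""
    if seed is None or seed == -1:
        seed = random.randint(0, 2**31 - 1)  # range chosen for illustration only
    log.info("Using seed: %d", seed)
    return seed


def effective_duration(requested: float, clip_length: float) -> float:
    """Hypothetical: never generate more audio than the input clip covers."""
    if clip_length < requested:
        log.warning("Clip video is too short: %.2f < %.2f", clip_length, requested)
        log.warning("Truncating to %.2f sec", clip_length)
        return clip_length
    return requested
```

For the run above, `effective_duration(8.0, 5.0)` returns 5.0 and emits the two warnings, whereas a clip longer than the requested duration leaves the duration unchanged.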

Version Details

Version ID: 62871fb59889b2d7c13777f08deb3b36bdff88f7e1d53a50ad7694548a41b484
Version Created: April 2, 2025
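To call this exact version programmatically, Replicate's HTTP API accepts a POST to its predictions endpoint with the version ID and an input object. The sketch below only builds the request with the standard library and does not send it; the helper name `build_prediction_request` is hypothetical, and the token is assumed to come from a `REPLICATE_API_TOKEN` environment variable.

```python
import json
import urllib.request

VERSION_ID = "62871fb59889b2d7c13777f08deb3b36bdff88f7e1d53a50ad7694548a41b484"
PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(inputs: dict, token: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a prediction request for this version."""
    body = json.dumps({"version": VERSION_ID, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        PREDICTIONS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit it, for example with `build_prediction_request({"video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4", "prompt": "galloping"}, os.environ["REPLICATE_API_TOKEN"])`. Once the prediction completes, its output field should hold the URI string described under Output Schema.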
