zsxkib/mmaudio-t4 🔢🖼️📝 → 🖼️

▶️ 2.1K runs 📅 Apr 2025 ⚙️ Cog 0.14.3 🔗 GitHub 📄 Paper ⚖️ License

image-to-audio sound-effect-generation video-to-audio

Performance

37.2s Typical run time

~340s Cold start (first call)

2.1K Total runs

About

Cost-optimized MMAudio V2 (T4 GPU): this version runs on T4 hardware for lower cost. It adds sound to video by synthesizing high-quality audio from the video content.

Example Output

Prompt:

"waves, storm"

Output

Performance Metrics

37.18s Prediction Time

340.33s Total Time

All Input Parameters

{
  "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_kraken.mp4",
  "prompt": "waves, storm",
  "duration": 10,
  "num_steps": 25,
  "cfg_strength": 4.5,
  "negative_prompt": "music"
}
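A minimal sketch of how the parameters above might be submitted from Python. It assumes the official `replicate` client package is installed and that `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_mmaudio` are hypothetical and exist only for illustration. The version ID is the one listed under Version Details.

```python
MODEL = "zsxkib/mmaudio-t4"
VERSION = "330393ae234d739f3261ae389a5506a73a1bae8c77dc6c6faebc5bde78b6e972"


def build_input(video_url, prompt="", duration=8, num_steps=25,
                cfg_strength=4.5, negative_prompt="music", seed=None):
    """Assemble the input payload (hypothetical helper; defaults mirror the listing)."""
    payload = {
        "video": video_url,
        "prompt": prompt,
        "duration": duration,
        "num_steps": num_steps,
        "cfg_strength": cfg_strength,
        "negative_prompt": negative_prompt,
    }
    # Omitting the seed lets the model pick a random one.
    if seed is not None:
        payload["seed"] = seed
    return payload


def run_mmaudio(payload):
    """Submit a prediction (hypothetical helper). Needs network access and an API token."""
    import replicate  # assumption: installed via `pip install replicate`
    return replicate.run(f"{MODEL}:{VERSION}", input=payload)


# Reproduces the example parameters shown above; no network call is made here.
example = build_input(
    "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_kraken.mp4",
    prompt="waves, storm",
    duration=10,
)
```

Calling `run_mmaudio(example)` would start a prediction. Depending on the client version, the return value is either a URL string or a file-like object wrapping that URL. A first call may take several minutes because of the cold start noted under Performance.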

Input Parameters

seed (Type: integer • Range: -1 to ∞): Random seed. Use -1 or leave blank to randomize the seed
image (Type: string): Optional image file for image-to-audio generation (experimental)
video (Type: string): Optional video file for video-to-audio generation
prompt (Type: string • Default: empty): Text prompt for generated audio
duration (Type: number • Default: 8 • Range: 1 to ∞): Duration of output in seconds
num_steps (Type: integer • Default: 25): Number of inference steps
cfg_strength (Type: number • Default: 4.5 • Range: 1 to ∞): Guidance strength (CFG)
negative_prompt (Type: string • Default: music): Negative prompt to avoid certain sounds
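The documented ranges can be checked on the client before a request is sent. The sketch below is illustrative only: `validate_input` is a hypothetical helper, not part of the model or of any client library, and it checks only the constraints stated in the list above.

```python
def validate_input(params):
    """Return a list of problems; an empty list means the documented ranges hold."""
    problems = []

    # seed: integer, -1 or greater (-1 requests a random seed)
    seed = params.get("seed", -1)
    if not isinstance(seed, int) or seed < -1:
        problems.append("seed must be an integer >= -1")

    # duration: number of seconds, at least 1 (default 8)
    if params.get("duration", 8) < 1:
        problems.append("duration must be >= 1 second")

    # num_steps: integer (default 25); no range is documented
    if not isinstance(params.get("num_steps", 25), int):
        problems.append("num_steps must be an integer")

    # cfg_strength: number, at least 1 (default 4.5)
    if params.get("cfg_strength", 4.5) < 1:
        problems.append("cfg_strength must be >= 1")

    return problems
```

For example, `validate_input({"duration": 10, "cfg_strength": 4.5})` returns an empty list, while a payload with `duration` set to 0.5 produces one problem string.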

Output Schema

Output

Type: string • Format: uri

Example Execution Logs

Using seed: 30956
Processing video: /tmp/tmpke0qe0x5sora_kraken.mp4
[WARNING]: Clip video is too short: 5.00 < 10.00
[WARNING]: Truncating to 5.00 sec
[WARNING]: Sync video is too short: 4.96 < 5.00
[WARNING]: Truncating to 4.96 sec
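The warnings above suggest that the requested duration is reduced step by step to the shortest available video stream: 10 s requested, 5.00 s of clip video, then 4.96 s of sync video. The following sketch reproduces that arithmetic only. It is inferred from the log rather than taken from the model's source, and `effective_duration` is a hypothetical name.

```python
def effective_duration(requested, clip_seconds, sync_seconds):
    """Clamp the requested duration to each video stream in turn (inferred from the log)."""
    duration = requested
    for available in (clip_seconds, sync_seconds):
        # Each stream that is shorter than the current target truncates it.
        if available < duration:
            duration = available
    return duration


# Values taken from the example log above.
result = effective_duration(10.0, 5.00, 4.96)
```

With the logged values, `result` is 4.96, so the generated audio covers roughly 5 seconds even though `duration` was set to 10.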

Version Details

Version ID: 330393ae234d739f3261ae389a5506a73a1bae8c77dc6c6faebc5bde78b6e972
Version Created: April 2, 2025
