zsxkib/mmaudio-t4 🔢🖼️📝 → 🖼️

▶️ 1.6K runs 📅 Apr 2025 ⚙️ Cog 0.14.3
sound-effect-generation video-to-audio

About

Cost-optimized MMAudio V2 running on a T4 GPU. This version adds sound to video by synthesizing high-quality audio from the video content, and it runs on T4 hardware to keep the cost lower.

Example Output

Prompt:

"waves, storm"

Performance Metrics

37.18s Prediction Time
340.33s Total Time
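
The two figures above can be compared directly. The short sketch below only does the arithmetic on the numbers reported for this one run; the page does not say what the time outside prediction consists of, so no breakdown is assumed.

```python
# Figures copied from the Performance Metrics of this example run.
prediction_time_s = 37.18
total_time_s = 340.33

# Time that was part of the total but not part of the prediction itself.
overhead_s = round(total_time_s - prediction_time_s, 2)

# Fraction of the total wall-clock time spent in prediction.
prediction_share = prediction_time_s / total_time_s

print(f"time outside prediction: {overhead_s:.2f}s")
print(f"prediction share of total: {prediction_share:.1%}")
```
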
All Input Parameters
{
  "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_kraken.mp4",
  "prompt": "waves, storm",
  "duration": 10,
  "num_steps": 25,
  "cfg_strength": 4.5,
  "negative_prompt": "music"
}
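
As a rough illustration, the same inputs could be submitted with the Replicate Python client. This is a sketch under assumptions: it presumes the third-party `replicate` package is installed and that `REPLICATE_API_TOKEN` is set; `build_input` and `run_example` are hypothetical helper names, and the model reference is assembled from the page title and the Version ID listed on this page.

```python
import os

# Model reference: owner/name from the page title, version from "Version Details".
MODEL_REF = (
    "zsxkib/mmaudio-t4:"
    "330393ae234d739f3261ae389a5506a73a1bae8c77dc6c6faebc5bde78b6e972"
)


def build_input() -> dict:
    """Return the input payload shown in the example above."""
    return {
        "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_kraken.mp4",
        "prompt": "waves, storm",
        "duration": 10,
        "num_steps": 25,
        "cfg_strength": 4.5,
        "negative_prompt": "music",
    }


def run_example():
    """Submit the example prediction (needs network access and an API token)."""
    import replicate  # third-party client: pip install replicate

    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling run_example()")
    # The result refers to the generated file; its exact Python type
    # depends on the client version.
    return replicate.run(MODEL_REF, input=build_input())
```

Calling `run_example()` starts a billed prediction, so the sketch keeps it separate from building the payload.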
Input Parameters
seed (Type: integer, Range: -1 to ∞)
Random seed. Use -1 or leave blank to randomize the seed.
image (Type: string)
Optional image file for image-to-audio generation (experimental).
video (Type: string)
Optional video file for video-to-audio generation.
prompt (Type: string, Default: empty)
Text prompt for the generated audio.
duration (Type: number, Default: 8, Range: 1 to ∞)
Duration of the output in seconds.
num_steps (Type: integer, Default: 25)
Number of inference steps.
cfg_strength (Type: number, Default: 4.5, Range: 1 to ∞)
Guidance strength (CFG).
negative_prompt (Type: string, Default: music)
Negative prompt to avoid certain sounds.
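
The defaults and ranges listed above can be checked on the client side before a request is sent. The following is a minimal sketch, not the model's own validation: `DEFAULTS` and `validate_inputs` are hypothetical names, and only the constraints stated in this parameter list are enforced.

```python
# Defaults as documented in the parameter list; seed, image and video
# have no documented default and are therefore left out.
DEFAULTS = {
    "prompt": "",
    "duration": 8,
    "num_steps": 25,
    "cfg_strength": 4.5,
    "negative_prompt": "music",
}


def validate_inputs(params: dict) -> dict:
    """Merge params over the documented defaults and check documented ranges."""
    merged = {**DEFAULTS, **params}

    if merged["duration"] < 1:
        raise ValueError("duration must be at least 1 second")
    if merged["cfg_strength"] < 1:
        raise ValueError("cfg_strength must be at least 1")
    if not isinstance(merged["num_steps"], int):
        raise ValueError("num_steps must be an integer")

    seed = merged.get("seed")
    if seed is not None:
        if not isinstance(seed, int) or seed < -1:
            raise ValueError("seed must be an integer >= -1")

    return merged
```

For example, `validate_inputs({"prompt": "waves, storm"})` fills in `duration=8`, `num_steps=25`, `cfg_strength=4.5` and `negative_prompt="music"`, while `validate_inputs({"duration": 0.5})` raises `ValueError`.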
Output Schema

Output

Type: string, Format: uri

Example Execution Logs
Using seed: 30956
Processing video: /tmp/tmpke0qe0x5sora_kraken.mp4
[WARNING ]: Clip video is too short: 5.00 < 10.00
[WARNING ]: Truncating to 5.00 sec
[WARNING ]: Sync video is too short: 4.96 < 5.00
[WARNING ]: Truncating to 4.96 sec
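
The warnings above show the requested 10-second duration being cut down twice, first to the 5.00-second clip length and then to the 4.96-second sync length. The sketch below reproduces that arithmetic only; `effective_duration` is a hypothetical helper that mirrors the logged behaviour and is not taken from the MMAudio source.

```python
def effective_duration(requested: float, clip_len: float, sync_len: float) -> float:
    """Return the duration left after the two truncation steps seen in the logs."""
    duration = requested
    # "Clip video is too short: 5.00 < 10.00" -> truncate to the clip length.
    if clip_len < duration:
        duration = clip_len
    # "Sync video is too short: 4.96 < 5.00" -> truncate to the sync length.
    if sync_len < duration:
        duration = sync_len
    return duration


# Values taken from the example logs above.
print(effective_duration(10.0, 5.00, 4.96))  # 4.96
```

In practice this suggests setting `duration` no longer than the input video, since a longer request appears to be truncated anyway.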
Version Details
Version ID
330393ae234d739f3261ae389a5506a73a1bae8c77dc6c6faebc5bde78b6e972
Version Created
April 2, 2025