acappemin/video-to-audio-and-piano

145 runs • Apr 2025 • Cog 0.14.7

music-generation sound-effect-generation video-to-audio

About

Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization

Example Output

Prompt: "the sound of race car, auto racing"

Performance Metrics

18.23s Prediction Time

18.24s Total Time

All Input Parameters

{
  "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
  "prompt": "the sound of race car, auto racing",
  "if_piano": false,
  "v2a_num_steps": 25
}

Input Parameters

video: Input Video • Type: string
prompt: Video-to-Audio Text Prompt • Type: string • Default: (empty)
if_piano: If Generating Piano Music • Type: boolean • Default: false
v2a_num_steps: Video-to-Audio Num Steps • Type: integer • Default: 25
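
The parameters above can be assembled into a request payload as in the following minimal sketch. The helper names `build_input` and `run_prediction` are hypothetical, the empty-string default for `prompt` and the "at least 1 step" check are assumptions rather than documented constraints, and `run_prediction` assumes the third-party `replicate` Python client is installed with `REPLICATE_API_TOKEN` set.

```python
from typing import Any, Dict

# Model reference built from the version ID listed under "Version Details".
MODEL_VERSION = (
    "acappemin/video-to-audio-and-piano:"
    "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1"
)


def build_input(
    video: str,
    prompt: str = "",            # assumed default: empty string
    if_piano: bool = False,      # documented default
    v2a_num_steps: int = 25,     # documented default
) -> Dict[str, Any]:
    """Build the input dict using the four documented parameter names."""
    if not isinstance(video, str) or not video:
        raise ValueError("video must be a non-empty string, e.g. a URL")
    # Assumption: the step count must be a positive integer (bool is rejected).
    if (
        isinstance(v2a_num_steps, bool)
        or not isinstance(v2a_num_steps, int)
        or v2a_num_steps < 1
    ):
        raise ValueError("v2a_num_steps must be a positive integer")
    return {
        "video": video,
        "prompt": prompt,
        "if_piano": bool(if_piano),
        "v2a_num_steps": v2a_num_steps,
    }


def run_prediction(inputs: Dict[str, Any]) -> Any:
    """Send the payload to Replicate (needs network access and an API token)."""
    import replicate  # third-party client, imported lazily

    return replicate.run(MODEL_VERSION, input=inputs)
```

Calling `build_input(video_url, prompt="the sound of race car, auto racing")` reproduces the structure shown under "All Input Parameters"; passing the result to `run_prediction` would then start a prediction.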

Output Schema

Output

Type: string • Format: uri
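
Because the output is a URI string, a caller will typically want to check it and pick a local file name before downloading. The sketch below shows one way to do that with the standard library only; `filename_from_uri` and its `default` fallback are hypothetical names, and the page does not state which file type the URI points to.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_uri(uri: str, default: str = "output.bin") -> str:
    """Return the last path segment of an http(s) URI, or `default` if empty."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URI: {uri!r}")
    # URI paths always use forward slashes, so posixpath is the right tool.
    name = posixpath.basename(parsed.path)
    return name or default
```

The returned name can then be passed to a download call such as `urllib.request.urlretrieve(uri, name)`.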

Example Execution Logs

torch.Size([1, 753, 128]) tensor([753], dtype=torch.int32) ['the sound of race car, auto racing'] ['/tmp/tmprdangfkr.mp4'] [False] None None None None
2025-04-27 06:49:43.474 start
frames_embed midis cond torch.Size([1, 753, 51]) tensor(0., device='cuda:0') None None torch.Size([1, 753, 128]) tensor(14.5177, device='cuda:0')
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
2025-04-27 06:49:59.069 sample
duration 10.04 10.04
Moviepy - Building video /tmp/tmprdangfkr.mp4.mp4.
MoviePy - Writing audio in tmprdangfkr.mp4TEMP_MPY_wvf_snd.mp4
chunk:   0%|          | 0/222 [00:00<?, ?it/s, now=None]
chunk:  76%|███████▌  | 169/222 [00:00<00:00, 1668.18it/s, now=None]
MoviePy - Done.
Moviepy - Writing video /tmp/tmprdangfkr.mp4.mp4
t:   0%|          | 0/251 [00:00<?, ?it/s, now=None]
t:  12%|█▏        | 30/251 [00:00<00:00, 297.99it/s, now=None]
t:  25%|██▌       | 63/251 [00:00<00:00, 311.38it/s, now=None]
t:  38%|███▊      | 95/251 [00:00<00:00, 291.66it/s, now=None]
t:  50%|████▉     | 125/251 [00:00<00:00, 215.98it/s, now=None]
t:  59%|█████▉    | 149/251 [00:00<00:00, 190.23it/s, now=None]
t:  68%|██████▊   | 170/251 [00:00<00:00, 169.79it/s, now=None]
t:  75%|███████▌  | 189/251 [00:00<00:00, 166.17it/s, now=None]
t:  82%|████████▏ | 207/251 [00:01<00:00, 154.44it/s, now=None]
t:  90%|█████████ | 226/251 [00:01<00:00, 159.56it/s, now=None]
t:  97%|█████████▋| 244/251 [00:01<00:00, 158.32it/s, now=None]
Moviepy - Done !
Moviepy - video ready /tmp/tmprdangfkr.mp4.mp4
paths /tmp/tmprdangfkr.mp4 /tmp/tmprdangfkr.mp4.wav /tmp/tmprdangfkr.mp4.mp4
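
The two timestamped markers in the log (`start` and `sample`) make it possible to estimate how much of the prediction time falls between them. The sketch below parses those lines and computes the difference; `parse_stage_times` is a hypothetical helper, and reading the interval as the sampling phase is an assumption based on the marker names.

```python
from datetime import datetime
from typing import Dict


def parse_stage_times(log_text: str) -> Dict[str, datetime]:
    """Collect lines shaped like '<date> <time> <label>' into {label: timestamp}."""
    stages: Dict[str, datetime] = {}
    for line in log_text.splitlines():
        parts = line.strip().split()
        if len(parts) != 3:
            continue
        date_part, time_part, label = parts
        try:
            stamp = datetime.strptime(
                f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f"
            )
        except ValueError:
            continue  # three tokens, but not a timestamped marker line
        stages[label] = stamp
    return stages


# The two marker lines and one non-marker line, copied from the log above.
LOG = """2025-04-27 06:49:43.474 start
2025-04-27 06:49:59.069 sample
duration 10.04 10.04"""

stages = parse_stage_times(LOG)
elapsed = (stages["sample"] - stages["start"]).total_seconds()
print(f"start -> sample: {elapsed:.3f} s")  # 15.595 s
```

For this run, the 15.595 s between the two markers makes up most of the 18.23 s reported as Prediction Time.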

Version Details

Version ID: d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1
Version Created: April 27, 2025
