acappemin/video-to-audio-and-piano 🖼️📝✓🔢 → 🖼️

▶️ 137 runs 📅 Apr 2025 ⚙️ Cog 0.14.7
music-generation sound-effect-generation video-to-audio

About

Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization

Example Output

Prompt:

"the sound of race car, auto racing"

Performance Metrics

18.23s Prediction Time
18.24s Total Time
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
  "prompt": "the sound of race car, auto racing",
  "if_piano": false,
  "v2a_num_steps": 25
}
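The same inputs can be submitted programmatically. The following is a minimal sketch, assuming the third-party `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. `build_input` and `run_model` are hypothetical helper names, the empty-string default for `prompt` is inferred from the blank default shown on this page, and the positive-integer check on `v2a_num_steps` is an assumption rather than a documented constraint.

```python
# Model reference: owner/name plus the version ID listed under Version Details.
MODEL_REF = (
    "acappemin/video-to-audio-and-piano:"
    "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1"
)


def build_input(video_url, prompt="", if_piano=False, v2a_num_steps=25):
    """Assemble the input payload with the parameter names shown on this page."""
    # Assumption: the step count must be a positive integer (not stated on the page).
    if not isinstance(v2a_num_steps, int) or v2a_num_steps < 1:
        raise ValueError("v2a_num_steps must be a positive integer")
    return {
        "video": video_url,
        "prompt": prompt,
        "if_piano": bool(if_piano),
        "v2a_num_steps": v2a_num_steps,
    }


def run_model(payload):
    """Send the payload to Replicate (needs network access and an API token)."""
    import replicate  # third-party client: pip install replicate

    return replicate.run(MODEL_REF, input=payload)


payload = build_input(
    "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
    prompt="the sound of race car, auto racing",
)
# output = run_model(payload)  # uncomment to submit the prediction
```

Keeping payload construction separate from the network call makes it possible to check the inputs locally before spending a prediction.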
Input Parameters
video (Type: string)
Input Video
prompt (Type: string, Default: empty string)
Video-to-Audio Text Prompt
if_piano (Type: boolean, Default: false)
If Generating Piano Music
v2a_num_steps (Type: integer, Default: 25)
Video-to-Audio Num Steps
Output Schema

Output

Type: string, Format: uri
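Because the output is a URI string, the generated file has to be fetched separately. A minimal sketch using only the Python standard library follows; `save_output` is a hypothetical helper, and the fallback file name `output.mp4` is an assumption based on the `.mp4` paths in the execution logs.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def save_output(uri, dest_dir="."):
    """Download the file behind `uri` into dest_dir and return the local path."""
    # Reuse the last path segment of the URI as the file name when there is one.
    name = Path(urlparse(uri).path).name or "output.mp4"
    dest = Path(dest_dir) / name
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)  # stream to disk without loading it all
    return dest
```

For example, `save_output(output_uri, "downloads")` would write the result into an existing `downloads` directory, where `output_uri` stands for the string returned by a prediction.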

Example Execution Logs
torch.Size([1, 753, 128]) tensor([753], dtype=torch.int32) ['the sound of race car, auto racing'] ['/tmp/tmprdangfkr.mp4'] [False] None None None None
2025-04-27 06:49:43.474 start
frames_embed midis cond torch.Size([1, 753, 51]) tensor(0., device='cuda:0') None None torch.Size([1, 753, 128]) tensor(14.5177, device='cuda:0')
No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)
[previous line repeated 23 more times]
2025-04-27 06:49:59.069 sample
duration 10.04 10.04
Moviepy - Building video /tmp/tmprdangfkr.mp4.mp4.
MoviePy - Writing audio in tmprdangfkr.mp4TEMP_MPY_wvf_snd.mp4
chunk:   0%|          | 0/222 [00:00<?, ?it/s, now=None]
chunk:  76%|███████▌  | 169/222 [00:00<00:00, 1668.18it/s, now=None]
MoviePy - Done.
Moviepy - Writing video /tmp/tmprdangfkr.mp4.mp4
t:   0%|          | 0/251 [00:00<?, ?it/s, now=None]
t:  97%|█████████▋| 244/251 [00:01<00:00, 158.32it/s, now=None]
Moviepy - Done !
Moviepy - video ready /tmp/tmprdangfkr.mp4.mp4
paths /tmp/tmprdangfkr.mp4 /tmp/tmprdangfkr.mp4.wav /tmp/tmprdangfkr.mp4.mp4
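The figures in these logs can be cross-checked with a little arithmetic. The sketch below uses only values printed above; treating 753 (from `torch.Size([1, 753, 128])`) as the generated sequence length and 251 (the total of the MoviePy `t:` progress bar) as the video frame count is an interpretation, not something the page states.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps of the "start" and "sample" log lines.
start = datetime.strptime("2025-04-27 06:49:43.474", FMT)
sample = datetime.strptime("2025-04-27 06:49:59.069", FMT)
elapsed = (sample - start).total_seconds()

duration_s = 10.04    # from the "duration 10.04 10.04" line
sequence_len = 753    # assumed: sequence length of the conditioning tensor
video_frames = 251    # assumed: frames written by MoviePy

sequence_rate = sequence_len / duration_s
video_fps = video_frames / duration_s

print(f"start -> sample: {elapsed:.3f} s")
print(f"sequence steps per second of video: {sequence_rate:.1f}")
print(f"video frames per second: {video_fps:.1f}")
```

Under these assumptions, about 15.6 s of the 18.23 s prediction time elapses between the two timestamps, and the clip corresponds to roughly 75 sequence steps and 25 video frames per second.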
Version Details
Version ID
d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1
Version Created
April 27, 2025