acappemin/video-to-audio-and-piano
About
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization
Example Output
Prompt:
"the sound of race car, auto racing"
Output
Performance Metrics
- Prediction Time: 18.23s
- Total Time: 18.24s
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
  "prompt": "the sound of race car, auto racing",
  "if_piano": false,
  "v2a_num_steps": 25
}
Input Parameters
- video: Input Video
- prompt: Video-to-Audio Text Prompt
- if_piano: If Generating Piano Music
- v2a_num_steps: Video-to-Audio Num Steps
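The four parameters above map directly onto the input dictionary sent with a prediction request. The following is a minimal sketch of how such a request might be assembled with the Replicate Python client. The helper names `build_input` and `run_model`, the default values, and the validation rules are assumptions made for illustration; the page does not state defaults or limits. Running the model also requires the `replicate` package and a `REPLICATE_API_TOKEN` environment variable.

```python
from typing import Any, Dict

# Model reference built from the page title and the Version ID listed on this page.
MODEL_REF = (
    "acappemin/video-to-audio-and-piano:"
    "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1"
)


def build_input(
    video: str,
    prompt: str,
    if_piano: bool = False,   # assumed default, taken from the example run
    v2a_num_steps: int = 25,  # assumed default, taken from the example run
) -> Dict[str, Any]:
    """Assemble the input payload using the parameter names listed above."""
    if not video:
        raise ValueError("video must be a non-empty URL or path")
    if v2a_num_steps < 1:
        # Assumed sanity check; the page does not document a valid range.
        raise ValueError("v2a_num_steps must be at least 1")
    return {
        "video": video,
        "prompt": prompt,
        "if_piano": if_piano,
        "v2a_num_steps": v2a_num_steps,
    }


def run_model(model_input: Dict[str, Any]) -> Any:
    """Send the payload to Replicate (hypothetical wrapper around replicate.run)."""
    import replicate  # imported lazily so build_input works without the client installed

    return replicate.run(MODEL_REF, input=model_input)


# Example call (needs network access and an API token):
# output = run_model(build_input(
#     "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
#     "the sound of race car, auto racing",
# ))
```

Keeping payload construction separate from the network call makes it possible to check the parameters locally before spending a prediction.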
Output Schema
Output
Example Execution Logs
torch.Size([1, 753, 128]) tensor([753], dtype=torch.int32) ['the sound of race car, auto racing'] ['/tmp/tmprdangfkr.mp4'] [False] None None None None 2025-04-27 06:49:43.474 start frames_embed midis cond torch.Size([1, 753, 51]) tensor(0., device='cuda:0') None None torch.Size([1, 753, 128]) tensor(14.5177, device='cuda:0') No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', 
dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) No cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32) 2025-04-27 06:49:59.069 sample duration 10.04 10.04 Moviepy - Building video /tmp/tmprdangfkr.mp4.mp4. MoviePy - Writing audio in tmprdangfkr.mp4TEMP_MPY_wvf_snd.mp4 chunk: 0%| | 0/222 [00:00<?, ?it/s, now=None] chunk: 76%|███████▌ | 169/222 [00:00<00:00, 1668.18it/s, now=None] MoviePy - Done. Moviepy - Writing video /tmp/tmprdangfkr.mp4.mp4 t: 0%| | 0/251 [00:00<?, ?it/s, now=None] t: 12%|█▏ | 30/251 [00:00<00:00, 297.99it/s, now=None] t: 25%|██▌ | 63/251 [00:00<00:00, 311.38it/s, now=None] t: 38%|███▊ | 95/251 [00:00<00:00, 291.66it/s, now=None] t: 50%|████▉ | 125/251 [00:00<00:00, 215.98it/s, now=None] t: 59%|█████▉ | 149/251 [00:00<00:00, 190.23it/s, now=None] t: 68%|██████▊ | 170/251 [00:00<00:00, 169.79it/s, now=None] t: 75%|███████▌ | 189/251 [00:00<00:00, 166.17it/s, now=None] t: 82%|████████▏ | 207/251 [00:01<00:00, 154.44it/s, now=None] t: 90%|█████████ | 226/251 [00:01<00:00, 159.56it/s, now=None] t: 97%|█████████▋| 244/251 [00:01<00:00, 158.32it/s, now=None] Moviepy - Done ! 
Moviepy - video ready /tmp/tmprdangfkr.mp4.mp4 paths /tmp/tmprdangfkr.mp4 /tmp/tmprdangfkr.mp4.wav /tmp/tmprdangfkr.mp4.mp4
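A few figures in the log above can be related to each other with simple arithmetic. The sketch below assumes that the two timestamps bracket the sampling loop, that 753 is a latent sequence length, and that 251 is the number of video frames written by MoviePy; none of these interpretations is confirmed by the page, so the results are rough indicators only.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# Timestamps copied from the "start" and "sample" log entries.
sampling_s = elapsed_seconds("2025-04-27 06:49:43.474", "2025-04-27 06:49:59.069")

num_steps = 25          # v2a_num_steps from the input parameters
duration_s = 10.04      # "duration 10.04" in the log
prediction_s = 18.23    # Prediction Time from Performance Metrics

seconds_per_step = sampling_s / num_steps   # assumed: interval covers only sampling
latent_rate = 753 / duration_s              # assumed: 753 is the sequence length
video_fps = 251 / duration_s                # assumed: 251 is the video frame count
sampling_share = sampling_s / prediction_s  # fraction of prediction time in sampling

print(f"sampling: {sampling_s:.3f} s, {seconds_per_step:.3f} s per step")
print(f"latent rate: {latent_rate:.1f} per second, video: {video_fps:.1f} fps")
print(f"share of prediction time: {sampling_share:.0%}")
```

Under these assumptions the interval between the two timestamps accounts for most of the reported prediction time, so reducing `v2a_num_steps` would be the most direct way to shorten a run.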
Version Details
- Version ID: d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1
- Version Created: April 27, 2025