smaerdlatigid/stable-audio
About
Create audio clips from text
Example Output
Prompt:
"A gentle rainfall with distant thunder"
Output
Performance Metrics
17.76s
Prediction Time
133.58s
Total Time
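The two timings can be put in context with a little arithmetic. The sketch below uses only the figures shown on this page. What the gap between total and prediction time consists of (queueing, startup, upload) is an assumption, since the page does not break it down.

```python
# Figures copied from this page.
prediction_time_s = 17.76   # "Prediction Time"
total_time_s = 133.58       # "Total Time"
audio_seconds = 60          # "seconds_total" from the example input

# Time outside the prediction itself (assumed: queueing, startup, upload).
overhead_s = round(total_time_s - prediction_time_s, 2)

# Seconds of audio produced per second of prediction time.
realtime_factor = round(audio_seconds / prediction_time_s, 2)

print(f"overhead: {overhead_s}s, real-time factor: {realtime_factor}x")
```

For this run, most of the total time falls outside the prediction step.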
All Input Parameters
{ "cfg": 7, "steps": 120, "prompt": "A gentle rainfall with distant thunder", "seconds_total": 60 }
Input Parameters
- cfg
- Classifier-free guidance (CFG) scale (7 in the example above)
- steps
- Number of sampling steps (120 in the example above)
- prompt
- Text description of the audio to generate
- seconds_total
- Total duration in seconds
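A minimal sketch of assembling an input payload from the four parameters listed above. The helper name `build_input`, its defaults (taken from the example run), and the validation rules are illustrative assumptions. The page does not document defaults or allowed ranges.

```python
import json


def build_input(prompt, seconds_total=60, steps=120, cfg=7):
    """Assemble an input payload with the four listed fields.

    Hypothetical helper: the defaults mirror the example run, and the
    checks below are assumptions, not limits documented by the model.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if seconds_total <= 0:
        raise ValueError("seconds_total must be positive")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if cfg < 0:
        raise ValueError("cfg must be non-negative")
    return {
        "cfg": cfg,
        "steps": steps,
        "prompt": prompt.strip(),
        "seconds_total": seconds_total,
    }


payload = build_input("A gentle rainfall with distant thunder")
print(json.dumps(payload))
```

With the defaults above, the payload matches the example under All Input Parameters.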
Output Schema
Output
Example Execution Logs
Prompt received: A gentle rainfall with distant thunder
Settings: Duration=60s, Steps=120, CFG Scale=7.0
Sample rate: 44100, Sample size: 2646000
Conditioning: [{'prompt': 'A gentle rainfall with distant thunder', 'seconds_start': 0, 'seconds_total': 60}]
Generating audio...
528137451
/src/stable_audio_tools/models/conditioners.py:314: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(dtype=torch.float16) and torch.set_grad_enabled(self.enable_grad):
/src/stable_audio_tools/inference/sampling.py:176: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast():
0%| | 0/120 [00:00<?, ?it/s]
/root/.pyenv/versions/3.9.20/lib/python3.9/contextlib.py:87: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
self.gen = func(*args, **kwds)
/root/.pyenv/versions/3.9.20/lib/python3.9/site-packages/torchsde/_brownian/brownian_interval.py:608: UserWarning: Should have tb<=t1 but got tb=500.00006103515625 and t1=500.000061.
warnings.warn(f"Should have {tb_name}<=t1 but got {tb_name}={tb} and t1={self._end}.")
1%| | 1/120 [00:00<00:24, 4.81it/s]
2%|▏ | 2/120 [00:00<00:17, 6.91it/s]
...
50%|█████ | 60/120 [00:06<00:06, 9.86it/s]
...
100%|██████████| 120/120 [00:12<00:00, 10.09it/s]
100%|██████████| 120/120 [00:12<00:00, 9.81it/s]
Audio generated.
Audio rearranged.
Audio normalized and converted.
Saving audio to file: /tmp/outputs/output_9f0f5406daaf43d88fffc76dc0b62c02.wav
Audio saved: /tmp/outputs/output_9f0f5406daaf43d88fffc76dc0b62c02.wav
Failed to upload image: {'statusCode': 400, 'error': 'Duplicate', 'message': 'The resource already exists'}
Failed to upload metadata: {'statusCode': 400, 'error': 'Duplicate', 'message': 'The resource already exists'}
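The numbers in the log are consistent with one another, which a short check can confirm. The sketch below parses two lines copied from the log and assumes that "Sample size" means samples per channel (duration × sample rate). The log does not state the channel count. The sampling-time estimate uses the final progress-bar rate of 9.81 it/s.

```python
import re

# Lines copied from the execution log above.
settings_line = "Settings: Duration=60s, Steps=120, CFG Scale=7.0"
sample_line = "Sample rate: 44100, Sample size: 2646000"

duration_s = int(re.search(r"Duration=(\d+)s", settings_line).group(1))
steps = int(re.search(r"Steps=(\d+)", settings_line).group(1))
sample_rate, sample_size = map(
    int,
    re.search(r"Sample rate: (\d+), Sample size: (\d+)", sample_line).groups(),
)

# Assumption: sample size counts samples per channel.
expected_samples = duration_s * sample_rate

# Rough time spent in the sampling loop, from the average rate in the log.
approx_sampling_s = round(steps / 9.81, 1)

print(expected_samples == sample_size, approx_sampling_s)
```

The estimated sampling time is close to the 12 seconds shown on the final progress-bar line. The rest of the 17.76s prediction time presumably covers conditioning, decoding, and saving the file.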
Version Details
- Version ID
80d7a3ff48781aadfe37bd4c0c0317ffa94c67698d661f4792b1b01129a29689
- Version Created
- November 3, 2024
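To reproduce a run against exactly this version, the model name and version ID can be combined into one pinned reference. The `owner/name:version` form used below is an assumption about the hosting platform's convention. The page itself does not state it. `pinned_reference` is a hypothetical helper, and the 64-hex-character check reflects only what is observed on this ID.

```python
import re

OWNER_MODEL = "smaerdlatigid/stable-audio"
VERSION_ID = "80d7a3ff48781aadfe37bd4c0c0317ffa94c67698d661f4792b1b01129a29689"


def pinned_reference(owner_model, version_id):
    """Join a model name and version ID as 'owner/name:version'.

    Hypothetical helper; the separator and the ID format check are
    assumptions, not rules documented on this page.
    """
    if not re.fullmatch(r"[0-9a-f]{64}", version_id):
        raise ValueError("version_id does not look like a 64-char hex digest")
    return f"{owner_model}:{version_id}"


reference = pinned_reference(OWNER_MODEL, VERSION_ID)
print(reference)
```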