smaerdlatigid/stable-audio
About
Create audio clips from text
Example Output
Prompt:
"A gentle rainfall with distant thunder"
Output
Performance Metrics
17.76s
Prediction Time
133.58s
Total Time
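The two timings can be put in context with a little arithmetic. The sketch below assumes that Total Time includes Prediction Time, so the difference is time spent outside the prediction itself (the page does not say what that time covers). It takes the 60-second clip length from the `seconds_total` input of this run. The variable names are illustrative only.

```python
# Timings reported above, in seconds.
prediction_time = 17.76
total_time = 133.58

# Clip length requested for this run (the seconds_total input).
audio_seconds = 60

# Time not attributed to the prediction itself.
# Assumption: total time includes prediction time.
overhead = round(total_time - prediction_time, 2)

# Seconds of audio produced per second of prediction time.
realtime_factor = round(audio_seconds / prediction_time, 2)

print(f"overhead outside prediction: {overhead}s")
print(f"audio generated at {realtime_factor}x real time")
```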
All Input Parameters
{
"cfg": 7,
"steps": 120,
"prompt": "A gentle rainfall with distant thunder",
"seconds_total": 60
}
Input Parameters
- cfg
- Classifier-free guidance (CFG) scale
- steps
- Number of sampling steps
- prompt
- Text description of the audio to generate
- seconds_total
- Total duration of the generated audio, in seconds
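As a rough illustration, the four inputs can be collected and sanity-checked before a request is submitted. `StableAudioInput` is a hypothetical helper, not part of the model's API. The field types are inferred from the example run, the defaults simply repeat that run's values, and the checks are generic because the page does not state the model's real limits.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StableAudioInput:
    """Hypothetical container for the four inputs listed above."""

    prompt: str
    # The defaults below repeat the example run; they are not documented defaults.
    seconds_total: int = 60
    steps: int = 120
    cfg: float = 7

    def __post_init__(self) -> None:
        # Generic sanity checks only; the model's actual limits are not given here.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.seconds_total <= 0:
            raise ValueError("seconds_total must be positive")
        if self.steps <= 0:
            raise ValueError("steps must be positive")

    def to_json(self) -> str:
        """Serialize the inputs in the same shape as the JSON shown above."""
        return json.dumps(asdict(self), indent=2)


payload = StableAudioInput(prompt="A gentle rainfall with distant thunder")
print(payload.to_json())
```

Keeping validation in `__post_init__` means a malformed request fails locally, before any prediction time is spent.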
Output Schema
Output
Example Execution Logs
Prompt received: A gentle rainfall with distant thunder
Settings: Duration=60s, Steps=120, CFG Scale=7.0
Sample rate: 44100, Sample size: 2646000
Conditioning: [{'prompt': 'A gentle rainfall with distant thunder', 'seconds_start': 0, 'seconds_total': 60}]
Generating audio...
528137451
/src/stable_audio_tools/models/conditioners.py:314: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(dtype=torch.float16) and torch.set_grad_enabled(self.enable_grad):
/src/stable_audio_tools/inference/sampling.py:176: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast():
0%| | 0/120 [00:00<?, ?it/s]
/root/.pyenv/versions/3.9.20/lib/python3.9/contextlib.py:87: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
self.gen = func(*args, **kwds)
/root/.pyenv/versions/3.9.20/lib/python3.9/site-packages/torchsde/_brownian/brownian_interval.py:608: UserWarning: Should have tb<=t1 but got tb=500.00006103515625 and t1=500.000061.
warnings.warn(f"Should have {tb_name}<=t1 but got {tb_name}={tb} and t1={self._end}.")
1%| | 1/120 [00:00<00:24, 4.81it/s]
100%|██████████| 120/120 [00:12<00:00, 9.81it/s]
Audio generated.
Audio rearranged.
Audio normalized and converted.
Saving audio to file: /tmp/outputs/output_9f0f5406daaf43d88fffc76dc0b62c02.wav
Audio saved: /tmp/outputs/output_9f0f5406daaf43d88fffc76dc0b62c02.wav
Failed to upload image: {'statusCode': 400, 'error': 'Duplicate', 'message': 'The resource already exists'}
Failed to upload metadata: {'statusCode': 400, 'error': 'Duplicate', 'message': 'The resource already exists'}
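Several numbers in the log can be cross-checked against the inputs. The sketch below is only a reading of the log, not the model's code. It assumes that "Sample size" is the sample rate multiplied by the duration, that the conditioning is a one-element list built directly from the inputs, and that sampling time is roughly the step count divided by the average rate on the final progress line (9.81 it/s). All three function names are hypothetical.

```python
def expected_sample_size(sample_rate: int, seconds_total: int) -> int:
    """Number of samples for a clip of the given length (rate x duration)."""
    return sample_rate * seconds_total


def build_conditioning(prompt: str, seconds_total: int, seconds_start: int = 0) -> list:
    """Hypothetical helper reproducing the shape of the logged conditioning."""
    return [
        {
            "prompt": prompt,
            "seconds_start": seconds_start,
            "seconds_total": seconds_total,
        }
    ]


def sampling_seconds(steps: int, iterations_per_second: float) -> float:
    """Approximate time spent in the sampling loop."""
    return steps / iterations_per_second


prompt = "A gentle rainfall with distant thunder"
sample_size = expected_sample_size(44100, 60)
conditioning = build_conditioning(prompt, 60)
loop_time = round(sampling_seconds(120, 9.81), 1)

print(sample_size)
print(conditioning)
print(loop_time)
```

Under these assumptions the sample size matches the logged value, and the estimated loop time is consistent with the roughly 12 seconds shown on the final progress line.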
Version Details
- Version ID
- 80d7a3ff48781aadfe37bd4c0c0317ffa94c67698d661f4792b1b01129a29689
- Version Created
- November 3, 2024