lucataco/ace-step 🔢📝❓ → 🖼️

▶️ 57.0K runs 📅 May 2025 ⚙️ Cog 0.14.10 🔗 GitHub 📄 Paper ⚖️ License
music-generation singing-voice-generation text-to-music

About

ACE-Step: A Step Towards Music Generation Foundation Model (text2music)

Example Output

Performance Metrics

6.18s Prediction Time
45.84s Total Time
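
As a rough illustration of what these two timings imply, the sketch below derives the real-time factor and the time not spent in prediction, using the 60-second duration of this example. Reading the gap between total and prediction time as queueing or startup overhead is an assumption; the page does not break it down.

```python
# Figures copied from this example run.
prediction_time_s = 6.18   # time spent running the model
total_time_s = 45.84       # wall-clock time for the whole request
audio_duration_s = 60.0    # "duration" input of the example

# Seconds of audio produced per second of prediction time.
real_time_factor = audio_duration_s / prediction_time_s

# Time not spent in prediction (assumed: queueing, startup, transfer).
overhead_s = total_time_s - prediction_time_s

print(f"real-time factor: {real_time_factor:.2f}x")
print(f"overhead: {overhead_s:.2f}s")
```
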
All Input Parameters
{
  "seed": -1,
  "tags": "synth-pop, electronic, pop, synthesizer, drums, bass, piano, 128 BPM, energetic, uplifting, modern",
  "lyrics": "[verse]\nWoke up in a city that's always alive\nNeon lights they shimmer they thrive\nElectric pulses beat they drive\nMy heart races just to survive\n\n[chorus]\nOh electric dreams they keep me high\nThrough the wires I soar and fly\nMidnight rhythms in the sky\nElectric dreams together we’ll defy\n\n[verse]\nLost in the labyrinth of screens\nVirtual love or so it seems\nIn the night the city gleams\nDigital faces haunted by memes\n\n[chorus]\nOh electric dreams they keep me high\nThrough the wires I soar and fly\nMidnight rhythms in the sky\nElectric dreams together we’ll defy\n\n[bridge]\nSilent whispers in my ear\nPixelated love serene and clear\nThrough the chaos find you near\nIn electric dreams no fear\n\n[verse]\nBound by circuits intertwined\nLove like ours is hard to find\nIn this world we’re truly blind\nBut electric dreams free the mind",
  "duration": 60,
  "scheduler": "euler",
  "guidance_type": "apg",
  "guidance_scale": 15,
  "number_of_steps": 60,
  "granularity_scale": 10,
  "guidance_interval": 0.5,
  "min_guidance_scale": 3,
  "tag_guidance_scale": 0,
  "lyric_guidance_scale": 0,
  "guidance_interval_decay": 0
}
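
The lyrics value above is a single string in which each section starts with a bracketed tag on its own line and sections are separated by a blank line. A minimal sketch of assembling such a string from structured data follows; the helper name `build_lyrics` is hypothetical and not part of the model's API.

```python
from typing import Iterable, List, Tuple


def build_lyrics(sections: Iterable[Tuple[str, List[str]]]) -> str:
    """Join (tag, lines) pairs into one lyrics string.

    Each section becomes "[tag]" followed by its lines, and sections
    are separated by a blank line, matching the example input.
    """
    blocks = []
    for tag, lines in sections:
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks)


lyrics = build_lyrics([
    ("verse", ["Woke up in a city that's always alive",
               "Neon lights they shimmer they thrive"]),
    ("chorus", ["Oh electric dreams they keep me high",
                "Through the wires I soar and fly"]),
])
print(lyrics)
```
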
Input Parameters
seed Type: integer • Default: -1
Random seed. Set to -1 to randomize.
tags (required) Type: string
Text prompts to guide music generation, e.g., 'epic,cinematic'
lyrics Type: string
Lyrics for the music. Use [verse], [chorus], and [bridge] to separate the parts of the lyrics. Use [instrumental] or [inst] to generate instrumental music.
duration Type: number • Default: 60 • Range: 1 - 240
Duration of the generated audio in seconds. -1 means a random duration between 30 and 240 seconds.
scheduler Default: euler
Scheduler type.
guidance_type Default: apg
Guidance type for CFG.
guidance_scale Type: number • Default: 15 • Range: 0 - 30
Overall guidance scale.
number_of_steps Type: integer • Default: 60 • Range: 10 - 200
Number of inference steps.
granularity_scale Type: number • Default: 10 • Range: -100 - 100
Omega scale for APG guidance, or similar for other CFG types.
guidance_interval Type: number • Default: 0.5 • Range: 0 - 1
Guidance interval.
min_guidance_scale Type: number • Default: 3 • Range: 0 - 100
Minimum guidance scale.
tag_guidance_scale Type: number • Default: 0 • Range: 0 - 10
Guidance scale for tags (text prompt).
lyric_guidance_scale Type: number • Default: 0 • Range: 0 - 10
Guidance scale for lyrics.
guidance_interval_decay Type: number • Default: 0 • Range: 0 - 1
Guidance interval decay.
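
The ranges listed above can be checked on the client side before a request is submitted. The following is a minimal sketch that encodes only the ranges stated on this page; the names `NUMERIC_RANGES` and `validate_inputs` are hypothetical, and the model may apply its own validation that differs from this.

```python
from typing import Any, Dict, List

# (min, max) per numeric parameter, as listed on this page.
NUMERIC_RANGES = {
    "duration": (1, 240),
    "guidance_scale": (0, 30),
    "number_of_steps": (10, 200),
    "granularity_scale": (-100, 100),
    "guidance_interval": (0, 1),
    "min_guidance_scale": (0, 100),
    "tag_guidance_scale": (0, 10),
    "lyric_guidance_scale": (0, 10),
    "guidance_interval_decay": (0, 1),
}


def validate_inputs(inputs: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    # "tags" is the only parameter marked as required.
    if not inputs.get("tags"):
        problems.append("tags is required")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in inputs:
            continue  # omitted parameters fall back to their defaults
        value = inputs[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} - {high}")
    return problems


print(validate_inputs({"tags": "epic,cinematic", "duration": 60}))
print(validate_inputs({"duration": 500}))
```

Note that the description of duration mentions -1 for a random duration, which lies outside the listed range of 1 - 240; this sketch follows the listed range and would flag -1.
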
Output Schema

Output

Type: string • Format: uri
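
Because the output is a URI string, a client would typically run the model and then download the audio from that URI. The sketch below assumes the `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set; the function names `build_input`, `output_filename` and `generate` are hypothetical, and the exact return type of `replicate.run` may vary by client version.

```python
import os
from urllib.parse import urlparse

# Model name and version ID as listed on this page.
MODEL = (
    "lucataco/ace-step:"
    "280fc4f9ee507577f880a167f639c02622421d8fecf492454320311217b688f1"
)


def build_input(tags: str, lyrics: str = "[inst]", duration: float = 60) -> dict:
    """Assemble a request payload; omitted parameters use their defaults."""
    return {"tags": tags, "lyrics": lyrics, "duration": duration, "seed": -1}


def output_filename(uri: str, fallback: str = "output.mp3") -> str:
    """Derive a local file name from the path of the returned URI."""
    name = os.path.basename(urlparse(uri).path)
    return name or fallback


def generate(tags: str, lyrics: str = "[inst]", duration: float = 60) -> str:
    """Run the model and return the output URI (needs network access)."""
    import replicate  # third-party client, imported lazily

    output = replicate.run(MODEL, input=build_input(tags, lyrics, duration))
    return str(output)  # assumption: str() yields the URI


# Example, not executed here because it calls the hosted model:
#   uri = generate("synth-pop, electronic, 128 BPM")
#   print(uri, "->", output_filename(uri))
```
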

Example Execution Logs
Using seed: 6796391
Generating audio with duration: 60.0s, tags: 'synth-pop, electronic, pop, synthesizer, drums, bass, piano, 128 BPM, energetic, uplifting, modern', lyrics: 'True'
2025-05-14 15:17:23.278 | INFO     | pipeline_ace_step:text2music_diffusion_process:896 - cfg_type: apg, guidance_scale: 15.0, omega_scale: 10
2025-05-14 15:17:23.290 | INFO     | pipeline_ace_step:text2music_diffusion_process:1114 - start_idx: 15, end_idx: 45, num_inference_steps: 60
  0%|          | 0/60 [00:00<?, ?it/s]
  2%|▏         | 1/60 [00:00<00:23,  2.52it/s]
  7%|▋         | 4/60 [00:00<00:05,  9.39it/s]
 12%|█▏        | 7/60 [00:00<00:03, 13.84it/s]
 17%|█▋        | 10/60 [00:00<00:02, 16.81it/s]
 22%|██▏       | 13/60 [00:00<00:02, 18.81it/s]
 27%|██▋       | 16/60 [00:01<00:02, 16.58it/s]
 30%|███       | 18/60 [00:01<00:02, 15.50it/s]
 33%|███▎      | 20/60 [00:01<00:02, 14.25it/s]
 37%|███▋      | 22/60 [00:01<00:02, 13.42it/s]
 40%|████      | 24/60 [00:01<00:02, 12.85it/s]
 43%|████▎     | 26/60 [00:01<00:02, 12.47it/s]
 47%|████▋     | 28/60 [00:02<00:02, 12.22it/s]
 50%|█████     | 30/60 [00:02<00:02, 12.03it/s]
 53%|█████▎    | 32/60 [00:02<00:02, 11.90it/s]
 57%|█████▋    | 34/60 [00:02<00:02, 11.80it/s]
 60%|██████    | 36/60 [00:02<00:02, 11.74it/s]
 63%|██████▎   | 38/60 [00:02<00:01, 11.70it/s]
 67%|██████▋   | 40/60 [00:03<00:01, 11.66it/s]
 70%|███████   | 42/60 [00:03<00:01, 11.65it/s]
 73%|███████▎  | 44/60 [00:03<00:01, 11.63it/s]
 77%|███████▋  | 46/60 [00:03<00:01, 12.49it/s]
 82%|████████▏ | 49/60 [00:03<00:00, 15.23it/s]
 87%|████████▋ | 52/60 [00:03<00:00, 17.35it/s]
 92%|█████████▏| 55/60 [00:04<00:00, 18.55it/s]
 97%|█████████▋| 58/60 [00:04<00:00, 20.21it/s]
100%|██████████| 60/60 [00:04<00:00, 14.10it/s]
0%|          | 0/1 [00:00<?, ?it/s]
2025-05-14 15:17:28.229 | WARNING  | pipeline_ace_step:save_wav_file:1419 - save_path is None, using default path ./outputs/
100%|██████████| 1/1 [00:00<00:00,  1.07it/s]
100%|██████████| 1/1 [00:00<00:00,  1.07it/s]
Audio generated at: ./outputs/output_20250514151728_0.mp3
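
The logged values start_idx: 15 and end_idx: 45 for 60 steps are consistent with guidance being applied only to a centered window that covers guidance_interval (0.5) of the steps. The sketch below reproduces those numbers under that assumption; the function name `guidance_window` is hypothetical, and the formula is inferred from the log rather than confirmed by this page.

```python
from typing import Tuple


def guidance_window(num_steps: int, guidance_interval: float) -> Tuple[int, int]:
    """Return (start_idx, end_idx) of a centered window of steps.

    Assumption: the window spans `guidance_interval` of all steps and
    is centered on the midpoint of the schedule.
    """
    start_idx = int(num_steps * (1 - guidance_interval) / 2)
    end_idx = int(num_steps * (guidance_interval / 2 + 0.5))
    return start_idx, end_idx


print(guidance_window(60, 0.5))  # (15, 45), matching the logged indices
```

Such a window would also be consistent with the slower iteration rate (about 11 to 12 it/s) that the progress bar reports between roughly steps 16 and 45, though the page does not state the cause.
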
Version Details
Version ID
280fc4f9ee507577f880a167f639c02622421d8fecf492454320311217b688f1
Version Created
May 14, 2025