afiaka87/tortoise-tts
About
Generate speech from text and clone voices from mp3 files. From James Betker, AKA "neonbjb".
Example Output
Output
Performance Metrics
- Prediction Time: 235.45s
- Total Time: 459.51s
All Input Parameters
{
  "seed": 0,
  "text": "The expressiveness of autoregressive transformers is literally nuts! I absolutely adore them.",
  "preset": "fast",
  "voice_a": "custom_voice",
  "voice_b": "disabled",
  "voice_c": "disabled",
  "custom_voice": "https://replicate.delivery/mgxm/671f3086-382f-4850-be82-db853e5f05a8/nixon.mp3"
}
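As a rough illustration, the input object above could be assembled in Python before being submitted. This is a minimal sketch, not the model's own client code: `build_input` is a hypothetical helper, its defaults are assumptions, and the commented-out `replicate.run` call assumes the official Replicate Python client is installed and an API token is configured.

```python
import json

# Model name and version ID as listed on this page.
MODEL_VERSION = (
    "afiaka87/tortoise-tts:"
    "e9658de4b325863c4fcdc12d94bb7c9b54cbfe351b7ca1b36860008172b91c71"
)


def build_input(text, voice_a="random", custom_voice=None, preset="fast", seed=0):
    """Hypothetical helper: assemble an input dictionary like the one above."""
    if not text:
        raise ValueError("text must not be empty")
    payload = {
        "seed": seed,
        "text": text,
        "preset": preset,
        "voice_a": voice_a,
        "voice_b": "disabled",  # no voice mixing
        "voice_c": "disabled",
    }
    if custom_voice is not None:
        # Mirror the example above: voice_a is set to "custom_voice"
        # whenever an mp3 URL is supplied.
        payload["voice_a"] = "custom_voice"
        payload["custom_voice"] = custom_voice
    return payload


payload = build_input(
    "The expressiveness of autoregressive transformers is literally nuts! "
    "I absolutely adore them.",
    custom_voice=(
        "https://replicate.delivery/mgxm/"
        "671f3086-382f-4850-be82-db853e5f05a8/nixon.mp3"
    ),
)
print(json.dumps(payload, indent=2))

# Assumption: with the Replicate Python client, submission might look like
#   import replicate
#   output = replicate.run(MODEL_VERSION, input=payload)
```

Keeping payload construction separate from submission makes it possible to inspect the parameters without making a network call.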
Input Parameters
- seed: Random seed which can be used to reproduce results.
- text: Text to speak.
- preset: Which voice preset to use. See the documentation for more information.
- voice_a: Selects the voice to use for generation. Use `random` to select a random voice. Use `custom_voice` to use a custom voice.
- voice_b: (Optional) Create a new voice by averaging the latents for `voice_a`, `voice_b` and `voice_c`. Use `disabled` to disable voice mixing.
- voice_c: (Optional) Create a new voice by averaging the latents for `voice_a`, `voice_b` and `voice_c`. Use `disabled` to disable voice mixing.
- cvvp_amount: How much the CVVP model should influence the output. Increasing this can in some cases reduce the likelihood of multiple speakers. Defaults to 0 (disabled).
- custom_voice: (Optional) Create a custom voice based on an mp3 file of a speaker. The audio should be at least 15 seconds long, contain only one speaker, and be in mp3 format. Overrides the `voice_a` input.
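The voice mixing described for `voice_a`, `voice_b` and `voice_c` can be pictured as an element-wise mean over per-voice latent vectors. The sketch below is illustrative only: it uses plain Python lists with toy values, `mix_voices` is a hypothetical name, and the actual model presumably works on tensors whose shape is not documented on this page.

```python
def mix_voices(voice_latents, selected):
    """Average the latents of all selected voices, skipping "disabled" slots.

    voice_latents: dict mapping a voice name to a list of floats (toy latents).
    selected: the values given for voice_a, voice_b and voice_c.
    """
    active = [voice_latents[name] for name in selected if name != "disabled"]
    if not active:
        raise ValueError("at least one voice must be enabled")
    length = len(active[0])
    if any(len(latent) != length for latent in active):
        raise ValueError("all latents must have the same length")
    # Element-wise mean across the active voices.
    return [sum(values) / len(active) for values in zip(*active)]


# Toy latents, invented purely for illustration.
latents = {"a": [1.0, 2.0], "b": [3.0, 4.0]}

mixed = mix_voices(latents, ["a", "b", "disabled"])
print(mixed)
```

With only one voice enabled the function returns that voice's latents unchanged, which matches the idea that `disabled` turns mixing off.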
Output Schema
Output
Example Execution Logs
Creating voice from /tmp/tmpn3ll0ogznixon.mp3
WARNING: Input file had loudness range of 10.4, which is larger than the loudness range target (7.0). Normalization will revert to dynamic mode. Choose a higher target loudness range if you want linear normalization.
WARNING: In dynamic mode, the sample rate will automatically be set to 192 kHz by the loudnorm filter. Specify -ar/--sample-rate to override it.
[wav @ 0x560e9a45a3c0] ignoring wrong sample_count 55165030
[wav @ 0x560e9a45a3c0] Estimating duration from bitrate, this may be inaccurate
Generating text using voices: ['custom_voice']
Generating autoregressive samples..
0%| | 0/6 [00:00<?, ?it/s]
17%|█▋ | 1/6 [00:05<00:28, 5.80s/it]
33%|███▎ | 2/6 [00:11<00:22, 5.57s/it]
50%|█████ | 3/6 [00:16<00:16, 5.65s/it]
67%|██████▋ | 4/6 [00:22<00:10, 5.43s/it]
83%|████████▎ | 5/6 [00:27<00:05, 5.54s/it]
100%|██████████| 6/6 [00:33<00:00, 5.61s/it]
100%|██████████| 6/6 [00:33<00:00, 5.59s/it]
Computing best candidates using CLVP
0%| | 0/6 [00:00<?, ?it/s]
17%|█▋ | 1/6 [00:00<00:01, 3.79it/s]
33%|███▎ | 2/6 [00:01<00:02, 1.46it/s]
50%|█████ | 3/6 [00:02<00:02, 1.22it/s]
67%|██████▋ | 4/6 [00:03<00:01, 1.14it/s]
83%|████████▎ | 5/6 [00:04<00:00, 1.09it/s]
100%|██████████| 6/6 [00:05<00:00, 1.06it/s]
100%|██████████| 6/6 [00:05<00:00, 1.16it/s]
Version Details
- Version ID: e9658de4b325863c4fcdc12d94bb7c9b54cbfe351b7ca1b36860008172b91c71
- Version Created: August 2, 2022