jichengdu/llasa πŸ“πŸ–ΌοΈ β†’ πŸ–ΌοΈ

89 runs β€’ Mar 2025 β€’ Cog 0.14.2
Tags: text-to-speech, voice-cloning

About

An 8B-parameter text-to-speech (TTS) model with zero-shot voice cloning from a short reference audio sample.

Example Output

(Generated speech audio; playable on the model page.)

Performance Metrics

Prediction time: 6.73s
Total time: 224.91s
All Input Parameters
{
  "text": "δΈΊζ‰€ζœ‰ηš„ηŒ«ηŒ«ε₯‹ζ–—η»ˆθΊ«οΌ",
  "voice_sample": "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav"
}
Input Parameters
text (required) β€’ Type: string
Text to convert to speech.
prompt_text (optional) β€’ Type: string
Optional prompt text. If not provided, it is transcribed from the voice sample using Whisper (see the example call below).
voice_sample (required) β€’ Type: string
Voice sample audio file (16 kHz).
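
A minimal sketch of calling this version through the Replicate Python client with the exact inputs shown above (the example text roughly means "Fighting for all the cats, for life!"). It assumes the `replicate` package is installed and REPLICATE_API_TOKEN is set in the environment; the version hash is the one listed under Version Details below.

import replicate

# Sketch: run jichengdu/llasa with the example inputs from this page.
output = replicate.run(
    "jichengdu/llasa:e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37",
    input={
        "text": "δΈΊζ‰€ζœ‰ηš„ηŒ«ηŒ«ε₯‹ζ–—η»ˆθΊ«οΌ",
        "voice_sample": "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav",
        # "prompt_text" is optional; omitting it lets Whisper transcribe the sample.
    },
)
print(output)  # URI of the generated speech audio
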
Output Schema

Output

Type: string β€’ Format: uri

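Since the output is a bare URI, a typical follow-up step is to download the audio. A minimal sketch using only the Python standard library (the .wav filename is an assumption about the output format):

import urllib.request

# Sketch: save the audio URI returned above to a local file.
urllib.request.urlretrieve(str(output), "llasa_output.wav")
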
Example Execution Logs
/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:496: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
warnings.warn(
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
Whisper transcription: εΈŒζœ›δ½ δ»₯εŽθƒ½ε€ŸεšεΎ—ζ―”ζˆ‘θΏ˜ε₯½ε“Ÿ
Prompt Vq Code Shape: torch.Size([1, 1, 175])
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
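
The log above shows the prompt_text fallback in action: because prompt_text was omitted, the voice sample was transcribed automatically (the transcription "εΈŒζœ›δ½ δ»₯εŽθƒ½ε€ŸεšεΎ—ζ―”ζˆ‘θΏ˜ε₯½ε“Ÿ" roughly means "I hope you can do even better than me"). A minimal sketch of such a fallback with the Hugging Face transformers ASR pipeline; the specific Whisper checkpoint is an assumption, not necessarily the one this model loads:

from transformers import pipeline

# Sketch: transcribe the 16 kHz voice sample to obtain prompt_text.
# "openai/whisper-large-v3" is an assumed checkpoint; the model's actual choice may differ.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
prompt_text = asr("zero_shot_prompt.wav")["text"]  # local copy of the voice sample
print(prompt_text)
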
Version Details
Version ID
e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37
Version Created
March 24, 2025
Run on Replicate β†’