jichengdu/llasa πŸ“πŸ–ΌοΈ β†’ πŸ–ΌοΈ

89 runs β€’ Mar 2025 β€’ Cog 0.14.2
Tags: text-to-speech, voice-cloning

About

An 8B-parameter text-to-speech (TTS) model with zero-shot voice cloning from a short reference audio sample.

Example Output

(Generated speech audio; playable on the model page.)

Performance Metrics

Prediction time: 6.73s
Total time: 224.91s
All Input Parameters
{
  "text": "δΈΊζ‰€ζœ‰ηš„ηŒ«ηŒ«ε₯‹ζ–—η»ˆθΊ«οΌ",
  "voice_sample": "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav"
}
Input Parameters
text (required) β€’ Type: string
Text to convert to speech.
prompt_text (optional) β€’ Type: string
Optional prompt text. If not provided, it is transcribed from the voice sample using Whisper (see the example call below).
voice_sample (required) β€’ Type: string
Voice sample audio file (16 kHz).
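
A minimal sketch of calling this version through the Replicate Python client with the exact inputs shown above (the example text roughly means "Fighting for all the cats, for life!"). It assumes the `replicate` package is installed and REPLICATE_API_TOKEN is set in the environment; the version hash is the one listed under Version Details below.

import replicate

# Sketch: run jichengdu/llasa with the example inputs from this page.
output = replicate.run(
    "jichengdu/llasa:e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37",
    input={
        "text": "δΈΊζ‰€ζœ‰ηš„ηŒ«ηŒ«ε₯‹ζ–—η»ˆθΊ«οΌ",
        "voice_sample": "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav",
        # "prompt_text" is optional; omitting it lets Whisper transcribe the sample.
    },
)
print(output)  # URI of the generated speech audio
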
Output Schema

Output

Type: string β€’ Format: uri

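Since the output is a bare URI, a typical follow-up step is to download the audio. A minimal sketch using only the Python standard library (the .wav filename is an assumption about the output format):

import urllib.request

# Sketch: save the audio URI returned above to a local file.
urllib.request.urlretrieve(str(output), "llasa_output.wav")
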
Example Execution Logs
/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:496: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
warnings.warn(
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
Whisper transcription: εΈŒζœ›δ½ δ»₯εŽθƒ½ε€ŸεšεΎ—ζ―”ζˆ‘θΏ˜ε₯½ε“Ÿ
Prompt Vq Code Shape: torch.Size([1, 1, 175])
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
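
The log above shows the prompt_text fallback in action: because prompt_text was omitted, the voice sample was transcribed automatically (the transcription "εΈŒζœ›δ½ δ»₯εŽθƒ½ε€ŸεšεΎ—ζ―”ζˆ‘θΏ˜ε₯½ε“Ÿ" roughly means "I hope you can do even better than me"). A minimal sketch of such a fallback with the Hugging Face transformers ASR pipeline; the specific Whisper checkpoint is an assumption, not necessarily the one this model loads:

from transformers import pipeline

# Sketch: transcribe the 16 kHz voice sample to obtain prompt_text.
# "openai/whisper-large-v3" is an assumed checkpoint; the model's actual choice may differ.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
prompt_text = asr("zero_shot_prompt.wav")["text"]  # local copy of the voice sample
print(prompt_text)
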
Version Details
Version ID
e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37
Version Created
March 24, 2025
Run on Replicate β†’