jichengdu/spark-tts

▶️ 247 runs 📅 Mar 2025 ⚙️ Cog 0.14.2
text-to-speech voice-cloning

About

0.5B

Example Output

[Audio: example output]

Performance Metrics

3.57s Prediction Time
70.32s Total Time
All Input Parameters
{
  "mode": "voice_creation",
  "text": "白日依山尽,黄河入海流。",
  "pitch": "high",
  "speed": "low",
  "top_k": 50,
  "top_p": 0.95,
  "gender": "female",
  "prompt_text": "",
  "temperature": 0.8
}
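A minimal sketch of submitting the inputs above with the Replicate Python client, assuming the `replicate` package is installed and the `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `run_spark_tts` are hypothetical, and the mode string `voice_cloning` is an assumption: only `voice_creation` appears verbatim on this page.

```python
# Model reference built from the owner/name and version ID shown on this page.
MODEL = (
    "jichengdu/spark-tts:"
    "eac056d8a49570ce3e99ed6efe3ce53527b4e3df4abc9c5471dc640dbb75006b"
)


def build_input(text, mode="voice_creation", **params):
    """Assemble the input dict and check the per-mode requirements (hypothetical helper)."""
    if not text:
        raise ValueError("text is required in both modes")
    payload = {"mode": mode, "text": text, **params}
    if mode == "voice_creation":
        # Defaults copied from the parameter list on this page.
        payload.setdefault("gender", "female")
        payload.setdefault("pitch", "moderate")
        payload.setdefault("speed", "moderate")
    elif "prompt_speech_path" not in payload:
        # Assumption: any other mode is voice cloning, which needs prompt audio.
        raise ValueError("prompt_speech_path is required for voice cloning")
    return payload


def run_spark_tts(payload):
    """Call the hosted model; needs network access and an API token."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL, input=payload)
```

For the example shown here, the payload would be built with `build_input("白日依山尽,黄河入海流。", gender="female", pitch="high", speed="low", top_k=50, top_p=0.95, temperature=0.8)` and then passed to `run_spark_tts`.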
Input Parameters
mode Default: voice_creation
TTS mode. Voice cloning requires a prompt audio file whose voice is mimicked; voice creation generates speech from the specified gender, pitch, and speed parameters.
text (required) Type: string
Text to convert to speech. Required in both modes.
pitch Default: moderate
[Voice Creation] Voice pitch level. Required in voice creation mode.
speed Default: moderate
[Voice Creation] Speaking speed. Required in voice creation mode.
top_k Type: integer, Default: 50
Top-k sampling parameter. Limits token selection to the k most probable options.
top_p Type: number, Default: 0.95
Top-p sampling parameter. Probability threshold for nucleus sampling.
gender Default: female
[Voice Creation] Voice gender. Required in voice creation mode.
prompt_text Type: string, Default: (empty)
[Voice Cloning] Transcript of the prompt audio. Optional, but providing it improves quality.
temperature Type: number, Default: 0.8
Sampling temperature (0.0-1.0). Controls randomness in generation.
prompt_speech_path Type: string
[Voice Cloning] Path to the prompt audio file. Required in voice cloning mode.
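To make the roles of `temperature`, `top_k`, and `top_p` concrete, here is a small sketch of how the three settings are commonly combined when picking the next token. It illustrates the general technique only; it is not taken from the model's code, and the function name `sample_token` is hypothetical.

```python
import math
import random


def sample_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=random):
    """Pick one token index using temperature, then top-k, then top-p filtering."""
    if temperature <= 0:
        # Degenerate case: no randomness, take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature scaling: lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Sample among the survivors; choices() renormalises the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With the defaults listed above, a token is drawn only from the 50 most probable candidates, and within those only from the smallest set covering 95% of the probability mass.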
Output Schema

Output

Type: string, Format: uri
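Because the output is a URI string, the audio has to be fetched before it can be played or processed locally. A minimal sketch using only the Python standard library follows; the helper name `save_output` is hypothetical, and the default file name simply mirrors the one in the execution logs.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri, dest="generated_speech.wav"):
    """Stream the file behind `uri` to `dest` and return the local path."""
    dest_path = Path(dest)
    with urllib.request.urlopen(uri) as response, dest_path.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

If the client library returns a file-like object rather than a plain string, converting it with `str(...)` before calling `save_output` may be necessary; that depends on the client version and is not specified on this page.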

Example Execution Logs
Running voice creation with text: '白日依山尽,黄河入海流。', gender: 'female', pitch: 'high', speed: 'low'
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Generated audio shape: (60800,), min: -0.36268576979637146, max: 0.43774864077568054
Saved generated audio to generated_speech.wav
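The logged shape `(60800,)` is the number of audio samples, so the clip length follows from the sample rate. The page does not state the sample rate; the sketch below assumes 16 kHz, which should be checked against the actual WAV header.

```python
ASSUMED_SAMPLE_RATE_HZ = 16_000  # assumption: not stated on this page


def clip_duration_seconds(num_samples, sample_rate_hz=ASSUMED_SAMPLE_RATE_HZ):
    """Duration of a mono clip given its sample count."""
    return num_samples / sample_rate_hz


duration = clip_duration_seconds(60800)  # sample count from the log above
print(f"{duration:.2f} s")  # 3.80 s under the 16 kHz assumption
```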
Version Details
Version ID
eac056d8a49570ce3e99ed6efe3ce53527b4e3df4abc9c5471dc640dbb75006b
Version Created
March 24, 2025