e1100x/chattts

342 runs | Oct 2024 | Cog 0.11.3
multilingual multilingual-tts text-to-speech

About

ChatTTS is a text-to-speech model designed specifically for dialogue scenarios such as LLM assistants.

Example Output

Performance Metrics

Prediction Time: 11.23s
Total Time: 77.48s
All Input Parameters
{
  "texts": "四川美食确实以辣闻名,但也有不辣的选择。比如甜水面、赖汤圆、蛋烘糕、叶儿粑等,这些小吃口味温和,甜而不腻,也很受欢迎。",
  "top_K": 20,
  "top_P": 0.7,
  "prompt": "",
  "stream": false,
  "temperature": 0.3,
  "use_decoder": true,
  "skip_refine_text": false,
  "do_text_normalization": true,
  "do_homophone_replacement": false
}
Input Parameters
spk  Type: string
Speaker (empty to sample a random one)
lang  Type: string
Language code (e.g., 'en', 'zh')
speed  Type: integer, Default: 4, Range: 0 - 9
Speech speed (0-9)
texts  Type: string, Default: 四川美食确实以辣闻名,但也有不辣的选择。比如甜水面、赖汤圆、蛋烘糕、叶儿粑等,这些小吃口味温和,甜而不腻,也很受欢迎。
Texts to synthesize, separated by '|'
top_K  Type: integer, Default: 20, Range: 0 - 20
Top K for sampling (0-20)
top_P  Type: number, Default: 0.7, Range: 0 - 1
Top P for sampling (0-1)
prompt  Type: string, Default: [oral_2][laugh_0][break_6]
Prompt for text refinement (e.g., '[oral_2][laugh_0][break_6]')
stream  Type: boolean, Default: false
Use stream mode
add_break  Type: boolean, Default: false
Automatically add breaks to the text
manual_seed  Type: integer
Manual seed for reproducibility
temperature  Type: number, Default: 0.3, Range: 0 - 1
Temperature for sampling (0-1)
use_decoder  Type: boolean, Default: true
Use decoder
refine_text_only  Type: boolean, Default: false
Only refine text without generating audio
skip_refine_text  Type: boolean, Default: false
Skip text refining
do_text_normalization  Type: boolean, Default: true
Perform text normalization
do_homophone_replacement  Type: boolean, Default: false
Perform homophone replacement
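
The parameter list above can be turned into a request payload. The following is a minimal sketch, not the model author's code: `build_input` and `run_prediction` are hypothetical helper names, the defaults and ranges are copied from the list above, and `run_prediction` assumes the `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set. The version hash is the Version ID listed on this page.

```python
from typing import Any

# Numeric ranges copied from the parameter list: name -> (min, max).
RANGES = {
    "speed": (0, 9),
    "top_K": (0, 20),
    "top_P": (0.0, 1.0),
    "temperature": (0.0, 1.0),
}

# Defaults copied from the parameter list.
DEFAULTS = {
    "speed": 4,
    "top_K": 20,
    "top_P": 0.7,
    "prompt": "[oral_2][laugh_0][break_6]",
    "stream": False,
    "add_break": False,
    "temperature": 0.3,
    "use_decoder": True,
    "refine_text_only": False,
    "skip_refine_text": False,
    "do_text_normalization": True,
    "do_homophone_replacement": False,
}

# Parameters that have no documented default.
OPTIONAL = {"spk", "lang", "manual_seed"}


def build_input(texts: list[str], **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the defaults and range-check them."""
    if not texts:
        raise ValueError("at least one text is required")
    if any("|" in t for t in texts):
        raise ValueError("'|' is the separator and cannot appear inside a text")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # The model expects one string with the texts joined by '|'.
    payload = {**DEFAULTS, **overrides, "texts": "|".join(texts)}
    for name, (lo, hi) in RANGES.items():
        if not lo <= payload[name] <= hi:
            raise ValueError(f"{name}={payload[name]} is outside {lo}-{hi}")
    return payload


def run_prediction(payload: dict[str, Any]):
    """Send the payload to Replicate (assumes client installed and token set)."""
    import replicate  # imported lazily so build_input works without the client

    version = "f84864bcadf646ecb5024d4539b2d46078b3e5f64549cff0a00026dc596929ad"
    return replicate.run(f"e1100x/chattts:{version}", input=payload)
```

For example, `build_input(["Hello there.", "How are you?"], temperature=0.5)` yields a payload whose `texts` field is `"Hello there.|How are you?"`, with every other field left at its default.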
Output Schema

Output

Type: array, Items Type: string, Items Format: uri
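
Since the schema describes the output as an array of URI strings, a caller would typically download each one. The sketch below assumes the output arrives as plain strings, as the schema states (some client versions may wrap outputs in file objects instead). `local_name` and `download_all` are hypothetical helpers, and the `.mp3` fallback extension is inferred from the file name in the execution logs on this page.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_name(uri: str, index: int) -> str:
    """Take the file name from the URI path, or fall back to a numbered name."""
    name = os.path.basename(urlparse(uri).path)
    return name or f"output_{index}.mp3"


def download_all(uris: list[str], dest_dir: str = ".") -> list[str]:
    """Download every URI into dest_dir and return the local paths."""
    paths = []
    for i, uri in enumerate(uris):
        path = os.path.join(dest_dir, local_name(uri, i))
        urlretrieve(uri, path)  # performs the HTTP request
        paths.append(path)
    return paths
```

Because `texts` can hold several '|'-separated entries, it seems reasonable to expect one URI per entry, though the page does not state this explicitly.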

Example Execution Logs
[+0000 20241022 16:29:22] [INFO] Predictor | predict | Text input: ['四川美食确实以辣闻名,但也有不辣的选择。比如甜水面、赖汤圆、蛋烘糕、叶儿粑等,这些小吃口味温和,甜而不腻,也很受欢迎。']
[+0000 20241022 16:29:22] [INFO] Predictor | predict | Use speaker: 蘁淰斅欀剄嘌繑瘤壆捉汃瀙忚觠帒泫枬劾獝煇扡穂毡禊圤倪宨姡兂諩滢氓臬呠抯庇谻羐栄趆灛珴澉徂菘伙广憎譱峼枰诀澄竬审宓犝砉縟漱嗅妀佄茏晙湽篪赵猉矩獜嬇球啗真谷讲媞繓絃諺主産敳场剕用譋珠肗堲灟媄筑癛匠籉棛畉灎炑蜲警蕞蟵祡冒笳圞緀袄绝碧畴樊倈褁弃赣柋繟壤綝嘻淔卲癣緤赡谔勬刻殘偰緒勵筀折脙朌嚮瑲庼勖璹稸脝斅嵡盕泎攰亍莏挮甍贰荊蘕舑哼嗁尋漳枦属偣娤枏器硪观羾烩熶娠褼疵癚彩埜洗荀漈珖狧聪籹抛葻脨婔瘭炐煮珘贎噦崼絣煤狿蚎福烰橂崮绐耠凷夐旋贶櫃傞哢勋堣恪嗕掰涮炆労旍赨歾捺訛臠薬噌帨芥背噠嵛渂侅艹俄諄蔗嬍控瘕睓灤媀侮艦羥殥滋梨煚诂褲篃恗泃匹柪嘰匞猓洮伎睟仡怸绨奨劃蚚腤奍誙榜慭覝廘嬳摩蟱掟蔞蕻缘挎淿睕崕褲坪劄蘨聱夦賴懾榅嵯柑改勨嗊瘥箭评恝憭测磉罣噞盀洨叺衃艰摴讏埲揋礄冖棽穃緕聇援墀冩亽腚卬埊爷庯熜绮嶼蕼梈緣丧礆癊縨唡趌衏绬催匞筞葮緬絑戕蠇毫涑僺听罱嶔欭糘曦夣緓諔藘荹淳栦蟼濗探紟仡従裦覹証牔涁蟉湉坦獴苤汞梙捌璄溔墬敻犑謁蒈忡豿嘝搮堎甴埈檃堤徖堁禉楞蛆呬秣粹耗籜犽擆茪茖櫽跈挽袞噭傕緲氉儫历剪赭狃痩牡巔從亓懜忔愰喈朒夺熊楯訟弞嵓繪杈苻柱揇姈荘宪崟壺竫絀檇梡梴乡猎帗臰褔柵硉洐欕抅盬蛼崋俼縦碒倍讎蔬疶晏喚蛌梩砘僼帒榶玶掵羱萿呂嘡殆熗溧譀墘忇绕惬揄奦峚橷襶緎奁旅噰潨偗咀烘撬梥俦朲唂渑妥舰蕊砫溔蟝嶄惔喀嶅灚乄圜茆淌冨姗虯繆嘑眴耻襭戕宅択盗峷攉孴癔滓叜奓咋旖箊揂焟詚腡玮谱囗聑詡巋谎紤敡燣惃柾状湙倱卬渔簶垗衒夗泊綬窘衸承梣砜僁豽劊扏芶岭睑蕝趀勇僠斟稷放苺譪疞彡樍兩熡勚圹貃苪縀熤毚汍傃絀某澵氞槍棩课矮詢哗朻狄撾坼兤腳櫟絾蒢渎唘朌贉倣壈襉淘接蒆羺撳茗擻摁稭悤姞旗抜犒珪唟殣惷燴曫跄圳姪跛埥僔萏肟卮柩諉懜蝒張尩紌廭啯媢抩絷硼蠄筄坧崣紃熡蒇算殦庚伔公敆櫎罿璀翛傇臨烎芋协佉惇歉脏晱咧厚琇豝縤从諱衖忸俵薥艆璅俢询攈傺彚様璙蔞眏菀腺婹翵栌諌诹楺樯诚囘烀赗朕仉噓榃妉媖巶粉瞯塈皽礃炙荀棺嫌趂翾棏撛罊璺琨冯殐喾暅賱撒攃巶徜礏哕蒷槄箬與乐狗伂瓃堧梋帔椛滹諫老裙堅婏讕儣諾份睳茱楿噖苖愋塍廔耘窞葭泦浳古糙罪恉苒淓跟槵袖校敆莮盉橭榮诞趚虹孚睋忞贛嘔跹蓅舗舓蟮極獳刂勴康袔一㴁
[+0000 20241022 16:29:22] [INFO] Predictor | predict | Start inference.
text:   0%|          | 0/384(max) [00:00, ?it/s]
text:   0%|          | 1/384(max) [00:00,  3.64it/s]
We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
text:   2%|▏         | 8/384(max) [00:00, 26.02it/s]
text:   4%|▍         | 16/384(max) [00:00, 42.21it/s]
text:   6%|▋         | 24/384(max) [00:00, 52.66it/s]
text:   8%|▊         | 32/384(max) [00:00, 59.09it/s]
text:  10%|█         | 40/384(max) [00:00, 63.52it/s]
text:  12%|█▎        | 48/384(max) [00:00, 66.09it/s]
text:  15%|█▍        | 56/384(max) [00:01, 67.70it/s]
text:  17%|█▋        | 64/384(max) [00:01, 69.44it/s]
text:  17%|█▋        | 67/384(max) [00:01, 55.74it/s]
code:   0%|          | 0/2048(max) [00:00, ?it/s]
code:   0%|          | 8/2048(max) [00:00, 71.82it/s]
code:   1%|          | 16/2048(max) [00:00, 72.38it/s]
code:   1%|          | 24/2048(max) [00:00, 72.55it/s]
code:   2%|▏         | 32/2048(max) [00:00, 72.80it/s]
code:   2%|▏         | 40/2048(max) [00:00, 73.35it/s]
code:   2%|▏         | 48/2048(max) [00:00, 73.57it/s]
code:   3%|▎         | 56/2048(max) [00:00, 73.38it/s]
code:   3%|▎         | 64/2048(max) [00:00, 73.42it/s]
code:   4%|▎         | 72/2048(max) [00:00, 72.97it/s]
code:   4%|▍         | 80/2048(max) [00:01, 72.57it/s]
code:   4%|▍         | 88/2048(max) [00:01, 72.57it/s]
code:   5%|▍         | 96/2048(max) [00:01, 72.70it/s]
code:   5%|▌         | 104/2048(max) [00:01, 72.59it/s]
code:   5%|▌         | 112/2048(max) [00:01, 72.61it/s]
code:   6%|▌         | 120/2048(max) [00:01, 72.66it/s]
code:   6%|▋         | 128/2048(max) [00:01, 72.62it/s]
code:   7%|▋         | 136/2048(max) [00:01, 72.87it/s]
code:   7%|▋         | 144/2048(max) [00:01, 72.60it/s]
code:   7%|▋         | 152/2048(max) [00:02, 72.62it/s]
code:   8%|▊         | 160/2048(max) [00:02, 72.31it/s]
code:   8%|▊         | 168/2048(max) [00:02, 54.93it/s]
code:   9%|▊         | 176/2048(max) [00:02, 59.28it/s]
code:   9%|▉         | 184/2048(max) [00:02, 62.64it/s]
code:   9%|▉         | 192/2048(max) [00:02, 65.11it/s]
code:  10%|▉         | 200/2048(max) [00:02, 66.53it/s]
code:  10%|█         | 208/2048(max) [00:02, 67.79it/s]
code:  11%|█         | 216/2048(max) [00:03, 68.96it/s]
code:  11%|█         | 224/2048(max) [00:03, 69.66it/s]
code:  11%|█▏        | 232/2048(max) [00:03, 70.42it/s]
code:  12%|█▏        | 240/2048(max) [00:03, 70.63it/s]
code:  12%|█▏        | 248/2048(max) [00:03, 70.83it/s]
code:  12%|█▎        | 256/2048(max) [00:03, 71.11it/s]
code:  13%|█▎        | 264/2048(max) [00:03, 71.68it/s]
code:  13%|█▎        | 272/2048(max) [00:03, 71.92it/s]
code:  14%|█▎        | 280/2048(max) [00:03, 71.85it/s]
code:  14%|█▍        | 288/2048(max) [00:04, 72.12it/s]
code:  14%|█▍        | 296/2048(max) [00:04, 72.24it/s]
code:  15%|█▍        | 304/2048(max) [00:04, 72.37it/s]
code:  15%|█▌        | 312/2048(max) [00:04, 72.40it/s]
code:  16%|█▌        | 320/2048(max) [00:04, 71.61it/s]
code:  16%|█▌        | 328/2048(max) [00:04, 71.56it/s]
code:  16%|█▋        | 336/2048(max) [00:04, 71.49it/s]
code:  17%|█▋        | 344/2048(max) [00:04, 71.85it/s]
code:  17%|█▋        | 352/2048(max) [00:04, 71.76it/s]
code:  18%|█▊        | 360/2048(max) [00:05, 71.95it/s]
code:  18%|█▊        | 368/2048(max) [00:05, 72.12it/s]
code:  18%|█▊        | 376/2048(max) [00:05, 72.20it/s]
code:  19%|█▉        | 384/2048(max) [00:05, 72.14it/s]
code:  19%|█▉        | 392/2048(max) [00:05, 72.09it/s]
code:  20%|█▉        | 400/2048(max) [00:05, 72.35it/s]
code:  20%|█▉        | 408/2048(max) [00:05, 72.25it/s]
code:  20%|██        | 416/2048(max) [00:05, 72.09it/s]
code:  21%|██        | 424/2048(max) [00:05, 71.92it/s]
code:  21%|██        | 432/2048(max) [00:06, 71.77it/s]
code:  21%|██▏       | 440/2048(max) [00:06, 71.47it/s]
code:  22%|██▏       | 448/2048(max) [00:06, 71.62it/s]
code:  22%|██▏       | 456/2048(max) [00:06, 71.48it/s]
code:  23%|██▎       | 464/2048(max) [00:06, 71.56it/s]
code:  23%|██▎       | 472/2048(max) [00:06, 71.68it/s]
code:  23%|██▎       | 480/2048(max) [00:06, 71.67it/s]
code:  24%|██▍       | 488/2048(max) [00:06, 71.77it/s]
code:  24%|██▍       | 496/2048(max) [00:06, 71.73it/s]
code:  24%|██▍       | 498/2048(max) [00:07, 70.78it/s]
[+0000 20241022 16:29:31] [INFO] Predictor | predict | Inference completed.
[+0000 20241022 16:29:32] [INFO] Predictor | predict | Audio saved to output_audio_0.mp3
[+0000 20241022 16:29:32] [INFO] Predictor | predict | Audio generation successful.
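
The timestamped lines in the logs above make it possible to measure how long the inference step itself took. The sketch below is an illustration only: it assumes the bracketed prefix is a UTC offset followed by a `YYYYMMDD HH:MM:SS` timestamp, which matches the sample lines but is not documented on this page, and `parse_line` and `inference_seconds` are hypothetical names.

```python
import re
from datetime import datetime

# Matches lines such as "[+0000 20241022 16:29:22] [INFO] Predictor | ...".
STAMP = re.compile(r"^\[([+-]\d{4} \d{8} \d{2}:\d{2}:\d{2})\] \[(\w+)\] (.*)$")


def parse_line(line: str):
    """Return (timestamp, level, message), or None for progress-bar lines."""
    m = STAMP.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group(1), "%z %Y%m%d %H:%M:%S")
    return when, m.group(2), m.group(3)


def inference_seconds(lines: list[str]) -> float:
    """Seconds between the 'Start inference.' and 'Inference completed.' lines."""
    start = end = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # tqdm progress output carries no timestamp
        when, _level, message = parsed
        if message.endswith("Start inference."):
            start = when
        elif message.endswith("Inference completed."):
            end = when
    if start is None or end is None:
        raise ValueError("start or end marker not found")
    return (end - start).total_seconds()


sample = [
    "[+0000 20241022 16:29:22] [INFO] Predictor | predict | Start inference.",
    "code:  24%|██▍       | 498/2048(max) [00:07, 70.78it/s]",
    "[+0000 20241022 16:29:31] [INFO] Predictor | predict | Inference completed.",
]
print(inference_seconds(sample))  # 9.0
```

On the logs shown here this gives 9 seconds between the two markers, which fits within the 11.23s prediction time reported under Performance Metrics.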
Version Details
Version ID
f84864bcadf646ecb5024d4539b2d46078b3e5f64549cff0a00026dc596929ad
Version Created
October 22, 2024