soykertje/whisper 🖼️❓🔢✓📝 → ❓

▶️ 84.7K runs 📅 Jul 2023 ⚙️ Cog 0.8.3 🔗 GitHub ⚖️ License
language-identification multilingual speech-to-text subtitle-generation

About

Convert speech in audio to text

Example Output

Output

{"segments":[{"id":0,"end":3.14,"seek":0,"text":" This is the Micro Machine Man presenting the most midget miniature motorcade of micro machines.","start":0.18,"words":[{"end":0.5,"word":" This","start":0.18,"probability":0.5188008546829224},{"end":0.66,"word":" is","start":0.5,"probability":0.9036935567855835},{"end":0.88,"word":" the","start":0.66,"probability":0.7630036473274231},{"end":0.9,"word":" Micro","start":0.88,"probability":0.7599949836730957},{"end":1,"word":" Machine","start":0.9,"probability":0.43241745233535767},{"end":1.18,"word":" Man","start":1,"probability":0.8935106992721558},{"end":1.4,"word":" presenting","start":1.18,"probability":0.4879225194454193},{"end":1.52,"word":" the","start":1.4,"probability":0.8076907992362976},{"end":1.68,"word":" most","start":1.52,"probability":0.8195436596870422},{"end":1.86,"word":" midget","start":1.68,"probability":0.9426034390926361},{"end":2.14,"word":" miniature","start":1.86,"probability":0.6330108642578125},{"end":2.46,"word":" motorcade","start":2.14,"probability":0.7706209719181061},{"end":2.7,"word":" of","start":2.46,"probability":0.8996413946151733},{"end":2.9,"word":" micro","start":2.7,"probability":0.2997014820575714},{"end":3.14,"word":" machines.","start":2.9,"probability":0.5044888257980347}],"tokens":[50364,639,307,264,25642,22155,2458,15578,264,881,2062,847,34674,5932,30340,295,4532,8379,13,50524],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852},{"id":1,"end":6.66,"seek":0,"text":" Each one has dramatic details, terrific trim, precision paint jobs, plus incredible micro machine pocket play sets.","start":3.46,"words":[{"end":3.56,"word":" Each","start":3.46,"probability":0.7921050786972046},{"end":3.68,"word":" one","start":3.56,"probability":0.8239385485649109},{"end":3.8,"word":" has","start":3.68,"probability":0.8931549787521362},{"end":4.04,"word":" 
dramatic","start":3.8,"probability":0.76500004529953},{"end":4.24,"word":" details,","start":4.04,"probability":0.6641080379486084},{"end":4.5,"word":" terrific","start":4.48,"probability":0.816464900970459},{"end":4.66,"word":" trim,","start":4.5,"probability":0.41670969128608704},{"end":4.96,"word":" precision","start":4.78,"probability":0.7823763489723206},{"end":5.18,"word":" paint","start":4.96,"probability":0.8373522162437439},{"end":5.3,"word":" jobs,","start":5.18,"probability":0.45608851313591003},{"end":5.52,"word":" plus","start":5.34,"probability":0.8570372462272644},{"end":5.78,"word":" incredible","start":5.52,"probability":0.7519615888595581},{"end":5.98,"word":" micro","start":5.78,"probability":0.6544991135597229},{"end":6.2,"word":" machine","start":5.98,"probability":0.659029483795166},{"end":6.54,"word":" pocket","start":6.2,"probability":0.6552930474281311},{"end":6.62,"word":" play","start":6.54,"probability":0.6885859370231628},{"end":6.66,"word":" sets.","start":6.62,"probability":0.633192241191864}],"tokens":[50524,6947,472,575,12023,4365,11,20899,10445,11,18356,4225,4782,11,1804,4651,4532,3479,8963,862,6352,13,50699],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852},{"id":2,"end":8.76,"seek":0,"text":" There's a police station, fire station, restaurant, service station, and more.","start":6.8,"words":[{"end":6.9,"word":" There's","start":6.8,"probability":0.8825059831142426},{"end":6.96,"word":" a","start":6.9,"probability":0.9860183000564575},{"end":7.18,"word":" police","start":6.96,"probability":0.8330947756767273},{"end":7.3,"word":" station,","start":7.18,"probability":0.8929913640022278},{"end":7.58,"word":" fire","start":7.4,"probability":0.7919226884841919},{"end":7.7,"word":" station,","start":7.58,"probability":0.8724629282951355},{"end":8.02,"word":" restaurant,","start":7.8,"probability":0.7704039216041565},{"end":8.26,"word":" 
service","start":8.08,"probability":0.8293709754943848},{"end":8.44,"word":" station,","start":8.26,"probability":0.9018292427062988},{"end":8.64,"word":" and","start":8.44,"probability":0.8904381990432739},{"end":8.76,"word":" more.","start":8.64,"probability":0.8884001970291138}],"tokens":[50699,821,311,257,3804,5214,11,2610,5214,11,6383,11,2643,5214,11,293,544,13,50804],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852},{"id":3,"end":10.28,"seek":0,"text":" Perfect pocket portables to take anyplace.","start":9,"words":[{"end":9.16,"word":" Perfect","start":9,"probability":0.8314006328582764},{"end":9.38,"word":" pocket","start":9.16,"probability":0.7699366211891174},{"end":9.62,"word":" portables","start":9.38,"probability":0.9437406063079834},{"end":9.8,"word":" to","start":9.62,"probability":0.8796511292457581},{"end":9.9,"word":" take","start":9.8,"probability":0.8167694807052612},{"end":10.28,"word":" anyplace.","start":9.9,"probability":0.6491216719150543}],"tokens":[50804,10246,8963,2436,2965,281,747,604,6742,13,50879],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852},{"id":4,"end":15.3,"seek":0,"text":" And there are many miniature play sets to play with and each one comes with its own special edition micro machine vehicle and fun fantastic features that miraculously move.","start":10.4,"words":[{"end":10.54,"word":" And","start":10.4,"probability":0.8727167844772339},{"end":10.66,"word":" there","start":10.54,"probability":0.8446416854858398},{"end":10.68,"word":" are","start":10.66,"probability":0.8793963193893433},{"end":10.82,"word":" many","start":10.68,"probability":0.8155453205108643},{"end":11.02,"word":" miniature","start":10.82,"probability":0.802977979183197},{"end":11.24,"word":" play","start":11.02,"probability":0.8290015459060669},{"end":11.24,"word":" 
sets","start":11.24,"probability":0.8442025780677795},{"end":11.42,"word":" to","start":11.24,"probability":0.895969808101654},{"end":11.52,"word":" play","start":11.42,"probability":0.8799439072608948},{"end":11.6,"word":" with","start":11.52,"probability":0.7744315266609192},{"end":11.7,"word":" and","start":11.6,"probability":0.40789568424224854},{"end":11.78,"word":" each","start":11.7,"probability":0.8371046185493469},{"end":11.92,"word":" one","start":11.78,"probability":0.8314981460571289},{"end":12.04,"word":" comes","start":11.92,"probability":0.8056870102882385},{"end":12.14,"word":" with","start":12.04,"probability":0.8167880773544312},{"end":12.24,"word":" its","start":12.14,"probability":0.6112188100814819},{"end":12.36,"word":" own","start":12.24,"probability":0.8195785880088806},{"end":12.58,"word":" special","start":12.36,"probability":0.9109472632408142},{"end":12.88,"word":" edition","start":12.58,"probability":0.7968374490737915},{"end":13.1,"word":" micro","start":12.88,"probability":0.7488399147987366},{"end":13.28,"word":" machine","start":13.1,"probability":0.71075838804245},{"end":13.52,"word":" vehicle","start":13.28,"probability":0.8960368037223816},{"end":13.66,"word":" and","start":13.52,"probability":0.8079777956008911},{"end":13.84,"word":" fun","start":13.66,"probability":0.8869229555130005},{"end":14.1,"word":" fantastic","start":13.84,"probability":0.6974602341651917},{"end":14.34,"word":" features","start":14.1,"probability":0.73177570104599},{"end":14.54,"word":" that","start":14.34,"probability":0.866608738899231},{"end":14.84,"word":" miraculously","start":14.54,"probability":0.9221479594707489},{"end":15.3,"word":" 
move.","start":14.84,"probability":0.8684508204460144}],"tokens":[50879,400,456,366,867,34674,862,6352,281,862,365,293,1184,472,1487,365,1080,1065,2121,11377,4532,3479,5864,293,1019,5456,4122,300,30686,25038,1286,13,51129],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852},{"id":5,"end":19.22,"seek":0,"text":" Raise the boat lift at the airport, marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge.","start":15.64,"words":[{"end":15.66,"word":" Raise","start":15.64,"probability":0.8107876181602478},{"end":15.78,"word":" the","start":15.66,"probability":0.8148850202560425},{"end":15.9,"word":" boat","start":15.78,"probability":0.8849641680717468},{"end":15.98,"word":" lift","start":15.9,"probability":0.45141515135765076},{"end":16.1,"word":" at","start":15.98,"probability":0.8778955340385437},{"end":16.24,"word":" the","start":16.1,"probability":0.8175328373908997},{"end":16.36,"word":" airport,","start":16.24,"probability":0.8736236691474915},{"end":16.62,"word":" marina,","start":16.38,"probability":0.6979465484619141},{"end":16.84,"word":" man","start":16.68,"probability":0.8808432221412659},{"end":16.92,"word":" the","start":16.84,"probability":0.7407904863357544},{"end":17.1,"word":" gun","start":16.92,"probability":0.896804690361023},{"end":17.22,"word":" turret","start":17.1,"probability":0.9114507436752319},{"end":17.32,"word":" at","start":17.22,"probability":0.8769182562828064},{"end":17.44,"word":" the","start":17.32,"probability":0.8099021315574646},{"end":17.54,"word":" army","start":17.44,"probability":0.7624181509017944},{"end":17.66,"word":" base,","start":17.54,"probability":0.8227003216743469},{"end":17.88,"word":" clean","start":17.74,"probability":0.7740076780319214},{"end":18,"word":" your","start":17.88,"probability":0.7935739159584045},{"end":18.12,"word":" 
car","start":18,"probability":0.8899481892585754},{"end":18.22,"word":" at","start":18.12,"probability":0.8764775395393372},{"end":18.26,"word":" the","start":18.22,"probability":0.8148840069770813},{"end":18.48,"word":" car","start":18.26,"probability":0.9004565477371216},{"end":18.52,"word":" wash,","start":18.48,"probability":0.5889071226119995},{"end":18.78,"word":" raise","start":18.62,"probability":0.7670565843582153},{"end":18.94,"word":" the","start":18.78,"probability":0.8077996969223022},{"end":19.06,"word":" toll","start":18.94,"probability":0.9132710099220276},{"end":19.22,"word":" bridge.","start":19.06,"probability":0.822396993637085}],"tokens":[51129,30062,264,6582,5533,412,264,10155,11,1849,1426,11,587,264,3874,34544,412,264,7267,3096,11,2541,428,1032,412,264,1032,5675,11,5300,264,16629,7283,13,51329],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852},{"id":6,"end":21.22,"seek":0,"text":" And these play sets fit together to form a micro machine world.","start":19.24,"words":[{"end":19.5,"word":" And","start":19.24,"probability":0.8680288195610046},{"end":19.62,"word":" these","start":19.5,"probability":0.6982218623161316},{"end":19.8,"word":" play","start":19.62,"probability":0.8423182368278503},{"end":19.84,"word":" sets","start":19.8,"probability":0.8482057452201843},{"end":20.08,"word":" fit","start":19.84,"probability":0.8614595532417297},{"end":20.2,"word":" together","start":20.08,"probability":0.7896531820297241},{"end":20.42,"word":" to","start":20.2,"probability":0.8943203091621399},{"end":20.42,"word":" form","start":20.42,"probability":0.8038238883018494},{"end":20.58,"word":" a","start":20.42,"probability":0.9939543604850769},{"end":20.74,"word":" micro","start":20.58,"probability":0.8134252429008484},{"end":20.92,"word":" machine","start":20.74,"probability":0.7036716341972351},{"end":21.22,"word":" 
world.","start":20.92,"probability":0.871777355670929}],"tokens":[51329,400,613,862,6352,3318,1214,281,1254,257,4532,3479,1002,13,51429],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852},{"id":7,"end":25.28,"seek":0,"text":" Micro machine pocket play sets so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all.","start":21.48,"words":[{"end":21.6,"word":" Micro","start":21.48,"probability":0.9181209206581116},{"end":21.78,"word":" machine","start":21.6,"probability":0.6455795764923096},{"end":22,"word":" pocket","start":21.78,"probability":0.7043859958648682},{"end":22.16,"word":" play","start":22,"probability":0.8732221722602844},{"end":22.24,"word":" sets","start":22.16,"probability":0.812091052532196},{"end":22.42,"word":" so","start":22.24,"probability":0.5316440463066101},{"end":22.68,"word":" tremendously","start":22.42,"probability":0.744458794593811},{"end":22.88,"word":" tiny,","start":22.68,"probability":0.815040647983551},{"end":23.04,"word":" so","start":22.96,"probability":0.9097583889961243},{"end":23.24,"word":" perfectly","start":23.04,"probability":0.8714672923088074},{"end":23.5,"word":" precise,","start":23.24,"probability":0.8715913891792297},{"end":23.78,"word":" so","start":23.64,"probability":0.9078233242034912},{"end":24.3,"word":" dazzlingly","start":23.78,"probability":0.8693957328796387},{"end":24.3,"word":" detailed,","start":24.3,"probability":0.7534111738204956},{"end":24.52,"word":" you'll","start":24.32,"probability":0.8286176919937134},{"end":24.62,"word":" want","start":24.52,"probability":0.7137832045555115},{"end":24.78,"word":" to","start":24.62,"probability":0.8862699270248413},{"end":24.88,"word":" pocket","start":24.78,"probability":0.7723338007926941},{"end":25.1,"word":" them","start":24.88,"probability":0.7235183715820312},{"end":25.28,"word":" 
all.","start":25.1,"probability":0.8980650901794434}],"tokens":[51429,25642,3479,8963,862,6352,370,27985,5870,11,370,6239,13600,11,370,44078,1688,356,9942,11,291,603,528,281,8963,552,439,13,51629],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852},{"id":8,"end":27.66,"seek":0,"text":" Micro machines and micro machine pocket play sets sold separately from Galoob.","start":25.5,"words":[{"end":25.68,"word":" Micro","start":25.5,"probability":0.9197527766227722},{"end":25.84,"word":" machines","start":25.68,"probability":0.7449283599853516},{"end":26,"word":" and","start":25.84,"probability":0.6610913276672363},{"end":26.12,"word":" micro","start":26,"probability":0.8715086579322815},{"end":26.32,"word":" machine","start":26.12,"probability":0.6785076856613159},{"end":26.5,"word":" pocket","start":26.32,"probability":0.7522760033607483},{"end":26.68,"word":" play","start":26.5,"probability":0.8618249297142029},{"end":26.78,"word":" sets","start":26.68,"probability":0.8218684792518616},{"end":26.96,"word":" sold","start":26.78,"probability":0.8280050158500671},{"end":27.16,"word":" separately","start":26.96,"probability":0.6918163895606995},{"end":27.4,"word":" from","start":27.16,"probability":0.8134355545043945},{"end":27.66,"word":" Galoob.","start":27.4,"probability":0.7426764170328776}],"tokens":[51629,25642,8379,293,4532,3479,8963,862,6352,3718,14759,490,7336,78,996,13,51754],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852},{"id":9,"end":29.5,"seek":0,"text":" The smaller they are, the better they are.","start":27.78,"words":[{"end":28,"word":" The","start":27.78,"probability":0.8174500465393066},{"end":28.2,"word":" smaller","start":28,"probability":0.7499929666519165},{"end":28.42,"word":" they","start":28.2,"probability":0.7381458878517151},{"end":28.68,"word":" 
are,","start":28.42,"probability":0.8937985897064209},{"end":28.88,"word":" the","start":28.68,"probability":0.813441812992096},{"end":29.02,"word":" better","start":28.88,"probability":0.8151369690895081},{"end":29.22,"word":" they","start":29.02,"probability":0.7374864816665649},{"end":29.5,"word":" are.","start":29.22,"probability":0.8918095827102661}],"tokens":[51754,440,4356,436,366,11,264,1101,436,366,13,51854],"avg_logprob":-0.29071213478265806,"temperature":0,"no_speech_prob":0.5242696404457092,"compression_ratio":2.023206751054852}],"translation":null,"transcription":" This is the Micro Machine Man presenting the most midget miniature motorcade of micro machines. Each one has dramatic details, terrific trim, precision paint jobs, plus incredible micro machine pocket play sets. There's a police station, fire station, restaurant, service station, and more. Perfect pocket portables to take anyplace. And there are many miniature play sets to play with and each one comes with its own special edition micro machine vehicle and fun fantastic features that miraculously move. Raise the boat lift at the airport, marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge. And these play sets fit together to form a micro machine world. Micro machine pocket play sets so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all. Micro machines and micro machine pocket play sets sold separately from Galoob. The smaller they are, the better they are.","detected_language":"english"}
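The output above nests word-level timings inside each segment. As a rough sketch of how that structure might be consumed, the snippet below turns segments into SRT-style subtitle blocks and lists words whose `probability` falls under a cutoff. The function names, the 0.7 cutoff, and the trimmed `sample` dict are illustrative assumptions; only the field names (`segments`, `words`, `start`, `end`, `text`, `probability`) come from the example output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(output: dict) -> str:
    """Build one SRT block per segment, numbered from 1."""
    blocks = []
    for index, seg in enumerate(output["segments"], start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def low_confidence_words(output: dict, threshold: float = 0.7) -> list:
    """Return (word, start, probability) for words below the cutoff."""
    return [
        (w["word"].strip(), w["start"], w["probability"])
        for seg in output["segments"]
        for w in seg.get("words", [])
        if w["probability"] < threshold
    ]


# Trimmed sample: one segment and three of its words, copied from the output above.
sample = {
    "segments": [
        {
            "id": 3,
            "start": 9,
            "end": 10.28,
            "text": " Perfect pocket portables to take anyplace.",
            "words": [
                {"word": " Perfect", "start": 9, "end": 9.16,
                 "probability": 0.8314006328582764},
                {"word": " pocket", "start": 9.16, "end": 9.38,
                 "probability": 0.7699366211891174},
                {"word": " anyplace.", "start": 9.9, "end": 10.28,
                 "probability": 0.6491216719150543},
            ],
        }
    ],
    "detected_language": "english",
}

print(segments_to_srt(sample))
print(low_confidence_words(sample))
```

Note that each `text` and `word` value carries a leading space, so stripping before display is usually what you want.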

Performance Metrics

46.07s Prediction Time
399.49s Total Time
All Input Parameters
{
  "audio": "https://replicate.delivery/pbxt/JOMXkhgXe8wbOGqnQ5LwJUulwpbcvd1NZUYKFe1TMPGaXg24/1987%20Micro%20Machines%20Car%20Playset%20Commercial%20%28Featuring%20John%20Moschitta%20the%20Micro%20Machine%20Man%29.mp3",
  "model": "large-v2",
  "translate": false,
  "temperature": 0,
  "transcription": "plain text",
  "suppress_tokens": "-1",
  "word_timestamps": true,
  "logprob_threshold": -1,
  "no_speech_threshold": 0.6,
  "condition_on_previous_text": true,
  "compression_ratio_threshold": 2.4,
  "temperature_increment_on_fallback": 0.2
}
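A payload like the one above could be assembled programmatically. The sketch below is one possible approach, not the page's own client code: `build_input` and `run_prediction` are hypothetical helper names, the defaults are copied from the parameter list on this page, and `run_prediction` assumes the third-party `replicate` Python package is installed with `REPLICATE_API_TOKEN` set in the environment.

```python
# Model reference built from the owner/name and Version ID shown on this page.
MODEL_REF = (
    "soykertje/whisper:"
    "20de0792d38812ce94a0ba8e699b3416cbdc75486ed660db12deeb1b28f35bb6"
)

# Defaults as listed under "Input Parameters".
DEFAULTS = {
    "model": "large-v2",
    "translate": False,
    "temperature": 0,
    "transcription": "plain text",
    "suppress_tokens": "-1",
    "word_timestamps": True,
    "logprob_threshold": -1,
    "no_speech_threshold": 0.6,
    "condition_on_previous_text": True,
    "compression_ratio_threshold": 2.4,
    "temperature_increment_on_fallback": 0.2,
}

# Parameters the page lists without a default value.
OPTIONAL = {"language", "patience", "initial_prompt"}


def build_input(audio_url: str, **overrides) -> dict:
    """Merge overrides into the documented defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"audio": audio_url, **DEFAULTS, **overrides}


def run_prediction(payload: dict):
    """Send the payload to Replicate (needs network access and an API token)."""
    import replicate  # third-party client, imported lazily

    return replicate.run(MODEL_REF, input=payload)


payload = build_input("https://example.com/clip.mp3", translate=True)
print(payload["model"], payload["translate"])
```

Rejecting unknown keys locally is a design choice here: it surfaces typos such as `word_timestamp` before a prediction is queued.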
Input Parameters
audio (required) Type: string
Audio file
model Default: large-v2
Choose a Whisper model.
language
Language spoken in the audio; specify None to perform language detection
patience Type: number
optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search
translate Type: boolean Default: false
Translate the text to English when set to True
temperature Type: number Default: 0
temperature to use for sampling
transcription Default: plain text
Choose the format for the transcription
initial_prompt Type: string
optional text to provide as a prompt for the first window.
suppress_tokens Type: string Default: -1
comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuations
word_timestamps Type: boolean Default: true
Improves the accuracy of the timestamps by using word-level timestamps
logprob_threshold Type: number Default: -1
if the average log probability is lower than this value, treat the decoding as failed
no_speech_threshold Type: number Default: 0.6
if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence
condition_on_previous_text Type: boolean Default: true
if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop
compression_ratio_threshold Type: number Default: 2.4
if the gzip compression ratio is higher than this value, treat the decoding as failed
temperature_increment_on_fallback Type: number Default: 0.2
amount by which to raise the temperature on each fallback, when the decoding fails to meet either compression_ratio_threshold or logprob_threshold
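The threshold parameters above interact, so a small sketch may help. It is an approximation of the logic the descriptions imply, not the model's actual source: the function names are hypothetical, the 1.0 cap on the temperature schedule is an assumption, and `zlib` is used as a stand-in for the "gzip compression ratio" (both use DEFLATE, but the exact byte counts may differ from what the model computes).

```python
import zlib


def temperature_schedule(start: float = 0.0, increment: float = 0.2) -> list:
    """Temperatures to try in order, stepping by `increment` up to an assumed cap of 1.0."""
    if increment <= 0:
        return [start]
    steps = int((1.0 - start) / increment + 1e-6)
    return [round(start + i * increment, 6) for i in range(steps + 1)]


def compression_ratio(text: str) -> float:
    """Raw size divided by compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def needs_fallback(text: str, avg_logprob: float,
                   compression_ratio_threshold: float = 2.4,
                   logprob_threshold: float = -1.0) -> bool:
    """A decoding counts as failed if it is too repetitive or too unlikely."""
    too_repetitive = compression_ratio(text) > compression_ratio_threshold
    too_unlikely = avg_logprob < logprob_threshold
    return too_repetitive or too_unlikely


def is_silence(no_speech_prob: float, avg_logprob: float,
               no_speech_threshold: float = 0.6,
               logprob_threshold: float = -1.0) -> bool:
    """Silence requires BOTH a high no-speech probability and a low log probability."""
    return no_speech_prob > no_speech_threshold and avg_logprob < logprob_threshold


print(temperature_schedule(0.0, 0.2))
print(needs_fallback("ha " * 200, avg_logprob=-0.3))
# Values taken from the example output on this page.
print(is_silence(0.5242696404457092, -0.29071213478265806))
```

With the defaults, a failed decoding would be retried at progressively higher temperatures until one passes both checks or the schedule runs out.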
Output Schema

Output

Example Execution Logs
Transcribe with large-v2 model
Detected language: English
  0%|          | 0/3014 [00:00<?, ?frames/s]
100%|█████████▉| 3000/3014 [00:20<00:00, 146.43frames/s]
100%|█████████▉| 3000/3014 [00:38<00:00, 78.77frames/s]
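The progress lines in these logs can be read mechanically. The sketch below parses one line and estimates the audio length, assuming each frame covers 10 ms of audio (100 frames per second); that frame duration is an assumption not stated on this page, though it agrees with the roughly 30-second transcript above. `parse_progress` is a hypothetical helper name.

```python
import re

FRAMES_PER_SECOND = 100  # assumption: one frame per 10 ms of audio

PROGRESS_RE = re.compile(r"(\d+)/(\d+)\s+\[[^\]]*?([\d.]+|\?)frames/s\]")


def parse_progress(line: str):
    """Extract frames done, total frames, rate, and estimated audio length."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    done, total, rate = match.groups()
    return {
        "done": int(done),
        "total": int(total),
        "frames_per_second": None if rate == "?" else float(rate),
        "audio_seconds": int(total) / FRAMES_PER_SECOND,
    }


line = "100%|█████████▉| 3000/3014 [00:38<00:00, 78.77frames/s]"
print(parse_progress(line))
```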
Version Details
Version ID
20de0792d38812ce94a0ba8e699b3416cbdc75486ed660db12deeb1b28f35bb6
Version Created
August 22, 2023