hnesk/whisper-wordtimestamps

1.5M runs, Apr 2023, Cog 0.6.1
language-detection speech-to-text word-timestamps

About

openai/whisper with exposed settings for word_timestamps

Example Output

{"segments":[{"id":0,"end":21.98,"seek":0,"text":" Ich bin Karstadtdetektiv, ich bin direktiv von Karstadt.","start":7.02,"words":[{"end":7.58,"word":" Ich","start":7.02,"probability":0.2817317247390747},{"end":7.78,"word":" bin","start":7.58,"probability":0.9665520787239075},{"end":8.8,"word":" Karstadtdetektiv,","start":7.78,"probability":0.40456129983285793},{"end":9.34,"word":" ich","start":9.04,"probability":0.5658414363861084},{"end":9.54,"word":" bin","start":9.34,"probability":0.9535486698150635},{"end":13.86,"word":" direktiv","start":9.54,"probability":0.5133590623736382},{"end":21.58,"word":" von","start":13.86,"probability":0.5105449557304382},{"end":21.98,"word":" Karstadt.","start":21.58,"probability":0.9696219563484192}],"tokens":[50364,3141,5171,8009,34511,17863,8192,592,11,1893,5171,20315,592,2957,8009,34511,13,51484],"avg_logprob":-0.526286150958087,"temperature":0,"no_speech_prob":0.29363149404525757,"compression_ratio":1.2659574468085106},{"id":1,"end":25.68,"seek":0,"text":" Ihr könnt klauen was ihr wollt, ich werd niemanden verwarten.","start":22.18,"words":[{"end":22.26,"word":" Ihr","start":22.18,"probability":0.42241692543029785},{"end":22.52,"word":" könnt","start":22.26,"probability":0.9941837191581726},{"end":22.96,"word":" klauen","start":22.52,"probability":0.6671548634767532},{"end":23.22,"word":" was","start":22.96,"probability":0.49221479892730713},{"end":23.4,"word":" ihr","start":23.22,"probability":0.9632819294929504},{"end":23.58,"word":" wollt,","start":23.4,"probability":0.9941409230232239},{"end":24.12,"word":" ich","start":23.9,"probability":0.8581060767173767},{"end":24.38,"word":" werd","start":24.12,"probability":0.7243250012397766},{"end":25,"word":" niemanden","start":24.38,"probability":0.8333843648433685},{"end":25.68,"word":" 
verwarten.","start":25,"probability":0.5471991151571274}],"tokens":[51484,14773,22541,33337,7801,390,5553,45826,11,1893,37258,32390,268,24615,11719,13,51684],"avg_logprob":-0.526286150958087,"temperature":0,"no_speech_prob":0.29363149404525757,"compression_ratio":1.2659574468085106},{"id":2,"end":30.98,"seek":2568,"text":" Alles was ich will ist ein Freund, mit dem ich nie verzocken kann.","start":25.88,"words":[{"end":26.4,"word":" Alles","start":25.88,"probability":0.2442339062690735},{"end":26.86,"word":" was","start":26.4,"probability":0.7699336409568787},{"end":27.1,"word":" ich","start":26.86,"probability":0.986125648021698},{"end":27.38,"word":" will","start":27.1,"probability":0.9883692264556885},{"end":27.68,"word":" ist","start":27.38,"probability":0.8179328441619873},{"end":27.8,"word":" ein","start":27.68,"probability":0.7604857683181763},{"end":28.04,"word":" Freund,","start":27.8,"probability":0.9988574981689453},{"end":29.26,"word":" mit","start":29.02,"probability":0.9271395802497864},{"end":29.68,"word":" dem","start":29.26,"probability":0.9874215126037598},{"end":29.9,"word":" ich","start":29.68,"probability":0.9966415166854858},{"end":30.1,"word":" nie","start":29.9,"probability":0.9423603415489197},{"end":30.76,"word":" verzocken","start":30.1,"probability":0.7202340215444565},{"end":30.98,"word":" kann.","start":30.76,"probability":0.9964371919631958}],"tokens":[50384,27633,390,1893,486,1418,1343,29685,11,2194,1371,1893,2838,43945,1560,268,4028,13,50634],"avg_logprob":-0.3213437625340053,"temperature":0,"no_speech_prob":0.8551008701324463,"compression_ratio":1.467741935483871},{"id":3,"end":35.56,"seek":2568,"text":" Wer mir gestern was selbst schenkt, mit mir saufen geht oder reizenden.","start":31.32,"words":[{"end":31.5,"word":" Wer","start":31.32,"probability":0.42606183886528015},{"end":31.74,"word":" mir","start":31.5,"probability":0.9747925996780396},{"end":32.22,"word":" 
gestern","start":31.74,"probability":0.5727033764123917},{"end":32.46,"word":" was","start":32.22,"probability":0.4778894782066345},{"end":32.66,"word":" selbst","start":32.46,"probability":0.44407567381858826},{"end":33.12,"word":" schenkt,","start":32.66,"probability":0.9743325710296631},{"end":33.36,"word":" mit","start":33.22,"probability":0.8650322556495667},{"end":33.58,"word":" mir","start":33.36,"probability":0.9979019165039062},{"end":34.02,"word":" saufen","start":33.58,"probability":0.828359067440033},{"end":34.34,"word":" geht","start":34.02,"probability":0.7603467702865601},{"end":34.72,"word":" oder","start":34.34,"probability":0.8149535059928894},{"end":35.56,"word":" reizenden.","start":34.72,"probability":0.37491480509440106}],"tokens":[50634,14255,3149,7219,1248,390,13053,956,268,2320,11,2194,3149,601,19890,7095,4513,319,590,8896,13,50884],"avg_logprob":-0.3213437625340053,"temperature":0,"no_speech_prob":0.8551008701324463,"compression_ratio":1.467741935483871},{"id":4,"end":39.16,"seek":2568,"text":" Einfach ein Freund, ein Freund, ein Freund.","start":36.28,"words":[{"end":36.8,"word":" Einfach","start":36.28,"probability":0.9038772881031036},{"end":37.06,"word":" ein","start":36.8,"probability":0.9908062815666199},{"end":37.28,"word":" Freund,","start":37.06,"probability":0.9923811554908752},{"end":38,"word":" ein","start":37.88,"probability":0.9502979516983032},{"end":38.22,"word":" Freund,","start":38,"probability":0.9989782571792603},{"end":38.94,"word":" ein","start":38.7,"probability":0.9811621904373169},{"end":39.16,"word":" Freund.","start":38.94,"probability":0.9972319006919861}],"tokens":[50884,6391,6749,1343,29685,11,1343,29685,11,1343,29685,13,51284],"avg_logprob":-0.3213437625340053,"temperature":0,"no_speech_prob":0.8551008701324463,"compression_ratio":1.467741935483871},{"id":5,"end":42.86,"seek":3916,"text":" Einfach ein Freund, ein Freund, ein Freund.","start":39.68,"words":[{"end":40.48,"word":" 
Einfach","start":39.68,"probability":0.22589362040162086},{"end":40.74,"word":" ein","start":40.48,"probability":0.9541503190994263},{"end":40.92,"word":" Freund,","start":40.74,"probability":0.9495833516120911},{"end":41.74,"word":" ein","start":41.34,"probability":0.9523187279701233},{"end":41.96,"word":" Freund,","start":41.74,"probability":0.9957629442214966},{"end":42.68,"word":" ein","start":42.68,"probability":0.9834578037261963},{"end":42.86,"word":" Freund.","start":42.68,"probability":0.99755859375}],"tokens":[50384,6391,6749,1343,29685,11,1343,29685,11,1343,29685,13,50634],"avg_logprob":-0.24916154146194458,"temperature":0,"no_speech_prob":0.49964576959609985,"compression_ratio":1.4791666666666667},{"id":6,"end":46.38,"seek":3916,"text":" Willst du mein Freund sein?","start":44.54,"words":[{"end":45.34,"word":" Willst","start":44.54,"probability":0.9553532004356384},{"end":45.4,"word":" du","start":45.34,"probability":0.9101346135139465},{"end":45.64,"word":" mein","start":45.4,"probability":0.9919112324714661},{"end":45.84,"word":" Freund","start":45.64,"probability":0.9959916472434998},{"end":46.38,"word":" sein?","start":45.84,"probability":0.9968083500862122}],"tokens":[50634,3099,372,1581,10777,29685,6195,30,51134],"avg_logprob":-0.24916154146194458,"temperature":0,"no_speech_prob":0.49964576959609985,"compression_ratio":1.4791666666666667},{"id":7,"end":50,"seek":4638,"text":" Willst du mein Freund sein?","start":48.32,"words":[{"end":48.98,"word":" Willst","start":48.32,"probability":0.5573687106370926},{"end":49.06,"word":" du","start":48.98,"probability":0.8228787779808044},{"end":49.26,"word":" mein","start":49.06,"probability":0.975639283657074},{"end":49.54,"word":" Freund","start":49.26,"probability":0.9768760800361633},{"end":50,"word":" 
sein?","start":49.54,"probability":0.993625283241272}],"tokens":[50384,3099,372,1581,10777,29685,6195,30,50784],"avg_logprob":-0.14141775214153787,"temperature":0,"no_speech_prob":0.5984991788864136,"compression_ratio":1.2564102564102564},{"id":8,"end":69.96,"seek":4638,"text":" Ich bin Karstadtdetektiv, ich bin direktiv von Karstadt.","start":54.58,"words":[{"end":55.24,"word":" Ich","start":54.58,"probability":0.6615683436393738},{"end":55.7,"word":" bin","start":55.24,"probability":0.9641287922859192},{"end":56.78,"word":" Karstadtdetektiv,","start":55.7,"probability":0.41106623102095907},{"end":57.32,"word":" ich","start":57.1,"probability":0.8154241442680359},{"end":57.52,"word":" bin","start":57.32,"probability":0.9685680270195007},{"end":65.54,"word":" direktiv","start":57.52,"probability":0.5172673519700766},{"end":69.56,"word":" von","start":65.54,"probability":0.18760497868061066},{"end":69.96,"word":" Karstadt.","start":69.56,"probability":0.9855942130088806}],"tokens":[50784,3141,5171,8009,34511,17863,8192,592,11,1893,5171,20315,592,2957,8009,34511,13,51534],"avg_logprob":-0.14141775214153787,"temperature":0,"no_speech_prob":0.5984991788864136,"compression_ratio":1.2564102564102564},{"id":9,"end":73.68,"seek":4638,"text":" Ihr könnt klauen was ihr wollt, ich werd niemanden verwarten.","start":70.16,"words":[{"end":70.26,"word":" Ihr","start":70.16,"probability":0.8053303956985474},{"end":70.5,"word":" könnt","start":70.26,"probability":0.9969340562820435},{"end":70.94,"word":" klauen","start":70.5,"probability":0.6660601794719696},{"end":71.22,"word":" was","start":70.94,"probability":0.4267093539237976},{"end":71.38,"word":" ihr","start":71.22,"probability":0.9836962223052979},{"end":71.58,"word":" wollt,","start":71.38,"probability":0.9930354952812195},{"end":72.16,"word":" ich","start":71.92,"probability":0.9728512167930603},{"end":72.38,"word":" werd","start":72.16,"probability":0.7974334359169006},{"end":72.96,"word":" 
niemanden","start":72.38,"probability":0.6999421715736389},{"end":73.68,"word":" verwarten.","start":72.96,"probability":0.5133467316627502}],"tokens":[51534,14773,22541,33337,7801,390,5553,45826,11,1893,37258,32390,268,24615,11719,13,51734],"avg_logprob":-0.14141775214153787,"temperature":0,"no_speech_prob":0.5984991788864136,"compression_ratio":1.2564102564102564},{"id":10,"end":77.92,"seek":7368,"text":" Alles was ich will ist ein Freund, ein Freund, ein Freund.","start":73.82,"words":[{"end":74.42,"word":" Alles","start":73.82,"probability":0.6954259872436523},{"end":74.9,"word":" was","start":74.42,"probability":0.6987268328666687},{"end":75.12,"word":" ich","start":74.9,"probability":0.9345102906227112},{"end":75.38,"word":" will","start":75.12,"probability":0.9823461174964905},{"end":75.7,"word":" ist","start":75.38,"probability":0.7122949361801147},{"end":75.9,"word":" ein","start":75.7,"probability":0.9797828793525696},{"end":76.06,"word":" Freund,","start":75.9,"probability":0.9114095568656921},{"end":76.7,"word":" ein","start":76.44,"probability":0.9296829104423523},{"end":77.02,"word":" Freund,","start":76.7,"probability":0.9966609477996826},{"end":77.64,"word":" ein","start":77.46,"probability":0.9915374517440796},{"end":77.92,"word":" Freund.","start":77.64,"probability":0.993537962436676}],"tokens":[50384,27633,390,1893,486,1418,1343,29685,11,1343,29685,11,1343,29685,13,50634],"avg_logprob":-0.08094528142143698,"temperature":0,"no_speech_prob":0.19116559624671936,"compression_ratio":1.288888888888889}],"transcription":" Ich bin Karstadtdetektiv, ich bin direktiv von Karstadt. Ihr könnt klauen was ihr wollt, ich werd niemanden verwarten. Alles was ich will ist ein Freund, mit dem ich nie verzocken kann. Wer mir gestern was selbst schenkt, mit mir saufen geht oder reizenden. Einfach ein Freund, ein Freund, ein Freund. Einfach ein Freund, ein Freund, ein Freund. Willst du mein Freund sein? Willst du mein Freund sein? 
Ich bin Karstadtdetektiv, ich bin direktiv von Karstadt. Ihr könnt klauen was ihr wollt, ich werd niemanden verwarten. Alles was ich will ist ein Freund, ein Freund, ein Freund.","detected_language":"german"}
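The output nests word-level timings inside each segment. A minimal sketch of flattening that structure into one list of words, optionally dropping low-confidence ones, could look like the following. The helper name `flatten_words` and the `min_probability` cutoff are illustrative assumptions and not part of the model; the sample is a shortened copy of the first two words of the example output.

```python
import json


def flatten_words(result, min_probability=0.0):
    """Collect (start, end, word, probability) tuples from every segment.

    `result` is the parsed JSON output; `min_probability` is a hypothetical
    filter, not a model parameter.
    """
    rows = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            if w["probability"] >= min_probability:
                # Words carry a leading space in the raw output; strip it.
                rows.append((w["start"], w["end"], w["word"].strip(), w["probability"]))
    return rows


# Shortened sample taken from the example output (first two words only).
sample = json.loads(
    '{"segments":[{"id":0,"words":['
    '{"end":7.58,"word":" Ich","start":7.02,"probability":0.2817317247390747},'
    '{"end":7.78,"word":" bin","start":7.58,"probability":0.9665520787239075}'
    ']}],"detected_language":"german"}'
)

for start, end, word, prob in flatten_words(sample, min_probability=0.5):
    print(f"{start:6.2f} -> {end:6.2f}  {word}  (p={prob:.2f})")
```

Filtering on `probability` is one way to hide words the model was unsure about, such as the opening " Ich" at roughly 0.28 in the example.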

Performance Metrics

12.34s Prediction Time
12.44s Total Time
All Input Parameters
{
  "audio": "https://replicate.delivery/pbxt/IfYtYMI5B23lFkUoI7zDtehuLw2NzKCoJpmJQvSVGD5l3gfY/vocals.mp3",
  "model": "large-v1",
  "initial_prompt": "Karstadtdetektiv",
  "suppress_tokens": "-1",
  "word_timestamps": true,
  "logprob_threshold": -1,
  "append_punctuations": "\"'.。,,!!??::”)]}、",
  "no_speech_threshold": 0.6,
  "prepend_punctuations": "\"'“¿([{-",
  "condition_on_previous_text": true,
  "compression_ratio_threshold": 2,
  "temperature_increment_on_fallback": 0.2
}
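As a rough sketch, the same prediction could be requested from Python. This assumes the third-party `replicate` client is installed and a `REPLICATE_API_TOKEN` environment variable is set; the helper names `build_input` and `transcribe` are hypothetical. The version ID is the one listed under Version Details.

```python
# Model reference in "owner/name:version" form, using the version ID from this page.
MODEL_VERSION = (
    "hnesk/whisper-wordtimestamps:"
    "4a60104c44dd709fc08a03dfeca6c6906257633dd03fd58663ec896a4eeba30e"
)


def build_input(audio_url, **overrides):
    """Build the input payload; unspecified parameters fall back to model defaults."""
    payload = {
        "audio": audio_url,
        "model": "large-v1",
        "word_timestamps": True,
    }
    payload.update(overrides)
    return payload


def transcribe(audio_url, **overrides):
    """Run the prediction (assumes the `replicate` package and an API token)."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_VERSION, input=build_input(audio_url, **overrides))


if __name__ == "__main__":
    # Only prints the payload; call transcribe(...) to actually run the model.
    print(build_input("https://example.com/vocals.mp3", initial_prompt="Karstadtdetektiv"))
```

Passing `initial_prompt` as in the example above is how the run nudged the model toward the unusual word "Karstadtdetektiv".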
Input Parameters
audio (required) Type: string
Audio file
model Default: base
Choose a Whisper model.
language
language spoken in the audio, specify None to perform language detection
patience Type: number
optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424, the default (1.0) is equivalent to conventional beam search
temperature Type: number, Default: 0
temperature to use for sampling
initial_prompt Type: string
optional text to provide as a prompt for the first window.
suppress_tokens Type: string, Default: -1
comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuations
word_timestamps Type: boolean, Default: false
Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.
logprob_threshold Type: number, Default: -1
if the average log probability is lower than this value, treat the decoding as failed
append_punctuations Type: string, Default: "'.。,,!!??::”)]}、
If word_timestamps is True, merge these punctuation symbols with the previous word
no_speech_threshold Type: number, Default: 0.6
if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence
prepend_punctuations Type: string, Default: "'“¿([{-
If word_timestamps is True, merge these punctuation symbols with the next word
condition_on_previous_text Type: boolean, Default: true
if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop
compression_ratio_threshold Type: number, Default: 2.4
if the gzip compression ratio is higher than this value, treat the decoding as failed
temperature_increment_on_fallback Type: number, Default: 0.2
temperature to increase when falling back when the decoding fails to meet either of the thresholds (compression_ratio_threshold or logprob_threshold)
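The threshold parameters above interact: a decoding counts as failed when its text is too repetitive or its average log probability is too low, and the temperature is then raised in steps. The sketch below illustrates that logic, assuming this model behaves like upstream openai/whisper. The `decode` callable and all helper names are hypothetical stand-ins; only the parameter names and defaults come from this page.

```python
import zlib


def compression_ratio(text):
    """Ratio of raw to zlib-compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def fallback_temperatures(temperature=0.0, increment=0.2):
    """Temperatures to try in order, stepping by `increment` up to 1.0."""
    if not increment:
        return [temperature]
    steps = int((1.0 - temperature) / increment + 1e-6)
    return [round(temperature + i * increment, 6) for i in range(steps + 1)]


def needs_fallback(text, avg_logprob,
                   compression_ratio_threshold=2.4, logprob_threshold=-1.0):
    """True if the decoding should be retried at a higher temperature."""
    if compression_ratio(text) > compression_ratio_threshold:
        return True  # text is too repetitive
    return avg_logprob < logprob_threshold  # model was too unsure


def is_silence(no_speech_prob, avg_logprob,
               no_speech_threshold=0.6, logprob_threshold=-1.0):
    """A segment counts as silence only if BOTH conditions hold."""
    return no_speech_prob > no_speech_threshold and avg_logprob < logprob_threshold


def decode_with_fallback(decode, temperature=0.0, increment=0.2):
    """Call the hypothetical decode(t) -> (text, avg_logprob) until it passes."""
    for t in fallback_temperatures(temperature, increment):
        text, avg_logprob = decode(t)
        if not needs_fallback(text, avg_logprob):
            break
    return t, text, avg_logprob


def fake_decode(t):
    """Stand-in decoder: loops at low temperature, recovers at 0.4 and above."""
    if t < 0.4:
        return "ein Freund, " * 50, -0.3
    return "Willst du mein Freund sein?", -0.2


print(decode_with_fallback(fake_decode))
```

This also explains why segment 2 in the example output was kept: its `no_speech_prob` is about 0.855, above the 0.6 threshold, but its `avg_logprob` of about -0.32 is above `logprob_threshold`, so the silence rule does not apply.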
Example Execution Logs
Transcribe with large-v1 model
Version Details
Version ID
4a60104c44dd709fc08a03dfeca6c6906257633dd03fd58663ec896a4eeba30e
Version Created
April 17, 2023