hnesk/whisper-wordtimestamps
About
openai/whisper with exposed settings for word_timestamps
Example Output
Output
{"segments":[{"id":0,"end":21.98,"seek":0,"text":" Ich bin Karstadtdetektiv, ich bin direktiv von Karstadt.","start":7.02,"words":[{"end":7.58,"word":" Ich","start":7.02,"probability":0.2817317247390747},{"end":7.78,"word":" bin","start":7.58,"probability":0.9665520787239075},{"end":8.8,"word":" Karstadtdetektiv,","start":7.78,"probability":0.40456129983285793},{"end":9.34,"word":" ich","start":9.04,"probability":0.5658414363861084},{"end":9.54,"word":" bin","start":9.34,"probability":0.9535486698150635},{"end":13.86,"word":" direktiv","start":9.54,"probability":0.5133590623736382},{"end":21.58,"word":" von","start":13.86,"probability":0.5105449557304382},{"end":21.98,"word":" Karstadt.","start":21.58,"probability":0.9696219563484192}],"tokens":[50364,3141,5171,8009,34511,17863,8192,592,11,1893,5171,20315,592,2957,8009,34511,13,51484],"avg_logprob":-0.526286150958087,"temperature":0,"no_speech_prob":0.29363149404525757,"compression_ratio":1.2659574468085106},{"id":1,"end":25.68,"seek":0,"text":" Ihr könnt klauen was ihr wollt, ich werd niemanden verwarten.","start":22.18,"words":[{"end":22.26,"word":" Ihr","start":22.18,"probability":0.42241692543029785},{"end":22.52,"word":" könnt","start":22.26,"probability":0.9941837191581726},{"end":22.96,"word":" klauen","start":22.52,"probability":0.6671548634767532},{"end":23.22,"word":" was","start":22.96,"probability":0.49221479892730713},{"end":23.4,"word":" ihr","start":23.22,"probability":0.9632819294929504},{"end":23.58,"word":" wollt,","start":23.4,"probability":0.9941409230232239},{"end":24.12,"word":" ich","start":23.9,"probability":0.8581060767173767},{"end":24.38,"word":" werd","start":24.12,"probability":0.7243250012397766},{"end":25,"word":" niemanden","start":24.38,"probability":0.8333843648433685},{"end":25.68,"word":" 
verwarten.","start":25,"probability":0.5471991151571274}],"tokens":[51484,14773,22541,33337,7801,390,5553,45826,11,1893,37258,32390,268,24615,11719,13,51684],"avg_logprob":-0.526286150958087,"temperature":0,"no_speech_prob":0.29363149404525757,"compression_ratio":1.2659574468085106},{"id":2,"end":30.98,"seek":2568,"text":" Alles was ich will ist ein Freund, mit dem ich nie verzocken kann.","start":25.88,"words":[{"end":26.4,"word":" Alles","start":25.88,"probability":0.2442339062690735},{"end":26.86,"word":" was","start":26.4,"probability":0.7699336409568787},{"end":27.1,"word":" ich","start":26.86,"probability":0.986125648021698},{"end":27.38,"word":" will","start":27.1,"probability":0.9883692264556885},{"end":27.68,"word":" ist","start":27.38,"probability":0.8179328441619873},{"end":27.8,"word":" ein","start":27.68,"probability":0.7604857683181763},{"end":28.04,"word":" Freund,","start":27.8,"probability":0.9988574981689453},{"end":29.26,"word":" mit","start":29.02,"probability":0.9271395802497864},{"end":29.68,"word":" dem","start":29.26,"probability":0.9874215126037598},{"end":29.9,"word":" ich","start":29.68,"probability":0.9966415166854858},{"end":30.1,"word":" nie","start":29.9,"probability":0.9423603415489197},{"end":30.76,"word":" verzocken","start":30.1,"probability":0.7202340215444565},{"end":30.98,"word":" kann.","start":30.76,"probability":0.9964371919631958}],"tokens":[50384,27633,390,1893,486,1418,1343,29685,11,2194,1371,1893,2838,43945,1560,268,4028,13,50634],"avg_logprob":-0.3213437625340053,"temperature":0,"no_speech_prob":0.8551008701324463,"compression_ratio":1.467741935483871},{"id":3,"end":35.56,"seek":2568,"text":" Wer mir gestern was selbst schenkt, mit mir saufen geht oder reizenden.","start":31.32,"words":[{"end":31.5,"word":" Wer","start":31.32,"probability":0.42606183886528015},{"end":31.74,"word":" mir","start":31.5,"probability":0.9747925996780396},{"end":32.22,"word":" 
gestern","start":31.74,"probability":0.5727033764123917},{"end":32.46,"word":" was","start":32.22,"probability":0.4778894782066345},{"end":32.66,"word":" selbst","start":32.46,"probability":0.44407567381858826},{"end":33.12,"word":" schenkt,","start":32.66,"probability":0.9743325710296631},{"end":33.36,"word":" mit","start":33.22,"probability":0.8650322556495667},{"end":33.58,"word":" mir","start":33.36,"probability":0.9979019165039062},{"end":34.02,"word":" saufen","start":33.58,"probability":0.828359067440033},{"end":34.34,"word":" geht","start":34.02,"probability":0.7603467702865601},{"end":34.72,"word":" oder","start":34.34,"probability":0.8149535059928894},{"end":35.56,"word":" reizenden.","start":34.72,"probability":0.37491480509440106}],"tokens":[50634,14255,3149,7219,1248,390,13053,956,268,2320,11,2194,3149,601,19890,7095,4513,319,590,8896,13,50884],"avg_logprob":-0.3213437625340053,"temperature":0,"no_speech_prob":0.8551008701324463,"compression_ratio":1.467741935483871},{"id":4,"end":39.16,"seek":2568,"text":" Einfach ein Freund, ein Freund, ein Freund.","start":36.28,"words":[{"end":36.8,"word":" Einfach","start":36.28,"probability":0.9038772881031036},{"end":37.06,"word":" ein","start":36.8,"probability":0.9908062815666199},{"end":37.28,"word":" Freund,","start":37.06,"probability":0.9923811554908752},{"end":38,"word":" ein","start":37.88,"probability":0.9502979516983032},{"end":38.22,"word":" Freund,","start":38,"probability":0.9989782571792603},{"end":38.94,"word":" ein","start":38.7,"probability":0.9811621904373169},{"end":39.16,"word":" Freund.","start":38.94,"probability":0.9972319006919861}],"tokens":[50884,6391,6749,1343,29685,11,1343,29685,11,1343,29685,13,51284],"avg_logprob":-0.3213437625340053,"temperature":0,"no_speech_prob":0.8551008701324463,"compression_ratio":1.467741935483871},{"id":5,"end":42.86,"seek":3916,"text":" Einfach ein Freund, ein Freund, ein Freund.","start":39.68,"words":[{"end":40.48,"word":" 
Einfach","start":39.68,"probability":0.22589362040162086},{"end":40.74,"word":" ein","start":40.48,"probability":0.9541503190994263},{"end":40.92,"word":" Freund,","start":40.74,"probability":0.9495833516120911},{"end":41.74,"word":" ein","start":41.34,"probability":0.9523187279701233},{"end":41.96,"word":" Freund,","start":41.74,"probability":0.9957629442214966},{"end":42.68,"word":" ein","start":42.68,"probability":0.9834578037261963},{"end":42.86,"word":" Freund.","start":42.68,"probability":0.99755859375}],"tokens":[50384,6391,6749,1343,29685,11,1343,29685,11,1343,29685,13,50634],"avg_logprob":-0.24916154146194458,"temperature":0,"no_speech_prob":0.49964576959609985,"compression_ratio":1.4791666666666667},{"id":6,"end":46.38,"seek":3916,"text":" Willst du mein Freund sein?","start":44.54,"words":[{"end":45.34,"word":" Willst","start":44.54,"probability":0.9553532004356384},{"end":45.4,"word":" du","start":45.34,"probability":0.9101346135139465},{"end":45.64,"word":" mein","start":45.4,"probability":0.9919112324714661},{"end":45.84,"word":" Freund","start":45.64,"probability":0.9959916472434998},{"end":46.38,"word":" sein?","start":45.84,"probability":0.9968083500862122}],"tokens":[50634,3099,372,1581,10777,29685,6195,30,51134],"avg_logprob":-0.24916154146194458,"temperature":0,"no_speech_prob":0.49964576959609985,"compression_ratio":1.4791666666666667},{"id":7,"end":50,"seek":4638,"text":" Willst du mein Freund sein?","start":48.32,"words":[{"end":48.98,"word":" Willst","start":48.32,"probability":0.5573687106370926},{"end":49.06,"word":" du","start":48.98,"probability":0.8228787779808044},{"end":49.26,"word":" mein","start":49.06,"probability":0.975639283657074},{"end":49.54,"word":" Freund","start":49.26,"probability":0.9768760800361633},{"end":50,"word":" 
sein?","start":49.54,"probability":0.993625283241272}],"tokens":[50384,3099,372,1581,10777,29685,6195,30,50784],"avg_logprob":-0.14141775214153787,"temperature":0,"no_speech_prob":0.5984991788864136,"compression_ratio":1.2564102564102564},{"id":8,"end":69.96,"seek":4638,"text":" Ich bin Karstadtdetektiv, ich bin direktiv von Karstadt.","start":54.58,"words":[{"end":55.24,"word":" Ich","start":54.58,"probability":0.6615683436393738},{"end":55.7,"word":" bin","start":55.24,"probability":0.9641287922859192},{"end":56.78,"word":" Karstadtdetektiv,","start":55.7,"probability":0.41106623102095907},{"end":57.32,"word":" ich","start":57.1,"probability":0.8154241442680359},{"end":57.52,"word":" bin","start":57.32,"probability":0.9685680270195007},{"end":65.54,"word":" direktiv","start":57.52,"probability":0.5172673519700766},{"end":69.56,"word":" von","start":65.54,"probability":0.18760497868061066},{"end":69.96,"word":" Karstadt.","start":69.56,"probability":0.9855942130088806}],"tokens":[50784,3141,5171,8009,34511,17863,8192,592,11,1893,5171,20315,592,2957,8009,34511,13,51534],"avg_logprob":-0.14141775214153787,"temperature":0,"no_speech_prob":0.5984991788864136,"compression_ratio":1.2564102564102564},{"id":9,"end":73.68,"seek":4638,"text":" Ihr könnt klauen was ihr wollt, ich werd niemanden verwarten.","start":70.16,"words":[{"end":70.26,"word":" Ihr","start":70.16,"probability":0.8053303956985474},{"end":70.5,"word":" könnt","start":70.26,"probability":0.9969340562820435},{"end":70.94,"word":" klauen","start":70.5,"probability":0.6660601794719696},{"end":71.22,"word":" was","start":70.94,"probability":0.4267093539237976},{"end":71.38,"word":" ihr","start":71.22,"probability":0.9836962223052979},{"end":71.58,"word":" wollt,","start":71.38,"probability":0.9930354952812195},{"end":72.16,"word":" ich","start":71.92,"probability":0.9728512167930603},{"end":72.38,"word":" werd","start":72.16,"probability":0.7974334359169006},{"end":72.96,"word":" 
niemanden","start":72.38,"probability":0.6999421715736389},{"end":73.68,"word":" verwarten.","start":72.96,"probability":0.5133467316627502}],"tokens":[51534,14773,22541,33337,7801,390,5553,45826,11,1893,37258,32390,268,24615,11719,13,51734],"avg_logprob":-0.14141775214153787,"temperature":0,"no_speech_prob":0.5984991788864136,"compression_ratio":1.2564102564102564},{"id":10,"end":77.92,"seek":7368,"text":" Alles was ich will ist ein Freund, ein Freund, ein Freund.","start":73.82,"words":[{"end":74.42,"word":" Alles","start":73.82,"probability":0.6954259872436523},{"end":74.9,"word":" was","start":74.42,"probability":0.6987268328666687},{"end":75.12,"word":" ich","start":74.9,"probability":0.9345102906227112},{"end":75.38,"word":" will","start":75.12,"probability":0.9823461174964905},{"end":75.7,"word":" ist","start":75.38,"probability":0.7122949361801147},{"end":75.9,"word":" ein","start":75.7,"probability":0.9797828793525696},{"end":76.06,"word":" Freund,","start":75.9,"probability":0.9114095568656921},{"end":76.7,"word":" ein","start":76.44,"probability":0.9296829104423523},{"end":77.02,"word":" Freund,","start":76.7,"probability":0.9966609477996826},{"end":77.64,"word":" ein","start":77.46,"probability":0.9915374517440796},{"end":77.92,"word":" Freund.","start":77.64,"probability":0.993537962436676}],"tokens":[50384,27633,390,1893,486,1418,1343,29685,11,1343,29685,11,1343,29685,13,50634],"avg_logprob":-0.08094528142143698,"temperature":0,"no_speech_prob":0.19116559624671936,"compression_ratio":1.288888888888889}],"transcription":" Ich bin Karstadtdetektiv, ich bin direktiv von Karstadt. Ihr könnt klauen was ihr wollt, ich werd niemanden verwarten. Alles was ich will ist ein Freund, mit dem ich nie verzocken kann. Wer mir gestern was selbst schenkt, mit mir saufen geht oder reizenden. Einfach ein Freund, ein Freund, ein Freund. Einfach ein Freund, ein Freund, ein Freund. Willst du mein Freund sein? Willst du mein Freund sein? 
Ich bin Karstadtdetektiv, ich bin direktiv von Karstadt. Ihr könnt klauen was ihr wollt, ich werd niemanden verwarten. Alles was ich will ist ein Freund, ein Freund, ein Freund.","detected_language":"german"}
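The word-level structure above can be walked with a few lines of Python. The following is a minimal sketch, assuming the output has already been obtained as JSON text; the `sample` below is a trimmed copy of the first three words of segment 0, and the helper names `iter_words` and `low_confidence_words` as well as the 0.5 cutoff are illustrative choices, not part of the model.

```python
import json

# Trimmed sample in the same shape as the example output (segment 0, first three words).
sample = json.loads("""
{"segments": [{"id": 0, "start": 7.02, "end": 21.98,
  "words": [
    {"word": " Ich", "start": 7.02, "end": 7.58, "probability": 0.2817317247390747},
    {"word": " bin", "start": 7.58, "end": 7.78, "probability": 0.9665520787239075},
    {"word": " Karstadtdetektiv,", "start": 7.78, "end": 8.8, "probability": 0.40456129983285793}
  ]}],
 "detected_language": "german"}
""")


def iter_words(output):
    """Yield (segment_id, word, start, end, probability) for every word in every segment."""
    for segment in output["segments"]:
        # "words" is only expected when word_timestamps was enabled, so default to empty.
        for w in segment.get("words", []):
            # Words carry a leading space in the output; strip it for display.
            yield segment["id"], w["word"].strip(), w["start"], w["end"], w["probability"]


def low_confidence_words(output, threshold=0.5):
    """Return (word, start, end) for words whose probability is below the threshold."""
    return [
        (word, start, end)
        for _, word, start, end, prob in iter_words(output)
        if prob < threshold
    ]


if __name__ == "__main__":
    for word, start, end in low_confidence_words(sample):
        print(f"{start:7.2f}s - {end:7.2f}s  {word}")
```

Flagging low-probability words this way is one possible use of the `probability` field, for example to mark spans that deserve a manual check.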
Performance Metrics
- Prediction Time: 12.34s
- Total Time: 12.44s
All Input Parameters
{ "audio": "https://replicate.delivery/pbxt/IfYtYMI5B23lFkUoI7zDtehuLw2NzKCoJpmJQvSVGD5l3gfY/vocals.mp3", "model": "large-v1", "initial_prompt": "Karstadtdetektiv", "suppress_tokens": "-1", "word_timestamps": true, "logprob_threshold": -1, "append_punctuations": "\"'.。,,!!??::”)]}、", "no_speech_threshold": 0.6, "prepend_punctuations": "\"'“¿([{-", "condition_on_previous_text": true, "compression_ratio_threshold": 2, "temperature_increment_on_fallback": 0.2 }
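A run with parameters like these could be started from Python roughly as follows. This is a sketch under assumptions: it relies on the third-party `replicate` client package and a `REPLICATE_API_TOKEN` set in the environment, and the helper names `build_input` and `transcribe` are hypothetical. The settings are copied from the example run above and are not claimed to be the model's defaults; the punctuation parameters are left out for brevity.

```python
# Model reference built from the owner/name and the version ID listed on this page.
MODEL_REF = (
    "hnesk/whisper-wordtimestamps:"
    "4a60104c44dd709fc08a03dfeca6c6906257633dd03fd58663ec896a4eeba30e"
)

# Parameter names documented on this page; anything else is rejected early.
KNOWN_PARAMS = {
    "audio", "model", "language", "patience", "temperature", "initial_prompt",
    "suppress_tokens", "word_timestamps", "logprob_threshold",
    "append_punctuations", "no_speech_threshold", "prepend_punctuations",
    "condition_on_previous_text", "compression_ratio_threshold",
    "temperature_increment_on_fallback",
}

# Values taken from the example run (not necessarily the model's defaults).
EXAMPLE_SETTINGS = {
    "model": "large-v1",
    "suppress_tokens": "-1",
    "word_timestamps": True,
    "logprob_threshold": -1,
    "no_speech_threshold": 0.6,
    "condition_on_previous_text": True,
    "compression_ratio_threshold": 2,
    "temperature_increment_on_fallback": 0.2,
}


def build_input(audio, **overrides):
    """Merge the example settings with per-call overrides and validate the names."""
    unknown = set(overrides) - KNOWN_PARAMS
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    return {"audio": audio, **EXAMPLE_SETTINGS, **overrides}


def transcribe(audio, **overrides):
    """Run the model on Replicate; needs network access and an API token."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(MODEL_REF, input=build_input(audio, **overrides))
```

Calling `transcribe("https://example.com/song.mp3", initial_prompt="Karstadtdetektiv")` with a reachable audio URL would then be expected to return output in the shape shown under Example Output.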
Input Parameters
- audio (required): Audio file.
- model: Choose a Whisper model.
- language: Language spoken in the audio; specify None to perform language detection.
- patience: Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search.
- temperature: Temperature to use for sampling.
- initial_prompt: Optional text to provide as a prompt for the first window.
- suppress_tokens: Comma-separated list of token IDs to suppress during sampling; '-1' suppresses most special characters except common punctuation.
- word_timestamps: Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.
- logprob_threshold: If the average log probability is lower than this value, treat the decoding as failed.
- append_punctuations: If word_timestamps is True, merge these punctuation symbols with the previous word.
- no_speech_threshold: If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence.
- prepend_punctuations: If word_timestamps is True, merge these punctuation symbols with the next word.
- condition_on_previous_text: If True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but makes the model less prone to getting stuck in a failure loop.
- compression_ratio_threshold: If the gzip compression ratio is higher than this value, treat the decoding as failed.
- temperature_increment_on_fallback: Amount by which to increase the temperature when falling back because the decoding fails to meet either compression_ratio_threshold or logprob_threshold.
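How the three thresholds and the temperature increment interact can be sketched as below. This illustrates the parameter descriptions only and is not the model's actual code. The function names are hypothetical, `zlib` is used as a stand-in for the "gzip compression ratio", and the upper temperature bound of 1.0 is an assumption.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw to compressed size; highly repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def temperature_schedule(start: float, increment: float, maximum: float = 1.0) -> list:
    """Temperatures to try in order; `maximum` is an assumed upper bound."""
    if increment <= 0 or start > maximum:
        return [start]  # no fallback steps possible
    steps = int((maximum - start) / increment + 1e-6) + 1
    return [round(start + i * increment, 6) for i in range(steps)]


def decoding_failed(avg_logprob, ratio, logprob_threshold, compression_ratio_threshold):
    """A decode fails if it is too repetitive or too unlikely."""
    return ratio > compression_ratio_threshold or avg_logprob < logprob_threshold


def is_silence(no_speech_prob, avg_logprob, no_speech_threshold, logprob_threshold):
    """Silence needs BOTH a high no-speech probability and a failed logprob check."""
    return no_speech_prob > no_speech_threshold and avg_logprob < logprob_threshold


if __name__ == "__main__":
    # Segment 2 of the example output, with the thresholds from the example run.
    print(decoding_failed(-0.3213437625340053, 1.467741935483871, -1, 2))
    print(is_silence(0.8551008701324463, -0.3213437625340053, 0.6, -1))
    print(temperature_schedule(0, 0.2))
```

With the values from the example run, segment 2 has a no_speech_prob of about 0.86, which is above the 0.6 threshold, but its avg_logprob of about -0.32 is above the logprob_threshold of -1. Under the rule as described, the segment is therefore kept rather than treated as silence, which matches its presence in the example output.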
Output Schema
Output
Example Execution Logs
Transcribe with large-v1 model
Version Details
- Version ID: 4a60104c44dd709fc08a03dfeca6c6906257633dd03fd58663ec896a4eeba30e
- Version Created: April 17, 2023