collectiveai-team/whisper-wordtimestamps

1.3K runs · Sep 2023 · Cog 0.8.3
language-detection speech-to-text word-level-timestamps

About

API for enhanced word-level timestamp accuracy using OpenAI's Whisper model

Example Output

{"segments":[{"id":0,"end":6.2,"seek":0,"text":" What are some cool synthetic organisms that you think about you dream about when you think about embodied mind","start":0,"tokens":[50364,708,366,512,1627,23420,22110,300,291,519,466,291,3055,466,562,291,519,466,42046,1575,50674],"avg_logprob":-0.2868123466585889,"temperature":0,"no_speech_prob":0.0683993548154831,"compression_ratio":1.7454545454545454},{"id":1,"end":13.48,"seek":0,"text":" What do you imagine what do you hope to build yeah on a practical level what I really hope to do is","start":6.72,"tokens":[50700,708,360,291,3811,437,360,291,1454,281,1322,1338,322,257,8496,1496,437,286,534,1454,281,360,307,51038],"avg_logprob":-0.2868123466585889,"temperature":0,"no_speech_prob":0.0683993548154831,"compression_ratio":1.7454545454545454},{"id":2,"end":18,"seek":0,"text":" To gain enough of an understanding of the embodied intelligence of","start":14.040000000000001,"tokens":[51066,1407,6052,1547,295,364,3701,295,264,42046,7599,295,51264],"avg_logprob":-0.2868123466585889,"temperature":0,"no_speech_prob":0.0683993548154831,"compression_ratio":1.7454545454545454},{"id":3,"end":25.76,"seek":0,"text":" organs and tissues such that we can achieve a radically different regenerative medicine so that we can say","start":19.080000000000002,"tokens":[51318,20659,293,27353,1270,300,321,393,4584,257,35508,819,26358,1166,7195,370,300,321,393,584,51652],"avg_logprob":-0.2868123466585889,"temperature":0,"no_speech_prob":0.0683993548154831,"compression_ratio":1.7454545454545454},{"id":4,"end":32,"seek":2576,"text":" Basically and I think about it as um, you know in terms of like okay, can you what's the what's 
the?","start":26.48,"tokens":[50400,8537,293,286,519,466,309,382,1105,11,291,458,294,2115,295,411,1392,11,393,291,437,311,264,437,311,264,30,50676],"avg_logprob":-0.26734968226590605,"temperature":0,"no_speech_prob":0.0923895612359047,"compression_ratio":1.8982456140350876},{"id":5,"end":40.400000000000006,"seek":2576,"text":" What's the goal kind of and end and end game for this whole thing to me the end game is something that you would call an anatomical compiler","start":33.04,"tokens":[50728,708,311,264,3387,733,295,293,917,293,917,1216,337,341,1379,551,281,385,264,917,1216,307,746,300,291,576,818,364,21618,298,804,31958,51096],"avg_logprob":-0.26734968226590605,"temperature":0,"no_speech_prob":0.0923895612359047,"compression_ratio":1.8982456140350876},{"id":6,"end":44.8,"seek":2576,"text":" So the idea is you would sit down in front of the computer and you would draw the","start":40.52,"tokens":[51102,407,264,1558,307,291,576,1394,760,294,1868,295,264,3820,293,291,576,2642,264,51316],"avg_logprob":-0.26734968226590605,"temperature":0,"no_speech_prob":0.0923895612359047,"compression_ratio":1.8982456140350876},{"id":7,"end":50.32000000000001,"seek":2576,"text":" The body or the organ that you wanted not not molecular details, but like you this is what I want","start":45.160000000000004,"tokens":[51334,440,1772,420,264,1798,300,291,1415,406,406,19046,4365,11,457,411,291,341,307,437,286,528,51592],"avg_logprob":-0.26734968226590605,"temperature":0,"no_speech_prob":0.0923895612359047,"compression_ratio":1.8982456140350876},{"id":8,"end":55.72,"seek":2576,"text":" I want a six-legged you know frog with a propeller on top or I want I want a heart that looks like this or I want a 
leg","start":50.32000000000001,"tokens":[51592,286,528,257,2309,12,306,12244,291,458,17259,365,257,25577,4658,322,1192,420,286,528,286,528,257,1917,300,1542,411,341,420,286,528,257,1676,51862],"avg_logprob":-0.26734968226590605,"temperature":0,"no_speech_prob":0.0923895612359047,"compression_ratio":1.8982456140350876},{"id":9,"end":59.08,"seek":5576,"text":" That looks like this and what it would do if we knew what we were doing is","start":55.76,"tokens":[50364,663,1542,411,341,293,437,309,576,360,498,321,2586,437,321,645,884,307,50530],"avg_logprob":-0.18501592839805825,"temperature":0,"no_speech_prob":0.0006984978681430221,"compression_ratio":1.6884615384615385},{"id":10,"end":68.36,"seek":5576,"text":" Put out a convert that anatomical description into a set of stimuli that would have to be given to cells to convince them to build exactly that thing","start":59.72,"tokens":[50562,4935,484,257,7620,300,21618,298,804,3855,666,257,992,295,47752,300,576,362,281,312,2212,281,5438,281,13447,552,281,1322,2293,300,551,50994],"avg_logprob":-0.18501592839805825,"temperature":0,"no_speech_prob":0.0006984978681430221,"compression_ratio":1.6884615384615385},{"id":11,"end":73.24,"seek":5576,"text":" Right, I probably won't live to see it, but I think it's achievable and I think what that if","start":68.6,"tokens":[51006,1779,11,286,1391,1582,380,1621,281,536,309,11,457,286,519,309,311,3538,17915,293,286,519,437,300,498,51238],"avg_logprob":-0.18501592839805825,"temperature":0,"no_speech_prob":0.0006984978681430221,"compression_ratio":1.6884615384615385},{"id":12,"end":82.6,"seek":5576,"text":" If we can have that then that is basically the solution to all of medicine except for infectious disease so birth 
defects","start":73.64,"tokens":[51258,759,321,393,362,300,550,300,307,1936,264,3827,281,439,295,7195,3993,337,26780,4752,370,3965,32655,51706],"avg_logprob":-0.18501592839805825,"temperature":0,"no_speech_prob":0.0006984978681430221,"compression_ratio":1.6884615384615385},{"id":13,"end":88.75999999999999,"seek":8260,"text":" Right traumatic injury cancer aging degenerative disease if we knew how to tell cells what to build all of those things go away","start":82.6,"tokens":[50364,1779,26456,10454,5592,19090,40520,1166,4752,498,321,2586,577,281,980,5438,437,281,1322,439,295,729,721,352,1314,50672],"avg_logprob":-0.2850281584496592,"temperature":0,"no_speech_prob":0.013411387801170349,"compression_ratio":1.7018867924528303},{"id":14,"end":91.16,"seek":8260,"text":" so those things go away and the","start":88.83999999999999,"tokens":[50676,370,729,721,352,1314,293,264,50792],"avg_logprob":-0.2850281584496592,"temperature":0,"no_speech_prob":0.013411387801170349,"compression_ratio":1.7018867924528303},{"id":15,"end":97.19999999999999,"seek":8260,"text":" positive feedback spiral of economic costs where all of the advances are","start":92.32,"tokens":[50850,3353,5824,25165,295,4836,5497,689,439,295,264,25297,366,51094],"avg_logprob":-0.2850281584496592,"temperature":0,"no_speech_prob":0.013411387801170349,"compression_ratio":1.7018867924528303},{"id":16,"end":104.16,"seek":8260,"text":" Increasingly more heroic and expensive interventions of a synchial ship when you're like 90 and then and so on right all of that goes away","start":97.52,"tokens":[51110,30367,3349,356,544,32915,293,5124,20924,295,257,5451,339,831,5374,562,291,434,411,4289,293,550,293,370,322,558,439,295,300,1709,1314,51442],"avg_logprob":-0.2850281584496592,"temperature":0,"no_speech_prob":0.013411387801170349,"compression_ratio":1.7018867924528303},{"id":17,"end":108.28,"seek":8260,"text":" Because basically instead of trying to fix you up as you as you degrade you 
you","start":104.16,"tokens":[51442,1436,1936,2602,295,1382,281,3191,291,493,382,291,382,291,368,8692,291,291,51648],"avg_logprob":-0.2850281584496592,"temperature":0,"no_speech_prob":0.013411387801170349,"compression_ratio":1.7018867924528303},{"id":18,"end":113.56,"seek":10828,"text":" Progressively regenerate you know you apply the regenerative medicine early before things degrade","start":109.28,"tokens":[50414,32587,3413,26358,473,291,458,291,3079,264,26358,1166,7195,2440,949,721,368,8692,50628],"avg_logprob":-0.28634646606445313,"temperature":0,"no_speech_prob":0.0018953427206724882,"compression_ratio":1.8064516129032258},{"id":19,"end":118.2,"seek":10828,"text":" So I think that that'll have massive economic impacts over what we're trying to do now","start":113.84,"tokens":[50642,407,286,519,300,300,603,362,5994,4836,11606,670,437,321,434,1382,281,360,586,50860],"avg_logprob":-0.28634646606445313,"temperature":0,"no_speech_prob":0.0018953427206724882,"compression_ratio":1.8064516129032258},{"id":20,"end":124.44,"seek":10828,"text":" Which is not at all sustainable and and that that's what I hope I hope that I hope that we get it so so to me","start":118.2,"tokens":[50860,3013,307,406,412,439,11235,293,293,300,300,311,437,286,1454,286,1454,300,286,1454,300,321,483,309,370,370,281,385,51172],"avg_logprob":-0.28634646606445313,"temperature":0,"no_speech_prob":0.0018953427206724882,"compression_ratio":1.8064516129032258},{"id":21,"end":127.64,"seek":10828,"text":" Yes, the xenobots will be doing useful things","start":124.44,"tokens":[51172,1079,11,264,49773,996,1971,486,312,884,4420,721,51332],"avg_logprob":-0.28634646606445313,"temperature":0,"no_speech_prob":0.0018953427206724882,"compression_ratio":1.8064516129032258},{"id":22,"end":132.44,"seek":10828,"text":" Cleaning up the environment cleaning out, you know, you're you know, you're joints and all that kind of 
stuff","start":128.2,"tokens":[51360,8834,8415,493,264,2823,8924,484,11,291,458,11,291,434,291,458,11,291,434,19949,293,439,300,733,295,1507,51572],"avg_logprob":-0.28634646606445313,"temperature":0,"no_speech_prob":0.0018953427206724882,"compression_ratio":1.8064516129032258},{"id":23,"end":135.44,"seek":10828,"text":" but more important than that I think we can use","start":132.8,"tokens":[51590,457,544,1021,813,300,286,519,321,393,764,51722],"avg_logprob":-0.28634646606445313,"temperature":0,"no_speech_prob":0.0018953427206724882,"compression_ratio":1.8064516129032258},{"id":24,"end":137.64,"seek":10828,"text":" these","start":136.16,"tokens":[51758,613,51832],"avg_logprob":-0.28634646606445313,"temperature":0,"no_speech_prob":0.0018953427206724882,"compression_ratio":1.8064516129032258},{"id":25,"end":143.67999999999998,"seek":13764,"text":" synthetic systems to try to understand to develop a science of detecting and","start":137.64,"tokens":[50364,23420,3652,281,853,281,1223,281,1499,257,3497,295,40237,293,50666],"avg_logprob":-0.29921891954210067,"temperature":0,"no_speech_prob":0.0003567475650925189,"compression_ratio":1.7651821862348178},{"id":26,"end":150.79999999999998,"seek":13764,"text":" Manipulating the goals of collective intelligence is of cells specifically for regenerative medicine and then sort of beyond that","start":144.32,"tokens":[50698,2458,647,12162,264,5493,295,12590,7599,307,295,5438,4682,337,26358,1166,7195,293,550,1333,295,4399,300,51022],"avg_logprob":-0.29921891954210067,"temperature":0,"no_speech_prob":0.0003567475650925189,"compression_ratio":1.7651821862348178},{"id":27,"end":156.23999999999998,"seek":13764,"text":" If we sort of think further beyond that what I hope is that kind of like what you said all of this drives 
a","start":150.79999999999998,"tokens":[51022,759,321,1333,295,519,3052,4399,300,437,286,1454,307,300,733,295,411,437,291,848,439,295,341,11754,257,51294],"avg_logprob":-0.29921891954210067,"temperature":0,"no_speech_prob":0.0003567475650925189,"compression_ratio":1.7651821862348178},{"id":28,"end":159.39999999999998,"seek":13764,"text":" reconsideration of how we","start":157.16,"tokens":[51340,40497,399,295,577,321,51452],"avg_logprob":-0.29921891954210067,"temperature":0,"no_speech_prob":0.0003567475650925189,"compression_ratio":1.7651821862348178},{"id":29,"end":160.51999999999998,"seek":13764,"text":" formulate","start":159.56,"tokens":[51460,47881,51508],"avg_logprob":-0.29921891954210067,"temperature":0,"no_speech_prob":0.0003567475650925189,"compression_ratio":1.7651821862348178},{"id":30,"end":165.51999999999998,"seek":13764,"text":" Ethical norms because this old school so so so in the olden days what you could do is","start":160.51999999999998,"tokens":[51508,10540,804,24357,570,341,1331,1395,370,370,370,294,264,1331,268,1708,437,291,727,360,307,51758],"avg_logprob":-0.29921891954210067,"temperature":0,"no_speech_prob":0.0003567475650925189,"compression_ratio":1.7651821862348178},{"id":31,"end":171.48000000000002,"seek":16552,"text":" I just see you we were confronted with something you used so you could tap on it right and if you heard a metallic clanging sound","start":165.92000000000002,"tokens":[50384,286,445,536,291,321,645,31257,365,746,291,1143,370,291,727,5119,322,309,558,293,498,291,2198,257,25759,596,9741,1626,50662],"avg_logprob":-0.2699518270425863,"temperature":0,"no_speech_prob":0.007809792645275593,"compression_ratio":1.7936507936507937},{"id":32,"end":175.4,"seek":16552,"text":" You'd said ah fine, right? So you could conclude it was made in a factory. 
I can take it apart","start":171.48000000000002,"tokens":[50662,509,1116,848,3716,2489,11,558,30,407,291,727,16886,309,390,1027,294,257,9265,13,286,393,747,309,4936,50858],"avg_logprob":-0.2699518270425863,"temperature":0,"no_speech_prob":0.007809792645275593,"compression_ratio":1.7936507936507937},{"id":33,"end":179.4,"seek":16552,"text":" I can do whatever right if you did that and you got in you sort of a squishy kind of warm","start":175.4,"tokens":[50858,286,393,360,2035,558,498,291,630,300,293,291,658,294,291,1333,295,257,45402,733,295,4561,51058],"avg_logprob":-0.2699518270425863,"temperature":0,"no_speech_prob":0.007809792645275593,"compression_ratio":1.7936507936507937},{"id":34,"end":185.48000000000002,"seek":16552,"text":" Sensation you'd say I need to be you know more or less nice to it and whatever that's not gonna be feasible","start":179.96,"tokens":[51086,318,35292,291,1116,584,286,643,281,312,291,458,544,420,1570,1481,281,309,293,2035,300,311,406,799,312,26648,51362],"avg_logprob":-0.2699518270425863,"temperature":0,"no_speech_prob":0.007809792645275593,"compression_ratio":1.7936507936507937},{"id":35,"end":189.60000000000002,"seek":16552,"text":" It was never really feasible, but it was good enough because we didn't have any we we didn't know any better","start":185.48000000000002,"tokens":[51362,467,390,1128,534,26648,11,457,309,390,665,1547,570,321,994,380,362,604,321,321,994,380,458,604,1101,51568],"avg_logprob":-0.2699518270425863,"temperature":0,"no_speech_prob":0.007809792645275593,"compression_ratio":1.7936507936507937},{"id":36,"end":191.92000000000002,"seek":16552,"text":" that needs to go and I think that","start":189.92000000000002,"tokens":[51584,300,2203,281,352,293,286,519,300,51684],"avg_logprob":-0.2699518270425863,"temperature":0,"no_speech_prob":0.007809792645275593,"compression_ratio":1.7936507936507937},{"id":37,"end":195.92,"seek":19192,"text":" By by breaking down those artificial 
barriers","start":192.88,"tokens":[50412,3146,538,7697,760,729,11677,13565,50564],"avg_logprob":-0.2813154326544868,"temperature":0,"no_speech_prob":0.0005882359691895545,"compression_ratio":1.5980392156862746},{"id":38,"end":204.48,"seek":19192,"text":" Someday we can try to build a a system of of ethical norms that does not rely on these completely contingent","start":196.72,"tokens":[50604,12297,16826,321,393,853,281,1322,257,257,1185,295,295,18890,24357,300,775,406,10687,322,613,2584,27820,317,50992],"avg_logprob":-0.2813154326544868,"temperature":0,"no_speech_prob":0.0005882359691895545,"compression_ratio":1.5980392156862746},{"id":39,"end":211.44,"seek":19192,"text":" Facts of of our earthly history, but on something much much deeper that you know really takes takes agency and and","start":204.72,"tokens":[51004,479,15295,295,295,527,46262,2503,11,457,322,746,709,709,7731,300,291,458,534,2516,2516,7934,293,293,51340],"avg_logprob":-0.2813154326544868,"temperature":0,"no_speech_prob":0.0005882359691895545,"compression_ratio":1.5980392156862746},{"id":40,"end":214.76,"seek":19192,"text":" the capacity to suffer and all that takes that seriously","start":212,"tokens":[51368,264,6042,281,9753,293,439,300,2516,300,6638,51506],"avg_logprob":-0.2813154326544868,"temperature":0,"no_speech_prob":0.0005882359691895545,"compression_ratio":1.5980392156862746}],"transcription":" What are some cool synthetic organisms that you think about you dream about when you think about embodied mind What do you imagine what do you hope to build yeah on a practical level what I really hope to do is To gain enough of an understanding of the embodied intelligence of organs and tissues such that we can achieve a radically different regenerative medicine so that we can say Basically and I think about it as um, you know in terms of like okay, can you what's the what's the? 
What's the goal kind of and end and end game for this whole thing to me the end game is something that you would call an anatomical compiler So the idea is you would sit down in front of the computer and you would draw the The body or the organ that you wanted not not molecular details, but like you this is what I want I want a six-legged you know frog with a propeller on top or I want I want a heart that looks like this or I want a leg That looks like this and what it would do if we knew what we were doing is Put out a convert that anatomical description into a set of stimuli that would have to be given to cells to convince them to build exactly that thing Right, I probably won't live to see it, but I think it's achievable and I think what that if If we can have that then that is basically the solution to all of medicine except for infectious disease so birth defects Right traumatic injury cancer aging degenerative disease if we knew how to tell cells what to build all of those things go away so those things go away and the positive feedback spiral of economic costs where all of the advances are Increasingly more heroic and expensive interventions of a synchial ship when you're like 90 and then and so on right all of that goes away Because basically instead of trying to fix you up as you as you degrade you you Progressively regenerate you know you apply the regenerative medicine early before things degrade So I think that that'll have massive economic impacts over what we're trying to do now Which is not at all sustainable and and that that's what I hope I hope that I hope that we get it so so to me Yes, the xenobots will be doing useful things Cleaning up the environment cleaning out, you know, you're you know, you're joints and all that kind of stuff but more important than that I think we can use these synthetic systems to try to understand to develop a science of detecting and Manipulating the goals of collective intelligence is of cells specifically for 
regenerative medicine and then sort of beyond that If we sort of think further beyond that what I hope is that kind of like what you said all of this drives a reconsideration of how we formulate Ethical norms because this old school so so so in the olden days what you could do is I just see you we were confronted with something you used so you could tap on it right and if you heard a metallic clanging sound You'd said ah fine, right? So you could conclude it was made in a factory. I can take it apart I can do whatever right if you did that and you got in you sort of a squishy kind of warm Sensation you'd say I need to be you know more or less nice to it and whatever that's not gonna be feasible It was never really feasible, but it was good enough because we didn't have any we we didn't know any better that needs to go and I think that By by breaking down those artificial barriers Someday we can try to build a a system of of ethical norms that does not rely on these completely contingent Facts of of our earthly history, but on something much much deeper that you know really takes takes agency and and the capacity to suffer and all that takes that seriously","detected_language":"english"}
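
The segment list above can be turned into subtitles with a few lines of Python. The following is a minimal sketch, assuming the output has already been parsed into a dictionary with the `segments`, `start`, `end`, and `text` keys shown in the example. The helper names `format_timestamp` and `segments_to_srt` are hypothetical and not part of this model's API.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (the SRT convention)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(output: dict) -> str:
    """Render each segment as one numbered SRT cue."""
    blocks = []
    for index, segment in enumerate(output["segments"], start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()  # Whisper text starts with a space
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"


# Two segments copied from the example output (other fields omitted).
sample = {
    "segments": [
        {
            "id": 2,
            "start": 14.040000000000001,
            "end": 18,
            "text": " To gain enough of an understanding of the embodied intelligence of",
        },
        {"id": 24, "start": 136.16, "end": 137.64, "text": " these"},
    ]
}

if __name__ == "__main__":
    print(segments_to_srt(sample))
```

Rounding to whole milliseconds also hides floating-point noise such as `14.040000000000001` in the raw timestamps.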

Performance Metrics

18.13s Prediction Time
18.15s Total Time
All Input Parameters
{
  "model": "base",
  "audio_url": "https://replicate.delivery/pbxt/IZjTvet2ZGiyiYaMEEPrzn0xY1UDNsh0NfcO9qeTlpwCo7ig/lex-levin-4min.mp3",
  "temperature": 0,
  "suppress_tokens": "-1",
  "word_timestamps": false,
  "logprob_threshold": -1,
  "append_punctuations": "\"'.。,,!!??::”)]}、",
  "no_speech_threshold": 0.6,
  "prepend_punctuations": "\"'“¿([{-",
  "condition_on_previous_text": true,
  "compression_ratio_threshold": 2.4,
  "temperature_increment_on_fallback": 0.2
}
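
A request with these parameters could be sent from Python roughly as follows. This is a sketch, assuming the third-party `replicate` client package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The model reference combines the model name with the version ID listed under Version Details. The helper names `build_input` and `transcribe` are hypothetical, and the shape of the returned value is assumed to match the example output.

```python
# Model name plus the version ID from the Version Details section.
MODEL_REF = (
    "collectiveai-team/whisper-wordtimestamps:"
    "781317565f264090bf5831cceb3ea6b794ed402e746fde1cdec103a8951b52df"
)


def build_input(audio_url: str, word_timestamps: bool = True,
                model: str = "base") -> dict:
    """Build the input payload; omitted parameters keep their defaults."""
    return {
        "model": model,
        "audio_url": audio_url,
        "temperature": 0,
        "word_timestamps": word_timestamps,
        "condition_on_previous_text": True,
    }


def transcribe(audio_url: str):
    """Run the model on Replicate (needs network access and an API token)."""
    import replicate  # third-party: pip install replicate

    return replicate.run(MODEL_REF, input=build_input(audio_url))


if __name__ == "__main__":
    # Placeholder URL; replace with a real, publicly reachable audio file.
    print(build_input("https://example.com/audio.mp3"))
```

The example run above used `word_timestamps: false`. Setting it to `true`, as this sketch does by default, is what requests per-word timing in each segment.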
Input Parameters
audio Type: string
Audio file
model Default: base
Choose a Whisper model.
language
language spoken in the audio; specify None to perform language detection
patience Type: number
optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search
audio_url Type: string
Audio URL
temperature Type: number Default: 0
temperature to use for sampling
initial_prompt Type: string
optional text to provide as a prompt for the first window.
suppress_tokens Type: string Default: -1
comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuations
word_timestamps Type: boolean Default: false
Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.
logprob_threshold Type: number Default: -1
if the average log probability is lower than this value, treat the decoding as failed
append_punctuations Type: string Default: "'.。,,!!??::”)]}、
If word_timestamps is True, merge these punctuation symbols with the previous word
no_speech_threshold Type: number Default: 0.6
if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence
prepend_punctuations Type: string Default: "'“¿([{-
If word_timestamps is True, merge these punctuation symbols with the next word
condition_on_previous_text Type: boolean Default: true
if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop
compression_ratio_threshold Type: number Default: 2.4
if the gzip compression ratio is higher than this value, treat the decoding as failed
temperature_increment_on_fallback Type: number Default: 0.2
temperature to increase when falling back when the decoding fails to meet either of the thresholds (compression_ratio_threshold or logprob_threshold)
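
The way `temperature`, `temperature_increment_on_fallback`, `compression_ratio_threshold`, and `logprob_threshold` interact can be illustrated with a small sketch. It is not this model's actual code. It assumes the schedule stops at a temperature of 1.0 and that the compression ratio is the UTF-8 byte length divided by the zlib-compressed length, as in the reference Whisper implementation. The `decode` callable and `toy_decode` are hypothetical stand-ins for a real decoder, and the `no_speech_threshold` check is left out.

```python
import zlib
from typing import Callable, List, Optional, Tuple


def compression_ratio(text: str) -> float:
    """Raw byte length over compressed length; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def temperature_schedule(temperature: float = 0.0,
                         increment: float = 0.2) -> List[float]:
    """Temperatures to try in order, capped at 1.0 (assumed upper bound)."""
    if increment <= 0:
        return [temperature]
    temps, t = [], temperature
    while t <= 1.0 + 1e-6:
        temps.append(round(t, 6))
        t += increment
    return temps


def decode_with_fallback(
    decode: Callable[[float], Tuple[str, float]],
    temperatures: List[float],
    compression_ratio_threshold: float = 2.4,
    logprob_threshold: float = -1.0,
) -> Optional[Tuple[float, str, float]]:
    """Retry at higher temperatures until both thresholds are satisfied."""
    result = None
    for t in temperatures:
        text, avg_logprob = decode(t)
        result = (t, text, avg_logprob)
        too_repetitive = compression_ratio(text) > compression_ratio_threshold
        too_unlikely = avg_logprob < logprob_threshold
        if not (too_repetitive or too_unlikely):
            break  # decoding accepted at this temperature
    return result  # last attempt is kept even if every temperature failed


def toy_decode(temperature: float) -> Tuple[str, float]:
    """Hypothetical decoder: loops at temperature 0, recovers above it."""
    if temperature == 0.0:
        return "so so so " * 40, -0.3
    return "Yes, the xenobots will be doing useful things", -0.4


if __name__ == "__main__":
    print(decode_with_fallback(toy_decode, temperature_schedule()))
```

In the toy run, the repeated phrase at temperature 0 compresses far better than the 2.4 threshold allows, so the decoding counts as failed and the next temperature in the schedule is tried.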
Output Schema

Output

Example Execution Logs
Transcribe with base model
Version Details
Version ID
781317565f264090bf5831cceb3ea6b794ed402e746fde1cdec103a8951b52df
Version Created
December 6, 2023