zsxkib/whisper-lazyloading 🖼️❓🔢✓📝 → ❓

▶️ 178 runs 📅 Jun 2024 ⚙️ Cog 0.9.9 🔗 GitHub 📄 Paper ⚖️ License
speech-to-text subtitle-generation

About

Convert speech in audio to text with the `tiny`, `small`, `base`, and `large-v3` Whisper models

Example Output

{"segments":[{"id":0,"end":18.6,"seek":0,"text":" the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet","start":0,"tokens":[50365,264,707,27254,436,980,366,7908,264,2853,390,2159,986,9376,293,13436,292,382,731,31421,520,685,366,3318,337,257,12206,311,3199,257,955,6630,16441,390,322,264,3098,18119,51295],"avg_logprob":-0.060147291276513075,"temperature":0,"no_speech_prob":0.05821266025304794,"compression_ratio":1.412280701754386},{"id":1,"end":31.840000000000003,"seek":1860,"text":" the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab","start":18.6,"tokens":[50365,264,38867,45162,293,27555,292,457,9181,419,6750,264,16232,2496,3603,538,709,886,2321,264,1808,390,21634,365,257,15154,261,455,51027],"avg_logprob":-0.11862952368600028,"temperature":0,"no_speech_prob":0.00025310463388450444,"compression_ratio":1.696969696969697},{"id":2,"end":45.2,"seek":1860,"text":" the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid","start":31.840000000000003,"tokens":[51027,264,1808,390,21634,365,257,4868,4298,341,2068,3726,4393,10257,428,20631,750,25218,292,562,415,2729,720,257,2418,34850,327,51695],"avg_logprob":-0.11862952368600028,"temperature":0,"no_speech_prob":0.00025310463388450444,"compression_ratio":1.696969696969697},{"id":3,"end":48.6,"seek":1860,"text":" the beetle droned in the hot june sun","start":45.2,"tokens":[51695,264,49735,1224,19009,294,264,2368,361,2613,3295,51865],"avg_logprob":-0.11862952368600028,"temperature":0,"no_speech_prob":0.00025310463388450444,"compression_ratio":1.696969696969697},{"id":4,"end":52.38,"seek":4860,"text":" the beetle droned in the hot june sun","start":48.6,"tokens":[50365,264,49735,1224,19009,294,264,2368,361,2613,3295,50554],"avg_logprob":-0.3010915426107553,"temperature":0.4,"no_speech_prob":0.2937493324279785,"compression_ratio":0.8409090909090909}],"translation":null,"transcription":" the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid the beetle droned in the hot june sun the beetle droned in the hot june sun","detected_language":"english"}
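
Since the model is tagged for subtitle generation, the `segments` array above can be turned into SRT cues on the client side. The following is a minimal sketch, assuming each segment carries `start`, `end`, and `text` as in the example output; the helper names `srt_timestamp` and `segments_to_srt` are hypothetical and not part of the model, and the `transcription` input may already offer subtitle formats directly.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments as SRT cues, numbered from 1."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Segment text starts with a space in the example output, so strip it.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Two segments taken from the example output (the first text is shortened).
example_segments = [
    {"start": 0, "end": 18.6, "text": " the little tales they tell are false"},
    {"start": 45.2, "end": 48.6, "text": " the beetle droned in the hot june sun"},
]
print(segments_to_srt(example_segments))
```

Rounding to whole milliseconds before splitting into fields avoids float artifacts such as the `31.840000000000003` end time seen in the example.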

Performance Metrics

9.78s Prediction Time
203.20s Total Time
All Input Parameters
{
  "audio": "https://replicate.delivery/mgxm/e5159b1b-508a-4be4-b892-e1eb47850bdc/OSR_uk_000_0050_8k.wav",
  "model": "large-v3",
  "language": "auto",
  "translate": false,
  "temperature": 0,
  "transcription": "plain text",
  "suppress_tokens": "-1",
  "logprob_threshold": -1,
  "no_speech_threshold": 0.6,
  "condition_on_previous_text": true,
  "compression_ratio_threshold": 2.4,
  "temperature_increment_on_fallback": 0.2
}
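
The same inputs can be sent from Python. This is a sketch, assuming the third-party `replicate` client is installed and `REPLICATE_API_TOKEN` is set in the environment; `build_input` and `transcribe` are hypothetical helper names, and the version hash is the one listed under Version Details on this page.

```python
MODEL_VERSION = (
    "zsxkib/whisper-lazyloading:"
    "909df2f50ba92488979e2c3dea577937b7e991bd815395d3dfbe3bcbf5038276"
)


def build_input(audio_url: str, **overrides) -> dict:
    """Return the example inputs from this page with `audio` set.

    Keyword overrides replace or extend the defaults, e.g. `translate=True`
    or `initial_prompt="..."`.
    """
    inputs = {
        "audio": audio_url,
        "model": "large-v3",
        "language": "auto",
        "translate": False,
        "temperature": 0,
        "transcription": "plain text",
        "suppress_tokens": "-1",
        "logprob_threshold": -1,
        "no_speech_threshold": 0.6,
        "condition_on_previous_text": True,
        "compression_ratio_threshold": 2.4,
        "temperature_increment_on_fallback": 0.2,
    }
    inputs.update(overrides)
    return inputs


def transcribe(audio_url: str, **overrides):
    """Run the model on Replicate and return its output."""
    # Imported lazily so build_input stays usable without the client installed.
    import replicate  # third-party: pip install replicate

    return replicate.run(MODEL_VERSION, input=build_input(audio_url, **overrides))
```

Calling `transcribe(...)` with the audio URL from the example should return a structure like the example output, though the exact return type depends on the client version.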
Input Parameters
audio (required) Type: string
Audio file
model Default: large-v3
Whisper model size (currently only large-v3 is supported).
language Default: auto
Language spoken in the audio; specify 'auto' for automatic language detection
patience Type: number
Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424. The default (1.0) is equivalent to conventional beam search.
translate Type: boolean Default: false
Translate the text to English when set to True
temperature Type: number Default: 0
Temperature to use for sampling
transcription Default: plain text
Choose the format for the transcription
initial_prompt Type: string
Optional text to provide as a prompt for the first window.
suppress_tokens Type: string Default: -1
Comma-separated list of token IDs to suppress during sampling; '-1' suppresses most special characters except common punctuation.
logprob_threshold Type: number Default: -1
If the average log probability is lower than this value, treat the decoding as failed.
no_speech_threshold Type: number Default: 0.6
If the probability of the <|nospeech|> token is higher than this value and the decoding has failed due to `logprob_threshold`, treat the segment as silence.
condition_on_previous_text Type: boolean Default: true
If True, provide the previous output of the model as a prompt for the next window. Disabling this may make the text inconsistent across windows, but makes the model less prone to getting stuck in a failure loop.
compression_ratio_threshold Type: number Default: 2.4
If the gzip compression ratio is higher than this value, treat the decoding as failed.
temperature_increment_on_fallback Type: number Default: 0.2
Amount by which to increase the temperature on each fallback, when the decoding fails to meet either `compression_ratio_threshold` or `logprob_threshold`.
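
The threshold parameters above interact, so a small sketch may help. It follows the parameter descriptions on this page rather than the model's actual source: the function names are hypothetical, the compression ratio is computed here with `zlib`, and the upper temperature limit of 1.0 is an assumption.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw to compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def temperature_schedule(temperature: float = 0.0,
                         increment: float = 0.2,
                         maximum: float = 1.0) -> list[float]:
    """Temperatures to try in order (maximum of 1.0 is an assumption)."""
    temps = []
    step = 0
    while temperature + step * increment <= maximum + 1e-6:
        temps.append(round(temperature + step * increment, 6))
        step += 1
    return temps


def needs_fallback(text: str, avg_logprob: float, *,
                   logprob_threshold: float = -1.0,
                   compression_ratio_threshold: float = 2.4) -> bool:
    """True when the decoding fails either threshold and should be retried."""
    too_repetitive = compression_ratio(text) > compression_ratio_threshold
    too_unlikely = avg_logprob < logprob_threshold
    return too_repetitive or too_unlikely


def is_silence(no_speech_prob: float, avg_logprob: float, *,
               no_speech_threshold: float = 0.6,
               logprob_threshold: float = -1.0) -> bool:
    """True when the segment looks like silence and decoding also failed."""
    return no_speech_prob > no_speech_threshold and avg_logprob < logprob_threshold


print(temperature_schedule(0, 0.2))
# Values from the last segment of the example output.
print(needs_fallback("the beetle droned in the hot june sun", -0.3010915426107553))
print(is_silence(0.2937493324279785, -0.3010915426107553))
```

With the defaults, the schedule steps from 0 to 1.0 in increments of 0.2. The last segment of the example output reports `"temperature":0.4`, which is consistent with two fallback steps from the starting temperature of 0.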
Example Execution Logs
Transcribe with large-v3 model.
Detected language: English
  0%|          | 0/5241 [00:00<?, ?frames/s]
 35%|███▌      | 1860/5241 [00:02<00:04, 706.14frames/s]
 93%|█████████▎| 4860/5241 [00:06<00:00, 755.37frames/s]
100%|██████████| 5241/5241 [00:08<00:00, 554.70frames/s]
100%|██████████| 5241/5241 [00:08<00:00, 608.37frames/s]
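
The frame counts in the progress bar can be related to the timestamps in the example output. The sketch below assumes 100 frames per second (16 kHz audio with a hop of 160 samples, as in the reference Whisper code); under that assumption 5241 frames correspond to about 52.41 s of audio, which lines up with the last segment ending at 52.38 s, and a `seek` of 1860 corresponds to 18.6 s. The function names are hypothetical.

```python
# Assumption: 16 kHz audio and a hop of 160 samples give 100 frames per second.
FRAMES_PER_SECOND = 100


def frames_to_seconds(frames: int) -> float:
    """Convert a frame count or `seek` offset to seconds."""
    return frames / FRAMES_PER_SECOND


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of prediction time."""
    return audio_seconds / processing_seconds


audio_seconds = frames_to_seconds(5241)   # total frames from the log
first_seek = frames_to_seconds(1860)      # `seek` value of the second segment
speed = real_time_factor(audio_seconds, 9.78)  # 9.78 s prediction time

print(audio_seconds, first_seek, round(speed, 2))
```

The result only covers the reported prediction time; the much larger total time of 203.20 s also includes whatever happened outside the prediction itself.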
Version Details
Version ID
909df2f50ba92488979e2c3dea577937b7e991bd815395d3dfbe3bcbf5038276
Version Created
July 1, 2024