jacksoby/whisperx

▶️ 605 runs 📅 Jul 2025 ⚙️ Cog 0.15.10
speech-to-text

About

Audio-to-text transcription with per-word timestamps and confidence scores.

Example Output

Output

{"segments":[{"end":2.937,"text":" This is the final build, screenshot.","start":0.031,"words":[{"end":0.844,"word":"This","score":0.944,"start":0.031},{"end":0.986,"word":"is","score":0.692,"start":0.925},{"end":1.108,"word":"the","score":0.994,"start":1.047},{"end":1.413,"word":"final","score":0.901,"start":1.149},{"end":1.718,"word":"build,","score":0.914,"start":1.454},{"end":2.937,"word":"screenshot.","score":0.767,"start":2.084}]}],"word_segments":[{"end":0.844,"word":"This","score":0.944,"start":0.031},{"end":0.986,"word":"is","score":0.692,"start":0.925},{"end":1.108,"word":"the","score":0.994,"start":1.047},{"end":1.413,"word":"final","score":0.901,"start":1.149},{"end":1.718,"word":"build,","score":0.914,"start":1.454},{"end":2.937,"word":"screenshot.","score":0.767,"start":2.084}]}
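The example output can be post-processed with ordinary JSON tooling. The following is a minimal sketch, not part of the model itself: it copies the `word_segments` entries from the example above and lists words whose `score` falls below a threshold. The helper name `low_confidence_words` and the 0.8 threshold are illustrative assumptions, as is the choice to skip entries that carry no `score` key.

```python
import json

# Word entries copied from the example output above.
EXAMPLE_JSON = """
{"word_segments": [
  {"end": 0.844, "word": "This", "score": 0.944, "start": 0.031},
  {"end": 0.986, "word": "is", "score": 0.692, "start": 0.925},
  {"end": 1.108, "word": "the", "score": 0.994, "start": 1.047},
  {"end": 1.413, "word": "final", "score": 0.901, "start": 1.149},
  {"end": 1.718, "word": "build,", "score": 0.914, "start": 1.454},
  {"end": 2.937, "word": "screenshot.", "score": 0.767, "start": 2.084}
]}
"""


def low_confidence_words(result, threshold=0.8):
    """Return (word, score) pairs whose score is below `threshold`.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    flagged = []
    for entry in result.get("word_segments", []):
        score = entry.get("score")
        # Assumption: entries without a score are skipped, not flagged.
        if score is not None and score < threshold:
            flagged.append((entry["word"], score))
    return flagged


result = json.loads(EXAMPLE_JSON)
for word, score in low_confidence_words(result):
    print(f"{word!r}: {score}")
```

With the example data this flags "is" (0.692) and "screenshot." (0.767), which could be a starting point for manual review of a transcript.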

Performance Metrics

3.58s Prediction Time
76.57s Total Time
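The gap between the two figures is worth making explicit. The arithmetic below is a small sketch that assumes the total time covers everything outside the prediction itself (for example queueing and container start-up); the page does not state what the total includes, so treat that reading as an assumption.

```python
# Figures copied from the performance metrics above.
prediction_time_s = 3.58
total_time_s = 76.57

# Assumption: whatever is not prediction time is overhead (queue, setup).
overhead_s = round(total_time_s - prediction_time_s, 2)
prediction_share = prediction_time_s / total_time_s

print(f"overhead: {overhead_s}s")
print(f"prediction share of total: {prediction_share:.1%}")
```

Under that assumption, about 73 seconds of this example run were spent outside the prediction, which accounts for under 5% of the total.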

Input Parameters

audio_file (required) Type: string
Audio file
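A call with this single input might look like the sketch below, which uses the `replicate` Python client (`pip install replicate`) and expects a `REPLICATE_API_TOKEN` environment variable. The version ID is the one listed under Version Details on this page. The helper names `model_ref`, `build_input`, and `transcribe` are hypothetical, and passing the audio as a URL string is an assumption based on the `string` type shown above.

```python
MODEL = "jacksoby/whisperx"
# Version ID as listed under Version Details on this page.
VERSION = "b484c1fc8bb7096df7fea8c9628adee66cedc6088d1cbcc56a72674df05c5c24"


def model_ref(model=MODEL, version=VERSION):
    """Build the "owner/name:version" reference used by the client."""
    return f"{model}:{version}"


def build_input(audio_url):
    """Build the input payload; audio_file is the only documented parameter."""
    return {"audio_file": audio_url}


def transcribe(audio_url):
    """Run the model on a publicly reachable audio URL (needs network access)."""
    # Imported lazily so the helpers above work without the client installed.
    import replicate

    return replicate.run(model_ref(), input=build_input(audio_url))


# Example (not executed here; the URL is a placeholder):
# output = transcribe("https://example.com/recording.wav")
# print(output["segments"][0]["text"])
```

Pinning the version ID keeps results reproducible if the model owner publishes a newer version later.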

Output Schema

segments Type: array, Items: object
Segments
word_segments Type: array
Word Segments

Example Execution Logs

Warning: audio is shorter than 30s, language detection may be inaccurate.
Detected language: en (0.99) in first 30s of audio...
Downloading: "https://download.pytorch.org/torchaudio/models/wav2vec2_fairseq_base_ls960_asr_ls960.pth" to /root/.cache/torch/hub/checkpoints/wav2vec2_fairseq_base_ls960_asr_ls960.pth
  0%|          | 0.00/360M [00:00<?, ?B/s]
  2%|▏         | 5.88M/360M [00:00<00:06, 61.2MB/s]
  5%|▌         | 18.2M/360M [00:00<00:03, 101MB/s] 
  9%|▊         | 31.1M/360M [00:00<00:02, 115MB/s]
 14%|█▍        | 49.6M/360M [00:00<00:02, 146MB/s]
 25%|██▍       | 88.6M/360M [00:00<00:01, 240MB/s]
 38%|███▊      | 135M/360M [00:00<00:00, 324MB/s] 
 54%|█████▎    | 193M/360M [00:00<00:00, 415MB/s]
 68%|██████▊   | 247M/360M [00:00<00:00, 462MB/s]
 81%|████████▏ | 293M/360M [00:00<00:00, 468MB/s]
 94%|█████████▎| 338M/360M [00:01<00:00, 463MB/s]
100%|██████████| 360M/360M [00:01<00:00, 357MB/s]

Version Details

Version ID: b484c1fc8bb7096df7fea8c9628adee66cedc6088d1cbcc56a72674df05c5c24
Version Created: July 12, 2025