erium/whisperx
About
Automatic Speech Recognition with Word-level Timestamps & Diarization

Example Output
[{"text": " Ihr h\u00f6rt die IRIUM Podcast, der Data Science und Machine Learning Podcast f\u00fcr Young Professionals und Studienabsolventen, die wirklich wissen wollen, was in der Arbeitswelt abgeht.", "start": 0.009, "end": 10.742, "speaker": "SPEAKER_00"}]
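The output is a list of segments, each with `text`, `start` and `end` times in seconds, and a `speaker` label when diarization is enabled. A minimal post-processing sketch, assuming this list-of-segments shape; `fmt_ts` and `merge_by_speaker` are hypothetical helpers, not part of the model:

```python
import json

def fmt_ts(seconds):
    # Format seconds as HH:MM:SS.mmm (hypothetical helper)
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def merge_by_speaker(segments):
    # Merge consecutive segments spoken by the same speaker
    merged = []
    for seg in segments:
        if merged and merged[-1]["speaker"] == seg.get("speaker"):
            merged[-1]["end"] = seg["end"]
            merged[-1]["text"] += " " + seg["text"].strip()
        else:
            merged.append({"speaker": seg.get("speaker"),
                           "start": seg["start"], "end": seg["end"],
                           "text": seg["text"].strip()})
    return merged

# Toy output in the same shape as the example above
output = json.loads(
    '[{"text": " Hello.", "start": 0.0, "end": 1.2, "speaker": "SPEAKER_00"},'
    ' {"text": " World.", "start": 1.2, "end": 2.5, "speaker": "SPEAKER_00"}]'
)
for seg in merge_by_speaker(output):
    print(f'[{fmt_ts(seg["start"])} -> {fmt_ts(seg["end"])}] '
          f'{seg["speaker"]}: {seg["text"]}')
# → [00:00:00.000 -> 00:00:02.500] SPEAKER_00: Hello. World.
```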
Performance Metrics
- Prediction Time: 17.52s
- Total Time: 282.86s
All Input Parameters
{
  "audio": "https://replicate.delivery/pbxt/K3BGhaLBJ3nhPfDteXbTA8xIuQvC5dR3wViyiX0OKuzrVJ6f/erium.wav",
  "debug": false,
  "diarize": true,
  "language": "de",
  "batch_size": 32
}
Input Parameters
- audio (required): Audio file to transcribe.
- debug: Print out memory usage information.
- diarize: Identify speakers in the audio (speaker diarization).
- language: The audio file's language (e.g. "de").
- batch_size: Number of audio segments transcribed in parallel.
- max_speakers: Maximum number of speakers when diarization is enabled.
- min_speakers: Minimum number of speakers when diarization is enabled.
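The parameters above map directly onto the prediction input. A minimal sketch of assembling that input for the Replicate Python client (`pip install replicate`); `build_input` is a hypothetical helper, and the actual call is shown commented because it needs a `REPLICATE_API_TOKEN`:

```python
def build_input(audio_url, language="de", diarize=True, batch_size=32,
                min_speakers=None, max_speakers=None, debug=False):
    # Assemble the prediction input; min/max_speakers only apply with diarize=True
    payload = {"audio": audio_url, "language": language,
               "diarize": diarize, "batch_size": batch_size, "debug": debug}
    if diarize and min_speakers is not None:
        payload["min_speakers"] = min_speakers
    if diarize and max_speakers is not None:
        payload["max_speakers"] = max_speakers
    return payload

# To run the prediction (requires the `replicate` package and REPLICATE_API_TOKEN):
# import replicate
# segments = replicate.run(
#     "erium/whisperx:fd124db1a0a853845690a4f34fa1a3bf79230d3dedc6d8c6a4630dd80f88d1b4",
#     input=build_input(
#         "https://replicate.delivery/pbxt/K3BGhaLBJ3nhPfDteXbTA8xIuQvC5dR3wViyiX0OKuzrVJ6f/erium.wav"
#     ),
# )
```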
Output Schema
Example Execution Logs
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.2. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint torch_models/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu117. Bad things might happen unless you revert torch to 1.x.
pytorch_model.bin: 100%|██████████| 26.6M/26.6M [00:00<00:00, 85.6MB/s]
config.yaml: 100%|██████████| 221/221 [00:00<00:00, 893kB/s]
Version Details
- Version ID: fd124db1a0a853845690a4f34fa1a3bf79230d3dedc6d8c6a4630dd80f88d1b4
- Version Created: January 12, 2024