erium/whisperx

5.1K runs, Oct 2023, Cog 0.8.6
speaker-diarization speech-to-text

About

Automatic Speech Recognition with Word-level Timestamps & Diarization

Example Output

Output

[{"text": " Ihr h\u00f6rt die IRIUM Podcast, der Data Science und Machine Learning Podcast f\u00fcr Young Professionals und Studienabsolventen, die wirklich wissen wollen, was in der Arbeitswelt abgeht.", "start": 0.009, "end": 10.742, "speaker": "SPEAKER_00"}]

Performance Metrics

17.52s Prediction Time
282.86s Total Time
All Input Parameters
{
  "audio": "https://replicate.delivery/pbxt/K3BGhaLBJ3nhPfDteXbTA8xIuQvC5dR3wViyiX0OKuzrVJ6f/erium.wav",
  "debug": false,
  "diarize": true,
  "language": "de",
  "batch_size": 32
}
Input Parameters
audio (required) Type: string
Audio file.
debug Type: boolean, Default: false
Print out memory usage information.
diarize Type: boolean, Default: false
Use this to identify speakers.
language Default: de
The audio file's language.
batch_size Type: integer, Default: 32, Range: 1 - ∞
The number of batches that are run in parallel.
max_speakers Type: integer, Range: 1 - ∞
Maximum number of speakers in case of diarization.
min_speakers Type: integer, Range: 1 - ∞
Minimum number of speakers in case of diarization.
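
The parameters above can be assembled and sanity-checked before a prediction is submitted. The following is a minimal sketch, not part of the model's own code: `build_input` is a hypothetical helper, the parameter names, defaults, and ranges are taken from the list above, and the check that `min_speakers` does not exceed `max_speakers` is an assumption that this page does not state.

```python
from typing import Any, Dict, Optional


def build_input(
    audio: str,
    language: str = "de",
    diarize: bool = False,
    batch_size: int = 32,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
    debug: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an input payload from the documented parameters."""
    if not audio:
        raise ValueError("audio is required")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for name, value in (("min_speakers", min_speakers), ("max_speakers", max_speakers)):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be at least 1")
    # Assumption: not stated on the page, but min > max is unlikely to be meaningful.
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        raise ValueError("min_speakers must not exceed max_speakers")

    payload: Dict[str, Any] = {
        "audio": audio,
        "debug": debug,
        "diarize": diarize,
        "language": language,
        "batch_size": batch_size,
    }
    # The speaker bounds have no documented default, so they are sent only when set.
    if min_speakers is not None:
        payload["min_speakers"] = min_speakers
    if max_speakers is not None:
        payload["max_speakers"] = max_speakers
    return payload


# Reproduces the example input shown under "All Input Parameters".
payload = build_input(
    "https://replicate.delivery/pbxt/K3BGhaLBJ3nhPfDteXbTA8xIuQvC5dR3wViyiX0OKuzrVJ6f/erium.wav",
    diarize=True,
)
print(payload)
```

A dictionary of this shape is the kind of value a client library would typically accept as the prediction input; how it is submitted depends on the client and is not shown here.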
Output Schema

Output

Type: string
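
Because the schema declares the output as a plain string, a caller has to decode it before working with the segments. The sketch below assumes the string is a JSON array shaped like the example output above (objects with `text`, `start`, `end`, and, when diarization is on, `speaker`). `speaking_time` is a hypothetical helper, the `"UNKNOWN"` fallback label is an illustrative choice, and the sample string is a shortened copy of the example output.

```python
import json
from collections import defaultdict
from typing import Dict


def speaking_time(output: str) -> Dict[str, float]:
    """Hypothetical helper: sum segment durations (in seconds) per speaker label."""
    segments = json.loads(output)  # assumes the output string is a JSON array
    totals: Dict[str, float] = defaultdict(float)
    for segment in segments:
        # "speaker" may be missing when diarization is off; the fallback is illustrative.
        speaker = segment.get("speaker", "UNKNOWN")
        totals[speaker] += segment["end"] - segment["start"]
    return dict(totals)


# Shortened copy of the example output; the raw string keeps the JSON \u escape intact.
sample = (
    r'[{"text": " Ihr h\u00f6rt die IRIUM Podcast", '
    r'"start": 0.009, "end": 10.742, "speaker": "SPEAKER_00"}]'
)

segments = json.loads(sample)
print(segments[0]["text"])
print(speaking_time(sample))
```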

Example Execution Logs
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.2. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint torch_models/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu117. Bad things might happen unless you revert torch to 1.x.
pytorch_model.bin:   0%|          | 0.00/26.6M [00:00<?, ?B/s]
pytorch_model.bin:  39%|███▉      | 10.5M/26.6M [00:00<00:00, 60.1MB/s]
pytorch_model.bin: 100%|██████████| 26.6M/26.6M [00:00<00:00, 93.8MB/s]
pytorch_model.bin: 100%|██████████| 26.6M/26.6M [00:00<00:00, 85.6MB/s]
config.yaml:   0%|          | 0.00/221 [00:00<?, ?B/s]
config.yaml: 100%|██████████| 221/221 [00:00<00:00, 893kB/s]
Version Details
Version ID
fd124db1a0a853845690a4f34fa1a3bf79230d3dedc6d8c6a4630dd80f88d1b4
Version Created
January 12, 2024