konieshadow/speaker-diarization

7.6K runs · Jun 2025 · Cog 0.15.2

Tags: audio-analysis, audio-segmentation, speaker-diarization

Performance

20.3 s typical run time

~184 s cold start (first call)

7.6K total runs

About

Speaker diarization (determining who spoke when in a recording) using the pyannote/speaker-diarization-3.1 pipeline.
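To illustrate what diarization produces, a result can be thought of as a list of speaker turns, each with a start time, an end time, and a speaker label. The `Turn` type and `speaking_time` helper below are hypothetical and the turns are made up; this is not the exact format of this model's output file. The sketch sums speaking time per speaker, one simple way to summarize such results.

```python
from collections import defaultdict
from typing import NamedTuple


class Turn(NamedTuple):
    """One hypothetical speaker turn: start/end in seconds plus a speaker label."""
    start: float
    end: float
    speaker: str


def speaking_time(turns: list[Turn]) -> dict[str, float]:
    """Sum the duration of every turn, grouped by speaker label."""
    totals: dict[str, float] = defaultdict(float)
    for turn in turns:
        totals[turn.speaker] += turn.end - turn.start
    return dict(totals)


# Made-up turns for illustration only; not output from this model.
turns = [
    Turn(0.0, 4.5, "SPEAKER_00"),
    Turn(4.5, 9.0, "SPEAKER_01"),
    Turn(9.0, 12.0, "SPEAKER_00"),
]
print(speaking_time(turns))  # {'SPEAKER_00': 7.5, 'SPEAKER_01': 4.5}
```

Overlapping speech would appear as turns whose time ranges intersect; this sketch simply counts each turn's duration toward its own speaker.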

Example Output


Performance Metrics

20.35 s prediction time

183.62 s total time
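The gap between the two timings can be worked out directly. Attributing that gap to container start-up and model loading (rather than to the prediction itself) is an assumption, though it is consistent with the ~184 s cold-start figure listed under Performance.

```latex
t_{\text{overhead}} = t_{\text{total}} - t_{\text{prediction}}
                    = 183.62\,\text{s} - 20.35\,\text{s}
                    = 163.27\,\text{s}
```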

Input Parameters

audio (Type: string; Default: https://r2.getcastify.com/lex_ai_john_carmack_1.wav): Audio file
max_speakers (Type: integer): Maximum number of speakers
min_speakers (Type: integer): Minimum number of speakers
num_speakers (Type: integer): Number of speakers (if known)
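The four inputs above map onto a request payload. `build_diarization_input` below is a hypothetical helper (not part of the model or any client library) that assembles such a payload with the documented field names. Its checks (positive counts, `min_speakers` <= `max_speakers`, and `num_speakers` lying inside any given bounds) are assumptions about sensible usage, not constraints stated by the model.

```python
from typing import Any, Optional

# Default taken from the input table above.
DEFAULT_AUDIO = "https://r2.getcastify.com/lex_ai_john_carmack_1.wav"


def build_diarization_input(
    audio: str = DEFAULT_AUDIO,
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict with the documented field names.

    The validation rules are assumptions, not checks performed by the model.
    """
    counts = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    for name, value in counts.items():
        if value is not None and value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")

    # Treat missing bounds as open-ended.
    low = min_speakers if min_speakers is not None else 1
    high = max_speakers if max_speakers is not None else float("inf")
    if low > high:
        raise ValueError("min_speakers cannot exceed max_speakers")
    if num_speakers is not None and not low <= num_speakers <= high:
        raise ValueError("num_speakers must lie within min_speakers..max_speakers")

    # Only include the optional fields that were actually set.
    payload: dict[str, Any] = {"audio": audio}
    payload.update({k: v for k, v in counts.items() if v is not None})
    return payload


print(build_diarization_input(min_speakers=2, max_speakers=4))
```

With `min_speakers=2, max_speakers=4`, the printed dict contains the default audio URL plus the two bounds; `num_speakers` is omitted because it was not set.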

Output Schema

Output

Type: string • Format: uri
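Because the output is a plain string in URI format, a client mostly needs to validate it and fetch the file it points to. The sketch below uses only the Python standard library; `output_filename` and `download_output` are hypothetical helpers. The contents and file type of the output file are not documented on this page, so nothing here parses it, and the URI in the example is a placeholder.

```python
import posixpath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def output_filename(uri: str) -> str:
    """Return the last path component of an output URI, checking it is http(s)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URI, got {uri!r}")
    name = posixpath.basename(parsed.path)
    if not name:
        raise ValueError(f"URI has no file name component: {uri!r}")
    return name


def download_output(uri: str) -> str:
    """Save the file behind the output URI into the current directory."""
    local_path = output_filename(uri)
    urlretrieve(uri, local_path)  # network call; not exercised in the example below
    return local_path


# Placeholder URI for illustration; real outputs will differ.
print(output_filename("https://example.com/outputs/output-file"))  # output-file
```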

Example Execution Logs

Preprocessing audio file: /tmp/tmps19xjtlulex_ai_john_carmack_1.wav
pre-processing audio file...
Running speaker diarization...
/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/pyannote/audio/utils/reproducibility.py:74: ReproducibilityWarning: TensorFloat-32 (TF32) has been disabled as it might lead to reproducibility issues and lower accuracy.
It can be re-enabled by calling
>>> import torch
>>> torch.backends.cuda.matmul.allow_tf32 = True
>>> torch.backends.cudnn.allow_tf32 = True
See https://github.com/pyannote/pyannote-audio/issues/1370 for more details.
  warnings.warn(
/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at ../aten/src/ATen/native/ReduceOps.cpp:1823.)
  std = sequences.std(dim=-1, correction=1)
Post-processing diarization results...
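The `std(): degrees of freedom is <= 0` warning comes from the corrected standard deviation, which divides by n - correction instead of n. With `correction=1` (Bessel's correction) and a sequence of length 1 along the reduced dimension, the divisor is zero; PyTorch returns NaN in that case and emits this warning. Which inputs produce single-frame sequences inside pyannote (for example, very short segments) is an assumption. The pure-Python helper below, `corrected_std`, is hypothetical and only illustrates the arithmetic; it is not pyannote's code.

```python
import math


def corrected_std(values: list[float], correction: int = 1) -> float:
    """Standard deviation with a configurable degrees-of-freedom correction.

    With correction=1 the divisor is n - 1, mirroring
    `sequences.std(dim=-1, correction=1)` in the log above.
    """
    n = len(values)
    dof = n - correction
    if dof <= 0:
        # The case the UserWarning reports: no degrees of freedom left.
        return math.nan
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / dof)


print(corrected_std([2.0, 4.0, 6.0]))  # 2.0
print(corrected_std([3.0]))            # nan
```

For `[2.0, 4.0, 6.0]` the squared deviations from the mean (4.0) sum to 8.0, and dividing by n - 1 = 2 gives a variance of 4.0, so the standard deviation is 2.0.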

Version Details

Version ID: c58b6b038f6de30f93eaccd6aecb59d1b9a48ac13b22be000bcffe853efb2c20
Version Created: June 3, 2025
