konieshadow/speaker-diarization

31 runs | Jun 2025 | Cog 0.15.2
audio-analysis speaker-diarization

About

Speaker Diarization with "pyannote/speaker-diarization-3.1"

Example Output

Performance Metrics

20.35s Prediction Time
183.62s Total Time

Input Parameters

audio (Type: string; Default: https://r2.getcastify.com/lex_ai_john_carmack_1.wav)
Audio file
max_speakers (Type: integer)
Maximum number of speakers
min_speakers (Type: integer)
Minimum number of speakers
num_speakers (Type: integer)
Number of speakers (if known)
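
The parameters above can be passed to the model through the Replicate Python client. The following is a minimal sketch, not the author's reference usage: the helper names build_input and run_diarization are hypothetical, the client-side range checks are an assumption added for illustration (the page does not document how the model validates these values), and the call assumes the third-party replicate package is installed with REPLICATE_API_TOKEN set in the environment.

```python
from typing import Any, Dict, Optional

# Model reference built from the owner/name and the Version ID listed on this page.
MODEL_REF = (
    "konieshadow/speaker-diarization:"
    "c58b6b038f6de30f93eaccd6aecb59d1b9a48ac13b22be000bcffe853efb2c20"
)


def build_input(
    audio: str,
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the input payload, leaving out speaker hints that are not set.

    The checks below are illustrative client-side sanity checks (an assumption),
    not documented behavior of the model.
    """
    hints = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    for name, value in hints.items():
        if value is not None and value < 1:
            raise ValueError(f"{name} must be a positive integer")
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("min_speakers cannot exceed max_speakers")

    payload: Dict[str, Any] = {"audio": audio}
    payload.update({k: v for k, v in hints.items() if v is not None})
    return payload


def run_diarization(payload: Dict[str, Any]) -> Any:
    """Send the payload to Replicate and return whatever the client yields."""
    # Imported lazily so build_input can be used without the client installed.
    import replicate

    return replicate.run(MODEL_REF, input=payload)


if __name__ == "__main__":
    example = build_input(
        "https://r2.getcastify.com/lex_ai_john_carmack_1.wav",
        min_speakers=1,
        max_speakers=3,
    )
    print(example)
    # Uncomment to make a real (billable) API call:
    # print(run_diarization(example))
```

Depending on the client version, the returned value may be a plain URL string or a file-like object wrapping that URL, so it is worth inspecting before use.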
Output Schema

Output

Type: string, Format: uri
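
Because the output is a URI rather than inline data, the result has to be fetched in a second step. The sketch below uses only the Python standard library; save_output is a hypothetical helper name, and since this page does not state the format of the file behind the URI, the bytes are saved as-is without parsing.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri: str, dest: str) -> Path:
    """Stream the resource at `uri` into the file `dest` and return its path."""
    target = Path(dest)
    # Copy in chunks so large result files are not held in memory at once.
    with urllib.request.urlopen(uri) as response, target.open("wb") as fh:
        shutil.copyfileobj(response, fh)
    return target
```

A call such as save_output(output_uri, "diarization_result") would write the result to the working directory, where output_uri stands for the string returned by the prediction.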

Example Execution Logs
Preprocessing audio file: /tmp/tmps19xjtlulex_ai_john_carmack_1.wav
pre-processing audio file...
Running speaker diarization...
/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/pyannote/audio/utils/reproducibility.py:74: ReproducibilityWarning: TensorFloat-32 (TF32) has been disabled as it might lead to reproducibility issues and lower accuracy.
It can be re-enabled by calling
>>> import torch
>>> torch.backends.cuda.matmul.allow_tf32 = True
>>> torch.backends.cudnn.allow_tf32 = True
See https://github.com/pyannote/pyannote-audio/issues/1370 for more details.
warnings.warn(
/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at ../aten/src/ATen/native/ReduceOps.cpp:1823.)
std = sequences.std(dim=-1, correction=1)
Post-processing diarization results...
Version Details
Version ID
c58b6b038f6de30f93eaccd6aecb59d1b9a48ac13b22be000bcffe853efb2c20
Version Created
June 3, 2025