meronym/speaker-transcription

28.5K runs · Apr 2023 · Cog 0.6.1

audio-embedding speaker-diarization speech-to-text

Performance

65.3s Typical run time

28.5K Total runs

About

Whisper transcription plus speaker diarization

Example Output

Performance Metrics

65.26s Prediction Time

65.57s Total Time

Input Parameters

audio (required). Type: string. The audio file to process.
prompt (optional). Type: string. Text to provide as a prompt for each Whisper model call.

Output Schema

Output

Type: string • Format: uri
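
The inputs and output above could be exercised over Replicate's HTTP predictions endpoint roughly as follows. This is a minimal sketch, not the author's own client code: it assumes the audio is passed as a publicly reachable URL, that the token is sent as a Bearer header, and the helper names (build_prediction_payload, create_prediction) are hypothetical. The version ID is the one listed under Version Details.

```python
import json
import urllib.request

# Assumption: Replicate's predictions endpoint; check the current API docs.
API_URL = "https://api.replicate.com/v1/predictions"
# Version ID as listed on this page.
VERSION_ID = "9950ee297f0fdad8736adf74ada54f63cc5b5bdfd5b2187366910ed5baf1a7a1"


def build_prediction_payload(audio_url, prompt=None):
    """Build the JSON body: 'audio' is required, 'prompt' is optional."""
    inputs = {"audio": audio_url}
    if prompt:
        inputs["prompt"] = prompt
    return {"version": VERSION_ID, "input": inputs}


def create_prediction(api_token, audio_url, prompt=None):
    """Submit a prediction (hypothetical helper; performs a network call)."""
    body = json.dumps(build_prediction_payload(audio_url, prompt)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            # Assumption: Bearer-style token header.
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Per the output schema, a finished prediction should yield a single string formatted as a URI, so the caller would still need to download that URI to read the transcript itself.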

Example Execution Logs

pre-processing audio file...
diarizing audio file...
post-processing diarization...
transcribing segments...
transcribing segment 0:00:00.497812 to 0:00:09.779063
transcribing segment 0:00:09.863438 to 0:03:34.962188
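
The "transcribing segment" lines above carry the start and end offsets of each diarized segment. A small sketch of how such logs might be parsed to recover segment durations, assuming the offsets always follow the H:MM:SS[.ffffff] shape seen here (the function names are hypothetical):

```python
import re
from datetime import timedelta

SEGMENT_RE = re.compile(r"transcribing segment (\S+) to (\S+)")
# Offsets look like Python's str(timedelta): H:MM:SS with optional microseconds.
OFFSET_RE = re.compile(r"(\d+):(\d{2}):(\d{2})(?:\.(\d{1,6}))?")


def parse_offset(text):
    """Convert an 'H:MM:SS[.ffffff]' string into a timedelta."""
    match = OFFSET_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized offset: {text!r}")
    hours, minutes, seconds, fraction = match.groups()
    # Right-pad the fraction so '5' means 500000 microseconds.
    microseconds = int((fraction or "0").ljust(6, "0"))
    return timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        microseconds=microseconds,
    )


def parse_segments(log_text):
    """Return a list of (start, end) timedeltas, one per logged segment."""
    return [
        (parse_offset(start), parse_offset(end))
        for start, end in SEGMENT_RE.findall(log_text)
    ]


if __name__ == "__main__":
    logs = (
        "transcribing segment 0:00:00.497812 to 0:00:09.779063\n"
        "transcribing segment 0:00:09.863438 to 0:03:34.962188\n"
    )
    for start, end in parse_segments(logs):
        print(start, end, (end - start).total_seconds())
```

For the two segments logged here, this gives durations of about 9.28 s and 205.10 s.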

Version Details

Version ID: 9950ee297f0fdad8736adf74ada54f63cc5b5bdfd5b2187366910ed5baf1a7a1
Version Created: April 27, 2023
