nicknaskida/incredibly-fast-whisper

329 runs, Sep 2024, Cog 0.9.20
speaker-diarization speech-to-text

About

whisper-large-v3, incredibly fast, with speaker diarization, powered by Hugging Face Transformers! 🤗

Example Output

Output

{"text":" the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit hours fly by much too soon. The room was crowded with a mild wab. The room was crowded with a wild mob. This strong arm shall shield your honour. She blushed when he gave her a white orchid The beetle droned in the hot June sun","chunks":[{"text":" the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit hours fly by much too soon. The room was crowded","timestamp":[0,29.72]},{"text":" with a mild wab. The room was crowded with a wild mob. This strong arm shall shield your","timestamp":[29.72,38.98]},{"text":" honour. She blushed when he gave her a white orchid The beetle droned in the hot June sun","timestamp":[38.98,48.52]}]}

Performance Metrics

4.54s Prediction Time
88.20s Total Time
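The example output above pairs each chunk's text with a `[start, end]` timestamp in seconds. A minimal sketch of reading that structure is shown here; the helper name `format_chunks` is hypothetical, the sample is two chunks copied from the example output, and the `None` check on the end time is a defensive assumption (the example above always carries both values).

```python
def format_chunks(output: dict) -> list[str]:
    """Render each chunk as '[start - end] text' from its timestamp pair."""
    lines = []
    for chunk in output.get("chunks", []):
        start, end = chunk["timestamp"]
        # Assumption: tolerate a missing end time rather than fail on it.
        end_label = f"{end:.2f}s" if end is not None else "end"
        lines.append(f"[{start:.2f}s - {end_label}] {chunk['text'].strip()}")
    return lines


# Two chunks taken verbatim from the example output above.
sample = {
    "chunks": [
        {
            "text": " with a mild wab. The room was crowded with a wild mob. This strong arm shall shield your",
            "timestamp": [29.72, 38.98],
        },
        {
            "text": " honour. She blushed when he gave her a white orchid The beetle droned in the hot June sun",
            "timestamp": [38.98, 48.52],
        },
    ]
}

for line in format_chunks(sample):
    print(line)
```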
All Input Parameters
{
  "task": "transcribe",
  "audio": "https://replicate.delivery/pbxt/Js2Fgx9MSOCzdTnzHQLJXj7abLp3JLIG3iqdsYXV24tHIdk8/OSR_uk_000_0050_8k.wav",
  "language": "None",
  "timestamp": "chunk",
  "batch_size": 24,
  "diarise_audio": false
}
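As a rough illustration, the input parameters above could be sent to this model version through Replicate's HTTP predictions endpoint. The sketch below only builds the request and does not send it. The endpoint URL and the `Bearer` authorization header reflect the public Replicate API as I understand it and should be checked against the current API reference; the function name `build_prediction_request` is hypothetical, and the version ID is the one listed under Version Details on this page.

```python
import json
import urllib.request

# Assumed endpoint for creating predictions; verify against Replicate's docs.
API_URL = "https://api.replicate.com/v1/predictions"
# Version ID as listed under Version Details on this page.
VERSION_ID = "968947af412ab5fc4574dde1bcaf09ae6b2c925ca8817c431f8e73ae61883c67"


def build_prediction_request(api_token: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the example inputs."""
    payload = {
        "version": VERSION_ID,
        "input": {
            "task": "transcribe",
            "audio": audio_url,
            "language": "None",  # the string 'None' requests language detection
            "timestamp": "chunk",
            "batch_size": 24,
            "diarise_audio": False,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the prediction. That step is left out here so the sketch stays free of network calls and credentials.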
Input Parameters
task Default: transcribe
Task to perform: transcribe the audio in its spoken language, or translate it (Whisper's translate task outputs English).
audio (required) Type: string
Audio file
hf_token Type: string
Hugging Face access token (create one at hf.co/settings/token) that Pyannote.audio uses to diarise the audio clips. You must first accept the terms at 'https://huggingface.co/pyannote/speaker-diarization-3.1' and 'https://huggingface.co/pyannote/segmentation-3.0'.
language Default: None
Language spoken in the audio. Specify 'None' to perform language detection.
timestamp Default: chunk
Timestamp granularity. Whisper supports both chunk-level and word-level timestamps.
batch_size Type: integer Default: 24
Number of parallel batches to compute. Reduce it if you run into out-of-memory (OOM) errors.
max_speakers Type: integer Range: 1 - ∞
Maximum number of speakers the system should consider in the audio file. Must be at least 1 and no less than min_speakers. Cannot be used together with num_speakers. (default: None)
min_speakers Type: integer Range: 1 - ∞
Minimum number of speakers the system should consider in the audio file. Must be at least 1 and no greater than max_speakers. Cannot be used together with num_speakers. (default: None)
num_speakers Type: integer Range: 1 - ∞
Exact number of speakers present in the audio file. Useful when the number of participants in the conversation is known. Must be at least 1. Cannot be used together with min_speakers or max_speakers. (default: None)
diarise_audio Type: boolean Default: false
Use Pyannote.audio to diarise the audio clips. Requires hf_token to be provided as well.
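The constraints stated for the speaker-count and diarisation parameters can be checked on the client before a request is made. The following is a minimal sketch of such a check, not the model's own validation logic; the function name `validate_diarisation_inputs` and the error messages are hypothetical.

```python
from typing import Optional


def validate_diarisation_inputs(
    diarise_audio: bool = False,
    hf_token: Optional[str] = None,
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> list[str]:
    """Return a list of rule violations; an empty list means the inputs look consistent."""
    errors = []
    # Each speaker count, when given, must be at least 1.
    for name, value in (
        ("num_speakers", num_speakers),
        ("min_speakers", min_speakers),
        ("max_speakers", max_speakers),
    ):
        if value is not None and value < 1:
            errors.append(f"{name} must be at least 1")
    # An exact count excludes a min/max range.
    if num_speakers is not None and (min_speakers is not None or max_speakers is not None):
        errors.append("num_speakers cannot be combined with min_speakers or max_speakers")
    # The range itself must be ordered.
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        errors.append("min_speakers cannot be greater than max_speakers")
    # Diarisation needs a Hugging Face token.
    if diarise_audio and not hf_token:
        errors.append("diarise_audio requires hf_token")
    return errors


print(validate_diarisation_inputs(diarise_audio=True, min_speakers=3, max_speakers=2))
```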
Output Schema

Output

Example Execution Logs
Voila!✨ Your file has been transcribed!
Version Details
Version ID
968947af412ab5fc4574dde1bcaf09ae6b2c925ca8817c431f8e73ae61883c67
Version Created
September 8, 2024