nicknaskida/incredibly-fast-whisper

438 runs · Sep 2024 · Cog 0.9.20

Tags: speaker-diarization, speech-to-text

Performance

Typical run time: 4.5 s

Cold start (first call): ~88 s

Total runs: 438

About

whisper-large-v3, incredibly fast, with speaker diarization, powered by Hugging Face Transformers! 🤗

Example Output

{
  "text": " the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit hours fly by much too soon. The room was crowded with a mild wab. The room was crowded with a wild mob. This strong arm shall shield your honour. She blushed when he gave her a white orchid The beetle droned in the hot June sun",
  "chunks": [
    {
      "text": " the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit hours fly by much too soon. The room was crowded",
      "timestamp": [0, 29.72]
    },
    {
      "text": " with a mild wab. The room was crowded with a wild mob. This strong arm shall shield your",
      "timestamp": [29.72, 38.98]
    },
    {
      "text": " honour. She blushed when he gave her a white orchid The beetle droned in the hot June sun",
      "timestamp": [38.98, 48.52]
    }
  ]
}
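Each entry in the chunks array pairs a text segment with a [start, end] timestamp in seconds, which maps directly onto subtitle formats. The following is a minimal sketch, assuming you already have the output as a parsed dict; format_srt_time and chunks_to_srt are hypothetical helpers, not part of the model, and the sample reuses two chunks from the example output above.

```python
def format_srt_time(seconds: float) -> str:
    """Convert a time in seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(output: dict) -> str:
    """Render the "chunks" list of a transcription result as numbered SRT cues."""
    cues = []
    for index, chunk in enumerate(output["chunks"], start=1):
        start, end = chunk["timestamp"]
        # Assumption: the end time of the last chunk may be None; reuse the start time.
        if end is None:
            end = start
        cues.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # A blank line separates consecutive SRT cues.
    return "\n".join(cues)


# Two chunks copied from the example output above.
sample = {
    "chunks": [
        {"text": " with a mild wab. The room was crowded with a wild mob. This strong arm shall shield your",
         "timestamp": [29.72, 38.98]},
        {"text": " honour. She blushed when he gave her a white orchid The beetle droned in the hot June sun",
         "timestamp": [38.98, 48.52]},
    ]
}
print(chunks_to_srt(sample))
```

Rounding to whole milliseconds before splitting avoids float artifacts such as 29.72 s turning into 29,719 ms.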

Performance Metrics

Prediction time: 4.54 s

Total time: 88.20 s

All Input Parameters

{
  "task": "transcribe",
  "audio": "https://replicate.delivery/pbxt/Js2Fgx9MSOCzdTnzHQLJXj7abLp3JLIG3iqdsYXV24tHIdk8/OSR_uk_000_0050_8k.wav",
  "language": "None",
  "timestamp": "chunk",
  "batch_size": 24,
  "diarise_audio": false
}
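The same inputs can be submitted from Python with Replicate's official replicate client, whose replicate.run takes an "owner/model:version" reference and an input dict. This is a sketch under assumptions: the client is installed (pip install replicate), REPLICATE_API_TOKEN is set, and build_input is a hypothetical helper. The remote call is skipped when no token is present.

```python
import os

# Model reference pinned to the version ID listed under "Version Details".
MODEL_REF = (
    "nicknaskida/incredibly-fast-whisper:"
    "968947af412ab5fc4574dde1bcaf09ae6b2c925ca8817c431f8e73ae61883c67"
)

# Audio file used in the example inputs above.
EXAMPLE_AUDIO = (
    "https://replicate.delivery/pbxt/"
    "Js2Fgx9MSOCzdTnzHQLJXj7abLp3JLIG3iqdsYXV24tHIdk8/OSR_uk_000_0050_8k.wav"
)


def build_input(audio_url, task="transcribe", language="None",
                timestamp="chunk", batch_size=24, diarise_audio=False):
    """Hypothetical helper: assemble an input dict matching the example parameters."""
    return {
        "task": task,
        "audio": audio_url,
        "language": language,  # the string "None" requests language detection
        "timestamp": timestamp,
        "batch_size": batch_size,
        "diarise_audio": diarise_audio,
    }


# Only call the API when run as a script and a token is configured.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # official Replicate Python client

    output = replicate.run(MODEL_REF, input=build_input(EXAMPLE_AUDIO))
    print(output["text"])
```

For diarisation, pass diarise_audio=True and add an "hf_token" entry to the returned dict.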

Input Parameters

task (string, default: transcribe): Task to perform: transcribe, or translate (Whisper's translate task produces English text).
audio (string, required): Audio file to process; the example inputs above pass it as a URL.
hf_token (string): A Hugging Face access token (create one at hf.co/settings/token) that pyannote.audio uses to diarise the audio. You must first accept the terms at 'https://huggingface.co/pyannote/speaker-diarization-3.1' and 'https://huggingface.co/pyannote/segmentation-3.0'.
language (string, default: None): Language spoken in the audio. Set it to 'None' to detect the language automatically.
timestamp (string, default: chunk): Timestamp granularity. Whisper supports both chunk-level and word-level timestamps.
batch_size (integer, default: 24): Number of audio chunks processed in parallel. Reduce it if you run out of GPU memory (OOM errors).
max_speakers (integer, minimum 1, default: None): Maximum number of speakers to consider in the audio file. Cannot be combined with num_speakers, and must not be less than min_speakers.
min_speakers (integer, minimum 1, default: None): Minimum number of speakers to consider in the audio file. Cannot be combined with num_speakers, and must not be greater than max_speakers.
num_speakers (integer, minimum 1, default: None): Exact number of speakers in the audio file; useful when the number of participants in the conversation is known. Cannot be combined with min_speakers or max_speakers.
diarise_audio (boolean, default: false): Use pyannote.audio to diarise the audio (label which speaker is talking when). Requires hf_token.
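The speaker options interact, so it can help to check a combination before submitting a prediction. The helper below is hypothetical (speaker_option_errors is not part of the model) and encodes only the constraints listed above, returning a list of violations that is empty when the options are valid.

```python
from typing import List, Optional


def speaker_option_errors(
    diarise_audio: bool = False,
    hf_token: Optional[str] = None,
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> List[str]:
    """Return the documented constraints that a set of speaker options violates."""
    errors = []
    # Diarisation runs pyannote.audio, which needs a Hugging Face token.
    if diarise_audio and not hf_token:
        errors.append("diarise_audio requires hf_token")
    # Every speaker count, when given, must be at least 1.
    for name, value in (("num_speakers", num_speakers),
                        ("min_speakers", min_speakers),
                        ("max_speakers", max_speakers)):
        if value is not None and value < 1:
            errors.append(f"{name} must be at least 1")
    # An exact count excludes the min/max bounds.
    if num_speakers is not None and (min_speakers is not None or max_speakers is not None):
        errors.append("num_speakers cannot be combined with min_speakers or max_speakers")
    # The bounds must be ordered.
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        errors.append("min_speakers cannot be greater than max_speakers")
    return errors


# "hf_placeholder" stands in for a real token.
print(speaker_option_errors(diarise_audio=True, hf_token="hf_placeholder",
                            min_speakers=2, max_speakers=4))  # → []
print(speaker_option_errors(num_speakers=2, max_speakers=3))
```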

Example Execution Logs

Voila!✨ Your file has been transcribed!

Version Details

Version ID: 968947af412ab5fc4574dde1bcaf09ae6b2c925ca8817c431f8e73ae61883c67
Version Created: September 8, 2024
