nicknaskida/incredibly-fast-whisper
About
whisper-large-v3, incredibly fast, with speaker diarization, powered by Hugging Face Transformers! 🤗
Example Output
Output
{"text":" the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit hours fly by much too soon. The room was crowded with a mild wab. The room was crowded with a wild mob. This strong arm shall shield your honour. She blushed when he gave her a white orchid The beetle droned in the hot June sun","chunks":[{"text":" the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit hours fly by much too soon. The room was crowded","timestamp":[0,29.72]},{"text":" with a mild wab. The room was crowded with a wild mob. This strong arm shall shield your","timestamp":[29.72,38.98]},{"text":" honour. She blushed when he gave her a white orchid The beetle droned in the hot June sun","timestamp":[38.98,48.52]}]}
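Each entry in `chunks` above pairs a text segment with a `[start, end]` timestamp in seconds. The following is a minimal Python sketch of turning that structure into readable lines, assuming the output has already been loaded as a dict. The helper names `format_timestamp` and `format_chunks` are illustrative and not part of the model's API, and the sample is an abbreviated copy of the example output.

```python
import json

def format_timestamp(seconds):
    """Render a time in seconds as MM:SS.ss, e.g. 29.72 -> '00:29.72'."""
    minutes, secs = divmod(float(seconds), 60)
    return f"{int(minutes):02d}:{secs:05.2f}"

def format_chunks(output):
    """Return one '[start -> end] text' line per chunk in the model output."""
    lines = []
    for chunk in output.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text}")
    return lines

# Abbreviated sample in the same shape as the example output (one chunk only).
sample = json.loads(
    '{"text": " with a mild wab.",'
    ' "chunks": [{"text": " with a mild wab.", "timestamp": [29.72, 38.98]}]}'
)

for line in format_chunks(sample):
    print(line)
```

The top-level `text` field holds the full transcript, so the chunk list is only needed when timing information matters.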
Performance Metrics
- Prediction Time: 4.54s
- Total Time: 88.20s
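To put these two figures in context, the short Python sketch below relates them to the length of the audio. It assumes the audio duration equals the end timestamp of the last chunk in the example output (48.52 s), which the page does not state explicitly. The page also does not say what the time outside prediction consists of, so it is reported here only as a difference.

```python
prediction_time_s = 4.54   # from Performance Metrics
total_time_s = 88.20       # from Performance Metrics
audio_duration_s = 48.52   # assumption: end timestamp of the last chunk

# How many seconds of audio are processed per second of prediction time.
speedup = audio_duration_s / prediction_time_s

# Time spent outside the prediction itself (cause not stated on the page).
overhead_s = total_time_s - prediction_time_s

print(f"Speed relative to real time: {speedup:.1f}x")
print(f"Time outside prediction: {overhead_s:.2f}s")
```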
All Input Parameters
{
"task": "transcribe",
"audio": "https://replicate.delivery/pbxt/Js2Fgx9MSOCzdTnzHQLJXj7abLp3JLIG3iqdsYXV24tHIdk8/OSR_uk_000_0050_8k.wav",
"language": "None",
"timestamp": "chunk",
"batch_size": 24,
"diarise_audio": false
}
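A payload like the one above could be assembled programmatically. The sketch below is one hedged way to do that in Python. The function name `build_input` is hypothetical, the default values are simply copied from this example run rather than taken from documented defaults, and `"word"` as the second accepted `timestamp` value is an assumption based on the parameter description.

```python
def build_input(audio_url, task="transcribe", language="None",
                timestamp="chunk", batch_size=24,
                diarise_audio=False, hf_token=None):
    """Build an input dict shaped like the example run's parameters."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    # Assumption: 'chunk' and 'word' are the two accepted timestamp values.
    if timestamp not in ("chunk", "word"):
        raise ValueError("timestamp must be 'chunk' or 'word'")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The parameter docs say diarisation needs a Hugging Face token.
    if diarise_audio and not hf_token:
        raise ValueError("diarise_audio=True requires hf_token")

    payload = {
        "task": task,
        "audio": audio_url,
        "language": language,
        "timestamp": timestamp,
        "batch_size": batch_size,
        "diarise_audio": diarise_audio,
    }
    if hf_token:
        payload["hf_token"] = hf_token
    return payload

# Placeholder URL for illustration only.
payload = build_input("https://example.com/audio.wav")
print(payload)
```

With the `replicate` Python client installed and an API token configured, a dict like this would typically be passed as the `input` argument of `replicate.run`, together with the model name and the version ID listed under Version Details.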
Input Parameters
- task
- Task to perform: transcribe the audio, or translate it into English.
- audio (required)
- Audio file
- hf_token
- Provide a Hugging Face access token (from hf.co/settings/token) so that Pyannote.audio can diarise the audio clips. You must first agree to the terms at 'https://huggingface.co/pyannote/speaker-diarization-3.1' and 'https://huggingface.co/pyannote/segmentation-3.0'.
- language
- Language spoken in the audio. Specify 'None' to perform language detection.
- timestamp
- Timestamp granularity. Whisper supports both chunk-level and word-level timestamps.
- batch_size
- Number of parallel batches to compute. Reduce this if you run into out-of-memory (OOM) errors.
- max_speakers
- Maximum number of speakers the system should consider in the audio file. Must be at least 1. Cannot be used together with num_speakers, and must not be less than min_speakers. (default: None)
- min_speakers
- Minimum number of speakers the system should consider in the audio file. Must be at least 1. Cannot be used together with num_speakers, and must not be greater than max_speakers. (default: None)
- num_speakers
- Exact number of speakers present in the audio file. Useful when the exact number of participants in the conversation is known. Must be at least 1. Cannot be used together with min_speakers or max_speakers. (default: None)
- diarise_audio
- Use Pyannote.audio to diarise the audio clips. You will also need to provide hf_token.
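The constraints on the three speaker-count parameters can be checked before a request is submitted. The Python sketch below encodes the rules as stated in the descriptions above. The function `validate_speaker_options` is a hypothetical client-side helper, not something the model exposes, and the model's own validation may differ in detail.

```python
def validate_speaker_options(num_speakers=None, min_speakers=None,
                             max_speakers=None):
    """Check speaker-count options and return only the ones that were set."""
    options = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }

    # Each value, when given, must be at least 1.
    for name, value in options.items():
        if value is not None and value < 1:
            raise ValueError(f"{name} must be at least 1")

    # num_speakers cannot be combined with min_speakers or max_speakers.
    if num_speakers is not None and (
        min_speakers is not None or max_speakers is not None
    ):
        raise ValueError(
            "num_speakers cannot be used with min_speakers or max_speakers"
        )

    # min_speakers must not exceed max_speakers.
    if (min_speakers is not None and max_speakers is not None
            and min_speakers > max_speakers):
        raise ValueError("min_speakers cannot be greater than max_speakers")

    return {name: value for name, value in options.items() if value is not None}

print(validate_speaker_options(min_speakers=1, max_speakers=3))
```

Values left as `None` are dropped from the returned dict, mirroring the `(default: None)` noted for each parameter.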
Output Schema
Output
Example Execution Logs
Voila!✨ Your file has been transcribed!
Version Details
- Version ID: 968947af412ab5fc4574dde1bcaf09ae6b2c925ca8817c431f8e73ae61883c67
- Version Created: September 8, 2024