hovevideo/stable-whisper 📝❓ → 🖼️

▶️ 10.5K runs 📅 May 2024 ⚙️ Cog 0.9.7 🔗 GitHub

speech-to-text video-auto-captioning video-to-text

Performance

8.9s Typical run time

~110s Cold start (first call)

10.5K Total runs

About

Transcribes audio using OpenAI's Whisper, with timestamps stabilized by the stable-ts Python package.

Example Output

Performance Metrics

8.87s Prediction Time

110.33s Total Time

Input Parameters

url (required) • Type: string • Audio or video URL
output_format • Default: json • Output format: ass (ASS subtitles) or json (transcription in JSON format)
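
As a rough illustration of how these two parameters map onto a request, the sketch below builds (but does not send) a prediction request for Replicate's HTTP API using only the Python standard library. The helper name build_prediction_request and the example media URL are hypothetical; the version ID is the one listed under Version Details, and the token is assumed to come from a REPLICATE_API_TOKEN environment variable.

```python
import json
import os
import urllib.request

# Version ID as listed on this page.
VERSION = "a1697797eeccbcfc1955282a28b9aa1120335841bc8717911da8cd1e07ffefab"
ALLOWED_FORMATS = {"ass", "json"}


def build_prediction_request(media_url, output_format="json", token=None):
    """Build (but do not send) a POST request that creates a prediction.

    Hypothetical helper for illustration; not part of the model itself.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    payload = {
        "version": VERSION,
        "input": {"url": media_url, "output_format": output_format},
    }
    # Assumption: the API token is supplied via this environment variable.
    token = token or os.environ.get("REPLICATE_API_TOKEN", "")
    return urllib.request.Request(
        "https://api.replicate.com/v1/predictions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical media URL, for illustration only.
    req = build_prediction_request("https://example.com/clip.mp4", "ass")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit: urllib.request.urlopen(req), with a valid token set.
```

Validating output_format before building the request keeps an unsupported value such as "srt" from costing a cold start before it fails.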

Output Schema

Output

Type: string • Format: uri

Example Execution Logs

Transcribe:   0%|          | 0/52.42 [00:00<?, ?sec/s]
Detected language: english
Transcribe:   0%|          | 0/52.42 [00:01<?, ?sec/s]
Transcribe:  55%|█████▌    | 28.88/52.42 [00:06<00:05,  4.61sec/s]
Transcribe:  89%|████████▉ | 46.84/52.42 [00:07<00:00,  6.95sec/s]
Transcribe: 100%|█████████▉| 52.41/52.42 [00:08<00:00,  6.87sec/s]
Transcribe: 100%|█████████▉| 52.41/52.42 [00:08<00:00,  6.37sec/s]
Saved: /src/input.ass
Saved: /src/input.json
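
The progress lines above carry enough information to estimate transcription throughput. The following is a minimal sketch, assuming the log keeps the "done/total [MM:SS<..." layout shown here; parse_progress is a hypothetical helper, and the elapsed time in the log is rounded to whole seconds, so the result is only approximate.

```python
import re

# Matches the "done/total" pair and the "[MM:SS<" elapsed field of a progress line.
# Assumption: elapsed time is printed as MM:SS (runs over an hour are not handled).
PROGRESS_RE = re.compile(r"\|\s*([\d.]+)/([\d.]+)\s*\[(\d+):(\d+)<")


def parse_progress(line):
    """Return (seconds_done, seconds_total, elapsed_seconds), or None if no match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    done, total = float(m.group(1)), float(m.group(2))
    elapsed = int(m.group(3)) * 60 + int(m.group(4))
    return done, total, elapsed


if __name__ == "__main__":
    line = "Transcribe: 100%|█████████▉| 52.41/52.42 [00:08<00:00,  6.37sec/s]"
    done, total, elapsed = parse_progress(line)
    if elapsed > 0:
        # Seconds of audio processed per wall-clock second, averaged over the run.
        print(f"{done / elapsed:.2f} audio sec per sec")
```

For the final progress line in the log, this works out to roughly 6.6 seconds of audio per second of wall-clock time.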

Version Details

Version ID: a1697797eeccbcfc1955282a28b9aa1120335841bc8717911da8cd1e07ffefab
Version Created: May 9, 2024
