daanelson/whisper-train-preprocessor
About
Dataset preprocessing code for Whisper fine-tuning.
Performance Metrics
- Prediction Time: 3.68s
- Total Time: 178.62s
Input Parameters
- jsonl_data: JSONL file with a list of records of the form {'audio': <audio_url>, 'sentence': <transcription>}
- text_files: tarball with a list of transcriptions
- audio_files: tarball with a list of audio files
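As an illustration of the jsonl_data input, here is a minimal Python sketch that writes and reads back a file in the shape described above. It assumes the usual JSON Lines convention of one JSON object per line; the helper names (write_jsonl, read_jsonl) and the example.com URLs are hypothetical and are not part of this model's code.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write records to `path`, one JSON object per line.

    Each record must carry the two keys named in the input description:
    'audio' (a URL to an audio file) and 'sentence' (its transcription).
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            missing = {"audio", "sentence"} - set(rec)
            if missing:
                raise ValueError(f"record is missing keys: {sorted(missing)}")
            row = {"audio": rec["audio"], "sentence": rec["sentence"]}
            # ensure_ascii=False keeps non-English transcriptions readable.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Placeholder records; replace the URLs and text with real data.
records = [
    {"audio": "https://example.com/clip_0001.wav", "sentence": "hello world"},
    {"audio": "https://example.com/clip_0002.wav", "sentence": "good morning"},
]
```

Calling write_jsonl(records, "train.jsonl") would produce a file that could then be supplied as jsonl_data. Whether the model expects any fields beyond these two is not stated here.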
Output Schema
Output
Example Execution Logs
Casting the dataset: 0%| | 0/10 [00:00<?, ? examples/s]
Saving the dataset (0/1 shards): 0%| | 0/10 [00:00<?, ? examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 10/10 [00:00<00:00, 1396.61 examples/s]
Dataset built
Version Details
- Version ID: ffc5742275c72528330809f1572e7cc0ed1f39325dc7c8ed1ff66480a1314473
- Version Created: July 11, 2023