ibm-granite/granite-speech-3.3-8b
About
Granite-speech-3.3-8b is a compact and efficient speech-language model, specifically designed for automatic speech recognition (ASR) and automatic speech translation (AST).

Example Output
Prompt:
"Transcribe the speech into written form."
Output
after his nap timothy lazily stretched first one grey velvet foot then another strolled indolently to his plate turning over the food carefully selecting choice bits nosing out that which he scorned upon the clean hearth
Performance Metrics
- Prediction Time: 9.54s
- Total Time: 385.33s
All Input Parameters
{
  "audio": [
    "https://replicate.delivery/pbxt/NMdAjCoC0WiNKkHIIbSsmssPEXujCRSDIjg9LlJYkt5BGs8d/10226_10111_000000.wav"
  ],
  "top_k": 50,
  "top_p": 0.9,
  "prompt": "Transcribe the speech into written form.",
  "max_tokens": 512,
  "min_tokens": 0,
  "temperature": 0.6,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
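As a sketch of how these parameters could be passed to the model, the snippet below builds the same input payload and, if a `REPLICATE_API_TOKEN` is configured, runs it through Replicate's Python client. The client call is standard `replicate.run` usage; the audio URL and parameter values are copied from the example above.

```python
import os

# Input payload mirroring the "All Input Parameters" example above.
payload = {
    "audio": [
        "https://replicate.delivery/pbxt/NMdAjCoC0WiNKkHIIbSsmssPEXujCRSDIjg9LlJYkt5BGs8d/10226_10111_000000.wav"
    ],
    "prompt": "Transcribe the speech into written form.",
    "top_k": 50,
    "top_p": 0.9,
    "temperature": 0.6,
    "max_tokens": 512,
    "min_tokens": 0,
    "presence_penalty": 0,
    "frequency_penalty": 0,
}

# Only attempt the remote call when an API token is available.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate

    output = replicate.run("ibm-granite/granite-speech-3.3-8b", input=payload)
    print(output)
```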
Input Parameters
- seed: Random seed. Leave blank to randomize the seed.
- audio: Audio inputs for the model.
- top_k: The number of highest-probability tokens to consider when generating output. If > 0, only the top k most probable tokens are kept (top-k filtering).
- top_p: A probability threshold for generating the output. If < 1.0, only the smallest set of top tokens with cumulative probability >= top_p is kept (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
- prompt: User prompt to send to the model.
- max_tokens: The maximum number of tokens the model should generate as output.
- min_tokens: The minimum number of tokens the model should generate as output.
- temperature: The value used to modulate the next-token probabilities.
- chat_template: A template to format the prompt with. If not provided, the default prompt template is used.
- system_prompt: System prompt to send to the model. The chat template provides a good default.
- stop_sequences: A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
- presence_penalty: Presence penalty; positive values penalize tokens that have already appeared in the output, encouraging the model to introduce new tokens.
- frequency_penalty: Frequency penalty; positive values penalize tokens in proportion to how often they have already appeared in the output.
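The interaction between `top_k` and `top_p` can be illustrated with a small sketch: apply top-k first, then cut the remaining tokens off at the nucleus threshold and renormalize. The helper below is illustrative only, not the model's actual sampling code (which the logs show is handled by vLLM's `SamplingParams`).

```python
import math

def top_k_top_p_filter(logits, top_k=50, top_p=0.9):
    """Sketch of top-k followed by nucleus (top-p) filtering over raw logits."""
    # Softmax to turn logits into probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Visit token indices in order of descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cum = 0.0
    for rank, i in enumerate(order):
        if top_k > 0 and rank >= top_k:
            break  # top-k cutoff
        keep.add(i)
        cum += probs[i]
        if cum >= top_p:
            break  # nucleus cutoff: smallest set with mass >= top_p
    # Renormalize the surviving probabilities to sum to 1.
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}
```

With `top_p=1.0` the nucleus cutoff never triggers and only `top_k` limits the candidate set; with a small `top_p`, a few high-probability tokens can dominate and the rest are dropped regardless of `top_k`.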
Output Schema
Example Execution Logs
2025-07-15 20:43:02 [info ] predict() commencing request_id=1 user_prompt=Transcribe the speech into written form.
2025-07-15 20:43:02 [debug ] Formatted prompt using chat template formatted_prompt=<|start_of_role|>system<|end_of_role|>Knowledge Cutoff Date: April 2024. Today's Date: July 15, 2025. You are Granite, developed by IBM. You are a helpful AI assistant.<|end_of_text|> <|start_of_role|>user<|end_of_role|><|audio|>Transcribe the speech into written form.<|end_of_text|> <|start_of_role|>assistant<|end_of_role|> request_id=1 user_prompt=Transcribe the speech into written form.
2025-07-15 20:43:02 [debug ] SamplingParams request_id=1 sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.9, top_k=50, min_p=0.0, seed=None, stop=[], stop_token_ids=[0], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None) user_prompt=Transcribe the speech into written form.
2025-07-15 20:43:10 [debug ] LoRARequest lora_request=LoRARequest(lora_name='speech', lora_int_id=1, lora_path='/src/weights', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None) request_id=1 user_prompt=Transcribe the speech into written form.
INFO 07-15 20:43:10 [async_llm.py:270] Added request 1.
2025-07-15 20:43:11 [debug ] result finish_reason=stop request_id=1 text=after his nap timothy lazily stretched first one grey velvet foot then another strolled indolently to his plate turning over the food carefully selecting choice bits nosing out that which he scorned upon the clean hearth user_prompt=Transcribe the speech into written form.
2025-07-15 20:43:11 [info ] Generation took 9.37s request_id=1 user_prompt=Transcribe the speech into written form.
/root/.pyenv/versions/3.12.11/lib/python3.12/site-packages/cog/server/scope.py:22: ExperimentalFeatureWarning: current_scope is an experimental internal function. It may change or be removed without warning. warnings.warn(
2025-07-15 20:43:11 [info ] predict() complete request_id=1 user_prompt=Transcribe the speech into written form.
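The formatted prompt in the log above can be reconstructed with a small helper. The role and `<|audio|>` markers are copied verbatim from the log; the function itself is hypothetical, shown only to make the template's structure explicit.

```python
def format_granite_prompt(system_prompt: str, user_prompt: str) -> str:
    """Assemble a Granite-style chat prompt as seen in the execution log."""
    return (
        f"<|start_of_role|>system<|end_of_role|>{system_prompt}<|end_of_text|>\n"
        # The <|audio|> placeholder marks where the audio input is injected.
        f"<|start_of_role|>user<|end_of_role|><|audio|>{user_prompt}<|end_of_text|>\n"
        f"<|start_of_role|>assistant<|end_of_role|>"
    )

prompt = format_granite_prompt(
    "You are Granite, developed by IBM. You are a helpful AI assistant.",
    "Transcribe the speech into written form.",
)
```

The trailing `<|start_of_role|>assistant<|end_of_role|>` leaves the prompt open for the model to generate the assistant turn.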
Version Details
- Version ID: 688e7a943167401c310f0975cb68f1a35e0bddc3b65f60bde89c37860e07edf1
- Version Created: July 31, 2025