ibm-granite/granite-speech-3.3-8b 🔢🖼️📝 → 📝

⭐ Official ▶️ 12.9K runs 📅 Jul 2025 ⚙️ Cog 0.16.1 🔗 GitHub 📄 Paper ⚖️ License
speech-to-text speech-translation text-generation

About

Granite-speech-3.3-8b is a compact and efficient speech-language model, specifically designed for automatic speech recognition (ASR) and automatic speech translation (AST).

Example Output

Prompt:

"Transcribe the speech into written form."

Output

after his nap timothy lazily stretched first one grey velvet foot then another strolled indolently to his plate turning over the food carefully selecting choice bits nosing out that which he scorned upon the clean hearth

Performance Metrics

Prediction time: 9.54s
Total time: 385.33s
All Input Parameters
{
  "audio": [
    "https://replicate.delivery/pbxt/NMdAjCoC0WiNKkHIIbSsmssPEXujCRSDIjg9LlJYkt5BGs8d/10226_10111_000000.wav"
  ],
  "top_k": 50,
  "top_p": 0.9,
  "prompt": "Transcribe the speech into written form.",
  "max_tokens": 512,
  "min_tokens": 0,
  "temperature": 0.6,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
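For convenience, here is a minimal sketch of running the same prediction from Python with the replicate client, reusing the inputs above. The model and version identifiers are taken from this page, the parameter names mirror the schema below, and the result follows the output schema (an array of strings, one per audio input). This assumes the replicate package is installed (pip install replicate) and REPLICATE_API_TOKEN is set in the environment.

import replicate

output = replicate.run(
    "ibm-granite/granite-speech-3.3-8b:688e7a943167401c310f0975cb68f1a35e0bddc3b65f60bde89c37860e07edf1",
    input={
        "audio": [
            "https://replicate.delivery/pbxt/NMdAjCoC0WiNKkHIIbSsmssPEXujCRSDIjg9LlJYkt5BGs8d/10226_10111_000000.wav"
        ],
        "prompt": "Transcribe the speech into written form.",
        "top_k": 50,
        "top_p": 0.9,
        "temperature": 0.6,
        "max_tokens": 512,
    },
)

# The output is a list of strings, one transcript per audio input.
print(output[0])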
Input Parameters
seed Type: integer
Random seed. Leave blank to randomize the seed.
audio Type: array
Audio inputs for the model.
top_k Type: integer, Default: 50
The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
top_p Type: number, Default: 0.9
A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751); a sketch of how top_k and top_p combine is shown after this parameter list.
prompt Type: string, Default: (empty)
User prompt to send to the model.
max_tokens Type: integer, Default: 512
The maximum number of tokens the model should generate as output.
min_tokens Type: integer, Default: 0
The minimum number of tokens the model should generate as output.
temperature Type: number, Default: 0.6
The value used to modulate the next token probabilities.
chat_template Type: string
A template to format the prompt with. If not provided, the default prompt template will be used.
system_prompt Type: string
System prompt to send to the model. The chat template provides a good default.
stop_sequences Type: string
A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
presence_penalty Type: number, Default: 0
Presence penalty (positive values penalize tokens that have already appeared in the output).
frequency_penalty Type: number, Default: 0
Frequency penalty (positive values penalize tokens in proportion to how often they have already appeared).
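To make the sampling parameters above concrete, here is an illustrative sketch (not the model's actual implementation, which runs inside vLLM) of how top-k filtering and nucleus (top-p) filtering combine to restrict the candidate set before the next token is sampled; temperature rescales the logits before sampling.

import numpy as np

def filter_logits(logits, top_k=50, top_p=0.9):
    # Illustrative only: restrict the candidate set with top-k, then with top-p.
    logits = np.array(logits, dtype=float)
    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        kth_value = np.sort(logits)[-min(top_k, logits.size)]
        logits[logits < kth_value] = -np.inf
    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p (Holtzman et al., 2019).
    if top_p < 1.0:
        order = np.argsort(logits)[::-1]
        probs = np.exp(logits[order] - logits[order[0]])
        probs /= probs.sum()
        cutoff = np.searchsorted(np.cumsum(probs), top_p) + 1  # always keep >= 1 token
        logits[order[cutoff:]] = -np.inf
    return logits  # the next token is sampled from the renormalized distribution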
Output Schema

Output

Type: array, Items Type: string

Example Execution Logs
2025-07-15 20:43:02 [info     ] predict() commencing           request_id=1 user_prompt=Transcribe the speech into written form.
2025-07-15 20:43:02 [debug    ] Formatted prompt using chat template formatted_prompt=<|start_of_role|>system<|end_of_role|> Knowledge Cutoff Date: April 2024.
 Today's Date: July 15, 2025. You are Granite, developed by IBM. You are a helpful AI assistant.<|end_of_text|>
<|start_of_role|>user<|end_of_role|><|audio|>Transcribe the speech into written form.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|> request_id=1 user_prompt=Transcribe the speech into written form.
2025-07-15 20:43:02 [debug    ] SamplingParams                 request_id=1 sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.9, top_k=50, min_p=0.0, seed=None, stop=[], stop_token_ids=[0], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None) user_prompt=Transcribe the speech into written form.
2025-07-15 20:43:10 [debug    ] LoRARequest                    lora_request=LoRARequest(lora_name='speech', lora_int_id=1, lora_path='/src/weights', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None) request_id=1 user_prompt=Transcribe the speech into written form.
INFO 07-15 20:43:10 [async_llm.py:270] Added request 1.
2025-07-15 20:43:11 [debug    ] result                         finish_reason=stop request_id=1 text=after his nap timothy lazily stretched first one grey velvet foot then another strolled indolently to his plate turning over the food carefully selecting choice bits nosing out that which he scorned upon the clean hearth user_prompt=Transcribe the speech into written form.
2025-07-15 20:43:11 [info     ] Generation took 9.37s          request_id=1 user_prompt=Transcribe the speech into written form.
/root/.pyenv/versions/3.12.11/lib/python3.12/site-packages/cog/server/scope.py:22: ExperimentalFeatureWarning: current_scope is an experimental internal function. It may change or be removed without warning.
  warnings.warn(
2025-07-15 20:43:11 [info     ] predict() complete             request_id=1 user_prompt=Transcribe the speech into written form.
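The formatted prompt in the log above shows how the default chat template assembles the request: a system turn (the Granite default, or the system_prompt input if provided), a user turn in which an <|audio|> placeholder is prepended to the prompt, and an empty assistant turn for the model to complete. A rough sketch of that structure, with the special tokens copied from the log (the actual formatting is normally handled by the model's chat template, or by a custom chat_template input):

# Sketch of the prompt layout observed in the log; the default system turn also
# prepends knowledge-cutoff and current-date lines, omitted here for brevity.
system_prompt = "You are Granite, developed by IBM. You are a helpful AI assistant."
user_prompt = "Transcribe the speech into written form."

formatted_prompt = (
    f"<|start_of_role|>system<|end_of_role|>{system_prompt}<|end_of_text|>\n"
    f"<|start_of_role|>user<|end_of_role|><|audio|>{user_prompt}<|end_of_text|>\n"
    f"<|start_of_role|>assistant<|end_of_role|>"
)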
Version Details
Version ID
688e7a943167401c310f0975cb68f1a35e0bddc3b65f60bde89c37860e07edf1
Version Created
July 31, 2025
Run on Replicate →