ibm-granite/granite-3.2-8b-instruct 🔢📝 → 📝
About
Granite-3.2-8B-Instruct is an 8-billion-parameter language model with a 128K context length, fine-tuned for reasoning and instruction-following capabilities.

Example Output
"How is perplexity measured for LLMs and why is it useful?"
Output
Perplexity is a common metric used to evaluate the performance of language models, including large language models (LLMs). It's a measure of how well a model predicts a sample.
Perplexity is calculated based on the concept of cross-entropy. In simpler terms, it's a way to measure how surprised the model is by the test data. The lower the perplexity, the less surprised the model is, and the better it predicts the data.
Here's a simple breakdown of how it's calculated:
- The model is given a test set of sentences.
- For each word in the sentence, the model calculates the probability of that word given all the previous words.
- The perplexity is then the exponentiated average negative log-probability over the whole test set; equivalently, the inverse probability of the text, normalized by its length.
The formula for perplexity (PP) is:
PP(W) = exp( -(1/N) * Σ_i log P(w_i | w_1, ..., w_{i-1}) )
Where:
- W is the test set of sentences,
- N is the number of words in the test set,
- w_i is each word in the test set,
- P(w_i | w_1, ..., w_{i-1}) is the probability assigned by the model to word w_i given the previous words.
Perplexity is useful for several reasons:
- Model Comparison: It provides a standard way to compare different models. A lower perplexity score generally indicates a better model.
- Model Improvement: It helps identify areas where the model is struggling. If a certain type of sentence or vocabulary consistently results in high perplexity, it indicates a weakness in the model that can be addressed through further training or adjustments.
- Understanding Model Behavior: It gives insights into how the model understands language. A lower perplexity suggests the model has a better grasp of the language's structure and usage.
- Evaluation of Unseen Data: While the model is trained on a specific corpus, perplexity can be calculated on unseen data to evaluate its generalization capability.
However, it's important to note that while perplexity is a widely used metric, it's not without its limitations. For instance, it doesn't directly correlate with human judgment of fluency or coherence, and
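To make the perplexity formula from the example output concrete, here is a minimal Python sketch that computes it from per-token probabilities. The probability values are invented for illustration; in practice each P(w_i | w_1, ..., w_{i-1}) would come from the model's predicted distribution at that position.

```python
import math

# Hypothetical probabilities P(w_i | w_1 ... w_{i-1}) that a model might
# assign to each word of a short test sentence (illustrative values only).
token_probs = [0.20, 0.05, 0.40, 0.10, 0.30, 0.15]

# PP(W) = exp( -(1/N) * sum_i log P(w_i | w_1, ..., w_{i-1}) )
n = len(token_probs)
avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / n
perplexity = math.exp(avg_neg_log_prob)

print(f"Average negative log-probability: {avg_neg_log_prob:.3f}")
print(f"Perplexity: {perplexity:.2f}")  # roughly 6.2 for these illustrative values
```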
All Input Parameters
{ "top_p": 0.9, "prompt": "How is perplexity measured for LLMs and why is it useful?", "max_tokens": 512, "min_tokens": 0, "temperature": 0.6, "system_prompt": "You are a helpful assistant.", "presence_penalty": 0, "frequency_penalty": 0 }
Input Parameters
- top_k
- The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
- top_p
- A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
- prompt
- Prompt
- max_tokens
- The maximum number of tokens the model should generate as output.
- min_tokens
- The minimum number of tokens the model should generate as output.
- temperature
- The value used to modulate the next-token probabilities; lower values make the output more focused and deterministic, higher values make it more varied. A sketch of how temperature interacts with top_k and top_p follows this parameter list.
- system_prompt
- System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.
- stop_sequences
- A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
- presence_penalty
- Presence penalty: penalizes tokens that have already appeared in the output, encouraging the model to introduce new tokens.
- frequency_penalty
- Frequency penalty: penalizes tokens in proportion to how often they have already appeared in the output, discouraging repetition.
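To illustrate how temperature, top_k, and top_p interact during decoding, here is a minimal Python sketch of a single sampling step over a toy next-token distribution. The logits and the helper function are invented for illustration and are a simplification, not the model's actual decoding code.

```python
import math
import random

def sample_next_token(logits, temperature=0.6, top_k=0, top_p=0.9):
    """One decoding step: temperature scaling, then top-k and nucleus (top-p) filtering."""
    # Temperature scaling: lower values sharpen the distribution, higher values flatten it.
    scaled = [l / temperature for l in logits.values()]
    max_l = max(scaled)
    exps = [math.exp(l - max_l) for l in scaled]          # numerically stable softmax
    total = sum(exps)
    probs = sorted(zip(logits.keys(), (e / total for e in exps)),
                   key=lambda kv: kv[1], reverse=True)

    if top_k > 0:                                          # top-k filtering
        probs = probs[:top_k]

    if top_p < 1.0:                                        # nucleus (top-p) filtering
        kept, cumulative = [], 0.0
        for token, p in probs:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:
                break
        probs = kept

    # Renormalize the surviving tokens and sample one of them.
    total = sum(p for _, p in probs)
    tokens, weights = zip(*[(t, p / total) for t, p in probs])
    return random.choices(tokens, weights=weights, k=1)[0]

# Toy next-token logits (illustrative only).
logits = {"perplexity": 2.0, "entropy": 1.2, "accuracy": 0.4, "banana": -3.0}
print(sample_next_token(logits))
```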
Output Schema
Output
Example Execution Logs
INFO: 127.0.0.1:39820 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 03-24 13:57:24 metrics.py:455] Avg prompt throughput: 6.8 tokens/s, Avg generation throughput: 4.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
Generation took 3.65s
Formatted prompt: How is perplexity measured for LLMs and why is it useful?
Version Details
- Version ID
f60c214a9afb10e68da3675b87ffe298afb72ef3eb232e3327d886e6d4a4eaf4
- Version Created
- March 5, 2025