ibm-granite/granite-3.2-8b-instruct 🔢📝 → 📝

⭐ Official ▶️ 450.7K runs 📅 Feb 2025 ⚙️ Cog 0.10.0-alpha27 ⚖️ License
code-generation long-context question-answering text-generation

About

Granite-3.2-8B-Instruct is an 8-billion-parameter language model with a 128K-token context length, fine-tuned for reasoning and instruction-following capabilities.

Example Output

Prompt:

"How is perplexity measured for LLMs and why is it useful?"

Output

Perplexity is a common metric used to evaluate the performance of language models, including large language models (LLMs). It's a measure of how well a model predicts a sample.

Perplexity is calculated based on the concept of cross-entropy. In simpler terms, it's a way to measure how surprised the model is by the test data. The lower the perplexity, the less surprised the model is, and the better it predicts the data.

Here's a simple breakdown of how it's calculated:

  1. The model is given a test set of sentences.
  2. For each word in the sentence, the model calculates the probability of that word given all the previous words.
  3. The perplexity is then the inverse probability of the test set, normalized by the number of words (equivalently, the exponentiated average negative log-probability).

The formula for perplexity (PP) is:

PP(W) = exp(-1/N * Σ log P(wi|w1...wi-1))

Where:

  • W is the test set of sentences,
  • N is the number of words in the test set,
  • wi is the i-th word in the test set,
  • P(wi|w1...wi-1) is the probability assigned by the model to word wi given the previous words.
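
To make the formula concrete, here is a minimal sketch that computes perplexity with a causal language model via the Hugging Face transformers library. The checkpoint name ("gpt2") and the input text are illustrative only; any causal LM checkpoint works the same way.

# Sketch: perplexity of a text under a causal LM, following the formula above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative small checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "How is perplexity measured for LLMs and why is it useful?"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels set to the input ids, the returned loss is the mean
    # negative log-likelihood of the predicted tokens:
    # -1/N * sum(log P(wi | w1...wi-1)).
    outputs = model(**inputs, labels=inputs["input_ids"])

perplexity = torch.exp(outputs.loss)  # PP = exp(mean negative log-likelihood)
print(f"Perplexity: {perplexity.item():.2f}")

Lower perplexity on the same text means the model assigned higher probability to the observed tokens.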

Perplexity is useful for several reasons:

  1. Model Comparison: It provides a standard way to compare different models. A lower perplexity score generally indicates a better model.

  2. Model Improvement: It helps in identifying areas where the model is struggling. If a certain type of sentence or vocabulary consistently results in high perplexity, it indicates a weakness in the model that can be addressed through further training or adjustments.

  3. Understanding Model Behavior: It gives insights into how the model understands language. A lower perplexity suggests the model has a better grasp of the language's structure and usage.

  4. Evaluation of Unseen Data: Although a model is trained on a specific corpus, perplexity can be calculated on unseen data to evaluate the model's generalization capability.

However, it's important to note that while perplexity is a widely used metric, it's not without its limitations. For instance, it doesn't directly correlate with human judgment of fluency or coherence, and …

Performance Metrics

3.66s Prediction Time
3.66s Total Time
All Input Parameters
{
  "top_p": 0.9,
  "prompt": "How is perplexity measured for LLMs and why is it useful?",
  "max_tokens": 512,
  "min_tokens": 0,
  "temperature": 0.6,
  "system_prompt": "You are a helpful assistant.",
  "presence_penalty": 0,
  "frequency_penalty": 0
}
Input Parameters
top_k · Type: integer · Default: 50
The number of highest-probability tokens to consider when generating the output. If > 0, only keep the top k tokens with the highest probability (top-k filtering).
top_p · Type: number · Default: 0.9
A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751); a sketch of both filters follows this parameter list.
prompt · Type: string · Default: (empty string)
Prompt
max_tokens · Type: integer · Default: 512
The maximum number of tokens the model should generate as output.
min_tokens · Type: integer · Default: 0
The minimum number of tokens the model should generate as output.
temperature · Type: number · Default: 0.6
The value used to modulate the next-token probabilities.
system_prompt · Type: string · Default: You are a helpful assistant.
System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.
stop_sequences · Type: string
A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
presence_penalty · Type: number · Default: 0
Presence penalty
frequency_penalty · Type: number · Default: 0
Frequency penalty
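
As a rough illustration of how top_k and top_p interact, here is a minimal sketch of both filters applied to a vector of toy logits. This is not the model's actual sampling code, just the standard filtering pattern the parameter descriptions refer to.

# Sketch: top-k and top-p (nucleus) filtering on a toy logits vector.
import torch

def filter_logits(logits: torch.Tensor, top_k: int = 50, top_p: float = 0.9) -> torch.Tensor:
    logits = logits.clone()
    if top_k > 0:
        # Keep only the top_k highest-probability tokens.
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")
    if top_p < 1.0:
        # Keep the smallest set of tokens whose cumulative probability >= top_p.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        sorted_probs = torch.softmax(sorted_logits, dim=-1)
        cumulative = sorted_probs.cumsum(dim=-1)
        # Drop a token if the probability mass before it already reached top_p.
        drop = (cumulative - sorted_probs) >= top_p
        logits[sorted_idx[drop]] = float("-inf")
    return logits

probs = torch.softmax(filter_logits(torch.randn(1000)), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)  # sample from the filtered distribution
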
Output Schema

Output

Type: array · Items Type: string
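
Below is a minimal sketch of calling this model with the documented parameters, assuming the official Replicate Python client is installed and REPLICATE_API_TOKEN is set in the environment. Because the output schema is an array of strings, the pieces are joined into a single response.

# Sketch: running the model with the replicate Python client.
import replicate

output = replicate.run(
    "ibm-granite/granite-3.2-8b-instruct",
    input={
        "prompt": "How is perplexity measured for LLMs and why is it useful?",
        "system_prompt": "You are a helpful assistant.",
        "max_tokens": 512,
        "min_tokens": 0,
        "temperature": 0.6,
        "top_p": 0.9,
        "presence_penalty": 0,
        "frequency_penalty": 0,
    },
)

# Output is an array of strings; join the chunks into one completion.
print("".join(output))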

Example Execution Logs
INFO:     127.0.0.1:39820 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 03-24 13:57:24 metrics.py:455] Avg prompt throughput: 6.8 tokens/s, Avg generation throughput: 4.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
Generation took 3.65s
Formatted prompt: How is perplexity measured for LLMs and why is it useful?
Version Details
Version ID
f60c214a9afb10e68da3675b87ffe298afb72ef3eb232e3327d886e6d4a4eaf4
Version Created
March 5, 2025