ibm-granite/granite-3.0-2b-instruct

Official · 420.3K runs · Oct 2024 · Cog 0.10.0-alpha21

Tags: code-generation, question-answering, text-generation, text-translation

About

Granite-3.0-2B-Instruct is a lightweight, open-source, 2B-parameter model designed to excel at instruction-following tasks such as summarization, problem-solving, text translation, reasoning, code tasks, and function calling.

Example Output

Prompt:

"Could you please explain what APR means?"

Output

APR stands for Annual Percentage Rate. It is a measure of the cost of credit, expressed as a yearly rate. It includes not only the interest rate but also any fees or additional costs associated with the loan.

Performance Metrics

Prediction time: 0.60 s

Total time: 0.69 s

All Input Parameters

{
  "top_p": 0.9,
  "prompt": "Could you please explain what APR means?",
  "max_tokens": 512,
  "min_tokens": 0,
  "temperature": 0.6,
  "system_prompt": "You are a helpful assistant.",
  "presence_penalty": 0,
  "frequency_penalty": 0
}
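The parameters above map directly onto a request payload. The Python sketch below uses a hypothetical helper, `build_input`, that merges the defaults documented on this page (including `top_k`, which the example JSON omits) with caller overrides. Its client-side checks are assumptions, not validation the API is known to perform.

```python
from typing import Any, Dict, Optional, Sequence

# Defaults as documented on this page. top_k appears in the parameter
# list but not in the example JSON, so it is included here explicitly.
DEFAULTS: Dict[str, Any] = {
    "top_k": 50,
    "top_p": 0.9,
    "max_tokens": 512,
    "min_tokens": 0,
    "temperature": 0.6,
    "system_prompt": "You are a helpful assistant.",
    "presence_penalty": 0,
    "frequency_penalty": 0,
}


def build_input(prompt: str, stop_sequences: Optional[Sequence[str]] = None,
                **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge the documented defaults with caller overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Client-side check (an assumption, not documented API behaviour).
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload: Dict[str, Any] = {**DEFAULTS, **overrides, "prompt": prompt}
    if payload["min_tokens"] > payload["max_tokens"]:
        raise ValueError("min_tokens must not exceed max_tokens")
    if stop_sequences:
        # The API takes one comma-separated string, e.g. "<end>,<stop>".
        payload["stop_sequences"] = ",".join(stop_sequences)
    return payload
```

Calling `build_input("Could you please explain what APR means?")` reproduces the JSON above plus `top_k`. With the official `replicate` Python client (and `REPLICATE_API_TOKEN` set), the payload would then be passed as `replicate.run("ibm-granite/granite-3.0-2b-instruct", input=payload)`. Because `stop_sequences` is a single comma-separated string, a stop sequence that itself contains a comma cannot be expressed.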

Input Parameters

top_k (integer, default 50): The number of highest-probability tokens to consider when generating the output. If > 0, only the k most probable tokens are kept (top-k filtering).
top_p (number, default 0.9): A probability threshold for generating the output. If < 1.0, only the smallest set of most probable tokens whose cumulative probability is >= top_p is kept (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
prompt (string, default empty): The prompt to send to the model.
max_tokens (integer, default 512): The maximum number of tokens the model should generate.
min_tokens (integer, default 0): The minimum number of tokens the model should generate.
temperature (number, default 0.6): The value used to modulate the next-token probabilities; lower values make the output more deterministic, higher values make it more varied.
system_prompt (string, default "You are a helpful assistant."): System prompt to send to the model. It is prepended to the prompt and helps guide the model's behavior. Ignored for non-chat models.
stop_sequences (string, no default): A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' stops generation at the first instance of '<end>' or '<stop>'.
presence_penalty (number, default 0): Presence penalty.
frequency_penalty (number, default 0): Frequency penalty.
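To make temperature, top_k, top_p and the two penalties concrete, here is a toy Python sketch of one sampling step over a tiny vocabulary. It is not the serving stack's code. The penalty formulas assume the common OpenAI-style definitions (a flat penalty for any token already generated, plus a per-occurrence penalty), and the order penalties, then temperature, then top-k, then top-p is likewise an assumption.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def apply_penalties(logits: Dict[str, float], generated: Sequence[str],
                    presence_penalty: float, frequency_penalty: float) -> Dict[str, float]:
    """Assumed OpenAI-style penalties: a flat presence penalty for any token
    already generated, plus a frequency penalty per prior occurrence."""
    counts = Counter(generated)
    return {
        tok: logit
        - (presence_penalty if counts[tok] > 0 else 0.0)
        - frequency_penalty * counts[tok]
        for tok, logit in logits.items()
    }


def _renormalize(ranked: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Rescale the surviving probabilities so they sum to 1."""
    mass = sum(p for _, p in ranked)
    return [(tok, p / mass) for tok, p in ranked]


def filtered_distribution(logits: Dict[str, float], temperature: float,
                          top_k: int, top_p: float) -> Dict[str, float]:
    """Temperature scaling, then top-k, then nucleus (top-p) filtering."""
    if temperature <= 0:
        raise ValueError("this sketch needs temperature > 0 (many servers treat 0 as greedy)")
    # Temperature: divide logits before the softmax; lower values sharpen it.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    ranked = _renormalize(sorted(exps.items(), key=lambda kv: kv[1], reverse=True))
    # Top-k: if top_k > 0, keep only the k most probable tokens.
    if top_k > 0:
        ranked = _renormalize(ranked[:top_k])
    # Top-p: keep the smallest prefix whose cumulative probability is >= top_p.
    if top_p < 1.0:
        kept: List[Tuple[str, float]] = []
        cumulative = 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = _renormalize(kept)
    return dict(ranked)


def sample(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from the filtered distribution."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

For example, with logits {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}, temperature=1.0, top_k=3 and top_p=0.9, top-k drops "d"; after renormalization "a" alone has probability of about 0.665, below 0.9, so nucleus filtering keeps only "a" and "b".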

Output Schema

Output

Type: array of strings
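The output is an array of strings rather than a single string; for streaming language models on Replicate these items are typically text chunks that are concatenated in order. The Python sketch below (hypothetical helper names) joins them, and also illustrates the stop_sequences rule from the parameter list by truncating before the earliest match. Whether the server keeps or drops the matched sequence itself is not stated on this page.

```python
from typing import Iterable


def collect_output(chunks: Iterable[str]) -> str:
    """Concatenate the output array's string items, in order, into one text."""
    return "".join(chunks)


def truncate_at_stop(text: str, stop_sequences: str) -> str:
    """Cut `text` before the earliest occurrence of any stop sequence.

    `stop_sequences` uses the page's comma-separated format, e.g. "<end>,<stop>".
    Empty entries are ignored; surrounding whitespace is not stripped.
    """
    stops = [s for s in stop_sequences.split(",") if s]
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

For instance, `truncate_at_stop(collect_output(["APR stands", " for<stop> x"]), "<end>,<stop>")` returns "APR stands for".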

Example Execution Logs

INFO:     ::1:60604 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 10-21 03:59:51 metrics.py:351] Avg prompt throughput: 3.3 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
Generation took 0.54s
Formatted prompt: Could you please explain what APR means?

Version Details

Version ID: 97679a145f9b2e7370ac4da5012ab06efcba0f30e7a9efadd846fcce51f99d8e
Version Created: October 15, 2024
