microsoft/phi-3-medium-4k-instruct 🔢📝 → 📝
About
A 14B parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties.

Example Output
"Can you provide ways to eat combinations of bananas and dragonfruits?"
Output
Certainly! Here are some delicious and healthy combinations of bananas and dragonfruits that you can try:
Banana and Dragonfruit Smoothie:
Blend together 1 banana, 1/2 dragonfruit, 1 cup of almond milk, and a handful of ice cubes for a refreshing smoothie. You can also add a tablespoon of honey or a scoop of protein powder for added sweetness and nutrition.
Banana and Dragonfruit Parfait:
Layer sliced bananas, diced dragonfruit, and Greek yogurt in a glass or bowl. Top with granola, chopped nuts, and a drizzle of honey for a delicious and nutritious breakfast or snack.
Banana and Dragonfruit Salad:
Combine sliced bananas, diced dragonfruit, and mixed greens in a bowl. Drizzle with a simple vinaigrette made from olive oil, lemon juice, honey, and Dijon mustard. Top with to
Performance Metrics
All Input Parameters
{ "top_k": 50, "top_p": 0.9, "prompt": "Can you provide ways to eat combinations of bananas and dragonfruits?", "max_tokens": 256, "min_tokens": 0, "temperature": 0, "system_prompt": "You are a helpful assistant.", "presence_penalty": 0, "frequency_penalty": 0 }
Input Parameters
- top_k: The number of highest-probability tokens to consider when generating output. If > 0, only the k tokens with the highest probability are kept (top-k filtering). See the sampling sketch after this list.
- top_p: A probability threshold for generating the output. If < 1.0, only the top tokens with cumulative probability >= top_p are kept (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
- prompt: The prompt to send to the model.
- max_tokens: The maximum number of tokens the model should generate as output.
- min_tokens: The minimum number of tokens the model should generate as output.
- temperature: The value used to modulate the next-token probabilities; 0 yields deterministic (greedy) decoding.
- system_prompt: System prompt to send to the model. This is prepended to the prompt and helps guide model behavior. Ignored for non-chat models.
- stop_sequences: A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
- presence_penalty: Penalizes tokens that have already appeared in the output, encouraging the model to move on to new content.
- frequency_penalty: Penalizes tokens in proportion to how often they have already appeared in the output, discouraging repetition.
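To make the interaction of top_k and top_p concrete, here is a minimal sketch of both filters applied to a toy next-token distribution. The probabilities and vocabulary size are invented for illustration; this is not the model's actual decoding code.

import numpy as np

def top_k_top_p_filter(probs, top_k=50, top_p=0.9):
    """Zero out tokens outside the top-k set and the top-p nucleus,
    then renormalize. probs is a 1-D array summing to 1."""
    probs = probs.copy()
    order = np.argsort(probs)[::-1]  # token ids, most probable first

    # Top-k filtering: keep only the k most probable tokens.
    if top_k > 0:
        probs[order[top_k:]] = 0.0

    # Nucleus (top-p) filtering: keep the smallest prefix of the sorted
    # distribution whose cumulative probability reaches top_p.
    if top_p < 1.0:
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1  # include boundary token
        probs[order[cutoff:]] = 0.0

    return probs / probs.sum()

# Toy distribution over a 6-token vocabulary (illustrative values only).
p = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])
print(top_k_top_p_filter(p, top_k=4, top_p=0.9))

With these settings the two lowest-probability tokens are dropped by top-k, the nucleus then keeps the smallest set covering 90% of the mass, and the survivors are renormalized before sampling.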
Output Schema
Output
Example Execution Logs
INFO 07-24 04:05:54 async_llm_engine.py:646] Received request c3c00c0305bb4b64a88368c460202398: prompt: '<|user|>\nCan you provide ways to eat combinations of bananas and dragonfruits?<|end|>\n<|assistant|>\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[32000], include_stop_str_in_output=False, ignore_eos=False, max_tokens=256, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 07-24 04:05:59 metrics.py:295] Avg prompt throughput: 3.8 tokens/s, Avg generation throughput: 46.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
Generation took 1721793164.83s
Formatted prompt: <|user|> Can you provide ways to eat combinations of bananas and dragonfruits?<|end|> <|assistant|>
INFO 07-24 04:05:59 async_llm_engine.py:134] Finished request c3c00c0305bb4b64a88368c460202398.
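The formatted prompt in the log shows the chat template the server applies: the user turn is wrapped in <|user|> ... <|end|> and generation continues after <|assistant|>. A rough sketch of that formatting follows; the <|system|> turn is an assumption based on the standard Phi-3 template, since the captured log only shows the user and assistant markers.

def format_phi3_prompt(user_message: str, system_prompt: str = "") -> str:
    """Reproduce the chat template visible in the log above."""
    parts = []
    if system_prompt:
        # Assumed system turn; not present in the captured log.
        parts.append(f"<|system|>\n{system_prompt}<|end|>\n")
    parts.append(f"<|user|>\n{user_message}<|end|>\n")
    parts.append("<|assistant|>\n")
    return "".join(parts)

print(format_phi3_prompt(
    "Can you provide ways to eat combinations of bananas and dragonfruits?"))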
Version Details
- Version ID: c6913a88bec85c7e014322393e63807fc5a54230cd488b8d5b566edbcb0e8f4c
- Version Created: July 20, 2024