microsoft/phi-3-mini-4k-instruct 🔢📝 → 📝
About
Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets.

Example Output
"How would you explain Internet to a medieval knight?"
Output
My noble knight, imagine a vast kingdom, not of land and stone, but of knowledge and connection. This kingdom, known as the Internet, is a magical realm where people from all corners of the world can share their thoughts, stories, and wisdom.
In this kingdom, there are many different paths, called "websites," where people can gather information on various subjects. These websites are like grand libraries, filled with scrolls and tomes on every topic imaginable.
To navigate this kingdom, one must possess a mystical tool known as a "computer." This device allows you to interact with the kingdom's inhabitants, known as "users," and explore the many paths and libraries within.
To communicate with others, knights and commoners alike use a magical messenger service called "email." This allows them to send messages across great distances, as if by enchantment, to their fellow citizens.
Another way to connect with others is through a grand marketplace called "social media." Here, people can share their daily lives, exchange ideas, and even form alliances with others from distant lands.
To find specific information or resources, one can use a powerful search tool known as a "search engine." This tool allows you to summon knowledge from the kingdom's vast libraries with just a few words, much like calling upon a wise sage for guidance.
In this kingdom, there are also many merchants and artisans who offer their goods and services through a system called "e-commerce." This allows you to purchase items from distant lands without ever leaving your castle.
In summary, the Internet is a vast and magical kingdom of knowledge and connection, where people can share ideas, communicate, and explore the world in ways that were once thought impossible. As a knight, you may find this realm both fascinating and bewildering, but with time and practice, you will surely master its many wonders.
All Input Parameters
{ "top_k": 1, "top_p": 1, "prompt": "How would you explain Internet to a medieval knight?", "max_tokens": 512, "min_tokens": 0, "temperature": 0.1, "system_prompt": "You are a helpful AI assistant.", "presence_penalty": 0, "frequency_penalty": 0 }
Input Parameters
- top_k
- The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
- top_p
- A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). See the sampling sketch after this list.
- prompt
- The text prompt to send to the model.
- max_tokens
- The maximum number of tokens the model should generate as output.
- min_tokens
- The minimum number of tokens the model should generate as output.
- temperature
- The value used to modulate the next token probabilities.
- system_prompt
- System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.
- stop_sequences
- A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
- presence_penalty
- Penalizes tokens that have already appeared in the output at all, encouraging the model to move on to new content.
- frequency_penalty
- Penalizes tokens in proportion to how often they have already appeared in the output, discouraging verbatim repetition.
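To make the interplay of these parameters concrete, below is an illustrative NumPy sketch of a single sampling step: it applies the repetition penalties, temperature, top-k, and nucleus (top-p) filtering in the order engines like vLLM commonly use. This is a reimplementation for exposition, not the serving engine's actual code.

```python
# Illustrative sketch of one sampling step over raw next-token logits.
# Not the serving engine's actual code; a NumPy reimplementation for clarity.
import numpy as np

def sample_next_token(logits, generated, *, temperature=0.1, top_k=1,
                      top_p=1.0, presence_penalty=0.0, frequency_penalty=0.0,
                      rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64).copy()

    # Presence/frequency penalties: push down tokens already generated.
    counts = np.bincount(np.asarray(generated, dtype=np.int64),
                         minlength=logits.size)
    logits -= frequency_penalty * counts        # grows with repeat count
    logits -= presence_penalty * (counts > 0)   # flat hit for any appearance

    # Temperature: < 1 sharpens the distribution, > 1 flattens it.
    logits /= max(temperature, 1e-8)

    # Top-k filtering: if top_k > 0, keep only the k most likely tokens.
    if top_k > 0:
        kth_largest = np.sort(logits)[-top_k]
        logits[logits < kth_largest] = -np.inf

    # Softmax, then top-p (nucleus) filtering: keep the smallest set of
    # tokens whose cumulative probability reaches top_p (Holtzman et al.).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        keep = np.zeros(probs.size, dtype=bool)
        keep[order[:cutoff]] = True
        probs = np.where(keep, probs, 0.0)
        probs /= probs.sum()

    return int(rng.choice(probs.size, p=probs))

# Tiny demo over a toy vocabulary of 8 tokens.
rng = np.random.default_rng(0)
print(sample_next_token(rng.normal(size=8), generated=[3, 3, 5], top_k=1))
```

With the defaults shown in the example above (top_k = 1, temperature = 0.1), only the most likely token survives the filter, so decoding is effectively greedy and the output is close to deterministic.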
Output Schema
- Output
- The generated text.
Example Execution Logs
INFO 07-02 17:04:02 async_llm_engine.py:529] Received request 1e304be124294e54972b3c9e8562b9cc: prompt: '<|system|>\nYou are a helpful AI assistant.<|end|>\n<|user|>\nHow would you explain Internet to a medieval knight?<|end|>\n<|assistant|>\n', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.1, top_p=1.0, top_k=1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[32000], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 07-02 17:04:05 metrics.py:334] Avg prompt throughput: 4.8 tokens/s, Avg generation throughput: 44.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%
Generation took 1719937941.42s
Formatted prompt: <|system|>
You are a helpful AI assistant.<|end|>
<|user|>
How would you explain Internet to a medieval knight?<|end|>
<|assistant|>
INFO 07-02 17:04:08 async_llm_engine.py:120] Finished request 1e304be124294e54972b3c9e8562b9cc.
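The "Formatted prompt" in the log shows the chat template applied before generation. The following sketch reproduces that formatting; format_prompt is a hypothetical helper written for illustration, with the special tags taken verbatim from the log above.

```python
# Hypothetical helper reproducing the Phi-3 chat template seen in the log;
# the <|system|>, <|user|>, <|assistant|>, and <|end|> tags are verbatim.
def format_prompt(user_prompt: str,
                  system_prompt: str = "You are a helpful AI assistant.") -> str:
    return (
        f"<|system|>\n{system_prompt}<|end|>\n"
        f"<|user|>\n{user_prompt}<|end|>\n"
        f"<|assistant|>\n"
    )

print(format_prompt("How would you explain Internet to a medieval knight?"))
```

Note that <|end|> (token id 32000 in the SamplingParams above) doubles as the stop token, which is why generation terminates cleanly after the assistant turn.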
Version Details
- Version ID: e17386e6ae2e351f63783fa89f427fd0ed415524a7b3d8c122f6ac80ad0166b1
- Version Created: July 2, 2024
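To reproduce a run against this exact build rather than whatever is latest, the Replicate client accepts an owner/name:version reference; a brief sketch, under the same client assumptions as above:

```python
import replicate

# Pin the exact version listed above instead of floating on the latest build.
output = replicate.run(
    "microsoft/phi-3-mini-4k-instruct:e17386e6ae2e351f63783fa89f427fd0ed415524a7b3d8c122f6ac80ad0166b1",
    input={"prompt": "How would you explain Internet to a medieval knight?"},
)
print("".join(output))
```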