microsoft/phi-3-mini-4k-instruct 🔢📝 → 📝
About
Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets.

Example Output
"How would you explain Internet to a medieval knight?"
Output
My noble knight, imagine a vast kingdom, not of land and stone, but of knowledge and connection. This kingdom, known as the Internet, is a magical realm where people from all corners of the world can share their thoughts, stories, and wisdom.
In this kingdom, there are many different paths, called "websites," where people can gather information on various subjects. These websites are like grand libraries, filled with scrolls and tomes on every topic imaginable.
To navigate this kingdom, one must possess a mystical tool known as a "computer." This device allows you to interact with the kingdom's inhabitants, known as "users," and explore the many paths and libraries within.
To communicate with others, knights and commoners alike use a magical messenger service called "email." This allows them to send messages across great distances, as if by enchantment, to their fellow citizens.
Another way to connect with others is through a grand marketplace called "social media." Here, people can share their daily lives, exchange ideas, and even form alliances with others from distant lands.
To find specific information or resources, one can use a powerful search tool known as a "search engine." This tool allows you to summon knowledge from the kingdom's vast libraries with just a few words, much like calling upon a wise sage for guidance.
In this kingdom, there are also many merchants and artisans who offer their goods and services through a system called "e-commerce." This allows you to purchase items from distant lands without ever leaving your castle.
In summary, the Internet is a vast and magical kingdom of knowledge and connection, where people can share ideas, communicate, and explore the world in ways that were once thought impossible. As a knight, you may find this realm both fascinating and bewildering, but with time and practice, you will surely master its many wonders.
All Input Parameters
{ "top_k": 1, "top_p": 1, "prompt": "How would you explain Internet to a medieval knight?", "max_tokens": 512, "min_tokens": 0, "temperature": 0.1, "system_prompt": "You are a helpful AI assistant.", "presence_penalty": 0, "frequency_penalty": 0 }
Input Parameters
- top_k
- The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
- top_p
- A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). See the sampling sketch after this list.
- prompt
- The text prompt to send to the model.
- max_tokens
- The maximum number of tokens the model should generate as output.
- min_tokens
- The minimum number of tokens the model should generate as output.
- temperature
- The value used to modulate the next token probabilities.
- system_prompt
- System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.
- stop_sequences
- A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
- presence_penalty
- Penalizes tokens that have already appeared in the output at all, encouraging the model to move on to new content.
- frequency_penalty
- Penalizes tokens in proportion to how often they have already appeared in the output, discouraging verbatim repetition.
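To make the interplay of these parameters concrete, below is an illustrative NumPy sketch of a single sampling step: it applies the repetition penalties, temperature, top-k, and nucleus (top-p) filtering in the order engines like vLLM commonly use. This is a reimplementation for exposition, not the serving engine's actual code.

```python
# Illustrative sketch of one sampling step over raw next-token logits.
# Not the serving engine's actual code; a NumPy reimplementation for clarity.
import numpy as np

def sample_next_token(logits, generated, *, temperature=0.1, top_k=1,
                      top_p=1.0, presence_penalty=0.0, frequency_penalty=0.0,
                      rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64).copy()

    # Presence/frequency penalties: push down tokens already generated.
    counts = np.bincount(np.asarray(generated, dtype=np.int64),
                         minlength=logits.size)
    logits -= frequency_penalty * counts        # grows with repeat count
    logits -= presence_penalty * (counts > 0)   # flat hit for any appearance

    # Temperature: < 1 sharpens the distribution, > 1 flattens it.
    logits /= max(temperature, 1e-8)

    # Top-k filtering: if top_k > 0, keep only the k most likely tokens.
    if top_k > 0:
        kth_largest = np.sort(logits)[-top_k]
        logits[logits < kth_largest] = -np.inf

    # Softmax, then top-p (nucleus) filtering: keep the smallest set of
    # tokens whose cumulative probability reaches top_p (Holtzman et al.).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        keep = np.zeros(probs.size, dtype=bool)
        keep[order[:cutoff]] = True
        probs = np.where(keep, probs, 0.0)
        probs /= probs.sum()

    return int(rng.choice(probs.size, p=probs))

# Tiny demo over a toy vocabulary of 8 tokens.
rng = np.random.default_rng(0)
print(sample_next_token(rng.normal(size=8), generated=[3, 3, 5], top_k=1))
```

With the defaults shown in the example above (top_k = 1, temperature = 0.1), only the most likely token survives the filter, so decoding is effectively greedy and the output is close to deterministic.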
Output Schema
- Output
- The generated text.
Example Execution Logs
INFO 07-02 17:04:02 async_llm_engine.py:529] Received request 1e304be124294e54972b3c9e8562b9cc: prompt: '<|system|>\nYou are a helpful AI assistant.<|end|>\n<|user|>\nHow would you explain Internet to a medieval knight?<|end|>\n<|assistant|>\n', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.1, top_p=1.0, top_k=1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[32000], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 07-02 17:04:05 metrics.py:334] Avg prompt throughput: 4.8 tokens/s, Avg generation throughput: 44.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%
Generation took 1719937941.42s
Formatted prompt: <|system|>
You are a helpful AI assistant.<|end|>
<|user|>
How would you explain Internet to a medieval knight?<|end|>
<|assistant|>
INFO 07-02 17:04:08 async_llm_engine.py:120] Finished request 1e304be124294e54972b3c9e8562b9cc.
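The "Formatted prompt" in the log shows the chat template applied before generation. The following sketch reproduces that formatting; format_prompt is a hypothetical helper written for illustration, with the special tags taken verbatim from the log above.

```python
# Hypothetical helper reproducing the Phi-3 chat template seen in the log;
# the <|system|>, <|user|>, <|assistant|>, and <|end|> tags are verbatim.
def format_prompt(user_prompt: str,
                  system_prompt: str = "You are a helpful AI assistant.") -> str:
    return (
        f"<|system|>\n{system_prompt}<|end|>\n"
        f"<|user|>\n{user_prompt}<|end|>\n"
        f"<|assistant|>\n"
    )

print(format_prompt("How would you explain Internet to a medieval knight?"))
```

Note that <|end|> (token id 32000 in the SamplingParams above) doubles as the stop token, which is why generation terminates cleanly after the assistant turn.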
Version Details
- Version ID: e17386e6ae2e351f63783fa89f427fd0ed415524a7b3d8c122f6ac80ad0166b1
- Version Created: July 2, 2024
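To reproduce a run against this exact build rather than whatever is latest, the Replicate client accepts an owner/name:version reference; a brief sketch, under the same client assumptions as above:

```python
import replicate

# Pin the exact version listed above instead of floating on the latest build.
output = replicate.run(
    "microsoft/phi-3-mini-4k-instruct:e17386e6ae2e351f63783fa89f427fd0ed415524a7b3d8c122f6ac80ad0166b1",
    input={"prompt": "How would you explain Internet to a medieval knight?"},
)
print("".join(output))
```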