lucataco/qwen2-57b-a14b-instruct 🔢📝 → 📝

▶️ 1.4K runs 📅 Jul 2024 ⚙️ Cog 0.10.0-alpha11 📄 Paper ⚖️ License
code-generation text-generation text-translation

About

A 57-billion-parameter Qwen2 language model from Alibaba Cloud, fine-tuned for chat completions

Example Output

Prompt:

"Give me a short introduction to large language model."

Output

A large language model (LLM) is a type of artificial intelligence model that is trained on a massive amount of text data to generate human-like text. These models are typically trained using deep learning techniques, and they are able to generate text that is coherent and contextually appropriate, making them useful for a variety of natural language processing tasks. Some common applications of large language models include language translation, text summarization, and question answering. They are also used in chatbots and virtual assistants to enable more natural and realistic conversations with users. Large language models are often referred to as "generative models" because they are able to generate new text based on the patterns they have learned from the training data.

Performance Metrics

2.11s Prediction Time
253.87s Total Time
All Input Parameters
{
  "top_k": 50,
  "top_p": 0.9,
  "prompt": "Give me a short introduction to large language model.",
  "max_tokens": 512,
  "min_tokens": 0,
  "temperature": 0.6,
  "system_prompt": "You are a helpful assistant.",
  "presence_penalty": 0,
  "frequency_penalty": 0
}
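For reference, the example above can be reproduced with the Replicate Python client. A minimal sketch, assuming the replicate Python package is installed and REPLICATE_API_TOKEN is set in the environment; the version hash is the one listed under Version Details below.

import replicate

# Run the example request with the inputs shown above.
output = replicate.run(
    "lucataco/qwen2-57b-a14b-instruct:fc67fa3fa20d3d0ee59794df05548b59d285fbb43d944506203a8a2195b75c36",
    input={
        "top_k": 50,
        "top_p": 0.9,
        "prompt": "Give me a short introduction to large language model.",
        "max_tokens": 512,
        "min_tokens": 0,
        "temperature": 0.6,
        "system_prompt": "You are a helpful assistant.",
        "presence_penalty": 0,
        "frequency_penalty": 0,
    },
)

# The output schema is an array of strings, so join the chunks into one string.
print("".join(output))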
Input Parameters
top_k (Type: integer, Default: 50)
The number of highest-probability tokens to consider when generating the output. If > 0, only the top k tokens with the highest probability are kept (top-k filtering).
top_p (Type: number, Default: 0.9)
A probability threshold for generating the output. If < 1.0, only the top tokens with cumulative probability >= top_p are kept (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). See the sampling sketch after this list.
prompt (Type: string)
Prompt to send to the model.
max_tokens (Type: integer, Default: 512)
The maximum number of tokens the model should generate as output.
min_tokens (Type: integer, Default: 0)
The minimum number of tokens the model should generate as output.
temperature (Type: number, Default: 0.6)
The value used to modulate the next-token probabilities.
system_prompt (Type: string, Default: "You are a helpful assistant.")
System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.
stop_sequences (Type: string)
A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
presence_penalty (Type: number, Default: 0)
Presence penalty: positive values penalize tokens that have already appeared in the generated text, encouraging the model to introduce new tokens.
frequency_penalty (Type: number, Default: 0)
Frequency penalty: positive values penalize tokens in proportion to how often they have already appeared, discouraging repetition.
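The top_k and top_p parameters compose: candidates are first limited to the k most likely tokens, then further pruned to the smallest set whose cumulative probability reaches top_p. A minimal sketch of that filtering, assuming a plain NumPy probability vector rather than the model's internal vLLM implementation:

import numpy as np

def filter_top_k_top_p(probs, top_k=50, top_p=0.9):
    # Sketch of top-k filtering followed by nucleus (top-p) filtering
    # over a next-token probability distribution.
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                  # tokens by descending probability
    keep = order[:top_k] if top_k > 0 else order     # top-k filtering

    # Nucleus filtering: keep the smallest prefix whose cumulative probability >= top_p.
    cumulative = np.cumsum(probs[keep])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    nucleus = keep[:cutoff]

    # Renormalize the surviving tokens before sampling.
    filtered = np.zeros_like(probs)
    filtered[nucleus] = probs[nucleus]
    return filtered / filtered.sum()

# Toy 5-token vocabulary: only the three most likely tokens survive top_k=3,
# and the first two already reach top_p=0.6.
print(filter_top_k_top_p([0.5, 0.2, 0.15, 0.1, 0.05], top_k=3, top_p=0.6))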
Output Schema

Output

Type: array, Items Type: string

Example Execution Logs
INFO 07-02 15:28:31 async_llm_engine.py:529] Received request 98ac8d73e2454af68ca1cb481bc48be2: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nGive me a short introduction to large language model.<|im_end|>\n<|im_start|>assistant\n', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.9, top_k=50, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[151645], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
stdout
Generation took 1719933329.48s
Formatted prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Give me a short introduction to large language model.<|im_end|>
<|im_start|>assistant
INFO 07-02 15:28:33 async_llm_engine.py:120] Finished request 98ac8d73e2454af68ca1cb481bc48be2.
 stdout
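The logs show the request being wrapped in Qwen2's ChatML-style chat template before generation. A minimal sketch of that formatting step, assuming a plain string template rather than the tokenizer's actual chat-template machinery:

def format_chatml(system_prompt: str, prompt: str) -> str:
    # Reproduces the "Formatted prompt" string visible in the logs above.
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(format_chatml("You are a helpful assistant.",
                    "Give me a short introduction to large language model."))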
Version Details
Version ID
fc67fa3fa20d3d0ee59794df05548b59d285fbb43d944506203a8a2195b75c36
Version Created
July 2, 2024
Run on Replicate →