lucataco/qwen1.5-110b 🔢📝 → 📝
About
Qwen1.5 is the beta version of Qwen2, a transformer-based, decoder-only language model pretrained on a large amount of data.

Example Output
Prompt:
"Provide a short introduction to large language models"
Output:
Large language models (LLMs) are artificial intelligence systems that have been trained on massive amounts of text data to generate human-like language output. They are capable of understanding and generating natural language, and can be used for a wide range of tasks such as language translation, summarization, question answering, and even creative writing. LLMs have become increasingly popular in recent years due to their ability to perform complex language tasks with high accuracy and speed. However, they also raise ethical concerns around issues such as bias and privacy.
Performance Metrics
- Prediction Time: 4.74s
- Total Time: 250.40s
All Input Parameters
{ "top_k": 50, "top_p": 0.8, "prompt": "Provide a short introduction to large language models", "temperature": 0.7, "system_prompt": "You are a helpful assistant.", "max_new_tokens": 256, "repetition_penalty": 1.05 }
Input Parameters
- top_k: The number of highest-probability tokens to consider when generating output. If > 0, only the top k tokens with the highest probability are kept (top-k filtering).
- top_p: A probability threshold for generating the output. If < 1.0, only the smallest set of top tokens with cumulative probability >= top_p is kept (nucleus filtering), as described in Holtzman et al. (http://arxiv.org/abs/1904.09751). A sketch of how these sampling parameters interact follows this list.
- prompt: The input prompt.
- temperature: The value used to modulate the next-token probabilities; lower values make sampling more deterministic.
- system_prompt: The system prompt that frames the assistant's behavior.
- max_new_tokens: The maximum number of tokens the model should generate as output.
- repetition_penalty: A penalty applied to tokens that have already appeared in the text; values greater than 1.0 discourage repetition.
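To make top_k, top_p, and temperature concrete, below is an illustrative sketch of how such sampling filters are commonly combined when picking the next token from raw logits. It is not this model's actual implementation (vLLM handles sampling internally); the function name and the NumPy representation are assumptions for illustration.

```python
# Illustrative sketch of temperature + top-k + nucleus (top-p) sampling; not
# the model's actual code. Assumes `logits` is a 1-D array of unnormalized
# scores and temperature > 0.
import numpy as np

def sample_next_token(logits, top_k=50, top_p=0.8, temperature=0.7, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Top-k filtering: mask everything below the k-th largest logit.
    if top_k > 0:
        k = min(top_k, logits.size)
        kth = np.sort(logits)[-k]
        logits = np.where(logits < kth, -np.inf, logits)

    # Softmax over the surviving logits (masked entries get probability 0).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Nucleus filtering: keep the smallest set of top tokens whose
    # cumulative probability reaches top_p (Holtzman et al., 2019).
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        keep = order[:cutoff]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()

    return int(rng.choice(len(probs), p=probs))
```

With temperature below 1.0 the distribution is sharpened before filtering, which is why the defaults on this page (temperature 0.7, top_p 0.8, top_k 50) yield fairly focused completions.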
Output Schema
Output
Example Execution Logs
(RayWorkerVllm pid=3790) INFO 04-26 23:19:54 model_runner.py:867] Graph capturing finished in 6 secs.
INFO 04-26 23:19:56 async_llm_engine.py:508] Received request 0: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nProvide a short introduction to large language models<|im_end|>\n<|im_start|>assistant\n', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.7, top_p=0.8, top_k=50, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|endoftext|>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=256, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-26 23:19:56 metrics.py:218] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-26 23:20:00 async_llm_engine.py:120] Finished request 0. generation took 4.678s
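The request log above shows the ChatML-style template the served model expects. A minimal sketch of how system_prompt and prompt are assembled into that string (the helper name is hypothetical):

```python
def build_chatml_prompt(system_prompt: str, user_prompt: str) -> str:
    # Mirrors the prompt string visible in the log above: each turn is
    # wrapped in <|im_start|>/<|im_end|> markers, and the string ends with
    # an open assistant turn for the model to complete.
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(build_chatml_prompt(
    "You are a helpful assistant.",
    "Provide a short introduction to large language models",
))
```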
Version Details
- Version ID: af7953cb4fe4948df44a074d4785c2f74d0096257197198e90c9ac84361b6aa9
- Version Created: April 26, 2024
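To reproduce a prediction against this exact build, the version ID above can be pinned in the model reference; a minimal sketch, again assuming the Replicate Python client:

```python
import replicate

# Pinning "owner/model:version" runs this specific build rather than the
# latest version of the model.
output = replicate.run(
    "lucataco/qwen1.5-110b:af7953cb4fe4948df44a074d4785c2f74d0096257197198e90c9ac84361b6aa9",
    input={"prompt": "Provide a short introduction to large language models"},
)
```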