lucataco/qwen1.5-110b

2.7K runs · April 2024 · Cog 0.9.6 · GitHub · License

code-generation · multilingual · text-generation

About

Qwen1.5 is the beta version of Qwen2, a transformer-based, decoder-only language model pretrained on a large amount of data.

Example Output

Prompt:

"Provide a short introduction to large language models"

Output

Large language models (LLMs) are artificial intelligence systems that have been trained on massive amounts of text data to generate human-like language output. They are capable of understanding and generating natural language, and can be used for a wide range of tasks such as language translation, summarization, question answering, and even creative writing. LLMs have become increasingly popular in recent years due to their ability to perform complex language tasks with high accuracy and speed. However, they also raise ethical concerns around issues such as bias and privacy.

Performance Metrics

Prediction time: 4.74s
Total time: 250.40s
All Input Parameters
{
  "top_k": 50,
  "top_p": 0.8,
  "prompt": "Provide a short introduction to large language models",
  "temperature": 0.7,
  "system_prompt": "You are a helpful assistant.",
  "max_new_tokens": 256,
  "repetition_penalty": 1.05
}
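
For reproducibility, the same prediction can be issued programmatically. Below is a minimal sketch using the Replicate Python client, assuming the replicate package is installed and REPLICATE_API_TOKEN is set in the environment; the version hash is the one listed under Version Details.

import replicate

# Model reference: owner/name:version (version hash from Version Details below).
output = replicate.run(
    "lucataco/qwen1.5-110b:af7953cb4fe4948df44a074d4785c2f74d0096257197198e90c9ac84361b6aa9",
    input={
        "top_k": 50,
        "top_p": 0.8,
        "prompt": "Provide a short introduction to large language models",
        "temperature": 0.7,
        "system_prompt": "You are a helpful assistant.",
        "max_new_tokens": 256,
        "repetition_penalty": 1.05,
    },
)

# The output schema is an array of strings, so join the chunks into one text.
print("".join(output))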
Input Parameters

top_k
Type: integer, Default: 50
The number of highest-probability tokens to consider when generating the output. If > 0, only the top k tokens with the highest probability are kept (top-k filtering).

top_p
Type: number, Default: 0.8
A probability threshold for generating the output. If < 1.0, only the top tokens with cumulative probability >= top_p are kept (nucleus filtering), as described in Holtzman et al. (http://arxiv.org/abs/1904.09751). See the sampling sketch after this list.

prompt
Type: string, Default: "Give me a short introduction to large language model."
Input prompt.

temperature
Type: number, Default: 0.7
The value used to modulate the next-token probabilities (lower values make the distribution sharper).

system_prompt
Type: string, Default: "You are a helpful assistant."
System prompt.

max_new_tokens
Type: integer, Default: 256, Range: 1 - 16384
The maximum number of tokens the model should generate as output.

repetition_penalty
Type: number, Default: 1.05
Repetition penalty (values > 1 discourage repeating previously generated tokens).
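
For intuition, here is a plain-NumPy illustration (not this model's actual serving code, which uses vLLM) of how temperature, top_k, and top_p interact when sampling a single next token:

import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.8, rng=None):
    """Illustrative only: temperature scaling, then top-k, then nucleus filtering."""
    rng = rng or np.random.default_rng()
    # Temperature modulates the next-token probabilities: <1 sharpens, >1 flattens.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # Top-k filtering: zero out everything below the k-th highest probability.
    if top_k > 0:
        cutoff = np.sort(probs)[-min(top_k, probs.size)]
        probs = np.where(probs < cutoff, 0.0, probs)
    # Nucleus (top-p) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p (Holtzman et al., 2019).
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cutoff_idx = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        mask = np.zeros_like(probs, dtype=bool)
        mask[order[:cutoff_idx]] = True
        probs = np.where(mask, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(probs.size, p=probs))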
Output Schema

Output

Type: array, Items type: string
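
Because predictions come back as an array of string chunks, callers typically concatenate the items to recover the full text:

# Hypothetical chunks in the shape the output schema describes.
chunks = ["Large language models (LLMs) are ", "artificial intelligence systems..."]
text = "".join(chunks)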

Example Execution Logs
(RayWorkerVllm pid=3790) INFO 04-26 23:19:54 model_runner.py:867] Graph capturing finished in 6 secs.
INFO 04-26 23:19:56 async_llm_engine.py:508] Received request 0: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nProvide a short introduction to large language models<|im_end|>\n<|im_start|>assistant\n', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.7, top_p=0.8, top_k=50, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|endoftext|>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=256, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-26 23:19:56 metrics.py:218] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-26 23:20:00 async_llm_engine.py:120] Finished request 0.
generation took 4.678s
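
The request log shows the ChatML template the server wraps around system_prompt and prompt. Below is a minimal sketch of assembling the same prompt string by hand; the template is copied from the log, and the helper name is hypothetical:

def build_chatml_prompt(system_prompt: str, user_prompt: str) -> str:
    # Template taken verbatim from the request log above.
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_chatml_prompt(
    "You are a helpful assistant.",
    "Provide a short introduction to large language models",
))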
Version Details
Version ID
af7953cb4fe4948df44a074d4785c2f74d0096257197198e90c9ac84361b6aa9
Version Created
April 26, 2024
Run on Replicate →