lucataco/yi-1.5-6b 🔢📝 → 📝

▶️ 66 runs 📅 May 2024 ⚙️ Cog 0.9.6 🔗 GitHub 📄 Paper ⚖️ License
code-generation text-generation text-translation

About

Yi-1.5 is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.

Example Output

Prompt:

"Tell me a joke"

Output

Here's one: What do you call an elephant that can fly?

Performance Metrics

0.55s Prediction Time
128.41s Total Time
All Input Parameters
{
  "top_k": 50,
  "top_p": 0.95,
  "prompt": "Tell me a joke",
  "temperature": 0.7,
  "system_prompt": "You are a friendly Chatbot.",
  "max_new_tokens": 512
}
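
For reference, the prediction above can be reproduced with the Replicate Python client. This is a minimal sketch, assuming the replicate package is installed and REPLICATE_API_TOKEN is set in your environment; the version hash is the one listed under Version Details below.

import replicate

# Same inputs as the "All Input Parameters" block above.
output = replicate.run(
    "lucataco/yi-1.5-6b:f8047bd66544e8a209c8f26ac17edfffcfad583a74f9430bef25165651198b90",
    input={
        "top_k": 50,
        "top_p": 0.95,
        "prompt": "Tell me a joke",
        "temperature": 0.7,
        "system_prompt": "You are a friendly Chatbot.",
        "max_new_tokens": 512,
    },
)

# The output is an array of string chunks (see Output Schema below),
# so join them to recover the full response.
print("".join(output))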
Input Parameters
top_k
Type: integer | Default: 50
The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with the highest probability (top-k filtering).

top_p
Type: number | Default: 0.95 | Range: 0.1 - 1
A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751); a toy sketch of how these sampling parameters combine follows this list.

prompt
Type: string | Default: Tell me a joke
Input prompt.

temperature
Type: number | Default: 0.7 | Range: 0.1 - 4
The value used to modulate the next-token probabilities.

system_prompt
Type: string | Default: You are a friendly Chatbot.
System prompt.

max_new_tokens
Type: integer | Default: 512 | Range: 1 - 4096
The maximum number of tokens the model should generate as output.
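
To make the sampling parameters above concrete, here is a toy sketch of how temperature, top-k filtering, and nucleus (top-p) filtering combine when choosing the next token. It is an illustration only, not the model's actual decoding code: as the logs below show, generation runs through vLLM's SamplingParams, and the function sample_next_token below is a hypothetical NumPy example.

import numpy as np

def sample_next_token(logits, top_k=50, top_p=0.95, temperature=0.7):
    # Temperature modulates the next-token probabilities:
    # lower values sharpen the distribution, higher values flatten it.
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Top-k filtering: keep only the k highest-scoring tokens.
    if 0 < top_k < len(logits):
        kth_best = np.sort(logits)[-top_k]
        logits = np.where(logits < kth_best, -np.inf, logits)

    # Convert to probabilities with a numerically stable softmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Nucleus (top-p) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = min(int(np.searchsorted(cumulative, top_p)) + 1, len(order))
        mask = np.zeros_like(probs, dtype=bool)
        mask[order[:cutoff]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()

    # Sample one token id from the filtered distribution.
    return int(np.random.choice(len(probs), p=probs))

# Example with a tiny 6-token vocabulary.
token_id = sample_next_token(np.array([2.0, 1.5, 0.3, -1.0, -2.0, 0.1]), top_k=3, top_p=0.9)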
Output Schema

Output

Type: array | Items Type: string

Example Execution Logs
[stderr] No chat template is defined for this tokenizer - using the default template for the CachedLlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
[stdout] INFO 05-13 21:51:51 async_llm_engine.py:508] Received request 0.8444218515250481: prompt: '<|startoftext|>[INST] <<SYS>>\nYou are a friendly Chatbot.\n<</SYS>>\n\nTell me a joke [/INST]', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.2, temperature=0.7, top_p=0.95, top_k=50, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|endoftext|>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
[stdout] INFO 05-13 21:51:51 metrics.py:218] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
[stdout] INFO 05-13 21:51:51 async_llm_engine.py:120] Finished request 0.8444218515250481.
[stdout] generation took 0.430s
Version Details
Version ID
f8047bd66544e8a209c8f26ac17edfffcfad583a74f9430bef25165651198b90
Version Created
May 13, 2024
Run on Replicate →