lucataco/yi-1.5-6b 🔢📝 → 📝
About
Yi-1.5 is an upgraded version of Yi: it is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse samples.

Example Output
Prompt:
"Tell me a joke"
Output:
Here's one: What do you call an elephant that can fly?
Performance Metrics
- Prediction Time: 0.55s
- Total Time: 128.41s (total time typically includes queueing and cold-boot overhead, which accounts for the gap with prediction time)
All Input Parameters
{ "top_k": 50, "top_p": 0.95, "prompt": "Tell me a joke", "temperature": 0.7, "system_prompt": "You are a friendly Chatbot.", "max_new_tokens": 512 }
Input Parameters
- top_k: The number of highest-probability tokens to consider when generating the output. If > 0, only the top k tokens with the highest probability are kept (top-k filtering).
- top_p: A probability threshold for generating the output. If < 1.0, only the smallest set of top tokens whose cumulative probability reaches top_p is kept (nucleus filtering), as described in Holtzman et al. (http://arxiv.org/abs/1904.09751). A sketch of both filters follows this list.
- prompt: Input prompt.
- temperature: The value used to modulate the next-token probabilities.
- system_prompt: System prompt.
- max_new_tokens: The maximum number of tokens the model should generate as output.
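The sketch below illustrates how temperature scaling, top-k filtering, and nucleus (top-p) filtering narrow the candidate set before the next token is sampled. It is plain NumPy and only shows the standard technique these parameters refer to; it is not the code this deployment runs (generation here is handled by vLLM).

```python
# Illustrative sketch of temperature + top-k + top-p (nucleus) filtering.
import numpy as np

def sampling_distribution(logits, top_k=50, top_p=0.95, temperature=0.7):
    """Return the filtered, renormalized next-token distribution."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                          # softmax with temperature

    order = np.argsort(probs)[::-1]               # tokens sorted most to least likely
    keep = np.zeros(len(probs), dtype=bool)

    if top_k > 0:                                 # top-k: keep the k most likely tokens
        keep[order[:top_k]] = True
    else:
        keep[:] = True

    if top_p < 1.0:                               # nucleus: smallest prefix of the sorted
        cum = np.cumsum(probs[order])             # tokens whose cumulative prob >= top_p
        cutoff = int(np.searchsorted(cum, top_p)) + 1
        nucleus = np.zeros(len(probs), dtype=bool)
        nucleus[order[:cutoff]] = True
        keep &= nucleus

    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

# Toy example: sample one next-token id from a 10-token vocabulary.
rng = np.random.default_rng(0)
dist = sampling_distribution(rng.normal(size=10), top_k=5, top_p=0.9)
next_token_id = rng.choice(len(dist), p=dist)
```

With the defaults shown above (top_k=50, top_p=0.95, temperature=0.7), sampling stays restricted to a relatively small, high-probability candidate set.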
Output Schema
Output
Example Execution Logs
No chat template is defined for this tokenizer - using the default template for the CachedLlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
INFO 05-13 21:51:51 async_llm_engine.py:508] Received request 0.8444218515250481: prompt: '<|startoftext|>[INST] <<SYS>>\nYou are a friendly Chatbot.\n<</SYS>>\n\nTell me a joke [/INST]', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.2, temperature=0.7, top_p=0.95, top_k=50, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|endoftext|>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 05-13 21:51:51 metrics.py:218] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 05-13 21:51:51 async_llm_engine.py:120] Finished request 0.8444218515250481.
generation took 0.430s
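The logged request shows the Llama-2-style [INST] / <<SYS>> wrapping this deployment applies around the system prompt and user prompt (the leading <|startoftext|> is the tokenizer's BOS token). Below is a minimal sketch of that format; build_prompt is an illustrative helper, not part of the model's code.

```python
# Minimal sketch of the prompt format visible in the logs above.
# The <|startoftext|> BOS token is added by the tokenizer itself.
def build_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_prompt} [/INST]"
    )

print(build_prompt("You are a friendly Chatbot.", "Tell me a joke"))
# [INST] <<SYS>>
# You are a friendly Chatbot.
# <</SYS>>
#
# Tell me a joke [/INST]
```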
Version Details
- Version ID: f8047bd66544e8a209c8f26ac17edfffcfad583a74f9430bef25165651198b90
- Version Created: May 13, 2024