deepseek-ai/deepseek-coder-v2-lite-instruct
About
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Example Output
Prompt:
"write a quick sort algorithm in python."
Output
def quick_sort(arr):
    # Base case: lists of length 0 or 1 are already sorted.
    if len(arr) <= 1:
        return arr
    # Three-way partition around the middle element as pivot.
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    # Recursively sort the partitions and concatenate.
    return quick_sort(left) + middle + quick_sort(right)

print(quick_sort([3, 6, 8, 10, 1, 2, 1]))  # [1, 1, 2, 3, 6, 8, 10]
Performance Metrics
Prediction Time: 1.64s
Total Time: 1.67s
All Input Parameters
{ "top_k": 50, "top_p": 0.9, "prompt": "write a quick sort algorithm in python.", "max_tokens": 512, "min_tokens": 0, "temperature": 0.6, "system_prompt": "You are an expert software engineer proficient in multiple programming languages.", "presence_penalty": 0, "frequency_penalty": 0 }
Input Parameters
- top_k: The number of highest-probability tokens to consider when generating output. If > 0, only the top k tokens are kept (top-k filtering).
- top_p: A cumulative-probability threshold. If < 1.0, only the smallest set of tokens whose cumulative probability is >= top_p is kept (nucleus filtering, described in Holtzman et al., http://arxiv.org/abs/1904.09751).
- prompt: The text prompt sent to the model.
- max_tokens: The maximum number of tokens the model should generate as output.
- min_tokens: The minimum number of tokens the model should generate as output.
- temperature: The value used to modulate the next-token probabilities; lower values make the distribution sharper and the output more deterministic.
- system_prompt: System prompt to send to the model. It is prepended to the prompt and helps guide model behavior. Ignored for non-chat models.
- stop_sequences: A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
- presence_penalty: Penalizes tokens that have already appeared in the output at all, encouraging the model to introduce new tokens (see the sampling sketch after this list).
- frequency_penalty: Penalizes tokens in proportion to how often they have already appeared in the output (see the sampling sketch after this list).
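The sketch below shows, in plain NumPy, one common way these parameters combine during decoding: the penalties adjust the logits, temperature rescales them, and top-k and top-p restrict the candidate set before sampling. It is illustrative only; the serving stack here (vLLM, per the logs below) implements this differently, and the function name and signature are inventions for this example.

# Illustrative NumPy sketch of top-k / top-p / temperature sampling with
# presence and frequency penalties. Hypothetical helper, not the model's
# actual implementation.
import numpy as np

def sample_next_token(logits, generated_ids, temperature=0.6,
                      top_k=50, top_p=0.9,
                      presence_penalty=0.0, frequency_penalty=0.0):
    logits = np.asarray(logits, dtype=np.float64).copy()

    # Presence penalty: flat subtraction for any token already generated;
    # frequency penalty: subtraction scaled by how often it appeared.
    for tok in set(generated_ids):
        logits[tok] -= presence_penalty + frequency_penalty * generated_ids.count(tok)

    # Temperature: rescale logits (lower temperature => sharper distribution).
    logits /= max(temperature, 1e-8)

    # Top-k filtering: drop everything below the k-th highest logit.
    if top_k > 0:
        kth = np.sort(logits)[-top_k]
        logits[logits < kth] = -np.inf

    # Softmax to probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p (nucleus) filtering: keep the smallest prefix of tokens,
    # sorted by descending probability, whose cumulative mass reaches top_p.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        keep = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        mask = np.zeros_like(probs)
        mask[order[:keep]] = probs[order[:keep]]
        probs = mask / mask.sum()

    return int(np.random.choice(len(probs), p=probs))

With the defaults shown above (temperature 0.6, top_k 50, top_p 0.9), decoding is mildly constrained but still stochastic, which suits code generation with some variety.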
Output Schema
Output
Example Execution Logs
INFO 07-08 19:19:23 async_llm_engine.py:584] Received request 3bea0322126645299a822ea89c4e4778: prompt: 'write a quick sort algorithm in python.', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.9, top_k=50, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[100001], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 07-08 19:19:23 metrics.py:341] Avg prompt throughput: 1.5 tokens/s, Avg generation throughput: 21.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
Generation took 1720466049.90s
INFO 07-08 19:19:24 async_llm_engine.py:134] Finished request 3bea0322126645299a822ea89c4e4778.
Formatted prompt: {system_prompt} User: {prompt} Assistant:
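The last log line shows the chat template the endpoint applies before inference. A rough reconstruction follows; the separators between the segments are an assumption, since the log collapses whitespace.

# Hypothetical reconstruction of the template from the log line above.
# The newline separators are an assumption; the log collapses whitespace.
def format_prompt(system_prompt: str, prompt: str) -> str:
    return f"{system_prompt}\n\nUser: {prompt}\n\nAssistant:"

print(format_prompt(
    "You are an expert software engineer proficient in multiple programming languages.",
    "write a quick sort algorithm in python.",
))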
Version Details
- Version ID: a182159595ee209e25e673a5ab6f0293111dd6733df973f77efcd1f851123330
- Version Created: July 8, 2024