kcaverly/phind-codellama-34b-v2-gguf

232 runs · Dec 2023 · Cog 0.8.6
Tags: code-generation, text-generation

About

A quantized (GGUF) 34B-parameter language model from Phind for code completion.

Example Output

Prompt:

"please create a rust enum called prediction status, with three variants starting, in progress and completed. Please only include valid rust code, do not include any commentary or explanations."

Output

pub enum PredictionStatus {
    Starting,
    InProgress,
    Completed,
}
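The generated enum is valid Rust as returned. The sketch below compiles the output unchanged and adds a small `label` method to exercise it. That method is a hypothetical addition for illustration and is not part of the model's output.

```rust
// The model's output, unchanged.
pub enum PredictionStatus {
    Starting,
    InProgress,
    Completed,
}

// Hypothetical helper (not produced by the model): maps each variant
// back to the phrase used in the prompt.
impl PredictionStatus {
    pub fn label(&self) -> &'static str {
        match self {
            PredictionStatus::Starting => "starting",
            PredictionStatus::InProgress => "in progress",
            PredictionStatus::Completed => "completed",
        }
    }
}

fn main() {
    for status in [
        PredictionStatus::Starting,
        PredictionStatus::InProgress,
        PredictionStatus::Completed,
    ] {
        println!("{}", status.label());
    }
}
```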

Performance Metrics

Prediction time: 1.49 s
Total time: 283.28 s
All Input Parameters
{
  "top_k": 40,
  "top_p": 0.75,
  "prompt": "please create a rust enum called prediction status, with three variants starting, in progress and completed. Please only include valid rust code, do not include any commentary or explanations.",
  "temperature": 0.01,
  "system_prompt": "You are an intelligent programming assistant.",
  "max_new_tokens": -1,
  "prompt_template": "### System Prompt\n{system_prompt}\n### User Message\n{prompt}\n### Assistant\n"
}
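The `prompt_template` above contains `{system_prompt}` and `{prompt}` placeholders. The following is a minimal sketch of how those placeholders are presumably filled before the text reaches the model. Plain string replacement is an assumption, since the server's actual templating code is not shown on this page, and `render_template` is a hypothetical name. The rendered text ends with the `### Assistant` header, so the model's completion follows it.

```rust
/// Fills the `{system_prompt}` and `{prompt}` placeholders of a prompt template.
/// Plain string replacement is an assumption. Note that a system prompt
/// containing the literal text "{prompt}" would also be substituted.
fn render_template(template: &str, system_prompt: &str, prompt: &str) -> String {
    template
        .replace("{system_prompt}", system_prompt)
        .replace("{prompt}", prompt)
}

fn main() {
    // Default template and system prompt from the input parameters.
    let template = "### System Prompt\n{system_prompt}\n### User Message\n{prompt}\n### Assistant\n";
    let system_prompt = "You are an intelligent programming assistant.";
    let prompt = "please create a rust enum called prediction status, with three variants starting, in progress and completed. Please only include valid rust code, do not include any commentary or explanations.";
    print!("{}", render_template(template, system_prompt, prompt));
}
```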
Input Parameters
top_k Type: integer, Default: 40
The number of most probable next tokens kept as the candidate pool when sampling.
top_p Type: number, Default: 0.75
Nucleus-sampling threshold: only the smallest set of most probable tokens whose cumulative probability reaches this value is kept as candidates.
prompt (required) Type: string
Instruction for the model.
temperature Type: number, Default: 0.01
Controls how random the sampling is. The model's scores are divided by the temperature before being turned into probabilities. Higher values flatten the distribution and produce more varied, less predictable text. Lower values sharpen it and make the output more deterministic. At the default of 0.01, decoding is nearly greedy, meaning the single most likely token is almost always chosen.
system_prompt Type: string, Default: You are an intelligent programming assistant.
System prompt for the model; helps guide the model's behaviour.
max_new_tokens Type: integer, Default: -1
Maximum number of new tokens to generate.
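prompt_template Type: string, Default: ### System Prompt\n{system_prompt}\n### User Message\n{prompt}\n### Assistant\n
Template used to build the model input; {system_prompt} and {prompt} are replaced with the corresponding parameters. Override it if you are providing multi-turn instructions.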
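To make `top_k`, `top_p` and `temperature` concrete, here is a sketch of how the three parameters can turn raw model scores (logits) into the distribution a token is drawn from. Several details are assumptions and are not documented for this model: the filter order (top-k, then top-p, then temperature), treating `top_k = 0` as "no limit", and all function names. To keep the sketch free of external crates, randomness is passed in as a uniform number `u` rather than generated.

```rust
/// Softmax with temperature: p_i = exp(z_i / T) / sum_j exp(z_j / T).
/// `temperature` must be > 0.
fn softmax(logits: &[f64], temperature: f64) -> Vec<f64> {
    // Subtracting the maximum logit avoids overflow in exp().
    let max = logits.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|z| ((z - max) / temperature).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Applies top-k, then top-p, then temperature (assumed order) to raw logits.
/// Returns (token_index, probability) pairs, most likely first.
fn filtered_distribution(logits: &[f64], top_k: usize, top_p: f64, temperature: f64) -> Vec<(usize, f64)> {
    // Rank token indices from most to least likely.
    let mut ranked: Vec<usize> = (0..logits.len()).collect();
    ranked.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));

    // top_k: keep at most k candidates (0 is treated as "no limit" here).
    if top_k > 0 {
        ranked.truncate(top_k);
    }

    // top_p: keep the shortest prefix whose cumulative probability reaches top_p.
    let kept_logits: Vec<f64> = ranked.iter().map(|&i| logits[i]).collect();
    let mut cumulative = 0.0;
    let mut keep = 0;
    for p in softmax(&kept_logits, 1.0) {
        cumulative += p;
        keep += 1;
        if cumulative >= top_p {
            break;
        }
    }
    ranked.truncate(keep);

    // temperature: rescale the surviving logits and renormalise.
    let final_logits: Vec<f64> = ranked.iter().map(|&i| logits[i]).collect();
    ranked.into_iter().zip(softmax(&final_logits, temperature)).collect()
}

/// Picks a token given a uniform random number `u` in [0, 1).
fn sample(dist: &[(usize, f64)], u: f64) -> usize {
    let mut cumulative = 0.0;
    for &(token, p) in dist {
        cumulative += p;
        if u < cumulative {
            return token;
        }
    }
    // Floating-point rounding can leave the total just under 1.
    dist.last().expect("empty distribution").0
}

fn main() {
    // Hypothetical logits for a 4-token vocabulary, with the page's defaults.
    let logits = [2.0, 1.0, 0.5, -1.0];
    let dist = filtered_distribution(&logits, 40, 0.75, 0.01);
    println!("{dist:?}");
    println!("sampled token: {}", sample(&dist, 0.5));
}
```

With these hypothetical logits, `top_p = 0.75` keeps only the two most likely tokens. `temperature = 0.01` then pushes almost all of the probability onto the first one, which is why the default settings behave close to greedy decoding.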
Output Schema

Output

Type: array, Items type: string
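Because the output is an array of strings rather than a single string, a client has to combine the items. The sketch below concatenates them with no separator. This assumes each item is a consecutive fragment of the generated text, which the schema alone does not state. The fragments and the `join_output` name are hypothetical.

```rust
/// Concatenates the output array's string items into the final text.
/// Assumes each item is a consecutive fragment, so no separator is inserted.
fn join_output(items: &[&str]) -> String {
    items.concat()
}

fn main() {
    // Hypothetical fragments; the real split depends on the tokenizer.
    let items = ["pub enum", " Prediction", "Status {"];
    println!("{}", join_output(&items));
}
```

Under that assumption, joining with spaces instead would corrupt words that are split across fragments, such as `Prediction` + `Status`.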

Example Execution Logs
llama_print_timings:        load time =     268.65 ms
llama_print_timings:      sample time =       4.06 ms /    28 runs   (    0.15 ms per token,  6891.46 tokens per second)
llama_print_timings: prompt eval time =     268.55 ms /    63 tokens (    4.26 ms per token,   234.59 tokens per second)
llama_print_timings:        eval time =    1159.85 ms /    27 runs   (   42.96 ms per token,    23.28 tokens per second)
llama_print_timings:       total time =    1478.76 ms
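The per-token figures in these logs follow directly from the totals. For example, the eval line gives 1159.85 ms / 27 runs ≈ 42.96 ms per token, and 27 / 1.15985 s ≈ 23.28 tokens per second. The sketch below reproduces that arithmetic by parsing a log line. The parsing assumes the exact `label time = <ms> ms / <count> ...` layout shown above, and the function names are hypothetical.

```rust
/// Extracts (milliseconds, count) from a llama_print_timings line such as
/// "eval time =    1159.85 ms /    27 runs   (...)". Returns None for lines
/// without a count, e.g. "total time = 1478.76 ms".
fn parse_timing(line: &str) -> Option<(f64, u32)> {
    let (_, rest) = line.split_once('=')?;
    let mut fields = rest.split_whitespace();
    let ms: f64 = fields.next()?.parse().ok()?;
    // Expect the literal tokens "ms" and "/" before the count.
    if fields.next()? != "ms" || fields.next()? != "/" {
        return None;
    }
    let count: u32 = fields.next()?.parse().ok()?;
    Some((ms, count))
}

/// Derives the two rates the log prints: ms per token and tokens per second.
fn rates(ms: f64, count: u32) -> (f64, f64) {
    let n = f64::from(count);
    (ms / n, n / (ms / 1000.0))
}

fn main() {
    let line = "llama_print_timings:        eval time =    1159.85 ms /    27 runs   (   42.96 ms per token,    23.28 tokens per second)";
    if let Some((ms, count)) = parse_timing(line) {
        let (ms_per_token, tokens_per_sec) = rates(ms, count);
        // Prints "42.96 ms per token, 23.28 tokens per second", matching the log.
        println!("{ms_per_token:.2} ms per token, {tokens_per_sec:.2} tokens per second");
    }
}
```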
Version Details
Version ID
5cc313d1a99986f6b903926cd493e07bfbcf8f0fd1417a234aa2ba9ba01a3e61
Version Created
December 13, 2023