kcaverly/openchat-3.5-1210-gguf 📝🔢 → 📝

▶️ 26.3K runs 📅 Dec 2023 ⚙️ Cog 0.8.6 🔗 GitHub 📄 Paper

code-generation math-reasoning text-generation

About

The "Overall Best Performing Open Source 7B Model" for Coding + Generalization or Mathematical Reasoning

Example Output

Prompt:

"Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?"

Output:

Since Sally is the only girl in her family, she must be considered as one of the "sisters" mentioned. Therefore, Sally has 2 sisters (including herself).

Performance Metrics

0.86s Prediction Time

85.14s Total Time

All Input Parameters

{
  "prompt": "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?",
  "temperature": 0.7,
  "max_new_tokens": -1,
  "repeat_penalty": 1.1,
  "prompt_template": "GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: "
}
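As a minimal sketch of how these inputs might fit together, the snippet below fills the `{prompt}` placeholder in the template by plain string substitution. The page does not show how the deployed model performs this step, so `render_prompt` is a hypothetical helper and the substitution rule is an assumption.

```python
def render_prompt(template: str, prompt: str) -> str:
    """Hypothetical helper: swap the {prompt} placeholder for the user's text."""
    # str.replace is used instead of str.format so that any other braces
    # in a custom template are left untouched.
    return template.replace("{prompt}", prompt)


# Input values copied from the example above.
inputs = {
    "prompt": "Sally (a girl) has 3 brothers. Each brother has 2 sisters. "
              "How many sisters does Sally have?",
    "temperature": 0.7,
    "max_new_tokens": -1,
    "repeat_penalty": 1.1,
    "prompt_template": "GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: ",
}

full_prompt = render_prompt(inputs["prompt_template"], inputs["prompt"])
print(full_prompt)
```

The rendered string ends with the assistant prefix, which is the point where the model would begin generating.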

Input Parameters

prompt (required) Type: string. Instruction for the model.
temperature Type: number. Default: 0.7. Controls the randomness of sampling. Higher values flatten the token probability distribution and produce more varied, creative output; lower values concentrate probability on the most likely tokens and produce more conservative, predictable output.
max_new_tokens Type: integer. Default: -1. Maximum number of new tokens to generate.
repeat_penalty Type: number. Default: 1.1. Discourages the model from repeating itself by lowering the likelihood of tokens that have already appeared. Higher values yield more varied text; lower values permit more repetition, which can suit cases where consistency and predictability are preferred.
prompt_template Type: string. Default: "GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: ". Template to pass to the model. Override it if you are providing multi-turn instructions.
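To make the effect of `temperature` and `repeat_penalty` concrete, here is a toy sketch on a three-token vocabulary. It assumes the llama.cpp-style convention for the repetition penalty (positive logits are divided by the penalty, non-positive ones multiplied) and standard temperature-scaled softmax. The function names and the logit values are illustrative, not taken from this model's code.

```python
import math


def apply_repeat_penalty(logits, seen_token_ids, penalty):
    """Lower the scores of tokens that already appeared (assumed llama.cpp-style rule)."""
    out = list(logits)
    for token_id in set(seen_token_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for a three-token vocabulary

low = softmax_with_temperature(logits, 0.2)   # sharply peaked on token 0
high = softmax_with_temperature(logits, 1.5)  # flatter, more varied sampling

# Pretend token 0 was already generated and apply the default penalty of 1.1.
penalized = apply_repeat_penalty(logits, seen_token_ids=[0], penalty=1.1)

print("T=0.2:", [round(p, 3) for p in low])
print("T=1.5:", [round(p, 3) for p in high])
print("penalized logits:", [round(x, 3) for x in penalized])
```

A lower temperature concentrates probability on the top-scoring token, while the penalty reduces only the score of the token that was already seen.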

Output Schema

Output

Type: array • Items Type: string
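Because the output is an array of strings, a client would usually concatenate the items to recover the full response. The sketch below assumes each item is a consecutive chunk of generated text, which the schema itself does not state; the chunk boundaries shown are made up for illustration.

```python
def join_output(chunks):
    """Concatenate output items in order to rebuild the full text (assumed semantics)."""
    return "".join(chunks)


# Hypothetical chunking of the start of the example output above.
example_chunks = ["Since Sally", " is the only girl", " in her family"]
text = join_output(example_chunks)
print(text)
```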

Example Execution Logs

llama_print_timings:        load time =     155.21 ms
llama_print_timings:      sample time =       5.26 ms /    38 runs   (    0.14 ms per token,  7218.84 tokens per second)
llama_print_timings: prompt eval time =     155.11 ms /    40 tokens (    3.88 ms per token,   257.88 tokens per second)
llama_print_timings:        eval time =     570.10 ms /    37 runs   (   15.41 ms per token,    64.90 tokens per second)
llama_print_timings:       total time =     794.99 ms
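The throughput figures in these logs follow directly from the reported times and counts. As a rough sketch, the snippet below parses the `llama_print_timings` lines with a regular expression written for this exact layout (an assumption; other llama.cpp versions may format the lines differently) and recomputes tokens per second.

```python
import re

LOG = """\
llama_print_timings:        load time =     155.21 ms
llama_print_timings:      sample time =       5.26 ms /    38 runs   (    0.14 ms per token,  7218.84 tokens per second)
llama_print_timings: prompt eval time =     155.11 ms /    40 tokens (    3.88 ms per token,   257.88 tokens per second)
llama_print_timings:        eval time =     570.10 ms /    37 runs   (   15.41 ms per token,    64.90 tokens per second)
llama_print_timings:       total time =     794.99 ms
"""

# Captures the stage name, elapsed milliseconds, and (if present) the run/token count.
PATTERN = re.compile(
    r"llama_print_timings:\s+(?P<name>[a-z ]+?) time =\s+(?P<ms>[\d.]+) ms"
    r"(?: /\s+(?P<count>\d+) (?:runs|tokens))?"
)

timings = {}
for match in PATTERN.finditer(LOG):
    count = int(match["count"]) if match["count"] else None
    timings[match["name"]] = (float(match["ms"]), count)


def tokens_per_second(ms, count):
    """Throughput = items processed divided by elapsed seconds."""
    return count / (ms / 1000.0)


eval_tps = tokens_per_second(*timings["eval"])
prompt_tps = tokens_per_second(*timings["prompt eval"])
print(f"generation: {eval_tps:.2f} tokens/s, prompt: {prompt_tps:.2f} tokens/s")
```

The recomputed generation and prompt-evaluation rates agree with the values printed in the log to two decimal places; the sample-time rate differs slightly because the log rounds the elapsed milliseconds.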

Version Details

Version ID: 0d1426400ae23540eef130c0cd6cbd7184ac47cffee9dfd16fdf02d065df123b
Version Created: December 19, 2023
