tomasmcm/digital-socrates-13b 📝🔢 → 📝

▶️ 19 runs 📅 Jan 2024 ⚙️ Cog 0.8.6 📄 Paper ⚖️ License

automatic-evaluation educational-tool explanation-critiquing feedback-generation llm-fine-tuning reasoning-evaluation student-model-analysis

Performance

5.4sTypical run time

~228sCold start (first call)

19Total runs

About

Source: allenai/digital-socrates-13b ✦ Quant: TheBloke/digital-socrates-13B-AWQ ✦ Digital Socrates is an open-source, automatic explanation-critiquing model

Example Output

Prompt:

[INST] <>
You are a helpful and rigorous tutor.
<
You are a knowledgeable tutor who gives helpful critique on a given answer and explanation to a question. The first component of the critique should reflect back the most significant flaw (if any) in the explanation. The flaw could range from using incorrect facts or sloppy reasoning to not providing any justification beyond just stating the answer.

The flaw should be categorized into one of these dimensions:

misunderstanding: apparent misunderstanding of the original question or answer choices
lack_justification: just stating the final answer without any proper or informative justification
incorrect_information: stating incorrect fact(s) or knowledge
missing_information: missing a crucial fact, knowledge, or perspective that should be considered
incorrect_reasoning: an incorrect leap in the reasoning chain
incomplete_reasoning: the reasoning doesn't lead all the way to the answer
inconsistent_answer: the answer doesn't match the conclusion of the explanation
irrelevant: using irrelevant or redundant fact(s), knowledge, or reasoning

The flaw should be formulated as a standalone erroneous statement, that can be understood without reference to the full question or explanation.

When possible, use quote(s) in the reflection statement. In the quote(s), please use "..." to leave out parts not relevant to the error, to leave just the erroneous parts.

The quote(s) should be from the explanation, except in the case of the "misunderstanding" dimension, where the quote should try to identify the part of the question or answer that was misunderstood.

If necessary, try to add a brief context to the reflection statement so it can be read standalone without reference to the full question or explanation. It should be possible for someone to read just the feedback and realize that there is an error, so for instance quoting an answer option (like "(B)") or just stating a correct fact (that is used incorrectly) is not appropriate.

If an explanation is reasonable, and provides some meaningful evidence supporting the answer, there is no need to be overly picky and possibly misleading, just state "None" as the feedback instead. It is okay, and even encouraged, for the explanation to also present arguments against incorrect answer choices (this is not considered "irrelevant" arguments or "inconsistent_answer"). Sometimes a justification might reasonably involve a plausibility argument rather than strict deduction, and that might be okay.

The second component of the critique should provide statements toward fixing this flaw in the explanation. First, try to provide a general statement that addresses a likely misconception or confusion behind the flaw. Formulate this so it can be reused independently of this particular question. Next, provide a more specific statement to consider when revising the explanation.

Finally, please grade the explanation on a scale from 0 to 5, with the scoring guidelines:

0: Very wrong explanation
1: Has something useful in the explanation that's relevant to the question
2: Makes some valid points, but is mostly incorrect
3: Has many correct elements, but with significant flaws
4: Mostly correct, with a minor flaw
5: Completely correct

Please use the following format for the critique (use "None" if there is no feedback for a particular line):

The explanation states or suggests the following:

Main flaw (standalone statement):
Dimension:

Consider these points for revising the explanation:

General:
Specific:

Explanation score:

Here is the question with the given answer and explanation:

Question: When Dennis operates his lawnmower, he notices the engine makes a lot of noise. He also notices that the engine gets very hot. Which best describes the heat and noise generated from the lawnmower? (A) a change in phase (B) thermal expansion (C) an increase in entropy (D) mechanical advantage

Correct answer (according to answer sheet): (C)

Given answer: (D)

Given explanation: 1) The question states that the lawnmower engine makes a lot of noise.
2) The question states that the lawnmower engine gets very hot.
3) Noise and heat are both forms of energy.
4) The noise and heat generated from the lawnmower are a result of the conversion of energy from the fuel to mechanical energy. [/INST]

Output

The explanation states or suggests the following:

Main flaw (standalone statement): "The noise and heat generated from the lawnmower are a result of the conversion of energy from the fuel to mechanical energy."
Dimension: incorrect_information

Consider these points for revising the explanation:

General: Remember that when discussing thermodynamics, it's important to distinguish between the concepts of heat and temperature. Heat is a form of energy, but it's not the same as temperature.
Specific: In this case, you should explain that the noise and heat generated from the lawnmower are a result of the conversion of energy from the fuel to thermal energy, not mechanical energy.

Explanation score: 2

Performance Metrics

5.40s Prediction Time

227.98s Total Time

All Input Parameters

{
  "stop": "[INST]",
  "top_k": -1,
  "top_p": 0.95,
  "prompt": "[INST] <<SYS>>\nYou are a helpful and rigorous tutor.\n<</SYS>\nYou are a knowledgeable tutor who gives helpful critique on a given answer and explanation to a question. The first component of the critique should reflect back the most significant flaw (if any) in the explanation. The flaw could range from using incorrect facts or sloppy reasoning to not providing any justification beyond just stating the answer. \n\nThe flaw should be categorized into one of these dimensions:\n\nmisunderstanding: apparent misunderstanding of the original question or answer choices\nlack_justification: just stating the final answer without any proper or informative justification\nincorrect_information: stating incorrect fact(s) or knowledge\nmissing_information: missing a crucial fact, knowledge, or perspective that should be considered\nincorrect_reasoning: an incorrect leap in the reasoning chain\nincomplete_reasoning: the reasoning doesn't lead all the way to the answer\ninconsistent_answer: the answer doesn't match the conclusion of the explanation\nirrelevant: using irrelevant or redundant fact(s), knowledge, or reasoning\n\nThe flaw should be formulated as a standalone erroneous statement, that can be understood without reference to the full question or explanation.\n\nWhen possible, use quote(s) in the reflection statement. In the quote(s), please use \"...\" to leave out parts not relevant to the error, to leave just the erroneous parts. \n\nThe quote(s) should be from the explanation, except in the case of the  \"misunderstanding\" dimension, where the quote should try to identify the part of the question or answer that was misunderstood. \n\nIf necessary, try to add a brief context to the reflection statement so it can be read standalone without reference to the full question or explanation. It should be possible for someone to read just the feedback and realize that there is an error, so for instance quoting an answer option (like \"(B)\") or just stating a correct fact (that is used incorrectly) is not appropriate. \n\nIf an explanation is reasonable, and provides some meaningful evidence supporting the answer, there is no need to be overly picky and possibly misleading, just state \"None\" as the feedback instead. It is okay, and even encouraged, for the explanation to also present arguments against incorrect answer choices (this is not considered \"irrelevant\" arguments or \"inconsistent_answer\"). Sometimes a justification might reasonably involve a plausibility argument rather than strict deduction, and that might be okay.\n\n\nThe second component of the critique should provide statements toward fixing this flaw in the explanation. First, try to provide a general statement that addresses a likely misconception or confusion behind the flaw. Formulate this so it can be reused independently of this particular question. Next, provide a more specific statement to consider when revising the explanation.\n\nFinally, please grade the explanation on a scale from 0 to 5, with the scoring guidelines:\n\n0: Very wrong explanation\n1: Has something useful in the explanation that's relevant to the question\n2: Makes some valid points, but is mostly incorrect\n3: Has many correct elements, but with significant flaws\n4: Mostly correct, with a minor flaw\n5: Completely correct\n\n\nPlease use the following format for the critique (use \"None\" if there is no feedback for a particular line):\n\nThe explanation states or suggests the following:\n * Main flaw (standalone statement): <flaw>\n * Dimension: <dimension>\n\n Consider these points for revising the explanation:\n * General: <reusable statement>\n * Specific: <question-specific statement> \n\n Explanation score: <score>\n\n\nHere is the question with the given answer and explanation:\n\nQuestion: When Dennis operates his lawnmower, he notices the engine makes a lot of noise. He also notices that the engine gets very hot. Which best describes the heat and noise generated from the lawnmower? (A) a change in phase (B) thermal expansion (C) an increase in entropy (D) mechanical advantage\n\nCorrect answer (according to answer sheet): (C)\n\nGiven answer: (D)\n\nGiven explanation: 1) The question states that the lawnmower engine makes a lot of noise.\n2) The question states that the lawnmower engine gets very hot.\n3) Noise and heat are both forms of energy.\n4) The noise and heat generated from the lawnmower are a result of the conversion of energy from the fuel to mechanical energy. [/INST]\n\n",
  "max_tokens": 256,
  "temperature": 0.8,
  "presence_penalty": 0,
  "frequency_penalty": 0
}

Input Parameters

stop Type: string: List of strings that stop the generation when they are generated. The returned output will not contain the stop strings.
top_k Type: integerDefault: -1: Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.
top_p Type: numberDefault: 0.95Range: 0.01 - 1: Float that controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
prompt (required) Type: string: Text prompt to send to the model.
max_tokens Type: integerDefault: 128: Maximum number of tokens to generate per output sequence.
temperature Type: numberDefault: 0.8Range: 0.01 - 5: Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling.
presence_penalty Type: numberDefault: 0Range: -5 - 5: Float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
frequency_penalty Type: numberDefault: 0Range: -5 - 5: Float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.

Output Schema

Output

Type: string

Example Execution Logs

Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s]
Processed prompts: 100%|██████████| 1/1 [00:05<00:00,  5.34s/it]
Processed prompts: 100%|██████████| 1/1 [00:05<00:00,  5.34s/it]
Generated 163 tokens in 5.3417792320251465 seconds.

Version Details

Version ID: b67a03616e834e779758703e4388ecfbefef765deda0d1a748f3d76158c4a2d0
Version Created: January 17, 2024

Run on Replicate →