tomasmcm/pandalyst-13b-v1.0 📝🔢 → 📝

▶️ 21 runs 📅 Jan 2024 ⚙️ Cog 0.8.6 📄 Paper ⚖️ License

correlation-analysis data-analysis data-cleaning data-processing large-language-model pandas plotting

About

Source: pipizhao/Pandalyst_13B_V1.0 ✦ Quant: TheBloke/Pandalyst_13B_V1.0-AWQ ✦ Pandalyst: A large language model for mastering data analysis using pandas

Example Output

Prompt:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:

Dataframe Name: Career summary, with 43 rows and 9 columns.
Description: This is a table about Gabriel Furlán: Career summary
Columns: Season, Series, Team, Races, Wins, Poles, Podiums, Points, Position
Here are the descriptions of the columns of the dataframe:
{"Column Name": "Season", "Type": "int", "MIN": 1984, "MAX": 2011}
{"Column Name": "Wins", "Type": "string", "Enumerated Values": ["0", "6", "2", "3", "7", "?", "5", "1"]}
{"Column Name": "Position", "Type": "string", "Enumerated Values": ["9th", "18th", "2nd", "10th", "N/A", "13th", "7th", "4th", "6th", "3rd", "8th", "?", "1st", "16th", "11th", "19th", "5th"]}
{"Column Name": "Podiums", "Type": "string", "Enumerated Values": ["9", "0", "3", "7", "?", "5", "1", "4"]}
{"Column Name": "Team", "Type": "string", "Enumerated Values": ["INI Competición", "Junior Team", "Forti Corse", "GF Motorsport", "GF Racing", "Toyota Team Argentina", "Gustavo Sommi", "Ford", "DTA", "Maldonado Competición", "Sommi-Zanón", "Bainotti Dowen Pagio", "Scuderia 111", "Jolly Club"]}
{"Column Name": "Races", "Type": "string", "Enumerated Values": ["9", "8", "13", "2", "7", "14", "18", "?", "5", "1", "12", "4", "11"]}
{"Column Name": "Points", "Type": "string", "Enumerated Values": ["39", "90", "67", "0", "35.5", "79", "78", "52", "N/A", "16", "38", "26", "65", "144", "1", "139", "175", "42", "32", "46"]}
{"Column Name": "Series", "Type": "string", "Enumerated Values": ["Monaco Grand Prix Formula Three", "TC 2000", "Fórmula 2 Codasur", "Stock Car Brasil", "Formula Renault Argentina", "International Formula 3000", "Super Turismo V8", "World Cup Formula 3000", "Top Race V6", "Top Race V6 Copa América", "Formula Three Sudamericana", "Italian Formula Three", "TC 2000 Endurance series"]}
{"Column Name": "Poles", "Type": "string", "Enumerated Values": ["0", "2", "3", "?", "5", "1", "4"]}
Question:
Can we analyze the correlation between the number of races Gabriel Furlán participated in each season and the number of wins he achieved in the same season?

Response:

Output

# Prepare the data: convert "Races" and "Wins" to numerical values, and replace '?' and 'N/A' with NaN
career_summary['Races'] = pd.to_numeric(career_summary['Races'], errors='coerce')
career_summary['Wins'] = pd.to_numeric(career_summary['Wins'], errors='coerce')
career_summary = career_summary.dropna(subset=['Races', 'Wins'])

# Calculate the correlation
correlation = career_summary['Races'].corr(career_summary['Wins'])
print(f"The correlation between the number of races and the number of wins is {correlation}")

Performance Metrics

3.54s Prediction Time

214.52s Total Time

All Input Parameters

{
  "top_k": -1,
  "top_p": 0.95,
  "prompt": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nDataframe Name: Career summary, with 43 rows and 9 columns.\nDescription: This is a table about Gabriel Furlán: Career summary\nColumns: Season, Series, Team, Races, Wins, Poles, Podiums, Points, Position\nHere are the descriptions of the columns of the dataframe:\n{\"Column Name\": \"Season\", \"Type\": \"int\", \"MIN\": 1984, \"MAX\": 2011}\n{\"Column Name\": \"Wins\", \"Type\": \"string\", \"Enumerated Values\": [\"0\", \"6\", \"2\", \"3\", \"7\", \"?\", \"5\", \"1\"]}\n{\"Column Name\": \"Position\", \"Type\": \"string\", \"Enumerated Values\": [\"9th\", \"18th\", \"2nd\", \"10th\", \"N/A\", \"13th\", \"7th\", \"4th\", \"6th\", \"3rd\", \"8th\", \"?\", \"1st\", \"16th\", \"11th\", \"19th\", \"5th\"]}\n{\"Column Name\": \"Podiums\", \"Type\": \"string\", \"Enumerated Values\": [\"9\", \"0\", \"3\", \"7\", \"?\", \"5\", \"1\", \"4\"]}\n{\"Column Name\": \"Team\", \"Type\": \"string\", \"Enumerated Values\": [\"INI Competición\", \"Junior Team\", \"Forti Corse\", \"GF Motorsport\", \"GF Racing\", \"Toyota Team Argentina\", \"Gustavo Sommi\", \"Ford\", \"DTA\", \"Maldonado Competición\", \"Sommi-Zanón\", \"Bainotti Dowen Pagio\", \"Scuderia 111\", \"Jolly Club\"]}\n{\"Column Name\": \"Races\", \"Type\": \"string\", \"Enumerated Values\": [\"9\", \"8\", \"13\", \"2\", \"7\", \"14\", \"18\", \"?\", \"5\", \"1\", \"12\", \"4\", \"11\"]}\n{\"Column Name\": \"Points\", \"Type\": \"string\", \"Enumerated Values\": [\"39\", \"90\", \"67\", \"0\", \"35.5\", \"79\", \"78\", \"52\", \"N/A\", \"16\", \"38\", \"26\", \"65\", \"144\", \"1\", \"139\", \"175\", \"42\", \"32\", \"46\"]}\n{\"Column Name\": \"Series\", \"Type\": \"string\", \"Enumerated Values\": [\"Monaco Grand Prix Formula Three\", \"TC 2000\", \"Fórmula 2 Codasur\", \"Stock Car Brasil\", \"Formula Renault Argentina\", \"International Formula 3000\", \"Super Turismo V8\", \"World Cup Formula 3000\", \"Top Race V6\", \"Top Race V6 Copa América\", \"Formula Three Sudamericana\", \"Italian Formula Three\", \"TC 2000 Endurance series\"]}\n{\"Column Name\": \"Poles\", \"Type\": \"string\", \"Enumerated Values\": [\"0\", \"2\", \"3\", \"?\", \"5\", \"1\", \"4\"]}\nQuestion: \nCan we analyze the correlation between the number of races Gabriel Furlán participated in each season and the number of wins he achieved in the same season?\n\n### Response:\n",
  "max_tokens": 512,
  "temperature": 0.8,
  "presence_penalty": 0,
  "frequency_penalty": 0
}

Input Parameters

stop Type: string: List of strings that stop the generation when they are generated. The returned output will not contain the stop strings.
top_k Type: integerDefault: -1: Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.
top_p Type: numberDefault: 0.95Range: 0.01 - 1: Float that controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
prompt (required) Type: string: Text prompt to send to the model.
max_tokens Type: integerDefault: 128: Maximum number of tokens to generate per output sequence.
temperature Type: numberDefault: 0.8Range: 0.01 - 5: Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling.
presence_penalty Type: numberDefault: 0Range: -5 - 5: Float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
frequency_penalty Type: numberDefault: 0Range: -5 - 5: Float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.

Output Schema

Output

Type: string

Example Execution Logs

Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s]
Processed prompts: 100%|██████████| 1/1 [00:03<00:00,  3.47s/it]
Processed prompts: 100%|██████████| 1/1 [00:03<00:00,  3.47s/it]
Generated 172 tokens in 3.4777262210845947 seconds.

Version Details

Version ID: 49be21dfc89f89bc2753f71132dd98eaba1251423a021ce2d62b690e068819cc
Version Created: January 31, 2024

Run on Replicate →