nateraw/defog-sqlcoder-7b-2 🔢📝 → 📝

▶️ 21.5K runs 📅 Feb 2024 ⚙️ Cog 0.9.4 🔗 GitHub

code-generation sql-generation text-generation text-to-sql

About

A capable large language model for natural language to SQL generation.

Example Output

Output

SELECT c.city, SUM(s.quantity) AS total_sales, SUM(CASE WHEN c.city = 'New York' THEN s.quantity ELSE 0 END) - SUM(CASE WHEN c.city = 'San Francisco' THEN s.quantity ELSE 0 END) AS difference_in_sales FROM sales s JOIN customers c ON s.customer_id = c.customer_id GROUP BY c.city ORDER BY total_sales DESC NULLS LAST;

Performance Metrics

2.59s Prediction Time

2.63s Total Time

All Input Parameters

{
  "top_k": 50,
  "top_p": 0.9,
  "question": "Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.",
  "temperature": 0,
  "max_new_tokens": 512,
  "table_metadata": "CREATE TABLE products (\n  product_id INTEGER PRIMARY KEY, -- Unique ID for each product\n  name VARCHAR(50), -- Name of the product\n  price DECIMAL(10,2), -- Price of each unit of the product\n  quantity INTEGER  -- Current quantity in stock\n);\n\nCREATE TABLE customers (\n   customer_id INTEGER PRIMARY KEY, -- Unique ID for each customer\n   name VARCHAR(50), -- Name of the customer\n   address VARCHAR(100) -- Mailing address of the customer\n);\n\nCREATE TABLE salespeople (\n  salesperson_id INTEGER PRIMARY KEY, -- Unique ID for each salesperson \n  name VARCHAR(50), -- Name of the salesperson\n  region VARCHAR(50) -- Geographic sales region \n);\n\nCREATE TABLE sales (\n  sale_id INTEGER PRIMARY KEY, -- Unique ID for each sale\n  product_id INTEGER, -- ID of product sold\n  customer_id INTEGER,  -- ID of customer who made purchase\n  salesperson_id INTEGER, -- ID of salesperson who made the sale\n  sale_date DATE, -- Date the sale occurred \n  quantity INTEGER -- Quantity of product sold\n);\n\nCREATE TABLE product_suppliers (\n  supplier_id INTEGER PRIMARY KEY, -- Unique ID for each supplier\n  product_id INTEGER, -- Product ID supplied\n  supply_price DECIMAL(10,2) -- Unit price charged by supplier\n);\n\n-- sales.product_id can be joined with products.product_id\n-- sales.customer_id can be joined with customers.customer_id \n-- sales.salesperson_id can be joined with salespeople.salesperson_id\n-- product_suppliers.product_id can be joined with products.product_id",
  "prompt_template": "### Task\nGenerate a SQL query to answer [QUESTION]{question}[/QUESTION]\n\n### Instructions\n- If you cannot answer the question with the available database schema, return 'I do not know'\n\n### Database Schema\nThe query will run on a database with the following schema:\n{table_metadata}\n\n### Answer\nGiven the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]\n[SQL]",
  "presence_penalty": 0,
  "frequency_penalty": 0
}

Input Parameters

top_k Type: integerDefault: 50: The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
top_p Type: numberDefault: 0.9: A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
question (required) Type: string
temperature Type: numberDefault: 0.6: The value used to modulate the next token probabilities.
max_new_tokens Type: integerDefault: 512: The maximum number of tokens the model should generate as output.
table_metadata (required) Type: string
prompt_template Type: stringDefault: ### Task Generate a SQL query to answer [QUESTION]{question}[/QUESTION] ### Instructions - If you cannot answer the question with the available database schema, return 'I do not know' ### Database Schema The query will run on a database with the following schema: {table_metadata} ### Answer Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION] [SQL]: The template used to format the prompt. The input prompt is inserted into the template using the `{prompt}` placeholder.
presence_penalty Type: numberDefault: 0: Presence penalty
frequency_penalty Type: numberDefault: 0: Frequency penalty

Output Schema

Output

Type: array • Items Type: string

Example Execution Logs

INFO 02-11 03:18:28 async_llm_engine.py:431] Received request bdd2a2c68ac94d98bf8ecdd2fea93e0d: prompt: "### Task\nGenerate a SQL query to answer [QUESTION]Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.[/QUESTION]\n\n### Instructions\n- If you cannot answer the question with the available database schema, return 'I do not know'\n\n### Database Schema\nThe query will run on a database with the following schema:\nCREATE TABLE products (\n  product_id INTEGER PRIMARY KEY, -- Unique ID for each product\n  name VARCHAR(50), -- Name of the product\n  price DECIMAL(10,2), -- Price of each unit of the product\n  quantity INTEGER  -- Current quantity in stock\n);\n\nCREATE TABLE customers (\n   customer_id INTEGER PRIMARY KEY, -- Unique ID for each customer\n   name VARCHAR(50), -- Name of the customer\n   address VARCHAR(100) -- Mailing address of the customer\n);\n\nCREATE TABLE salespeople (\n  salesperson_id INTEGER PRIMARY KEY, -- Unique ID for each salesperson \n  name VARCHAR(50), -- Name of the salesperson\n  region VARCHAR(50) -- Geographic sales region \n);\n\nCREATE TABLE sales (\n  sale_id INTEGER PRIMARY KEY, -- Unique ID for each sale\n  product_id INTEGER, -- ID of product sold\n  customer_id INTEGER,  -- ID of customer who made purchase\n  salesperson_id INTEGER, -- ID of salesperson who made the sale\n  sale_date DATE, -- Date the sale occurred \n  quantity INTEGER -- Quantity of product sold\n);\n\nCREATE TABLE product_suppliers (\n  supplier_id INTEGER PRIMARY KEY, -- Unique ID for each supplier\n  product_id INTEGER, -- Product ID supplied\n  supply_price DECIMAL(10,2) -- Unit price charged by supplier\n);\n\n-- sales.product_id can be joined with products.product_id\n-- sales.customer_id can be joined with customers.customer_id \n-- sales.salesperson_id can be joined with salespeople.salesperson_id\n-- product_suppliers.product_id can be joined with products.product_id\n\n### Answer\nGiven the database schema, here is the SQL query that answers [QUESTION]Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.[/QUESTION]\n[SQL]", prefix_pos: None,sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt token ids: None, lora_request: None.
INFO 02-11 03:18:28 llm_engine.py:877] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 5.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 1.0%, CPU KV cache usage: 0.0%
INFO 02-11 03:18:31 async_llm_engine.py:110] Finished request bdd2a2c68ac94d98bf8ecdd2fea93e0d.
generation took 2.463s

Version Details

Version ID: ced935b577fb52644d933f77e2ff8902744e4c58a2f50023b3a1db80b7a75806
Version Created: February 10, 2024

Run on Replicate →