nateraw/defog-sqlcoder-7b-2 🔢📝 → 📝
About
A capable large language model for natural language to SQL generation.

Example Output
Output
SELECT c.city, SUM(s.quantity) AS total_sales, SUM(CASE WHEN c.city = 'New York' THEN s.quantity ELSE 0 END) - SUM(CASE WHEN c.city = 'San Francisco' THEN s.quantity ELSE 0 END) AS difference_in_sales FROM sales s JOIN customers c ON s.customer_id = c.customer_id GROUP BY c.city ORDER BY total_sales DESC NULLS LAST;
Performance Metrics
2.59s
Prediction Time
2.63s
Total Time
All Input Parameters
{ "top_k": 50, "top_p": 0.9, "question": "Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.", "temperature": 0, "max_new_tokens": 512, "table_metadata": "CREATE TABLE products (\n product_id INTEGER PRIMARY KEY, -- Unique ID for each product\n name VARCHAR(50), -- Name of the product\n price DECIMAL(10,2), -- Price of each unit of the product\n quantity INTEGER -- Current quantity in stock\n);\n\nCREATE TABLE customers (\n customer_id INTEGER PRIMARY KEY, -- Unique ID for each customer\n name VARCHAR(50), -- Name of the customer\n address VARCHAR(100) -- Mailing address of the customer\n);\n\nCREATE TABLE salespeople (\n salesperson_id INTEGER PRIMARY KEY, -- Unique ID for each salesperson \n name VARCHAR(50), -- Name of the salesperson\n region VARCHAR(50) -- Geographic sales region \n);\n\nCREATE TABLE sales (\n sale_id INTEGER PRIMARY KEY, -- Unique ID for each sale\n product_id INTEGER, -- ID of product sold\n customer_id INTEGER, -- ID of customer who made purchase\n salesperson_id INTEGER, -- ID of salesperson who made the sale\n sale_date DATE, -- Date the sale occurred \n quantity INTEGER -- Quantity of product sold\n);\n\nCREATE TABLE product_suppliers (\n supplier_id INTEGER PRIMARY KEY, -- Unique ID for each supplier\n product_id INTEGER, -- Product ID supplied\n supply_price DECIMAL(10,2) -- Unit price charged by supplier\n);\n\n-- sales.product_id can be joined with products.product_id\n-- sales.customer_id can be joined with customers.customer_id \n-- sales.salesperson_id can be joined with salespeople.salesperson_id\n-- product_suppliers.product_id can be joined with products.product_id", "prompt_template": "### Task\nGenerate a SQL query to answer [QUESTION]{question}[/QUESTION]\n\n### Instructions\n- If you cannot answer the question with the available database schema, return 'I do not know'\n\n### Database Schema\nThe query will run on a database with the following schema:\n{table_metadata}\n\n### Answer\nGiven the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]\n[SQL]", "presence_penalty": 0, "frequency_penalty": 0 }
Input Parameters
- top_k
- The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
- top_p
- A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
- question (required)
- temperature
- The value used to modulate the next token probabilities.
- max_new_tokens
- The maximum number of tokens the model should generate as output.
- table_metadata (required)
- prompt_template
- The template used to format the prompt. The input prompt is inserted into the template using the `{prompt}` placeholder.
- presence_penalty
- Presence penalty
- frequency_penalty
- Frequency penalty
Output Schema
Output
Example Execution Logs
INFO 02-11 03:18:28 async_llm_engine.py:431] Received request bdd2a2c68ac94d98bf8ecdd2fea93e0d: prompt: "### Task\nGenerate a SQL query to answer [QUESTION]Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.[/QUESTION]\n\n### Instructions\n- If you cannot answer the question with the available database schema, return 'I do not know'\n\n### Database Schema\nThe query will run on a database with the following schema:\nCREATE TABLE products (\n product_id INTEGER PRIMARY KEY, -- Unique ID for each product\n name VARCHAR(50), -- Name of the product\n price DECIMAL(10,2), -- Price of each unit of the product\n quantity INTEGER -- Current quantity in stock\n);\n\nCREATE TABLE customers (\n customer_id INTEGER PRIMARY KEY, -- Unique ID for each customer\n name VARCHAR(50), -- Name of the customer\n address VARCHAR(100) -- Mailing address of the customer\n);\n\nCREATE TABLE salespeople (\n salesperson_id INTEGER PRIMARY KEY, -- Unique ID for each salesperson \n name VARCHAR(50), -- Name of the salesperson\n region VARCHAR(50) -- Geographic sales region \n);\n\nCREATE TABLE sales (\n sale_id INTEGER PRIMARY KEY, -- Unique ID for each sale\n product_id INTEGER, -- ID of product sold\n customer_id INTEGER, -- ID of customer who made purchase\n salesperson_id INTEGER, -- ID of salesperson who made the sale\n sale_date DATE, -- Date the sale occurred \n quantity INTEGER -- Quantity of product sold\n);\n\nCREATE TABLE product_suppliers (\n supplier_id INTEGER PRIMARY KEY, -- Unique ID for each supplier\n product_id INTEGER, -- Product ID supplied\n supply_price DECIMAL(10,2) -- Unit price charged by supplier\n);\n\n-- sales.product_id can be joined with products.product_id\n-- sales.customer_id can be joined with customers.customer_id \n-- sales.salesperson_id can be joined with salespeople.salesperson_id\n-- product_suppliers.product_id can be joined with products.product_id\n\n### Answer\nGiven the database schema, here is the SQL query that answers [QUESTION]Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.[/QUESTION]\n[SQL]", prefix_pos: None,sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt token ids: None, lora_request: None. INFO 02-11 03:18:28 llm_engine.py:877] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 5.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 1.0%, CPU KV cache usage: 0.0% INFO 02-11 03:18:31 async_llm_engine.py:110] Finished request bdd2a2c68ac94d98bf8ecdd2fea93e0d. generation took 2.463s
Version Details
- Version ID
ced935b577fb52644d933f77e2ff8902744e4c58a2f50023b3a1db80b7a75806
- Version Created
- February 10, 2024