How to Safely Use LLMs for Text-to-SQL with Stored Procedures
Introduction
Text-to-SQL has long been a goal for those seeking to make data more accessible. With the rise of large language models (LLMs), that goal is now within reach — users can ask questions in plain English and receive answers directly from databases. But giving natural language access to structured data is not without risks.
This article explores the trade-offs between flexibility and control when using LLMs to generate SQL queries. I compare two strategies: instructing the model on what it must avoid, versus restricting it to use only predefined stored procedures. To ground the discussion, we’ll walk through a real-world use case using the Amazonic Traders database — including sample code, stored procedure definitions, and prompt engineering techniques.
TL;DR
- Text-to-SQL is a powerful capability that allows users to query databases using natural language, but it comes with risks and challenges.
- The technology has evolved from rule-based systems to large language models (LLMs) that can dynamically interpret and generate SQL queries.
- The key challenge is governance: how to ensure that users can only run safe and efficient queries without overwhelming the database or accessing sensitive data.
- Two main approaches to control LLM-generated SQL queries are discussed: (1) instructing the LLM on what it cannot do, and (2) limiting the engine to only what it can do by using predefined stored procedures or templates.
- The second approach is more structured and predictable, allowing for strong governance and security while still providing a conversational interface for users.
- The article provides a practical use case with the Amazonic Traders database, demonstrating how to create a stored procedure and describe it for LLM usage.
Some background
The idea of using natural language to query databases has been around for decades. The concept of “Text-to-SQL” — converting natural language questions into SQL queries — has been a research topic since the 1970s. Early research prototypes such as LUNAR, which answered English questions about a lunar rock sample database, attempted to bridge the gap between human language and structured query languages.
If you’re starting to go gray, you might remember that SQL Server once had a feature called English Query, introduced in SQL Server 7.0 (released in 1998). English Query allowed users to pose questions in natural language and have them translated into SQL queries. Under the hood, it used a custom semantic model built by database designers, which mapped English phrases to tables, columns, and relationships within the database. It wasn’t AI-based — instead, it relied on manually defined linguistic rules and vocabulary mappings. While powerful for its time, it required significant setup and didn’t adapt dynamically, which eventually led to its discontinuation in later versions.
With the rise of large language models (LLMs), the vision behind English Query has been revived — only now, with vastly more powerful and flexible technology. Unlike the rigid, rule-based approach of SQL Server’s English Query in version 7.0, modern LLMs like GPT-4, Claude, or Mistral can understand natural language dynamically, without needing handcrafted mappings or explicit vocabulary definitions. These models are trained on massive corpora that include both natural language and structured data examples, enabling them to generalize across domains and convert free-text input into valid SQL queries on the fly.
The key difference lies in adaptability and scalability: English Query required a human to define the semantic model and relationships manually, which was time-consuming and brittle. In contrast, LLMs can infer intent and relationships from context and schema descriptions, often with minimal setup. This shift drastically lowers the barrier to natural language querying, making it viable for broader adoption — but also introduces new risks around accuracy, security, and governance, especially if users are allowed to generate arbitrary SQL statements.
Why is it exciting?
The resurgence of natural language querying through LLMs marks a significant leap in data accessibility. The big dream is that users no longer need to know SQL syntax or understand database schemas in depth — they can simply ask questions in plain text and receive answers powered by real-time database queries. This democratizes data access across organizations, empowering business users, analysts, and support teams to explore insights without waiting for technical teams to write or validate queries. It’s the realization of a decades-old dream: making data conversational.
But… what’s the catch?
This freedom comes with trade-offs. When users can ask anything, they may unknowingly trigger expensive queries, access sensitive data, or overload systems — especially if the LLM generates raw SQL without validation or boundaries. Unlike rule-based systems like English Query, LLMs can hallucinate structure, misinterpret intent, or bypass logic entirely. Without proper guardrails, such as predefined stored procedures or query whitelisting, the risk of performance issues, data leaks, and compliance violations grows rapidly. In other words, we’ve unlocked a powerful capability — but without control, it can quickly become chaotic.
If you are a Production DBA, you might be thinking: Great, but how do I ensure that users can only run safe queries? Will the LLM build queries efficient enough not to overload my database? How do I prevent users from accessing sensitive data? Not to mention the legal and compliance issues around data access and privacy. These are real concerns that need to be addressed before unleashing LLMs on production databases.
Of course, you can still build a strong security layer around the database objects, and that layer remains necessary even with an LLM in front. Traditional SQL GRANT/REVOKE/DENY permissions can be handled at the object and column level, but doing so is tedious and complex. This is where the LLM layer can help you firewall that access.
Let’s be clear — the technology is nothing short of remarkable. Large language models can interpret natural language with context, flexibility, and even nuance, generating complex SQL queries that would take a human several minutes to write. The potential for productivity and accessibility is immense. But the real challenge lies in governance, not capability.
Just like you wouldn’t connect a server to the internet without a firewall, you shouldn’t connect a powerful LLM directly to your production database without clearly defining what’s allowed — and what’s not. The model doesn’t inherently know which queries are safe, appropriate, or aligned with business logic. It will do its best to satisfy the user’s request — even if that means generating expensive joins, accessing sensitive tables, or misusing internal metrics.
The solution is to guide the model’s behavior by narrowing its scope. Instead of allowing it to generate arbitrary SQL, you can configure it to select from a curated set of stored procedures, views, or parameterized templates — much like setting firewall rules. This way, you preserve the usability and flexibility of natural language while ensuring that queries stay within safe and efficient boundaries.
This is how the application flow looks:
- The user asks a question in natural language.
- The application asks the LLM to translate the question into SQL.
- The application runs the SQL query in the database and returns the results to the user.
The options I explore in this article lie in the second step. The goal is not to delegate the responsibility of validating the SQL query to the application logic, but to use the LLM to generate a SQL query that is already validated and safe to run in the database. Nothing new here; this is how Text-to-SQL LLMs work.
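The flow above can be sketched in a few lines of Python. The callables here are hypothetical placeholders: `translate` stands in for the LLM call (the article introduces a concrete `get_chat_completion_from_platform` helper later) and `execute` for the database call.

```python
def answer_question(user_question, system_message, translate, execute):
    """Three-step Text-to-SQL flow.

    translate: callable(system_message, question) -> SQL string (the LLM call)
    execute:   callable(sql) -> query results (the database call)
    """
    # Step 2: ask the LLM to translate the question into SQL.
    sql = translate(system_message, user_question)
    # Step 3: run the SQL and return the results to the user.
    return execute(sql)
```

Everything this article discusses happens inside `translate`: how much freedom the model gets when producing the SQL string.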
Option 1: Instruct the LLM on What It Cannot Do
In this approach, I use prompt engineering, system messages, and fine-tuning to explicitly tell the model what is off-limits. For example, you might instruct:
- “Do not access customer personal information.”
- “Avoid writing queries with `CROSS JOIN` or unfiltered `SELECT *`.”
- “Never query tables outside the `reporting` schema.”
This is similar to writing policies for a user — you’re trying to teach the LLM what not to do. While it gives the model some flexibility, it relies heavily on the model’s ability to follow instructions, which can be unpredictable, especially under pressure or ambiguous inputs.
To challenge this approach with a real-world example: if a user asks, “Show me all customer data,” the model might interpret this as a valid request and generate a query that violates your security policies. The model may not understand the implications of its actions, leading to potential data leaks or performance issues.
On top of that, it is challenging to control how JOINs are performed, what data is filtered and how, and hence the performance of the query. The model may generate a query that is not only inefficient but also potentially harmful to the database.
This approach is akin to a blacklist: you define what’s forbidden, but the model can still operate outside those boundaries if it misinterprets the instructions or is prompted in a way that bypasses them.
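As an illustration, a blacklist-style system message for Option 1 might look like the sketch below. The specific rules are assumptions for the example; the article's point is precisely that nothing forces the model to obey them.

```python
# Illustrative blacklist-style system message (Option 1).
# The rules are examples only; the model may still drift outside them.
BLACKLIST_SYSTEM_MESSAGE = """\
You translate user questions into Transact-SQL SELECT statements.
Rules:
- Do not access customer personal information.
- Avoid CROSS JOIN and unfiltered SELECT *.
- Never query tables outside the reporting schema.
- Never generate INSERT, UPDATE, DELETE, or DDL statements.
"""
```

Every rule you add here is advisory: it shapes the model's behavior but is not enforced anywhere.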
Option 2: Limit the Engine to Only What It Can Do
In this stricter approach, you invert the logic: instead of telling the LLM what it can’t do, you only let it choose from what it can do. The model doesn’t generate raw SQL — it selects or fills parameters for predefined stored procedures, views, or query templates. Anything outside of this whitelist is simply not possible. This option allows you to define a library of stored procedures that encapsulate the business logic and data access patterns you want to expose. By using stored procedures, you can enforce security, performance, and compliance requirements at the database level. The LLM acts as a copilot, guiding users to select the right procedure and fill in the parameters, rather than generating raw SQL.
Additionally, you can use RBAC (Role-Based Access Control) to restrict access through the stored procedures, giving you a simple and effective way to manage permissions. This means that even if a user has access to the database, they can only execute the stored procedures they are authorized to use. In option 1, RBAC can be used too, but it must be applied at the table or view level, which is more granular and harder to manage than granting EXECUTE on a handful of procedures.
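Scripting those procedure-level grants is straightforward. The role name below is a hypothetical example, and the procedure names are the three introduced later in this article:

```python
# Procedure names from the practical use case later in the article.
ALLOWED_PROCEDURES = [
    "get_orders_in_period",
    "get_customers_agg_in_period",
    "get_clients_with_orders_in_period",
]

def grant_execute_statements(role: str, procedures: list[str]) -> list[str]:
    """Emit one T-SQL GRANT per procedure; the role gets EXECUTE and nothing else."""
    return [f"GRANT EXECUTE ON OBJECT::dbo.{proc} TO {role};" for proc in procedures]
```

Running the emitted statements against the database means the connecting account can call only these procedures, regardless of what SQL the LLM produces.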
This approach is more structured and predictable. It allows you to define the exact operations that users can perform, ensuring that they cannot bypass security or performance constraints. The LLM can still provide a conversational interface, but it operates within a controlled environment where all queries are pre-approved and optimized.
This mirrors how a firewall works: allow only approved traffic, and block everything else by default.
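You can also enforce the default-deny rule in the application layer as a second line of defense behind the database permissions. The sketch below is a minimal illustration (the regex and function names are my own); a production version should parse parameters properly and use parameterized execution rather than string matching.

```python
import re

ALLOWED_PROCEDURES = {
    "get_orders_in_period",
    "get_customers_agg_in_period",
    "get_clients_with_orders_in_period",
}

# Accept only a single EXEC of a whitelisted procedure; deny everything else.
EXEC_PATTERN = re.compile(r"^\s*EXEC(?:UTE)?\s+(?:dbo\.)?(\w+)\b", re.IGNORECASE)

def is_allowed_call(sql: str) -> bool:
    """Firewall-style check: default deny, allow only the whitelist."""
    # Crude guard against statement batching ("EXEC ...; DROP TABLE ...").
    if ";" in sql.rstrip().rstrip(";"):
        return False
    match = EXEC_PATTERN.match(sql)
    return bool(match) and match.group(1).lower() in ALLOWED_PROCEDURES
```

Anything that is not an EXEC of a known procedure is rejected by default, which is exactly the firewall posture: allow the approved traffic, block everything else.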
Options Comparison: Guarding Text-to-SQL with LLMs
This table summarizes the key differences between the two approaches:

| Aspect | Option 1: instruct what it cannot do | Option 2: limit to what it can do |
| --- | --- | --- |
| Model of control | Blacklist | Whitelist |
| SQL generation | Raw SQL, constrained by instructions | Calls to predefined stored procedures |
| Predictability | Unpredictable under ambiguous input | Structured and predictable |
| Governance | Relies on the model following instructions | Enforced at the database level |
| Access control | RBAC at table or view level | RBAC at stored procedure level |
The firewall analogy
The process needs to be redesigned. As described above, instead of allowing the LLM to generate arbitrary SQL, you configure it to select from a curated set of stored procedures, views, or parameterized templates — much like setting firewall rules — so that queries stay within safe and efficient boundaries.
This means that in the use cases design phase, new activities will be added. These activities are:
Identify the objects needed for the use case
When creating use cases, you need to identify the objects each one requires. This normally includes tables and views. Here you need collaboration from the data modeling team; tools like database diagrams help too.
Build the stored procedure
This step is where your database experts come in. Through stored procedures, your team defines the business logic and data access patterns you want to expose, while enforcing security, performance, and compliance requirements at the database level.
Describe the stored procedure to allow the LLM to use it
This is probably the key step in this approach. You need to describe the stored procedure in a way that the LLM can understand: a clear and concise description of what it does, what parameters it takes, and what it returns. This is where you need to put your prompt engineering skills to work. By practicing prompt engineering, your team will have real control over the generated SQL queries. My suggestion is that this activity should be done by the same team that writes the stored procedure; that way, you will have a better understanding of what the LLM needs to know to generate the SQL queries. It will be fun for them to learn how they can influence the LLM!
Practical use case with Amazonic Traders database
In this example, I will use the Amazonic Traders database, a sample database that contains data about products, customers, and orders. It is based on the TPC-H benchmark, a well-known benchmark for testing database systems. I will create stored procedures that retrieve orders and aggregated customer data for a given date range. You can download the dataset and schema definition:
Creating the stored procedure
If you are a developer, the stored procedure is the interface that you will use to interact with the database. It is a set of SQL statements that are stored in the database and can be executed by calling the procedure name. The stored procedure can take parameters, which are values that are passed to the procedure when it is called. The stored procedure can also return values, which are the results of the SQL statements.
For this publication I have created 3 stored procedures:
- get_orders_in_period
- get_customers_agg_in_period
- get_clients_with_orders_in_period
All of them take a date period as arguments and a customer ID as an optional argument. As a SQL expert will note, when the customer is not provided, the Transact-SQL does not include the customer ID in the WHERE clause. This is a good practice, as it makes the stored procedure more flexible and reusable: it will return all the orders, customers, or clients in the specified date range.
This is the code:
```sql
if exists (select * from sys.objects where name = 'get_orders_in_period' and type = 'P')
    drop procedure get_orders_in_period;
GO

create procedure get_orders_in_period
    @from_date date,
    @to_date date,
    @customer_id nchar(5) = null
as
begin
    if @from_date is null or @to_date is null
    begin
        raiserror('from_date and to_date cannot be null', 16, 1);
        return;
    end

    if @customer_id is null
        select o_orderkey, o_custkey, o_totalprice, o_orderdate, c.c_name
        from orders
        join customer c on o_custkey = c_custkey
        where o_orderdate between @from_date and @to_date;
    else
        select o_orderkey, o_custkey, o_totalprice, o_orderdate, c.c_name
        from orders
        join customer c on o_custkey = c_custkey
        where o_orderdate between @from_date and @to_date
          and o_custkey = @customer_id;
end;
GO

if exists (select * from sys.objects where name = 'get_customers_agg_in_period' and type = 'P')
    drop procedure get_customers_agg_in_period;
GO

create procedure get_customers_agg_in_period
    @from_date date,
    @to_date date,
    @customer_id nchar(5) = null
as
begin
    if @from_date is null or @to_date is null
    begin
        raiserror('from_date and to_date cannot be null', 16, 1);
        return;
    end

    if @customer_id is null
        select c.c_custkey, c.c_name, sum(o.o_totalprice) as total_spent
        from orders o
        join customer c on o.o_custkey = c.c_custkey
        where o.o_orderdate between @from_date and @to_date
        group by c.c_custkey, c.c_name;
    else
        select c.c_custkey, c.c_name, sum(o.o_totalprice) as total_spent
        from orders o
        join customer c on o.o_custkey = c.c_custkey
        where o.o_orderdate between @from_date and @to_date
          and o.o_custkey = @customer_id
        group by c.c_custkey, c.c_name;
end;
GO

if exists (select * from sys.objects where name = 'get_clients_with_orders_in_period' and type = 'P')
    drop procedure get_clients_with_orders_in_period;
GO

create procedure get_clients_with_orders_in_period
    @from_date date,
    @to_date date,
    @customer_id nchar(5) = null
as
begin
    if @from_date is null or @to_date is null
    begin
        raiserror('from_date and to_date cannot be null', 16, 1);
        return;
    end

    if @customer_id is null
        select c.*
        from customer c
        where exists (
            select * from orders o
            where o.o_custkey = c.c_custkey
              and o.o_orderdate between @from_date and @to_date
        );
    else
        select c.*
        from customer c
        where exists (
            select * from orders o
            where o.o_custkey = c.c_custkey
              and o.o_orderdate between @from_date and @to_date
        )
        and c.c_custkey = @customer_id;
end;
GO
```
Describing the stored procedures for the LLM
If you want the LLM to build a call to a stored procedure, you need to describe it in a way the LLM can understand. As discussed above, that means a clear and concise description of what each procedure does, what parameters it takes, and what it returns, written with the same prompt engineering discipline by the team that owns the procedures.
For these 3 stored procedures this is the description I have included:
----
stored procedure name = get_orders_in_period
summary = This procedure retrieves orders within a specified date range.
If a customer ID is provided, it filters the results to that customer.
If no customer ID is provided, it retrieves all orders in the date range.
Parameters:
@from_date: The start date of the period.
@to_date: The end date of the period.
@customer_id: The ID of the customer (optional).
----
stored procedure name = get_customers_agg_in_period
summary = This procedure retrieves aggregated customer data within a specified date range.
If a customer ID is provided, it filters the results to that customer.
If no customer ID is provided, it retrieves all customers in the date range.
Parameters:
@from_date: The start date of the period.
@to_date: The end date of the period.
@customer_id: The ID of the customer (optional).
----
stored procedure name = get_clients_with_orders_in_period
summary = This procedure retrieves clients information with orders within a specified date range.
If a customer ID is provided, it filters the results to that customer.
If no customer ID is provided, it retrieves all clients in the date range.
Parameters:
@from_date: The start date of the period.
@to_date: The end date of the period.
@customer_id: The ID of the customer (optional).
----
Building the system message for the LLM call
The system message is the message that you send to the LLM to tell it what to do. It is the first message that you send to the LLM, and it sets the context for the conversation. The system message should be clear and concise, and it should include all the information that the LLM needs to understand what you want it to do.
```python
metadata = load_file("./setup/05_storedprocedures_description.md")

system_message = f"""
RETURN Transact-SQL stored procedure execution sentence.
You can use ONLY these stored procedures:
[ DESCRIPTION STARTS HERE ]
{metadata}
[ DESCRIPTION ENDS HERE ]
If you don't have enough information to create the Transact-SQL sentence, just say: "I don't know".
Do not make comments. What you retrieve will be executed in the database.
Do not use any other stored procedure."""

user_prompt = f"""
{user_question}
"""

stored_procedure_call = get_chat_completion_from_platform("openai", system_message, user_prompt, 0.75, 10000)
```
Note that I am reading the description of the stored procedures from a file. This is a good practice: it keeps the descriptions separate from the code, lets you change them without touching the application, and can double as documentation for the stored procedures.
From the extensibility perspective, when you add new stored procedures, you will not need to add a single line of Python code! You’d change the .md file (or whatever data cache you use), and boom! The LLM is ready to use the new stored procedures.
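You can take the same idea one step further and derive an application-side whitelist from the same file, so the prompt and any validation stay in sync automatically. A sketch, assuming the `stored procedure name = ...` convention shown above:

```python
import re

def allowed_procedures_from_metadata(metadata: str) -> set[str]:
    """Parse 'stored procedure name = <name>' lines from the description file."""
    return set(re.findall(r"stored procedure name\s*=\s*(\w+)", metadata))
```

Adding a procedure to the .md file then updates both the system message and the whitelist in one place.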
The result
The user asks for the “orders between feb 1992 and nov 1992”.
That returns this result:
Note that the LLM builds the execution sentence for the stored procedure, and it uses the parameters that are defined in the stored procedure. The LLM does not generate any dynamic SQL code, it just builds the execution sentence for the stored procedure.
If you ask for anything not covered by the stored procedures, like removing data from a table, you’d get an answer like this:
Note that the system message instructed the model to answer “I don’t know” when it does not have enough information to build the execution sentence. This lets you avoid errors when executing the stored procedure, so your frontend application should handle this case. You can also use this message to build a log of the questions the LLM could not answer, which helps you improve the system message and the stored procedures over time.
In this case, I handle the exception like this:
```python
stored_procedure_call = get_chat_completion_from_platform("openai", system_message, user_prompt, 0.75, 10000)

# strip() tolerates leading/trailing whitespace in the model's answer
if stored_procedure_call.strip().lower() == "i don't know":
    st.error("Please rephrase your question. What you are asking is not permitted, or there is not sufficient context.")
else:
    df = execute_stored_procedure(source, stored_procedure_call, True)
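The logging idea mentioned above can be sketched as a small helper (function and logger names are my own):

```python
import logging

logger = logging.getLogger("text2sql")

def handle_llm_answer(answer: str, user_question: str):
    """Return the call to execute, or None after logging a refused question."""
    if answer.strip().lower() == "i don't know":
        # Keep a record of refused questions to improve the system message
        # and the stored procedure catalog over time.
        logger.info("Unanswered question: %s", user_question)
        return None
    return answer
```

Reviewing that log periodically tells you which questions users actually ask that your procedure catalog does not yet cover.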
Conclusion
The evolution of Text-to-SQL systems from rule-based approaches to LLMs has opened up exciting possibilities for natural language querying. However, with great power comes great responsibility. The challenge lies in balancing flexibility and control, ensuring that users can explore data without compromising security or performance.
It can be challenging to implement these systems effectively, requiring careful consideration of user needs and system capabilities. The two approaches discussed here — instructing the LLM on what it cannot do versus limiting it to predefined stored procedures — offer different trade-offs in terms of governance, flexibility, and user experience. The second approach, using stored procedures, provides a more structured and predictable environment for users while still allowing for natural language interaction. By defining clear boundaries and leveraging the power of LLMs, organizations can unlock the potential of Text-to-SQL systems while maintaining control over their data.
Resources
- SQL Server Stored Procedures
  Official Microsoft documentation on how to create and use stored procedures in SQL Server.
  https://learn.microsoft.com/en-us/sql/relational-databases/stored-procedures/stored-procedures-database-engine
- Getting Started with Generative AI Capabilities in SQL Server and Azure SQL
  Blog post by Microsoft on how to integrate generative AI into SQL Server and Azure SQL securely.
  https://www.microsoft.com/en-us/sql-server/blog/2024/06/26/getting-started-with-delivering-generative-ai-capabilities-in-sql-server-and-azure-sql/
- Text-to-SQL in the Wild: A Survey on Recent Developments
  A comprehensive academic paper covering recent techniques, challenges, and benchmarks in Text-to-SQL systems.
  https://arxiv.org/abs/2208.13629
- Harnessing Large Language Models for Text-to-SQL Generation: A Survey
  A detailed overview of LLM-enhanced approaches for Text-to-SQL tasks, including prompt design and fine-tuning strategies.
  https://arxiv.org/abs/2410.06011