Building Modern Text-to-SQL Systems with GenAI: Lessons from LinkedIn, Uber, and Open Source
Introduction
In my last article, How to Safely Use LLMs for Text-to-SQL with Stored Procedures,
I explored the use of large language models (LLMs) for Text-to-SQL systems, focusing on balancing flexibility and control by comparing two approaches: instructing the LLM on what it cannot do versus limiting it to predefined stored procedures. The article highlighted the benefits of stored procedures for ensuring security, governance, and performance while enabling natural language interaction, with practical examples and implementation strategies using the Amazonic Traders database.
One of the key challenges was setting up the interfaces and descriptions for the stored procedures to make them available to the LLM via system prompting. I called this setting up a firewall, or guardrails, around the LLM, in order to guide or enforce the data-usage pattern that I wanted to expose.
For department projects, PoCs, or other small initiatives, this is easy to envision. However, as the number of databases, schemas, and tables in the data model grows, you can easily end up with one stored procedure for each use case you want to expose. This is not a scalability issue but a maintenance one: you have to maintain the stored procedures and the LLM prompt at the same time. With GenAI, LLMs, and MCP tools you can automate the creation of the stored procedures, but you will still have to maintain the LLM prompt. Not very nice if you follow the KISS principle (Keep It Simple, Stupid)!
On the other hand, I emphasized the perspective of DBAs wanting to maintain control of the system, ensuring that LLMs are not allowed to generate SQL queries on their own. This is a legitimate point, especially for organizations with low tolerance for the risks associated with LLMs. I also mentioned that this approach could be a good starting point for organizations that are not yet ready to fully embrace LLMs for Text-to-SQL.
Now, let’s explore the innovation perspective. I will take the viewpoint of someone (a person or a corporation) who is closer to innovation and experimentation. I am not saying that DBAs don’t want to innovate, but they are usually more focused on governance, maintenance, optimization, and control. No offense, there is still a lot of DBA blood in my veins!
TL;DR
- Learn from industry leaders: Projects like LinkedIn’s SQL Bot and Uber’s QueryGPT show that high-quality Text-to-SQL systems require more than prompting — they need retrieval, ranking, validation, optimization, and rich UX strategies.
- Reuse and adapt existing work: Most of the core components needed for Text-to-SQL are already available through open-source projects (like Vanna) and industry best practices. Innovation today is about smart assembly, not reinventing everything.
- Evolve the role of SQL experts: In modern Text-to-SQL systems, SQL SMEs shift from manually writing queries to optimizing and validating LLM-generated outputs, ensuring quality, security, and compliance.
- Prioritize security with RLS: Native dataset-level access is not enough — enterprise-grade systems must enforce Row-Level Security (RLS) to control data visibility at the row level, leveraging built-in database capabilities like PostgreSQL RLS, SQL Server RLS, or Oracle VPD.
- Monitor and log usage: Text-to-SQL systems should monitor and log interactions systematically to detect misuse, ensure compliance, and enable continuous system improvement. Use monitoring as a source for improvement (re-training).
- Do not expose raw SQL directly: Always validate and, if necessary, post-process generated SQL before execution; avoid exposing raw LLM-generated queries directly to end-users to reduce risk of injection, leakage, or execution errors.
- Key system components identified: Successful Text-to-SQL systems should integrate workspaces, intent classification, certified metadata, enhanced retrieval (RAG), schema management, RLS enforcement, validation loops, rich UI elements, monitoring, and continuous benchmarking.
- Cost management: Control inference costs proactively by optimizing prompts, monitoring per-user usage, exposing query costs, and enforcing quotas to ensure sustainable Text-to-SQL operations at scale.
- User expectations: Clearly communicate system capabilities and limitations to users, encouraging validation and responsible usage to build trust and maximize the effectiveness of Text-to-SQL tools.
Adopt and learn from what is already built
90% of the things you see are iterations or combinations of existing ideas. The OSS communities are transforming the world. There are many great projects on GitHub, Hugging Face, and other platforms, and plenty of relevant publications on this subject on Medium, LinkedIn, and other blogs.
Projects for ideation, investigation, and inspiration
1) Vanna — 15.3k stars on GitHub
Chat with your SQL database. Accurate Text-to-SQL Generation via LLMs using RAG.
Vanna is an MIT-licensed open-source Python framework that enables accurate Text-to-SQL generation using Retrieval-Augmented Generation (RAG). It supports various LLMs and vector stores, allowing you to train a model on your data and ask questions that return SQL queries ready to run on your database.
High level architecture
Key points
- Able to represent the data: Different ways to represent the query results (e.g., tables, charts, dataframes, plotly, etc.).
- Metadata vectorization: schema, queries, results, and documentation. This is great for large database systems with many tables and complex schemas where context is huge.
- RAG model: It uses a vector store to retrieve relevant information from the database schema and query history, which is then used to generate SQL queries using an LLM.
- Multiple technologies are supported:
- LLM: OpenAI, Cohere, Llama.
- SQL database: SQL Server, Oracle, DuckDB, Postgres, MySQL.
- Vector store: Pinecone, Weaviate, Chroma.
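To make the pattern concrete, here is a minimal sketch of the RAG-for-SQL flow that frameworks like Vanna implement: store schema DDL and documentation snippets, retrieve the entries most relevant to a question, and assemble them into an LLM prompt. This is not Vanna’s actual API; the similarity function is a toy bag-of-words overlap standing in for real embedding vectors, and the table names are invented for illustration.

```python
# Toy RAG-for-SQL sketch: keyword overlap stands in for embeddings.
def tokenize(text: str) -> set[str]:
    return {t.strip(",.()").lower() for t in text.split()}

class MetadataStore:
    """Holds schema DDL, documentation, and example queries."""
    def __init__(self):
        self.docs: list[str] = []

    def add(self, doc: str) -> None:
        self.docs.append(doc)

    def retrieve(self, question: str, k: int = 2) -> list[str]:
        # Rank stored snippets by word overlap with the question.
        q = tokenize(question)
        ranked = sorted(self.docs, key=lambda d: len(q & tokenize(d)), reverse=True)
        return ranked[:k]

def build_prompt(store: MetadataStore, question: str) -> str:
    # Only the most relevant metadata goes into the prompt, keeping it small.
    context = "\n".join(store.retrieve(question))
    return (f"Given this schema and documentation:\n{context}\n"
            f"Write a SQL query answering: {question}")

store = MetadataStore()
store.add("CREATE TABLE orders (id INT, customer_id INT, total DECIMAL, created_at DATE)")
store.add("CREATE TABLE customers (id INT, name TEXT, country TEXT)")
store.add("Documentation: revenue is the sum of orders.total")

prompt = build_prompt(store, "What is the total revenue per customer?")
```

The key design point is that only the retrieved subset of metadata reaches the LLM, which is what lets this approach scale to schemas far too large to fit in a single prompt.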
2) LinkedIn Engineering: Practical Text-to-SQL for Data Analytics
LinkedIn created SQL Bot, an AI-driven assistant embedded within their DARWIN data science platform. It converts natural language questions into SQL queries, empowering employees across departments to independently access the insights they need — all while respecting access controls. You can find the detailed publication here.
This project stands as an excellent real-world example of how to apply LLMs for Text-to-SQL in a production-grade environment. It offers a wealth of practical ideas and patterns that are critical if you aim to build professional, scalable systems.
Based on the publication, I recommend two approaches:
- Incorporate the ideas directly: Many of the strategies outlined can be readily adapted into your own solutions (see the Key Points section below).
- Be aware of the gaps if you cannot: Even if you’re unable to implement all ideas, understanding what each concept addresses is crucial. If a particular feature is missing, you can explicitly acknowledge it when designing proposals or contracts, ensuring transparency and setting clear expectations with your clients.
Key points
Strategy 1: Metadata and Retrieval Quality
LinkedIn extends RAG with embedding-based retrieval (EBR) focused on certified metadata. EBR is a retrieval method that uses embeddings to represent the data and queries, allowing for more accurate and efficient retrieval of relevant information. This approach is particularly useful for large datasets with complex schemas, where traditional keyword-based retrieval methods may struggle to find relevant information.
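The core of EBR can be illustrated in a few lines: each certified metadata entry and the user question are represented as vectors, and entries are ranked by cosine similarity. The 3-dimensional vectors and table names below are made up for the example; a real system would obtain high-dimensional vectors from an embedding model.

```python
# Toy embedding-based retrieval: rank certified metadata entries by
# cosine similarity to the question vector. Vectors here are invented.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

certified_metadata = {
    "fact_job_applications": [0.9, 0.1, 0.2],
    "dim_member_profile":    [0.2, 0.8, 0.1],
    "fact_page_views":       [0.1, 0.3, 0.9],
}

# Pretend embedding of "how many job applications were filed last week?"
question_vector = [0.85, 0.2, 0.15]

ranked = sorted(certified_metadata,
                key=lambda name: cosine(question_vector, certified_metadata[name]),
                reverse=True)
```

Because similarity is computed in vector space rather than on surface keywords, a question that never mentions a table by name can still retrieve it, which is exactly where keyword-based retrieval struggles on large, complex schemas.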
Strategy 2: Knowledge Graph and LLMs for Ranking, Writing, and Self-Correction
Once a set of candidate tables is retrieved through filtering and Embedding-Based Retrieval (EBR), LinkedIn enhances the Text-to-SQL process with a knowledge graph and specialized LLM pipelines. They use structured metadata, user input, query logs, and certified examples to build a semantic network. Then, LLMs rank tables and fields, plan SQL queries step-by-step, and apply automatic validation and self-correction. This multi-layered approach ensures that the generated SQL is both accurate and contextually appropriate, even for complex or ambiguous user queries.
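The validation and self-correction step can be sketched as a loop: generate SQL, try to compile it against the target database without executing it, and on failure feed the error message back to the model for another attempt. In this sketch a stub stands in for the LLM (first returning SQL with a wrong column, then a corrected query), and SQLite’s EXPLAIN is used because it parses and plans a statement without running it; this is an illustration of the pattern, not LinkedIn’s implementation.

```python
# Self-correction loop: validate generated SQL, feed errors back to the model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL, created_at TEXT)")

def llm_stub(question, error=None):
    # Stand-in for an LLM call: wrong column first, corrected on retry.
    if error is None:
        return "SELECT SUM(amount) FROM orders"
    return "SELECT SUM(total) FROM orders"

def generate_with_self_correction(question, max_attempts=3):
    error = None
    for _ in range(max_attempts):
        sql = llm_stub(question, error)
        try:
            conn.execute("EXPLAIN " + sql)  # compiles the query, does not run it
            return sql
        except sqlite3.Error as e:
            error = str(e)  # error message becomes context for the next attempt
    raise RuntimeError("could not produce valid SQL")

sql = generate_with_self_correction("total order value")
```

Capping the number of attempts matters: each retry is another LLM call, so an unbounded loop turns a bad question into an unbounded cost.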
Check the publication for more details on the implementation of these strategies. It is rich in flowcharts and diagrams that help to understand how LinkedIn did the implementation.
Strategy 3: User Experience with Rich Chat Elements
LinkedIn emphasizes user experience as critical for driving adoption of their SQL Bot. By deeply integrating the bot into existing workflows, supporting multiple interaction styles, providing rich feedback elements, and handling access control gracefully, they created a system that users actually enjoy and trust. This strategy shows that success is not just about technical accuracy — it’s about guiding, informing, and empowering the user throughout the process.
I care a lot about how clients use these tools, and I am really impressed with the way LinkedIn has implemented the user experience. It goes beyond the typical chat interface where the user simply types what they are looking for: it is rich in elements that help users reuse existing queries by showing usage statistics. It would also help to surface user profiles, because people tend to use data and queries in clusters of teams, departments, or projects.
Strategy 4: Options for User Customization
To make SQL Bot more adaptable and scalable, LinkedIn empowers users to customize its behavior directly without needing engineering support. By offering control over datasets, domain-specific instructions, and example queries, they allow teams to tailor SQL Bot to their needs, ensuring better relevance, higher trust, and faster adoption.
Strategy 5: Ongoing Benchmarking
To ensure SQL Bot continuously improves in quality and performance, LinkedIn developed a rigorous, customized benchmarking framework. By using tailored evaluation sets, clear metrics, and human-plus-LLM judging, they maintain high standards of accuracy, reduce hallucinations, and guide model evolution over time. Benchmarking is treated not as a one-time task, but as a living process integrated into product development.
This figure shows the value of adding examples and descriptions to the metadata. In my view, it is a positive surprise because I thought that providing the database schema was sufficient:
Uber: QueryGPT — Natural Language to SQL Using Generative AI
At Uber, crafting SQL queries to access and manipulate vast datasets requires not only proficiency in SQL syntax but also a deep understanding of internal data models. This complexity often leads to significant time investment, especially for operations managers and data scientists who are among the most frequent users of Uber’s data platform.
Uber developed QueryGPT, an AI-powered tool that translates natural language prompts into SQL queries. By leveraging large language models (LLMs), vector databases, and similarity search, QueryGPT aims to streamline the query generation process, thereby enhancing productivity across various teams.
The publication is not as detailed as the LinkedIn Engineering Team’s, but it offers valuable insights into their learnings.
Data privacy and security: Row-Level Security (RLS)
Both LinkedIn’s SQL Bot and Uber’s QueryGPT implement dataset-level access controls — verifying whether a user is authorized to query a particular table — but they do not mention applying row-level security policies dynamically within generated queries.
A few weeks ago, my friend Davide Mauri highlighted to me that this is a critical consideration when designing Text-to-SQL systems, especially given the growing importance of data privacy, compliance, and security requirements.
In many organizations, particularly those operating in regulated industries such as finance, healthcare, and logistics, Row-Level Security (RLS) is not optional — it is a mandatory architectural feature.
RLS enables different users to access different subsets of data within the same table, depending on their permissions, ensuring fine-grained data control.
Fortunately, most major commercial database systems implement RLS natively, including PostgreSQL, SQL Server, and Oracle (through mechanisms like PostgreSQL RLS, SQL Server Row-Level Security, and Oracle VPD). Leveraging these built-in capabilities significantly strengthens the overall security posture while simplifying enforcement.
For Text-to-SQL systems targeting enterprise environments, it is strongly recommended to integrate RLS enforcement as part of the core architecture. Several approaches can be considered:
- Database-native RLS policies: Using the database engine’s built-in features to automatically filter rows based on user identity or attributes.
- Dynamic query rewriting: Automatically embedding additional WHERE clauses into generated SQL queries, customized according to user roles or permissions.
- Prompt-injected constraints: Instructing the LLM at prompt time to always include specific row-level filters based on the user’s context.
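As an illustration of the dynamic query rewriting approach, the sketch below appends a filter derived from the user’s permissions before executing LLM-generated SQL. This is a deliberately naive string-based example (the permission map, table, and single-table assumption are all invented); a production system would rewrite the parsed query tree or, better, rely on database-native RLS policies.

```python
# Naive dynamic query rewriting for row-level security (illustration only).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (id INTEGER, region TEXT, weight REAL)")
conn.executemany("INSERT INTO shipments VALUES (?, ?, ?)",
                 [(1, "EMEA", 10.0), (2, "APAC", 20.0), (3, "EMEA", 5.0)])

USER_REGIONS = {"alice": "EMEA", "bob": "APAC"}  # hypothetical permission map

def apply_rls(sql: str, user: str) -> tuple[str, tuple]:
    # Assumes a single-table SELECT with no existing WHERE clause.
    region = USER_REGIONS[user]
    return sql + " WHERE region = ?", (region,)

generated_sql = "SELECT COUNT(*) FROM shipments"  # as produced by the LLM
secured_sql, params = apply_rls(generated_sql, "alice")
count = conn.execute(secured_sql, params).fetchone()[0]
```

Note that the filter value is passed as a bound parameter rather than concatenated into the SQL string, so the rewriting layer itself does not become an injection vector.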
Regardless of the method, ensuring that users only access data they are authorized to see is essential for compliance, security, and maintaining organizational trust.
Setting User Expectations
An often overlooked but critical aspect of deploying Text-to-SQL systems is managing user expectations. Clearly communicating the capabilities, limitations, and proper use of the system can significantly improve adoption, trust, and overall effectiveness.
Cost Management for Sustainable Operations
Beyond functionality, security, and governance, cost management is a fundamental pillar when designing enterprise-grade Text-to-SQL systems. Systems that rely on LLMs introduce a new operational dimension: inference cost, which scales with usage volume, prompt size, and system complexity.
If not properly addressed, costs can escalate rapidly as adoption grows, especially when working with dynamic data models and complex query patterns. Just as we enforce data access governance (through mechanisms like Row-Level Security), we must equally enforce operational governance to ensure financial sustainability.
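A back-of-the-envelope model makes the point concrete: inference cost scales with prompt size (mostly retrieved schema context) and query volume, so trimming the context retrieved per question directly cuts the monthly bill. The per-token prices below are placeholders, not real vendor pricing.

```python
# Simple inference-cost model; prices are assumed, not real vendor rates.
PRICE_PER_1K_INPUT = 0.0025   # assumed USD per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.01    # assumed USD per 1K output tokens

def monthly_cost(queries_per_day, prompt_tokens, output_tokens, days=30):
    per_query = (prompt_tokens / 1000) * PRICE_PER_1K_INPUT \
              + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return queries_per_day * days * per_query

# Effect of trimming retrieved schema context from 6K to 2K tokens:
fat = monthly_cost(queries_per_day=500, prompt_tokens=6000, output_tokens=300)
lean = monthly_cost(queries_per_day=500, prompt_tokens=2000, output_tokens=300)
```

Even with made-up prices, the shape of the result is the point: prompt size multiplies against every query, which is why better retrieval (sending less, more relevant context) is a cost control as much as a quality one.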
The table below summarizes key factors that influence cost and strategies to mitigate them:
Conclusion
Understanding how SQL engines operate is fundamental to building effective Text-to-SQL systems. Historically, SQL experts have translated business requirements into optimized SQL queries to efficiently retrieve and manipulate data. For projects of this nature, having SQL Subject Matter Experts (SMEs) is not optional — it is essential. However, their role is evolving: rather than focusing solely on writing or tuning queries manually, SMEs will increasingly shift towards optimizing and validating the outputs generated by LLMs.
Relying entirely on an LLM to autonomously decide which queries to execute demands a significant leap of faith. Robust validation mechanisms, expert oversight, and systemic guardrails must be integral parts of any responsible deployment.
Personally, I am very excited to see how LLMs will continue to evolve within the Text-to-SQL space. I would love to contribute to projects leveraging these technologies, influencing LLM behavior towards generating cleaner, more efficient, and more secure SQL code. The projects highlighted above provide excellent examples of what is achievable — but they are not exhaustive. The field is rapidly advancing, and there remains much room for exploration and innovation.
After studying these implementations, my perspective on the real potential of Text-to-SQL systems powered by generative AI has evolved significantly.
If you are planning to build a Text-to-SQL system, the following list of core components may serve as a practical and strategic guide: