Semantic Search in SQL Azure: Practical Example for a Customer Support Department

Eladio Rincón Herrera
19 min read · Jan 9, 2025


Image created with ChatGPT

Back in September, I worked with a few friends on a side project to explore Retrieval-Augmented Generation (RAG) through the Football Copilot project. While experimenting with Postgres, SQL Azure, and a custom vector CLR type, I realized the importance of exploring practical use cases for corporate environments.

This led me to shift my focus to other use cases. I see corporate clients who are keen on innovative solutions but have conservative mindsets at the same time. They often prioritize stable, well-established technologies and adopt a cautious approach to experimentation. This mindset is common among corporate clients: if you can add a small plugin or feature to an existing, trusted system that adds real value, it becomes easier to test and eventually move to production.

For this article, I have chosen a distribution company. Their customer support department handles feedback and satisfaction analysis, making it a relatable use case for industries with similar workflows. I will demonstrate how to implement semantic search with vector datatypes in SQL Azure Database, focusing on server-side code and combined use cases, and showing the challenges as they show up. The technology is relatively new to SQL Azure Database (Public Preview, November 2024) and SQL Server 2025; however, semantic search has been available on other platforms for several years.

TL;DR

  • Semantic searches rely on the numerical representation of text strings as arrays of numbers, commonly referred to as embeddings. These embeddings are compared using mathematical algorithms like cosine or Euclidean distance to calculate their similarity.
  • The vector datatype is essentially an array of numbers stored using the real datatype: a vector(1024) is 1024 instances of the real datatype. SQL Server's data page size is 8KB. If the row size, including vector data, exceeds 8KB, SQL Server uses the row-overflow feature to store the vector data in a separate page. SQL Azure supports a maximum of 1998 dimensions.
  • Even if the row size fits within an 8KB page, you may end up with a low ratio of rows per page and internal page fragmentation. For instance, if each row is 5KB, only one row fits per page, wasting approximately 3KB per page. In sequential table scans, this causes an inefficient number of page reads (see the quick math after this list).
  • Adding vector similarity search to your code is very easy. Once you have added the datatype, you need to calculate the embedding for the text to search and then execute the similarity search function. Following good coding practices, you only need: 1 stored procedure, 1 UDF, and a SELECT pattern.
  • SQL Azure will soon include DiskANN indexes for vector data. If you REALLY need to index your vector data now, consider other products. However, evaluate the data access patterns for your use case, because you might not need to index the vector at all. Best practices using nonclustered indexes with included columns can reduce the need to index the vector data directly in a few scenarios.
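To make the storage bullets concrete, here is the back-of-the-envelope math mentioned above (a rough sketch that ignores row headers and page overhead):

-- Approximate storage math for VECTOR(1024):
-- 1024 dimensions x 4 bytes (single-precision float) = 4096 bytes, roughly 4KB per vector.
SELECT 1024 * 4 AS vector_bytes,              -- ~4KB of vector payload per row
       8060 / (1024 * 4) AS vectors_per_page; -- only 1 in-row vector fits in the ~8KB usable page space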

How SQL Azure stores vectors

SQL Azure’s vector data type is specifically designed to store high-dimensional vector data efficiently, a common requirement in machine learning and similarity search applications. Each vector is stored in an optimized binary format, where every element is represented as a single-precision (4-byte) floating-point value. To simplify integration with client applications, vectors are exposed as JSON arrays, allowing seamless interaction while ensuring compact and efficient storage.
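For example, you can round-trip between a JSON array string and the vector type directly in T-SQL; a minimal illustration:

-- Create a vector from a JSON array string and convert it back
DECLARE @v VECTOR(3) = CAST('[0.1, 0.2, 0.3]' AS VECTOR(3));
SELECT CAST(@v AS NVARCHAR(MAX)) AS vector_as_json;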

For further details, refer to these links: Vector Data Type — SQL Server, Pages and Extents Architecture Guide — SQL Server, and row-overflow feature.

Although the official documentation does not explicitly cover how SQL Azure stores vector data internally, tests using the sys.dm_db_index_physical_stats DMV suggest that the vector type behaves like varchar, nvarchar, or varbinary.
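If you want to poke at the metadata yourself once the RAG_Orders table from the next section exists, a quick exploratory probe of the catalog views looks like this (interpret the output with the caveat above; this is not official documentation):

-- Inspect how the embedding column is declared in the catalog
SELECT c.name, t.name AS type_name, c.max_length
FROM sys.columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.RAG_Orders')
  AND c.name = 'embedding';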

The use case

I have built a use case for a customer support department that handles the quality of the products shipped by a given company. The department's primary responsibility is to resolve customer issues efficiently by reviewing order details and customer interactions. The challenge for such departments lies in managing both traditional tabular data, like orders, amounts, and customer details, and unstructured data, such as call center conversations. Efficiently combining these data types is essential to streamline operations, identify patterns, and improve the resolution of customer claims.

Creating RAG_Orders table with a vector datatype

I am using the Northwind database as the source of data; the object creation script can be downloaded from GitHub. If you want to test the scripts, you need to create the database objects first in a SQL Azure database:

-- Drop the table if it exists
DROP TABLE IF EXISTS RAG_Orders;
GO
-- Create the table with a vector data type
CREATE TABLE RAG_Orders (
    id INT PRIMARY KEY,
    customer_id NCHAR(5),
    order_date DATE DEFAULT GETDATE(),
    sales_total DECIMAL(10,2) DEFAULT 0.0,
    product_details_json NVARCHAR(MAX) NULL,
    conversation NVARCHAR(MAX) NULL,
    embedding VECTOR(1024) NULL
);
GO

-- Insert 100 rows into the table
INSERT INTO RAG_Orders (id, customer_id, order_date, sales_total, product_details_json)
SELECT TOP 100
    o.OrderID, CustomerID, OrderDate, od.Subtotal,
    (
        SELECT OrderID, ProductID, ProductName, Quantity, ExtendedPrice AS LineTotal
        FROM [dbo].[Order Details Extended] t2
        WHERE t2.OrderID = o.OrderID
        FOR JSON PATH
    ) AS ProductDetails
FROM Orders o
INNER JOIN [Order Subtotals] od ON o.OrderID = od.OrderID
ORDER BY o.OrderID;

If you query RAG_Orders, you'll get this resultset:

List of Orders
-- query RAG_Orders
SELECT * FROM dbo.RAG_Orders;

The product_details_json column contains the JSON representation of a few columns from the Order Details table:

[
    {
        "OrderID": 10248,
        "ProductID": 11,
        "ProductName": "Queso Cabrales",
        "Quantity": 12,
        "LineTotal": 168.0000
    },
    {
        "OrderID": 10248,
        "ProductID": 42,
        "ProductName": "Singaporean Hokkien Fried Mee",
        "Quantity": 10,
        "LineTotal": 98.0000
    },
    {
        "OrderID": 10248,
        "ProductID": 72,
        "ProductName": "Mozzarella di Giovanni",
        "Quantity": 5,
        "LineTotal": 174.0000
    }
]

I will use Azure OpenAI (ChatGPT) to create/simulate call center support messages. The response will be saved in the conversation column of the RAG_Orders table. This is the stored procedure:

Chat_completion stored procedure

DROP PROCEDURE IF EXISTS [dbo].[get_chat_completion];
GO

CREATE PROCEDURE [dbo].[get_chat_completion]
(
    @model VARCHAR(MAX),
    @system_message NVARCHAR(MAX),
    @user_prompt NVARCHAR(MAX),
    @temperature FLOAT,
    @max_tokens INT,
    @max_attempts INT = 3,
    @chat_completion NVARCHAR(MAX) OUTPUT
)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        DECLARE @retval INT, @response NVARCHAR(MAX);
        DECLARE @url NVARCHAR(MAX);
        DECLARE @payload NVARCHAR(MAX);
        DECLARE @attempts INT = 0;
        DECLARE @wait_time_ms INT = 5000;
        -- Validate the input parameters
        IF @model IS NULL OR @model = ''
        BEGIN
            RAISERROR('Model parameter must be provided.', 16, 1);
            RETURN;
        END
        IF @system_message IS NULL OR @system_message = ''
        BEGIN
            RAISERROR('System message must be provided.', 16, 1);
            RETURN;
        END
        IF @user_prompt IS NULL OR @user_prompt = ''
        BEGIN
            RAISERROR('User prompt must be provided.', 16, 1);
            RETURN;
        END
        IF @temperature IS NULL OR @max_tokens IS NULL
        BEGIN
            RAISERROR('Temperature and max tokens must be provided.', 16, 1);
            RETURN;
        END
        IF @temperature < 0 OR @temperature > 1
        BEGIN
            RAISERROR('Temperature must be between 0 and 1.', 16, 1);
            RETURN;
        END
        IF @max_tokens <= 0
        BEGIN
            RAISERROR('Max tokens must be greater than 0.', 16, 1);
            RETURN;
        END
        SET @url = N'https://<your-endpoint>.openai.azure.com/openai/deployments/' + @model + '/chat/completions?api-version=2023-05-15';
        SET @payload = JSON_OBJECT('messages': JSON_ARRAY(
                            JSON_OBJECT('role': 'system', 'content': @system_message),
                            JSON_OBJECT('role': 'user', 'content': @user_prompt)),
                        'temperature': @temperature,
                        'max_tokens': @max_tokens);
        WHILE @attempts < @max_attempts
        BEGIN
            BEGIN TRY
                SET @attempts = @attempts + 1;
                EXEC dbo.sp_invoke_external_rest_endpoint
                    @url = @url,
                    @method = 'POST',
                    @payload = @payload,
                    @credential = 'https://<your-endpoint>.openai.azure.com',
                    @response = @response OUTPUT;
                SET @chat_completion = JSON_VALUE(@response, '$.result.choices[0].message.content');
                -- Stop retrying as soon as a non-empty completion arrives
                IF (@chat_completion IS NOT NULL AND @chat_completion <> '')
                BEGIN
                    BREAK;
                END
                IF JSON_VALUE(@response, '$.result.error.code') = '429'
                BEGIN
                    DECLARE @retryAfter NVARCHAR(10);
                    SET @retryAfter = JSON_VALUE(@response, '$.response.headers."Retry-After"');
                    -- Convert the Retry-After value to an integer
                    DECLARE @retryAfterSeconds INT;
                    SET @retryAfterSeconds = CAST(@retryAfter AS INT) + 1; -- add one extra second
                    -- Compute hours, minutes, and seconds for the HH:MM:SS format
                    DECLARE @hours NVARCHAR(2), @minutes NVARCHAR(2), @seconds NVARCHAR(2);
                    SET @hours = RIGHT('0' + CAST(@retryAfterSeconds / 3600 AS NVARCHAR), 2);
                    SET @minutes = RIGHT('0' + CAST((@retryAfterSeconds % 3600) / 60 AS NVARCHAR), 2);
                    SET @seconds = RIGHT('0' + CAST(@retryAfterSeconds % 60 AS NVARCHAR), 2);
                    DECLARE @delay NVARCHAR(8);
                    SET @delay = @hours + ':' + @minutes + ':' + @seconds;
                    PRINT 'Rate limit error detected. Retry-After: ' + @delay + '. Attempt ' + CAST(@attempts AS NVARCHAR(10)) + '.';
                    WAITFOR DELAY @delay;
                END
            END TRY
            BEGIN CATCH
                THROW;
            END CATCH
        END
        IF @attempts >= @max_attempts AND (@chat_completion IS NULL OR @chat_completion = '')
        BEGIN
            RAISERROR('Failed to get a valid response after multiple attempts.', 16, 1);
            RETURN;
        END
        -- Collapse doubled line breaks in the completion
        SET @chat_completion = REPLACE(REPLACE(@chat_completion, CHAR(13) + CHAR(10) + CHAR(13) + CHAR(10), CHAR(13) + CHAR(10)), CHAR(13) + CHAR(10) + CHAR(10), CHAR(13) + CHAR(10));
    END TRY
    BEGIN CATCH
        DECLARE @ErrorMessage2 NVARCHAR(4000) = ERROR_MESSAGE();
        DECLARE @ErrorSeverity INT = ERROR_SEVERITY();
        DECLARE @ErrorState INT = ERROR_STATE();
        RAISERROR(@ErrorMessage2, @ErrorSeverity, @ErrorState);
    END CATCH
END
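Before wiring the procedure into the data generation below, you can smoke-test it on its own. A minimal sketch, assuming a gpt-4o-mini deployment and the database-scoped credential referenced in the procedure body already exist:

-- Smoke test for dbo.get_chat_completion
DECLARE @answer NVARCHAR(MAX);
EXEC dbo.get_chat_completion
    @model = 'gpt-4o-mini',
    @system_message = N'You are a helpful assistant.',
    @user_prompt = N'Say hello in five words.',
    @temperature = 0.5,
    @max_tokens = 100,
    @chat_completion = @answer OUTPUT;
SELECT @answer AS chat_completion;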

Later, I will use a stored procedure called get_embeddings_1024 to calculate the embedding for each conversation. This procedure takes the conversation column value as an input parameter and returns the vector representation (the embedding) as an output parameter. This is the code:

Get_embeddings stored procedure

DROP PROCEDURE IF EXISTS [dbo].[get_embeddings_1024];
GO

CREATE PROCEDURE [dbo].[get_embeddings_1024]
(
    @model VARCHAR(MAX),
    @text NVARCHAR(MAX),
    @embedding VECTOR(1024) OUTPUT
)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @retval INT, @response NVARCHAR(MAX);
    DECLARE @url VARCHAR(MAX);
    -- DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);
    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text, 'dimensions': 1024);
    -- Set the @url variable with proper concatenation before the EXEC statement
    SET @url = 'https://<your-endpoint>.openai.azure.com/openai/deployments/' + @model + '/embeddings?api-version=2023-03-15-preview';
    EXEC dbo.sp_invoke_external_rest_endpoint
        @url = @url,
        @method = 'POST',
        @payload = @payload,
        @credential = [https://<your-endpoint>.openai.azure.com],
        -- @headers = '{"Content-Type":"application/json", "api-key":"<api-key>"}',
        @response = @response OUTPUT;
    -- Use JSON_QUERY to extract the embedding array directly
    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');

    SET @embedding = CAST(@jsonArray AS VECTOR(1024));
END
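As with the chat procedure, a quick standalone test helps validate the plumbing. This sketch assumes a text-embedding-3-small deployment that accepts the dimensions parameter:

-- Smoke test for dbo.get_embeddings_1024
DECLARE @emb VECTOR(1024);
EXEC dbo.get_embeddings_1024
    @model = 'text-embedding-3-small',
    @text = N'The shipment arrived two days late.',
    @embedding = @emb OUTPUT;
SELECT CAST(@emb AS NVARCHAR(MAX)) AS embedding_json;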

Adding fake comments for Order shipping using OpenAI

To balance positive and negative feedback, I am using different system messages. See the @system_message_negative and @system_message_positive variables to understand how to influence the requests. This is the code:

-- use get_chat_completion to create fake feedback with the chat completion API
DECLARE @model VARCHAR(MAX) = 'gpt-4o-mini';
DECLARE @system_message_negative NVARCHAR(MAX) =
N'System Message:
You are simulating harsh and critical customer feedback for a recent order. The feedback should reflect strong dissatisfaction with the products and delivery.
Focus on highlighting customer dissatisfaction with any of these topics:
1- specifics of the product listed, 2- bad shipping, 3- bad packaging, or 4- delayed delivery.
Feedback has to be clear and short. Do not exaggerate. Be bold with the facts.
Do not start with "I recently..." or "I ordered..." to avoid repetition. Be creative! Do not exaggerate the feedback.
';
DECLARE @system_message_positive NVARCHAR(MAX) =
N'System Message:
You are simulating random customer feedback for a recent order. The feedback should reflect the customers experience with the products and delivery.
The feedback should highlight customer satisfaction with any of these topics:
1- specifics of the product listed, 2- good shipping, 3- good packaging, or 4- timely delivery.
Feedback has to be clear and short. Do not exaggerate. Be bold with the facts.
Do not start with "I recently..." or "I ordered..." to avoid repetition. Be creative! Do not exaggerate the feedback.
';
DECLARE @user_prompt_template NVARCHAR(MAX) =
N'User Prompt:
The order included the products listed below in json format:
"
{{product_details_json}}
"
Generate random customer feedback based on this order.
The feedback should include references to specific products from the order and describe the customers experience,
such as quality level, satisfaction level, or issues encountered.';

-- iterate over the orders and generate fake feedback
DECLARE @order_id INT, @product_details_json VARCHAR(MAX), @user_prompt NVARCHAR(MAX), @chat_completion VARCHAR(MAX);
DECLARE @row_counter INT = 0, @negative_feedback_threshold INT = 0;
DECLARE order_cursor CURSOR FOR
    SELECT id, product_details_json
    FROM RAG_Orders
    WHERE conversation IS NULL;
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @order_id, @product_details_json;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- replace ' with '' to escape single quotes in the JSON string
    SET @product_details_json = REPLACE(@product_details_json, '''', '''''');
    SET @row_counter = @row_counter + 1;
    SET @user_prompt = REPLACE(@user_prompt_template, '{{product_details_json}}', @product_details_json);
    SET @negative_feedback_threshold = @row_counter % 3; -- 33% chance based on row counter
    IF @negative_feedback_threshold = 0
        EXEC dbo.get_chat_completion @model, @system_message_negative, @user_prompt, 0.5, 2500, 3, @chat_completion OUTPUT;
    ELSE
        EXEC dbo.get_chat_completion @model, @system_message_positive, @user_prompt, 0.5, 2500, 3, @chat_completion OUTPUT;
    UPDATE RAG_Orders
    SET conversation = @chat_completion
    WHERE id = @order_id;
    FETCH NEXT FROM order_cursor INTO @order_id, @product_details_json;
END
CLOSE order_cursor;
DEALLOCATE order_cursor;

Calculating the conversation embeddings

After updating the conversation column in the RAG_Orders table, I need to calculate the vector representation of each conversation. I iterate through the RAG_Orders table calling the dbo.get_embeddings_1024 stored procedure. This procedure takes the conversation column value as an input parameter and returns the vector representation (the embedding) as an output parameter. Once the embedding is generated, I update the relevant row in the table.

When working with embeddings and vectors, it is key to match the size of the vector to the assigned dimensions. For example, if the vector is set to 1024 dimensions, the generated embedding must have 1024 dimensions too. Mismatched dimensions can lead to data integrity issues in searches or errors when converting data. Proper alignment ensures consistency and reliability in embedding-based workflows.
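As a quick illustration of the failure mode, a toy example with 3-dimensional vectors: casting an array with the wrong element count raises a conversion error, while matching dimensions work as expected:

-- SELECT CAST('[0.1, 0.2, 0.3]' AS VECTOR(2));  -- fails: the array has 3 elements, the type expects 2
-- VECTOR_DISTANCE also requires both arguments to have the same number of dimensions.
SELECT VECTOR_DISTANCE('cosine',
    CAST('[1, 0, 0]' AS VECTOR(3)),
    CAST('[0, 1, 0]' AS VECTOR(3))) AS cosine_distance;  -- orthogonal vectors: distance 1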

-- Calculate embeddings for the generated feedback

DECLARE @embedding_model VARCHAR(MAX) = 'text-embedding-3-small';
DECLARE @conversation NVARCHAR(MAX), @embedding VECTOR(1024), @order_id INT;
DECLARE embedding_cursor CURSOR FOR
    SELECT id, conversation
    FROM RAG_Orders
    WHERE conversation IS NOT NULL AND embedding IS NULL;

OPEN embedding_cursor;
FETCH NEXT FROM embedding_cursor INTO @order_id, @conversation;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Generate embeddings for the conversation
    EXEC dbo.get_embeddings_1024
        @model = @embedding_model,
        @text = @conversation,
        @embedding = @embedding OUTPUT;

    -- Update the RAG_Orders table with the calculated embeddings
    UPDATE RAG_Orders
    SET embedding = @embedding
    WHERE id = @order_id;

    FETCH NEXT FROM embedding_cursor INTO @order_id, @conversation;
END

CLOSE embedding_cursor;
DEALLOCATE embedding_cursor;

This yields the following data:

-- resultset
SELECT TOP 10 id, customer_id, conversation, embedding FROM RAG_Orders;
List of Orders

Querying the Orders table

The RAG_Orders table will serve two types of queries:

  • Type-1: Traditional relational queries, which retrieve data such as customer_id, order_date, sales_total, and other structured attributes. These queries rely on SQL Azure's traditional indexing and storage mechanisms to ensure quick and efficient access to tabular data. This type of query is typically used in reporting, auditing, or transactional processes where structured and numeric data is the primary focus.
  • Type-2: Queries that leverage the semantic capabilities of the embedding vector column. These queries often involve searching for similar items based on the embedding data, which represents high-dimensional numerical representations of text or other data. In this case, the primary key (id), text fields like conversation, and the embedding vector itself become essential for similarity searches.

Let’s explore both types:

Type-1: Pure Relational Queries

Relational queries in the RAG_Orders table focus on retrieving structured data such as customer_id, order_date, sales_total, and other tabular attributes. These are basically the traditional SQL queries used to read, delete, update, and insert data: the classical T-SQL with use cases like CRUD, reporting, and OLTP-like processes.

Here are some frequent patterns:

  1. Fetch the row for an Order
-- t-sql query
SELECT id, customer_id, order_date, sales_total
FROM RAG_Orders
WHERE id = 10251

The query will issue a clustered index seek on the id column, as expected. Notice the estimated row size, which is efficient because you are fetching only regular data columns.

Clustered Index Seek

However, if you issue a SELECT *:

-- t-sql query
SELECT *
FROM RAG_Orders
WHERE id = 10251

See that the estimated row size is 10KB!

Estimated Row Size

This is not a bug; this is reality! You are fetching the vector data (1024 single-precision, 4-byte floating-point values) plus the conversation, a large text stored as NVARCHAR(MAX).

The lesson here: "Avoid * as much as you can. Fetch only the data you need!"

2. Fetch the rows for a period of time

-- Fetch the rows for a period of time
SELECT id, customer_id, order_date, sales_total
FROM RAG_Orders
WHERE order_date >= '1996-07-10'
AND order_date < '1996-07-18'

For this case, the SELECT * rule mentioned earlier applies equally. If the table does not have a covering index, a scan over the clustering key will be performed.

Predicate filter

If you want to improve this query, create an index on the seek column (order_date), including the columns from the SELECT clause at the leaf level; in this case, customer_id and sales_total. Note that id is not necessary (as the clustering key, it is already part of the nonclustered index), nor is order_date, which is the index key itself.

If you are familiar with database optimization, this is nothing new, but it becomes more relevant in these scenarios, as you will see later.

If you create the index, the execution plan for the same query will use nci_rag_orders_order_date as expected:

CREATE NONCLUSTERED INDEX nci_rag_orders_order_date
ON dbo.RAG_Orders (order_date)
INCLUDE (customer_id, sales_total)
Index Seek

3. Fetch the rows for a given customer

The explanation for point #2 applies to this query pattern too; a covering index for customer lookups is sketched below.
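A hypothetical covering index for this pattern, mirroring the order_date index above (the index name is my own invention):

CREATE NONCLUSTERED INDEX nci_rag_orders_customer_id
ON dbo.RAG_Orders (customer_id)
INCLUDE (order_date, sales_total);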

4. Aggregate data for a time-frame

This is a particular case that depends on the time-frame size. These queries are normally covered by nonclustered indexes like the one used in point #2 (see the sample aggregate after this list). If you don't have a covering index, your decision varies depending on:

  • how big the time-frame is: sometimes you will prefer a full scan of the clustering key.
  • how expensive the index is to maintain: efficient queries vs. expensive maintenance.
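For illustration, a monthly aggregate that the nci_rag_orders_order_date index from point #2 can cover without touching the clustered index:

-- Sample time-frame aggregate (key column: order_date; included: customer_id, sales_total)
SELECT customer_id, COUNT(*) AS order_count, SUM(sales_total) AS total_sales
FROM dbo.RAG_Orders
WHERE order_date >= '1996-07-01'
  AND order_date < '1996-08-01'
GROUP BY customer_id;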

Type-2: Semantic Searches

Semantic searches, on the other hand, take advantage of the embedding vector column in the RAG_Orders table. Remember that the embedding is the numerical representation of text data. With the numerical representation, you can perform similarity searches using mathematical algorithms. These queries combine the results of the searches with text data like conversation.
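To get a feel for the distance metrics before using real embeddings, here is a toy comparison on 3-dimensional vectors (the values in the comments come from simple trigonometry, not from output I am quoting):

DECLARE @a VECTOR(3) = CAST('[1, 0, 0]' AS VECTOR(3));
DECLARE @b VECTOR(3) = CAST('[0.5, 0.5, 0]' AS VECTOR(3));
SELECT VECTOR_DISTANCE('cosine', @a, @b)    AS cosine_distance,    -- 1 - cos(45°) ≈ 0.29
       VECTOR_DISTANCE('euclidean', @a, @b) AS euclidean_distance, -- sqrt(0.25 + 0.25) ≈ 0.71
       VECTOR_DISTANCE('dot', @a, @b)       AS negative_dot;       -- -(1*0.5 + 0*0.5) = -0.5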

There are two frequent patterns:

1- pure semantic search, where the data store compares numerical representations of the data using algorithms like cosine, Euclidean, or negative inner product:

  • the user provides a question.
  • the embedding model transforms the question into an embedding.
  • a vector similarity search runs in the data store.
  • matching rows are returned to the client.

2- sophisticated/combined searches, where in addition to point 1) the LLM is asked to build a response:

Semantic Search Flow
  • the user provides a question.
  • the embedding model transforms the question into an embedding.
  • a vector similarity search runs in the data store.
  • matching rows are sent to the LLM for chat completion.
  • the result is returned to the client.
RAG Pattern

For both cases, the usage pattern is:

  • translate the request into an embedding.
  • filter the dataset.
  • return the resultset.

In my view, running semantic search in relational stores is highly efficient for cases where a filter is applied before the semantic search. For example: I need the orders that got complaints in a period of time. For this case, the pseudo-code would be:

  • filter orders that match a time-frame (relational search), and then
  • for the matching rows, do the vector search

This also helps SQL Database adoption while creating indexes on vectors is not supported. Vector databases, PostgreSQL, CosmosDB, and search engines like Azure Search already include this, but SQL Azure doesn't. Microsoft announced vector indexing at Microsoft Ignite as a "coming soon" feature, so this barrier will fall sooner rather than later!

Vector Indexing announcement in Ignite 24

However, this wall should not stop you from adopting the technology if you are wise about your use cases. If you need to full-scan all the data, implementing semantic search might not be the best option.

Let’s see the use cases:

  1. translate the request to embedding.
TSQL embeddings query pattern

First, you need to translate the search text into an embedding.

-- search pattern
DECLARE @e VECTOR(1024);
DECLARE @string NVARCHAR(MAX) = 'complaints and negative feedback';
EXEC dbo.get_embeddings_1024 @model = 'text-embedding-3-small', @text = @string, @embedding = @e OUTPUT;
SELECT @e AS embedding;
Embedding result

Where @e is a vector of 1024 dimensions that represents the text "complaints and negative feedback".

And then, query the database for matching rows:

-- search pattern
DECLARE @e VECTOR(1024);
DECLARE @string NVARCHAR(MAX) = 'complaints and negative feedback';
EXEC dbo.get_embeddings_1024 @model = 'text-embedding-3-small', @text = @string, @embedding = @e OUTPUT;

SELECT TOP (10) id, conversation, VECTOR_DISTANCE('cosine', @e, embedding) AS cosine
FROM dbo.RAG_Orders
ORDER BY VECTOR_DISTANCE('cosine', @e, embedding);
Similarity search

Everything seems fine so far, but a closer examination reveals the complexities involved. While I’m retrieving only the first 10 rows (TOP 10), you might assume that only those rows are accessed. However, that’s not the case. SQL Azure scans all the rows to obtain the embedding values and then calculates the VECTOR_DISTANCE function to determine the cosine distance.

This is the execution plan, where you can see it (execution plans are read right to left):

Execution Plan

The Compute Scalar operator includes this calculation:

[Expr1002] = Scalar Operator(vector_distance('cosine',[@e],[db].[dbo].[RAG_Orders].[embedding]))

In this specific case, I have 100 rows. Now, imagine the impact of a full clustered index scan on a large dataset, both in terms of data access and function calculations. Since SQL Azure is a PaaS service, you might experience delays in query completion. Moreover, if this functionality is heavily utilized, it could significantly impact the VCPUs and IO bandwidth you have provisioned.
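You can measure this yourself with SET STATISTICS IO; a sketch reusing the search from above (check the Messages tab: the logical reads cover every page of the table, not just ten rows):

DECLARE @e VECTOR(1024);
EXEC dbo.get_embeddings_1024 @model = 'text-embedding-3-small',
    @text = N'complaints and negative feedback', @embedding = @e OUTPUT;
SET STATISTICS IO ON;
SELECT TOP (10) id, VECTOR_DISTANCE('cosine', @e, embedding) AS cosine
FROM dbo.RAG_Orders
ORDER BY VECTOR_DISTANCE('cosine', @e, embedding);
SET STATISTICS IO OFF;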

As mentioned earlier in the article, if your use case is focused on scenarios that can be filtered using relational data, the absence of vector indexes may not pose a significant issue.

For example, this would be the case:

-- search pattern
DECLARE @e VECTOR(1024);
DECLARE @string NVARCHAR(MAX) = 'complaints and negative feedback';
EXEC dbo.get_embeddings_1024 @model = 'text-embedding-3-small', @text = @string, @embedding = @e OUTPUT;

SELECT TOP (10) id, conversation, VECTOR_DISTANCE('cosine', @e, embedding) AS cosine
FROM dbo.RAG_Orders
WHERE order_date >= '1996-07-10'
AND order_date < '1996-07-18'
ORDER BY
VECTOR_DISTANCE('cosine', @e, embedding);
Dates Filter and Execution Plan

The SQL query engine filters the rows that apply and then runs the VECTOR_DISTANCE function. Notice that an index seek plus a key lookup to the clustered index is performed. The reason is that the embedding and conversation columns are not present in the order_date nonclustered index. Adding these columns to an order_date index is a more complex decision that depends on several factors, such as how frequently the values are updated, how often the index is used, the index size, and the potential reduction in underlying I/O operations.

See that, although the query asks for a TOP 10, the SQL engine estimates that only 6 rows match the filter criteria:

Estimated number of rows

When deciding whether to store both the conversation and the embedding in the same table or separately, the primary consideration lies in balancing storage efficiency, query performance, and use-case alignment. SQL Azure organizes data storage in fixed-size 8KB pages, meaning that embedding vectors, often large in size, can significantly increase the storage footprint of each row. This can lead to internal fragmentation, where pages are not fully filled.

If the vector data is big, you may have a scenario where a page can hold only one row. See this picture as a sample: the vector is so big that two vectors don't fit in the same page. This affects the performance of relational queries due to the large number of pages.

Vector Data Store

If the vector could be overflowed, the picture would look like this:

Vector overflowed
EXEC sp_tableoption 'RAG_Orders_overflowed', 'large value types out of row', 1;

However, this option does not work for vectors (I do not know if Microsoft will add it in the future), so you have to simulate the behaviour manually. Please note that row-overflow itself works correctly, but only when the total size of the regular data types exceeds the page size.

This is what the two-table representation would look like:

2 tables Vector Data Store

And, this would be the script (one table for the relational data, and other for the semantic data):

-- Create the relational table (the vector and conversation move to a separate table)
CREATE TABLE RAG_Orders_v2 (
    id INT PRIMARY KEY,
    customer_id NCHAR(5),
    order_date DATE DEFAULT GETDATE(),
    sales_total DECIMAL(10,2) DEFAULT 0.0,
    product_details_json NVARCHAR(MAX) NULL
);
GO

CREATE NONCLUSTERED INDEX NCI_RAG_Orders_v2_order_date
ON dbo.RAG_Orders_v2 (order_date)
INCLUDE (customer_id, sales_total);
GO

CREATE TABLE RAG_Orders_v2_semantic (
    id INT PRIMARY KEY,
    conversation NVARCHAR(MAX) NULL,
    embedding VECTOR(1024) NULL
);
GO

INSERT INTO RAG_Orders_v2
SELECT id, customer_id, order_date, sales_total, product_details_json
FROM RAG_Orders;
INSERT INTO RAG_Orders_v2_semantic
SELECT id, conversation, embedding
FROM RAG_Orders;

Using the sys.dm_db_index_physical_stats DMV, these are the data pages and record sizes for each pattern:

DECLARE @dbid INT = DB_ID();

SELECT name, type_desc, index_type_desc, alloc_unit_type_desc,
       page_count, CAST(avg_page_space_used_in_percent AS DECIMAL(4,2)) AS avg_page_space_used_in_percent,
       avg_fragmentation_in_percent,
       avg_record_size_in_bytes, min_record_size_in_bytes, max_record_size_in_bytes
FROM sys.tables t
CROSS APPLY sys.dm_db_index_physical_stats(@dbid, t.object_id, NULL, NULL, 'DETAILED') ips
WHERE t.name IN ('RAG_Orders', 'RAG_Orders_v2', 'RAG_Orders_v2_semantic')
  AND index_type_desc = 'CLUSTERED INDEX';

Note that the RAG_Orders_v2 table holds all the relational data for the 100 rows in just 8 pages; in the original table, 100 pages were needed. See as well that the average row size is about 500 bytes, which is a great fit for the relational columns (id, order_date, customer_id, and product_details_json). Although total storage is 8 pages more, it is more efficient when reading only relational data.

You will be responsible for joining the RAG_Orders_v2 and RAG_Orders_v2_semantic tables when needed; a sketch of that join pattern follows the figure below.

Pages used
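Here is a sketch of the join pattern mentioned above: filter on the relational table first, then join to the semantic table only for the surviving rows:

DECLARE @e VECTOR(1024);
EXEC dbo.get_embeddings_1024 @model = 'text-embedding-3-small',
    @text = N'complaints and negative feedback', @embedding = @e OUTPUT;

SELECT TOP (10) o.id, o.order_date, s.conversation,
       VECTOR_DISTANCE('cosine', @e, s.embedding) AS cosine
FROM dbo.RAG_Orders_v2 o
INNER JOIN dbo.RAG_Orders_v2_semantic s ON s.id = o.id
WHERE o.order_date >= '1996-07-10'
  AND o.order_date < '1996-07-18'
ORDER BY VECTOR_DISTANCE('cosine', @e, s.embedding);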

The decision also depends on the query patterns and their frequency. If relational queries (e.g., filtering by order_id, date, or metadata) dominate over semantic similarity queries, separating embeddings into a dedicated table or external storage might be beneficial. This separation allows the main table to remain compact and efficient for transactional operations while isolating the vector data. Conversely, if semantic similarity queries are frequent and closely tied to the relational data, keeping embeddings within the same table could streamline the querying process by eliminating the need for joins or additional lookup operations. There is also an additional cost for accessing row-overflowed data, which I do not know how to measure.

I cannot give one-size-fits-all advice; however, these are the items to analyze:

  • frequency of the pure-relational queries.
  • frequency of the semantic search queries.
  • update frequency of the relational data.
  • update frequency of the vectors.
  • data size, row sizes, and number of indexes.

Conclusions

The procedures provided in this article, including those for generating embeddings and running semantic searches, offer a repeatable approach to using these technologies. With very few stored procedures, leveraging SQL Azure's capabilities, you can enhance existing systems without big changes to your current architecture. If you have SQL Azure projects, this is a fantastic scenario for testing. If your team is proficient in SQL, don't hesitate to run a Proof of Concept (PoC).

Determining the best approach to storing and querying semantic and relational data within SQL Azure depends on the specifics of your use case. Although the lack of native vector indexing is a bit challenging, Microsoft will address it soon. Meanwhile, do not say "no" right away: draw up your use case, evaluate the results, and see where it takes you.

Finally, it is worth noting that this article does not cover the efficiency or performance of the embedding algorithms themselves. For a deeper analysis of these methods, you can explore resources such as the Hugging Face Embedding Leaderboard. You will have to test embedding models during your PoC. But if you remember the code, it will be as complicated as changing two parameters: the model name and the vector size.


Written by Eladio Rincón Herrera

Data projects: Dev, Azure, DBA, Mentor, Manager, Suppport, Trainer, Speaker, former MS SQL Server MVP. https://www.linkedin.com/in/erincon/
