I have seen that SQL developers use varchar(max) data while designing the tables or temporary tables. We might not be sure about the data length, or we want to eliminate the string or binary truncation error.
我已经看到SQL开发人员在设计表或临时表时使用varchar(max)数据。 我们可能不确定数据长度,或者我们想消除字符串或二进制截断错误。
Is it a good practice to use varchar(max) for each usage?
在每种用法中都使用varchar(max)是一种好习惯吗?
We can define a specific range for the varchar (n) data type, and it is the recommended way to do so. In order to gain understanding about this data type, read SQL varchar(n) article.
我们可以为varchar(n)数据类型定义一个特定范围,这是推荐的方式。 为了了解这种数据类型,请阅读SQL varchar(n)文章。
We will discuss the use of varchar max and its implications, comparison with the varchar (n) data type in this article.
我们将讨论varchar max的用法及其含义,并与本文中的varchar(n)数据类型进行比较。
VARCHAR(max)SQL Server数据类型概述 (Overview of the VARCHAR(max) SQL Server Data Type)
The SQL Server 2005 introduced this varchar(max) data type. It replaces the large blob object Text, NText and Image data types. All these data types can store data up to 2 GB. As you might be aware that the basic unit of storage in SQL Server is a page. The page size is 8 KB (8192 byes) in SQL Server, and it is fixed. On a page, SQL Server uses 96 bytes for the page header. We can store 8096 bytes ( 8192-96 bytes) for data in SQL Server. Apart from this, page also contains row overhead and row offset and leaves 8000 bytes to use for data storage. Due to this, we can store up to 8000 bytes of data using varchar (8000) data type.
SQL Server 2005引入了这种varchar(max)数据类型。 它替换了较大的Blob对象Text,NText和Image数据类型。 所有这些数据类型最多可以存储2 GB的数据。 您可能已经知道,SQL Server中的基本存储单位是页面。 在SQL Server中,页面大小为8 KB(8192个字节),并且它是固定的。 在页面上,SQL Server使用96字节作为页面标题。 我们可以在SQL Server中存储8096字节(8192-96字节)的数据。 除此之外,页面还包含行开销和行偏移量,并保留8000个字节用于数据存储。 因此,我们可以使用varchar(8000)数据类型存储多达8000个字节的数据。
You might think of using the varchar(max) data type to store 2 GB data to resolve the string truncation issues.
您可能会想到使用varchar(max)数据类型存储2 GB数据来解决字符串截断问题。
Let’s create a few sample tables with different size in varchar data type. We will also create a table with a varchar(max) data type.
让我们在varchar数据类型中创建一些大小不同的示例表。 我们还将创建一个具有varchar(max)数据类型的表。
CREATE TABLE dbo.Employee_varchar_2000
(id INT IDENTITY PRIMARY KEY,
Col1 VARCHAR(2000)
);
CREATE TABLE dbo.Employee_Varchar_4500
(id INT IDENTITY PRIMARY KEY,
Col1 VARCHAR(4500)
);
CREATE TABLE dbo.Employee_Varchar_8000
(id INT IDENTITY PRIMARY KEY,
Col1 VARCHAR(8000)
);
CREATE TABLE dbo.Employee_Varchar_Max
(id INT IDENTITY PRIMARY KEY,
Col1 VARCHAR(MAX)
);
Let’s insert records into these sample tables using the following queries.
让我们使用以下查询将记录插入这些样本表中。
INSERT INTO Employee_varchar_2000 (Col1)
SELECT REPLICATE('A', 2000);
INSERT INTO Employee_varchar_4500 (Col1)
SELECT REPLICATE('A', 4500);
INSERT INTO Employee_varchar_8000 (Col1)
SELECT REPLICATE('A', 8000);
INSERT INTO Employee_varchar_max (Col1)
SELECT REPLICATE('A', 8000);
We can verify the data length in these tables using the following queries.
我们可以使用以下查询来验证这些表中的数据长度。
Use SQLShackDemo
go
SELECT LEN(col1) AS columnlength
FROM Employee_varchar_2000;
SELECT LEN(col1) AS columnlength
FROM Employee_varchar_4500;
SELECT LEN(col1) AS columnlength
FROM Employee_varchar_8000;
SELECT LEN(col1) AS columnlength
FROM Employee_varchar_max;
In the following screenshot, we can verify the data length is similar to existing table column length.
在下面的屏幕截图中,我们可以验证数据长度是否与现有表列长度相似。
Now, we can check the object statistics like page count, row count, and allocation unit using the DMV sys.dm_db_index_physical_stats.
现在,我们可以使用DMV sys.dm_db_index_physical_stats检查对象统计信息,例如页数,行数和分配单位。
SELECT OBJECT_NAME([object_id]) AS TableName,
alloc_unit_type_desc,
record_count,
page_count,
round(avg_page_space_used_in_percent,0) as avg_page_space_used_in_percent ,
min_record_size_in_bytes,
max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')
WHERE OBJECT_NAME([object_id]) LIKE 'Employee_varchar%';
We can see that all tables contains the allocation unit IN_ROW_Data. SQL Server stores all data in the IN_ROW_Data allocation unit.
我们可以看到所有表都包含分配单元IN_ROW_Data。 SQL Server将所有数据存储在IN_ROW_Data分配单元中。
We cannot insert more than 8000 bytes data in the varchar(n) data type. If we try to do so , we get the following error message.
我们不能在varchar(n)数据类型中插入超过8000个字节的数据。 如果尝试这样做,则会收到以下错误消息。
INSERT INTO Employee_varchar_8000 (Col1)
SELECT REPLICATE('A', 8001);
Go
INSERT INTO Employee_varchar_8000 (Col1)
SELECT REPLICATE('A', 10000);
It inserts the data successfully but truncates the values to 8000 characters. Similar truncation occurs for the Employee_varchar_max table containing the varchar(max) data type.
它成功插入数据,但将值截断为8000个字符。 对于包含varchar(max)数据类型的Employee_varchar_max表,也会发生类似的截断。
We need to cast the value to varchar(max) and insert for the length above 8000 characters. We get the error message while trying to insert records in Employee_varchar_8000 table.
我们需要将该值转换为varchar(max)并插入8000个字符以上的长度。 尝试在Employee_varchar_8000表中插入记录时收到错误消息。
INSERT INTO Employee_varchar_8000 (Col1)
SELECT REPLICATE(CONVERT(VARCHAR(max), 'x'), 8001);
It successfully inserts records in the Employee_varchar_max table.
它成功地将记录插入Employee_varchar_max表中。
Rerun the query to check the allocation unit. We get the LOB_Data allocation unit to store the data more than 8000 bytes in the Employee_Varchar_Max table. We have a pointer to this data in the IN_Row_DATA allocation unit.
重新运行查询以检查分配单元。 我们获得LOB_Data分配单元,以将数据存储在Employee_Varchar_Max表中超过8000个字节。 我们在IN_Row_DATA分配单元中有一个指向该数据的指针。
We can get the following conclusion from this.
由此我们可以得出以下结论。
- SQL Server uses the IN_ROW_DATA page for the varchar(max) data type if the data is less than or equal to 8000 bytes. 如果数据小于或等于8000个字节,则SQL Server将IN_ROW_DATA页用于varchar(max)数据类型。
- If the data grows beyond the 8000 bytes, SQL Server uses LOB_DATA page for the varchar(max) data type 如果数据增长到超过8000个字节,则SQL Server将LOB_DATA页用于varchar(max)数据类型
varchar(max)和varchar(n)数据类型之间的性能比较 (Performance comparison between varchar(max) and varchar(n) data type)
Let’s insert 10,000 records into each of the tables we created earlier. We want to check the data insertion time. You can use the ApexSQL Generate tool to insert the data without writing the t-SQL code for it.
让我们在之前创建的每个表中插入10,000条记录。 我们要检查数据插入时间。 您可以使用ApexSQL生成工具插入数据,而无需为其编写t-SQL代码。
In the following screenshot, you can note the following.
在以下屏幕截图中,您可以注意以下几点。
- Employee_varchar_2000 insertion time 0.08 Seconds Employee_varchar_2000插入时间0.08秒
- Employee_varchar_4500 insertion time 0.19 Seconds Employee_varchar_4500插入时间0.19秒
- Employee_varchar_8000 insertion time 0.31 Seconds Employee_varchar_8000插入时间0.31秒
- Employee_varchar_Max insertion time 2.72 Seconds Employee_varchar_Max插入时间2.72秒
VARCHAR(N)和VARCHAR(MAX)列上的索引
(Indexes on VARCHAR(N) and VARCHAR(MAX) columns
)
As a DBA, you might not design the table. However, it is required to create an Index on the tables to improve the performance of the query.
作为DBA,您可能不设计表。 但是,需要在表上创建索引以提高查询的性能。
We can create an index on the key column of the table holding varchar(n) data type.
我们可以在包含varchar(n)数据类型的表的键列上创建索引。
CREATE INDEX IX_Employee_varchar_2000_1
ON dbo.Employee_varchar_2000(col1)
GO
If we try to do the same for the varchar(max) data type, it gives the following error message.
如果我们尝试对varchar(max)数据类型执行相同的操作,则会给出以下错误消息。
CREATE INDEX IX_Employee_varchar_max
ON dbo.Employee_varchar_max(col1)
GO
Msg 1919, Level 16, State 1, Line 23 Column ‘col1’ in table ‘dbo.Employee_varchar_max’ is of a type that is invalid for use as a key column in an index.
消息1919,级别16,状态1,第23行在表'dbo.Employee_varchar_max'中的列'col1'具有无效的类型,不能用作索引中的键列。
We can use the varchar(max) column as an included column in the index, but you cannot perform the index seek on this column. It will also require additional storage. Therefore, you should avoid creating an index with the varchar(max) data type.
我们可以将varchar(max)列用作索引中的包含列,但是不能在此列上执行索引查找。 它还需要额外的存储空间。 因此,应避免使用varchar(max)数据类型创建索引。
执行计划比较 (Execution plan comparison)
Let’s compare the execution plan of two select statements.
让我们比较两个选择语句的执行计划。
In the first query, we want to retrieve data from the Employee_Varchar_2000 table and get the actual execution plan.
在第一个查询中,我们要从Employee_Varchar_2000表中检索数据并获取实际的执行计划。
In the actual execution plan, we can see a non-clustered index seek operator.
在实际的执行计划中,我们可以看到一个非聚集索引查找运算符。
If we run the same query with the varchar(max) data type, it uses a clustered index scan operator, and it can be a resource-intensive operator depending upon the number of rows in the table.
如果我们使用varchar(max)数据类型运行相同的查询,它将使用聚簇索引扫描运算符,并且它可能是资源密集型运算符,具体取决于表中的行数。
select col1 from Employee_varchar_max where col1 like ‘xxxx%’
从Employee_varchar_max中选择col1,其中col1喜欢'xxxx%'
Let’s compare the execution plan using the Compare Showplan option of SSMS. To compare two execution plans, save one execution plan by right click on the plan and Save Execution Plan as and provide the location to save the plan.
让我们使用SSMS的Compare Showplan选项比较执行计划。 要比较两个执行计划,请右键单击一个计划,然后将执行计划另存为,以保存一个执行计划,并提供保存计划的位置。
In another query execution plan, right-click and choose Compare Showplan. It opens a window, and you can specify the path of the earlier saved execution plan.
在另一个查询执行计划中,右键单击并选择“比较显示计划”。 它打开一个窗口,您可以指定先前保存的执行计划的路径。
In the following screenshot, you can see the comparison between both execution plans.
在以下屏幕截图中,您可以看到两个执行计划之间的比较。
- The estimated CPU cost is higher in the varchar(max) data type for a similar activity as of varchar(2000) 对于与varchar(2000)类似的活动,在varchar(max)数据类型中,估计的CPU成本较高。
- For the varchar(max) it uses clustered index scan operator and scans all records. You can see this the estimated number of rows is 10000 while in the varchar(2000) data type it uses index seek operator and estimated number of rows is 1.96078 对于varchar(max),它使用聚簇索引扫描运算符并扫描所有记录。 您可以看到,估计的行数为10000,而在varchar(2000)数据类型中,它使用索引查找运算符,估计的行数为1.96078
- Estimated row size 4035 B is greater than in varchar(max) compare to the 1011 B for the varchar(2000) data type 与varchar(2000)数据类型的1011 B相比,估计的行大小4035 B大于varchar(max)
varchar(max)和varchar(n)数据类型之间的区别 (Difference between the varchar(max) and varchar(n) data type)
varchar(max) | varchar(n) |
We can store up to 2 GB of data in this data type | We can store up to 8000 bytes data in this data type |
It uses the allocation unit IN_ROW_Data up to 8000 bytes of data. If data is more than 8000 bytes, it uses the LOB_Data page and stores its pointer in the IN_ROW_Data page | It stores data in the standard data page |
We cannot create an index on the key column of the varchar(max) data type | We can create an index on this data type |
We cannot compress the LOB data | We can compress data for this data type |
Data retrieval and updation on the LOB data is relatively slow | We do not face such issue in the varchar(n) data type |
varchar(最大) | varchar(n) |
我们可以在这种数据类型中存储多达2 GB的数据 | 我们可以在这种数据类型中存储多达8000个字节的数据 |
它使用分配单元IN_ROW_Data最多8000字节的数据。 如果数据超过8000个字节,它将使用LOB_Data页并将其指针存储在IN_ROW_Data页中 | 它将数据存储在标准数据页面中 |
我们无法在varchar(max)数据类型的键列上创建索引 | 我们可以在此数据类型上创建索引 |
我们无法压缩LOB数据 | 我们可以为此数据类型压缩数据 |
LOB数据的数据检索和更新相对较慢 | 我们不会在varchar(n)数据类型中遇到此类问题 |
结论 (Conclusion)
In this article, we demonstrated varchar(max) data type and also explored several differences between the varchar(max) and varchar(n) data types. You should use an appropriate data type. We should consider the database design, performance, compression, indexes in mind. You should review the data types in your database and change it if required with proper testing.
在本文中,我们演示了varchar(max)数据类型,还探讨了varchar(max)和varchar(n)数据类型之间的一些区别。 您应该使用适当的数据类型。 我们应该考虑数据库的设计,性能,压缩,索引。 您应该查看数据库中的数据类型,并在需要时通过适当的测试进行更改。