What’s the Point of Using VARCHAR(n) Anymore?

The arrival of the (MAX) data types in SQL Server 2005 were one of the most popular feature for the database developer. At the time, there was a lot of discussion as to whether this freedom from having to specify string length came at a cost. Rob attempts to give a final answer as to any down-side.

When you are storing large strings, or any other data types that you’d want to store as VARCHAR or VARBINARY, there are factors that affect how and where that data is stored inside SQL Server. This in turn will affect the performance of inserts and updates.

Basics: Allocation Unit Types, Row and Page Size Limits

SQL Server’s unit of physical storage is a page. Each page has a fixed size of 8K bytes (ref). Where possible, data is stored in rows within these ‘pages’. A row cannot overlap a page boundary. If the space allocated to a datatype is fixed, or if the variable sized data is small enough to allow the row to fit, the base allocation unit type, “IN_ROW_DATA”, is used. However, there are two ways of storing data that would otherwise overflow a page boundary.

  1. When a single row grows too large to be stored in-row, data can be offloaded to “ROW_OVERFLOW_DATA” pages.
  2. When a single column stores more than 8,000 bytes, or if the developer chooses to force this behavior, data is stored in “LOB_DATA” pages.

This becomes much clearer through examples.

Exercising Row and Page Size Limits

Single String Column with VARCHAR(n) and VARCHAR (MAX)

8,000 Characters

First, let’s build two tables and exercise some long strings to see how SQL Server handles both normal and large strings. Here, we will be exercising row size limits.

 

1

2

3

4

5

6

7

8

CREATE TABLE demo.OneColVarcharN   (Col1 VARCHAR(8000));

CREATE TABLE demo.OneColVarcharMax (Col1 VARCHAR(MAX));

 

INSERT INTO  demo.OneColVarcharN   (Col1)

SELECT REPLICATE('x', 8000);

 

INSERT INTO  demo.OneColVarcharMax (Col1)

SELECT REPLICATE('x', 8000);

 

After each insert, we will use versions of the following query to see into which type of page these records were loaded (see MSDN for details on the DMV):

 

1

2

3

4

5

6

7

8

9

10

SELECT

    OBJECT_NAME([object_id])        AS TableName,

    alloc_unit_type_desc            AS AllocUnitTp,

    page_count                      AS PgCt,

    avg_page_space_used_in_percent  AS AvgPgSpcUsed,

    record_count                    AS RcdCt,

    min_record_size_in_bytes        AS TableName,,

    max_record_size_in_bytes        AS MaxRcdSz,

    forwarded_record_count          AS FwdRcdCt

FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL , 'DETAILED');

 

The query results show that only “IN_ROW_DATA” pages were written for both tables, and the pages are 98.97% full.

TableNameAllocUnitTpPgCtAvgPgSpcUsedRcdCtMinRcdSzMaxRcdSzFwdRcdCt
OneColVarcharMaxIN_ROW_DATA198.971801180110
OneColVarcharNIN_ROW_DATA198.971801180110

8,001 Characters

Clearly we can’t insert 8,001 characters into an 8,000-character column, but we can in the VARCHAR(MAX) column.

 

1

2

INSERT INTO demo.OneColVarcharMax (Col1)

SELECT REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 8001);

 

The query results here show that pages were written to both “IN_ROW_DATA” and “LOB_DATA” pages.

TableNameAllocUnitTpPgCtAvgPgSpcUsedRcdCtMinRcdSzMaxRcdSzFwdRcdCt
OneColVarcharMaxIN_ROW_DATA10.43135350
OneColVarcharMaxLOB_DATA199.02180158015NULL

Note that the “in row” record is very small. In this case, the only thing stored “in row” is the 24-byte pointer. The full 8,001-character string is moved to the LOB page.

So SQL Server is storing normal VARCHAR(n) columns and VARCHAR(MAX) columns “in row” by default. When VARCHAR(MAX) exceeds 8,000 characters, the pointer is stored “in row”, and the string is stored in “LOB” pages.

Two String Columns with VARCHAR (n) and VARCHAR(MAX)

Let’s try this again but with two columns. This time, we will exercise page size limits.

8,000 Characters

As expected, two 4,000-character strings fit fine in both tables.

 

1

2

3

4

5

6

7

8

CREATE TABLE demo.TwoColVarcharN   (Col1 VARCHAR(8000), Col2 VARCHAR(8000));

CREATE TABLE demo.TwoColVarcharMax (Col1 VARCHAR(MAX),  Col2 VARCHAR(MAX));

 

INSERT INTO demo.TwoColVarcharN   (Col1, Col2)

SELECT REPLICATE('x', 4000), REPLICATE('x', 4000);

 

INSERT INTO demo.TwoColVarcharMax (Col1, Col2)

SELECT REPLICATE('x', 4000), REPLICATE('x', 4000);

 

The query results show that only “IN_ROW_DATA” pages are written for each table, and the pages are 99.00% full.

8,060 Characters

The maximum number of bytes per page is 8,060. Let’s see how this works.

It actually doesn’t require the full 8,060 characters to fill that page. 8,047 characters get written to just “in row” pages, but at 8,048 characters, it writes to both “in row” pages and either “row overflow” pages (VARCHAR(n)) or “LOB” pages (VARCHAR(MAX)).

Results with 8,047 Characters (4,023 and 4,024 characters)

TableNameAllocUnitTpPgCtAvgPgSpcUsedRcdCtMinRcdSzMaxRcdSzFwdRcdCt
TwoColVarcharMaxIN_ROW_DATA199.581806080600
TwoColVarcharNIN_ROW_DATA199.581806080600

Results with 8,048 Characters (4,024 and 4,024 characters)

TableNameAllocUnitTpPgCtAvgPgSpcUsedRcdCtMinRcdSzMaxRcdSzFwdRcdCt
TwoColVarcharMaxIN_ROW_DATA150.171406140610
TwoColVarcharMaxLOB_DATA149.89140384038NULL
TwoColVarcharNIN_ROW_DATA150.171406140610
TwoColVarcharNLOB_DATA149.98140384038NULL

Notice that at 8,047 characters, the record size is exactly 8,060 bytes. What is in that 23 bytes? “Each data page contains a page header which stores page meta-data, such as database file identifier, current page number, previous and next page numbers, number of free bytes per page and so forth.” (ref)

200,000 Characters

For the VARCHAR(MAX) table, the results are basically the same even if the column width is pushed far past the 8,000-character limit. Here, we write two 100,000-character strings.

 

1

2

3

4

INSERT INTO demo.TwoColVarcharMax (Col1, Col2)

SELECT

    REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 100000),

    REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 100000);

 

Results

TableNameAllocUnitTpPgCtAvgPgSpcUsedRcdCtMinRcdSzMaxRcdSzFwdRcdCt
TwoColVarcharMaxIN_ROW_DATA10.75161610
TwoColVarcharMaxLOB_DATA2791.89282288054NULL

Here, we have a single page storing two pointers and 27 pages storing the large strings.

In-Row vs. Out-of-Row LOB Storage

For “large-value data types” like VARCHAR(MAX), SQL Server allows the data to be stored in-row (up to 8,000 bytes) or out-of-row (ref).

We will use two new tables, one in-row and one out-of-row:

 

1

2

CREATE TABLE demo.OneColVarcharMaxIn  (Col1 VARCHAR(MAX));

CREATE TABLE demo.OneColVarcharMaxOut (Col1 VARCHAR(MAX));

 

To change a VARCHAR(MAX) column from the default (in row) to out-of-row, execute this statement:

 

1

EXEC sp_tableoption 'demo.OneColVarcharMaxOut', 'large value types out of row', 1;

 

1 Character

Writing a single-character string to each column in these tables creates an interesting result:

TableNameAllocUnitTpPgCtAvgPgSpcUsedRcdCtMinRcdSzMaxRcdSzFwdRcdCt
OneColVarcharMaxInIN_ROW_DATA10.15112120
OneColVarcharMaxOutIN_ROW_DATA10.33127270
OneColVarcharMaxOutLOB_DATA11.0418484NULL

The in-row table created just one page. The out-of-row page created one page for the pointer and one page for the out-of-row string. Notice the byte count difference: the in-row record takes 12 bytes, where the out-of-row record takes 111 bytes.

This illustrates two downsides of out-of-row storage when small records are written:

  1. Multiple pages are affected regardless of the size of the data being written.
  2. Extra storage space is required.

8,000 Characters

TableNameAllocUnitTpPgCtAvgPgSpcUsedRcdCtMinRcdSzMaxRcdSzFwdRcdCt
OneColVarcharMaxInIN_ROW_DATA198.971801180110
OneColVarcharMaxOutIN_ROW_DATA10.33127270
OneColVarcharMaxOutLOB_DATA250.022848014NULL

Notice that the out-of-row table writes two pages instead of just one (“PgCt” column). It actually takes only 65 characters in the out-of-row table to cause it to create the second page.

8,001 Characters

TableNameAllocUnitTpPgCtAvgPgSpcUsedRcdCtMinRcdSzMaxRcdSzFwdRcdCt
OneColVarcharMaxInIN_ROW_DATA10.43135350
OneColVarcharMaxInLOB_DATA199.0212680158015NULL
OneColVarcharMaxOutIN_ROW_DATA10.33127270
OneColVarcharMaxOutLOB_DATA298.43127848015NULL

Once the byte count goes over 8,000, the in-row table writes to a LOB page.

1,000,000 Characters

TableNameAllocUnitTpPgCtAvgPgSpcUsedRcdCtMinRcdSzMaxRcdSzFwdRcdCt
OneColVarcharMaxInIN_ROW_DATA10.43135350
OneColVarcharMaxInLOB_DATA12698.4212620208054NULL
OneColVarcharMaxOutIN_ROW_DATA10.33127270
OneColVarcharMaxOutLOB_DATA12698.43127848054NULL

With such large strings, the results are very similar between default (in-row) and out-of-row settings.

Performance

So how do these various choices affect performance? Let’s code some inserts and record results with different settings and values.

INSERT – 0 to 8,000 Characters

First, we’ll test the performance of a process that inserts strings which fit in VARCHAR(n).

The basic idea here is to insert 10,000 rows of data and record the timing. I ran the tests on my desktop. This is not a server-class machine but is a very high-powered desktop. All tests were executed on SQL Server 2008 R2 and Windows 7 Enterprise.

The results here show microseconds per insert for record lengths between 0 and 8,000 characters. The complete collection of test scripts is linked in the References section at the end of the article.

1432-image001small.png

See detailed results in Table 1.

The results here show that VARCHAR(n) is slightly faster than VARCHAR(MAX) (in-row) only for 0-character and 1,000-character strings. At 2,000 characters or above, VARCHAR(MAX) (in-row) is fastest. This result was not expected.

Intuitively, we expect that VARCHAR(n) should be faster than VARCHAR(MAX) (in-row), and in some cases it is. What I think this shows is that, if there is a performance difference, it is small enough that we can’t measure it consistently.

See detailed results in Table 1.

INSERT – Greater Than 8,000 Characters

These tests are limited to VARCHAR(MAX) but compare in-row to out-of-row.

The results here show microseconds per insert for records lengths between 9,000 and 30,000 characters.

1432-image002small.png

See detailed results in Table 2.

UPDATE – Column Limit

When a VARCHAR(MAX) in-row column goes over the 8,000-character limit, the data is moved to a LOB_DATA page. The linked script Allocation Units – UPDATE.sql tests this 8,000-, then 8,001-, then 8,000-character data shift. This test illustrates the move of data from an IN_ROW_DATA page to a LOB_DATA page and then back for the in-row table.

These operations can be expensive, but how expensive? Make a quick mental prediction before reading on.

The linked script PerfTest – UPDATE.sql tests both the 8,000-to-8,001 scenario and the 8,001-to-8,000 scenario. The results were actually quite surprising. I kept checking the code to see if I had done something wrong because the results didn’t match my expectations at all.

UPDATE Test Results

 

In/Out

From

To

 Elapsed Mcs

Penalty vs

Out-of-Row

In-Row

8,000

8,001

34,238,839

82%

Out-of-row

8,000

8,001

18,804,768

 

In-Row

8,001

8,000

9,969,989

-47%

Out-of-row

8,001

8,000

18,804,768

 

As expected, the 8,000-to-8,001 update takes longer for in-row than for out-of-row. The surprise is that the 8,001-to-8,000 update is faster for in-row. Based on these results, there must be a shortcut employed inside SQL Server that optimizes this move of data from a LOB_DATA page to an IN_ROW_DATA page.

UNICODE

All of these tests were conducted using non-UNICODE strings. Without compression, all the byte counts have to be doubled for UNICODE strings.

Recommendations

Don’t Use VARCHAR(MAX) Everywhere

One design-simplification technique that has been suggested is to use VARCHAR(MAX) for every string column. Though the tests here show that using VARCHAR(MAX) (in-row) instead of VARCHAR(n) is not a performance problem for strings under 8,000 characters, there are a number of reasons to limit the length of strings in your database.

UI Issues

Anyone with experience creating an application UI knows that handling long strings is difficult. When laying out the UI for an application, it is important to know the expected and maximum length of strings that will be displayed. Leaving this open to whatever someone writes to the database makes designing, building, and testing an application very difficult.

Performance – Writing or Updating Long Strings

From the performance results, you can see that writing long strings definitely affects write delays. While writing a record of any size (even zero characters) takes time, writing longer strings takes more time. As an example, using VARCHAR(n), writing 1,000 characters takes an average of 217 microseconds while writing 8,000 characters takes an average of 448 microseconds.

Compression

What I believe is the most compelling reason to avoid over-8,000-character strings is compression. LOB data never gets compressed.

“When considering data compression one very key thing to remember is that out of row data (LOB data) isn’t compressed. If the LOB data is stored in row, then it will be compressed, but only when you compress the table with PAGE level compression. If however you use ROW level compression then the LOB data will never be compressed, no matter if it is stored in row or if it is stored out of row.” (ref)

Conclusions

  • Where appropriate, use VARCHAR(n) over VARCHAR(MAX)
    • for reasons of good design if not performance benefits, and
    • because VARCHAR(MAX) data does not compress
  • Storing large strings takes longer than storing small strings.
  • Updating an in-row VARCHAR(MAX) value from below 8,000 to over 8,000 will be relatively slow, but the difference for a single transaction will likely not be measurable.
  • Updating an in-row VARCHAR(MAX) value from over 8,000 to below 8,000 will be faster than if the table is set to store data out-of-row.
  • Using the out-of-row option for VARCHAR(MAX) will cause slower writes until the strings are very long.

All the scripts used here are attached, so you are encouraged to retest the results.

References

Appendix

Table 1

TableName

MaxRecords

StrLen

ElapsedMcs

VarcharMaxIn

10,000

0

1,341,626

VarcharMaxOut

10,000

0

1,950,038

VarcharN

10,000

0

1,216,823

VarcharMaxIn

10,000

1,000

2,246,443

VarcharMaxOut

10,000

1,000

2,511,649

VarcharN

10,000

1,000

2,168,441

VarcharMaxIn

10,000

2,000

2,308,844

VarcharMaxOut

10,000

2,000

3,104,460

VarcharN

10,000

2,000

2,823,654

VarcharMaxIn

10,000

3,000

2,667,651

VarcharMaxOut

10,000

3,000

3,057,659

VarcharN

10,000

3,000

2,698,852

VarcharMaxIn

10,000

4,000

2,839,255

VarcharMaxOut

10,000

4,000

3,307,264

VarcharN

10,000

4,000

3,010,858

VarcharMaxIn

10,000

5,000

3,229,262

VarcharMaxOut

10,000

5,000

3,525,668

VarcharN

10,000

5,000

3,603,669

VarcharMaxIn

10,000

6,000

3,416,466

VarcharMaxOut

10,000

6,000

4,056,078

VarcharN

10,000

6,000

3,712,871

VarcharMaxIn

10,000

7,000

3,728,472

VarcharMaxOut

10,000

7,000

4,602,088

VarcharN

10,000

7,000

4,056,078

VarcharMaxIn

10,000

8,000

4,321,284

VarcharMaxOut

10,000

8,000

4,945,295

VarcharN

10,000

8,000

4,477,286

Table 2

TableName

MaxRecords

StrLen

ElapsedMcs

VarcharMaxIn

10,000

9,000

5,670,605

VarcharMaxOut

10,000

9,000

6,326,708

VarcharMaxIn

10,000

10,000

5,295,689

VarcharMaxOut

10,000

10,000

5,280,067

VarcharMaxIn

10,000

11,000

5,717,469

VarcharMaxOut

10,000

11,000

5,123,852

VarcharMaxIn

10,000

12,000

6,014,277

VarcharMaxOut

10,000

12,000

5,623,740

VarcharMaxIn

10,000

13,000

6,514,166

VarcharMaxOut

10,000

13,000

6,686,002

VarcharMaxIn

10,000

14,000

6,576,651

VarcharMaxOut

10,000

14,000

6,873,460

VarcharMaxIn

10,000

15,000

6,764,110

VarcharMaxOut

10,000

15,000

6,748,488

VarcharMaxIn

10,000

16,000

6,639,137

VarcharMaxOut

10,000

16,000

7,154,647

VarcharMaxIn

10,000

17,000

7,279,619

VarcharMaxOut

10,000

17,000

7,420,213

VarcharMaxIn

10,000

18,000

7,435,834

VarcharMaxOut

10,000

18,000

7,089,147

VarcharMaxIn

10,000

19,000

7,804,550

VarcharMaxOut

10,000

19,000

7,070,923

VarcharMaxIn

10,000

20,000

7,336,277

VarcharMaxOut

10,000

20,000

8,023,077

VarcharMaxIn

10,000

21,000

8,538,178

VarcharMaxOut

10,000

21,000

7,976,250

VarcharMaxIn

10,000

22,000

7,679,677

VarcharMaxOut

10,000

22,000

8,007,468

VarcharMaxIn

10,000

23,000

7,617,241

VarcharMaxOut

10,000

23,000

7,788,941

VarcharMaxIn

10,000

24,000

7,835,768

VarcharMaxOut

10,000

24,000

8,194,778

VarcharMaxIn

10,000

25,000

8,475,741

VarcharMaxOut

10,000

25,000

8,506,960

VarcharMaxIn

10,000

26,000

8,819,141

VarcharMaxOut

10,000

26,000

8,382,087

VarcharMaxIn

10,000

27,000

9,084,496

VarcharMaxOut

10,000

27,000

8,881,578

VarcharMaxIn

10,000

28,000

8,975,233

VarcharMaxOut

10,000

28,000

9,053,278

VarcharMaxIn

10,000

29,000

9,505,942

VarcharMaxOut

10,000

29,000

8,944,014

VarcharMaxIn

10,000

30,000

9,100,105

VarcharMaxOut

10,000

30,000

9,022,060

源文链接:https://www.red-gate.com/simple-talk/sql/database-administration/whats-the-point-of-using-varcharn-anymore/ 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值