Performance Comparison - Identity() x NewId() x NewSequentialId

原文链接:http://www.codeproject.com/Articles/32597/Performance-Comparison-Identity-x-NewId-x-NewSeque
AgileWare, 16 Jan 2009

目录

Introduction

The NEWSEQUENTIALID system function is an addition to SQL Server 2005. It seeks to bring together, what used to be, conflicting requirements in SQL Server 2000; namely identity-level insert performance, and globally unique values.

Background

GUIDs (Globally Unique Identifiers) are pseudo-random 16-byte (128-bit) numbers whose generation algorithm provides a sufficiently high degree of probability that the same GUID will never be generated twice on any computer at any time. Although a GUID is a binary array, it is often represented in its hexadecimal form using the following format, for example: dbbc2827-edf8-4a2d-92ad-c1e0059304d7. This makes them suitable for the ‘distributed uniqueness requirement’ of our Orders table.

The problem would be pretty much solved there unless we have some non-trivial performance requirements, especially around insert volumes and speed. As GUIDs are effectively pseudo-random (the degree of randomness / predictability was changed by Microsoft in response to privacy concerns), they do not possess the same orderly insert benefits as identity columns when used as the clustering key. The insert IO pattern changes from sequential to random-access when using GUIDs instead of an IDENTITY column. Not only do GUIDs require far more IO to locate the insert location in a clustered index (the index has to be traversed from the root for every insert), fragmentation at the leaf level leads to less dense pages, which in turn requires defragmentation / reindexing.

Using the Code

NEWSEQUENTIALID()

Enter NEWSEQUENTIALID(), stage left. This is a new system function included in SQL Server 2005 that combines the best of both worlds. NEWSEQUENTIALID() will generate a GUID that will be greater in value than the previously generated one. It is easy to see how this function works by looking at the output the following TSQL script generates:

create table TestTable ( 
   id uniqueidentifier default newsequentialid() 
      not null primary key clustered, 
   sequence int); 
go 

-- Insert 100 rows. 
declare @count int; 
set @count = 0; 
while @count < 100 begin 
   insert TestTable (sequence) 
   values (@count); 

   set @count = @count + 1; 
end; 
go 

select * 
from TestTable 
order by id; 
go

Output:

id sequence 
------------------------------------ ----------- 
FA780E3B-03C2-DB11-BD9F-0011D82F3F23 0 
FB780E3B-03C2-DB11-BD9F-0011D82F3F23 1 
FC780E3B-03C2-DB11-BD9F-0011D82F3F23 2 
FD780E3B-03C2-DB11-BD9F-0011D82F3F23 3 
FE780E3B-03C2-DB11-BD9F-0011D82F3F23 4 
...
5C790E3B-03C2-DB11-BD9F-0011D82F3F23 98 
5D790E3B-03C2-DB11-BD9F-0011D82F3F23 99

You’ll notice that there is a clear bit-shifting pattern that SQL Server is employing to keep the GUIDs increasing in relative value. These are the restrictions in NEWSEQUENTIALID’s use: NEWSEQUENTIALID cannot be used in arbitrary TSQL statements. It can only be specified as the DEFAULT value for an unique identifier column. More than one column in a table can use NEWSEQUENTIALID. It cannot be combined with other scalar functions, e.g., REVERSE(NEWSEQUENTIALID()) is not allowed. The network card’s MAC address is easily identifiable in the generated GUID. My laptop’s MAC address is 00-01-4A-28-64-8B. This is one of the NEWSEQUENTIALID GUIDs that was generated: 4EAC7708-30C3-DB11-B902-00014A28648B. Hence the privacy warnings in BOL: Correlating the position of the MAC address in the GUID with the bits that get incremented / shifted for each successive GUID sheds light on why the GUIDs remain globally unique. The MAC address portion of the GUID does not change.

Performance Comparison

The following test demonstrates the benefits of NEWSEQUENTIALID. The insert performance of the various clustering key variations was measured and compared. For each test, the same TestTable was recreated with a different Id column definition:

IDENTITY() 
NEWID() 
NEWSEQUENTIALID()

The following TSQL contains the DDL used to construct the tables for the three clustered keys:

-- IDENTITY 
create table TestTable ( 
   id int identity(1,1) not null primary key clustered, 
   sequence int not null, 
   data char(250) not null default ''); 
go 

-- NEWID 
create table TestTable ( 
   id uniqueidentifier default newid() not null primary key clustered, 
   sequence int not null, 
   data char(250) not null default ''); 
go 

-- NEWSEQUENTIALID 
create table TestTable ( 
   id uniqueidentifier default newsequentialid() not null primary key clustered, 
   sequence int not null, 
   data char(250) not null default ''); 
go

For each test, 50,000 rows were inserted into the TestTable table using the following TSQL script:

-- Insert 50,000 rows. 
declare @count int; 
set @count = 0; 
while @count < 50000 begin 
   insert TestTable (sequence) 
   values (@count); 

   set @count = @count + 1; 
end; 
go

The following metrics were gathered after each run:

-- Get the number of read / writes for this session... 
select reads, writes 
from sys.dm_exec_sessions 
where session_id = @@spid; 

-- Get the page fragmentation and density at the leaf level. 
select index_type_desc, index_depth, page_count, 
       avg_page_space_used_in_percent, avg_fragmentation_in_percent, record_count 
from sys.dm_db_index_physical_stats(db_id(), 
               object_id('TestTable'), null, null, 'detailed') 
where index_level = 0; 
go

The results are quite compelling:

-ReadsWritesLeaf PagesAvg Page UsedAvg FragmentationRecord Count
IDENTITY()01,6831,66798.9%0.7%50,000
NEWID()05,3862,48669.3%99.2%50,000
NEWSEQUENTIALID()01,7461,72599.9%1.0%50,000

Conclusion

The most striking is the number of writes required by the NEWID system function. This, coupled with the average page density of 69%, is evidence of the page splitting caused by the random distribution of inserts at the leaf level. As soon as a page fills up, it needs to be split into two pages of 50% each for the insert to complete. Not only has page splitting resulted in poor page density, it has fragmented the data pages quite badly (there is a 99% probability that the next data page is not next to the current one). In our tests, the most likely place for a free page required for the page split is at the end of the table irrespective of where the row is being inserted. Therefore, to read the rows in order, the scan needs to keep jumping back and forth between widely distributed split pages, hence the appalling fragmentation. The minor difference in page count between IDENTITY and NEWSEQUENTIALID metrics is due to the difference in size between IDENTITY’s INT (4 bytes) and NEWSEQUENTIALID’s UNQUEIDENTIFIER (16 bytes). However, note that any non-clustered index on the UNIQUEIDENTIFIER table will be four times larger than the same index on the IDENTITY table. Therefore, IDENTITY is still preferred over NEWSEQUENTIALID when choosing between the two. The NEWSEQUENTIALID system function has clearly lived up to its claim of providing GUID-like uniqueness coupled with identity-level insert performance. The number of writes, fragmentation, and page density are all inline with identity-level metrics. These benefits make NEWSEQUENTIALID a compelling feature.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合;、下 4载使用后,可先查看README.md或论文文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。 5、资源来自互联网采集,如有侵权,私聊博主删除。 6、可私信博主看论文后选择购买源代码。 1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合;、下载 4使用后,可先查看README.md或论文文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。 5、资源来自互联网采集,如有侵权,私聊博主删除。 6、可私信博主看论文后选择购买源代码。 1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合;、下载 4使用后,可先查看README.md或论文文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。 5、资源来自互联网采集,如有侵权,私聊博主删除。 6、可私信博主看论文后选择购买源代码。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值