Microsoft SQL Server I/O Basics

Chapter 2

SQL Server 2000 SP4 and SQL Server 2005

Author: Bob Dorr (SQL Server Senior Escalation Engineer)

Reviewers: Mike Zwilling (SQL Server Development Manager)
Paul Randal (SQL Server Lead Program Manager)
Mirek Sztajno (SQL Server Program Manager)
Steve Schmidt (SQL Server Software Development Engineer)
Peter Byrne (SQL Server Software Development Engineer)
Eric Christensen (SQL Server Software Development Engineer)
Ryan Stonecipher (SQL Server Software Development Engineer)
George Reynya (SQL Server Software Development Engineer)
Kevin Farlee (SQL Server Program Manager)
Burzin Patel (SQL Server Program Manager)
Wei Xiao (SQL Server Technical Lead)
Kathy Lu (SQL Server Engine Test)
Bob Ward (SQL Server Senior Escalation Engineer)
Suresh Kandoth (SQL Server Escalation Engineer)

Published: June 2006

SUMMARY: Outlines the I/O features and enhancements in SQL Server 2000 Service Pack 4 and SQL Server 2005.


Copyright

The information that is contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2006 Microsoft Corporation. All rights reserved.

Microsoft, Windows, Windows NT, and Windows Server are registered trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.


Contents

Introduction
Terms
Maintenance and Configuration
    Pull-the-plug power outage testing
    Atomic writes
    Defragmenting disk drives
    Backup hardening
    4-KB disk sector sizes
        Sector rewrite
        Align with physical sector boundary
        Align with 8-KB boundary
        Larger transaction logs
        Restore and attach
        Format for 4-KB sectors run on smaller sectors
        System and sample databases
        Determining the formatted sector size of a database
        What sector sizes does SQL Server support?
    Remote mirroring
Microsoft SQL Server 2005 I/O Error Message Changes and Additions
    Error 823
    Error 824
    Error 832
    Error 833
Microsoft SQL Server 2005 Enhancements
    Checksum
        Writes
        Reads
        Damage
        PAGE_VERIFY usage
    In-memory checksums
    Latch enforcement
    Checksum on backup and restore
    Page-level restore
    Database available during Undo phase
    Torn page protection
        Common reasons
        Implementation
    Stale read protection
    Stalled I/O detection
        sys.dm_io_pending_io_requests (DMV)
    Read retry
        Resource-based retries
        Sort retries
        Other read failure retries
    Page audit
    Log audit
    Checkpoint
    WriteMultiple extended
    Read-ahead enhanced
    Sparse files / Copy on write / Streams
        Streams
        Copy-on-write and sparse files
        Stream and sparse file visibility
    Snapshot reads
    Instant file initialization
    I/O affinity and snapshot backups
    Locked memory pages
    Idle server
    Database mirroring (DBM)
    Multiple instance access to read-only databases
    Ramp up of local cache
    Encrypted file systems (EFS)
    DiskPar.exe
    Always On high-availability data storage
    SQLIOSim
Conclusion
References



Introduction

Microsoft SQL Server 2005 continues to innovate and extend I/O performance and reliability. With new innovations come new terms, designs, and algorithms.

The SQL Server 2000 I/O Basics document is a prerequisite to the information contained in this document. Read it before you read this paper.

As the administrator of a SQL Server 2005 installation, you will find that visibility into the SQL Server I/O subsystem has been significantly increased. A primary focus of the SQL Server 2005 I/O design was overall stability of performance and data integrity as they relate to the outside influences of the I/O subsystem/path. The new and extended features provide both more visibility and more capabilities. You can maintain a SQL Server 2005 installation with high confidence in the system.

This white paper introduces new terms and discusses maintenance and configuration issues, new and improved error messages, and I/O enhancements.

After reading this document, you will better understand SQL Server I/O needs and capabilities.

Terms

This section defines new terms that are used both in this paper and in documents that are referenced by this paper.

Subsystem / Path

The I/O subsystem or path includes those components that are used to support an I/O operation. These include, but are not limited to, the operating system, drivers, disk drives, controller cards, and caches.

Checksum

Checksum is an error-detection scheme that uses a formulated numeric value to represent a grouping of bits. The same formula is applied to the grouping to verify that the accompanying numeric value is the same. If it is not, the data is considered to be changed or damaged.

In other words, a formula is applied to a given set of bytes. The same formula can later be used to examine the data for damage.

As a simple example, take the following 2-byte string value of "AB" and apply a formula to it (Value = Position + ASCII value). The checksum is the sum of the formula values.


Character    POS    ASCII Value    Formula Value
A            1      65             66
B            2      66             68
                    Checksum       134



Now assume that the string was stored on disk but when read back it was "AC".


Character    POS    ASCII Value    Formula Value
A            1      65             66
C            2      67             69
                    Checksum       135


Comparing the checksum values (134 versus 135) indicates that the values do not match and that damage has occurred to the data.

Clearly this simple example would not handle a change of the string to "BA." However, the routines that are used by SQL Server and Microsoft Exchange are more sophisticated and detect position reversal as a damaged data error condition.
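
To make the arithmetic concrete, the following Transact-SQL sketch computes the toy checksum described above. It implements only the simple Position + ASCII formula from the example; it is not the actual checksum algorithm that SQL Server uses.

DECLARE @strData VARCHAR(10)
DECLARE @iPos INT
DECLARE @iChecksum INT

SET @strData = 'AB'      -- Try 'AC' to see the checksum change
SET @iPos = 1
SET @iChecksum = 0

-- Value = Position + ASCII value; the checksum is the sum of the values.
WHILE (@iPos <= LEN(@strData))
BEGIN
    SET @iChecksum = @iChecksum + @iPos + ASCII(SUBSTRING(@strData, @iPos, 1))
    SET @iPos = @iPos + 1
END

SELECT @iChecksum AS Checksum   -- Returns 134 for 'AB' and 135 for 'AC'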

Read Retry

When a read from stable media returns an error, the read operation is tried again. Under certain conditions, issuing the same read returns the correct data. This is a serious I/O subsystem problem that should be addressed immediately to avoid SQL Server and overall system stability problems and data loss. Obtain assistance from your I/O vendor if you encounter retries.

DBCC Page Auditing

DBCC page auditing is the process of executing common DBCC consistency checks as pages are read from disk.

Time of Last Access

Time of last access is a caching algorithm that enables cache entries to be ordered by their access times. SQL Server 2005 changed its lazy writer so that it returns data pages to the free list based on the time they were last accessed. This differs from the reference count design that was used in SQL Server 7.0 and SQL Server 2000.

Idle Server

An idle server provides the ability, for certain SQL Server SKUs, to become silent: the server suspends the execution of core tasks such as checkpoint and lazy writer when no user requests are present, and it wakes when a new request arrives. This is similar to the suspend and resume operations performed by the operating system.

Copy-On-Write / Snapshots

Copy-on-write is the act of making a copy of the data before a modification is complete. SQL Server copy-on-write actions are used to maintain snapshot databases in SQL Server 2005. Before a data page in the database can be modified, the unchanged image of the page is written (copied) to the snapshot database. The snapshot contains only the original images of data pages that have been modified after snapshot creation. A union of the unmodified database pages and the snapshot pages provides the point-in-time view of the data.

SQL Server 2005 introduces the snapshot database feature for point-in-time databases and online DBCC operations. For more information, see Database Snapshots in SQL Server 2005 Books Online.
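
For example, a database snapshot can be created by using Transact-SQL such as the following sketch. The database name, logical file name, and file path are placeholders; the NAME value must match the logical name of the source database's data file.

-- Create a point-in-time snapshot of MyDatabase.
-- The snapshot file is created as an NTFS sparse file (see Sparse Files below).
CREATE DATABASE MyDatabase_Snap
ON (NAME = MyDatabase_Data, FILENAME = 'E:\Snapshots\MyDatabase_Snap.ss')
AS SNAPSHOT OF MyDatabase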

Sparse Files

Sparse files provide a way to save disk space for files that contain large sections of data that is composed of zeros. An NTFS file marked as sparse only allocates disk clusters for the data explicitly written by the application. Ranges of the file that have not been written to are represented by non-allocated space. When data is read from allocated ranges, the data is returned as it was stored. Data read from non-allocated ranges is returned as zeros.

For information on sparse file behavior, see Understanding Sparse File Sizes in Database Snapshots (http://msdn2.microsoft.com/en-us/library/ms175823.aspx) and Sparse Files (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/sparse_files.asp) on the Microsoft Developer Network (MSDN).

For example, you can create a sparse file of 100 GB on a 20-GB drive. The storage space that is required for the sparse file is only that of the actual bytes written to the file and not the maximum file size.

SQL Server 2005 uses sparse files for snapshot database files and online DBCC operations against files stored on NTFS volumes.

Streams

NTFS volumes enable data files to have one or more secondary storage streams. These streams are all part of the same file; if the file is copied, all streams are copied. Each stream works independently, enabling individual data storage attributes and sizing.

SQL Server 2005 online DBCC checks are based on the database snapshot technology by using transient, secondary file streams to store point-in-time images of data pages while the DBCC check is in progress.

Note: Many common utilities do not expose the secondary stream information. This includes size information.

For more information, see File Streams on MSDN (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/file_streams.asp).

Harden or Hardening

To harden is to require that data be stored on stable media during a specific event; the data is not just in a cache that could be lost if power was lost. For example, when checkpoint is finished, the dirty data pages have been written to stable media and are considered to be hardened. As soon as the hardened state is declared, activities such as truncating the inactive part of the transaction log can occur because the log records are no longer needed for crash recovery.


Memory Scribbler

The SQL Server support team uses the term scribbler to indicate an unexpected memory change made by a caller that is not the true owner of the memory.

The following code example shows the setting of values in illegal array positions.


char gchArray[10];

gchArray[-1] = 'B';   // Underflows the array bounds
gchArray[10] = 'R';   // Overflows the array bounds


The first example puts the value 'B' in the memory location just in front of the array allocation and the second puts an 'R' in the memory location just after the array allocation. In both cases, they 'scribble' in memory that the array does not own.

Another example of a scribbler is using a stale pointer.


char* pchData = new char[10];
delete [] pchData;      // Memory is returned to the heap here.

memcpy(pchData, "0123456789", sizeof(char) * 10);   // pchData is not valid


The memory controlled by the pointer was released with the delete[] operation and is no longer owned by the current code line. The memory could be reused by another thread, and the memcpy would damage memory that it did not own, thereby scribbling on it.

Scribbling occurs because of bugs. Some of these are SQL Server bugs but many others involve other components. Be cautious of third-party extended stored procedures, COM objects, and Linked Servers that may be present in the SQL Server address space.

Maintenance and Configuration

This section outlines maintenance and configuration issues that should be fully understood before deploying SQL Server.

Pull-the-plug power outage testing

Power outage testing is a fundamental and critical part of the administrator's data safety and integrity requirements. Because the number of caching, defragmentation, and other products that enhance I/O performance has greatly increased, it is important to perform pull-the-plug power outage tests before production deployment. Performing safety checks of SQL Server and all other system components before production deployment is very important. Many products successfully meet pull-the-plug requirements when they are configured correctly. Incorrect configurations lead to data loss.

Pull-the-plug testing should be done by using either the production environment or a replica of this environment. This includes third-party products that will be used in production.

The following is a list of pull-the-plug situations. To guarantee equipment safety, make sure that the correct electrical standards are followed.

·Remove the power cord that supplies power to the motherboard. Most tests implement this by using a switched outlet.

·Remove the power cord that supplies power to storage devices. This may be the same cord as the motherboard's or it may be separate for external storage devices.

·Disconnect external devices, wait, and then reconnect the data cabling.

As with all power outage testing, ensure sufficient delay during the outage tests.

We strongly encourage you to establish the appropriate pull-the-plug scenario testing for database log and data files in addition to backup files. Test actual power outages in a variety of scenarios to guarantee data integrity. Some pull-the-plug scenario examples follow.

Establish a wide recovery interval during these tests. While this is likely not the production configuration, it helps expose misconfigured I/O subsystem components. We suggest setting the sp_configure recovery interval option to a high value, such as 5000, as shown in the following example.
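
For example, the following sketch raises the recovery interval by using sp_configure; recovery interval is an advanced option, and the value is expressed in minutes.

-- Widen the recovery interval so that checkpoints occur less often,
-- leaving more dirty pages in flight during the pull-the-plug test.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'recovery interval', 5000
RECONFIGURE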


Scenario: Transaction Safety (Commits)

Basic testing steps:
1. Insert a known set of data and commit the transaction(s).
2. Pull the plug.
3. Restart the server.
4. Validate the database(s) with DBCC CHECKDB.
5. Validate that all committed data exists.

Scenario: Transaction Safety (Aborts)

Basic testing steps:
1. Insert a known set of data and commit the transaction(s).
2. Create several active transactions that modify data and leave the transactions open.
3. Issue a CHECKPOINT in each database.
4. Pull the plug.
5. Restart the server.
6. Validate the database(s) with DBCC CHECKDB.
7. Validate that all committed data exists and that all uncommitted modifications were rolled back.

Scenario: Backup Safety

Basic testing steps:
1. Take a full database backup.
2. Take a transaction log backup.
3. Start a series of known data insert, update, and delete transactions.
4. While data modification is occurring, continue to take a series of log backups.
5. Immediately after a backup is finished, pull the plug.
6. Restart the server.
7. Validate the database(s) with DBCC CHECKDB.
8. Validate the state of the data.
9. Restore the backups and execute appropriate validation steps.

Note: Always do these tests with the checksum option enabled on the databases.
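
The following is a minimal Transact-SQL sketch of the Transaction Safety (Commits) scenario. The database and table names are placeholders, and the PAGE_VERIFY syntax shown is the SQL Server 2005 form.

-- One-time setup: enable checksum protection, per the note above.
ALTER DATABASE PlugTest SET PAGE_VERIFY CHECKSUM

-- Step 1: Insert a known set of data and commit.
USE PlugTest
CREATE TABLE tblKnown (iID INT PRIMARY KEY, strData CHAR(100))
BEGIN TRAN
INSERT INTO tblKnown VALUES (1, 'committed row 1')
INSERT INTO tblKnown VALUES (2, 'committed row 2')
COMMIT TRAN

-- Steps 2 and 3: pull the plug; then restart the server.

-- Steps 4 and 5: after restart, validate the database and the committed data.
DBCC CHECKDB('PlugTest') WITH NO_INFOMSGS
SELECT COUNT(*) AS CommittedRows FROM tblKnown   -- Expect 2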

Restore backups on secondary systems frequently to make sure that your complete backup strategy is functioning correctly. That way you can recover from a failure condition.

For more information about I/O caching requirements and SQL Server, see the following white papers on MSDN.

Microsoft SQL Server I/O subsystem requirements for the tempdb database
(http://support.microsoft.com/kb/917047/en-us)

Key factors to consider when evaluating third-party file cache systems with SQL Server
(http://support.microsoft.com/kb/917043/en-us)

Atomic writes

The I/O subsystem must support atomic writes. Controllers and I/O subsystem components should not allow physical writes to occur before all data is safely transferred to stable media. If writes can start before the complete I/O request has been transferred, a power outage leads to a partial write and data is torn and damaged.

Defragmenting disk drives

Because physical data access is the most expensive part of an I/O request, defragmentation can provide performance gains for SQL Server and other applications. Positioning related data close to each other reduces I/O operation requirements.

Various defragmentation utilities are available on the market today. Some utilities enable defragmentation on open files, whereas others require closed-file defragmentation or perform better when used under closed-file conditions. Additionally, some utilities have transactional capabilities, whereas others do not.

When you evaluate a defragmentation utility for use with SQL Server, use only utilities that provide the following transactional data capabilities.

·The original sector should not be considered moved until the new sector has been successfully established and the data successfully copied.

·The utility should protect against a system failure, such as a power outage, in a safe way that enables the files to remain logically and physically intact. To guarantee data integrity, a pull-the-plug test is highly recommended when a defragmentation utility is running on a SQL Server-based file.

·The Write-Ahead Logging (WAL) protocol requires the prevention of sector rewrites to avoid data loss. The utility must maintain the physical integrity of the file during any data movement. In fact, it should work on sector boundaries in a transactional way to keep the SQL Server files correctly intact.

·The utility should provide appropriate locking mechanisms to guarantee that the file retains a consistent image during any modifications. For example, the original sector cannot be modified while it is being copied to a new location; if modifications were allowed, the defragmentation utility could lose the write.

Make sure that you understand any write-caching strategies that the utility uses. Caching by such a utility might involve a non-battery-backed cache and could violate WAL protocol requirements.

Open-file defragmenting raises several possible issues that closed-file defragmenting typically does not.

·Open-file defragmenting affects performance. Defragmentation utilities may lock sections of the file, preventing SQL Server from completing a read or write operation. This can affect the concurrency of the server that is running SQL Server. Contact the defragmentation manufacturer to learn how files are locked and how this could affect SQL Server concurrency.

·Open-file defragmenting can affect write caching and ordering. Open-file-based utilities require I/O path components; these components must not change the ordering or intended nature of the write operation. If the write-through or WAL protocol tenets are broken, database damage is likely to occur. The database and all associated files are considered to be a single entity. (This is covered in many Knowledge Base articles, SQL Server Books Online, various white papers, and in Remote Mirroring later in this paper.) All writes must retain the original write-ordering sequences and write-through capabilities.

We always recommend performing a full backup before you defragment those locations that contain SQL Server database and backup files.


Backup hardening

The hardening of backup streams has existed for all Microsoft versions of SQL Server, but the topic has not been formally documented. To harden a backup means that all data written to backup media has been successfully stored on stable media; sections of the backup are not just held in the cache. If hardening of the backup is not correctly achieved, the subsequent truncation of the database transaction log could be performed prematurely.

For an example related to hardening of the backup stream, consider a power outage. The sequence occurs in the following steps.

1. The backup stream I/Os have finished, but the tail of the backup stream data is held in a cache.

2. The SQL Server backup processing expects that the data was stored on stable media (hardened), and it continues processing, truncating the database transaction log.

3. A pull-the-plug event occurs, so the tail of the backup stream is lost.

4. Upon restart, SQL Server crash recovery successfully recovers the database without error.

5. However, the backup stream is missing data because of the power outage. No specific error about this damage is produced.

The backup strategy is now damaged and the administrator does not know about the damage until a restore attempt of the backup sequence finds the damaged stream. All log backups following this kind of event are unusable.

Note: SQL Server 2005 improves the restore verification capabilities and extends checksum capabilities to backups, but only a full restore can be used to fully validate the backup strategy. It is not safe to assume that a successful RESTORE VERIFYONLY immediately following a backup operation indicates that all the data is correctly stored on stable media. A caching system could provide the data from cache and the backup stream could still be exposed to data loss. Always contact the cache manufacturer to understand the boundaries of the cache and its relationship to stable storage. Make sure that you have a way to read the backup stream data directly from stable media for verification.
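
For example, the following SQL Server 2005 sketch generates checksums during the backup and rechecks them during verification; the database name and backup path are placeholders. Keep the caveat above in mind: even a verified read can be satisfied from a cache.

-- Generate page and backup-stream checksums while writing the backup.
BACKUP DATABASE MyDatabase
    TO DISK = 'E:\Backups\MyDatabase.bak'
    WITH CHECKSUM

-- Verify the backup, revalidating the checksums as the media is read.
RESTORE VERIFYONLY
    FROM DISK = 'E:\Backups\MyDatabase.bak'
    WITH CHECKSUM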

The SQL Server backup process does not use FILE_FLAG_WRITE_THROUGH, as it does for the database data and log files; instead, it enables system-level I/O buffering. To guarantee data integrity, at the end of a backup operation SQL Server uses FlushFileBuffers to force all data written to the backup streams to stable media and then closes the file.

Notice that for a FlushFileBuffers API call, the NTFS file system issues a write for all dirty pages in the file cache and then invokes IRP_MJ_FLUSH_BUFFERS to make sure that all disk caches are flushed. All third-party caching systems should provide similar semantics to guarantee that data is correctly stored on stable media.


FlushFileBuffers is the mechanism that is used to harden the backup stream. Therefore, make sure that FlushFileBuffers achieves stable media storage, because when FlushFileBuffers completes successfully, SQL Server closes the streams and then truncates the database transaction log.

One other caveat to remember about backup hardening involves moving or compressing the backup. Many backup strategies include the ability to take a backup and then compress it or copy/move it. The compression or copy/move operation is an extension of the implied WAL contract. In-place compression must include transactional capabilities (as outlined in Defragmenting Disk Drives earlier in this paper) or else the compression action exposes data loss possibilities. Always make sure that the integrity of the backup stream is maintained and correctly secured on stable media before destroying the original file.

4-KB disk sector sizes

New manufacturing techniques and specifications are producing drives that support sectors larger than the current 512-byte format, in which one 8-KB SQL Server data page requires sixteen 512-byte sectors. Alongside the increased sector size are various manufacturer implementations of sector sizes between 512 bytes and 4 KB. For example, actual sector sizes might be 512 bytes, 1 KB, or 4 KB. The increased sector sizes and manufacturer implementation derivations affect the SQL Server Write-Ahead Logging protocol in several ways, which are described in this section.

Sector rewrite

Some of the current manufacturer implementations present a logical view of the on-disk sectors. This means that the operating system sees a sector size of 512 bytes, but the drive performs the appropriate logical-to-physical mapping to the larger sector sizes as stored on the physical disk. A logical sector write can include the reading of nearby data, known as read-modify-write (RMW), to finish the complete sector-sized write.

For more information on logical-to-physical sector mapping, see Implementation Guidelines for 1K/4K Sector Sizes (http://www.t13.org/docs2005/e05122r3-WD_Comments_on_Long_Sectors.pdf).

The SQL Server transaction log files are always written with sector-aligned sizes and at sector-aligned boundaries. A small-logical to larger-physical mapping action causes the rewrite (RMW) of log sectors that were already saved to stable media. If the subsequent write fails, the transaction log is damaged and previously committed transactions can be lost. This is termed a tear or torn condition of the already hardened log.

"…SQL Server 6.x may not see the same performance impact from frequent and small transaction log writes. SQL Server 6.x rewrites the same 2-KB log page as transactions are committed. This can reduce the size of the log significantly compared to the 512-byte sector boundary flushes in SQL Server 7.0, SQL Server 2000, and SQL Server 2005. Reducing the size of the log directly relates to the amount of mechanical drive activity. However, as explained above, the SQL Server 6.x algorithm may expose committed transactions…."

For this and more information on how SQL Server 7.0 changed the transaction log design to prevent sector rewrites, see SQL Server 7.0, SQL Server 2000, and SQL Server 2005 logging and data storage algorithms extend data reliability on MSDN
(http://support.microsoft.com/kb/230785/en-us).

Logical-to-physical mapping implementations may be dangerous and could lead to complex data loss situations.

Align with physical sector boundary

Implementations that present logical sector sizes introduce various complexities for SQL Server and the operating system. For this discussion, we use an actual sector size of 4 KB and a presented (logical) sector size of 512 bytes.

SQL Server database log or data files should be created on a physical sector-aligned boundary. However, because the operating system is presented with sectors of size 512 bytes, it can align the start of the SQL Server file at a location that is not aligned with the physical sectors.

The following table shows how the logical sector presentation can cause SQL Server database log and data files to span sectors in an unaligned way. Sector read-modify-writes (RMWs) occur in order to handle the leading and trailing data boundaries.

As shown in the table, a write to Database File Page 0 must handle the sectors containing File ABC and File XYZ at the head of the I/O, in addition to the sector shared with Database File Page 1 at the tail of the I/O. I/O performance can decrease because of the extra misaligned activities.


Physical 4-KB Sectors     Logical 512-Byte Sectors (8 per physical sector)

Sector #1                 File ABC
                          File XYZ
                          Database File Page 0 – Sectors 1 through 6 of 16

Sector #2                 Database File Page 0 – Sectors 7 through 14 of 16

Sector #3                 Database File Page 0 – Sectors 15 and 16 of 16
                          Database File Page 1 – Sectors 1 through 6 of 16



You might be tempted to work around this by creating a database on a freshly formatted drive so that an alignment issue would not occur. The first fault with this workaround is that there is no guarantee of physical sector alignment when the operating system is presented a logical sector size. Also, it is generally impractical on a production system.

Another clever workaround would be to use a defragmentation utility. The fault here is that the defragmentation utility is likely to work from the logical sector size and not the physical sector size. Although better data alignment proximity might occur, it is just as likely that the movement using logical sector boundaries could lead to the sector-spanning situations shown in the example.

Yet another workaround would be to back up and restore the database to achieve better physical alignment. To achieve new physical allocations, drop the original database and run the restore. The new file creation and data copy may align with the physical sectors better or even reduce logical fragmentation. However, the start of the file could be unaligned, and the restore attempt could cause all database pages to be unaligned as in the example shown earlier.

The peak of poor logical-to-physical fragmentation could be as bad as Max Sector Frags = Physical Sector Size / Logical Sector Size. In the example, this could create a 16:1 ratio of logical sectors to SQL Server 8-KB data page space (8 KB / 512 bytes = 16), causing pages of the database to be stored on 16 unique sectors, intermixed with other data.

SQL Server lets you grow and shrink database log and data files. This causes the acquisition and release of disk space. This can lead to further logical-to-physical fragmentation.

An example that uses files that can automatically grow is helpful. In this example, the original database was 10 GB with an auto-grow increment of 1 GB. The original 10-GB allocation occurred on a clean drive and enabled all data pages to align on the physical sector size correctly. However, the same drive supports storage of backups (this is a poor practice) and other files. Data is added to the database that causes an auto-grow, but the growth acquires the 1-GB portion from the drive starting at an unaligned physical sector location. This leaves part of the database aligned and part of it unaligned to physical sector boundaries.

The performance implications may be such that data values stored in the original 10-GB section can be read and written faster than those in the new 1-GB section of the database. This would be a very difficult issue to track down and correct.

Align with 8-KB boundary

Sector sizes should always allow for 8-KB alignment of SQL Server data pages. SQL Server writes data pages in 8-KB increments; the physical sector sizes and alignment should always enable SQL Server to write the 8 KB so that it is aligned on sector boundaries or as a multiple of a smaller sector size. This is to prevent torn I/O conditions, data page spanning, and sector rewrites.

A sector size of 4 KB enables a SQL Server data page to occupy two physical sectors and allows SQL Server data pages to be aligned efficiently on 8-KB boundaries.

If a sector size of 1,536 bytes is used, the alignment for data pages is broken. Adjacent 8-KB pages then share physical sectors, which results in the risk of unwanted sector rewrites (RMWs) of data pages. This hinders database performance and could lead to unexpected data loss (torn) situations.


Larger transaction logs

The SQL Server transaction log files are always written with sector-aligned sizes and at sector-aligned boundaries. Most ordinary workloads do not require SQL Server to significantly increase log space usage due to larger physical sector sizes. For example, concurrent transactions share log-block space.

Although SQL Server tries to use the log space as efficiently as possible, certain application patterns cause the log-block fill percentages to remain small. The following extreme example points out the clear difference in the transaction log space requirements as the sector size varies.


-- tblTest is assumed to exist, for example:
-- CREATE TABLE tblTest (strValue CHAR(1), iValue INT)

DECLARE @I INT
SET @I = 0

WHILE (@I < 10000)
BEGIN
    BEGIN TRAN
    INSERT INTO tblTest VALUES ('A', @I)
    COMMIT TRAN
    SET @I = @I + 1
END

-- vs --

DECLARE @I INT
SET @I = 0

BEGIN TRAN
WHILE (@I < 10000)
BEGIN
    INSERT INTO tblTest VALUES ('A', @I)
    SET @I = @I + 1
END
COMMIT TRAN


The first example requires SQL Server to write the log records 10,000 times, as each commit is processed. The second example enables SQL Server to pack log records and write all 10,000 inserts and the single commit at the same time.

Calculating the log usage for the examples results in the following approximations. Each separate commit forces a sector-aligned log flush, so the 10,000 small transactions in the first example consume roughly 10,000 sectors of log space.

·By using a 4-KB sector size for the first example, ~40 MB of disk space is used.

·By using a 512-byte sector size for the first example, ~5 MB of disk space is used.

·By using a 4-KB sector size for the second example, ~1 MB of disk space is used.

·By using a 512-byte sector size for the second example, ~1 MB of disk space is used.


This means that for an I/O subsystem that reports sector sizes larger than 512 bytes, the SQL Server transaction log file could acquire physical space at the rate of Rate = n / 512, where n is the new sector size. Application design can become critical to efficient log space usage.

Note: Be cautious. Expanding the length of a transaction can have adverse effects on concurrency, as locks continue to be held until the commit is finished.

Although tempdb is primarily non-logged, internal operations such as the allocation of a data page can be logged. Therefore, larger sector sizes can affect tempdb size.

Restore and attach

A SQL Server database can be restored or attached on a system that has a smaller sector size. To help guarantee data integrity, the larger sector size should be evenly divisible by the smaller sector size. For example, a 4-KB source restored or attached to a 1-KB or 512-byte destination is evenly divisible. Restoring or attaching to a smaller sector size such as 1,536 bytes does not meet the "evenly divisible" requirement and immediately requires sector rewrite operations to occur.

SQL Server is not designed to dynamically upgrade the database to the larger sector sizes. SQL Server disallows restoring or attaching a database on a system with a larger sector size; it generates an error message and prevents the restore or attach operation. Enabling a database to function by using a smaller formatted sector size than the actual sector size violates the WAL protocol because the sector size variation guarantees the log records will not be correctly aligned with the physical sector size and log records will be rewritten.

At the time of publication, some current subsystems report sector sizes larger than 512 bytes but most do not. The larger sector sizes involve newer technology and require API changes at the operating system level. Future changes in Microsoft Windows and SQL Server will correctly support the larger sector sizes and allow for dynamic adjustment of sector sizes.

Format for 4-KB sectors run on smaller sectors

SQL Server prevents restoring or attaching a database in an environment that has a larger physical sector size than the sector size the database was formatted with.

SQL Server can restore or attach a database formatted with a larger sector size onto smaller, evenly divisible physical sectors. This is possible because the log write uses the original formatted size (the larger of the two). SQL Server may use a bit more log space in this configuration, but the writes to the log are larger than the physical sector size. This prevents rewrites as long as the smaller sector size is an even divisor. An irregular division of (original sector size / physical sector size) creates rewrites (RMWs) and should not be allowed.


System and sample databases

SQL Server 2005 ships all system (master, model, and msdb) and sample databases formatted with 4-KB sector sizes so that they can be installed on devices with sector sizes up to 4 KB. SQL Server 2000 system and sample databases are 512-byte based, causing Setup to fail on larger-sector drives.

The model database is used as the template when SQL Server creates tempdb and user databases. Only the contents of the model database are used, not the full physical format. tempdb and new database creations use the sector size reported by the operating system at the time of creation. This is performed on a per-file basis. Variance of sector sizes can then occur across the files throughout a database. Make sure that the sector sizes of a given I/O path are equal.

Determining the formatted sector size of a database

Run the DBCC fileheader('<<dbname>>') command to output the formatted sector size for each file in the database. The SectorSize column shows the formatted sector size in bytes.
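
For example, the following command returns the header information, including the formatted sector size, for each file in the master database (DBCC fileheader is undocumented, so its output can vary between builds):

DBCC fileheader('master')   -- Examine the SectorSize column in the output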

What sector sizes does SQL Server support?

SQL Server currently supports the following sector sizes that are equal to or less than 4 KB.

·Physical sector sizes that evenly divide into 4 KB

·Physical sector sizes that are smaller than the database's original formatted sector size, as long as the smaller sector size is evenly divisible into the originally formatted sector size

·512 bytes

·1024 bytes

·2048 bytes

·4096 bytes

Remote mirroring

Several hardware vendors provide remote mirroring solutions. Remote mirroring captures writes and duplicates them remotely. Because of the distance between the principal and mirror and the reluctance to affect the performance on the principal by making it wait for remote writes, remote mirroring systems generally operate asynchronously. Very strict rules and protocols must be adhered to in order to safely use a remote mirroring technology with SQL Server data, log, and backup files.

For more information on SQL Server support for remote mirroring solutions, search MSDN for the Using Third Party Mirroring Solutions with SQL Server 2000 and 2005 User Databases white paper.


The following is a brief recap of those attributes beyond the local WAL protocol that are necessary for a successful remote mirroring deployment.

Write ordering: The order of the writes on the primary must be exactly the same order used to apply changes to the mirror. If the write order is not maintained, the WAL protocol is destroyed and the database will become corrupted.

Consistency groups: A consistency group makes all I/O actions to a set of volumes appear as a single I/O stream, keeping all write requests in exact order across the group. Without the ability to keep write ordering across a group of volumes intact, it is not safe to mirror SQL Server databases that span volumes. If each volume can keep a separate write order, the database log and data files on the mirror will not maintain the correct point-in-time semantics and the database will become corrupted.

Notice that consistency groups can apply to local mirror solutions as well.

Restore points: The following is an example of the restore points, or forks, in a remote mirroring system.


Primary                           Mirror

Full Database Backup
Log Backup #1 (LSN 1 to 100)
Log Backup #2 (LSN 101 to 200)

FAILOVER OCCURS

                                  Log Backup #3 (LSN 101 to 150)
                                  Log Backup #4 (LSN 150 to 200)


The restore point issue occurs at LSN 101. If the remote mirroring solution does not guarantee immediate delivery to the mirror, some write operations can be lost under a failure condition. In the example, the changes from LSN 150 to 200 at the primary are assumed to have been 'in play' at the time of the failover and never made it to the mirror. The mirror is brought online and therefore loses the data between LSN 150 and 200 because that data never made it to the mirror. The database is transactionally consistent on the mirror after recovery, but some data loss has occurred.


The log backups start after failover, as shown in the example. At this point there are two outstanding issues that must be correctly resolved.

1. The primary may have changes on disk that never made it to the mirror and therefore requires a full synchronization with the mirror. Your manufacturer may provide a more direct way to resynchronize the primary after failover, but until the primary is in synchronization with the mirror, it is not safe to use the original primary again.

2. During restore, the primary's 'Log Backup #2' must be considered invalid. The restore sequence would be the following (a Transact-SQL sketch appears after the list):

a. Restore the Full Database Backup.

b. Restore Log Backup #1.

c. Restore Log Backup #3. If Log Backup #2 is restored, log backups #3 and #4 cannot be restored, because log records 150 to 200 taken on the mirror differ from those that were originally written, and lost, on the primary.

d. Restore Log Backup #4.
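
A minimal Transact-SQL sketch of that restore sequence follows. The database name and backup file paths are placeholders.

RESTORE DATABASE MyDatabase FROM DISK = 'E:\Backups\Full.bak' WITH NORECOVERY
RESTORE LOG MyDatabase FROM DISK = 'E:\Backups\Log1.bak' WITH NORECOVERY
-- Log Backup #2 is skipped; it contains the fork that was lost at failover.
RESTORE LOG MyDatabase FROM DISK = 'E:\Backups\Log3.bak' WITH NORECOVERY
RESTORE LOG MyDatabase FROM DISK = 'E:\Backups\Log4.bak' WITH RECOVERY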

