SQL Server 2005: SQL Server Storage and Index Structures
The Extent
An extent is the basic unit of storage used to allocate space for tables and indexes. It is made up of eight contiguous 64K data pages.
The Page
Much like an extent is a unit of allocation within the database, a page is the unit of allocation within a specific extent. There are eight pages to every extent.
A page is the last level you reach before you are at the actual data row. Whereas the number of pages per extent is fixed, the number of rows per page is not—that depends entirely on the size of the row, which can vary. You can think of a page as being something of a container for both table and index row data. A row is not allowed to be split between pages.
There are a number of different page types. For purposes of this book, the types we care about are:
❑ Data: Data pages are pretty self-explanatory—they are the actual data in your table, with the exception of any BLOB data that is not defined with the text in row option or varchar(max).
❑ Index: Index pages are also pretty straightforward: they hold both the non-leaf and leaf level pages (we’ll examine what these are later in the chapter) of a non-clustered index, as well as the non-leaf level pages of a clustered index. These index types will become much clearer as we continue through this chapter.
Page Splits
When a page becomes full, it splits. This means more than just a new page being allocated—it also means that approximately half the data from the existing page is moved to the new page.
The exception to this process is when a clustered index is in use. If there is a clustered index, and the next inserted row would be physically located as the last record in the table, then a new page is created and the new row is added to the new page without relocating any of the existing data.
Rows
You will hear much about “Row Level Locking,” so it shouldn’t be a surprise to hear this term. Rows can be up to 8KB.
In addition to the limit of 8,060 characters, there is also a maximum of 1,024 columns.
B-Trees
A Balanced Tree, or B-Tree, attempts to provide a consistent and relatively low-cost method of finding your way to a particular piece of information.
Index Types and Index Navigation
Although there are nominally two types of indexes in SQL Server (clustered and non-clustered), there are actually, internally speaking, three different types:
❑ Clustered indexes
❑ Non-clustered indexes—which comprise:
❑ Non-clustered indexes on a heap
❑ Non-clustered indexes on a clustered index
The way the physical data is stored varies between clustered and non-clustered indexes. The way SQL Server traverses the B-Tree to get to the end data varies between all three index types.
The indexes are built over either a clustered table (if the table has a clustered index) or what is called a heap (what’s used for a table without a clustered index).
Clustered Tables
A clustered table is any table that has a clustered index on it.
Heaps
A heap is any table that does not have a clustered index on it. In this case, a unique identifier, or row ID (RID) is created based on a combination of the extent, pages, and row offset (places from the top of the page) for that row. A RID is only necessary if there is no cluster key available (no clustered index).
Clustered Indexes
A clustered index is unique for any given table.
What makes a clustered index special is that the leaf level of a clustered index is the actual data—that is, the data is re-sorted to be stored in the same physical order that the index sort criteria state.Non-Clustered Indexes on a Heap
Non-clustered indexes on a heap work very similarly to clustered indexes in most ways. They do, however, have a few notable differences.
The leaf level is not the data—instead, it is the level at which you are able to obtain a pointer to that data. This pointer comes in the form of the RID, which is made up of the extent, page, and row offset for the particular row being pointed to by the index.
on-Clustered Indexes on a Clustered Table
With non-clustered indexes on a heap, we didn’t find the actual data, but did find an identifier
that let us go right to the data (we were just one step away). With non-clustered indexes on a clustered table,
we find the cluster-key.
The CREATE INDEX Statement
INDEX < index name > ON < table or view name > ( < column name > [ ASC|DESC ] [ ,n ] )
INCLUDE ( < column name > [ , n ] )
[ WITH
[PAD_INDEX = { ON | OFF } ]
[ [, ] FILLFACTOR = < fillfactor > ]
[ [, ] IGNORE_DUP_KEY = { ON | OFF }]
[ [, ] DROP_EXISTING = { ON | OFF }]
[ [, ] STATISTICS_NORECOMPUTE = { ON | OFF }]
[ [, ] SORT_IN_TEMPDB = { ON | OFF }]
[ [, ] ONLINE = { ON | OFF }
[ [, ] ALLOW_ROW_LOCKS = { ON | OFF }
[ [, ] ALLOW_PAGE_LOCKS = { ON | OFF }
[ [, ] MAXDOP = < maximum degree of parallelism >
]
[ ON {<filegroup> | <partition scheme name> | DEFAULT } ]
Creating XML Indexes
You can create indexes on columns in SQL Server that are of type XML. The primary requirements of doing this are:
❑ The table containing the XML you want to index must have a clustered index on it.
❑ A “primary” xml index must exist on the XML data column before you can create “secondary” indexes.
❑ XML indexes can only be created on columns of XML type.
❑ The XML column must be part of a base table—you cannot create the index on a view.