什么是数据库表分区? (What is a database table partitioning?)
Partitioning is the database process where very large tables are divided into multiple smaller parts. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan. The main of goal of partitioning is to aid in maintenance of large tables and to reduce the overall response time to read and load data for particular SQL operations.
分区是数据库过程,其中将非常大的表分为多个较小的部分。 通过将大表拆分为较小的单个表,仅访问少量数据的查询可以运行得更快,因为要扫描的数据较少。 分区的主要目的是帮助维护大型表,并减少读取和加载特定SQL操作数据的总响应时间。
SQL Server表上的垂直分区 (Vertical Partitioning on SQL Server tables)
Vertical table partitioning is mostly used to increase SQL Server performance especially in cases where a query retrieves all columns from a table that contains a number of very wide text or BLOB columns. In this case to reduce access times the BLOB columns can be split to its own table. Another example is to restrict access to sensitive data e.g. passwords, salary information etc. Vertical partitioning splits a table into two or more tables containing different columns:
垂直表分区主要用于提高SQL Server性能,特别是在查询从包含大量超宽文本或BLOB列的表中检索所有列的情况下。 在这种情况下,为了减少访问时间,可以将BLOB列拆分到自己的表中。 另一个示例是限制对敏感数据(例如密码,工资信息等)的访问。垂直分区将一个表拆分为两个或多个包含不同列的表:
垂直分区的一个例子 (An example of vertical partitioning)
An example for vertical partitioning can be a large table with reports for employees containing basic information, such as report name, id, number of report and a large column with report description. Assuming that ~95% of users are searching on the part of the report name, number, etc. and that only ~5% of requests are opening the reports description field and looking to the description. Let’s assume that all those searches will lead to the clustered index scans and since the index scan reads all rows in the table the cost of the query is proportional to the total number of rows in the table and our goal is to minimize the number of IO operations and reduce the cost of the search.
垂直分区的示例可以是一个大表,其中包含员工报告,这些报告包含基本信息,例如报告名称,ID,报告编号以及带有报告描述的大列。 假设约95%的用户正在搜索报告名称,编号等部分,并且只有5%的请求正在打开报告描述字段并查找描述。 假设所有这些搜索都将导致聚集索引扫描,并且由于索引扫描会读取表中的所有行,因此查询的成本与表中的总行数成比例,并且我们的目标是最大程度地减少IO数操作并降低搜索成本。
Let’s see the example on the EmployeeReports table:
让我们看一下EmployeeReports表上的示例:
CREATE TABLE EmployeeReports
(
ReportID int IDENTITY (1,1) NOT NULL,
ReportName varchar (100),
ReportNumber varchar (20),
ReportDescription varchar (max)
CONSTRAINT EReport_PK PRIMARY KEY CLUSTERED (ReportID)
)
DECLARE @i int
SET @i = 1
BEGIN TRAN
WHILE @i<100000
BEGIN
INSERT INTO EmployeeReports
(
ReportName,
ReportNumber,
ReportDescription
)
VALUES
(
'ReportName',
CONVERT (varchar (20), @i),
REPLICATE ('Report', 1000)
)
SET @i=@i+1
END
COMMIT TRAN
GO
If we run a SQL query to pull ReportID, ReportName, ReportNumber data from the EmployeeReports table the result set that a scan count is 5 and represents a number of times that the table was accessed during the query, and that we had 113,288 logical reads that represent the total number of page accesses needed to process the query:
如果我们运行SQL查询以从EmployeeReports表中提取ReportID , ReportName , ReportNumber数据,则结果集的扫描计数为5,并且表示在查询期间访问该表的次数,并且我们进行了113,288次逻辑读取,表示处理查询所需的页面访问总数:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT er.ReportID, er.ReportName, er.ReportNumber
FROM dbo.EmployeeReports er
WHERE er.ReportNumber LIKE '%33%'
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
As indicated, every page is read from the data cache, whether or not it was necessary to bring that page from disk into the cache for any given read. To reduce the cost of the query we will change the SQL Server database schema and split the EmployeeReports table vertically.
如所示,从数据高速缓存中读取每个页面,无论是否有必要针对任何给定的读取将页面从磁盘带入高速缓存中。 为了减少查询的费用,我们将更改SQL Server数据库架构并垂直拆分EmployeeReports表。
Next we’ll create the ReportsDesc table and move the large ReportDescription column, and the ReportsData table and move all data from the EmployeeReports table except the ReportDescription column:
接下来,我们将创建ReportsDesc表和移动大ReportDescription列,ReportsData表和除ReportDescription列EmployeeReports表将所有数据:
CREATE TABLE ReportsDesc
( ReportID int FOREIGN KEY REFERENCES EmployeeReports (ReportID),
ReportDescription varchar(max)
CONSTRAINT PK_ReportDesc PRIMARY KEY CLUSTERED (ReportID)
)
CREATE TABLE ReportsData
(
ReportID int NOT NULL,
ReportName varchar (100),
ReportNumber varchar (20),
CONSTRAINT DReport_PK PRIMARY KEY CLUSTERED (ReportID)
)
INSERT INTO dbo.ReportsData
(
ReportID,
ReportName,
ReportNumber
)
SELECT er.ReportID,
er.ReportName,
er.ReportNumber
FROM dbo.EmployeeReports er
The same search query will now give different results:
现在,相同的搜索查询将给出不同的结果:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT er.ReportID, er.ReportName, er.ReportNumber
FROM ReportsData er
WHERE er.ReportNumber LIKE '%33%'
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
Vertical partitioning on SQL Server tables may not be the right method in every case. However, if you have, for example, a table with a lot of data that is not accessed equally, tables with data you want to restrict access to, or scans that return a lot of data, vertical partitioning can help.
在每种情况下,对SQL Server表进行垂直分区可能都不是正确的方法。 但是,例如,如果您有一个表,其中有很多数据无法平等地访问,而表中有您想要限制访问的数据,或者扫描中返回的数据很多,那么垂直分区会有所帮助。
SQL Server表上的水平分区 (Horizontal Partitioning on SQL Server tables)
Horizontal partitioning divides a table into multiple tables that contain the same number of columns, but fewer rows. For example, if a table contains a large number of rows that represent monthly reports it could be partitioned horizontally into tables by years, with each table representing all monthly reports for a specific year. This way queries requiring data for a specific year will only reference the appropriate table. Tables should be partitioned in a way that queries reference as few tables as possible.
水平分区将一个表分为多个表,这些表包含相同数量的列,但行数较少。 例如,如果一个表包含代表月度报告的大量行,则可以按年份将其水平划分为多个表,每个表代表特定年份的所有月度报告。 这样,需要特定年份数据的查询将仅引用适当的表。 表的分区方式应使查询引用尽可能少的表。
Tables are horizontally partitioned based on a column which will be used for partitioning and the ranges associated to each partition. Partitioning column is usually a datetime column but all data types that are valid for use as index columns can be used as a partitioning column, except a timestamp column. The ntext, text, image, xml, varchar(max), nvarchar(max), or varbinary(max), Microsoft .NET Framework common language runtime (CLR) user-defined type, and alias data type columns cannot be specified.
根据将用于分区的列以及与每个分区关联的范围,对表进行水平分区。 分区列通常是日期时间列,但是除时间戳列外,所有可用作索引列的有效数据类型都可以用作分区列。 无法指定ntext,text,image,xml,varchar(max),nvarchar(max)或varbinary(max),Microsoft .NET Framework公共语言运行时(CLR)用户定义的类型和别名数据类型的列。
There are two different approaches we could use to accomplish table partitioning. The first is to create a new partitioned table and then simply copy the data from your existing table into the new table and do a table rename. The second approach is to partition an existing table by rebuilding or creating a clustered index on the table.
我们可以使用两种不同的方法来完成表分区。 首先是创建一个新的分区表,然后只需将现有表中的数据复制到新表中并进行表重命名。 第二种方法是通过在表上重建或创建聚簇索引来对现有表进行分区。
创建新分区表的水平分区示例 (An example of horizontal partitioning with creating a new partitioned table)
SQL Server 2005 introduced a built-in partitioning feature to horizontally partition a table with up to 1000 partitions in SQL Server 2008, and 15000 partitions in SQL Server 2012, and the data placement is handled automatically by SQL Server. This feature is available only in the Enterprise Edition of SQL Server.
SQL Server 2005引入了内置分区功能,可以在SQL Server 2008中对表进行水平分区,最多包含1000个分区,而在SQL Server 2012中对表进行最多15000个分区,并且数据放置由SQL Server自动处理。 此功能仅在SQL Server企业版中可用。
To create a partitioned table for storing monthly reports we will first create additional filegroups. A filegroup is a logical storage unit. Every database has a primary filegroup that contains the primary data file (.mdf). An additional, user-defined, filegrups can be created to contain secondary files (.ndf). We will create 12 filegroups for every month:
要创建用于存储月度报告的分区表,我们将首先创建其他文件组。 文件组是逻辑存储单元。 每个数据库都有一个包含主要数据文件(.mdf)的主要文件组。 可以创建其他用户定义的文件组,以包含辅助文件(.ndf)。 我们将为每个月创建12个文件组:
ALTER DATABASE PartitioningDB
ADD FILEGROUP January
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP February
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP March
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP April
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP May
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP June
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP July
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP Avgust
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP September
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP October
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP November
GO
ALTER DATABASE PartitioningDB
ADD FILEGROUP December
GO
To check created and available file groups in the current database run the following query:
要检查当前数据库中已创建的文件组和可用的文件组,请运行以下查询:
SELECT name AS AvailableFilegroups
FROM sys.filegroups
WHERE type = 'FG'
When filegrups are created we will add .ndf file to every filegroup:
创建文件组后,我们将向每个文件组添加.ndf文件:
ALTER DATABASE [PartitioningDB]
ADD FILE
(
NAME = [PartJan],
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.LENOVO\MSSQL\DATA\PartitioningDB.ndf',
SIZE = 3072 KB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1024 KB
) TO FILEGROUP [January]
The same way files to all created filegroups with specifying the logical name of the file and the operating system (physical) file name for each filegroup e.g.:
以相同的方式将文件归档到所有创建的文件组,并指定文件的逻辑名和每个文件组的操作系统(物理)文件名,例如:
ALTER DATABASE [PartitioningDB]
ADD FILE
(
NAME = [PartFeb],
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.LENOVO\MSSQL\DATA\PartitioningDB2.ndf',
SIZE = 3072 KB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1024 KB
) TO FILEGROUP [February]
To check files created added to the filegroups run the following query:
要检查添加到文件组中的已创建文件,请运行以下查询:
SELECT
name as [FileName],
physical_name as [FilePath]
FROM sys.database_files
where type_desc = 'ROWS'
GO
After creating additional filegroups for storing data we’ll create a partition function. A partition function is a function that maps the rows of a partitioned table into partitions based on the values of a partitioning column. In this example we will create a partitioning function that partitions a table into 12 partitions, one for each month of a year’s worth of values in a datetime column:
创建用于存储数据的其他文件组后,我们将创建分区功能。 分区功能是一种根据分区列的值将分区表的行映射到分区中的功能。 在此示例中,我们将创建一个分区函数,该函数将一个表划分为12个分区,每个分区对应于datetime列中一年中每个月的值:
CREATE PARTITION FUNCTION [PartitioningByMonth] (datetime)
AS RANGE RIGHT FOR VALUES ('20140201', '20140301', '20140401',
'20140501', '20140601', '20140701', '20140801',
'20140901', '20141001', '20141101', '20141201');
To map the partitions of a partitioned table to filegroups and determine the number and domain of the partitions of a partitioned table we will create a partition scheme:
要将分区表的分区映射到文件组并确定分区表的分区数和域,我们将创建一个分区方案:
CREATE PARTITION SCHEME PartitionBymonth
AS PARTITION PartitioningBymonth
TO (January, February, March,
April, May, June, July,
Avgust, September, October,
November, December);
Now we’re going to create the table using the PartitionBymonth partition scheme, and fill it with the test data:
现在,我们将使用PartitionBymonth分区方案创建表,并在其中填充测试数据:
CREATE TABLE Reports
(ReportDate datetime PRIMARY KEY,
MonthlyReport varchar(max))
ON PartitionBymonth (ReportDate);
GO
INSERT INTO Reports (ReportDate,MonthlyReport)
SELECT '20140105', 'ReportJanuary' UNION ALL
SELECT '20140205', 'ReportFebryary' UNION ALL
SELECT '20140308', 'ReportMarch' UNION ALL
SELECT '20140409', 'ReportApril' UNION ALL
SELECT '20140509', 'ReportMay' UNION ALL
SELECT '20140609', 'ReportJune' UNION ALL
SELECT '20140709', 'ReportJuly' UNION ALL
SELECT '20140809', 'ReportAugust' UNION ALL
SELECT '20140909', 'ReportSeptember' UNION ALL
SELECT '20141009', 'ReportOctober' UNION ALL
SELECT '20141109', 'ReportNovember' UNION ALL
SELECT '20141209', 'ReportDecember'
We will now verify the rows in the different partitions:
现在,我们将验证不同分区中的行:
SELECT
p.partition_number AS PartitionNumber,
f.name AS PartitionFilegroup,
p.rows AS NumberOfRows
FROM sys.partitions p
JOIN sys.destination_data_spaces dds ON p.partition_number = dds.destination_id
JOIN sys.filegroups f ON dds.data_space_id = f.data_space_id
WHERE OBJECT_NAME(OBJECT_ID) = 'Reports'
Now just copy data from your table and rename a partitioned table.
现在,只需从表中复制数据并重命名分区表即可。
使用SQL Server Management Studio分区向导对表进行分区 (Partitioning a table using the SQL Server Management Studio Partitioning wizard)
SQL Server 2008 introduced a table partitioning wizard in SQL Server Management Studio.
SQL Server 2008在SQL Server Management Studio中引入了表分区向导。
Right click on a table in the Object Explorer pane and in the Storage context menu choose the Create Partition command:
右键单击“ 对象资源管理器”窗格中的表,然后在“ 存储”上下文菜单中选择“ 创建分区”命令:
In the Select a Partitioning Column window, select a column which will be used to partition a table from available partitioning columns:
在“ 选择分区列”窗口中,选择一列,该列将用于从可用分区列中对表进行分区:
Other options in the Create Partition Wizard dialog include the Collocate this table to the selected partition table option used to display related data to join with the partitioned column and the Storage Align Non Unique Indexes and Unique Indexes with an Indexed Partition Column option that aligns all indexes of the partitioned table with the same partition scheme.
“ 创建分区向导”对话框中的其他选项包括“ 将该表并置到选定的分区表”选项,该选项用于显示相关数据以与分区列联接,以及“ 存储非唯一索引和唯一索引与带有索引分区列的唯一索引”选项,用于对齐所有索引具有相同分区方案的分区表。
After selecting a column for partitioning click the Next button. In the Select a Partition Function window enter the name of a partition function to map the rows of the table or index into partitions based on the values of the ReportDate column, or choose the existing partition function:
选择要分区的列后,单击“下一步”按钮。 在“ 选择分区函数”窗口中,输入分区函数的名称,以根据ReportDate列的值将表或索引的行映射到分区中,或者选择现有的分区函数:
Click the Next button and in the Select a Partition Scheme window create the partition scheme to map the partitions of the MonthlyReport table to different filegroups:
单击“下一步”按钮,然后在“ 选择分区方案”窗口中创建分区方案,以将MonthlyReport表的分区映射到不同的文件组:
Click the Next button and in the Map Partitions window choose the rage of partitioning and select the available filegroups and the range boundary. The Left boundary is based on Value <= Boundary and the Right boundary is based on Value < Boundary.
单击“下一步”按钮,然后在“ 地图分区”窗口中选择分区范围,然后选择可用的文件组和范围边界。 左边界基于值<=边界,右边界基于值<边界。
By clicking the Set boundaries button you can customize the date range and set the start and the end date for each partition:
通过单击“ 设置边界”按钮,您可以自定义日期范围,并设置每个分区的开始日期和结束日期:
The Estimate storage option determines the Rowcount, the Required space, and the Available space columns that displays an estimate on required space and available space based on number of records in the table.
“ 估计存储”选项确定“行数 ”,“所需空间”和“可用空间”列,这些列根据表中的记录数显示对所需空间和可用空间的估计。
The next screen of the wizard offers to choose the option to whether to execute the script immediately by the wizard to create objects and a partition table, or to create a script and save it. A schedule for executing the script to perform the operations automatically can also be specified:
向导的下一个屏幕提供选择选项,该选项是由向导立即执行脚本以创建对象和分区表,还是创建并保存脚本。 还可以指定执行脚本以自动执行操作的时间表:
The next screen of the wizard shows a review of selections made in the wizard:
向导的下一个屏幕显示对向导中所做选择的回顾:
Click the Finish button to complete the process:
单击完成按钮以完成该过程:
参考资料 (References)
翻译自: https://www.sqlshack.com/database-table-partitioning-sql-server/