sql分区表上创建索引
介绍 (Introduction)
I work for a large, multinational financial institution. Like most companies in this field, ours is conservative and subject to much regulatory oversight. The first has meant that we’re slow to adopt new technologies. We need to be really, really, really sure things won’t break, customer accounts won’t vanish or lose their balances and that the regulators will not raise any red flags. Everything we do is subject to the scrutiny of our internal auditors, even something that seems simple, like rerunning a production job that failed because the host compute was down. I really mean everything!
我在一家大型跨国金融机构工作。 像该领域的大多数公司一样,我们是保守的,并受到许多监管监督。 首先,这意味着我们采用新技术的步伐很慢。 我们需要真的,真的, 真的要确保事情不会破裂,客户账户不会消失或失去余额,并且监管机构也不会提出任何危险信号。 我们所做的一切都受到内部审核员的审查,甚至是看似简单的事情,例如重新执行由于主机计算已失败而导致失败的生产作业。 我真的是什么意思!
As just one example of our conservatism, we just began rolling out SQL Server 2012 a year ago. That was in 2016 and just before SQL Server 2016 was RTM’d. What has this to do with table partitioning? Good question! Hang on just a bit while I fill in just a little more background.
仅作为保守主义的一个例子,我们一年前才开始推出SQL Server 2012。 那是在2016年,就在SQL Server 2016进行RTM之前。 这与表分区有什么关系? 好问题! 稍等一下,我再补充一些背景知识。
分区和索引 (Partitions and Indexes)
The department where I work is concerned with computing risk and giving those results to internal managers to understand and cover that risk with appropriate investments (usually called hedge funds). That means we pull in very large amounts of data every day to understand things like outstanding mortgages, loans and credit cards, among many other things. To manage that data, we partition our tables by month. To satisfy the regulators, we have to save that data for up to seven years. (Thank you, Sarbanes Oxley!)
我工作的部门负责计算风险,并将结果提供给内部经理,以了解并通过适当的投资(通常称为对冲基金)来弥补该风险。 这意味着我们每天都要提取大量数据,以了解诸如未偿还的抵押,贷款和信用卡之类的信息。 为了管理这些数据,我们按月对表进行分区。 为了使监管机构满意,我们必须将该数据保存长达七年。 (谢谢萨班斯·奥克斯利!)
When we ETL data every day, we follow a pretty common practice:
每天ETL数据时,我们都会遵循一种非常普遍的做法:
- Disable the indexes on the target table (at least the non-clustered indexes, or NCIs) 禁用目标表上的索引(至少是非聚集索引或NCI)
- Insert the new data 插入新数据
- Rebuild the NCIs 重建NCI
Working this way is usually optimal, since SQL Server does not have to update the NCIs while the data is being imported. However, imagine that you have seven years of data. That means 7 * 12 = 94 partitions, of which only one partition is active. So, for us, that index rebuild would be almost 99% redundant! Fortunately, you can write:
以这种方式工作通常是最佳的,因为在导入数据时SQL Server不必更新NCI。 但是,假设您有七年的数据。 这意味着7 * 12 = 94个分区,其中只有一个分区处于活动状态。 因此,对于我们来说,该索引重建将几乎有99%是冗余的! 幸运的是,您可以编写:
ALTER INDEX ALL ON REPORTS REBUILD PARTITION = n
For partition n. No more redundant index rebuilds! We’ll get into a real example in a moment, but first, what about those pesky auditors and regulators?
对于分区n。 不再重建多余的索引! 我们稍后将讨论一个真实的例子,但是首先,那些讨厌的审计师和监管者如何处理?
让那些审计师和监管者满意! (Keep those auditors and regulators happy!)
I used to work for a company that employed some very smart accountants who devised a clever – and illegal – accounting practice that they used to provide inflated results. Good for the shareholders, for a while, until the SEC discovered the hack and sent those too-clever-by-half accountants up the river.
我曾经在一家公司工作,该公司雇用了一些非常聪明的会计师,他们设计了一种聪明的,非法的会计惯例,用来提供虚高的业绩。 一段时间以来,对股东有利,直到美国证券交易委员会发现了黑客并把那些过于精明的会计师派往河边。
Given the scrutiny financial institutions are under, reporting correctly and maintaining the data to back up that reporting is critical. What has this to do with table partitioning? Simple, really. We need a way to ensure that partitions containing historical data – which we must retain for up to seven years – will not be changed inadvertently. The company is also responsible for criminal attempts to mess with the data.
考虑到金融机构的审查情况,正确报告并维护数据以支持报告至关重要。 这与表分区有什么关系? 很简单,真的。 我们需要一种方法来确保包含历史数据的分区(必须保留长达7年)不会被无意更改。 该公司还负责犯有篡改数据的犯罪企图。
There’s a mechanism built-in to SQL Server to satisfy even the most particular of auditors: Make the historical partitions read only.
SQL Server内置了一种机制,可以满足最特殊的审核员的需求:将历史分区设为只读。
只读分区 (Read Only Partitions)
In actuality, you cannot make a partition read only. However, what you can do, is make the filegroup that contains the partition read only. It’s really very simple:
实际上,您不能将分区设为只读。 但是,您可以做的是使包含分区的文件组为只读。 这真的非常简单:
ALTER DATABASE database_name MODIFY FILEGROUP filegroup_name READ_ONLY
Of course, there are a couple of big caveats here. First, you need to be sure that you really do not what to write to those partitions again. That’s really the point, actually. The whole strategy is useless if you need to continue to update the historical data (and what will you say to your auditors?)
当然,这里有两个重要的警告。 首先,您需要确保您真的不会再次写那些分区。 实际上,这就是重点。 如果您需要继续更新历史数据,那么整个策略就没有用了(您将对审核员说些什么?)
Second, and this is really important to keep in mind, a filegroup is a database-level object. So, you are affecting all tables that reside in that filegroup. In our case, that’s fine, since the tables have the same type of monthly processing. Your mileage may vary.
其次,记住这一点非常重要,文件组是数据库级的对象。 因此,您正在影响该文件组中的所有表。 就我们而言,这很好,因为这些表具有相同类型的每月处理方式。 你的旅费可能会改变。
试试吧! (Try it!)
I’m not going to put a partitioning tutorial in this article. Milica Medic did an excellent job in a previous article on SQL Shack. (See the references section.) I’m simply going to leverage her examples to create a partitioned table on my test database.
我不会在本文中放置分区教程。 Milica Medic在上一篇有关SQL Shack的文章中做得很好。 (请参阅参考资料部分。)我只是利用她的示例在测试数据库上创建一个分区表。
To create my test database, I used the following script:
要创建测试数据库,我使用以下脚本:
CREATE DATABASE PartitioningDB
ON
PRIMARY
(NAME = PartitioningDB,
FILENAME = 'C:\SQLShack\PartitioningDB.mdf'),
FILEGROUP January
(NAME = January,
FILENAME = 'C:\SQLShack\PartitioningDB_Jan.ndf'),
FILEGROUP February
(NAME = February,
FILENAME = 'C:\SQLShack\PartitioningDB_Feb.ndf'),
FILEGROUP March
(NAME = March,
FILENAME = 'C:\SQLShack\PartitioningDB_Mar.ndf'),
FILEGROUP April
(NAME = April,
FILENAME = 'C:\SQLShack\PartitioningDB_Apr.ndf'),
FILEGROUP May
(NAME = May,
FILENAME = 'C:\SQLShack\PartitioningDB_May.ndf'),
FILEGROUP June
(NAME = June,
FILENAME = 'C:\SQLShack\PartitioningDB_Jun.ndf'),
FILEGROUP July
(NAME = July,
FILENAME = 'C:\SQLShack\PartitioningDB_Jul.ndf'),
FILEGROUP August
(NAME = August,
FILENAME = 'C:\SQLShack\PartitioningDB_Aug.ndf'),
FILEGROUP September
(NAME = September,
FILENAME = 'C:\SQLShack\PartitioningDB_Sep.ndf'),
FILEGROUP October
(NAME = October,
FILENAME = 'C:\SQLShack\PartitioningDB_Oct.ndf'),
FILEGROUP November
(NAME = November,
FILENAME = 'C:\SQLShack\PartitioningDB_Nov.ndf'),
FILEGROUP December
(NAME = December,
FILENAME = 'C:\SQLShack\PartitioningDB_Dec.ndf')
LOG ON
(NAME = PartitioningDB_Log,
FILENAME = 'C:\SQLShack\PartitioningDB_.ldf')
If I look at the files in SSMS for this database now, I see:
如果现在查看此数据库的SSMS中的文件,则会看到:
Using this database, I create a table:
使用此数据库,我创建一个表:
CREATE TABLE Reports
(ReportDate datetime CONSTRAINT PK_Reports_ReportDate PRIMARY KEY,
MonthlyReport varchar(max))
ON PartitionBymonth (ReportDate);
GO
Now, I can do a standard index rebuild like this:
现在,我可以像这样进行标准索引重建:
ALTER INDEX ALL ON REPORTS REBUILD
To satisfy the auditors and regulators though, I need to mark the file group read only:
为了满足审核员和监管者的需要,我需要将文件组标记为只读:
ALTER DATABASE PartitioningDB MODIFY FILEGROUP January READ_ONLY
(January is behind us now.)
(现在1月在我们身后。)
Now, what happens when I want to rebuild my indexes?
现在,当我要重建索引时会发生什么?
ALTER INDEX ALL ON REPORTS REBUILD
Gives me:
给我:
Msg 1924, Level 16, State 2, Line 100
Filegroup ‘January’ is read-only.
消息1924,第16级,州2,第100行
文件组“一月”是只读的。
Oops! Well, I wanted the read only filegroup, so now I have to rebuild my indexes specifically:
糟糕! 好吧,我想要只读文件组,所以现在我必须专门重建索引:
ALTER INDEX ALL ON REPORTS REBUILD PARTITION = 2
Unfortunately, I cannot say “REBUILD PARTITION = February” I have to use the partition number here. But that’s OK. Milica’s article, Database table partitioning in SQL Server, has a query that I can use to get the partition numbers:
不幸的是,我不能说“ REBUILD PARTITION = February”,我必须在这里使用分区号。 但是没关系。 Milica的文章“ SQL Server中的数据库表分区”具有一个查询,我可以使用该查询来获取分区号:
SELECT
p.partition_number AS PartitionNumber,
f.name AS PartitionFilegroup,
p.rows AS NumberOfRows
FROM sys.partitions p
JOIN sys.destination_data_spaces dds ON p.partition_number = dds.destination_id
JOIN sys.filegroups f ON dds.data_space_id = f.data_space_id
WHERE OBJECT_NAME(OBJECT_ID) = 'Reports'
Which gives me:
这给了我:
That tells me that February is partition number 2, so I can write:
这告诉我2月是2号分区,因此我可以这样写:
ALTER INDEX ALL ON REPORTS REBUILD PARTITION = 2
Which gives me the reassuring result:
这给了我令人放心的结果:
Command(s) completed successfully.
命令已成功完成。
READ_ONLY是一个SQL Server标志 (READ_ONLY is a SQL Server flag)
You might think that making a filegroup read only would actually set the read only bit in the data file itself. I know I did the first time I did this. Here’s a screenshot from Explorer that proved me wrong:
您可能认为将文件组设置为只读实际上会在数据文件本身中设置只读位。 我知道我是第一次这样做。 这是资源管理器的屏幕截图,证明我错了:
Even though I told SQL Server that January is now read only, the actual data file is still flagged read write. You can change that yourself if you like, just don’t expect SQL Server to do it for you!
即使我告诉SQL Server一月现在是只读的,实际的数据文件仍标记为可读写。 如果愿意,您可以自己更改它,只是不要期望SQL Server为您做得到!
实施要求 (Implementation Requirements)
There’s no such thing as a free lunch, so let me be up front about the hoops you have to jump through to be able to use this strategy for rebuilding indexes in SQL Server.
没有免费的午餐之类的东西,所以让我首先了解一下您必须克服的障碍,才能使用此策略在SQL Server中重建索引。
When ALL is specified with partition number, this implies that the portion of the index that covers the rows in the partition must also reside in the same partition. Indexes like that are called aligned indexes. To create them you need to use the same partitioning function as for the rows covered by the index.
当用分区号指定ALL时,这意味着覆盖分区中行的索引部分也必须位于同一分区中。 像这样的索引称为对齐索引。 要创建它们,您需要使用与索引覆盖的行相同的分区功能。
This is implicit in my example table, since the Primary Key is created with the table and is also a clustered index. Recall that a clustered index is the actual table itself, sorted according to the index order, with a B-tree built above it.
这在我的示例表中是隐式的,因为主键是使用表创建的,并且也是聚集索引。 回想一下,聚集索引是实际的表本身,根据索引顺序排序,并在其上方构建了B树。
If, however, you are creating other indexes separately, you need to use the same partitioning function. So, say I wanted to add a non-clustered index on the MonthlyReport column. I need to remember to do it like this:
但是,如果要分别创建其他索引,则需要使用相同的分区功能。 因此,假设我想在MonthlyReport列上添加非聚集索引。 我需要记住这样做:
CREATE NONCLUSTERED INDEX IX_PK_Reports_MonthlyReport ON Reports(MonthlyReport)
ON PartitionBymonth (ReportDate);
GO
在线重建索引 (Rebuilding Indexes Online)
In a busy system, it can be a real drag if everything must stop while doing an index rebuild. That’s why SQL Server supports online index rebuilding. You simply add:
在繁忙的系统中,如果在进行索引重建时必须停止所有操作,那将是一次真正的拖累。 这就是SQL Server支持联机索引重建的原因。 您只需添加:
WITH (ONLINE = ON)
to your index rebuild statement. Whether you really need it or not is another question. It depends on how your database is being used. In my case, I’m loading up a database for analysis, not real-time online queries during the ETL process. So, I’m perfectly happy to do my index rebuilding offline. I find that it is often a little faster that way.
到索引重建语句。 您是否真的需要它是另一个问题。 这取决于数据库的使用方式。 就我而言,我正在加载数据库进行分析,而不是在ETL过程中进行实时在线查询。 因此,我非常高兴能够离线进行索引重建。 我发现这样做通常更快一些。
For me (and other users of SQL Server 2012), there’s a second reason to rebuild indexes offline. You cannot use the ONLINE option when rebuilding by partition in SQL Server 2012. That’s a drag, though I can live with it – at least until we get to use SQL Server 2016, even if I have to wait until 2020!
对于我(以及SQL Server 2012的其他用户)而言,还有第二个原因可以离线重建索引。 在SQL Server 2012中按分区重建时,不能使用ONLINE选项。尽管我可以忍受,但这是一个阻力–至少直到我们可以使用SQL Server 2016为止,即使我必须等到2020年!
摘要 (Summary)
I have two problems in my shop related to partitioned tables.
我的商店中有两个与分区表有关的问题。
- I need to protect historical data for up to seven years. 我需要保护历史数据长达七年。
- I need to have fast index rebuilding for daily ETL processing 我需要快速重建索引以进行每日ETL处理
I’ve been able to solve both by
我已经能够解决这两个问题
- Marking filegroups containing historical data read only. 标记包含历史数据的文件组为只读。
- Adding WITH PARTITION = n to my index rebuild statements. 在我的索引重建语句中添加WITH PARTITION = n。
Perhaps you can do the same.
也许您可以做同样的事情。
翻译自: https://www.sqlshack.com/options-for-partitioned-tables-and-indexes-in-sql-server/
sql分区表上创建索引