数据库约束查找的约束
Constraints exist as a way to enforce or document rules within the database. How do they affect performance, and what are the benefits of using them?
约束是在数据库中强制执行或记录规则的一种方式。 它们如何影响性能?使用它们有什么好处?
介绍 (Introduction)
There are a variety of constraints that can be implemented in a SQL Server database that can provide a variety of new functionality. Some are so familiar that we forget their presence, such as NULL constraints, while others may not be used as often, such as check constraints or default constraints.
在SQL Server数据库中可以实现各种约束,这些约束可以提供各种新功能。 有些非常熟悉,以至于我们忘记了它们的存在,例如NULL约束,而有些则可能不那么常用,例如检查约束或默认约束。
This is an opportunity to dive into each type of constraint, and after introducing their purpose and usage, dive into their storage, performance, and notes about their optimal use, as well as pitfalls. Please note that all demos were created and tested in SQL Server 2016 RC1.
这是深入研究每种约束类型的机会,并在介绍了它们的用途和用途之后,深入研究了它们的存储,性能以及关于它们的最佳使用以及陷阱的注意事项。 请注意,所有演示均已在SQL Server 2016 RC1中创建和测试。
目的 (Purpose)
Constraints may be used for a variety of purposes. The following is a short list of what I consider the most common reasons that constraints are created:
约束可以用于多种目的。 以下是我认为创建约束的最常见原因的简短列表:
- Prevent bad data from being entered into tables of interest. 防止将不良数据输入到感兴趣的表中。
- Enforce business logic at the database-level. 在数据库级别实施业务逻辑。
- Documentation of important database rules. 重要数据库规则的文档。
- Enforce relational integrity between any number of tables. 增强任意数量表之间的关系完整性。
- Improve database performance. 提高数据库性能。
- Enforce uniqueness. 增强唯一性。
There are others, of course, but the reasons for using constraints are quite varied, and can be a helpful tool for database administrators and developers alike. Typically, it is recommended to use constraints over more complex, procedural logic-enforcement, such as triggers, rules, stored procedures, or jobs.
当然,还有其他原因,但是使用约束的原因千差万别,对于数据库管理员和开发人员而言,它都是有用的工具。 通常,建议对更复杂的过程逻辑执行使用约束,例如触发器,规则,存储过程或作业。
Let’s review one-at-a-time each type of constraint and some examples of their usage.
让我们一次查看每种约束类型及其用法的一些示例。
非空约束 (NOT NULL constraints)
This is so common that we may forget that a column that does not allow NULLs is technically in possession of a NOT NULL constraint. Its function is straightforward: a column with this constraint will not allow a NULL to be entered, and will throw an error when anyone attempts to enter a NULL into a column that doesn’t allow NULLs.
这是如此常见,以至于我们可能忘记了不允许NULL的列在技术上拥有NOT NULL约束。 它的功能很简单:具有此约束的列将不允许输入NULL,并且当任何人试图在不允许NULL的列中输入NULL时都会抛出错误。
NULL may be defined at the time a table is defined, such as like this:
可以在定义表时定义NULL,例如:
CREATE TABLE dbo.NullDemo
( Null_Id INT NOT NULL,
Null_Data VARCHAR(50) NULL,
Not_Null_Data VARCHAR(50) NOT NULL);
The syntax is simple, and in this table, Null_Id and Not_Null_Data will not allow any NULLs to be entered into either. Null_Data, on the other hand, will allow NULL, in addition to standard string values. When creating a table, you can choose to not specify NULL or NOT NULL, and if you do, the column will default to being NULLable. Since NOT NULL is an explicit constraint, the default setting for any column is to allow NULLs.
语法很简单,在此表中, Null_Id和Not_Null_Data不允许将任何NULL输入其中一个。 另一方面, Null_Data除了标准字符串值外,还将允许NULL。 创建表时,可以选择不指定NULL或NOT NULL,如果这样做,则该列将默认为NULLable。 由于NOT NULL是显式约束,因此任何列的默认设置是允许NULL。
Using this table, let’s insert a few rows into it:
使用此表,让我们在其中插入几行:
INSERT INTO dbo.NullDemo
(Null_Id, Null_Data, Not_Null_Data)
SELECT
1, 'Stegosaurus', 'Spiky';
INSERT INTO dbo.NullDemo
(Null_Id, Null_Data, Not_Null_Data)
SELECT
2, 'Tyrannosaurus', 'Huge';
Both of these inserts succeed without incident, adding two dinosaurs into our demo table.
这两个插入都成功了,没有发生意外,将两个恐龙添加到了我们的演示表中。
INSERT INTO dbo.NullDemo
(Null_Id, Null_Data, Not_Null_Data)
SELECT
3, NULL, 'Long Neck';
This also works without a problem. Inserting a NULL is OK since we explicitly defined Null_Data to allow NULLs.
这也没有问题。 插入NULL是可以的,因为我们明确定义了Null_Data以允许NULL。
INSERT INTO dbo.NullDemo
(Null_Id, Null_Data, Not_Null_Data)
SELECT
4, 'Pterodactylus ', NULL;
Our poor dino-flyer never makes it into the database as we tried to insert a NULL into the last column, Not_Null_Data, which we applied the NOT NULL constraint to. The result is the expected error:
当我们试图在最后一列Not_Null_Data中插入NULL时,我们可怜的dino-flyer从未进入数据库,因为我们将NOT NULL约束应用于了该列。 结果是预期的错误:
Msg 515, Level 16, State 2, Line 27
Cannot insert the value NULL into column ‘Not_Null_Data’, table ‘AdventureWorks2014.dbo.NullDemo’; column does not allow nulls. INSERT fails.
The statement has been terminated.
信息515,第16级,州2,第27行
无法将值NULL插入表'AdventureWorks2014.dbo.NullDemo'的'Not_Null_Data'列中; 列不允许为空。 插入失败。
该语句已终止。
NOT NULL is exceptionally useful when we have a table with columns that are critical to the definition of that data structure. For example, an account without a name would be potentially meaningless, whereas an employee with no start date would cause us to scratch our heads and ask why no one entered it in the first place.
当我们有一个表的列对于该数据结构的定义至关重要时,NOT NULL异常有用。 例如,没有名称的帐户可能毫无意义,而没有开始日期的员工会导致我们挠头,问为什么没有人首先输入。
NOT NULL corrects mistakes by the user as well as mistakes by a software or database developer. If a column is required, then we want to ensure that it is always entered, no matter what. The application should tell a user that they need to enter a required field, returning a friendly message if they forget. In the event that the application somehow allows a NULL where one does not belong, then an error would be returned that the developers could easily troubleshoot and fix. This scenario is preferable to allowing data to be created with critical missing elements.
NOT NULL纠正用户的错误以及软件或数据库开发人员的错误。 如果需要一列,那么我们要确保始终输入该列,无论如何。 应用程序应告知用户他们需要输入必填字段,如果忘记了,则会返回友好的消息。 如果应用程序以某种方式允许一个不属于其中的NULL,则将返回错误,开发人员可以轻松地进行故障排除和修复。 此方案比允许使用严重缺失的元素创建数据更好。
There are potentially dangerous or inappropriate uses for NOT NULL. While we tend to prefer not having NULLs to deal with in our code, forcing arbitrary values into our data can be dangerous. A NOT NULL string that defaults to a blank is relatively harmless, but what about a NOT NULL date or a NOT NULL integer? How many times have we run into a date column that was peppered with “1/1/1900” within its data? Or the integer that uses “-1” to fill in the blanks? This data can become confusing as the columns appear to have valid values, when in fact they contain dummy data. Special application or database code needs to check for these values and act appropriately when they exist. In addition, what happen if a user deliberately enters a value that matches a dummy value? Will the application disallow this? Will the database consider their entry irrelevant and the same as taking some default instead? What if your great grandfather was actually born on 1/1/1900? As of the writing of this article, there are people older than 116 years old and have birthdays that fall around this oft-used dummy date. Improbable — sure, but if I was 116 years old and got an error while signing up for Snapchat, I’d be pretty pissed off 🙂
NOT NULL有潜在的危险或不当使用。 尽管我们倾向于不使用NULL来处理代码,但是将任意值强加到数据中可能会很危险。 默认为空白的NOT NULL字符串相对无害,但是NOT NULL日期或NOT NULL整数呢? 我们有多少次遇到了一个日期列,该日期列的数据中带有“ 1/1/1900”? 还是使用“ -1”填充空格的整数? 由于这些列似乎包含有效值,而实际上它们包含伪数据,因此这些数据可能会造成混乱。 特殊的应用程序或数据库代码需要检查这些值,并在它们存在时采取适当措施。 此外,如果用户故意输入与虚拟值匹配的值会怎样? 应用程序会不允许这样做吗? 数据库是否会认为它们的条目无关紧要,而与默认值相同? 如果您的曾祖父实际上是在1900年1月1日出生的,该怎么办? 截至本文撰写时,有些人的年龄超过116岁,生日通常在该虚拟日期附近。 不太可能-当然,但是如果我116岁,注册Snapchat时遇到错误,我会非常生气🙂
We could avoid this problem in a variety of ways:
我们可以通过多种方式避免此问题:
- Allow NULL in those columns 在这些列中允许NULL
- Create a BIT column that signifies the special case of when data doesn’t exist, such as if a foreign national doesn’t have a US driver’s license, or if a newborn doesn’t have a social security number (yet). 创建一个BIT列,以表示不存在数据的特殊情况,例如,如果外国人没有美国驾驶执照,或者新生儿没有社会保险号(尚未)。
- Create a separate table to store this data when it exists, thereby eliminating the need for odd/NULL data in the interim. This is a potentially good solution when a particular column is very sparse, ie: it is very often not entered or rarely required or present. 创建一个单独的表来存储此数据(如果存在),从而在此期间不需要奇数/ NULL数据。 当特定的列非常稀疏时,这是一个潜在的好解决方案,即:它通常不输入或很少需要或不存在。
- Concatenate the oft-missing data with another column. For example, address lines 1, 2, and 3 could be combined in order to avoid the need for the additional address lines that may or may not have valid data in them. 将经常丢失的数据与另一列连接。 例如,可以将地址线1、2和3组合在一起,以避免需要在其中可能有或没有有效数据的其他地址线。
A column can be changed to NULL or NOT NULL at any time in the future using an ALTER TABLE ALTER COLUMN statement. If we attempt to change a column to NOT NULL that contains any NULLs, then that TSQL will fail:
将来可以随时使用ALTER TABLE ALTER COLUMN语句将列更改为NULL或NOT NULL。 如果我们尝试将包含任何NULL的列更改为NOT NULL,则该TSQL将失败:
ALTER TABLE dbo.NullDemo ALTER COLUMN Null_Data VARCHAR(50) NOT NULL;
We previously entered a single row into this table with a NULL in this column. When we try to change it to NOT NULL, the error we receive is similar to what we saw earlier:
我们之前在此表中输入了一行,此列中为NULL。 当我们尝试将其更改为NOT NULL时,我们收到的错误类似于我们之前看到的错误:
Msg 515, Level 16, State 2, Line 32
Cannot insert the value NULL into column ‘Null_Data’, table ‘AdventureWorks2014.dbo.NullDemo’; column does not allow nulls. UPDATE fails.
The statement has been terminated.
消息515,第16级,州2,第32行
无法将值NULL插入表'AdventureWorks2014.dbo.NullDemo'的列'Null_Data'中; 列不允许为空。 更新失败。
该语句已终止。
The schema change of the column to NOT NULL is seen as an update to all values within, which fails due to a single row containing a NULL.
将列更改为NOT NULL的模式更改被视为对其中所有值的更新,由于包含NULL的单行而失败。
NULL is interesting because it is not stored in the same manner as the rest of your data. Since NULL is not a value, but an indicator of non-existence, it is stored in a separate part of each row in a structure called the NULL bitmap. This structure allows SQL Server to determine the NULLability of a column without the need to read the data itself in order to figure it out. The NULL bitmap exists for all columns in a table, NULLable or not, and therefore a table full of NOT NULL columns will not use less disk space via the absence of the NULL bitmap. The exception to this are sparse columns—if a table is defined entirely by sparse columns, then a NULL bitmap will not be present. We’ll call that an edge case, but there is no harm in being thorough here.
NULL很有趣,因为它的存储方式与其余数据不同。 由于NULL不是值,而是不存在的指示符,因此它存储在每行的单独部分中,称为NULL位图。 这种结构使SQL Server无需确定数据本身就可以确定列的可空性。 NULL位图对于表中的所有列都存在,是否可以为NULL,因此,如果没有NULL位图,则充满NOT NULL列的表将不使用较少的磁盘空间。 稀疏列是个例外,如果表完全由稀疏列定义,则将不存在NULL位图。 我们称这种情况为“边缘情况”,但在这里进行深入介绍没有任何害处。
The performance benefits of NOT NULL are subtle, but do exist. The most straight-forward benefit is if you query a NOT NULL column for NULL:
NOT NULL的性能优势是微妙的,但确实存在。 最直接的好处是,如果您在NOT NULL列中查询NULL:
SELECT
ProductID,
Name
FROM Production.Product
WHERE ReorderPoint IS NULL;
It turns out that ReorderPoint is a NOT NULL column, and since SQL Server knows this via the table metadata, it can bypass the need to read any actual data at all and return an empty result set very efficiently. When run, this query returns no results (no surprise there), but can do so with zero logical reads and practically no query cost:
事实证明ReorderPoint是一个NOT NULL列,并且由于SQL Server通过表元数据知道这一点,因此它可以完全不需要读取任何实际数据并非常有效地返回空结果集。 在运行时,此查询不返回任何结果(在此不出意外),但是可以在逻辑读取为零且几乎没有查询成本的情况下返回结果:
The constant scan operator in the execution plan indicates that no tables were accessed, and scanning internal metadata about Production.Product was all that was needed to complete the query. The query cost is tiny, as is usage of CPU and memory when performing what is a very simple task.
执行计划中的常量扫描运算符指示没有表被访问,而扫描有关Production.Product的内部元数据是完成查询所需的全部。 查询成本很小,执行非常简单的任务时占用的CPU和内存也很少。
Otherwise, reads on NULL or NOT NULL columns will be similar. If we were to create two versions of a column, one that allows NULLs and another that does not and query against each one, the results would be virtually the same. Some experiments I ran with a variety of queries against NULL and NOT NULL columns produced a very small amount of variation, but I was able to ultimately chalk it up to index fragmentation.
否则,对NULL或NOT NULL列的读取将是相似的。 如果我们要创建列的两个版本,一个版本允许NULL,而另一个版本不允许,并针对每个版本进行查询,则结果实际上是相同的。 我对NULL和NOT NULL列进行各种查询的一些实验产生的变化很小,但最终我可以将其归纳为索引碎片。
独特的约束 (Unique constraints)
A unique constraint allows us to take any column or set of columns and enforce uniqueness on their contents. This allows us to apply business or common-sense rules to our data, for example:
唯一性约束使我们可以采用任何一列或一组列并对其内容实施唯一性。 这使我们可以将业务或常识规则应用于我们的数据,例如:
- Ensure that no two people have the same social security number. 确保没有两个人具有相同的社会保险号。
- Prevent any accounts from sharing a name. 防止任何帐户共享名称。
- Verify that a set of data elements are unique, such as credit card type, credit card number, and expiration date. 验证一组数据元素是否唯一,例如信用卡类型,信用卡号和有效期。
- Make sure that a relationship between two entities is never repeated. 确保两个实体之间的关系不会重复。
Unique constraints are physically implemented as unique indexes, and provide the performance benefit of an index on the unique columns, while also enforcing uniqueness. There are a number of ways to define a unique index. If we were looking to add a unique index on Null_Id in our previous example, we could do so with the following syntaxes:
唯一约束在物理上被实现为唯一索引,并在唯一列上提供索引的性能优势,同时还强制执行唯一性。 定义唯一索引的方法有很多。 如果在上一个示例中希望在Null_Id上添加唯一索引,则可以使用以下语法:
ALTER TABLE dbo.NullDemo ADD CONSTRAINT UX_NullDemo_Null_Id UNIQUE (Null_Id);
ALTER TABLE dbo.NullDemo ADD CONSTRAINT UX_NullDemo_Null_Id UNIQUE NONCLUSTERED (Null_Id);<
ALTER TABLE dbo.NullDemo ADD CONSTRAINT UX_NullDemo_Null_Id UNIQUE CLUSTERED (Null_Id);
CREATE UNIQUE NONCLUSTERED INDEX UX_NullDemo_Null_Id ON dbo.NullDemo (Null_Id);
CREATE UNIQUE CLUSTERED INDEX UX_NullDemo_Null_Id ON dbo.NullDemo (Null_Id);
These all accomplish similar goals, but there are differences. The 1st and 2nd options will create unique constraints and do so as non-clustered indexes. The 3rd will do the same, but as a clustered index. Remember that you can only have a single clustered index per table, so if one already exists, creating another will throw an error. The 4th and 5th code snippets also creates unique indexes, but there is an interesting, subtle difference in how these constraints are represented in SQL Server. After executing the 1st code snippet, Management Studio shows the following objects within the NullDemo table:
这些都实现相似的目标,但是存在差异。 第1 次和第2 次的选项将创造独特的约束和非聚集索引这样做。 第三名将执行相同的操作,但将其作为聚集索引。 请记住,每个表只能有一个聚集索引,因此,如果一个聚集索引已经存在,则创建另一个聚集索引将引发错误。 第 4和第 5个的代码片段还创建唯一索引,但有一个有趣的,微妙的差异在这些限制是如何在SQL Server中表示。 执行完第一个代码段后,Management Studio在NullDemo表中显示以下对象:
Note that a non-clustered index was created, in addition to a unique key. Now, let’s drop this index using the standard syntax:
请注意,除了唯一键之外,还创建了非聚集索引。 现在,让我们使用标准语法删除该索引:
DROP INDEX UX_NullDemo_Null_Id ON dbo.NulLDemo;
The result is an error:
结果是一个错误:
Msg 3723, Level 16, State 5, Line 38
An explicit DROP INDEX is not allowed on index ‘dbo.NulLDemo.UX_NullDemo_Null_Id’. It is being used for UNIQUE KEY constraint enforcement.
Msg 3723,第16级,状态5,第38行
索引'dbo.NulLDemo.UX_NullDemo_Null_Id'上不允许使用显式DROP INDEX。 它用于UNIQUE KEY约束实施。
Hmmm, my attempt to drop a unique index failed as it’s considered a constraint. Let’s try the drop constraint syntax:
嗯,我尝试删除唯一索引的尝试失败了,因为它被视为约束。 让我们尝试放置约束语法:
ALTER TABLE dbo.NullDemo DROP CONSTRAINT UX_NullDemo_Null_Id;
This time, the constraint drops as expected:
这次,约束按预期下降:
Command(s) completed successfully.
命令已成功完成。
Strange, but we’ll move on for now and come back to this in a bit. Let’s create the index using the 4th snippet and examine the contents of the table via SQL Server Management Studio:
奇怪,但是我们现在继续,再回到这一点。 让我们使用第4 个代码段创建索引,并通过SQL Server Management Studio检查表的内容:
After refreshing the tree above, we notice that there is nothing under the “Keys” section! We were being lead to believe that a unique index and unique constraint are the same, but clearly there are differences. Let’s drop the constraint as we did before:
刷新上面的树后,我们注意到“键”部分下没有任何内容! 我们被认为唯一索引和唯一约束是相同的,但是显然存在差异。 让我们像以前一样删除约束:
ALTER TABLE dbo.NullDemo DROP CONSTRAINT UX_NullDemo_Null_Id;
Running this yields an error (again!?):
运行此命令将产生错误(再次!?):
Msg 3728, Level 16, State 1, Line 41
‘UX_NullDemo_Null_Id’ is not a constraint.
Msg 3727, Level 16, State 0, Line 41
Could not drop constraint. See previous errors.
消息3728,第16层,状态1,第41行
'UX_NullDemo_Null_Id'不是约束。
消息3727,级别16,状态0,第41行
无法删除约束。 请参阅先前的错误。
SQL Server can’t seem to make up its mind! Let’s use the alternate DROP INDEX syntax:
SQL Server似乎无法下定决心! 让我们使用替代的DROP INDEX语法:
DROP INDEX UX_NullDemo_Null_Id ON dbo.NulLDemo;
Once again, we have success:
我们再次取得成功:
Command(s) completed successfully.
命令已成功完成。
Let’s check out entries in SQL Server’s catalog views for constraints and indexes to verify how these objects are perceived there:
让我们在SQL Server的目录视图中检出约束和索引条目,以验证在那里如何看待这些对象:
SELECT
*
FROM sys.key_constraints
WHERE name = 'UX_NullDemo_Null_Id';
SELECT
*
FROM sys.indexes
WHERE indexes.name = 'UX_NullDemo_Null_Id';
For the first constraint creation, using the ALTER TABLE…ADD CONSTRAINT syntax, a row is returned from each view:
对于第一个约束创建,使用ALTER TABLE…ADD CONSTRAINT语法,从每个视图返回一行:
That’s as expected, now let’s run the exact same queries after we use the CREATE UNIQUE INDEX syntax:
符合预期,现在让我们在使用CREATE UNIQUE INDEX语法之后运行完全相同的查询:
Sys.key_constraints returns nothing, implying that SQL Server does delineate between a unique index and a unique constraint, despite the fact that their functionality appears identical.
Sys.key_constraints不返回任何内容,这意味着SQL Server确实在唯一索引和唯一约束之间进行了描述,尽管它们的功能看起来相同。
The only difference (aside from the various syntax above) that I have arrived at between unique constraints and unique indexes is that unique constraints fit the ANSI standard for unique constraints, whereas a unique index technically does not. Performance between these alternatives is identical in all tests I could dream up, and the behavior when attempting to violate uniqueness is the same. To test this, let’s create a unique constraint and try to insert a duplicate value into our table:
我在唯一约束和唯一索引之间得出的唯一区别(除了上述各种语法)是唯一约束适合唯一约束的ANSI标准,而唯一索引在技术上不适合。 在我可以梦想的所有测试中,这些替代方案之间的性能是相同的,并且尝试违反唯一性时的行为是相同的。 为了测试这一点,让我们创建一个唯一约束,然后尝试在表中插入重复的值:
ALTER TABLE dbo.NullDemo ADD CONSTRAINT UX_NullDemo_Null_Id UNIQUE (Null_Id);
INSERT INTO dbo.NullDemo
(Null_Id, Null_Data, Not_Null_Data)
SELECT
3, 'Wooly Mammoth ', 'Furry!';
The result is the expected error:
结果是预期的错误:
Msg 2627, Level 14, State 1, Line 80 Violation of UNIQUE KEY constraint ‘UX_NullDemo_Null_Id’. Cannot insert duplicate key in object ‘dbo.NullDemo’. The duplicate key value is (3). The statement has been terminated.
消息2627,级别14,状态1,第80行违反UNIQUE KEY约束'UX_NullDemo_Null_Id'。 无法在对象'dbo.NullDemo'中插入重复的密钥。 重复键值为(3)。 该语句已终止。
Let’s repeat the experiment with a unique index:
让我们用唯一索引重复该实验:
ALTER TABLE dbo.NullDemo DROP CONSTRAINT UX_NullDemo_Null_Id;
CREATE UNIQUE NONCLUSTERED INDEX UX_NullDemo_Null_Id ON dbo.NullDemo (Null_Id);
INSERT INTO dbo.NullDemo
(Null_Id, Null_Data, Not_Null_Data)
SELECT
3, 'Wooly Mammoth ', 'Furry!';
Again, we get an error, but the text is slightly different, further differentiating between a unique index and a unqiue constraint:
再次,我们得到一个错误,但是文本略有不同,从而进一步区分了唯一索引和非限制约束:
Msg 2601, Level 14, State 1, Line 91 Cannot insert duplicate key row in object ‘dbo.NullDemo’ with unique index ‘UX_NullDemo_Null_Id’. The duplicate key value is (3). The statement has been terminated.
消息2601,级别14,状态1,第91行无法在具有唯一索引“ UX_NullDemo_Null_Id”的对象“ dbo.NullDemo”中插入重复的键行。 重复键值为(3)。 该语句已终止。
To summarize everything above, a unique constraint is a constraint and an index, whereas a unique index is only an index. Despite the differences, the behavior of each is identical. Given the choice, I prefer to stick to standards and increased documentation, and would lean towards implementing unique constraints instead of unique indexes. Neither is incorrect and Microsoft makes no formal declaration for or against either, but the differing visibility of each is meaningful and inclusion in two sets of useful metadata can be handy when managing schema and metadata in the future.
综上所述,唯一约束是约束和索引,而唯一索引仅是索引。 尽管存在差异,但是每个的行为都是相同的。 如果有选择,我宁愿坚持标准和增加文档编制,并且倾向于实现唯一约束而不是唯一索引。 两者都不是不正确的,并且Microsoft都没有正式声明支持或反对任何声明,但是每种视图的不同可见性都是有意义的,将来在管理架构和元数据时,将它们包含在两组有用的元数据中可能会很方便。
As an aside, creating a unique clustered index serves similar functionality to a clustered primary key, whereas a unique non-clustered index is very similar to a non-clustered primary key. Keep this in mind as we move forward.
顺便说一句,创建唯一的聚集索引的功能与聚集的主键相似,而唯一的非聚集索引与非聚集主键非常相似。 在前进的过程中,请牢记这一点。
Unique constraints can also be created in-line, such as in this example:
唯一约束也可以在线创建,例如以下示例:
CREATE TABLE dbo.NullDemo2
( Null_Id2 INT NOT NULL CONSTRAINT UX_NullDemo2_Null_Id2 UNIQUE,
Null_Data2 VARCHAR(50) NULL,
Not_Null_Data2 VARCHAR(50) NOT NULL);
In this similar table, we define a unique (non-clustered) constraint on Null_Id2 within the table definition. If you know that a unique constraint will be needed up-front when the table is defined, you can take advantage of this simplified syntax and include it, instead of defining the constraint later on in your script.
在这个类似的表中,我们在表定义内的Null_Id2上定义了唯一的(非集群)约束。 如果您知道在定义表时首先需要一个唯一的约束,则可以利用此简化语法并将其包括在内,而不是稍后在脚本中定义约束。
How do unique constraints affect performance? The most significant way is in how they affect cardinality. For a column with a unique constraint, we know that there can never be more than one of any given value. Statistics will often provide the necessary information in order to determine this, though, but in the event that statistics are unavailable or incomplete, this little bit of information provided by the unique index can assist in building a more efficient execution plan.
唯一约束如何影响性能? 最重要的方式是它们如何影响基数。 对于具有唯一约束的列,我们知道任何给定值都不能超过一个。 统计信息通常会提供必要的信息来确定这一点,但是,如果统计信息不可用或不完整,则唯一索引提供的少量信息可以帮助构建更有效的执行计划。
The work of a unique index occurs on an insert or update, when it is necessary to check and see if a value already exists prior to making the change. What does this cost? To investigate, we’ll look at the unique index AK_SalesOrderDetail_rowguid on Sales.SalesOrderDetail. Let’s run a simple update on the table:
唯一索引的工作发生在插入或更新上,需要在更改之前检查并查看值是否已存在。 这要花多少钱? 为了进行调查,我们将查看Sales.SalesOrderDetail上的唯一索引AK_SalesOrderDetail_rowguid 。 让我们在表上运行一个简单的更新:
UPDATE Sales.SalesOrderDetail
SET rowguid = 'A207C96D-D9E6-402B-8470-2CC176C42283'
WHERE rowguid = 'B207C96D-D9E6-402B-8470-2CC176C42283';
After executed, we can review the STATISTICS IO output and execution plan in order to see the work that was performed:
执行后,我们可以查看STATISTICS IO的输出和执行计划,以查看已执行的工作:
Table ‘SalesOrderDetail’. Scan count 0, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
表“ SalesOrderDetail”。 扫描计数为0,逻辑读取为22,物理读取为0,预读为0,lob逻辑读取为0,lob物理读取为0,lob提前读取为0。
We can see that the entire index had to be checked (via an index seek) in order to verify that the value I am assigning does not already exist. Of course, the index had to be read anyway in order to locate the row that needed updating. Let’s replace the unique index with a standard non-clustered index:
我们可以看到必须检查整个索引(通过索引查找)以验证我分配的值不存在。 当然,无论如何都必须读取索引,以便找到需要更新的行。 让我们用标准的非聚集索引替换唯一索引:
ALTER TABLE Sales.SalesOrderDetail DROP CONSTRAINT AK_SalesOrderDetail_rowguid;
CREATE NONCLUSTERED INDEX AK_SalesOrderDetail_rowguid ON Sales.SalesOrderDetail
( rowguid ASC );
Now let’s run a similar update to above:
现在,让我们运行与上面类似的更新:
UPDATE Sales.SalesOrderDetail
SET rowguid = 'B207C96D-D9E6-402B-8470-2CC176C42283'
WHERE rowguid = 'A207C96D-D9E6-402B-8470-2CC176C42283';
The results are similar to before:
结果与之前类似:
Table ‘SalesOrderDetail’. Scan count 1, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
表“ SalesOrderDetail”。 扫描计数1,逻辑读25,物理读0,预读0,lob逻辑读0,lob物理读0,lob预读0。
It turns out that the effort required to enforce uniqueness is the same effort needed to locate the rows for update anyway. Let’s perform the same experiment with an INSERT, which will not require us to locate any rows prior to the operation:
事实证明,强制唯一性所需的工作与定位行以进行更新所需的工作相同。 让我们使用INSERT执行相同的实验,它不需要我们在操作之前定位任何行:
DELETE FROM Sales.SalesOrderDetail WHERE rowguid = 'B207C96D-D9E6-402B-8470-2CC176C42283';
INSERT INTO Sales.SalesOrderDetail
(SalesOrderID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate)
SELECT
43659,
'4911-403C-98',
1,
776,
1,
2024.994,
0.00,
'B207C96D-D9E6-402B-8470-2CC176C42283',
'2011-05-31 00:00:00.000'
With the unique constraint in place, the performance is as follows:
有了唯一的约束,性能如下:
Table ‘SalesOrderDetail’. Scan count 0, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
表“ SalesOrderDetail”。 扫描计数0,逻辑读9,物理读0,预读0,lob逻辑读0,lob物理读0,lob预读0。
Now we’ll replace the unique constraint with a non-unique non-clustered index and repeat the experiment:
现在,我们将非唯一约束替换为非唯一非聚集索引,然后重复实验:
ALTER TABLE Sales.SalesOrderDetail DROP CONSTRAINT AK_SalesOrderDetail_rowguid;
CREATE NONCLUSTERED INDEX AK_SalesOrderDetail_rowguid ON Sales.SalesOrderDetail
( rowguid ASC )
DELETE FROM Sales.SalesOrderDetail WHERE rowguid = 'B207C96D-D9E6-402B-8470-2CC176C42283';
INSERT INTO Sales.SalesOrderDetail
(SalesOrderID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate)
SELECT
43659,
'4911-403C-98',
1,
776,
1,
2024.994,
0.00,
'B207C96D-D9E6-402B-8470-2CC176C42283',
'2011-05-31 00:00:00.000'
The results end up being exactly the same as before. The presence of an index is all that is required in order to validate uniqueness, and since a unique constraint always creates an underlying index, this requirement is fulfilled automatically.
结果最终与以前完全相同。 索引的存在是验证唯一性所需要的全部,并且由于唯一约束始终会创建基础索引,因此该要求会自动满足。
While we may have thought that the need to validate uniqueness would be expensive or a burden to our write operations, these experiments (as well as any more we can dream up) show that the performance difference of writes with a unique constraint versus a non-clustered index are insignificant. Of course, the addition of a unique constraint to a column that previously had no index at all would benefit from the addition of the index in the same way that any indexed column will speed up reads at the expense of the write operations needed to maintain it.
尽管我们可能认为验证唯一性的需求会很昂贵,或者会给我们的写操作带来负担,但这些实验(以及我们梦dream以求的其他事情)表明,具有唯一约束的写性能与非唯一写条件的写性能差异聚簇索引无关紧要。 当然,向以前根本没有索引的列添加唯一约束将受益于索引的添加,其方式与任何索引列都将加快读取速度相同,而维护该列需要进行写操作。
主键约束 (Primary Key constraints)
A primary key is THE uniquely identifying metadata for a table. This is a critical part of database design and it is generally important to define a primary key on all tables within a database. Primary keys serve as a unique constraint, a unique index, and as the ANSI standard that defines uniqueness in a table. You may only define a single primary key per table, regardless of whether it is clustered or non-clustered. A primary key may be created in-line with table creation or via an ALTER TABLE statement.
主键是一个表的唯一标识的元数据。 这是数据库设计的关键部分,在数据库中的所有表上定义主键通常很重要。 主键用作唯一约束,唯一索引以及定义表中唯一性的ANSI标准。 您只能为每个表定义一个主键,而不管它是集群的还是非集群的。 主键可以与表创建内联或通过ALTER TABLE语句创建。
Primary keys carry special significance in database design. Anyone that is looking at a table for the first time will be checking to see what the primary key is defined as for the table. Is it an integer ID? Is it an account name? Is it a combination of several important data elements? Does it auto-number as an IDENTITY, or are values inserted somehow via a programmatic or user-driven process?
主键在数据库设计中具有特殊的意义。 第一次查看表的任何人都将检查以查看为表定义了什么主键。 它是整数ID吗? 是账户名吗? 它是几个重要数据元素的组合吗? 它会自动编号为IDENTITY,还是通过编程或用户驱动的过程以某种方式插入值?
A primary key by definition is a NOT NULL column. Attempts to insert NULL into a primary key column will result in an error. Similarly, adding a primary key on a column with duplicate values or NULLs will also generate an error.
根据定义,主键是NOT NULL列。 尝试将NULL插入主键列将导致错误。 同样,在具有重复值或NULL的列上添加主键也会产生错误。
In our previous example, we could have created the table with a primary key on Null_Id:
在前面的示例中,我们可以在Null_Id上创建具有主键的表:
CREATE TABLE dbo.NullDemo
( Null_Id INT NOT NULL CONSTRAINT PK_NullDemo PRIMARY KEY CLUSTERED,
Null_Data VARCHAR(50) NULL,
Not_Null_Data VARCHAR(50) NOT NULL);
This table definition declares, up-front, that Null_Id is the logical unique identifier for any row within the NulLDemo table. Any other tables that reference NullDemo via a foreign key will need to reference this primary key. It is also likely that most joins to NullDemo will do so via Null_Id as it guarantees a predictable cardinality (one row per Null_Id).
该表定义预先声明Null_Id是NulLDemo表中任何行的逻辑唯一标识符。 通过外键引用NullDemo的任何其他表都需要引用此主键。 大多数加入NullDemo的连接也有可能会通过Null_Id这样做,因为它保证了可预测的基数(每个Null_Id一行)。
A primary key can also be added to a table later on, assuming that the contents of the column fit the prerequisites of a primary key: Uniqueness and no NULLs:
假设列的内容符合主键的前提:唯一性且没有NULL,则以后也可以将主键添加到表中。
ALTER TABLE dbo.NullDemo DROP CONSTRAINT PK_NullDemo;
ALTER TABLE dbo.NullDemo ADD CONSTRAINT PK_NullDemo PRIMARY KEY CLUSTERED (Null_Id);
Within SQL Server Management Studio, a primary key will be given a special notation, separate from all other indexes, keys, and constraints:
在SQL Server Management Studio中,主键将被赋予特殊的表示法,与所有其他索引,键和约束分开:
The yellow key signifies a primary key, and as discussed earlier, there can only be one per table. As with a unique index, a primary key will require an index to be maintained in order to enforce uniqueness. A primary key may be clustered or non-clustered, depending on whether this column is the most common or efficient organizing column for the table.
黄色键表示一个主键,如前所述,每个表只能有一个。 与唯一索引一样,主键将要求维护索引以强制唯一性。 主键可以是群集的,也可以是非群集的,这取决于此列是表的最常见还是最有效的组织列。
Typically, the clustered index on a table is also the primary key, though there is no requirement or enforcement of this convention. Special cases may arise in which it makes sense to define a primary key separately from a clustered index. A plausible scenario for this would be when the primary key on a table is an auto-numbering integer identity, but the clustered index is on a pair of critical identifiers. When designing a table, consider carefully whether it makes sense for the clustered index to be separate from the primary key, or if having them be one and the same is simpler and more efficient.
通常,表上的聚集索引也是主键,尽管没有要求或强制执行此约定。 可能出现特殊情况,其中有必要与聚簇索引分开定义主键。 可能的情况是,表上的主键是自动编号的整数标识,而聚集索引位于一对关键标识符上。 设计表时,请仔细考虑将聚簇索引与主键分开是否有意义,或者使它们成为一个主键是否更简单,更有效。
Note that in order for a primary key to be dropped, any foreign keys on other tables that reference it must also be dropped first. In addition, if the primary key is also the clustered index, then all other non-clustered index will need to be dropped before SQL Server will allow you to drop the clustered primary key. This is because non-clustered indexes reference the clustered index and you are not given the liberty of pulling the rug out from under a table’s logical storage.
请注意,为了删除主键,引用它的其他表上的所有外键也必须先删除。 此外,如果主键也是聚集索引,则在SQL Server允许您删除聚集主键之前,需要删除所有其他非聚集索引。 这是因为非聚簇索引引用了聚簇索引,并且您没有从表的逻辑存储空间中拉出地毯的自由。
The performance of primary keys is based on the performance of a handful of components:
主键的性能基于少数组件的性能:
- The unique index created for use by the primary key. 创建供主键使用的唯一索引。
- Any foreign keys that can benefit from referencing the primary ley. 任何可以从引用主ley中受益的外键。
- If the primary key is also the clustered index, then all benefits of a clustered index will also be realized. 如果主键也是聚簇索引,那么聚簇索引的所有好处也将实现。
- The NOT NULL constraint that is required by a primary key. 主键要求的NOT NULL约束。
A primary key in of itself does not provide any performance magic, but the building blocks that are created for and alongside a primary key each provide their own benefits. In the same way that all tables should have a primary key, tables should also have a carefully chosen clustered index. Reading and writing to tables without a clustered index will be expensive and incur a variety of penalties due to the inefficient nature of heaps.
主键本身并不能提供任何性能魔咒,但是为主键创建的并与主键一起创建的构建块各自具有各自的优势。 与所有表都应具有主键的方式相同,表还应具有精心选择的聚集索引。 读取和写入不具有聚集索引的表将很昂贵,并且由于堆的低效率而导致各种罚款。
When chosen wisely, a clustered primary key can provide the benefits of a variety of constraints: NOT NULL, uniqueness, clustered index, and foreign keys. If you’re omitting a primary key (clustered or otherwise) from a table and believe it to be the right choice, be sure to thoroughly test and verify that such a hypothesis is indeed true. I’ve heard many arguments over the years as to why certain types of tables benefit from the omission of a primary key or clustered index, but rarely have those debates been based on fact. If you can prove that either does not help performance, and never will in the future, then you may consider omitting them, though I suspect that to be a very rare scenario!
如果明智地选择,集群主键可以提供各种约束的好处:NOT NULL,唯一性,集群索引和外键。 如果您从表中省略主键(集群键或其他键),并认为它是正确的选择,请确保彻底测试并验证这种假设的确是正确的。 多年来,我曾听到许多争论,为什么某些类型的表会因省略主键或聚集索引而受益,但是这些争论很少基于事实。 如果您可以证明其中之一对性能没有帮助,将来也无济于事,那么您可以考虑省略它们,尽管我怀疑这是非常罕见的情况!
默认约束 (Default Constraints)
A default constraint allows you to automatically populate a predetermined value for a column when none is provided upon insert. This is handy when a column is optional, but in the absence of a user-defined value, another is used instead. Some common uses of default constraints include:
默认约束允许您在插入时未提供列的预定值时自动为其填充该值。 当列为可选列时,这很方便,但是在没有用户定义的值的情况下,将使用另一个。 默认约束的一些常见用法包括:
- An active bit for accounts that always defaults to 1 for new rows. 帐户的活动位,新行始终默认为1。
- A notes column is populated with an empty string when no notes are entered. 没有输入注释时,注释列中会填充一个空字符串。
- The current date and time are entered into a last updated date/time column. 当前日期和时间输入到最近更新的日期/时间列中。
- The user performing an action is logged whenever a row is inserted. 每当插入行时,都会记录执行操作的用户。
- To differentiate between data that is user-provided and that which is system generated. 区分用户提供的数据和系统生成的数据。
Default constraints allow you to maintain a NOT NULL constraint on a column, even when a user or application may not provide a value for it. Be careful not to use default constraints to eliminate NULLable columns when they make the most sense for your data model. For example, a default of “1/1/1900” for a date column may seem nonsensical enough to avoid confusion, but could result in application complexity in the future if any developers forget about this default, or if they attempt to perform date math on the column without checking for the default dummy value.
默认约束允许您在列上维护NOT NULL约束,即使用户或应用程序可能不为其提供值。 当对数据模型最有意义时,请注意不要使用默认约束来消除NULLable列。 例如,日期列的默认值“ 1/1/1900”看起来似乎很荒谬,可以避免混淆,但是如果任何开发人员忘记了该默认值,或者他们尝试执行日期数学运算,则将来可能导致应用程序复杂化而不检查默认哑元值。
In order to demonstrate default constraints, we’ll create a new table to experiment on:
为了演示默认约束,我们将创建一个新表进行实验:
CREATE TABLE dbo.Accounts
( Account_Id INT NOT NULL IDENTITY(1,1) CONSTRAINT PK_Accounts PRIMARY KEY CLUSTERED,
Account_Name VARCHAR(100) NOT NULL,
Account_Notes VARCHAR(1000) NOT NULL CONSTRAINT DF_Accounts_Account_Notes DEFAULT (''),
Is_Active BIT NOT NULL CONSTRAINT DF_Accounts_Is_Active DEFAULT (1),
Create_Datetime DATETIME NOT NULL CONSTRAINT DF_Accounts_Create_Datetime DEFAULT (CURRENT_TIMESTAMP),
Created_By_User VARCHAR(50) NULL CONSTRAINT DF_Accounts_Created_By_User DEFAULT (ORIGINAL_LOGIN()) );
I’ve included plenty of default constraints here, using a variety of data types. Note that for these and all other constraints created in this article, I’ve consistently included constraint names. It is a good practice to ALWAYS name any objects that you create. If you create one without a name, SQL Server will automatically generate a name that will be neither elegant or helpful. For example, if I left the name off of the default on Created_By_User, it automatically gets named DF__Accounts__Create__6521F869. If I drop and recreate the table, it is named DF__Accounts__Create__6ADAD1BF. These names are not very meaningful, nor are they consistent, so always name everything, your life will be easier that way 🙂
我在这里使用了多种数据类型,其中包括许多默认约束。 请注意,对于本文中创建的所有这些约束以及所有其他约束,我一直都包含约束名称。 始终命名您创建的任何对象都是一个好习惯。 如果您创建一个没有名称的名称,SQL Server将自动生成一个既不优雅也不有用的名称。 例如,如果我将名称保留为Created_By_User上的默认名称,它将自动命名为DF__Accounts__Create__6521F869 。 如果删除并重新创建表,该表将命名为DF__Accounts__Create__6ADAD1BF 。 这些名字不是很有意义,也不是一致的,所以总是给所有的名字起名字,这样会使你的生活更轻松🙂
Let’s say I insert a row into the table using this TSQL:
假设我使用此TSQL在表中插入一行:
INSERT INTO dbo.Accounts
(Account_Name, Account_Notes)
SELECT
'Dinosaur Corp.',
'The leader in dinosaur personal care products.';
While I only provided the account name and notes, the remaining columns were populated for me using the defaults that we expect and would want. This is a great way to audit data as it is created and simplify INSERT statements.
虽然我只提供了帐户名称和注释,但其余的列是使用我们期望和想要的默认值为我填充的。 这是审核数据创建和简化INSERT语句的好方法。
Default constraints will automatically fill-in-the-blanks when you do not provide a value, but will not override an explicit NULL. If you insert a row and intentionally attempt to put a NULL into a NOT NULL column with a default, then you’ll get an error.
当您不提供值时,默认约束将自动填充空白,但不会覆盖显式NULL。 如果您插入一行并有意尝试将NULL设置为具有默认值的NOT NULL列,那么您将得到一个错误。
INSERT INTO dbo.Accounts
(Account_Name, Account_Notes, Is_Active)
SELECT
'Pterodactyls Anonymous',
'The world''s trusted support group for our flying friends.',
NULL
The TSQL above will throw the following error:
上面的TSQL将引发以下错误:
Msg 515, Level 16, State 2, Line 152 Cannot insert the value NULL into column ‘Is_Active’, table ‘AdventureWorks2014.dbo.Accounts’; column does not allow nulls. INSERT fails.
消息515,级别16,状态2,第152行无法将值NULL插入表'AdventureWorks2014.dbo.Accounts'的'Is_Active'列中; 列不允许为空。 插入失败。
Default constraints do not override explicit values, even if they are NULL. Similarly, an attempt to insert a NULL into a column that allows NULLs will do so and ignore the default:
默认约束不会覆盖显式值,即使它们为NULL。 同样,尝试将NULL插入允许NULL的列中也会这样做,并且会忽略默认值:
INSERT INTO dbo.Accounts
(Account_Name, Account_Notes, Created_By_User)
SELECT
'Pterodactyls Anonymous',
'The world''s trusted support group for our flying friends.',
NULL
This TSQL executes successfully, resulting in the following data:
该TSQL成功执行,得到以下数据:
Note that the NULL in the INSERT statement overrides the default constraint on that column.
请注意,INSERT语句中的NULL会覆盖该列上的默认约束。
Any constant may be used in a default constraint so long as its value is deterministic as of the time it is assigned. The current date or time will default to the date/time at the moment the row is created, whereas the login of the user will default to the user at that same moment, and will not change later. Defaults cannot reference other tables, contain subqueries, reference other columns in the same table, or otherwise require any information beyond a static constant. Computed columns are a nice way to gain some of those benefits, if there is no need for a combination of user/system generated values.
任何常量都可以在默认约束中使用,只要其值在分配时就具有确定性即可。 当前日期或时间将默认为创建行时的日期/时间,而用户的登录名将同时为该用户的默认名,并且以后将不会更改。 默认值不能引用其他表,包含子查询,引用同一表中的其他列或以其他方式要求除静态常量之外的任何信息。 如果不需要用户/系统生成的值的组合,则计算列是获得其中一些好处的好方法。
The performance impact of default constraints will only be felt if a new column is added and there is a need to backfill all existing rows with a default constraint for that new column. Otherwise, you will not see any increased IO as a result of the use of default constraints. The cost to insert a value yourself vs. the cost to have the default constraint do so are identical. The cost to insert an explicit value into a column with a default constraint is the same as the cost to let the constraint do the work for you.
仅当添加新列并且需要用该新列的默认约束回填所有现有行时,才会感觉到默认约束的性能影响。 否则,由于使用默认约束,您不会看到任何增加的IO。 自己插入值的成本与具有默认约束的成本是相同的。 在具有默认约束的列中插入显式值的成本与让约束为您完成工作的成本相同。
检查约束 (Check Constraints)
Sometimes we want to enforce specific logic on our data, but doing so through stored procedures or application logic would be extremely complex or risky. Check constraints allow us to validate data whenever it is written, thus allowing us to check for specific data or compare different columns in a table. These constraints can verify the value of a single column or they can compare any number of columns to ensure data integrity.
有时我们想对数据执行特定的逻辑,但是通过存储过程或应用程序逻辑来执行则非常复杂或冒险。 检查约束使我们能够在每次写入数据时对其进行验证,从而使我们可以检查特定数据或比较表中的不同列。 这些约束可以验证单个列的值,也可以比较任何数量的列以确保数据完整性。
For example, here is a table with some check constraints defined on it:
例如,这是一个表,上面定义了一些检查约束:
CREATE TABLE dbo.Dinosaur
( Dinosaur_Id INT NOT NULL IDENTITY(1,1) CONSTRAINT PK_Dinosaur PRIMARY KEY CLUSTERED,
Dinosaur_Name VARCHAR(100) NOT NULL,
Dinosaur_Type VARCHAR(25) NOT NULL CONSTRAINT CK_Dinosaur_Dinosaur_Type CHECK (Dinosaur_Type IN ('Theropod', 'Sauropod', 'Thyreophora', 'Cerapod')),
Is_Herbivore BIT NOT NULL );
ALTER TABLE dbo.Dinosaur ADD CONSTRAINT CK_Dinosaur_Is_Herbivore CHECK ((Dinosaur_Type = 'Theropod' AND Is_Herbivore = 0) OR (Dinosaur_Type <> 'Theropod' AND Is_Herbivore = 1));
Here, we define a check constraint on a single column in the Dinosaur table, and then create an additional one afterwards that references multiple columns. If we insert a row into the table and it meets both conditions, then we’ll have no problem:
在这里,我们在Dinosaur表中的单个列上定义一个检查约束,然后再创建一个引用多个列的约束。 如果我们在表中插入一行并且满足两个条件,那么我们将没有问题:
INSERT INTO dbo.Dinosaur
(Dinosaur_Name, Dinosaur_Type, Is_Herbivore)
SELECT
'Micropachycephalosaurus',
'Cerapod',
1;
This dinosaur meets the criteria we imposed on the table, so it is inserted normally:
该恐龙符合我们强加在桌子上的条件,因此可以正常插入:
How cute! Let’s say we tried to add a row for T-Rex and accidentally set him as an herbivore…
多么可爱! 假设我们尝试为T-Rex添加一行,并将其偶然设置为草食动物……
INSERT INTO dbo.Dinosaur
(Dinosaur_Name, Dinosaur_Type, Is_Herbivore)
SELECT
'Tyrannosaurus Rex',
'Theropod',
1;
When you violate a check constraint, the result is an error that tells you specifically which constraint was the cause of the error, as well as where it is located:
当您违反检查约束时,结果是一个错误,该错误会具体告诉您是哪个约束导致了错误,以及错误的位置:
Msg 547, Level 16, State 0, Line 196 The INSERT statement conflicted with the CHECK constraint “CK_Dinosaur_Is_Herbivore”. The conflict occurred in database “AdventureWorks2014”, table “dbo.Dinosaur”.
消息547,级别16,状态0,行196 INSERT语句与CHECK约束“ CK_Dinosaur_Is_Herbivore”相冲突。 冲突发生在数据库“ AdventureWorks2014”的表“ dbo.Dinosaur”中。
We immediately know that the error was caused by CK_Dinosaur_Is_Herbivore and can quickly look at the definition of that constraint and verify if my insert statement was no good or if the check constraint was erroneous. This raises an important point with regards to the maintenance of check constraints in that they require it. Any check constraint that references specific data that could change will require review in the same way that application or database code is reviewed. If a new dinosaur is discovered tomorrow that comprises a type not shown here, we would need to update the check constraint to include it. Alternatively, if a new variety of herbivorous cryolophosaurus is discovered, we’d need to adjust our rules to account for it. This can result in check constraints becoming very complex, hard to maintain, and error-prone. For scenarios where lots of metadata is involved or it changes often, consider creating a table to store valid types, rather than trying to squeeze all that logic into a check constraint.
我们立即知道该错误是由CK_Dinosaur_Is_Herbivore引起的,可以快速查看该约束的定义并验证我的insert语句是否不好,或者检查约束是否错误。 这在维护检查约束方面提出了一个重要的观点,因为他们需要它。 引用可能更改的特定数据的任何检查约束都将需要以与检查应用程序或数据库代码相同的方式进行检查。 如果明天发现一种新的恐龙,其类型未在此处显示,则我们需要更新检查约束以将其包括在内。 另外,如果发现了一种新的草食性低温龙,我们需要调整规则以解决这一问题。 这可能导致检查约束变得非常复杂,难以维护并且容易出错。 对于涉及大量元数据或它经常变化的场景,请考虑创建一个表来存储有效类型,而不是尝试将所有逻辑都挤压到检查约束中。
One additional note on check constraints: NULL is not a value in the same way that a number or string is. As a result, if a check constraint exists on a NULLable column, and the constraint doesn’t explicitly validate NULL in some fashion, then NULL will not fail. For example, if our Dinosaur_Type column from above were NULLable, then an operation that puts NULL into that column will not violate the check constraint. Remember that any comparison that operates against a NULL evaluates to unknown, which is not false. This is a similar effect to when you use an equality in a WHERE clause against a column with NULLs in it. If a check constraint exists on a column that allows NULL, be sure to account for it by either including a NULL check in the constraint, or by changing the column to NOT NULL.
关于检查约束的另一条注释:NULL不是与数字或字符串相同的值。 结果,如果在NULLable列上存在检查约束,并且该约束没有以某种方式显式验证NULL,则NULL不会失败。 例如,如果上面的Dinosaur_Type列可为NULL,则将NULL放入该列的操作不会违反检查约束。 请记住,任何针对NULL进行的比较都会得出未知值,这不是错误的。 这与在WHERE子句中对其中包含NULL的列使用等于时的效果类似。 如果在允许NULL的列上存在检查约束,请确保通过在约束中包括NULL检查或将列更改为NOT NULL来解决它。
There are other reasons to be cautious with check constraints. While they are an effective way to bolster data integrity, it is easy to go a bit wild and apply excessive amounts of business logic directly to tables within the database. This can result in lazy development and failure to vigorously check and sanitize inputs before those values reach the database. A well-written application will validate data thoroughly in code and then in the database for critical scenarios. We cannot check for every possible bit of bad data, so it is up to us to determine the most important scenarios and protect against those, if needed. Sometimes excessive use of check constraints is indicative of design flaws elsewhere that we are trying to patch over after-the-fact, rather than fixing the underlying design.
还有其他原因要注意检查约束。 尽管它们是提高数据完整性的有效方法,但很容易变得疯狂,直接将过多的业务逻辑直接应用于数据库中的表。 这可能导致延迟开发,并且无法在这些值到达数据库之前大力检查和清理输入。 编写良好的应用程序将针对关键场景在代码中然后在数据库中彻底验证数据。 我们无法检查所有可能的不良数据,因此我们需要确定最重要的方案并在需要时采取措施。 有时过度使用检查约束表示其他地方的设计缺陷,我们试图在事后修补这些缺陷,而不是修复基础设计。
In no way am I discouraging the use of check constraints, but be mindful of the best way to enforce data integrity without creating technical debt. Definitely encourage the use of dinosaurs in your data as needed, though!
我绝不鼓励使用检查约束,但要牢记在不增加技术负担的情况下强制执行数据完整性的最佳方法。 不过,绝对鼓励您在数据中使用恐龙!
How do check constraints affect performance? In terms of write operations, the impact is generally negligible. I attempted to create a variety of tables and apply all sorts of check constraints to them in order to generate latency, but was unable to move the needle much. In general, constraints are checked against the explicit scalar values prior to writing to the database. In other words, set-based logic does not revert to iterative approaches as a result of check constraints, even if they are complex, for example using math or string functions. If a check constraint includes a function that accesses other tables/views/objects, then the cost of those operations will weigh heavily on write operations, though.
检查约束如何影响性能? 在写操作方面,影响通常可以忽略不计。 我试图创建各种表并对它们应用各种检查约束,以产生延迟,但无法进行很多操作。 通常,在写入数据库之前,针对显式标量值检查约束。 换句话说,基于集合的逻辑不会由于检查约束而恢复为迭代方法,即使它们比较复杂(例如使用数学或字符串函数)也是如此。 If a check constraint includes a function that accesses other tables/views/objects, then the cost of those operations will weigh heavily on write operations, though.
Read operations can benefit from check constraints by providing the query optimizer valuable information about a table and its contents. For example, what if I write a query against our dinosaur table that looks for dinosaurs of a type that are not allowed by the CK_Dinosaur_Dinosaur_Type constraint? To mimic a more realistic query scenario, I’ll add an index on the Dinosaur_Type column first:
Read operations can benefit from check constraints by providing the query optimizer valuable information about a table and its contents. For example, what if I write a query against our dinosaur table that looks for dinosaurs of a type that are not allowed by the CK_Dinosaur_Dinosaur_Type constraint? To mimic a more realistic query scenario, I'll add an index on the Dinosaur_Type column first:
CREATE NONCLUSTERED INDEX IX_Dinosaur_Dinosaur_Type ON dbo.Dinosaur (Dinosaur_Type);
SELECT
Dinosaur_Name
FROM dbo.Dinosaur
WHERE Dinosaur_Type = 'Ceratopsian';
Well, there are no Ceratopsian dinosaurs in our table because the check constraint simply doesn’t allow them. The column is NOT NULL, so NULL isn’t valid either. The result set is empty, but the execution plan and IO statistics are interesting:
Well, there are no Ceratopsian dinosaurs in our table because the check constraint simply doesn't allow them. The column is NOT NULL, so NULL isn't valid either. The result set is empty, but the execution plan and IO statistics are interesting:
For this query, there were zero reads against the table and the execution plan shows a constant scan. Similar to when we tried to pull NULL from a NOT NULL column, the check constraint provided valuable information to SQL Server about the Dinosaur_Type column and what values it is allowed to contain. Without accessing the table, SQL Server was able to deduce from table metadata and the WHERE clause of my query that what I was looking for was not possible, and knowing that information, it could complete query optimization without any significant load on the server.
For this query, there were zero reads against the table and the execution plan shows a constant scan. Similar to when we tried to pull NULL from a NOT NULL column, the check constraint provided valuable information to SQL Server about the Dinosaur_Type column and what values it is allowed to contain. Without accessing the table, SQL Server was able to deduce from table metadata and the WHERE clause of my query that what I was looking for was not possible, and knowing that information, it could complete query optimization without any significant load on the server.
Check constraints are utilized early in the query optimization process and provide a very fast way to rule out invalid values. If a column only allows four dinosaur types, then we can deduce that all other types will not exist in the table. In a very large table, this performance benefit can be significant. Be sure to weigh the documentation and maintenance costs of a check constraint versus data integrity and performance to make a smart decision as to whether or not a given check constraint makes sense.
Check constraints are utilized early in the query optimization process and provide a very fast way to rule out invalid values. If a column only allows four dinosaur types, then we can deduce that all other types will not exist in the table. In a very large table, this performance benefit can be significant. Be sure to weigh the documentation and maintenance costs of a check constraint versus data integrity and performance to make a smart decision as to whether or not a given check constraint makes sense.
Foreign Key constraints (Foreign Key constraints)
An important part of data modelling and enforcing data integrity comes via foreign key constraints. These constraints link a column within one table to a primary key in another table. The foreign key column must always contain a valid value from the parent primary key column. Like with check constraints, whenever a scenario arises in which data is changed in which the foreign key column contains invalid data, an error will be thrown. Foreign keys can reference multiple columns, if the parent table has a compound primary key.
An important part of data modelling and enforcing data integrity comes via foreign key constraints. These constraints link a column within one table to a primary key in another table. The foreign key column must always contain a valid value from the parent primary key column. Like with check constraints, whenever a scenario arises in which data is changed in which the foreign key column contains invalid data, an error will be thrown. Foreign keys can reference multiple columns, if the parent table has a compound primary key.
To demo foreign keys, we can use the table Production.Product in Adventureworks, which contains four different foreign keys:
To demo foreign keys, we can use the table Production.Product in Adventureworks, which contains four different foreign keys:
The definitions of these keys are as follows:
The definitions of these keys are as follows:
ALTER TABLE [Production].[Product] WITH CHECK ADD CONSTRAINT [FK_Product_ProductModel_ProductModelID] FOREIGN KEY([ProductModelID])
REFERENCES [Production].[ProductModel] ([ProductModelID]);
ALTER TABLE [Production].[Product] WITH CHECK ADD CONSTRAINT [FK_Product_ProductSubcategory_ProductSubcategoryID] FOREIGN KEY([ProductSubcategoryID])
REFERENCES [Production].[ProductSubcategory] ([ProductSubcategoryID]);
ALTER TABLE [Production].[Product] WITH CHECK ADD CONSTRAINT [FK_Product_UnitMeasure_SizeUnitMeasureCode] FOREIGN KEY([SizeUnitMeasureCode])
REFERENCES [Production].[UnitMeasure] ([UnitMeasureCode]);
ALTER TABLE [Production].[Product] WITH CHECK ADD CONSTRAINT [FK_Product_UnitMeasure_WeightUnitMeasureCode] FOREIGN KEY([WeightUnitMeasureCode])
REFERENCES [Production].[UnitMeasure] ([UnitMeasureCode]);
For each product, the product model, subcategory, size, and weight are all tied directly to other tables. Whenever new rows are inserted or existing ones are updated, the values for these columns will be validated against their parent tables. Consider the following UPDATE statement:
For each product, the product model, subcategory, size, and weight are all tied directly to other tables. Whenever new rows are inserted or existing ones are updated, the values for these columns will be validated against their parent tables. Consider the following UPDATE statement:
UPDATE Production.Product
SET WeightUnitMeasureCode = 'UM'
WHERE ProductID = 317
This results in an error:
This results in an error:
Msg 547, Level 16, State 0, Line 1
The UPDATE statement conflicted with the FOREIGN KEY constraint “FK_Product_UnitMeasure_WeightUnitMeasureCode”. The conflict occurred in database “AdventureWorks2014”, table “Production.UnitMeasure”, column ‘UnitMeasureCode’.
The statement has been terminated.
Msg 547, Level 16, State 0, Line 1
The UPDATE statement conflicted with the FOREIGN KEY constraint “FK_Product_UnitMeasure_WeightUnitMeasureCode”. The conflict occurred in database “AdventureWorks2014”, table “Production.UnitMeasure”, column 'UnitMeasureCode'.
The statement has been terminated.
The code “UM” is not present in the Production.Unitmeasure table, and therefore any attempt to set the WeightMeasureCode to this value will immediately fail. On the other hand, assigning a value that is present will work normally:
The code “UM” is not present in the Production.Unitmeasure table, and therefore any attempt to set the WeightMeasureCode to this value will immediately fail. On the other hand, assigning a value that is present will work normally:
UPDATE Production.Product
SET WeightUnitMeasureCode = 'MM'
WHERE ProductID = 317
A foreign key column can allow NULL, and if so, will let you enter NULL instead of a valid value, if that makes sense:
A foreign key column can allow NULL, and if so, will let you enter NULL instead of a valid value, if that makes sense:
UPDATE Production.Product
SET WeightUnitMeasureCode = NULL
WHERE ProductID = 317
This is completely valid, but only if the column does not possess the NOT NULL constraint.
This is completely valid, but only if the column does not possess the NOT NULL constraint.
Foreign keys can be created with the NOCHECK keyword, which will cause the key to enforce data integrity in the future, but will allow any existing bad data to continue to reside in the table. This is generally frowned upon as it allows bad data to exist in a column where we clearly want to maintain a certain level of integrity. While there are some valid use-cases for NOCHECK, typically temporary in nature, I’d advise against using it. Constraints are a form of database documentation, and the presence of a NOCHECK constraint provides mixed information about the contents of the column(s) it applies to.
Foreign keys can be created with the NOCHECK keyword, which will cause the key to enforce data integrity in the future, but will allow any existing bad data to continue to reside in the table. This is generally frowned upon as it allows bad data to exist in a column where we clearly want to maintain a certain level of integrity. While there are some valid use-cases for NOCHECK, typically temporary in nature, I'd advise against using it. Constraints are a form of database documentation, and the presence of a NOCHECK constraint provides mixed information about the contents of the column(s) it applies to.
There is a performance cost to pay for writing to columns that are part of a foreign key. In order to validate the values being inserted or updated, it is necessary to read the parent table in order to ensure the value being added is present. If the parent column is indexed (which it should be!), then the cost is at best a clustered index seek. If the parent column is not indexed (it’s a heap), then a table scan is the result.
There is a performance cost to pay for writing to columns that are part of a foreign key. In order to validate the values being inserted or updated, it is necessary to read the parent table in order to ensure the value being added is present. If the parent column is indexed (which it should be!), then the cost is at best a clustered index seek. If the parent column is not indexed (it's a heap), then a table scan is the result.
To see this in action, let’s consider an update of the SizeMeasureCode column in Production.Product:
To see this in action, let's consider an update of the SizeMeasureCode column in Production.Product :
UPDATE Production.Product
SET SizeUnitMeasureCode = 'DM'
WHERE ProductID = 317;
Without a foreign key, we would simply read Production.Product in order to find the row we want to update, and then update it with the new value as we requested. With the foreign key, though, here is what we get for IO and execution plan:
Without a foreign key, we would simply read Production.Product in order to find the row we want to update, and then update it with the new value as we requested. With the foreign key, though, here is what we get for IO and execution plan:
Table ‘UnitMeasure’. Scan count 0, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table ‘Product’. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'UnitMeasure'. Scan count 0, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Product'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
In order to check the foreign key for the value we are assigning, a seek against the UnitMeasure table was required. Since this is a small lookup table that contains 38 rows, the cost of this operation is negligible. The Assert operator is what validates that the values we are writing to Production.Product exist in Production.UnitMeasure. If this were not the case, then the query would fail here with a foreign key conflict error that would look something like this:
In order to check the foreign key for the value we are assigning, a seek against the UnitMeasure table was required. Since this is a small lookup table that contains 38 rows, the cost of this operation is negligible. The Assert operator is what validates that the values we are writing to Production.Product exist in Production.UnitMeasure . If this were not the case, then the query would fail here with a foreign key conflict error that would look something like this:
Msg 547, Level 16, State 0, Line 357
The UPDATE statement conflicted with the FOREIGN KEY constraint “FK_Product_UnitMeasure_SizeUnitMeasureCode”. The conflict occurred in database “AdventureWorks2014”, table “Production.UnitMeasure”, column ‘UnitMeasureCode’.
Msg 547, Level 16, State 0, Line 357
The UPDATE statement conflicted with the FOREIGN KEY constraint “FK_Product_UnitMeasure_SizeUnitMeasureCode”. The conflict occurred in database “AdventureWorks2014”, table “Production.UnitMeasure”, column 'UnitMeasureCode'.
When adding a foreign key to an important table, consider this write cost when doing so. Make sure your server and storage systems can handle the additional load that will result from the read operations needed to check that foreign key constraint. This is especially important if the parent table contains a very large amount of data. Whereas a seek against a 38 row table was very fast, the same seek against a table with a hundred million rows could be quite expensive!
When adding a foreign key to an important table, consider this write cost when doing so. Make sure your server and storage systems can handle the additional load that will result from the read operations needed to check that foreign key constraint. This is especially important if the parent table contains a very large amount of data. Whereas a seek against a 38 row table was very fast, the same seek against a table with a hundred million rows could be quite expensive!
Some companies with exceptionally large tables will consider removing foreign keys in order to improve performance on write-heavy tables. This is a valid approach, but one in which the need for database integrity must be weighed against performance and then the best decision made based on those criteria. If foreign keys are omitted, what protects against bad data? If invalid values are created, how will this affect the application? This is not a simple decision to make, and is one that should be handled on a case-by-case basis. There are other tools available that can greatly help in offsetting the costs of foreign keys, such as partitioning, indexed views, and in-memory OLTP. In general, any tool that helps alleviate the performance cost of reading very large tables will help in managing the costs of foreign keys. Performance optimization strategies that help speed up foreign key checks are likely beneficial to any environment in which the database administrators are considering dropping foreign keys for performance reasons anyway.
Some companies with exceptionally large tables will consider removing foreign keys in order to improve performance on write-heavy tables. This is a valid approach, but one in which the need for database integrity must be weighed against performance and then the best decision made based on those criteria. If foreign keys are omitted, what protects against bad data? If invalid values are created, how will this affect the application? This is not a simple decision to make, and is one that should be handled on a case-by-case basis. There are other tools available that can greatly help in offsetting the costs of foreign keys, such as partitioning, indexed views, and in-memory OLTP. In general, any tool that helps alleviate the performance cost of reading very large tables will help in managing the costs of foreign keys. Performance optimization strategies that help speed up foreign key checks are likely beneficial to any environment in which the database administrators are considering dropping foreign keys for performance reasons anyway.
Performance issues due to foreign keys are not a problem that everyone will face, so please don’t consider foreign keys a hog, as they certainly are not. They are exceptional tools for enforcing data integrity and the costs are very often worth the benefits they provide. Managing big data is a broader challenge with many effective solutions available to alleviate performance problems. Omitting foreign keys should be a last resort, and one I would not recommend without a serious design discussion to ensure that removing one problem is not creating a larger one in the future.
Performance issues due to foreign keys are not a problem that everyone will face, so please don't consider foreign keys a hog, as they certainly are not. They are exceptional tools for enforcing data integrity and the costs are very often worth the benefits they provide. Managing big data is a broader challenge with many effective solutions available to alleviate performance problems. Omitting foreign keys should be a last resort, and one I would not recommend without a serious design discussion to ensure that removing one problem is not creating a larger one in the future.
结论 (Conclusion)
Constraints provide a variety of methods for enforcing data and relational integrity. They allow us to model our schema after business needs and allow database objects to provide some level of built-in documentation that helps explain what tables relate to which other tables, what columns are required, and what significant rules must be followed at all times.
Constraints provide a variety of methods for enforcing data and relational integrity. They allow us to model our schema after business needs and allow database objects to provide some level of built-in documentation that helps explain what tables relate to which other tables, what columns are required, and what significant rules must be followed at all times.
Constraints have performance implications, both improving performance in some cases and consuming resources in others. Most of the time, the costs are worth it in that we can improve the quality of our data without the need to build complex application logic or database scripts that try to keep data in order. A well thought out database design will often answer constraint questions up-front. This makes it so that we do not need to address technical debt years later when we suddenly realize that one column should always have been populated, or that dinosaurs shouldn’t be assigned to non-existent species. Constraints are valuable tools, and ones that can improve documentation, performance, data integrity, and make sense of what would otherwise be complex data.
Constraints have performance implications, both improving performance in some cases and consuming resources in others. Most of the time, the costs are worth it in that we can improve the quality of our data without the need to build complex application logic or database scripts that try to keep data in order. A well thought out database design will often answer constraint questions up-front. This makes it so that we do not need to address technical debt years later when we suddenly realize that one column should always have been populated, or that dinosaurs shouldn't be assigned to non-existent species. Constraints are valuable tools, and ones that can improve documentation, performance, data integrity, and make sense of what would otherwise be complex data.
References and further reading (References and further reading)
Microsoft has done a good job of documenting constraints, their syntax and usage, and demos of how to create and use them. The internet as a whole is on a bit shakier ground. I’ve read too many articles or Q&A sites where constraints are completely shunned, where performance data is completely wrong, or there are misunderstandings as to why we use them or how they work.
Microsoft has done a good job of documenting constraints, their syntax and usage, and demos of how to create and use them. The internet as a whole is on a bit shakier ground. I've read too many articles or Q&A sites where constraints are completely shunned, where performance data is completely wrong, or there are misunderstandings as to why we use them or how they work.
The following are some useful links from Microsoft that help explain the usage of constraints in SQL Server and the optimal ways to implement them:
The following are some useful links from Microsoft that help explain the usage of constraints in SQL Server and the optimal ways to implement them:
- Basic list of constraints and their usesBasic list of constraints and their uses
- Unique constraints, and links to check constraint detailsUnique constraints, and links to check constraint details
SQL Server catalog views for constraints:
SQL Server catalog views for constraints:
- Check ConstraintsCheck Constraints
- Default ConstraintsDefault Constraints
- Foreign KeysForeign Keys
- Foreign Key ColumnsForeign Key Columns
- All Key ConstraintsAll Key Constraints
翻译自: https://www.sqlshack.com/the-benefits-costs-and-documentation-of-database-constraints/
数据库约束查找的约束