唯一索引和非唯一索引_唯一索引及其依据

最新推荐文章于 2021-09-23 21:07:58 发布

culi3182

最新推荐文章于 2021-09-23 21:07:58 发布

阅读量1.3k

点赞数

文章标签：数据库 mysql java 索引大数据

原文链接：https://www.sitepoint.com/unique-index-rationale/

版权

唯一索引和非唯一索引

This article is inspired by Craig's recent post "How to Use Unique Indexes in MySQL and Other Databases".

本文的灵感来自于Craig的最新文章“ 如何在MySQL和其他数据库中使用唯一索引 ”。

There are a few more tightly related and equally important aspects regarding Primary Key (PK) and Unique Index (UI) which were not covered in that article.

关于主键(PK)和唯一索引(UI)还有一些更紧密相关且同等重要的方面，在本文中未涉及。

So in this article, we will take a further look at these areas.

因此，在本文中，我们将进一步研究这些领域。

自动递增为PK (Auto Increment as PK)

A common practice for database designers is to use an auto-increment as the PK for a particular table.

数据库设计人员的常见做法是将自动增量用作特定表的PK。

As the definition tells us:

定义告诉我们：

The PRIMARY KEY constraint uniquely identifies each record in a database table.

PRIMARY KEY约束唯一地标识数据库表中的每个记录。

To be exact, the PK is there to tell the database that the record is unique.

确切地说，PK是用来告诉数据库记录是唯一的。

Using auto-increment as the PK is an easy but lazy way to accomplish this. A record with id=1 is different from a record with id=2. That's it.

使用自动增量作为PK是完成此任务的一种简单但懒惰的方法。 id=1的记录与id=2的记录不同。而已。

The most critical thing to bear in mind is that in most cases, an auto-increment PK field is NOT part of the record itself. So how can we rely on the auto-increment PK itself, which has nothing related to the record content to identify a record?

要记住的最关键的事情是，在大多数情况下，自动增量PK字段不属于记录本身。那么，我们如何依靠自动增量PK本身(与记录内容无关的内容来识别记录)呢？

Take a typical record to hold a sales item for example. It may include the following fields:

以典型的记录来保存销售项目为例。它可能包括以下字段：

id (auto increment)
id(自动递增)
invoice number
发票号码
client name (or id)
客户名称(或ID)
date of entry
入境时间
item name (or id)
项目名称(或ID)
quantity
数量
unit price
单价
subtotal (a calculated field)
小计(计算字段)

A "natural" structure will lead us to use id as the auto-increment field and thus as the PK. I am not convinced at all that this will suffice.

一个“自然”的结构将导致我们将id用作自动递增字段，从而将其用作PK。我完全不相信这足以满足要求。

Consider the following two entries:

考虑以下两个条目：

1 12345 GoodGuy 13-12-12 item1 2 199 398
2 12346 GoodGuy 13-12-12 item1 2 199 398

Are they unique or just duplicated? I would rather treat them as duplicated entries where the person who keys in the entries accidentally typed the invoice number wrong for at least one of the entries.

它们是唯一的还是只是重复的？我宁愿将它们视为重复的条目，其中键入条目的人不小心为至少一个条目键入了错误的发票编号。

Furthermore, in this example, we must argue that even the invoice number wouldn't be a good candidate to act as a PK or unique index.

此外，在此示例中，我们必须辩称，即使invoice number也不适合用作PK或unique index 。

We will have to use a compound unique index, i.e., a unique index comprised of several fields to assure us of the uniqueness of a record.

我们将必须使用复合唯一索引，即由多个字段组成的唯一索引，以确保记录的唯一性。

复合唯一索引 (Compound unique index)

Let's step back from the database structure first and look at the physical real world entity: a sales item. Based on our experience and discussion with the sales people, how can we distinguish two sales entries?

让我们首先从数据库结构退后一步，看一下真实的现实世界实体：一种销售商品。根据我们的经验和与销售人员的讨论，我们如何区分两个销售条目？

The general logic could be:

一般逻辑可以是：

If purchased items are different, definitely they are different sales items;
如果购买的商品不同，则肯定是不同的销售商品。
If purchased items are the same, but with different client, date, quantity, they are different.
如果购买的物品相同，但客户，日期，数量不同，则它们是不同的。
…
…

Depending on the situation, more approaches could be possible.

根据情况，可能有更多的方法。

Now it is clearer: if two sales entries have the same item, client, date, quantity, it is very likely they are duplicated entries. So in this database structure we will set up a compound unique index based on the fields listed above:

现在更清楚了：如果两个销售条目具有相同的项目，客户，日期，数量，则很可能它们是重复的条目。因此，在此数据库结构中，我们将基于上面列出的字段设置复合唯一索引：

ALTER TABLE `test`.`salesitem`
ADD UNIQUE INDEX `uniqueentry` (`itemid` ASC, `clientid` ASC, `date` ASC, `quantity` ASC);

With this uniqueness constraint applied to our table, we will be able to pick out the above two entries as duplicated.

将此唯一性约束应用到我们的表后，我们将能够挑选出上述两个重复项。

NOTE: Keep in mind that the duplication detected by the uniqueentry index is very likely. In real cases, they may not be duplicated after all (say the GoodGuy just placed a repeated order at the same day with the exactly same items). If that is the case, the database designer must work with the sales person and further discuss what other fields should be introduced in the table structure and the unique index composition to distinguish two entries.

注意：请记住，由uniqueentry索引检测到的重复很有可能 。在实际情况下，它们可能根本不会重复(例如， GoodGuy只是在同一天使用完全相同的商品下了重复订单)。在这种情况下，数据库设计人员必须与销售人员合作，并进一步讨论应在表结构和唯一索引组成中引入哪些其他字段以区分两个条目。

NOTE: To avoid a duplicated entry in invoice numbers, it is also highly recommended to create another unique index based on the invoice number field.

注意：为避免发票编号重复输入，强烈建议根据invoice number字段创建另一个唯一索引。

In combination of the above two unique index constraints, we have avoided such human errors as:

结合以上两个独特的索引约束，我们避免了以下人为错误：

A typo in invoice number that causes a duplicated invoice number;
发票号输入错误，导致发票号重复；
A typo in invoice number that makes a duplicated entry in the table with the same item, client, date and quantity;
发票编号中的错字，该错字在表中使用相同的项目，客户，日期和数量重复输入；

We are not able to eliminate all human errors by our design. It is in the realm of business process design and relying on supporting hardware (for example, a scanner to scan the invoice number to avoid manual input) to enhance the accuracy.

我们无法通过设计消除所有人为错误。它是在业务流程设计领域中并依靠支持硬件(例如，扫描仪扫描发票编号以避免人工输入)来提高准确性。

Rules of thumb:

经验法则：

In any real world application, never ever rely on the auto-increment PK to determine the uniqueness of a record;
在任何实际应用中，永远不要依赖自动增量PK来确定记录的唯一性。
Don't rely on a single man-made field (in this case, the invoice number) to check the uniqueness.
不要依赖单个人工字段(在这种情况下为发票编号)来检查唯一性。
The fields in a compound unique index should be those physically existing and generated by business activities. In our example above, date, item (which is indirectly referred to as a foreign key constraint to our product table), quantity and client (which is indirectly referred to as a foreign key constraint to our client table) all have these characteristics.
复合唯一索引中的字段应该是业务活动实际存在和生成的字段。在上面的示例中，日期，项目(间接称为product表的外键约束)，数量和客户(间接称为client表的外键约束)都具有这些特征。

Further note on 3 above, that is why I don't like to use UUID as PK or UI. Besides the complexity to calculate a UUID, the UUID itself has no real-world corresponding attribute for a record.

关于以上3的进一步说明，这就是为什么我不喜欢将UUID用作PK或UI的原因。除了计算UUID的复杂性外，UUID本身没有记录的真实世界对应属性。

With these principles in mind, our table structure design will be more robust.

牢记这些原则，我们的表结构设计将更加可靠。

存在自动递增PK的理由 (Justification of the existence of an auto-increment PK)

After saying all this and setting up the two unique indexes, we may still use an auto-increment field as PK. Why?

说完所有这些并设置了两个唯一索引之后，我们仍然可以使用自动增量字段作为PK。为什么？

The answer is simple: to facilitate the data location and make it simpler and faster.

答案很简单：方便数据定位并使其更简单，更快。

In our example, to locate a single record, we can either use select * from sales_entry where id=1 or use select * from sales_entry where invoice='...' or use select * from sales_entry where itemid=... and clientid=... and date='...' and quantity=.... Obviously, locating through an integer will be the fastest to execute and the easiest to write.

在我们的示例中，要查找单个记录，我们可以使用select * from sales_entry where id=1或使用select * from sales_entry where invoice='...'或使用select * from sales_entry where itemid=... and clientid=... and date='...' and quantity=... 显然，通过整数定位将是执行最快和最容易编写的。

Also, an auto-increment PK will help other tables in the database to have a simpler Foreign Key setup to enforce data integrity. It is still recommended to set up an auto-increment field as the PK but it is strongly NOT recommended to rely on the PK to determine the uniqueness of a record.

同样，自动增量PK将帮助数据库中的其他表具有更简单的外键设置来强制数据完整性。它仍然是建议设立一个自动递增字段的PK，但强烈建议不要依靠PK来确定A记录的唯一性。

复合(唯一)索引中的字段顺序 (The order of fields in a compound (unique) index)

The design of a database and its tables is an art. Especially so when we are designing the indexes of each table.

数据库及其表的设计是一门艺术。特别是在设计每个表的索引时。

After the above setup, there are already 3 indexes in our table:

完成上述设置后，我们的表中已经有3个索引：

One PK using the auto-increment id field.
使用自动递增id字段的一个PK。
One UI using the invoice field.
一个使用invoice字段的用户界面。
One UI using the combination of several fields.
一个UI，使用多个字段的组合。

Let's look again at the SQL that creates the 3rd index:

让我们再次看一下创建第三个索引SQL：

ADD UNIQUE INDEX `uniqueentry` (`itemid` ASC, `clientid` ASC, `date` ASC, `quantity` ASC);

and consider the following SQL statement and its corresponding EXPLAIN output excerpt:

并考虑以下SQL语句及其对应的EXPLAIN输出摘录：

select * from salesitem where clientid=1

The table does not have an index on clientid yet, so the above SQL will be unable to use any indexes (possible_keys = null) to speed up the query and has to do a full table scan to fetch the record (Extra = Using Where).

该表尚未在clientid上建立索引，因此上述SQL将无法使用任何索引( possible_keys = null )来加速查询，并且必须进行全表扫描以获取记录( Extra = Using Where )。。

select * from salesitem where itemid=1 and clientid=1 and date='13-12-14'

This SQL's where statement contains several filters and they ARE in the strict order that we created our uniqueentry index. It takes the full advantage of our UI and we can expect the best speed out of it.

这个SQL是where语句包含几个过滤器，他们在严格的顺序，我们创造了我们现在的 uniqueentry指数。它充分利用了我们的UI，我们可以期待最快的速度。

select * from salesitem where itemid=1 and clientid=1 and quantity=1

This SQL's where statement contains several filters and they ARE NOT in the strict order that we created our uniqueentry index. MySQL still tries hard to use the UI and we can expect a moderate speed.

该SQL的where语句包含多个过滤器，它们的排列顺序并不严格，这与我们创建uniqueentry索引的顺序uniqueentry 。 MySQL仍在努力使用UI，我们可以预期会有中等速度。

select * from salesitem where itemid=1 and date='13-12-14' and quantity=1 and clientid=1

This is bad. The 4 fields in the UI are in the SQL but in a totally distorted order. It will also invoke a full table scan to fetch the record and most unlucky, there is no record found after we seek through the full table!

这不好。 UI中的4个字段位于SQL中，但顺序完全失真。它还将调用全表扫描以获取记录，最不幸的是，在我们搜索全表后找不到任何记录！

EXPLAIN is a very useful command in MySQL and MariaDB. To use it, simply issue EXPLAIN *your sql statment to run* in MySQL terminal or in your MySQL GUI applications like PhpMyAdmin.

EXPLAIN是MySQL和MariaDB中非常有用的命令。要使用它，只需在MySQL终端或您MySQL GUI应用程序(例如PhpMyAdmin)中发布EXPLAIN *your sql statment to run* 。

When designing a compound index, be cautious about the order of the fields in the index.

设计复合索引时，请注意索引中字段的顺序。

Our uniqueentry index is created with 4 fields in the strict order: item, client, date, quantity.

我们的uniqueentry索引是按照严格的顺序由4个字段创建的： 项目，客户，日期，数量 。

In our SQL statement, if our where clause matches the full or partial (like the 2nd and 3rd SQL) order of a compound index fields order, MySQL will be smart enough to match the longest field chain in the compound index to speed up the search. Otherwise, MySQL won't be able to find a matching index and will have to do a full table scan, which is always discouraged in any database development.

在我们SQL语句中，如果where子句匹配复合索引字段顺序的全部或部分(例如第二和第三SQL)顺序，则MySQL将足够聪明以匹配复合索引中最长的字段链以加快搜索速度。否则，MySQL将无法找到匹配的索引，并且必须进行全表扫描，这在任何数据库开发中均不建议这样做。

Our uniqueentry UI's fields order may not be useful if our program will most often query on a client's purchase history (like in our 1st SQL statement). If that is the case, we need either change the order of fields in uniqueentry UI, or create a new (compound) index with clientid as the only (or the first) field in that index.

如果我们的程序最经常查询客户的购买历史记录(例如在我们的第一个SQL语句中)，则我们的uniqueentry UI的字段顺序可能没有用。在这种情况下，我们需要更改uniqueentry UI中字段的顺序，或者创建一个新的(复合)索引，其中clientid是该索引中唯一(或第一个)字段。

Too many indexes will slow down the database performance as the database engine, upon every create, update and delete operation, will have to do some extra work to update the indexes. In a multi-user, high-frequency-write environment, this can cause significant performance issues.

索引太多会降低数据库性能，因为数据库引擎在每次创建，更新和删除操作时都必须做一些额外的工作来更新索引。在多用户，高频写入环境中，这可能会导致严重的性能问题。

Thus, there must be a balance: less table/record lock time, faster query/aggregation time, reasonable number of indexes for a particular table. There are no set rules on how to achieve this – everything can be accomplished with experimentation and knowing the purpose and usage of the app in great detail.

因此，必须有一个平衡：更少的表/记录锁定时间，更快的查询/聚合时间，特定表的合理索引数。关于如何实现此目标，没有设定的规则-一切都可以通过实验并详细了解应用程序的用途和用途来完成。

结论 (Conclusion)

In this article, we further elaborated on some basic and key concepts in designing a table and its indexes. In particular, we focused a lot on how to determine the uniqueness of a record by formulating a meaningful compound unique index. We also explained a bit on how a well designed compound index can be used by MySQL to speed up the query.

在本文中，我们进一步阐述了设计表及其索引的一些基本概念和关键概念。尤其是，我们非常关注如何通过制定有意义的复合唯一索引来确定记录的唯一性。我们还解释了MySQL如何使用精心设计的复合索引来加快查询速度。

In future articles, we will look at another important aspect of the database: Stored Procedures and their usage in PHP.

在以后的文章中，我们将研究数据库的另一个重要方面：存储过程及其在PHP中的用法。

Stay tuned!

敬请关注！

翻译自: https://www.sitepoint.com/unique-index-rationale/

唯一索引和非唯一索引

culi3182

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
唯一索引和非唯一索引_唯一索引及其依据

唯一索引和非唯一索引This article is inspired by Craig's recent post "How to Use Unique Indexes in MySQL and Other Databases". 本文的灵感来自于Craig的最新文章“ 如何在MySQL和其他数据库中使用唯一索引 ”。 There are a few more tightly related...
复制链接

扫一扫