Cardinality对表Join的影响

最新推荐文章于 2024-09-08 15:12:17 发布

oliverpeng

最新推荐文章于 2024-09-08 15:12:17 发布

阅读量230

点赞数

分类专栏： MySQL 文章标签： MySQL 算法

MySQL 专栏收录该内容

5 篇文章 0 订阅

订阅专栏

如果你执行SHOW INDEX FROM TABLE_NAME察看所引信息，中间会有一列Cardinality，MySQL在解析多重Join的时候会根据Cardinality的信息决定选择什么路径来执行Join，这种算法的前提条件是索引数据时很平均分配的，但如果索引中的数据非常不均衡，会导致MySQL做出错误的选择。下面是一个完整的例子：

首先创建三个测试表：

CREATE TABLE customer (
-- Unique ID
customer_id integer NOT NULL,

CONSTRAINT PK_customer PRIMARY KEY(customer_id)
);

CREATE TABLE my_order (
-- Unique ID
order_id integer NOT NULL,

product_id integer NOT NULL,

customer_id integer NOT NULL,

CONSTRAINT PK_order PRIMARY KEY(product_id, customer_id ),
CONSTRAINT FK_order2customer_id FOREIGN KEY (customer_id) REFERENCES customer(customer_id),
CONSTRAINT UQ_order_id UNIQUE(order_id)

);

CREATE TABLE delivery (

order_id int NOT NULL,

time datetime NOT NULL,

CONSTRAINT PK_order PRIMARY KEY(order_id, time),
CONSTRAINT FK_order FOREIGN KEY (order_id) REFERENCES my_order(order_id)
);

然后加入数据：

set @N=0;

insert into customer select @N:=@N+1 from mysql.help_topic LIMIT 1000;

set @N=0;

insert into my_order select @N:=@N+1, @N, 1 from mysql.help_topic a, mysql.help_topic b LIMIT 100000;

set @I=1;

insert into my_order select @N:=@N+1, @N, @I:=@I+1 from mysql.help_topic a, mysql.help_topic b LIMIT 600;

set @N=0;

insert into delivery select @N:=@N+1, '2010-05-10 15:22:02' from mysql.help_topic a, mysql.help_topic b LIMIT 600;

注意在my_order表中大多数记录的customer_id的值是1。

执行ANALYZE TABLE来更新Cardinality：

ANALYZE TABLE my_order;

ANALYZE TABLE delivery;

下面我们来执行一个Query：

mysql> SELECT count(*) FROM my_order a JOIN my_order b ON a.customer_id = b.customer_id and a.product_id = 123 JOIN delivery ON b.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02';
+----------+
| count(*) |
+----------+
| 600 |
+----------+
1 row in set (0.51 sec)

然后用Explain分析一下：

mysql> explain SELECT count(*) FROM my_order a JOIN my_order b ON a.customer_id = b.customer_id and a.product_id = 123 JOIN delivery ON b.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02';
+----+-------------+----------+--------+----------------------------------+----------------------+---------+------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+----------------------------------+----------------------+---------+------------------------------+------+-------------+
| 1 | SIMPLE | a | ref | PRIMARY,FK_order2customer_id | PRIMARY | 4 | const | 1 | Using index |
| 1 | SIMPLE | b | ref | UQ_order_id,FK_order2customer_id | FK_order2customer_id | 4 | oliver_test.a.customer_id | 167 | |
| 1 | SIMPLE | delivery | eq_ref | PRIMARY | PRIMARY | 12 | oliver_test.b.order_id,const | 1 | Using index |
+----+-------------+----------+--------+----------------------------------+----------------------+---------+------------------------------+------+-------------+
3 rows in set (0.00 sec)

从explain的结果来看，MySQL选择先把my_order自己做Join然后再去Join表delivery。第一个Join需要访问167行数据。但实际情况是如何呢？

mysql> SELECT count(*) FROM my_order a JOIN my_order b ON a.customer_id = b.customer_id and a.product_id = 123 ;
+----------+
| count(*) |
+----------+
| 100000 |
+----------+
1 row in set (0.06 sec)

如果我们只执行第一个Join，实际将访问100000条记录，远远高于167，相反如果我们先做第二个Join，我们只需要访问600行：

mysql> select count(*) from delivery JOIN my_order ON my_order.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02';
+----------+
| count(*) |
+----------+
| 600 |
+----------+
1 row in set (0.00 sec)

为什么MySQL会选择错误的顺序来执行呢？首先察看一下my_order表的索引：

mysql> show index from my_order;
+----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| my_order | 0 | PRIMARY | 1 | product_id | A | 100600 | NULL | NULL | | BTREE | |
| my_order | 0 | PRIMARY | 2 | customer_id | A | 100600 | NULL | NULL | | BTREE | |
| my_order | 0 | UQ_order_id | 1 | order_id | A | 100600 | NULL | NULL | | BTREE | |
| my_order | 1 | FK_order2customer_id | 1 | customer_id | A | 602 | NULL | NULL | | BTREE | |
+----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
4 rows in set (0.00 sec)

我们可以看到对于索引FK_order2customer_id，Cardinality的值是602，MySQL在估算第一个Join的时候，假设索引是平均分布的，用总行数（100600）除以Cardinality，所以得到167，由于167小于第二个Join需要访问的行数600，所以选择先执行第一个Join。

如何解决这个问题？

修改Query中Join的顺序并用STRAIGHT_JOIN强制MySQL按照这个顺序执行，下面是新的Query:

mysql> SELECT count(*) FROM delivery STRAIGHT_JOIN my_order a ON a.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02' JOIN my_order b ON a.customer_id = b.customer_id and b.product_id = 123 ;
+----------+
| count(*) |
+----------+
| 600 |
+----------+
1 row in set (0.00 sec)

你可以看到查询时间从0.51秒变成0秒。

用Explain察看新的执行顺序：

mysql> explain SELECT count(*) FROM delivery STRAIGHT_JOIN my_order a ON a.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02' JOIN my_order b ON a.customer_id = b.customer_id and b.product_id = 123 ;
+----+-------------+----------+--------+----------------------------------+-------------+---------+-------------------------------+------+---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+----------------------------------+-------------+---------+-------------------------------+------+---------------------------------------------+
| 1 | SIMPLE | b | ref | PRIMARY,FK_order2customer_id | PRIMARY | 4 | const | 1 | Using index |
| 1 | SIMPLE | delivery | index | PRIMARY | PRIMARY | 12 | NULL | 600 | Using where; Using index; Using join buffer |
| 1 | SIMPLE | a | eq_ref | UQ_order_id,FK_order2customer_id | UQ_order_id | 4 | oliver_test.delivery.order_id | 1 | Using where |
+----+-------------+----------+--------+----------------------------------+-------------+---------+-------------------------------+------+---------------------------------------------+
3 rows in set (0.00 sec)

另外很重要的一点是这个问题在开始阶段通常不会注意，因为数据量少不会影响性能，但随着数据不断增加，一旦达到一定数量，就会突然出现并影响系统的性能。

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Cardinality对表Join的影响

如果你执行SHOW INDEX FROM TABLE_NAME察看所引信息，中间会有一列Cardinality，MySQL在解析多重Join的时候会根据Cardinality的信息决定选择什么路径来执行Join，这种算法的前提条件是索引数据时很平均分配的，但如果索引中的数据非常不均衡，会导致MySQL做出错误的选择。下面是一个完整的例子：首先创建三个测试表：CREATE TABL...
复制链接

扫一扫

专栏目录

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。