索引长度与区分度
以material_data的表为例,这个表中有多大45万的数据;
对matertial_name的字段建立索引;
以matertial_name字段建模普通索引,
mysql> alter table material_data add index idx_material_name(material_name);
Query OK, 0 rows affected
Records: 0 Duplicates: 0 Warnings: 0
执行
select * from material_data where material_name='光缆交接箱改造';
mysql> explain select * from material_data where material_name='光缆交接箱改造';
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | material_data | NULL | ref | idx_material_name | idx_material_name | 363 | const | 6 | 100 | NULL |
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------+
1 row in set
可以发现 索引的长度达到363个长度,这会影响索引的使用效率,那么多个长度合适呢?
统计各个长度索引的区分度:
mysql> SELECT count(distinct LEFT(t.material_name ,110))/count(1) from material_data t;
索引长度 | 区分度 |
1 | 0.0047 |
2 | 0.034 |
3 | 0.1082 |
4 | 0.1635 |
5 | 0.2166 |
6 | 0.2607 |
7 | 0.299 |
8 | 0.3335 |
9 | 0.3646 |
10 | 0.3936 |
11 | 0.419 |
12 | 0.4411 |
13 | 0.462 |
14 | 0.4812 |
15 | 0.4991 |
16 | 0.5158 |
17 | 0.5303 |
18 | 0.5426 |
19 | 0.5536 |
20 | 0.5651 |
25 | 0.6057 |
30 | 0.6303 |
40 | 0.6426 |
50 | 0.6426 |
60 | 0.6426 |
70 | 0.6426 |
80 | 0.6426 |
90 | 0.6426 |
100 | 0.6426 |
110 | 0.6426 |
120 | 0.6426 |
可以发现大概在15个长度左右比较高效;
mysql> alter table material_data add index idx_material_name(material_name(15));
mysql> explain select * from material_data where material_name='光缆交接箱改造';
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | material_data | NULL | ref | idx_material_name | idx_material_name | 48 | const | 6 | 100 | Using where |
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------------+
1 row in set