MySQL字符列前缀索引长度的选择

概述

字符串列具有不定长得特点,可能长度为0也可能长度为1024,当我们频繁查询这个列的时候,我们可以为这个字符串列建立一个索引,但是这个索引不应该是不定长的, 我们应该来选择一个合适的前缀长度来建立索引.下面介绍找到最佳长度的过程.

找到最佳前缀索引长度

创建表

create table city (
	id int(11) not null auto_increment,
	cname varchar(255),
	primary key(id)
);

# 插入数据
insert into city(cname) values('New York');
insert into city(cname) values('Beijing');
insert into city(cname) values('Guangzhou');
insert into city(cname) values('Shanghai');
insert into city(cname) values('Shenzhen');
insert into city(cname) values('Chongqing');
insert into city(cname) values('London');
insert into city(cname) values('Tokyo');
insert into city(cname) values('Soul');
insert into city(cname) values('Garden Grove');
insert into city(cname) values('Escobar');
insert into city(cname) values('Amroha');
insert into city(cname) values('Tegal');
insert into city(cname) values('Lancaster');
insert into city(cname) values('Jelets');
insert into city(cname) values('Ambattur');
insert into city(cname) values('Yingkou');
insert into city(cname) values('Monclova');
insert into city(cname) values('Dazhou');
insert into city(cname) values('Guangan');

# 复制数据(执行多次)
insert into city(cname) select cname from city;

# 随机化数据(每次执行随机删除1000条)
DELETE FROM city ORDER BY RAND() LIMIT 1000;

数据计算

  • 查看最常出现的10个城市.
select count(*) as cnt, cname from city group by cname order by cnt desc limit 10;

返回结果

+-----+-----------+
| cnt | cname     |
+-----+-----------+
| 654 | Guangan   |
| 652 | Lancaster |
| 645 | Shanghai  |
| 633 | Amroha    |
| 607 | Chongqing |
| 578 | Beijing   |
| 574 | Soul      |
| 574 | Ambattur  |
| 552 | Hangzhou  |
| 549 | New York  |
+-----+-----------+
  • 通过计算完整列的选择性, 然后使得前缀的选择性接近于完整列的选择性.
SELECT count(distinct cname)/count(*) FROM city;

返回结果

+--------------------------------+
| COUNT(distinct cname)/count(*) |
+--------------------------------+
|                         0.0018 |
+--------------------------------+
  • 查询多个列的选择性进行比较
# 如果数据比较平均, 则这个结果也会比较平均.
SELECT count(distinct LEFT(cname, 3))/count(*) as factor3,
			 count(distinct LEFT(cname, 4))/count(*) as factor4,
			 count(distinct LEFT(cname, 5))/count(*) as factor5,
			 count(distinct LEFT(cname, 6))/count(*) as factor6,
			 count(distinct LEFT(cname, 7))/count(*) as factor7,
			 count(distinct LEFT(cname, 8))/count(*) as factor8,
			 count(distinct LEFT(cname, 9))/count(*) as factor9
FROM city;

返回结果

+---------+---------+---------+---------+---------+---------+---------+
| factor3 | factor4 | factor5 | factor6 | factor7 | factor8 | factor9 |
+---------+---------+---------+---------+---------+---------+---------+
|  0.0017 |  0.0017 |  0.0017 |  0.0018 |  0.0018 |  0.0018 |  0.0018 |
+---------+---------+---------+---------+---------+---------+---------+
  • 查看长度为4的前缀的最多的前面10个.
SELECT COUNT(1) cnt, LEFT(cname,4) AS pref FROM city GROUP BY pref ORDER BY cnt DESC LIMIT 10; 

返回结果

+------+------+
| cnt  | pref |
+------+------+
| 1188 | Guan |
|  652 | Lanc |
|  645 | Shan |
|  633 | Amro |
|  607 | Chon |
|  578 | Beij |
|  574 | Soul |
|  574 | Amba |
|  552 | Hang |
|  549 | New  |
+------+------+

结果选择: 在我的表中, 当我选择6时候分布最为均匀,是一个好的结果.

在cname列上创建索引

ALTER TABLE city add key (cname(6));

总结

前缀索引能够使得查找更加快, 索引大小更小的有效办法. 但是缺点是无法使用前缀索引做group by或者order by, 也无法使用前缀索引做覆盖扫描.

  • 2
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值