MySQL中bit类型的应用(1)

MySQL中的bit类型, 支持1-64 个bit长度. 我们可以用bit(1)来保存一个"是否是xxx"的数据, 虽然我们通常使用tinyint这个类型.

但是如果我们有多个"是否xxx"这种栏位, 一般来说就需要设置多个栏位, 而且需要设置多个索引.

对于这种只有1和0的栏位, 因为重复率很高, 索引是否会被使用, 也要看查询引擎的分析以及取舍.

我们现在就以3个这种栏位的简单需求来比较一下使用bit和tinyint的查询差异.

我们把这3种类型定义为t1, t2, t3. 一般我们就建这3个栏位. 某一个类型为true就设置为1, 否则设置0.

mysql会把bit类型当做整数来识别. 我们可以使用位操作, 也可以直接与整数比较大小.

如果用一个bit(3)来表示, 就是 b'xxx', 每一个类型占一个bit. 对应的整数值为: t1 * 1+ t2 * 2 + t3 *4

首先来定义一个表:

USE test;
CREATE TABLE `test`.`test_bit`(
    `id` int unsigned NOT NULL AUTO_INCREMENT,
    `types` bit(3) NOT NULL COMMENT '类型(000): 类型1使用第一位, 类型2使用第二位, 类型3使用第三位(从右往左)',
    `t1` tinyint NOT NULL COMMENT '是否是类型1. 是=1, 否=0',
    `t2` tinyint NOT NULL COMMENT  '是否是类型2. 是=1, 否=0',
    `t3` tinyint NOT NULL COMMENT  '是否是类型3. 是=1, 否=0',
    PRIMARY KEY(`id`)
) ENGINE=InnoDB COMMENT 'bit操作测试';

现在我们建立一个存储过程, 来批量插入数据:

USE test;
DROP PROCEDURE IF EXISTS `mysp_test_bit`;
DELIMITER $$
CREATE PROCEDURE `mysp_test_bit`(IN count int)
BEGIN
    DECLARE i int default 0;
    -- DECLARE rnd int;
    SET i = 0;
    WHILE i < count do
        INSERT INTO `test`.`test_bit`(`types`,`t1`,`t2`,`t3`)
        SELECT a.t , a.t1, a.t2, a.t3 
        FROM (
            SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- 重复1
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- 重复2
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- 重复3
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- 重复4
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- 重复5
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- 重复6
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- 重复7
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
        ) AS a  ORDER BY rand() LIMIT 0,50; -- 从56个中随机取50个
        SET i = i +1;
    END WHILE;
END$$
DELIMITER ;

在这个存储过程中, 我们把1-7的值重复8遍为56个数, 然后随机取50个(这里暂不考虑t1,t2,t3全部为0的情况).

执行2w次, 生成100w行数据:

CALL mysp_test_bit(20000);

好了, 100w的数据插入成功, 现在来添加索引:

ALTER TABLE `test`.`test_bit`
ADD INDEX IX_types(`types`)
, ADD INDEX IX_t1(`t1`)
, ADD INDEX IX_t2(`t2`)
, ADD INDEX IX_t3(`t3`);

看下数据分布情况:

1. 首先查询单个条件符合的情况

1.1 先看 t1=1的

对应的bit方式为 b'xx1', 包括001, 011,101,111, 对应的数值为:1,3,5,7

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'001' =1;-- 查询第1位为1的: 
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(1,3,5,7);-- 查询第1位为1的,时间与上面的差不多
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t1=1;-- 查询第1位为1的

执行4次的时间为:

上图中, 为了容易区分, 把常用的第三种查询方式的执行时间, 用红框圈出来了.

1.2 查询t2=1的情况

对应的bit方式为b'x1x',包括010,011,110,111对应的数值为: 2,3,6,7

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'010' =2;-- 查询第2位为1的: 
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types`IN(2,3,6,7);-- 查询第2位为1的: 
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t2=1;-- 查询第2位为1的

执行4次的时间为:

1.3 查询t3=1的情况

对应的bit方式为b'1xx',包括 100, 101, 110, 111 对应的数值为: 4,5,6,7

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'100' =4;-- 查询第3位为1的: 
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(4,5,6,7);-- 查询第3位为1的: 
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t3=1;-- 查询第3位为1的

执行4次的时间为:

总体上来看, 在查询一个条件时, 单个栏位的方式效率比较好, 因为bit方式使用了数据处理(这个一般是不推荐的)或IN查询

2. 现在我们来查询2个栏位同时符合要求的数据

2.1 查询第t1,t2都为1的

对应的bit为 b'x11',包括: 011,111,对应的数值为 3,7

数据总数: 285767

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'011' =3;-- 查询第1,2位都为1的
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(3,7);-- 查询第1,2位都为1的
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t1=1 AND t2=1;-- 查询第1,2位都为1的

执行4次的时间为:

2.2 查询第t2,t3都为1的

对应的bit是b'11x',包括110,111,对应的数值为: 6,7

数据总数: 285683

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'110' =6;-- 查询第2,3位都为1的
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(6,7);-- 查询第2,3位都为1的
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t2=1 AND t3=1;-- 查询第2,3位都为1的

执行4次的时间为:

2.3 查询第t1,t3都为1的

对应的bit是b'1x1',包括101,111,对应的数值为: 5,7

数据总数: 285641

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'101' =5;-- 查询第1,3位都为1的
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(5,7);-- 查询第1,3位都为1的
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t1=1 AND t3=1;-- 查询第1,3位都为1的

执行4次的时间为:

3. 查询t1, t2, t3都为1的

数据总数: 142836

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'111' =7;-- 查询第1,2,3位都为1的
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` = 7;-- 查询第1,2,3位都为1的
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t1=1 AND t2=1 AND t3=1;-- 查询第1,2,3位都为1的

执行4次的时间为:

4. 建立t1, t2, t3的联合索引

4.1 使用两两联合索引, 进行单个、两个、三个栏位的查询

ALTER TABLE `test`.`test_bit` DROP INDEX IX_t1, DROP INDEX IX_t2, DROP INDEX IX_t3;
ALTER TABLE `test`.`test_bit`  ADD INDEX IX_t1t2(t1,t2), ADD INDEX IX_t1t3(t1,t3), ADD INDEX IX_t2t3(t2,t3);

单个栏位符合条件的查询:

单个栏位符合条件的只查询一次

两个栏位同时符合条件的查询:

两个栏位同时符合条件的只查询一次

三个栏位同时符合条件的查询:

三个栏位同时符合条件的我们查询了3次

4.2 使用3个栏位的联合索引

ALTER TABLE `test`.`test_bit` DROP INDEX IX_t1t2, DROP INDEX IX_t1t3, DROP INDEX IX_t2t3;
ALTER TABLE `test`.`test_bit`  ADD INDEX IX_t1t2t3(t1,t2,t3);

单个栏位符合条件的查询:

单个栏位符合条件的只查询一次

两个栏位同时符合条件的查询:

两个栏位同时符合条件的只查询一次

三个栏位同时符合条件的查询:

三个栏位同时符合条件的我们查询了3次

大家还可以看下3种查询方式的explain, 以3个联合索引为例

  • types & b'011' =3 这种操作, 是走 Full Index Scan
  • types IN(3,7) 这种操作, 是走Index Range Scan
  • t1=1 AND t2=1 这种操作, 是走Non_Unique Key Lookup

5. Final, 都不要索引, 看一下

ALTER TABLE `test`.`test_bit`  DROP INDEX IX_t1t2t3, DROP INDEX IX_types;

对于这种有3个左右栏位需要进行1和0设置的, 使用一个栏位做索引 + IN查询的方式, 效果是比多个索引要好, 只是查询条件的处理会稍微复杂一些.

对于仅仅依靠MySQL自身的优化来说, 还是能起到一些效果的. 毕竟索引重复率太高以及索引列太多, 都不太友好.

  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值