单表扫描,使用ref和range从索引获取数据一例

单表扫描,使用ref和range从索引获取数据一例
---执行计划的得到的结果,和实际执行结果正好相反

 

缘起: 与朋友交流,过程如下:

----------------------------------------

你好,在吗,问个优化器索引选择的问题
--请讲.
我最近遇到线上一个select语句,explain选择的索引是一样的,这个索引是两个字段
比如select * from t1 where a='xxx' and b>='123123',索引是a_b(a,b)
默认情况explain显示的索引访问方式是ref,而force index a_b则使用了range,range访问效果实际更好
--贴查询执行计划全部内容
| 1 | SIMPLE | subscribe_f8 | ref | PRIMARY,uid | uid | 8 | const | 13494670 | Using where; Using index
force index 之后
| 1 | SIMPLE | subscribe_f8 | range | uid | uid | 12 | NULL | 13494674 | Using where; Using index |
--2者计划差别不大
就是type从ref变成range了. force 之前key_length是8,force之后是12 . 其实应该是12才是合理的
--版本支持expalin format=JSON命令吗?支持则试试,有更详细的代价计算值
--show create table 看看?

发来详细的执行计划,见 执行计划结果一 。

 

执行计划结果一

select uid_from,create_time from subscribe_f8 where  uid=12345678 and  create_time > '2013-09-08 09:54:07.0'   order by create_time asc limit 5000 | {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `subscribe_f8`.`uid_from` AS `uid_from`,`subscribe_f8`.`create_time` AS `create_time` from `subscribe_f8` where ((`subscribe_f8`.`uid` = 12345678) and (`subscribe_f8`.`create_time` > '2013-09-08 09:54:07.0')) order by `subscribe_f8`.`create_time` limit 5000"
          }
        ]
      }
    },
    {
......
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`subscribe_f8`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "ref",
                      "index": "PRIMARY",
                      "rows": 1.36e7,
                      "cost": 3.01e6,
                      "chosen": true
                    },
                    {
                      "access_type": "ref",
                      "index": "uid",
                      "rows": 1.36e7,
                      "cost": 2.77e6,
                      "chosen": true

                    },
                    {
                      "access_type": "range",
                      "rows": 1.02e7,
                      "cost": 5.46e6,
                      "chosen": false

                    }
                  ]
                },
                "cost_for_plan": 2.77e6,
                "rows_for_plan": 1.36e7,
                "chosen": true
              }
            ]
          },
...
}

分析:
这个问题,执行计划指示使用ref效果更好,但实际执行时,指定使用range方式sql执行效率更高一些。
而且,通常情况下,ref的效率比range的效率要高,所以MySQL优先使用ref方式(这是一条启发式规则)。
但究竟是否使用ref或range,MySQL还需要通过代价估算进行比较再做决定。
代价估算是一个求近似值的过程,因为计算基于的一些值是估算得来的,并不十分精准,这就造成了计算误差。
但是,如果索引的选择率较低(如低于10%),则使用ref的效果好于range的效果的概率大。反过来说,如果索引的选择率较高,则ref未必range的效果好,但是因计算误差,使得执行计划得到了ref好于range的错误结论。
进一步讲,如果索引的选择率很高(如远高于10%,这是大概值,不精确),甚至数据存放是顺序连续的,有可能的是,尽管索引存在,但索引扫描的效果还差与全表扫描。
其他说明:尽管这个事例中的SQL使用了LIMIT子句,但其对ref和range方式的计算和比较,不构成影响。
进一步了解情况:
--这个查询,能得到多少行元组?  占全表的所有元组的百分比是多少?
去掉limit后,符合那个时间段的记录数占那个uid的88%,占全表记录数的的40%

进一步分析:
从更详细的查询执行计划看,查询执行计划结果一,显示了ref的cost是'2.77e6', 而range的cost是’5.46e6‘,这说明优化器理所当然地认为ref比range好。
可是,鉴于实际上索引选择率太高,使得使用索引已经没有意义(但优化器不知道这一信息),所以实际上使用’force index (uid) ‘会得到更好的执行效果。
这就是这个想象的答案。
深入代码分析:
best_access_path()函数中,比较了各种路径的代价。所以是使用ref还是range甚至full table scan,在这个函数中有计算和比较。
摘录代码中部分注释如下,能表明一些含义。
 /*
    Don't test table scan if it can't be better.
    Prefer key lookup if we would use the same key for scanning.
    Don't do a table scan on InnoDB tables, if we can read the used
    parts of the row from any of the used index.
    This is because table scans uses index and we would not win
    anything by using a table scan. The only exception is INDEX_MERGE
    quick select. We can not say for sure that INDEX_MERGE quick select
    is always faster than ref access. So it's necessary to check if
    ref access is more expensive.
    We do not consider index/table scan or range access if:
    1a) The best 'ref' access produces fewer records than a table scan
        (or index scan, or range acces), and
    1b) The best 'ref' executed for all partial row combinations, is
        cheaper than a single scan. The rationale for comparing
        COST(ref_per_partial_row) * E(#partial_rows)
           vs
        COST(single_scan)
        is that if join buffering is used for the scan, then scan will
        not be performed E(#partial_rows) times, but
        E(#partial_rows)/E(#partial_rows_fit_in_buffer). At this point
        in best_access_path() we don't know this ratio, but it is
        somewhere between 1 and E(#partial_rows). To avoid
        overestimating the total cost of scanning, the heuristic used
        here has to assume that the ratio is 1. A more fine-grained
        cost comparison will be done later in this function.
    (2) This doesn't hold: the best way to perform table scan is to to perform
        'range' access using index IDX, and the best way to perform 'ref'
        access is to use the same index IDX, with the same or more key parts.
        (note: it is not clear how this rule is/should be extended to
        index_merge quick selects)
    (3) See above note about InnoDB.
    (4) NOT ("FORCE INDEX(...)" is used for table and there is 'ref' access
             path, but there is no quick select)
        If the condition in the above brackets holds, then the only possible
        "table scan" access method is ALL/index (there is no quick select).
        Since we have a 'ref' access path, and FORCE INDEX instructs us to
        choose it over ALL/index, there is no need to consider a full table
        scan.
  */
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值