Bitmap Index Scan
数据库里面的表的扫描方式主要是以下几种方式:sequential scans, index scans, and bitmap index scans,当然还有index only scan,这种算是index scans中比较特殊的一种,需要的信息在索引中都能找到,扫描索引即可,不需要去扫描表。
这篇文章主要谈谈Bitmap Index Scan的原理及适用的场景。
定义
A plain indexscan fetches one tuple-pointer at a time from the index,and immediately visits that tuple in the table. A bitmap scan fetches all the tuple-pointers from the index in one go, sorts them using an in-memory "bitmap" data structure, and then visits the table tuples in physical tuple-location order. The bitmap scan improves locality of reference to the table at the cost of more bookkeeping overhead to manage the "bitmap" data structure --- and at the cost that the data is no longer retrieved in index order, which doesn't matter for your query but would matter if you said ORDER BY.
A bitmapped index scan works in two stages. First the index or indexes are scanned to create a bitmap representing matching tuple.
这段解释是tom lane在postgres邮件组中的回答,我觉得是比较权威而且通俗易懂的解释。
核心是传统的index scan每次从索引中去取一个tuple的指针,然后立马去表中取数据,每一次会造成一次随机io。如果数据量较多的情况下,会比较低效。而bitmap scan一次性将符合条件的tuple-pointers全部取出来,然后在内存中进行地址排序,然后去取出数据,这时的读取数据由于进行的地址排序,读取时就变成了顺序的读。其实就是一个随机读转化为顺序读取的过程,但是取出的数据由于进行了地址的排序,就没有顺序。同时,对于limit这种sql,bitmap index scan这种就不适合,因为它一次会取出所有数据。
和传统索引扫描对比:
在读取数据量比较小时,index scan比较合适,在读取数据量比较大的情况下,bitmap index scan会更有优势。
下面通过实验验证
建立测试表
创建测试表
CREATE TABLE sampletable (x numeric);
插入数据
INSERT INTO sampletable SELECT random() * 10000 FROM generate_series(1, 10000000);
全表扫描
postgres=# explain (analyze,buffers) select * from sampletable where x = 12;
QUERY PLAN
------------------------------------------------------------------------------------------------