postgresql bitmap理解

中年如酒

已于 2023-06-23 21:29:38 修改

阅读量1.6k

点赞数

分类专栏： Postgresql 文章标签： postgresql 数据库 database

于 2022-01-24 17:08:04 首次发布

本文链接：https://blog.csdn.net/weixin_43230594/article/details/122670432

版权

Postgresql 专栏收录该内容

38 篇文章 4 订阅

订阅专栏

benchmarksql=# EXPLAIN SELECT * FROM bmsql_customer WHERE c_city = 'San Mateo' or c_city='San Francisco';
QUERY PLAN
---------------------------------------------------------------------------
Bitmap Heap Scan on bmsql_customer (cost=8.94..56.34 rows=12 width=558)
Recheck Cond: (((c_city)::text = 'San Mateo'::text) OR ((c_city)::text =
'San Francisco'::text))
-> BitmapOr (cost=8.94..8.94 rows=12 width=0)
-> Bitmap Index Scan on city_idx (cost=0.00..4.47 rows=6 width=0)
Index Cond: ((c_city)::text = 'San Mateo'::text)
-> Bitmap Index Scan on city_idx (cost=0.00..4.47 rows=6 width=0)
Index Cond: ((c_city)::text = 'San Francisco'::text)
(7 rows)

In the preceding example, we are fetching all the customer information whose city is either
San Mateo or San Francisco. As per the preceding plan, the optimizer has generated
two bitmap pages for each city from the city_idx index. And these two bitmap pages are
merged using the BitmapOr operation, and the result is forwarded to the bitmap heap scan.
Once the bitmap index is created, the bitmap heap scan will perform sorting on the bitmap
page, and then fetch the records from the relation in the sequential order. In the preceding
example, the optimizer has created two bitmap pages for each city, and we can limit this
number by using the IN operator. The bitmap heap scan cost is indirectly proportional to
the number of bitmap pages. Let’s observe the following query, which retrieves similar
results to the previous example:

上面样本输出的执行计划，fetch所有城市是San Mateo 或者 San Francisco的客户信息，按照这个执行计划，优化器为产生了两个bitmap pages，一个为San Mateo,另一个为San Mateo，然后透过BitmapOr Operation merged这两个bitmap page,并将结果转发到bitmap heap scan,一旦bitmap index被建立，bitmap heap scan 将在bitmap page上执行sort,然后就可以按照sequential order从relation fetch表记录（这也是sort的原因，因为sequential 比random read效率更好），在这个范本中，优化器为这两个wehere条件产生了两个bitmap page，其实可以透过使用in操作限制bitmap pages的数量，位图堆扫描成本与位图页数成间接比例，让我们观察以下查询，它检索到与上一个示例相似的结果：

benchmarksql=# EXPLAIN SELECT * FROM bmsql_customer WHERE c_city IN
('San Mateo','San Francisco');
QUERY PLAN
-----------------------------------------------------------------------
Bitmap Heap Scan on bmsql_customer (cost=8.94..56.30 rows=12 width=558)
Recheck Cond: ((c_city)::text = ANY ('{"San Mateo","San Francisco"}'::text[]))
-> Bitmap Index Scan on city_idx (cost=0.00..8.94 rows=12 width=0)
Index Cond: ((c_city)::text = ANY ('{"San Mateo","San
Francisco"}'::text[]))

bitmap heap scan图解如下：在这里插入图片描述
1.首先scan index产生bitmap index
2.然后将结果发送到bitmap heap scan
3.bitmap heap scan 根据结果去fetch表记录（这里应该少了前述的sort操作）
index scan : 是对于给定的查询，先扫描一遍索引，从索引中找到符合要求的记录的指针，再定位到表中具体的page去取。
4.普通的index scan每次从索引中获取一个元组指针，并立即访问表中的元组，那么一个 PAGE 有可能被多次访问，而位图扫描一次性从索引中获取所有的元组指针，使用内存中的“位图”数据结构对它们进行排序，然后按物理元组位置顺序访问表元组。
它的核心思想是单个page在扫描期间只访问一次。
5、Bitmap Scan的缺点是什么？
从上面的分析中可以看出，Bitmap的优化是通过bitmap的生成过程中增加内存与CPU的消耗来减少IO消耗。
第4、5点总结摘自：https://blog.51cto.com/u_15064650/2884485