Indexes are an important structure for database performance. You can refer to previous posts for the different types of indexes supported by Postgres. These indexes are accessed by Index Scan, Index Only Scan, and Bitmap Index Scan. In this post, we will look more closely at the Bitmap Index Scan.
Index Scan vs Bitmap Index Scan
Compare the operation methods and features with the Index Scan.
Index Scan | Bitmap Index Scan | |
---|---|---|
Operation | Index scan reads the index in alternation, bouncing between table and index, row at a time. | Scans all index rows before examining base table.This populates a TID bitmap. |
Scan Property | Random I/O against the base table. Read a row from the index, then a row from the table, and so on. | Table I/O is sequential, results in physical order. |
LIMIT clause | useful in combination with LIMIT | Handles LIMIT poorly |
–
Postgres scans an index and finds a matching key. It can choose not to read the matching row from the table right away. Instead, it can remember the page of the matching row, finish scanning the index, and read each data page with matching rows only once. As a result, the page reads are reduced.
The most important feature of the above bitmap index scan is sequential scan of table block by TID bitmap. An important factor for the planner is to select the bitmap index scan when the random I/O becomes large, based on the index scan. Therefore, if the number of fetches for an index is large, a bitmap index scan is more likely to be selected. It is not the only number that is affected. The locality of the column values in the index is also an important factor. You can check the locality of column values with vacuum and the pg_stats query.
VACCUM FULL test2;
SELECT attname, correlation FROM pg_stats WHERE tablename = 'test2';
attname | correlation
---------+-------------
major | 1
minor | -0.00180273
In the above example, the major column has a high locality, so query planner prefer Index Scan even if the number of fetches is large.
Below is an example of a Bitmap Index Scan.
EXPLAIN SELECT * FROM test2 WHERE minor BETWEEN 1 AND 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=22.61..2592.43 rows=994 width=12)
Recheck Cond: ((minor >= '1'::double precision) AND (minor <= '1000'::double precision))
-> Bitmap Index Scan on minor_rand (cost=0.00..22.36 rows=994 width=0)
Index Cond: ((minor >= '1'::double precision) AND (minor <= '1000'::double precision))
Bitmap Heap Scan uses a bitmap to get the heap tuple. When the number of index rows is small, it points directly to the heap tuple. If the number of rows to keep in the bitmap increases, it will point to the page that contains the row. If bitmap points to a page, “Recheck Cond” is used to distinguish the row that satisfies the actual condition.
Another key feature of Bitmap Index Scan is that it can be used instead of mutli-column index.
Mutli-column Index
If you use multiple columns in conditional expressions, you can consider multi-column indexes to improve performance.
After creating an index for that sort of situation:
CREATE TABLE test2 (
major int,
minor int,
name varchar
);
CREATE INDEX test2_mm_idx ON test2 (major, minor);
The index could be used for queries that look like the following:
SELECT * FROM test2 WHERE major = 1 AND minor = 8;
SELECT * FROM test2 WHERE major = 1;
But it is unlikely to be useful in the following form:
SELECT * FROM test2 WHERE minor = 8;
SELECT * FROM test2 WHERE major = 1 OR minor = 8;
Using Bitmap Index Scan
Had you instead created two indexes for both major and mior, those would have satisfied either individual type of search, as well as the combined one.
ALTER table test2 ADD CONSTRAINT pk_test2 PRIMARY KEY(major);
CREATE INDEX minor_rand ON test2 (major, minor);
EXPLAIN SELECT * FROM test2 WHERE major = 1 OR minor = 8;
QUERY PLAN
-------------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=8.87..20.71 rows=3 width=12)
Recheck Cond: ((major = 1) OR (minor = '8'::double precision))
-> BitmapOr (cost=8.87..8.87 rows=3 width=0)
-> Bitmap Index Scan on pk_test2 (cost=0.00..4.43 rows=1 width=0)
Index Cond: (major = 1)
-> Bitmap Index Scan on minor_rand (cost=0.00..4.44 rows=2 width=0)
Index Cond: (minor = '8'::double precision)
The query planner has the ability to combine and use multiple single-column indexes in a multi-column query by performing a bitmap index scan. In general, you can create an index on every column that covers query conditions As always, multi-column indexes can only optimize the queries that reference the columns in the index in the same order, while multiple single column indexes provide performance improvements to a larger number of queries.
However, it has a trade-off. The result is likely to be larger and more overhead to update than the multi-column version. The downside is less that flexibility and efficiency. Therefore, it is better to consider individual indexes first and to use multi-columns after benchmarking the query application
英文内容来源
https://bitnine.net/blog-useful-information/bitmap-index-scan/