PostgreSQL 不能索引的10个原因

最新推荐文章于 2025-03-08 16:12:54 发布

胖达李七七

最新推荐文章于 2025-03-08 16:12:54 发布

阅读量5.6k

点赞数 2

分类专栏： postgresql 文章标签： postgresql 性能优化

本文链接：https://blog.csdn.net/forrest_hu/article/details/131945884

版权

postgresql 专栏收录该内容

30 篇文章

订阅专栏

一、索引列存在多个or连接

当查询条件中存在多个OR连接时，PostgreSQL需要将所有条件的结果集进行合并，而这个合并操作可能会导致索引失效。

模拟环境

postgres=# create table idxidx as select * from pg_class;
SELECT 445
postgres=# create index idx_11 on idxidx(oid);
CREATE INDEX

测试情况

一个or连接两个索引列（走索引）

postgres=# explain analyze select oid,relname,relnamespace  from idxidx where oid =17726  or oid=17743;
                                                    QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on idxidx  (cost=8.56..14.14 rows=2 width=72) (actual time=0.018..0.019 rows=1 loops=1)
  Recheck Cond: ((oid = '17726'::oid) OR (oid = '17743'::oid))
  Heap Blocks: exact=1
  ->  BitmapOr  (cost=8.56..8.56 rows=2 width=0) (actual time=0.012..0.013 rows=0 loops=1)
        ->  Bitmap Index Scan on idx_11  (cost=0.00..4.28 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)
              Index Cond: (oid = '17726'::oid)
        ->  Bitmap Index Scan on idx_11  (cost=0.00..4.28 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1)
              Index Cond: (oid = '17743'::oid)
Planning Time: 0.061 ms
Execution Time: 0.038 ms
(10 rows)

两个or连接三个索引列（走全表扫描）

postgres=# explain analyze select oid,relname,relnamespace  from idxidx where oid =17726 or oid=17765 or oid=17743;
                                           QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
Seq Scan on idxidx  (cost=0.00..19.79 rows=3 width=72) (actual time=0.012..0.064 rows=1 loops=1)
  Filter: ((oid = '17726'::oid) OR (oid = '17765'::oid) OR (oid = '17743'::oid))
  Rows Removed by Filter: 444
Planning Time: 0.059 ms
Execution Time: 0.079 ms
(5 rows)

要避免这种情况，可以尝试对查询条件进行重写，例如使用UNION ALL连接多个查询条件,例如如如下这种方式

postgres=# explain analyze select oid,relname,relnamespace  from idxidx where oid =17726 union all  select oid,relname,relnamespace  from idxidx where oid=17765 union all  select oid,relname,relnamespace  from idxidx where oid=17743;
                                                         QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
Append  (cost=0.27..24.92 rows=3 width=72) (actual time=0.041..0.046 rows=1 loops=1)
  ->  Index Scan using idx_11 on idxidx  (cost=0.27..8.29 rows=1 width=72) (actual time=0.041..0.042 rows=1 loops=1)
        Index Cond: (oid = '17726'::oid)
  ->  Index Scan using idx_11 on idxidx idxidx_1  (cost=0.27..8.29 rows=1 width=72) (actual time=0.002..0.002 rows=0 loops=1)
        Index Cond: (oid = '17765'::oid)
  ->  Index Scan using idx_11 on idxidx idxidx_2  (cost=0.27..8.29 rows=1 width=72) (actual time=0.001..0.001 rows=0 loops=1)
        Index Cond: (oid = '17743'::oid)
Planning Time: 0.169 ms
Execution Time: 0.082 ms
(9 rows)

二、数据量太小

对于非常小的表或者索引，使用索引可能会比全表扫描更慢。这是因为使用索引需要进行额外的 I/O 操作，而这些操作可能比直接扫描表更慢。

模拟环境

postgres=# create table tn(id int,name varchar);
CREATE TABLE
postgres=# insert into tn values(1,'ysl');
INSERT 0 1
postgres=# insert into tn values(2,'ysl');
INSERT 0 1
postgres=# insert into tn values(2,'ysll');
INSERT 0 1
postgres=# insert into tn values(2,'ysll');
INSERT 0 1

postgres=# create index idx_tn on tn(id);
CREATE INDEX
postgres=# \d tn
                     Table "public.tn"
Column |       Type        | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
id     | integer           |           |          |
name   | character varying |           |          |
Indexes:
   "idx_tn" btree (id)

postgres=# select * from tn;
id | name
----+------
 1 | ysl
 2 | ysl
 2 | ysll
 2 | ysll
(4 rows)

测试

postgres=# explain analyze select * from tn where id=2;
                                        QUERY PLAN                                          
---------------------------------------------------------------------------------------------
Seq Scan on tn  (cost=0.00..1.05 rows=1 width=36) (actual time=0.007..0.007 rows=3 loops=1)
  Filter: (id = 2)
  Rows Removed by Filter: 1
Planning Time: 0.053 ms
Execution Time: 0.021 ms
(5 rows)

postgres=# explain analyze select * from tn where id=1;
                                        QUERY PLAN                                          
---------------------------------------------------------------------------------------------
Seq Scan on tn  (cost=0.00..1.05 rows=1 width=36) (actual time=0.011..0.012 rows=1 loops=1)
  Filter: (id = 1)
  Rows Removed by Filter: 3
Planning Time: 0.057 ms
Execution Time: 0.026 ms
(5 rows

三、选择性不好

如果索引列中有大量重复的数据，或者一个字段全是一个值，这个时候，索引可能并不能发挥它的作用，起到加快检索的作用，因为这个索引并不能显著地减少需要扫描的行数，所以计算的代价可能远远大于走别的执行计划的代价。

基数：数据库基数是指数据库中不同值的数量

select count(distinct column_name)  from table_name;

选择性：基数和总行数的比值再乘以100%就是某个列的选择性。

select count(distinct column_name) /count（column_name）* 100% from table_name;

模拟环境

postgres=# create table tb_t1 as select * from pg_class;
SELECT 465
postgres=# create index idx_tb_t1 on tb_t1(oid);
CREATE INDEX

测试

可以看到，原本oid这一列，选择性较好，分布较均匀的时候，可以正常使用到索引。而选择性不好的情况下，则

postgres=# explain analyze select * from tb_t1 where oid=17726;
                                                   QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tb_t1  (cost=4.29..9.86 rows=2 width=236) (actual time=0.024..0.025 rows=1 loops=1)
  Recheck Cond: (oid = '17726'::oid)
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on idx_tb_t1  (cost=0.00..4.29 rows=2 width=0) (actual time=0.021..0.021 rows=1 loops=1)
        Index Cond: (oid = '17726'::oid)
Planning Time: 0.220 ms
Execution Time: 0.059 ms
(7 rows)

postgres=# update tb_t1 set oid=1;
UPDATE 465
postgres=# reindex  index idx_tb_t1;
REINDEX

postgres=# explain analyze select * from tb_t1 where oid=1;
                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
Seq Scan on tb_t1  (cost=0.00..29.81 rows=465 width=274) (actual time=0.013..0.080 rows=465 loops=1)
  Filter: (oid = '1'::oid)
Planning Time: 0.344 ms
Execution Time: 0.111 ms
(4 rows)

上边的这个例子，在我做完update后，列的基数是select count(distinct oid) from tb_t1;也就是1。而选择性是select count(distinct oid)/count(oid) 100% from tb_t1;也就是1/465 100% 选择性特别低。索引不能起到减少扫描的行数，反而在原本的基础上多了回表的动作，代价就增多了。因此CBO没有选择走这个索引的执行计划。

四、查询条件模糊

如果查询条件模糊，例如使用了不等于（<>）、LIKE等运算符或者使用了函数等，那么索引可能无法被使用。

因为正常情况下，等于（=）操作符可以直接利用B-tree或哈希索引进行查找。这是因为，这些操作符只需要在索引树中查找与给定值相等的项，就可以快速地定位到符合条件的记录。

而不等于（<>）操作符则需要查找所有不符合条件的记录，这会导致需要遍历整个索引树来找到匹配的记录，因此使用索引的成本比全表扫描更高。
LIKE操作符也可能导致不使用索引。这是因为，LIKE操作符通常需要执行模糊匹配，即查找包含你给的关键字的记录。虽然可以使用B-tree索引进行模糊匹配，但是如果模式以通配符开头（例如’%abc’），则索引将不会被使用，因为这种情况下需要遍历整个索引树来查找符合条件的记录。

这两种方式在列上有索引的时候，都是不能显著地减少需要扫描的行数。甚至加大SQL执行的代价，因此可能上边的索引不会被CBO选择为最后最优的执行计划。

模拟环境

postgres=# create table tb_l1 as select * from pg_class;
SELECT 465
postgres=# create index idx_tb_l1 on tb_l1(oid);
CREATE INDEX

测试

postgres=# explain analyze select * from tb_l1 where oid=17726;
                                                   QUERY PLAN                                                    
-------------------------------------------------------------------------------------------------------------------
Index Scan using idx_tb_l1 on tb_l1  (cost=0.27..8.29 rows=1 width=274) (actual time=0.029..0.030 rows=1 loops=1)
  Index Cond: (oid = '17726'::oid)
Planning Time: 0.473 ms
Execution Time: 0.083 ms
(4 rows)

postgres=# explain analyze select * from tb_l1 where oid<>17726;
                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
Seq Scan on tb_l1  (cost=0.00..17.81 rows=464 width=274) (actual time=0.007..0.103 rows=464 loops=1)
  Filter: (oid <> '17726'::oid)
  Rows Removed by Filter: 1
Planning Time: 0.069 ms
Execution Time: 0.132 ms
(5 rows)

五、表的一个列上有重复索引

在PostgreSQL里，是允许在一列上建立多个索引的，也就是如下这种方式，是不会报错说索引重复的，这也就导致了，使用过程中表上可能存在多余的重复索引，索引不会全部被使用到，而且可能引起性能问题。

postgres=# create index idx_tb_l1 on tb_l1(oid);
CREATE INDEX
postgres=# create index idx_tb_l2 on tb_l1(oid);
CREATE INDEX

模拟环境

postgres=# create table tb_l1 as select * from pg_class;
SELECT 465
postgres=# create index idx_tb_l1 on tb_l1(oid);
CREATE INDEX
postgres=# create index idx_tb_l2 on tb_l1(oid);
CREATE INDEX
postgres=# \d tb_l1
                       Table "public.tb_l1"
      Column        |     Type     | Collation | Nullable | Default
---------------------+--------------+-----------+----------+---------
oid                 | oid          |           |          |
relname             | name         |           |          |
relnamespace        | oid          |           |          |
reltype             | oid          |           |          |
reloftype           | oid          |           |          |
relowner            | oid          |           |          |
... ...
... ...
Indexes:
   "idx_tb_l1" btree (oid)
   "idx_tb_l2" btree (oid)

测试

postgres=# explain analyze select * from tb_l1 where oid=17726;                                                    QUERY PLAN                                                    
-------------------------------------------------------------------------------------------------------------------
Index Scan using idx_tb_l2 on tb_l1  (cost=0.27..8.29 rows=1 width=274) (actual time=0.025..0.025 rows=1 loops=1)
  Index Cond: (oid = '17726'::oid)
Planning Time: 0.364 ms
Execution Time: 0.043 ms
(4 rows)

测试可以看到，在一个表的同一列上的两个索引其实作用是一样的，仅仅名字不一样，属于重复索引，这种情况下，就算用到索引，同一时刻也就会使用到一个索引。

使用如下的SQL可以找到数据库里的重复索引，可以定期巡检的时候进行检查，并在确认后合理优化掉重复的索引

SELECT
 indrelid :: regclass              AS table_name,
 array_agg(indexrelid :: regclass) AS indexes
FROM pg_index
GROUP BY
 indrelid, indkey
HAVING COUNT(*) > 1;
一个执行的结果如下所示：
postgres=# SELECT
 indrelid :: regclass              AS table_name,
 array_agg(indexrelid :: regclass) AS indexes
FROM pg_index
GROUP BY
 indrelid, indkey
HAVING COUNT(*) > 1;
table_name |        indexes        
------------+-----------------------
tb_l1      | {idx_tb_l1,idx_tb_l2}
t1         | {ind1,idx2}
(2 rows)

postgres=# \di+ idx_tb_l1
                                       List of relations
Schema |   Name    | Type  |  Owner  | Table | Persistence | Access method | Size  | Description
--------+-----------+-------+---------+-------+-------------+---------------+-------+-------------
public | idx_tb_l1 | index | xmaster | tb_l1 | permanent   | btree         | 32 kB |
(1 row)

postgres=# \di+ idx_tb_l2
                                       List of relations
Schema |   Name    | Type  |  Owner  | Table | Persistence | Access method | Size  | Description
--------+-----------+-------+---------+-------+-------------+---------------+-------+-------------
public | idx_tb_l2 | index | xmaster | tb_l1 | permanent   | btree         | 32 kB |
(1 row）

六、优化器选项关闭了索引扫描

PostgreSQL里有着很多的可以影响优化器的参数，例如enable_indexscan,enable_bitmapscan,enable_hashjoin,enable_sort等等，这些参数可以在session，用户，数据库级别进行设置。可以通过设置这些参数的值，来改变相关SQL执行时的执行计划。但是需要注意的是，为了个别的SQL，去盲目改变这些参数的值，往往是得不偿失的，操作的时候需要严谨并且仔细考虑，否则，这些类型的参数的改变，对于数据库的性能影响可能是巨大的。

模拟环境

postgres=# create table tb_l1 as select * from pg_class;
SELECT 465
postgres=# create index idx_tb_l1 on tb_l1(oid);
CREATE INDEX

测试

开启了对应优化器选项
postgres=# show enable_indexscan ;
enable_indexscan
------------------
on
(1 row)

postgres=# show enable_bitmapscan ;
enable_bitmapscan
-------------------
on
(1 row)

postgres=# explain analyze select * from tb_l1 where oid=17721;
                                                   QUERY PLAN                                                    
-------------------------------------------------------------------------------------------------------------------
Index Scan using idx_tb_l2 on tb_l1  (cost=0.27..8.29 rows=1 width=274) (actual time=0.017..0.018 rows=1 loops=1)
  Index Cond: (oid = '17721'::oid)
Planning Time: 0.088 ms
Execution Time: 0.038 ms
(4 rows)

关闭对应的优化器选项，可以看到CBO受到设置的参数的影响，选择了seq scan的执行计划，而没有用到字段上的索引。

postgres=# set enable_indexscan=off;
SET
postgres=# set enable_bitmapscan=off;
SET
postgres=# explain analyze select * from tb_l1 where oid=17721;
                                           QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
Seq Scan on tb_l1  (cost=0.00..17.81 rows=1 width=274) (actual time=0.024..0.137 rows=1 loops=1)
  Filter: (oid = '17721'::oid)
  Rows Removed by Filter: 464
Planning Time: 0.079 ms
Execution Time: 0.192 ms
(5 rows)

七、统计信息不准确

因为CBO本身是基于代价的优化器，而计算代价要根据统计信息去做计算，统计信息不准确，得到的执行计划可能不是最优，这一点不做具体的举例。

八、Hints影响执行计划

PostgreSQL数据库里有着像ORACLE里类似的Hints功能，即pg_hint_plan工具，用Hints能够改变sql语句的执行计划，hint就是优化器的一种指示。虽然功能上和效果是类似的，但是PostgreSQL和ORACLE的Hints并不完全一致的，例如全表扫描等的关键字是不同的，需要进行区分。

准备环境

数据库需安装pg_hint_plan插件

postgres=# create table test_hint(id int,c varchar(100));
CREATE TABLE
postgres=# insert into test_hint select i,'test'||i from generate_series(1,10000) i;
INSERT 0 10000
postgres=# create index idx_test_hint_id on test_hint(id);
CREATE INDEX

测试

默认会走索引扫描，但是使用了hint，让其走了seqscan，没有使用到对应的字段上的索引。

postgres=# explain analyze select * from test_hint where id=10;
                                                        QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_test_hint_id on test_hint  (cost=0.29..8.30 rows=1 width=12) (actual time=0.008..0.008 rows=1 loops=1)
  Index Cond: (id = 10)
Planning Time: 0.111 ms
Execution Time: 0.024 ms
(4 rows)

postgres=# explain analyze select /*+seqscan(t) */ * from test_hint t where id=10;
                                              QUERY PLAN                                              
--------------------------------------------------------------------------------------------------------
Seq Scan on test_hint t  (cost=0.00..180.00 rows=1 width=12) (actual time=0.022..2.691 rows=1 loops=1)
  Filter: (id = 10)
  Rows Removed by Filter: 9999
Planning Time: 0.311 ms
Execution Time: 2.712 ms
(5 rows)

九、查询条件中使用函数

当查询条件中包含函数调用时，PostgreSQL里可能无法使用索引，因为它需要对所有数据进行计算，而不是只计算索引值。

准备环境

postgres=# CREATE TABLE test_table (
postgres(#     id SERIAL PRIMARY KEY,
postgres(#     name TEXT,
postgres(#     age INTEGER
postgres(# );
CREATE TABLE
postgres=#
postgres=# CREATE INDEX age_index ON test_table(age);
CREATE INDEX
postgres=# INSERT INTO test_table (name, age) VALUES
postgres-#     ('Alice', 25),
postgres-#     ('Bob', 30),
postgres-#     ('Charlie', 35),
postgres-#     ('David', 40),
postgres-#     ('Eve', 45),
postgres-#     ('Frank', 50);
INSERT 0 6
postgres=# CREATE OR REPLACE FUNCTION search_age(p_age INTEGER) RETURNS SETOF test_table AS $$
postgres$# BEGIN
postgres$#     RETURN QUERY SELECT * FROM test_table WHERE age > p_age;
postgres$# END;
postgres$# $$ LANGUAGE plpgsql;
CREATE FUNCTION

测试

可以看到，当查询条件中包含函数调用时，没有使用到索引，而是使用了一个 Function Scan。这个Function Scan也是一种特殊的扫描方式，是从函数中获取数据。PostgreSQL会调用指定的函数来处理查询结果，并且会为函数的输出结果创建一个虚拟的关系表，以便后续的节点可以使用这个关系表继续执行查询。

postgres=# EXPLAIN ANALYZE SELECT * FROM test_table WHERE age > 35;
                                                    QUERY PLAN                                                    
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_table  (cost=7.25..22.25 rows=400 width=40) (actual time=0.008..0.009 rows=3 loops=1)
  Recheck Cond: (age > 35)
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on age_index  (cost=0.00..7.15 rows=400 width=0) (actual time=0.004..0.004 rows=3 loops=1)
        Index Cond: (age > 35)
Planning Time: 0.075 ms
Execution Time: 0.027 ms
(7 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM search_age(35);
                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
Function Scan on search_age  (cost=0.25..10.25 rows=1000 width=40) (actual time=0.147..0.147 rows=3 loops=1)
Planning Time: 0.027 ms
Execution Time: 0.162 ms
(3 rows)

十、查询条件中有不等于运算符

因为在索引扫描期间，不等于运算符会导致索引中的每一行都需要进行比较，因此需要走全表扫描，不会走索引。

环境准备

postgres=# create table tb_l1 as select * from pg_class;
SELECT 465
postgres=# create index idx_tb_l1 on tb_l1(oid);
CREATE INDEX

测试

postgres=# explain analyze select * from tb_l1 where oid<>17721;
                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
Seq Scan on tb_l1  (cost=0.00..17.81 rows=464 width=274) (actual time=0.007..0.063 rows=464 loops=1)
  Filter: (oid <> '17721'::oid)
  Rows Removed by Filter: 1
Planning Time: 0.064 ms
Execution Time: 0.091 ms
(5 rows)

postgres=# explain analyze select * from tb_l1 where oid =17721;
                                                   QUERY PLAN                                                    
-------------------------------------------------------------------------------------------------------------------
Index Scan using idx_tb_l2 on tb_l1  (cost=0.27..8.29 rows=1 width=274) (actual time=0.019..0.021 rows=1 loops=1)
  Index Cond: (oid = '17721'::oid)
Planning Time: 0.107 ms
Execution Time: 0.051 ms
(4 rows)

postgres=# explain analyze select * from tb_l1 where oid !=17721;
                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
Seq Scan on tb_l1  (cost=0.00..17.81 rows=464 width=274) (actual time=0.012..0.072 rows=464 loops=1)
  Filter: (oid <> '17721'::oid)
  Rows Removed by Filter: 1
Planning Time: 0.071 ms
Execution Time: 0.102 ms
(5 rows)