PostgreSQL中的索引——2（访问方法的接口）

参宿_七

已于 2022-03-22 10:45:21 修改

阅读量730

点赞数

分类专栏： # 索引文章标签： postgresql 数据库索引

于 2022-02-11 15:44:09 首次发布

原文链接：https://postgrespro.com/blog/pgsql/4161264

版权

索引专栏收录该内容

10 篇文章 9 订阅

订阅专栏

接口

操作符类与族

系统目录

接口

在第一篇文章中，我们提到，访问方法必须提供自我的信息。让我们看看访问方法接口的结构。

属性

所有访问方法的属性都被存储在"pg_am"表中（“am”表示access method，访问方法）。我们同样可以从这张表中获取可用方法的列表：

postgres=# select amname from pg_am;

 amname
--------
 btree
 hash
 gist
 gin
 spgist
 brin
(6 rows)

尽管顺序扫描可以正确地被称为访问方法，但因为一些历史原因，它不在这个列表中。

在PG 9.5版及更低的版本中，每个属性都用“pg_am”表中的一个单独的字段表示。从9.6版开始，属性需通过一些特定的函数查询，并且被分为以下几个层级：

访问方法的属性-“pg_indexam_has_property”（类的属性）
特定索引的属性-“pg_index_has_property”（对象的属性）
索引的各个列的属性-"pg_index_column_has_property"

访问方法层和索引层是分开的，到目前为止，基于一种访问方法的所有索引将始终具有相同的属性。

以下四种是访问方法的属性（以"btree"为例）：

postgres=# select a.amname, p.name, pg_indexam_has_property(a.oid,p.name)
from pg_am a,
     unnest(array['can_order','can_unique','can_multi_col','can_exclude']) p(name)
where a.amname = 'btree'
order by a.amname;

 amname |     name      | pg_indexam_has_property
--------+---------------+-------------------------
 btree  | can_order     | t
 btree  | can_unique    | t
 btree  | can_multi_col | t
 btree  | can_exclude   | t
(4 rows)

（其中unnest的说明参见：postgresql中unnest使用说明与示例_那些年我们正年轻的博客-CSDN博客_postgres unnest）

can_order：当索引已建成，访问方法允许我们指定值的排序方式（目前只运用于“btree”）。

can_unique：支持唯一约束与主键（只运用于"btree"）（pg的唯一索引与唯一约束在功能上基本一样，其实创建唯一约束时，pg会自动在该列上创建唯一索引，以便在插入数据时，快速检索是否与已有的值重复，但是两者均不能避免多个NULL值的存在，除非为NULL值另建唯一B树索引，参考：PG唯一索引与唯一约束.sql_岳麓丹枫001的博客-CSDN博客_pg 唯一索引）。

can_multi_col：可以在多列上建索引

can_exclude：支持排除约束EXCLUDE

以下属性与某一具体索引有关（让我们考虑一个现有的例子）

postgres=# select p.name, pg_index_has_property('t_a_idx'::regclass,p.name)
from unnest(array[
       'clusterable','index_scan','bitmap_scan','backward_scan'
     ]) p(name);

     name      | pg_index_has_property
---------------+-----------------------
 clusterable   | t
 index_scan    | t
 bitmap_scan   | t
 backward_scan | t
(4 rows)

clusterable：可以根据索引对行重新排序（使用相同名字的指令聚集CLUSTER）（该指令在物理存储上使索引列有序紧密存储）。

index_scan：支持索引扫描。尽管这一属性看起来很奇怪，但并不是所有索引都能逐个返回TID，有些会一次返回所有结果，并只支持位图扫描。（比如Bloom索引）

bitmap_scan：支持位图扫描。(所有的索引都支持位图扫描)

backward_scan：结果可以按照构建索引时指定的相反顺序返回。

最后，下面是列属性：

postgres=# select p.name,
     pg_index_column_has_property('t_a_idx'::regclass,1,p.name)
from unnest(array[
       'asc','desc','nulls_first','nulls_last','orderable','distance_orderable',
       'returnable','search_array','search_nulls'
     ]) p(name);


        name        | pg_index_column_has_property
--------------------+------------------------------
 asc                | t
 desc               | f
 nulls_first        | f
 nulls_last         | t
 orderable          | t
 distance_orderable | f
 returnable         | t
 search_array       | t
 search_nulls       | t
(9 rows)

asc, desc, nulls_first, nulls_last, orderable：这些属性与值的排序有关（我们将在描述"btree"索引时讨论它们）。

distance_orderable：结果可以按照操作确定的“距离大小”的排序顺序返回（目前仅应用于GiST和RUM）。

returnable：可以在不访问表的情况下使用索引，即，支持仅索引扫描。

search_array：支持使用表达式“indexed-field IN (list_of_constants)”来检索多个值，该表达式与"indexed-field=ANY(array_of_constants)"相同。

search_nulls：支持IS NULL与IS NOT NULL条件检索。

我们已经详细讨论了一些属性。一些属性特定于某些访问方法。我们将在考虑这些特定方法时讨论这些属性。

操作符类与族

除了所述接口公开的访问方法的属性外，还需要了解访问方法接受哪些数据类型和哪些运算符的信息。为此，PostgreSQL引入了运算符类和运算符族的概念。

运算符类包含一组用于索引操作特定数据类型的最小运算符（可能还有辅助函数）。

运算符类包含在某些运算符族中（运算符族为不同的访问方法设立，而运算符类为不同的数据类型设立）。此外，如果一些运算符类具有相同的语义，那么它们可以被包含在一个公共的运算符族中。例如，“整数运算”族包括“bigint”、“integer”和“smallint”类型的“int8运算”、“int4运算”和“int2运算”类，它们的大小不同，但含义相同：

postgres=# select opfname, opcname, opcintype::regtype
from pg_opclass opc, pg_opfamily opf
where opf.opfname = 'integer_ops'
and opc.opcfamily = opf.oid
and opf.opfmethod = (
      select oid from pg_am where amname = 'btree'
    );

   opfname   | opcname  | opcintype
-------------+----------+-----------
 integer_ops | int2_ops | smallint
 integer_ops | int4_ops | integer
 integer_ops | int8_ops | bigint
(3 rows)

另一个例子：“datetime_ops”族包括操纵日期（有时间和无时间）的操作符类：

postgres=# select opfname, opcname, opcintype::regtype
from pg_opclass opc, pg_opfamily opf
where opf.opfname = 'datetime_ops'
and opc.opcfamily = opf.oid
and opf.opfmethod = (
      select oid from pg_am where amname = 'btree'
    );

   opfname    |     opcname     |          opcintype          
--------------+-----------------+-----------------------------
 datetime_ops | date_ops        | date
 datetime_ops | timestamptz_ops | timestamp with time zone
 datetime_ops | timestamp_ops   | timestamp without time zone
(3 rows)

操作符族还可以包含其他操作符，以比较不同类型的值。分组到族中使规划器能够对具有不同类型值的谓词使用索引（text类型的谓词比如LIKE a%，可以在text类型的字段上根据文本的pattern进行索引，这样规划器就能使用该索引执行LIKE谓词了，下面有举例）。族还可以包含其他辅助功能。

在大多数情况下，我们不需要了解任何有关运算符族和类的信息。通常我们只是创建一个索引，默认情况下使用某个操作符类。

但是，我们可以显式地指定运算符类。这是一个需要显式指定的简单示例：在排序规则不同于C的数据库中，常规索引不支持LIKE操作：

postgres=# show lc_collate;

 lc_collate 
-------------
 en_US.UTF-8
(1 row)


postgres=# explain (costs off) select * from t where b like 'A%';

         QUERY PLAN          
-----------------------------
 Seq Scan on t
   Filter: (b ~~ 'A%'::text)
(2 rows)

我们可以通过使用操作符类“text_pattern_ops”创建一个索引来克服这个限制（注意计划中的条件是如何变化的）：

postgres=# create index on t(b text_pattern_ops);

postgres=# explain (costs off) select * from t where b like 'A%';

                          QUERY PLAN                          
----------------------------------------------------------------
 Bitmap Heap Scan on t
   Filter: (b ~~ 'A%'::text)
   ->  Bitmap Index Scan on t_b_idx1
         Index Cond: ((b ~>=~ 'A'::text) AND (b ~<~ 'B'::text))
(4 rows)

系统目录

在本文的最后，我们提供了系统目录中与运算符类和族直接相关的表的简化图。

不用说，所有这些表格都有详细的描述。

系统目录能让我们找到很多问题的答案，而不需要看文档。比如，某一访问方法可以操作哪些数据类型：

postgres=# select opcname, opcintype::regtype
from pg_opclass
where opcmethod = (select oid from pg_am where amname = 'btree')
order by opcintype::regtype::text;

       opcname       |          opcintype          
---------------------+-----------------------------
 abstime_ops         | abstime
 array_ops           | anyarray
 enum_ops            | anyenum
...

运算符类包含哪些运算符（并且因此知道，包含此类运算符的条件可以使用哪些索引访问）：

postgres=# select amop.amopopr::regoperator
from pg_opclass opc, pg_opfamily opf, pg_am am, pg_amop amop
where opc.opcname = 'array_ops'
and opf.oid = opc.opcfamily
and am.oid = opf.opfmethod
and amop.amopfamily = opc.opcfamily
and am.amname = 'btree'
and amop.amoplefttype = opc.opcintype;

        amopopr        
-----------------------
 <(anyarray,anyarray)
 <=(anyarray,anyarray)
 =(anyarray,anyarray)
 >=(anyarray,anyarray)
 >(anyarray,anyarray)
(5 rows)