postgres创建索引分析

最新推荐文章于 2024-08-03 11:54:40 发布

siwluxuefeng

最新推荐文章于 2024-08-03 11:54:40 发布

阅读量1k

点赞数

分类专栏： psql

本文链接：https://blog.csdn.net/Linzhongyilisha/article/details/78811132

版权

psql 专栏收录该内容

30 篇文章 0 订阅

订阅专栏

查看每列的散列程度
select tablename,attname,n_distinct,correlation from pg_stats where tablename='table_name';
n_distinct解释
-1表示唯一，也就是说这个列的每一行都不一样.
>=1时，表示这个列有多少唯一值.
<1时，表示这个列的唯一值数量/总数.
correlation解释
表示该列与数据堆存储的线性相关性, 1表示正向完全相关。越接近0表示数据分布越离散。<0表示反向相关。

btree:
（1）索引适合选择性好的列（n_distinct很大，或者=-1），唯一值比例越高越适合btree。
（2）线性相关越好，扫描或返回多条数据的效率越高。

gin:
（1）与btree相反，选择性越差，采用GIN索引效率越高。
（2）GIN的倒排特性，还特别适合多值类型的元素组合查询，例如数组、全文检索类型、TOKEN类型、等等。
（3）GIN索引接口是开放的，用户可以根据数据特征，自定义GIN索引。支持更多的数据类型。
（4）对于松散列，建立gin索引（倒排），以达到最好的效果；
（5）为了让普通类型支持gin，需要创建btree_gin插件；
postgres=# create extension btree_gin;
CREATE EXTENSION
（6）如果是普通类型，则线性相关越好，扫描或返回多条数据的效率越高。
（7）如果是多值类型（如数组、全文检索、TOKENs），则元素越集中（元素聚类分析，横坐标为行号，纵坐标为元素值，数据分布越集中），效率越高。

brin:
（1）当数据与堆存储线性相关性很好时，可以采用BRIN索引。
（2）BRIN是块级索引，存储每个（或者每一段连续的）数据块的原子信息（最大值，最小值，平均值，空值比例，COUNT等）。
（3）默认是每128个连续数据块区间存储一个字段取值的区间，所以这个索引的信息量是将1000个数据块划分为几个连续的128个块的区间，然后存储每个区间ID值的取值范围
（4）特别适合范围扫描。
（5）线性相关越好，扫描或返回多条数据的效率越高。
==================================================test_data============================================

postgres=# analyze test_data ;
ANALYZE

MH=> select tablename,attname,n_distinct,correlation from pg_stats where tablename='test_data';
tablename | attname | n_distinct | correlation
---------------+----------------+------------+-------------
test_data | data_id | -1 | 0.471723
test_data | parent_house_id| 421535 | 0.469814
test_data | test_id | 2 | 0.215617
test_data | data_name | 294808 | -0.0140928
test_data | size | 7730 | 0.529041
test_data | type | 12 | 0.624537
test_data | ident_Id | 2 | 0.996693
test_data | city_Id | 3 | 0.996512
test_data | create_ts | 83770 | 0.119125
test_data | modify_ts | 36360 | 0.249871
(10 rows)

MH=>

分析：
data_id btree
parent_house_id btree
test_id 不建索引
size btree
type brin
ident_Id brin
city_Id brin
data_name create_ts modify_ts gin

建索引：
create index idx_test_data_1 on test_data using btree (data_id);
create index idx_test_data_2 on test_data using btree (parent_house_id);
create index idx_test_data_3 on test_data using btree (size);

create index idx_test_data_4 on test_data using brin (type);
create index idx_test_data_5 on test_data using brin (ident_Id);
create index idx_test_data_6 on test_data using brin (city_Id);

create index idx_test_data_7 on test_data using gin (data_name,create_ts,modify_ts);

===============================test_house：=======================================
postgres=# analyze test_house ;
ANALYZE

MH=> select tablename,attname,n_distinct,correlation from pg_stats where tablename='test_house';
tablename | attname | n_distinct | correlation
---------------+----------------+------------+-------------
test_house | house_id | -1 | 0.405785
test_house | parent_house_id | 365044 | 0.402222
test_house | test_id | 2 | 0.232351
test_house | house_name | 86129 | 0.062862
test_house | type | 5 | 0.874647
test_house | ident_Id | 3 | 0.998939
test_house | city_Id | 4 | 0.998827
test_house | create_ts | 145307 | 0.831058
test_house | modify_ts | 49521 | 0.364961
(9 rows)

MH=>

分析：
house_id btree
parent_house_id btree
test_id 不建索引
type brin
ident_Id brin
city_Id brin
house_name create_ts modify_ts gin

建索引：
create index idx_test_house_1 on test_house using btree (house_id);
create index idx_test_house_2 on test_house using btree (parent_house_id);

create index idx_test_house_3 on test_house using brin (type);
create index idx_test_house_4 on test_house using brin (ident_Id);
create index idx_test_house_5 on test_house using brin (city_Id);

create index idx_test_house_6 on test_house using gin (house_name,create_ts,modify_ts);