请勿滥用unlogged table & hash index

最新推荐文章于 2024-05-03 11:47:20 发布

Postgresql中国用户会

最新推荐文章于 2024-05-03 11:47:20 发布

阅读量568

点赞数

分类专栏：转载数据库管理文章标签： postgresql 数据库 hash

转载同时被 2 个专栏收录

108 篇文章 0 订阅

订阅专栏

数据库管理

27 篇文章 0 订阅

订阅专栏

Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大，国内顶级PostgreSQL数据库专家将悉数到场，并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:

Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
Postgres-XL的项目发起人Mason Sharp
pgpool的作者石井达夫(Tatsuo Ishii)
PG-Strom的作者海外浩平(Kaigai Kohei)
Greenplum研发总监姚延栋
周正中(德哥), PostgreSQL中国用户会创始人之一
汪洋，平安科技数据库技术部经理
……

 
 2015年度PG大象会报名地址：http://postgres2015.eventdove.com/
PostgreSQL中国社区： http://postgres.cn/
PostgreSQL专业1群： 3336901（已满）
PostgreSQL专业2群： 100910388
PostgreSQL专业3群： 150657323

unlogged table和hash index同样都不会写XLOG，所以如果你用流复制来搞HA，一定概要搞清楚一个问题，切换到备库的话unlogged table数据会被清掉，而hash index也没有，走hash index会失败。

unlogged table 的风险以及修复手段可以见：

http://blog.163.com/digoal@126/blog/static/163877040201582621345351/

hash index则风险略小，但是也必须重建，但是个人还是建议大家不要使用hash index，改用btree，因为性能确实相差无几。

批量将数据库集群的hash index修改为btree index的方法：

例子：

先并行创建一个btree索引，然后并行删除对应的hash 索引。


  
  
   
   $ vi test.sh
  
  
  
  
   
   
    
    #!/bin/bash
   
   
   
   
    
    

   
   
   
   
    
    for db in `psql -n -q -t -h 127.0.0.1 -p 1921 -U postgres postgres -c "copy (select datname from pg_database where datname <>'template0') to stdout;"`
   
   
   
   
    
    do 
   
   
   
   
    
    

   
   
   
   
    
    psql -n -q -t -h 127.0.0.1 -p 1921 -U postgres $db -c "with t1(sql,nsp,idx) as (select regexp_replace(indexdef,'USING hash','USING btree'),schemaname,indexname from pg_indexes where indexdef ~ 'USING hash'), t2(sql_create,sql_drop) as (select regexp_replace(sql,'CREATE INDEX','CREATE INDEX CONCURRENTLY'), 'DROP INDEX CONCURRENTLY '||quote_ident(nsp)||'.'||quote_ident(idx) from t1) select regexp_replace(sql_create,'CONCURRENTLY (.*) ON','CONCURRENTLY \1_0926 ON')  ||'; '|| sql_drop ||'; ' from t2;"|psql -a -e -h 127.0.0.1 -p 1921 -U postgres $db -f -
   
   
   
   
    
    

   
   
   
   
    
    done
   
   
   
   
    
    

   
   
   
   
    
    $ . ./test.sh
   
   
   
   
    
    

   
   
   
   
    
    这个with查询的结果如下举例：
   
   
   
   
    
    postgres=# with t1(sql,nsp,idx) as (select regexp_replace(indexdef,'USING hash','USING btree'),schemaname,indexname from pg_indexes where indexdef ~ 'USING hash'),
   
   
   
   
    
    t2(sql_create,sql_drop) as (select regexp_replace(sql,'CREATE INDEX','CREATE INDEX CONCURRENTLY'), 'DROP INDEX CONCURRENTLY '||quote_ident(nsp)||'.'||quote_ident(idx) from t1)
   
   
   
   
    
    select regexp_replace(sql_create,'CONCURRENTLY (.*) ON','CONCURRENTLY \1_0926 ON')  ||'; '|| sql_drop ||'; ' from t2;
   
   
   
   
    
                                                ?column?                                             
   
   
   
   
    
    -------------------------------------------------------------------------------------------------
   
   
   
   
    
     CREATE INDEX CONCURRENTLY hi1_0926 ON t USING btree (id); DROP INDEX CONCURRENTLY public.hi1; 
   
   
   
   
    
     CREATE INDEX CONCURRENTLY hi2_0926 ON s1.tbl USING btree (id); DROP INDEX CONCURRENTLY s1.hi2; 
   
   
   
   
    
    (2 rows)

或者在每个数据库调用这个inline code：


   
   
    
    do language plpgsql $$
   
   
   
   
    
    declare
   
   
   
   
    
      v_sql text; 
   
   
   
   
    
      v_schema name; 
   
   
   
   
    
      v_idx name; 
   
   
   
   
    
      sql1 text;
   
   
   
   
    
    begin
   
   
   
   
    
    for v_sql,v_schema,v_idx in select regexp_replace(indexdef,'USING hash','USING btree'),schemaname,indexname from pg_indexes where indexdef ~ 'USING hash'
   
   
   
   
    
    loop
   
   
   
   
    
      sql1='DROP INDEX '||quote_ident(v_schema)||'.'||quote_ident(v_idx); 
   
   
   
   
    
      execute sql1; 
   
   
   
   
    
      execute v_sql; 
   
   
   
   
    
    end loop; 
   
   
   
   
    
    end; 
   
   
   
   
    
    $$; 
   
   
   
   
    
    

   
   
   
   
    
    postgres=# \d t
   
   
   
   
    
           Table "public.t"
   
   
   
   
    
     Column |  Type   | Modifiers 
   
   
   
   
    
    --------+---------+-----------
   
   
   
   
    
     id     | integer | not null
   
   
   
   
    
    Indexes:
   
   
   
   
    
        "t_pkey" PRIMARY KEY, btree (id)
   
   
   
   
    
        "i1" btree (id)
   
   
   
   
    
        "i2" btree (id)
   
   
   
   
    
    

   
   
   
   
    
    postgres=# \d s1.tbl
   
   
   
   
    
            Table "s1.tbl"
   
   
   
   
    
     Column |  Type   | Modifiers 
   
   
   
   
    
    --------+---------+-----------
   
   
   
   
    
     id     | integer | 
   
   
   
   
    
    Indexes:
   
   
   
   
    
        "i1" btree (id)

关于hash和btree性能：

查询性能：

postgres=# create table tbl(id int, info text);

CREATE TABLE

postgres=# insert into tbl select generate_series(1,1000000);

INSERT 0 1000000

postgres=# create index idx_tbl1 on tbl using hash (id);

$ vi test.sql

\setrandom id 1 1000000

select * from tbl where id=:id;

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 30

progress: 1.0 s, 24219.3 tps, lat 0.258 ms stddev 0.436

progress: 2.0 s, 29387.1 tps, lat 0.270 ms stddev 0.401

progress: 3.0 s, 29281.0 tps, lat 0.271 ms stddev 0.442

progress: 4.0 s, 29231.7 tps, lat 0.272 ms stddev 0.844

......

transaction type: Custom query

scaling factor: 1

query mode: prepared

number of clients: 8

number of threads: 8

duration: 30 s

number of transactions actually processed: 876806

latency average: 0.270 ms

latency stddev: 0.592 ms

tps = 29202.503991 (including connections establishing)

tps = 29379.956438 (excluding connections establishing)

statement latencies in milliseconds:

0.003062 \setrandom id 1 1000000

0.266481 select * from tbl where id=:id;

postgres=# drop index idx_tbl1 ;

DROP INDEX

postgres=# create index idx_tbl1 on tbl using btree (id);

CREATE INDEX

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 30

progress: 1.0 s, 28414.2 tps, lat 0.240 ms stddev 0.306

progress: 2.0 s, 31192.2 tps, lat 0.255 ms stddev 0.605

progress: 3.0 s, 31022.8 tps, lat 0.256 ms stddev 0.451

progress: 4.0 s, 29587.1 tps, lat 0.268 ms stddev 0.671

......

transaction type: Custom query

scaling factor: 1

query mode: prepared

number of clients: 8

number of threads: 8

duration: 30 s

number of transactions actually processed: 903467

latency average: 0.263 ms

latency stddev: 0.678 ms

tps = 30088.054150 (including connections establishing)

tps = 30229.295069 (excluding connections establishing)

statement latencies in milliseconds:

0.002900 \setrandom id 1 1000000

0.259402 select * from tbl where id=:id;

更新性能

$ vi test.sql

\setrandom id 1 1000000

update tbl set id=1+:id where id=:id;

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 30

progress: 1.0 s, 12500.2 tps, lat 0.570 ms stddev 0.864

progress: 2.0 s, 17456.9 tps, lat 0.456 ms stddev 0.641

progress: 3.0 s, 18242.3 tps, lat 0.435 ms stddev 0.234

progress: 4.0 s, 17693.0 tps, lat 0.450 ms stddev 0.909

progress: 5.0 s, 17753.3 tps, lat 0.448 ms stddev 0.758

......

transaction type: Custom query

scaling factor: 1

query mode: prepared

number of clients: 8

number of threads: 8

duration: 30 s

number of transactions actually processed: 521331

latency average: 0.456 ms

latency stddev: 0.702 ms

tps = 17372.386945 (including connections establishing)

tps = 17430.542432 (excluding connections establishing)

statement latencies in milliseconds:

0.003072 \setrandom id 1 1000000

0.452489 update tbl set id=1+:id where id=:id;

postgres=# drop index idx_tbl1 ;

DROP INDEX

postgres=# create index idx_tbl1 on tbl using hash (id);

CREATE INDEX

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 30

progress: 1.0 s, 16321.8 tps, lat 0.411 ms stddev 0.521

progress: 2.0 s, 17372.0 tps, lat 0.458 ms stddev 0.409

progress: 3.0 s, 16731.5 tps, lat 0.475 ms stddev 1.094

progress: 4.0 s, 16972.9 tps, lat 0.469 ms stddev 0.880

progress: 5.0 s, 17392.7 tps, lat 0.457 ms stddev 0.607

......

transaction type: Custom query

scaling factor: 1

query mode: prepared

number of clients: 8

number of threads: 8

duration: 30 s

number of transactions actually processed: 527778

latency average: 0.450 ms

latency stddev: 0.678 ms

tps = 17587.300360 (including connections establishing)

tps = 17671.181609 (excluding connections establishing)

statement latencies in milliseconds:

0.002955 \setrandom id 1 1000000

0.446435 update tbl set id=1+:id where id=:id;

SIZE


   
   
    
    postgres=# create index idx_tbl1 on tbl using hash (id);
   
   
   
   
    
    CREATE INDEX
   
   
   
   
    
    postgres=# create index idx_tbl2 on tbl using btree (id);
   
   
   
   
    
    CREATE INDEX
   
   
   
   
    
    postgres=# \di+ idx_tbl*
   
   
   
   
    
                             List of relations
   
   
   
   
    
     Schema |   Name   | Type  |  Owner   | Table | Size  | Description 
   
   
   
   
    
    --------+----------+-------+----------+-------+-------+-------------
   
   
   
   
    
     public | idx_tbl1 | index | postgres | tbl   | 32 MB | 
   
   
   
   
    
     public | idx_tbl2 | index | postgres | tbl   | 21 MB | 
   
   
   
   
    
    (2 rows)

小结:

查询和更新性能相差无几。

大小，btree是hash的2/3。

Postgresql中国用户会

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
请勿滥用unlogged table & hash index

unlogged table和hash index同样都不会写XLOG，所以如果你用流复制来搞HA，一定概要搞清楚一个问题，切换到备库的话unlogged table数据会被清掉，而hash index也没有，走hash index会失败。unlogged table 的风险以及修复手段可以见：http://blog.163.com/digoal@126/blog/sta
复制链接

扫一扫