本文来源于阿里云-云栖社区,原文点击这里。
标签
PostgreSQL , GIS , PostGIS , Greenplum , 空间检索 , GiST , B-Tree , geohash
背景
《Greenplum 行存、列存,堆表、AO表的原理和选择》
以上文档详细的介绍了行存、列存,堆表、AO表的原理以及选择的依据。
《一个简单算法可以帮助物联网,金融 用户 节约98%的数据存储成本 (PostgreSQL,Greenplum帮你做到)》
以上文档介绍了提升基于列存的全局数据压缩比的方法。
《解密上帝之手 - 阿里云HDB for PostgreSQL数据库metascan特性(存储级、块级、batch级过滤与数据编排)》
以上文档介绍了局部编排,以及阿里云HDB for PostgreSQL数据库的metascan特性,(在不需要索引的情况下,如何提升任意列的选择性)。
压缩实际上是计算(CPU)换空间(磁盘)的做法,该不该做,我们还是先看看压缩比、性能损耗吧。
堆表
postgres=# create table t_heap(id int, c1 text, c2 int);
CREATE TABLE
postgres=# insert into t_heap select id, repeat(md5(random()::text), 128), random()*10000 from generate_series(1,10000000) t(id);
INSERT 0 10000000
Time: 120526.098 ms
某个维度count查询。
postgres=# explain analyze select c2,count(*) from t_heap group by c2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=1578949.03..1579074.98 rows=10076 width=12)
Rows out: 10001 rows at destination with 1354 ms to end, start offset by 519 ms.
-> HashAggregate (cost=1578949.03..1579074.98 rows=210 width=12)
Group By: t_heap.c2
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.001 ms to first row, 692 ms to end, start offset by 581 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=1578596.37..1578797.89 rows=210 width=12)
Hash Key: t_heap.c2
Rows out: Avg 10001.0 rows x 48 workers at destination. Max 10704 rows (seg17) with 596 ms to end, start offset by 581 ms.
-> HashAggregate (cost=1578596.37..1578596.37 rows=210 width=12)
Group By: t_heap.c2
Rows out: Avg 10001.0 rows x 48 workers. Max 10001 rows (seg0) with 0.006 ms to first row, 131 ms to end, start offset by 566 ms.
-> Seq Scan on t_heap (cost=0.00..1528595.58 rows=208337 width=4)
Rows out: Avg 208333.3 rows x 48 workers. Max 208401 rows (seg18) with 26 ms to first row, 901 ms to end, start offset by 573 ms.
Slice statistics:
(slice0) Executor memory: 359K bytes.
(slice1) Executor memory: 724K bytes avg x 48 workers, 724K bytes max (seg0).
(slice2) Executor memory: 388K bytes avg x 48 workers, 388K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 1874.143 ms
(22 rows)
Time: 1879.480 ms
无索引,某个单值查询