sql中 int转 numeric_PostgreSQL 中的 JSONB 性能测试

f2a15377bacc78521c605740b8571cad.png

PostgreSQL 中的 JSONB 性能测试

Postgres 从 9.3 就开始支持 JSON 了,mysql 也在 5.7 加入了 JSON。对于存储非结构化数据,如果大部分的操作都是整段存储和读取,那么完全不需要构建复杂的表或者模式。很多场景下非常适合选择 JSON 数据结构。我一直非常好奇 PostgreSQL 的 JSONB 性能如何。假定我们要在一个 http 请求内查询含有 JSON 列的内容,如果只做简单的优化,一个表最多可以支持多少行数据。在这里声明,我做测试的目的主要是满足公司业务需求,并非是全方位的测试。

准备工作

数据库版本:10.7
机器配置:2CPU, 15.25GB 内存 (AWS db.r4.large)

插入数据

首先创建一个表(object)用来写入数据。因为要数据量很大,所以我在这里使用的是 postgres COPY 命令行(https://www.postgresql.org/docs/9.2/sql-copy.html)。另外也可以用sql自动生成随机数据。首先写一个脚本把数据以csv的形式存在本地,之后先连接数据库:PGPASSWORD=<密码> psql -U postgres -h <数据库地址> -d postgres
连接上之后把本地数据插入到数据库中: COPY object(id,name,metadata) FROM '/home/ubuntu/d1.csv' DELIMITER ';' CSV HEADER;
插好之后可以在数据库中看到:

50b1f54c7653af7c6f59c9f9a2c430f3.png

开始测试

这里选择的测试 sql 语句是:

SELECT count(*)
FROM object
WHERE CASE jsonb_typeof(metadata->'is') WHEN 'number' THEN (metadata->>'is')::numeric > 1 ELSE FALSE END;

这个语句检测了is值的类型,并且和 1 比了大小。选择这个语句的原因是公司在实际使用中会用到类似的语句。

测试 1:十万行,无索引 <50 ms

postgres=> EXPLAIN ANALYSE select count(*) from object where CASE jsonb_typeof(metadata->'is') WHEN 'number' THEN (metadata->>'is')::numeric > 1 ELSE FALSE END;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3792.29..3792.30 rows=1 width=8) (actual time=47.068..47.068 rows=1 loops=1)
   ->  Gather  (cost=3792.18..3792.29 rows=1 width=8) (actual time=47.059..47.104 rows=2 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Partial Aggregate  (cost=2792.18..2792.19 rows=1 width=8) (actual time=41.143..41.144 rows=1 loops=2)
               ->  Parallel Seq Scan on object  (cost=0.00..2718.65 rows=29412 width=0) (actual time=0.022..40.215 rows=6468 loops=2)
                     Filter: CASE jsonb_typeof((metadata -> 'is'::text)) WHEN 'number'::text THEN (((metadata ->> 'is'::text))::numeric > '1'::numeric) ELSE false END
                     Rows Removed by Filter: 43532
 Planning time: 0.068 ms
 Execution time: 47.151 ms
(10 rows)

测试 2:一百万行,无索引 <350 ms

postgres=> EXPLAIN ANALYSE select count(*) from object where CASE jsonb_typeof(metadata->'is') WHEN 'number' THEN (metadata->>'is')::numeric > 1 ELSE FALSE END;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=23991.38..23991.39 rows=1 width=8) (actual time=344.172..344.172 rows=1 loops=1)
   ->  Gather  (cost=23991.17..23991.38 rows=2 width=8) (actual time=338.922..345.419 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=22991.17..22991.18 rows=1 width=8) (actual time=329.769..329.769 rows=1 loops=3)
               ->  Parallel Seq Scan on object  (cost=0.00..22470.33 rows=208333 width=0) (actual time=0.020..320.997 rows=43063 loops=3)
                     Filter: CASE jsonb_typeof((metadata -> 'is'::text)) WHEN 'number'::text THEN (((metadata ->> 'is'::text))::numeric > '1'::numeric) ELSE false END
                     Rows Removed by Filter: 290270
 Planning time: 0.100 ms
 Execution time: 345.483 ms
(10 rows)

测试 3:一千万行,无索引 <3500 ms

postgres=> EXPLAIN ANALYSE select count(*) from object where CASE jsonb_typeof(metadata->'is') WHEN 'number' THEN (metadata->>'is')::numeric > 1 ELSE FALSE END;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=230902.07..230902.08 rows=1 width=8) (actual time=3500.912..3500.912 rows=1 loops=1)
   ->  Gather  (cost=230901.85..230902.06 rows=2 width=8) (actual time=3488.491..3500.957 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=229901.85..229901.86 rows=1 width=8) (actual time=3487.377..3487.378 rows=1 loops=3)
               ->  Parallel Seq Scan on object  (cost=0.00..224693.90 rows=2083180 width=0) (actual time=0.021..3390.216 rows=430889 loops=3)
                     Filter: CASE jsonb_typeof((metadata -> 'is'::text)) WHEN 'number'::text THEN (((metadata ->> 'is'::text))::numeric > '1'::numeric) ELSE false END
                     Rows Removed by Filter: 2902445
 Planning time: 0.071 ms
 Execution time: 3501.005 ms
(10 rows)

测试 4:一亿行,无索引 <55000 ms

EXPLAIN ANALYSE select count(*) from object where CASE jsonb_typeof(metadata->'is') WHEN 'number' THEN (metadata->>'is')::numeric > 1 ELSE FALSE END;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2300300.19..2300300.20 rows=1 width=8) (actual time=54820.521..54820.521 rows=1 loops=1)
   ->  Gather  (cost=2300299.98..2300300.19 rows=2 width=8) (actual time=54805.490..54822.961 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=2299299.98..2299299.99 rows=1 width=8) (actual time=54803.496..54803.496 rows=1 loops=3)
               ->  Parallel Seq Scan on object  (cost=0.00..2247202.15 rows=20839130 width=0) (actual time=0.021..53569.486 rows=4305335 loops=3)
                     Filter: CASE jsonb_typeof((metadata -> 'is'::text)) WHEN 'number'::text THEN (((metadata ->> 'is'::text))::numeric > '1'::numeric) ELSE false END
                     Rows Removed by Filter: 29027998
 Planning time: 0.072 ms
 Execution time: 54823.022 ms
(10 rows)

55 秒不是在 http 请求可以接受的范围内了。我们来做个简单的优化,给 JSONB 行加个 GIN 索引。至于为什么要用 GIN 索引,可以参考这几篇文章(https://bitnine.net/blog-postgresql/postgresql-internals-jsonb-type-and-its-indexes/ https://www.postgresql.org/docs/9.4/datatype-json.html)。加上之后再测试:

postgres=> CREATE INDEX metadata_idx ON object USING gin(metadata);
CREATE INDEX
                                                                   ^
postgres=> EXPLAIN ANALYSE select count(*) from object where CASE jsonb_typeof(metadata->'is') WHEN 'number' THEN (metadata->>'is')::numeric > 1 ELSE FALSE END;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2299966.88..2299966.89 rows=1 width=8) (actual time=57293.980..57293.980 rows=1 loops=1)
   ->  Gather  (cost=2299966.67..2299966.88 rows=2 width=8) (actual time=57293.969..57294.030 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=2298966.67..2298966.68 rows=1 width=8) (actual time=57279.930..57279.931 rows=1 loops=3)
               ->  Parallel Seq Scan on object  (cost=0.00..2246883.33 rows=20833333 width=0) (actual time=0.020..55950.405 rows=4305335 loops=3)
                     Filter: CASE jsonb_typeof((metadata -> 'is'::text)) WHEN 'number'::text THEN (((metadata ->> 'is'::text))::numeric > '1'::numeric) ELSE false END
                     Rows Removed by Filter: 29027998
 Planning time: 0.225 ms
 Execution time: 57294.086 ms
(10 rows)

没有提高的原因是 GIN 只会提高特定 operator@>的速度,而不会提高jsonb_typeof的速度。这是因为 GIN 只会索引路径。看来对于这种 SQL 语句不做更多的优化最多也就可以支持一千万到一亿行之间。那么我们换一个含有@>的语句试试 GIN 到底提高了多少速度。

postgres=> EXPLAIN ANALYSE select count(*) from object where metadata @> '{"random": 10}';

                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=302014.09..302014.10 rows=1 width=8) (actual time=484246.520..484246.520 rows=1 loops=1)
   ->  Bitmap Heap Scan on object  (cost=2471.00..301764.09 rows=100000 width=0) (actual time=3051.370..483609.148 rows=6453967 loops=1)
         Recheck Cond: (metadata @> '{"random": 10}'::jsonb)
         Rows Removed by Index Recheck: 89796990
         Heap Blocks: exact=48209 lossy=1052833
         ->  Bitmap Index Scan on metadata_idx  (cost=0.00..2446.00 rows=100000 width=0) (actual time=3041.135..3041.135 rows=14369031 loops=1)
               Index Cond: (metadata @> '{"random": 10}'::jsonb)
 Planning time: 0.069 ms
 Execution time: 484247.422 ms

效果非常一般。来看看 query plan,发现并没有用 GIN 检索,而是用的 bitmap。关闭 bitmap 再试试:

postgres=> SET enable_bitmapscan = off;
SET
postgres=> EXPLAIN ANALYSE select count(*) from object where metadata @> '{"random": 10}';
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1622987.72..1622987.73 rows=1 width=8) (actual time=25895.827..25895.827 rows=1 loops=1)
   ->  Gather  (cost=1622987.50..1622987.71 rows=2 width=8) (actual time=25894.538..25897.273 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=1621987.50..1621987.51 rows=1 width=8) (actual time=25883.140..25883.141 rows=1 loops=3)
               ->  Parallel Seq Scan on object  (cost=0.00..1621883.33 rows=41667 width=0) (actual time=0.021..25489.382 rows=2151322 loops=3)
                     Filter: (metadata @> '{"random": 10}'::jsonb)
                     Rows Removed by Filter: 31182011
 Planning time: 0.077 ms
 Execution time: 25897.317 ms
(10 rows)

现在搜索的速度降到了之前的一半。

顺便看看同样数据量的情况下不用JSON行 join 两次的速度:381毫秒

postgres=> EXPLAIN ANALYSE select count(*) from object join assoc on object.id = assoc.aid join objectb ob on assoc.bid = ob.id;
                                                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=18769.12..18769.13 rows=1 width=8) (actual time=378.946..378.946 rows=1 loops=1)
   ->  Gather  (cost=18768.91..18769.12 rows=2 width=8) (actual time=360.940..380.270 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=17768.91..17768.92 rows=1 width=8) (actual time=352.564..352.565 rows=1 loops=3)
               ->  Hash Join  (cost=12838.23..17664.74 rows=41667 width=0) (actual time=237.012..348.358 rows=33333 loops=3)
                     Hash Cond: (assoc.bid = ob.id)
                     ->  Merge Join  (cost=9754.23..13754.36 rows=41667 width=4) (actual time=158.275..235.775 rows=33333 loops=3)
                           Merge Cond: (object.id = assoc.aid)
                           ->  Parallel Index Only Scan using object_pk on object  (cost=0.57..3114727.29 rows=41678260 width=4) (actual time=0.099..21.509 rows=33335 loops=3)
                                 Heap Fetches: 59742
                           ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8) (actual time=158.064..183.463 rows=100000 loops=3)
                                 Sort Key: assoc.aid
                                 Sort Method: external sort  Disk: 2160kB
                                 ->  Seq Scan on assoc  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.013..19.147 rows=100000 loops=3)
                     ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4) (actual time=78.059..78.059 rows=100000 loops=3)
                           Buckets: 131072  Batches: 2  Memory Usage: 2781kB
                           ->  Seq Scan on objectb ob  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.012..25.505 rows=100000 loops=3)
 Planning time: 0.686 ms
 Execution time: 381.002 ms
(20 rows)

总结

一千万行已经可以满足大部分业务的需求了,再多就得考虑换分数据库,加 cache 或者干脆用 nosql 了。

9226c1e683534e9d71efa5341b259b1a.png
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值