Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:
- Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
- Postgres-XL的项目发起人Mason Sharp
- pgpool的作者石井达夫(Tatsuo Ishii)
- PG-Strom的作者海外浩平(Kaigai Kohei)
- Greenplum研发总监姚延栋
- 周正中(德哥), PostgreSQL中国用户会创始人之一
- 汪洋,平安科技数据库技术部经理
- ……
|
|
JOIN性能确实好很多,主要体现在。
1. BuldkScan
300毫秒左右,而seqscan需要1500毫秒。
2. GpuJoin
约4.5秒。而hash join约8秒。
postgres=# explain (analyze,verbose,costs,buffers,timing) select count(*) from t1 natural join t2 natural join t3;QUERY PLAN---------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=592268.52..592268.53 rows=1 width=0) (actual time=12571.551..12571.551 rows=1 loops=1)Output: pgstrom.count((pgstrom.nrows()))Buffers: shared hit=132771-> Custom Scan (GpuPreAgg) (cost=425951.98..570768.69 rows=17 width=4) (actual time=12548.700..12571.530 rows=28 loops=1)Output: pgstrom.nrows()Bulkload: On (density: 100.00%)Reduction: NoGroupFeatures: format: tuple-slot, bulkload: unsupportedBuffers: shared hit=132771-> Custom Scan (GpuJoin) (cost=422951.98..567268.52 rows=10000000 width=0) (actual time=12337.214..12531.078 rows=10000000 loops=1)Pseudo Scan: (t1.c1)::integer, (t1.c2)::integer, (t3.c1)::integer, (t3.c2)::integerBulkload: On (density: 100.00%)Depth 1: GpuHashJoin, HashKeys: (t1.c1, t1.c2), JoinQual: ((t1.c1 = t3.c1) AND (t1.c2 = t3.c2))Nrows (in:10000000 out:10000000, 100.00% planned 100.00%), KDS-Hash (size: 495.91MB planned 739.10MB, nbatches: 1 planned 1)Features: format: heap-tuple, bulkload: supportedBuffers: shared hit=132771-> Custom Scan (GpuJoin) (cost=211479.50..355978.48 rows=10000000 width=16) (actual time=6805.095..7022.021 rows=10000000 loops=1)Output: t1.c1, t1.c2, t2.c1, t2.c2Pseudo Scan: t1.c1::integer, t1.c2::integer, t2.c1::integer, t2.c2::integerBulkload: On (density: 100.00%)Depth 1: GpuHashJoin, HashKeys: (t1.c1, t1.c2), JoinQual: ((t1.c1 = t2.c1) AND (t1.c2 = t2.c2))Nrows (in:10000000 out:10000000, 100.00% planned 100.00%), KDS-Hash (size: 495.91MB planned 739.10MB, nbatches: 1 planned 1)Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=88523-> Custom Scan (BulkScan) on public.t1 (cost=0.00..144248.00 rows=10000000 width=8) (actual time=12.278..335.416 rows=10000000 loops=1)Output: t1.c1, t1.c2Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=44275-> Seq Scan on public.t2 (cost=0.00..144248.00 rows=10000000 width=8) (actual time=0.009..1510.590 rows=10000000 loops=1)Output: t2.c1, t2.c2Buffers: shared hit=44248-> Seq Scan on public.t3 (cost=0.00..144248.00 rows=10000000 width=8) (actual time=0.017..1539.207 rows=10000000 loops=1)Output: t3.c1, t3.c2Buffers: shared hit=44248Planning time: 0.856 msExecution time: 14226.754 ms(36 rows)
postgres=# set pg_strom.enabled=off;SETpostgres=# explain (analyze,verbose,costs,buffers,timing) select count(*) from t1 natural join t2 natural join t3;QUERY PLAN-------------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=1889780.00..1889780.01 rows=1 width=0) (actual time=23863.974..23863.974 rows=1 loops=1)Output: count(*)Buffers: shared hit=132744, temp read=184849 written=184073-> Hash Join (cost=666622.00..1864780.00 rows=10000000 width=0) (actual time=6751.553..23007.967 rows=10000000 loops=1)Hash Cond: ((t1.c1 = t3.c1) AND (t1.c2 = t3.c2))Buffers: shared hit=132744, temp read=184849 written=184073-> Hash Join (cost=333311.00..994748.00 rows=10000000 width=16) (actual time=3369.316..12137.123 rows=10000000 loops=1)Output: t1.c1, t1.c2, t2.c1, t2.c2Hash Cond: ((t1.c1 = t2.c1) AND (t1.c2 = t2.c2))Buffers: shared hit=88496, temp read=86300 written=85912-> Seq Scan on public.t1 (cost=0.00..144248.00 rows=10000000 width=8) (actual time=0.008..1079.109 rows=10000000 loops=1)Output: t1.c1, t1.c2Buffers: shared hit=44248-> Hash (cost=144248.00..144248.00 rows=10000000 width=8) (actual time=3368.576..3368.576 rows=10000000 loops=1)Output: t2.c1, t2.c2Buckets: 131072 (originally 131072) Batches: 256 (originally 128) Memory Usage: 4073kBBuffers: shared hit=44248, temp written=33847-> Seq Scan on public.t2 (cost=0.00..144248.00 rows=10000000 width=8) (actual time=0.007..1096.536 rows=10000000 loops=1)Output: t2.c1, t2.c2Buffers: shared hit=44248-> Hash (cost=144248.00..144248.00 rows=10000000 width=8) (actual time=3382.059..3382.059 rows=10000000 loops=1)Output: t3.c1, t3.c2Buckets: 131072 (originally 131072) Batches: 256 (originally 128) Memory Usage: 4073kBBuffers: shared hit=44248, temp written=33847-> Seq Scan on public.t3 (cost=0.00..144248.00 rows=10000000 width=8) (actual time=0.010..1102.204 rows=10000000 loops=1)Output: t3.c1, t3.c2Buffers: shared hit=44248Planning time: 0.594 msExecution time: 23864.050 ms(29 rows)
8个表的JOIN,不使用 GPU。
postgres=# explain (analyze,verbose,costs,buffers,timing) select count(*) from t1 natural join t2 natural join t3 natural join t4 natural join t5 natural join t6 natural join t7 natural join t8 ;QUERY PLAN--------------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=5773641.04..5773641.05 rows=1 width=0) (actual time=78621.151..78621.151 rows=1 loops=1)Output: count(*)Buffers: shared hit=435962, temp read=518383 written=516199-> Hash Join (cost=2720512.24..5748641.21 rows=9999931 width=0) (actual time=33751.548..77761.816 rows=10000000 loops=1)Hash Cond: ((t1.c1 = t4.c1) AND (t1.c2 = t4.c2))Buffers: shared hit=435962, temp read=518383 written=516199-> Hash Join (cost=1526935.54..3930456.05 rows=9999954 width=48) (actual time=18264.080..53774.071 rows=10000000 loops=1)Output: t1.c1, t1.c2, t2.c1, t2.c2, t3.c1, t3.c2, t5.c1, t5.c2, t7.c1, t7.c2, t8.c1, t8.c2Hash Cond: ((t1.c1 = t5.c1) AND (t1.c2 = t5.c2))Buffers: shared hit=347463, temp read=304863 written=303577-> Hash Join (cost=333358.83..2151331.98 rows=9999977 width=32) (actual time=3509.315..30357.234 rows=10000000 loops=1)Output: t1.c1, t1.c2, t2.c1, t2.c2, t3.c1, t3.c2, t8.c1, t8.c2Hash Cond: ((t1.c1 = t8.c1) AND (t1.c2 = t8.c2))Buffers: shared hit=258967, temp read=110794 written=110406-> Merge Join (cost=48.41..1261770.78 rows=10000000 width=24) (actual time=0.077..17169.673 rows=10000000 loops=1)Output: t1.c1, t1.c2, t2.c1, t2.c2, t3.c1, t3.c2Merge Cond: (t1.c1 = t3.c1)Join Filter: (t1.c2 = t3.c2)Buffers: shared hit=214719-> Merge Join (cost=23.83..782865.84 rows=10000000 width=16) (actual time=0.059..9851.866 rows=10000000 loops=1)Output: t1.c1, t1.c2, t2.c1, t2.c2Merge Cond: (t1.c1 = t2.c1)Join Filter: (t1.c2 = t2.c2)Buffers: shared hit=143146-> Index Scan using idx1 on public.t1 (cost=0.43..303939.43 rows=10000000 width=8) (actual time=0.034..2221.148 rows=10000000 loops=1)Output: t1.c1, t1.c2Buffers: shared hit=71573-> Index Scan using idx2 on public.t2 (cost=0.43..303939.43 rows=10000000 width=8) (actual time=0.013..2620.760 rows=10000000 loops=1)Output: t2.c1, t2.c2Buffers: shared hit=71573-> Index Scan using idx3 on public.t3 (cost=0.43..303939.43 rows=10000000 width=8) (actual time=0.013..2675.423 rows=10000000 loops=1)Output: t3.c1, t3.c2Buffers: shared hit=71573-> Hash (cost=144247.77..144247.77 rows=9999977 width=8) (actual time=3508.083..3508.083 rows=10000000 loops=1)Output: t8.c1, t8.c2Buckets: 131072 (originally 131072) Batches: 256 (originally 128) Memory Usage: 4073kBBuffers: shared hit=44248, temp written=33847-> Seq Scan on public.t8 (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.034..1189.104 rows=10000000 loops=1)Output: t8.c1, t8.c2Buffers: shared hit=44248-> Hash (cost=994748.05..994748.05 rows=9999977 width=16) (actual time=14751.206..14751.206 rows=10000000 loops=1)Output: t5.c1, t5.c2, t7.c1, t7.c2Buckets: 131072 Batches: 256 Memory Usage: 2855kBBuffers: shared hit=88496, temp read=86300 written=129560-> Hash Join (cost=333312.20..994748.05 rows=9999977 width=16) (actual time=3485.575..12066.617 rows=10000000 loops=1)Output: t5.c1, t5.c2, t7.c1, t7.c2Hash Cond: ((t7.c1 = t5.c1) AND (t7.c2 = t5.c2))Buffers: shared hit=88496, temp read=86300 written=85912-> Seq Scan on public.t7 (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.017..1176.896 rows=10000000 loops=1)Output: t7.c1, t7.c2Buffers: shared hit=44248-> Hash (cost=144248.48..144248.48 rows=10000048 width=8) (actual time=3484.279..3484.279 rows=10000000 loops=1)Output: t5.c1, t5.c2Buckets: 131072 (originally 131072) Batches: 256 (originally 128) Memory Usage: 4073kBBuffers: shared hit=44248, temp written=33847-> Seq Scan on public.t5 (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.012..1191.389 rows=10000000 loops=1)Output: t5.c1, t5.c2Buffers: shared hit=44248-> Hash (cost=994748.05..994748.05 rows=9999977 width=16) (actual time=15486.754..15486.754 rows=10000000 loops=1)Output: t4.c1, t4.c2, t6.c1, t6.c2Buckets: 131072 Batches: 256 Memory Usage: 2855kBBuffers: shared hit=88496, temp read=86300 written=129560-> Hash Join (cost=333312.20..994748.05 rows=9999977 width=16) (actual time=3494.887..12813.050 rows=10000000 loops=1)Output: t4.c1, t4.c2, t6.c1, t6.c2Hash Cond: ((t6.c1 = t4.c1) AND (t6.c2 = t4.c2))Buffers: shared hit=88496, temp read=86300 written=85912-> Seq Scan on public.t6 (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.025..1214.920 rows=10000000 loops=1)Output: t6.c1, t6.c2Buffers: shared hit=44248-> Hash (cost=144248.48..144248.48 rows=10000048 width=8) (actual time=3493.640..3493.640 rows=10000000 loops=1)Output: t4.c1, t4.c2Buckets: 131072 (originally 131072) Batches: 256 (originally 128) Memory Usage: 4073kBBuffers: shared hit=44248, temp written=33847-> Seq Scan on public.t4 (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.016..1191.688 rows=10000000 loops=1)Output: t4.c1, t4.c2Buffers: shared hit=44248Planning time: 46.170 msExecution time: 78621.977 ms(78 rows)
使用GPU。
postgres=# set pg_strom.enabled=on;SETpostgres=# explain (analyze,verbose,costs,buffers,timing) select count(*) from t1 natural join t2 natural join t3 natural join t4 natural join t5 natural join t6 natural join t7 natural join t8 ;QUERY PLAN-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=1663562.75..1663562.76 rows=1 width=0) (actual time=48858.541..48858.541 rows=1 loops=1)Output: pgstrom.count((pgstrom.nrows()))Buffers: shared hit=354038-> Custom Scan (GpuPreAgg) (cost=1453270.24..1642063.09 rows=17 width=4) (actual time=48794.894..48858.504 rows=28 loops=1)Output: pgstrom.nrows()Bulkload: On (density: 100.00%)Reduction: NoGroupFeatures: format: tuple-slot, bulkload: unsupportedBuffers: shared hit=354038-> Custom Scan (GpuJoin) (cost=1450270.24..1638562.92 rows=9999931 width=0) (actual time=48438.139..48777.446 rows=10000000 loops=1)Pseudo Scan: (t1.c1)::integer, (t1.c2)::integer, (t4.c1)::integer, (t4.c2)::integerBulkload: On (density: 100.00%)Depth 1: GpuHashJoin, HashKeys: (t1.c1, t1.c2), JoinQual: ((t1.c1 = t4.c1) AND (t1.c2 = t4.c2))Nrows (in:10000000 out:10000000, 100.00% planned 100.00%), KDS-Hash (size: 801.09MB planned 560.27MB, nbatches: 1 planned 2)Features: format: heap-tuple, bulkload: supportedBuffers: shared hit=354038-> Custom Scan (GpuJoin) (cost=422951.98..567268.52 rows=10000000 width=24) (actual time=9964.714..10167.462 rows=10000000 loops=1)Output: t1.c1, t1.c2, t2.c1, t2.c2, t3.c1, t3.c2Pseudo Scan: t1.c1::integer, t1.c2::integer, t2.c1::integer, t2.c2::integer, t3.c1::integer, t3.c2::integerBulkload: On (density: 100.00%)Depth 1: GpuHashJoin, HashKeys: (t1.c1, t1.c2), JoinQual: ((t1.c1 = t3.c1) AND (t1.c2 = t3.c2))Nrows (in:10000000 out:10000000, 100.00% planned 100.00%), KDS-Hash (size: 495.91MB planned 739.10MB, nbatches: 1 planned 1)Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=132771-> Custom Scan (GpuJoin) (cost=211479.50..355978.48 rows=10000000 width=16) (actual time=5018.376..5238.156 rows=10000000 loops=1)Output: t1.c1, t1.c2, t2.c1, t2.c2Pseudo Scan: t1.c1::integer, t1.c2::integer, t2.c1::integer, t2.c2::integerBulkload: On (density: 100.00%)Depth 1: GpuHashJoin, HashKeys: (t1.c1, t1.c2), JoinQual: ((t1.c1 = t2.c1) AND (t1.c2 = t2.c2))Nrows (in:10000000 out:10000000, 100.00% planned 100.00%), KDS-Hash (size: 495.91MB planned 739.10MB, nbatches: 1 planned 1)Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=88523-> Custom Scan (BulkScan) on public.t1 (cost=0.00..144248.00 rows=10000000 width=8) (actual time=12.040..322.669 rows=10000000 loops=1)Output: t1.c1, t1.c2Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=44275-> Seq Scan on public.t2 (cost=0.00..144248.00 rows=10000000 width=8) (actual time=0.008..1528.913 rows=10000000 loops=1)Output: t2.c1, t2.c2Buffers: shared hit=44248-> Seq Scan on public.t3 (cost=0.00..144248.00 rows=10000000 width=8) (actual time=0.022..1529.554 rows=10000000 loops=1)Output: t3.c1, t3.c2Buffers: shared hit=44248-> Custom Scan (GpuJoin) (cost=845250.50..990105.67 rows=9999835 width=40) (actual time=25935.700..27197.613 rows=10000000 loops=1)Output: t4.c1, t4.c2, t5.c1, t5.c2, t6.c1, t6.c2, t7.c1, t7.c2, t8.c1, t8.c2Pseudo Scan: t4.c1::integer, t4.c2::integer, t5.c1::integer, t5.c2::integer, t6.c1::integer, t6.c2::integer, t7.c1::integer, t7.c2::integer, t8.c1::integer, t8.c2::integerBulkload: On (density: 100.00%)Depth 1: GpuHashJoin, HashKeys: (t4.c1, t4.c2), JoinQual: ((t4.c1 = t5.c1) AND (t4.c2 = t5.c2))Nrows (in:10000000 out:10000000, 100.00% planned 100.00%), KDS-Hash (size: 495.91MB planned 739.10MB, nbatches: 1 planned 1)Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=221267-> Custom Scan (GpuJoin) (cost=634200.09..778635.44 rows=9999835 width=32) (actual time=20268.808..20269.446 rows=10000000 loops=1)Output: t4.c1, t4.c2, t6.c1, t6.c2, t7.c1, t7.c2, t8.c1, t8.c2Pseudo Scan: t4.c1::integer, t4.c2::integer, t6.c1::integer, t6.c2::integer, t7.c1::integer, t7.c2::integer, t8.c1::integer, t8.c2::integerBulkload: On (density: 100.00%)Depth 1: GpuHashJoin, HashKeys: (t4.c1, t4.c2), JoinQual: ((t4.c1 = t8.c1) AND (t4.c2 = t8.c2))Nrows (in:10000000 out:10000000, 100.00% planned 100.00%), KDS-Hash (size: 495.91MB planned 739.10MB, nbatches: 1 planned 1)Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=177019-> Custom Scan (GpuJoin) (cost=422951.23..567266.84 rows=9999906 width=24) (actual time=14136.992..14560.294 rows=10000000 loops=1)Output: t4.c1, t4.c2, t6.c1, t6.c2, t7.c1, t7.c2Pseudo Scan: t4.c1::integer, t4.c2::integer, t6.c1::integer, t6.c2::integer, t7.c1::integer, t7.c2::integerBulkload: On (density: 100.00%)Depth 1: GpuHashJoin, HashKeys: (t4.c1, t4.c2), JoinQual: ((t4.c1 = t7.c1) AND (t4.c2 = t7.c2))Nrows (in:10000000 out:10000000, 100.00% planned 100.00%), KDS-Hash (size: 495.91MB planned 739.10MB, nbatches: 1 planned 1)Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=132771-> Custom Scan (GpuJoin) (cost=211479.13..355977.88 rows=9999977 width=16) (actual time=7517.167..7812.485 rows=10000000 loops=1)Output: t4.c1, t4.c2, t6.c1, t6.c2Pseudo Scan: t4.c1::integer, t4.c2::integer, t6.c1::integer, t6.c2::integerBulkload: On (density: 100.00%)Depth 1: GpuHashJoin, HashKeys: (t4.c1, t4.c2), JoinQual: ((t4.c1 = t6.c1) AND (t4.c2 = t6.c2))Nrows (in:10000000 out:10000000, 100.00% planned 100.00%), KDS-Hash (size: 495.91MB planned 739.10MB, nbatches: 1 planned 1)Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=88523-> Custom Scan (BulkScan) on public.t4 (cost=0.00..144248.48 rows=10000048 width=8) (actual time=12.063..322.445 rows=10000000 loops=1)Output: t4.c1, t4.c2Features: format: tuple-slot, bulkload: supportedBuffers: shared hit=44275-> Seq Scan on public.t6 (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.009..1559.817 rows=10000000 loops=1)Output: t6.c1, t6.c2Buffers: shared hit=44248-> Seq Scan on public.t7 (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.022..1527.810 rows=10000000 loops=1)Output: t7.c1, t7.c2Buffers: shared hit=44248-> Seq Scan on public.t8 (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.023..1521.776 rows=10000000 loops=1)Output: t8.c1, t8.c2Buffers: shared hit=44248-> Seq Scan on public.t5 (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.015..1499.270 rows=10000000 loops=1)Output: t5.c1, t5.c2Buffers: shared hit=44248Planning time: 82.220 msExecution time: 52799.548 ms(92 rows)
[参考]
1.