Greenplum——大数据量写入和更新的性能优化之路

背景

Greenplum数据加载的三种方式:

  1. insert
  2. copyin
  3. 外部表 gpfdist

理论上来说,性能 gdfdist>copyin>insert(batch)>insert

1 安装所需工具

1.1 pgbench

yum安装pgbench

yum install -y https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7.9-x86_64/

yum install postgresql10
yum install postgresql10-server
yum install -y postgresql10-contrib

配置环境变量,默认安装是/usr/pgsql-10/bin,所以需要配置path 路径

vim /etc/profile

# 添加这行
export PATH=$PATH:/usr/pgsql-10/bin

source /etc/profile

在gp上面创建测试库

create database pgbenchdb;

执行测试指令

pgbench -i -s 5 pgbenchdb     --初始化,将在pgbench_accounts表中创建 500,000行。

执行后查询pgbench_accounts的条数,如果50w条说明插入成功

2 基准测试

2.1 测试准备

1、在pgbenchdb数据库下创建测试表tbl

CREATE TABLE public.tbl (
	id bigserial NOT NULL,
	crt_time timestamp NULL,
	sensorid int4 NULL,
	sensorloc point NULL,
	info text NULL
)
WITH (
	autovacuum_enabled=on,
	autovacuum_vacuum_threshold=1,
	autovacuum_vacuum_cost_delay=0
)
DISTRIBUTED BY (id);

2、在/home/gpadmin下面创建insert.sql文件,向表中插入一条随机数据

insert into public.tbl (crt_time,sensorid,info) values ( clock_timestamp(),trunc(random()*500000),substring(md5(random()::text),1,8) );

3、在/home/gpadmin下面创建read.sql文件, 从表中读取一条随机数据

select * from public.tbl where sensorid = trunc(random()*500000);

2.2 开始测试

2.2.1 写入测试

测试前

truncate table public.tbl

1、链接数测试,模拟224个客户端连接,8个线程,每个客户端8个事务

pgbench -f /home/gpadmin/insert.sql -c 224 -C -j 8 -t 8 pgbenchdb

执行结果

starting vacuum...end.
transaction type: /home/gpadmin/insert.sql
scaling factor: 1
query mode: simple
number of clients: 224
number of threads: 8
number of transactions per client: 8
number of transactions actually processed: 1792/1792
latency average = 2308.419 ms
tps = 97.036123 (including connections establishing)
tps = 97.929852 (excluding connections establishing)

2、测试客户端64 10w写入

pgbench -f /home/gpadmin/insert.sql -c 64 -C -j 8 -t 1563 pgbenchdb

执行结果

transaction type: /home/gpadmin/insert.sql
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 8
number of transactions per client: 1563
number of transactions actually processed: 98733/100032
latency average = 649.590 ms
tps = 98.523657 (including connections establishing)
tps = 100.359377 (excluding connections establishing)

3、测试客户端128 10w写入

pgbench -f /home/gpadmin/insert.sql -c 128 -C -j 8 -t 782 pgbenchdb

执行结果

[gpadmin@pnode3 ~]$ pgbench -f /home/gpadmin/insert.sql -c 128 -C -j 8 -t 782 pgbenchdb
starting vacuum...end.
transaction type: /home/gpadmin/insert.sql
scaling factor: 1
query mode: simple
number of clients: 128
number of threads: 8
number of transactions per client: 782
number of transactions actually processed: 100096/100096
latency average = 1316.215 ms
tps = 97.248556 (including connections establishing)
tps = 98.091034 (excluding connections establishing)

4、测试客户端164 100w写入

pgbench -f /home/gpadmin/insert.sql -c 64 -C -j 8 -t 15625 pgbenchdb

执行结果

transaction type: /home/gpadmin/insert.sql
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 8
number of transactions per client: 15625
number of transactions actually processed: 1000000/1000000
latency average = 769.468 ms
tps = 83.174383 (including connections establishing)
tps = 84.614156 (excluding connections establishing)

5、客户端128 100w写入

pgbench -f /home/gpadmin/insert.sql -c 128 -C -j 8 -t 7813 pgbenchdb

执行结果

transaction type: /home/gpadmin/insert.sql
scaling factor: 1
query mode: simple
number of clients: 128
number of threads: 8
number of transactions per client: 7813
number of transactions actually processed: 1000064/1000064
latency average = 1389.130 ms
tps = 92.144034 (including connections establishing)
tps = 92.931354 (excluding connections establishing)

2.2.2 读取测试

1、客户端64  10w查询

pgbench -f /home/gpadmin/read.sql -c 64 -C -j 8 -t 1563 pgbenchdb

执行结果

transaction type: /home/gpadmin/read.sql
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 8
number of transactions per client: 1563
number of transactions actually processed: 100032/100032
latency average = 4188.179 ms
tps = 15.281105 (including connections establishing)
tps = 15.371656 (excluding connections establishing)

性能太差了 就不往下测试了

2.2.3 更新测试

3 优化

3.1 写入优化

1 copy in方式

采用CopyManager对Greenplum进行数据导出导入的Java Demo:Greenplum copy in的Java工具类_CarsonBigData的博客-CSDN博客

2 gpfdist方式

Greenplum——基于Greenplum-Spark Connector的Spark脚本开发及遇到的坑_CarsonBigData的博客-CSDN博客_greenplum spark connector

3.2 更新优化

方案描述:增量数据和目标表数据合并到临时表,然后覆盖目标表。

-- 业务库insert和update的数据:tmp_incr_data

-- 今天过来的最新数据:tmp_update_data

-- 目标表里不需要更新的数据:tmp_not_update_data

-- 目标表:dest_table,按天分区

-- 主键字段:ID

1、取出当天未更新的数据,放到临时表里

CREATE TABLE tmp_not_update_data
SELECT *
FROM dest_table a
LEFT JOIN tmp_incr_data b
    ON a.id = b.id
WHERE b.id is null;

2、从目标表删除当天数据

TRUNCATE TABLE dest_table PARTITION(DATA_DATE = '2022-07-28');

3、把当天未更新的数据插回目标表

INSERT INTO dest_table
SELECT * FROM tmp_not_update_data;

4、把当天更新的数据查到目标表

INSERT INTO dest_table
SELECT * FROM tmp_update_data;

持续更新。。。

 参考博客:

Greenplum6 JDBC insert 性能百倍提升 *之踩坑之路_恋奴娇的博客-CSDN博客

HybridDB for PostgreSQL , Greenplum 写入性能优化实践_weixin_34391854的博客-CSDN博客
Greenplum6 JDBC insert性能媲美MySQL_Greenplum中文社区的博客-CSDN博客_greenplum写入性能

通过copyIn向PostgreSQL或Greenplum写入数据_陈序猿张的博客-CSDN博客

Greenplum基于pgbench的性能测试 - 腾讯云开发者社区-腾讯云

pgbench 使用介绍及示例 - 墨天轮

基于pgbench的GreenPlum压力测试_zyset的博客-CSDN博客_pgbench 官方压测结果

 pgbench10安装记录(yum)_mingjie73的博客-CSDN博客_pgbench 安装

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值