Heterogeneous data synchronization with DataX (2): extending the PostgreSQL writer

1. Notes on modifying the source

DataX version:

[image: ceb085a41b274da793f405d9cc2ffc05.png]

Download the source code, modify the two classes below, then build and package it yourself in IntelliJ IDEA:

com.alibaba.datax.plugin.writer.postgresqlwriter.PostgresqlWriter

com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil

After building, replace these jar files:

postgresqlwriter-0.0.1-SNAPSHOT.jar

plugin-rdbms-util-0.0.1-SNAPSHOT.jar

The directory tree is shown below (the relevant part is plugin/writer/postgresqlwriter). It was printed with:

find <directory path> | sed -e 's/[^-][^\/]*\//--/g' -e 's/--/|-/'

|-lib
|-bin
|-job
|-conf
|-log
|-log_perf
|-tmp
|-script
|-plugin
|---writer
|-----postgresqlwriter
|-------plugin_job_template.json
|-------plugin.json
|-------libs
|---------checker-qual-3.5.0.jar
|---------postgresql-42.3.3.jar
|---------commons-collections-3.0.jar
|---------druid-1.0.15.jar
|---------commons-lang3-3.3.2.jar
|---------logback-core-1.0.13.jar
|---------commons-io-2.4.jar
|---------datax-common-0.0.1-SNAPSHOT.jar
|---------guava-r05.jar
|---------plugin-rdbms-util-0.0.1-SNAPSHOT.jar
|---------hamcrest-core-1.3.jar
|---------logback-classic-1.0.13.jar
|---------commons-math3-3.1.1.jar
|---------slf4j-api-1.7.10.jar
|---------fastjson2-2.0.23.jar
|-------postgresqlwriter-0.0.1-SNAPSHOT.jar

2. Usage

2.1 Insert or update on a table that has a unique index

MySQL table structure:

CREATE TABLE `sys_test_copy2` (
  `user_id` bigint NOT NULL DEFAULT '0' COMMENT 'User ID',
  `email` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT '' COMMENT 'User email',
  `iso_country_code` varchar(3) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT 'ISO country code',
  `country` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT 'Country',
  `brand_no` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT 'Brand',
  `source` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT 'Source',
  `create_time` datetime DEFAULT NULL COMMENT 'Creation time',
  PRIMARY KEY (`user_id`),
  UNIQUE KEY `sys_test_copy2_u1` (`email`,`iso_country_code`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

Target PostgreSQL table structure (the unique constraint spans two columns):

CREATE TABLE "public"."sys_test_copy2" (
  "user_id" int8 NOT NULL,
  "email" varchar(50) COLLATE "pg_catalog"."default",
  "iso_country_code" varchar(3) COLLATE "pg_catalog"."default",
  "country" varchar(50) COLLATE "pg_catalog"."default",
  "brand_no" varchar(30) COLLATE "pg_catalog"."default",
  "source" varchar(50) COLLATE "pg_catalog"."default",
  "create_time" timestamp(6),
  CONSTRAINT "sys_test_copy2_pkey" PRIMARY KEY ("user_id"),
  CONSTRAINT "sys_test_copy2_u1" UNIQUE ("email", "iso_country_code")
)
;

ALTER TABLE "public"."sys_test_copy2" 
  OWNER TO "postgres";

DataX job:

Core extension point: "writeMode": "update!@#(user_id)!@#(email,iso_country_code)",

{
    "job":{
        "content":[
            {
                "reader":{
                    "name":"mysqlreader",
                    "parameter":{
                        "username": "root",
                        "password": "xxxxxx",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.5.180:3306/xxxx?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&serverTimezone=GMT%2B8"
                                ],
                                "querySql": [
                                    "SELECT * FROM sys_test_copy2"
                                ]
                            }
                        ]
                    }
                },
                "writer":{
                    "name":"postgresqlwriter",
                    "parameter":{
                        "writeMode": "update!@#(user_id)!@#(email,iso_country_code)",
                        "column":[
                            "user_id",
                            "email",
                            "iso_country_code",
                            "country",
                            "brand_no",
                            "source",
                            "create_time"
                        ],
                        "connection":[
                            {
                                "jdbcUrl":"jdbc:postgresql://127.0.0.1:5432/postgres",
                                "table":[
                                    "sys_test_copy2"
                                ]
                            }
                        ],
                        "password":"xxxx",
                        "username":"postgres"
                    }
                }
            }
        ],
        "setting":{
            "speed":{
                "channel":6
            }
        }
    }
}

Running the job generates the statement template below. Every column is updated except the primary key and the unique-index columns:

INSERT INTO %s (user_id,email,iso_country_code,country,brand_no,source,create_time) VALUES(?::int8,?::varchar,?::varchar,?::varchar,?::varchar,?::varchar,?::timestamp) ON CONFLICT (user_id) DO UPDATE SET country=EXCLUDED.country,brand_no=EXCLUDED.brand_no,source=EXCLUDED.source,create_time=EXCLUDED.create_time
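To make the shape of that template easier to follow, here is a minimal standalone sketch that assembles the same string from a column list. The class name UpsertTemplateSketch and the method buildTemplate are hypothetical and are not part of DataX; the value holders such as ?::int8 are simply passed in as given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a DataX class.
final class UpsertTemplateSketch {

    // Builds "INSERT INTO %s (...) VALUES(...) ON CONFLICT (...) DO UPDATE SET ...".
    // Columns listed in keyColumns (primary key plus unique-index columns)
    // are left out of the SET list.
    static String buildTemplate(List<String> columns, List<String> valueHolders,
                                String conflictTarget, List<String> keyColumns) {
        List<String> assignments = new ArrayList<>();
        for (String column : columns) {
            if (!keyColumns.contains(column)) {
                assignments.add(column + "=EXCLUDED." + column);
            }
        }
        StringBuilder sql = new StringBuilder("INSERT INTO %s (")
                .append(String.join(",", columns))
                .append(") VALUES(")
                .append(String.join(",", valueHolders))
                .append(") ON CONFLICT (").append(conflictTarget).append(")");
        if (assignments.isEmpty()) {
            // Nothing left to update, so fall back to skipping the row
            sql.append(" DO NOTHING");
        } else {
            sql.append(" DO UPDATE SET ").append(String.join(",", assignments));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("user_id", "email", "iso_country_code",
                "country", "brand_no", "source", "create_time");
        List<String> valueHolders = Arrays.asList("?::int8", "?::varchar", "?::varchar",
                "?::varchar", "?::varchar", "?::varchar", "?::timestamp");
        String template = buildTemplate(columns, valueHolders, "user_id",
                Arrays.asList("user_id", "email", "iso_country_code"));
        // %s is the table-name placeholder, filled in here for display
        System.out.println(String.format(template, "sys_test_copy2"));
    }
}
```

Note that the conflict target here is (user_id) only. In PostgreSQL, a row that collides on the (email, iso_country_code) constraint but not on user_id is not covered by this clause and still raises a unique-violation error.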

2.2 Insert by primary key, skipping rows that already exist

INSERT INTO sys_test_copy1(user_id, email) VALUES (5592, 'xxxx5@hotmail.com')  ON CONFLICT (user_id) do nothing;

The table structure is omitted here; it is the same as above with the unique index removed.

DataX job:

{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "database password",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.5.180:3306/xxxx?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&serverTimezone=GMT%2B8"
                                ],
                                "querySql": [
                                    "SELECT * FROM sys_test_copy1"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "postgresqlwriter",
                    "parameter": {
                        "username": "postgres",
                        "password": "database password",
                        "writeMode": "insert!@#(user_id)",
                        "column": [
                            "*"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "sys_test_copy1"
                                ],
                                "jdbcUrl": "jdbc:postgresql://192.168.5.190:5432/xxxx"
                            }
                        ]
                    }
                }
            }
        ]
    }
}

In both cases the writer ends up issuing an INSERT INTO ... ON CONFLICT statement.

com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil

The code below has been tested locally and can be used as is. Its core logic is to parse the writeMode node of the job configuration:

 "writeMode": "insert!@#(user_id)",

 "writeMode": "update!@#(user_id)!@#(email,iso_country_code)",

    // Needs java.util.ArrayList, java.util.List and org.apache.commons.lang3.StringUtils.
    private static String onDuplicateKeyUpdateString(String writeMode, List<String> columnHolders) {
        // Expected format: mode!@#(primaryKey)[!@#(uniqueCol1,uniqueCol2)]
        String[] writeModeArr = writeMode.split("!@#", -1);
        int writeModeArrLen = writeModeArr.length;
        if (writeModeArrLen < 2) {
            // No conflict target configured, so there is no ON CONFLICT clause to add
            return "";
        }
        writeMode = writeModeArr[0].trim();
        // Primary key
        String primaryKey = writeModeArr[1].replace("(", "").replace(")", "").trim();
        StringBuilder sb = new StringBuilder();
        if ("insert".equals(writeMode) && writeModeArrLen == 2) {
            sb.append(" ON CONFLICT ").append(writeModeArr[1]).append(" do nothing");
        }
        if ("update".equals(writeMode)) {
            sb.append(" ON CONFLICT ").append(writeModeArr[1]);
            // Unique-constraint columns (optional third segment), compared case-insensitively
            List<String> unionFieldList = new ArrayList<>();
            if (writeModeArrLen > 2) {
                for (String field : writeModeArr[2].replace("(", "").replace(")", "").split(",", -1)) {
                    unionFieldList.add(field.trim().toLowerCase());
                }
            }
            List<String> updateSqlList = new ArrayList<>();

            for (String updateField : columnHolders) {
                // Skip the primary key column
                if (StringUtils.equalsIgnoreCase(updateField, primaryKey)) {
                    continue;
                }
                // Skip unique-constraint columns
                if (unionFieldList.contains(updateField.toLowerCase())) {
                    continue;
                }
                updateSqlList.add(updateField + "=EXCLUDED." + updateField);
            }
            if (updateSqlList.isEmpty()) {
                sb.append(" DO NOTHING");
            } else {
                sb.append(" DO UPDATE SET ").append(StringUtils.join(updateSqlList, ","));
            }
        }
        return sb.toString();
    }
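The method above assumes the writeMode string is well formed. As a complement, here is a minimal sketch of validating the string up front, so that a typo in the job file fails with a clear message before any SQL is built. The class WriteModeSpec is hypothetical and not part of DataX; it only assumes the same "!@#" separator and parenthesized column groups shown in the two examples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical value class for illustration only; not a DataX class.
final class WriteModeSpec {
    final String mode;                  // "insert" or "update"
    final List<String> conflictColumns; // columns in the first (...) group
    final List<String> uniqueColumns;   // columns in the optional second (...) group

    private WriteModeSpec(String mode, List<String> conflictColumns, List<String> uniqueColumns) {
        this.mode = mode;
        this.conflictColumns = conflictColumns;
        this.uniqueColumns = uniqueColumns;
    }

    // Parses e.g. "update!@#(user_id)!@#(email,iso_country_code)".
    static WriteModeSpec parse(String writeMode) {
        String[] parts = writeMode.trim().split("!@#", -1);
        String mode = parts[0].trim().toLowerCase(Locale.ROOT);
        if (!mode.equals("insert") && !mode.equals("update")) {
            throw new IllegalArgumentException("Unsupported mode: " + parts[0]);
        }
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Expected mode!@#(key)[!@#(cols)] but got: " + writeMode);
        }
        List<String> conflict = splitGroup(parts[1]);
        List<String> unique = parts.length == 3
                ? splitGroup(parts[2])
                : Collections.<String>emptyList();
        return new WriteModeSpec(mode, conflict, unique);
    }

    // Turns "(a,b)" into [a, b]; rejects missing parentheses and empty names.
    private static List<String> splitGroup(String group) {
        String g = group.trim();
        if (g.length() < 2 || g.charAt(0) != '(' || g.charAt(g.length() - 1) != ')') {
            throw new IllegalArgumentException("Column group must look like (a,b): " + group);
        }
        List<String> columns = new ArrayList<>();
        for (String raw : g.substring(1, g.length() - 1).split(",", -1)) {
            String name = raw.trim();
            if (name.isEmpty()) {
                throw new IllegalArgumentException("Empty column name in: " + group);
            }
            columns.add(name);
        }
        return columns;
    }

    public static void main(String[] args) {
        WriteModeSpec spec = parse("update!@#(user_id)!@#(email,iso_country_code)");
        System.out.println(spec.mode + " " + spec.conflictColumns + " " + spec.uniqueColumns);
    }
}
```

Parsing once into a small object also keeps the SQL-building code free of repeated replace("(", "") calls.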

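3. Summary

The stock PostgreSQL writer plugin does not currently support insert-or-update, so the source has to be patched by hand. Which writeMode to configure (insert or update) depends on whether the target table has a unique index.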
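As a small illustration of that choice, the sketch below derives the writeMode string from the primary key and an optional list of unique-index columns, following the two examples used in this post. WriteModeChooser and writeModeFor are hypothetical names, not part of DataX.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not a DataX class.
final class WriteModeChooser {

    // No unique-index columns: "insert!@#(pk)", which skips existing rows.
    // With unique-index columns: "update!@#(pk)!@#(c1,c2)", which updates the other columns.
    static String writeModeFor(String primaryKey, List<String> uniqueColumns) {
        if (uniqueColumns.isEmpty()) {
            return "insert!@#(" + primaryKey + ")";
        }
        return "update!@#(" + primaryKey + ")!@#(" + String.join(",", uniqueColumns) + ")";
    }

    public static void main(String[] args) {
        System.out.println(writeModeFor("user_id", Collections.<String>emptyList()));
        System.out.println(writeModeFor("user_id", Arrays.asList("email", "iso_country_code")));
    }
}
```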

The next post will briefly cover how to run the script through xxl-job:

python datax.py ./job/mysql_postgres_job.json


 

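4. References

4.1 PostgreSQL supports insert-or-update in SQL (the syntax differs somewhat from MySQL)

See the following article:

MySQL + PostgreSQL batch insert-or-update (insertOrUpdate), a CSDN blog post

4.2 In DataX, this can be achieved by modifying the source

Reference source: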

https://juejin.cn/post/7124899170615296013

The example there has a small problem; a corrected version is:

INSERT INTO public.t_user (user_id, username, password, age, created_time) VALUES (1, 'zs', '123', 18, now()), (2, 'ls', '123456', 19, now()), (3, 'ww', '123', 20, now()) ON CONFLICT (username, age, password) DO UPDATE SET created_time = EXCLUDED.created_time;
