DataX Usage Notes

1. First, get familiar with what DataX does: https://github.com/alibaba/DataX/blob/master/introduction.md

2. DataX download link (works on Windows too): http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

3. As the tutorial says, install the JDK and Python 2.7.16 before using DataX.

  Python 2.7 installation guide: https://www.jb51.net/article/136324.htm

4. Suppose we want to sync data from MySQL to MySQL. First create two databases (test1 and test2) and two tables (channel_logs and channel_log_test).

/* Table DDL
Navicat MySQL Data Transfer

Source Server         : local
Source Server Version : 50711
Source Host           : localhost:3306
Source Database       : test1

Target Server Type    : MYSQL
Target Server Version : 50711
File Encoding         : 65001

Date: 2019-03-29 12:01:47
*/

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for `channel_logs`
-- ----------------------------
DROP TABLE IF EXISTS `channel_logs`;
CREATE TABLE `channel_logs` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `name` varchar(255) DEFAULT NULL COMMENT 'current name',
  `channel_id` int(11) DEFAULT NULL COMMENT 'id',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of channel_logs
-- ----------------------------
INSERT INTO `channel_logs` VALUES ('6', '记录1', '8');
INSERT INTO `channel_logs` VALUES ('7', '记录32', '8');
INSERT INTO `channel_logs` VALUES ('8', '记录3', '8');
INSERT INTO `channel_logs` VALUES ('9', '记录4', '8');
INSERT INTO `channel_logs` VALUES ('10', '记录5', '8');
INSERT INTO `channel_logs` VALUES ('11', '记录6', '8');
INSERT INTO `channel_logs` VALUES ('12', '记录6', '9');
INSERT INTO `channel_logs` VALUES ('13', '记录7', '10');
INSERT INTO `channel_logs` VALUES ('14', '记录8', '9');
INSERT INTO `channel_logs` VALUES ('15', '记录9', '10');
INSERT INTO `channel_logs` VALUES ('16', '记录98', '9');
INSERT INTO `channel_logs` VALUES ('17', '记录0', '10');
INSERT INTO `channel_logs` VALUES ('18', '记录67', '9');
INSERT INTO `channel_logs` VALUES ('19', '记录77', '10');
INSERT INTO `channel_logs` VALUES ('20', '记录88', '10');
INSERT INTO `channel_logs` VALUES ('21', '记录858', '10');
INSERT INTO `channel_logs` VALUES ('22', '记录96', '9');
INSERT INTO `channel_logs` VALUES ('23', '记录465', '10');
INSERT INTO `channel_logs` VALUES ('24', '记录65', '9');

  The two tables have identical structures; the first table holds the data inserted above, and the second starts out empty.

5. With the tables in place, prepare the DataX job script as follows:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [              
                            "id",
                            "name"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test1"], 
                                "table": ["channel_logs"]
                            }
                        ],
                        "password": "123456",
                        "username": "canal"
                    }

                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [
                            "id",
                            "name"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test2",
                                "table": ["channel_log_test"]
                            }
                        ],
                        "password": "123456",
                        "username": "canal"
                    }

                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
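The same job configuration can be generated programmatically, which helps avoid hand-editing mistakes such as mismatched reader/writer column lists. The `build_job` helper below is a hypothetical sketch (not part of DataX); note that, as in the config above, mysqlreader takes `jdbcUrl` as a list while mysqlwriter takes it as a plain string.

```python
import json

def build_job(src_url, src_table, dst_url, dst_table, columns,
              username="canal", password="123456"):
    """Build a minimal DataX MySQL-to-MySQL job config.

    Hypothetical helper: mysqlreader expects jdbcUrl as a list,
    mysqlwriter as a plain string, matching the config above."""
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "column": columns,
            "connection": [{"jdbcUrl": [src_url], "table": [src_table]}],
            "username": username,
            "password": password,
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "column": columns,
            "connection": [{"jdbcUrl": dst_url, "table": [dst_table]}],
            "username": username,
            "password": password,
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": 1}},
        }
    }

job = build_job(
    "jdbc:mysql://127.0.0.1:3306/test1", "channel_logs",
    "jdbc:mysql://127.0.0.1:3306/test2", "channel_log_test",
    ["id", "name"],
)
# Dump the config that datax.py will consume.
print(json.dumps(job, indent=4))
```

Writing the result of `json.dumps` to a file (e.g. `test.json`) gives you the script used in the next step.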

6. Open cmd.exe and run the DataX script:

python E:\datax\bin\datax.py E:\datax\bin\test.json
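If you want to launch the job from a Python script instead of typing the command by hand, a small wrapper like the one below works; the paths are placeholders matching the example above, so adjust them to your own install location.

```python
import subprocess
import sys

def datax_command(datax_py, job_json, python_exe=sys.executable):
    """Assemble the command line shown above: python datax.py <job.json>.

    Paths are placeholders -- point them at your DataX install."""
    return [python_exe, datax_py, job_json]

cmd = datax_command(r"E:\datax\bin\datax.py", r"E:\datax\bin\test.json")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually launch DataX
```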

7. Output like the following indicates success:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2019-03-29 10:39:50.043 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-03-29 10:39:50.051 [main] INFO  Engine - the machine info  =>

        osInfo: Oracle Corporation 1.8 25.0-b70
        jvmInfo:        Windows 8.1 amd64 6.3
        cpu num:        4

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size
        PS Eden Space                  | 256.00MB                       | 256.00MB
        Code Cache                     | 240.00MB                       | 2.44MB
        Compressed Class Space         | 1,024.00MB                     | 0.00MB
        PS Survivor Space              | 42.50MB                        | 42.50MB
        PS Old Gen                     | 683.00MB                       | 683.00MB
        Metaspace                      | -0.00MB                        | 0.00MB


2019-03-29 10:39:50.067 [main] INFO  Engine -
{
        "content":[
                {
                        "reader":{
                                "name":"mysqlreader",
                                "parameter":{
                                        "column":[
                                                "id",
                                                "name"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:mysql://127.0.0.1:3306/gzhgdb"
                                                        ],
                                                        "table":[
                                                                "channel_logs"
                                                        ]
                                                }
                                        ],
                                        "password":"******",
                                        "username":"canal"
                                }
                        },
                        "writer":{
                                "name":"mysqlwriter",
                                "parameter":{
                                        "column":[
                                                "id",
                                                "name"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":"jdbc:mysql://127.0.0.1:3306/test",
                                                        "table":[
                                                                "channel_log_test"
                                                        ]
                                                }
                                        ],
                                        "password":"******",
                                        "username":"canal"
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":1
                }
        }
}

2019-03-29 10:39:50.114 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2019-03-29 10:39:50.130 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2019-03-29 10:39:50.130 [main] INFO  JobContainer - DataX jobContainer starts job.
2019-03-29 10:39:50.130 [main] INFO  JobContainer - Set jobId = 0
2019-03-29 10:39:50.671 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://127.0.0.1:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2019-03-29 10:39:50.686 [job-0] INFO  OriginalConfPretreatmentUtil - table:[channel_logs] has columns:[id,name,channel_id].
2019-03-29 10:39:51.061 [job-0] INFO  OriginalConfPretreatmentUtil - table:[channel_log_test] all columns:[
id,name,channel_id].
2019-03-29 10:39:51.077 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (id,name) VALUES(?,?)
], which jdbcUrl like:[jdbc:mysql://127.0.0.1:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2019-03-29 10:39:51.077 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2019-03-29 10:39:51.077 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2019-03-29 10:39:51.077 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2019-03-29 10:39:51.077 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2019-03-29 10:39:51.077 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2019-03-29 10:39:51.077 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2019-03-29 10:39:51.092 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2019-03-29 10:39:51.148 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2019-03-29 10:39:51.164 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2019-03-29 10:39:51.164 [job-0] INFO  JobContainer - Running by standalone Mode.
2019-03-29 10:39:51.179 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2019-03-29 10:39:51.179 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2019-03-29 10:39:51.179 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2019-03-29 10:39:51.211 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2019-03-29 10:39:51.211 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select id,name from channel_logs
] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/gzhgdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-03-29 10:39:51.257 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select id,name from channel_logs
] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-03-29 10:39:51.532 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[321]ms
2019-03-29 10:39:51.532 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2019-03-29 10:40:01.196 [job-0] INFO  StandAloneJobContainerCommunicator - Total 19 records, 148 bytes | Speed 14B/s, 1 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2019-03-29 10:40:01.196 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2019-03-29 10:40:01.200 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2019-03-29 10:40:01.204 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2019-03-29 10:40:01.208 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2019-03-29 10:40:01.216 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: E:\datax\hook
2019-03-29 10:40:01.220 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
                 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2019-03-29 10:40:01.220 [job-0] INFO  JobContainer - PerfTrace not enable!
2019-03-29 10:40:01.224 [job-0] INFO  StandAloneJobContainerCommunicator - Total 19 records, 148 bytes | Speed 14B/s, 1 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2019-03-29 10:40:01.232 [job-0] INFO  JobContainer -
Job start time                  : 2019-03-29 10:39:50
Job end time                    : 2019-03-29 10:40:01
Total elapsed time              :                 11s
Average traffic                 :               14B/s
Record write speed              :              1rec/s
Total records read              :                  19
Total read/write failures       :                   0

 

1. Do not put "#" comments in the DataX job script; JSON does not allow comments.

2. If cmd.exe output is garbled when running the DataX script, enter CHCP 65001 first to switch the console code page to UTF-8.

3. If Chinese data from the database comes out garbled, append ?characterEncoding=utf8 to the jdbcUrl entries in the DataX job script.
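Tip 3 is easy to get wrong when a jdbcUrl already carries query parameters. The `with_utf8` helper below is a small hypothetical sketch that appends the parameter with the correct separator:

```python
def with_utf8(jdbc_url):
    """Append characterEncoding=utf8 to a JDBC URL (tip 3),
    preserving any query parameters already present."""
    if "characterEncoding=" in jdbc_url:
        return jdbc_url  # already set; leave it alone
    sep = "&" if "?" in jdbc_url else "?"
    return jdbc_url + sep + "characterEncoding=utf8"

print(with_utf8("jdbc:mysql://127.0.0.1:3306/test1"))
# -> jdbc:mysql://127.0.0.1:3306/test1?characterEncoding=utf8
```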

 
