First Look at DataX 3.0

I was recently handed a task: synchronizing table data. Rather than trying every sync tool on the market one by one, I trusted Alibaba and looked into its DataX. Let's take it a little at a time; this is mostly a learning exercise.

Environment: Linux (CentOS 6.8)

Python 2.7 (the system default; the datax.py launcher needs Python 2)

MySQL 5.7.1
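
Before installing, it is worth confirming the runtime DataX depends on: a JDK (1.8 here) and Python 2. A quick sanity check, assuming java, python, and mysql are already on the PATH:

# confirm the runtime DataX needs (JDK 1.8+, Python 2.x)
java -version
python -V
mysql --version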

1. Download it: wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

2. Extract it: tar -zxvf datax.tar.gz -C /usr/local/

3. Set permissions: chmod -R 755 /usr/local/datax/*

4. Change into the bin directory: cd /usr/local/datax/bin/

5. Run the bundled self-test job: python datax.py ../job/job.json

The output looks like this:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2019-05-17 02:40:26.101 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-05-17 02:40:26.138 [main] INFO  Engine - the machine info  => 

	osInfo:	Oracle Corporation 1.8 25.91-b14
	jvmInfo:	Linux amd64 2.6.32-642.el6.x86_64
	cpu num:	2

	totalPhysicalMemory:	-0.00G
	freePhysicalMemory:	-0.00G
	maxFileDescriptorCount:	-1
	currentOpenFileDescriptorCount:	-1

	GC Names	[PS MarkSweep, PS Scavenge]

	MEMORY_NAME                    | allocation_size                | init_size                      
	PS Eden Space                  | 256.00MB                       | 256.00MB                       
	Code Cache                     | 240.00MB                       | 2.44MB                         
	Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
	PS Survivor Space              | 42.50MB                        | 42.50MB                        
	PS Old Gen                     | 683.00MB                       | 683.00MB                       
	Metaspace                      | -0.00MB                        | 0.00MB                         


2019-05-17 02:40:26.204 [main] INFO  Engine - 
{
	"content":[
		{
			"reader":{
				"name":"streamreader",
				"parameter":{
					"column":[
						{
							"type":"string",
							"value":"DataX"
						},
						{
							"type":"long",
							"value":19890604
						},
						{
							"type":"date",
							"value":"1989-06-04 00:00:00"
						},
						{
							"type":"bool",
							"value":true
						},
						{
							"type":"bytes",
							"value":"test"
						}
					],
					"sliceRecordCount":100000
				}
			},
			"writer":{
				"name":"streamwriter",
				"parameter":{
					"encoding":"UTF-8",
					"print":false
				}
			}
		}
	],
	"setting":{
		"errorLimit":{
			"percentage":0.02,
			"record":0
		},
		"speed":{
			"byte":10485760
		}
	}
}

2019-05-17 02:40:26.254 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2019-05-17 02:40:26.257 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2019-05-17 02:40:26.257 [main] INFO  JobContainer - DataX jobContainer starts job.
2019-05-17 02:40:26.271 [main] INFO  JobContainer - Set jobId = 0
2019-05-17 02:40:26.321 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2019-05-17 02:40:26.325 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2019-05-17 02:40:26.327 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2019-05-17 02:40:26.327 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2019-05-17 02:40:26.332 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2019-05-17 02:40:26.335 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2019-05-17 02:40:26.340 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2019-05-17 02:40:26.393 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2019-05-17 02:40:26.401 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2019-05-17 02:40:26.408 [job-0] INFO  JobContainer - Running by standalone Mode.
2019-05-17 02:40:26.462 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2019-05-17 02:40:26.468 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2019-05-17 02:40:26.474 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2019-05-17 02:40:26.535 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2019-05-17 02:40:27.036 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[521]ms
2019-05-17 02:40:27.037 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2019-05-17 02:40:36.474 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.021s |  All Task WaitReaderTime 0.185s | Percentage 100.00%
2019-05-17 02:40:36.474 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2019-05-17 02:40:36.475 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2019-05-17 02:40:36.475 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2019-05-17 02:40:36.475 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2019-05-17 02:40:36.477 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /usr/local/datax/hook
2019-05-17 02:40:36.478 [job-0] INFO  JobContainer - 
	 [total cpu info] => 
		averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
		-1.00%                         | -1.00%                         | -1.00%
                        

	 [total gc info] => 
		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
		 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
		 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2019-05-17 02:40:36.478 [job-0] INFO  JobContainer - PerfTrace not enable!
2019-05-17 02:40:36.479 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.021s |  All Task WaitReaderTime 0.185s | Percentage 100.00%
2019-05-17 02:40:36.480 [job-0] INFO  JobContainer - 
Job start time                  : 2019-05-17 02:40:26
Job end time                    : 2019-05-17 02:40:36
Total elapsed time              :                 10s
Average throughput              :          253.91KB/s
Record write speed              :          10000rec/s
Total records read              :              100000
Total read/write failures       :                   0

With the self-test passing, you can go into the job directory and write the JSON for your own table sync. Here is a MySQL-to-MySQL example:

{
    "job": {
        "setting": {
            "speed": {              
                 "byte": 1048576,
                 "channel":"5",                 
            }
        },
        "content": [
            {
                "reader": {
				//name不能随便改
                    "name": "mysqlreader",
                    "parameter": {                       
                        "username": "root",                       
                        "password": "123456",
                        "column": ["*"],                      
                        "connection": [
                            {
                                "table": ["sys_user"],
                                "jdbcUrl": ["jdbc:mysql://ip:3306/bdc_asis?useUnicode=true&characterEncoding=utf8"]
                            }
                        ]
                    }
                },
               "writer": {
			   //测试了一下name不能随便改
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "replace into",
                        "username": "root",
                        "password": "1111",
                        "column":["*"],
                        "connection": [
                            {
                                "table": ["sys_user"],
                                "jdbcUrl": "jdbc:mysql://ip:3306/bdc_asis?useUnicode=true&characterEncoding=utf8"                                
                            }
                        ]
                    }
                }
            }
        ]
    }
}
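
Save the job somewhere sensible, e.g. /usr/local/jobs/sys_user.json (the directory and file name are just my choice), and launch it the same way as the self-test:

python /usr/local/datax/bin/datax.py /usr/local/jobs/sys_user.json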

A few things to note:

1. The username and password must be correct.
2. This example is MySQL to MySQL, so no extra driver is needed. For SQL Server or Oracle you will have to download the JDBC driver yourself (or look it up). For SQL Server, the sqljdbc_6.0.8112.100_enu.tar.gz driver can be fetched with: wget https://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/enu/sqljdbc_6.0.8112.100_enu.tar.gz
3. The reader and writer "name" values (mysqlreader, mysqlwriter) are fixed plugin identifiers; I tested it, and they cannot be changed. Also, JSON does not allow // comments, so keep any remarks out of the job file.

Extract the archive, move the sqljdbc42.jar file into /usr/local/datax/lib/, and set its permissions:

# tar -xf sqljdbc_6.0.8112.100_enu.tar.gz
# mv sqljdbc_6.0/enu/jre8/sqljdbc42.jar /usr/local/datax/lib/
# chmod 755 /usr/local/datax/lib/sqljdbc42.jar
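
To double-check that the driver landed where DataX will look for it (paths as above):

# the jar should be listed with the permissions just set
ls -l /usr/local/datax/lib/sqljdbc42.jar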

One more thing: the database account must be allowed to connect remotely. If you hit java.sql.SQLException: null, message from server: "Host 'xxxxx' is not allowed to connect", it almost always means remote access has not been granted.

Log in to MySQL, then:

use mysql;
show tables;
select host from user;
update user set host = '%' where user = 'root';

And whatever you do, do not forget to flush privileges! (Worth repeating: do not forget!)

FLUSH PRIVILEGES;
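
The same grant-and-flush can be done in one shot from the shell; a minimal sketch assuming the root account from the job file (you will be prompted for the password):

# open root to remote hosts and flush privileges in one command
mysql -uroot -p -e "UPDATE mysql.user SET host='%' WHERE user='root'; FLUSH PRIVILEGES;"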

A note on scheduled runs: with many job JSONs to execute, we would end up writing a scheduled script per job, which is clumsy. It is more convenient to write a single shell script that launches several jobs:

#!/bin/bash
# do not skip this: it loads the environment variables cron will not have
source /etc/profile

python /usr/local/datax/bin/datax.py /usr/local/jobs/sys_user1.json >> /usr/local/datax_log/sys_user1.`date +%Y%m%d` 2>&1 &
python /usr/local/datax/bin/datax.py /usr/local/jobs/sys_user2.json >> /usr/local/datax_log/sys_user2.`date +%Y%m%d` 2>&1 &
python /usr/local/datax/bin/datax.py /usr/local/jobs/sys_user3.json >> /usr/local/datax_log/sys_user3.`date +%Y%m%d` 2>&1 &
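
If the list of jobs keeps growing, a loop over the job directory saves editing the script each time; a sketch assuming every job JSON lives under /usr/local/jobs/:

#!/bin/bash
source /etc/profile

# launch every job JSON in the directory, one background process each
for job in /usr/local/jobs/*.json; do
    name=$(basename "$job" .json)
    python /usr/local/datax/bin/datax.py "$job" >> /usr/local/datax_log/"$name".`date +%Y%m%d` 2>&1 &
done
wait   # return only after all background jobs finish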

Then add a cron entry (the script above is saved as /usr/local/dataX3.0.sh):

$ crontab -e
# run the script once a minute
*/1 * * * * /usr/local/dataX3.0.sh >/dev/null 2>&1
I run the script once a minute. Be sure to redirect the output: cron mails each run's results to the user, and left unchecked that mail will eventually blow up the system.
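
The dated log files also pile up, so give them the same treatment; a small housekeeping sketch (the 7-day retention is arbitrary), which could be its own cron entry:

# delete DataX logs older than 7 days
find /usr/local/datax_log -type f -mtime +7 -delete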