DataX数据传输O2O

DataX工具安装与使用
一、准备环境

1、创建用户和组和目录

groupadd -g 1400 datax
useradd -g datax -u 1400 datax
mkdir /datax
chown datax:datax /datax

2、系统环境

Linux
JDK(1.8以上,推荐1.8)
Python(推荐Python2.6.X)
Apache Maven 3.x (Compile DataX)

(1)升级JDK到(1.8.0_251)点击下载
安装:
rpm -ivh jdk-8u151-linux-x64.rpm 
查看版本:
java -version
(2)安装 apache-maven 点击下载
解压缩到/datax:
jar -xvf apache-maven-3.5.2-bin.zip -d /datax
添加环境变量:
vi /home/datax/.bash_profile

alias mvn='/datax/apache-maven-3.5.2/bin/mvn'

查看版本:
mvn -version

alias的作用是给命令起一个别的名字(作用的是命令)
export的作用是设置一个变量(作用的是变量)

(3)查看python版本:python -V
(4)DataX 安装配置 点击下载

上传到/datax目录

解压缩:
tar -zxvf datax.tar.gz
自检脚本:
python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json

自检脚本无法通过报错处理:https://developer.aliyun.com/ask/376622?spm=a2c6h.13148508.0.0.1d734f0eKYBCG7

二、DataX O2O(Oracle to Oracle)配置清单

如两个不通数据库之间使用datax数据迁移
可以通过命令查看配置模板: python datax.py -r {Sourcedb_READER} -w {Targetdb_WRITER}
示例:python /datax/datax/bin/datax.py -r oraclereader -w oraclewriter

查看Oracle到Oracle数据传输json文件模板

[datax@ceshi1 ~]$ python /datax/datax/bin/datax.py -r oraclereader -w oraclewriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the oraclereader document:
     https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md 

Please refer to the oraclewriter document:
     https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md 
 
Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": [], 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "username": ""
                    }
                }, 
                "writer": {
                    "name": "oraclewriter", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": "", 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "preSql": [], 
                        "username": ""
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

DataX Job配置文件oraclereader和oraclewriter配置项清单如下:

在这里插入图片描述
在这里插入图片描述

三、操作示例
主机源端目标端
IP192.168.48.201(源端)192.168.48.130(目标端)
端口15211521
实例名orclorcl1
用户/密码nice/nicehr/hr
表名STUDYDATAX

两表表结构如下:

SQL> desc datax
 Name					   Null?    Type
 --------------------------------- -------- 
 STUID					   NOT NULL NUMBER(10)
 STUNAME				   NOT NULL VARCHAR2(20)

源端在/datax/datax/job目录下用oraclereader to oraclewriter模板配置json文件

vi test.json

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader", 
                    "parameter": {
                        "column": [
                        "STUID",
                        "STUNAME"
                        ], 
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:oracle:thin:@192.168.48.201:1521:orcl"], 
                                "table": ["STUDY"]
                            }
                        ], 
                        "password": "nice", 
                        "username": "nice"
                    }
                }, 
                "writer": {
                    "name": "oraclewriter", 
                    "parameter": {
                        "column": [
                        "STUID",
                        "STUNAME"
                        ], 
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:oracle:thin:@192.168.48.130:1521:orcl1", 
                                "table": ["DATAX"]
                            }
                        ], 
                        "password": "hr", 
                        "preSql": ["delete from DATAX"], ##同步前清空DATAX表
                        "username": "hr"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "4"							##并行数,不加会报错
            }
        }
    }
}

使用python执行json文件输出如下:

[datax@shuaige job]$ python /datax/datax/bin/datax.py /datax/datax/job/test.json 

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2021-03-11 11:11:15.356 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2021-03-11 11:11:15.364 [main] INFO  Engine - the machine info  => 

	osInfo:	Oracle Corporation 1.8 25.251-b08
	jvmInfo:	Linux amd64 2.6.32-358.el6.x86_64
	cpu num:	2

	totalPhysicalMemory:	-0.00G
	freePhysicalMemory:	-0.00G
	maxFileDescriptorCount:	-1
	currentOpenFileDescriptorCount:	-1

	GC Names	[PS MarkSweep, PS Scavenge]

	MEMORY_NAME                    | allocation_size                | init_size                      
	PS Eden Space                  | 256.00MB                       | 256.00MB                       
	Code Cache                     | 240.00MB                       | 2.44MB                         
	Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
	PS Survivor Space              | 42.50MB                        | 42.50MB                        
	PS Old Gen                     | 683.00MB                       | 683.00MB                       
	Metaspace                      | -0.00MB                        | 0.00MB                         


2021-03-11 11:11:15.387 [main] INFO  Engine - 
{
	"content":[
		{
			"reader":{
				"name":"oraclereader",
				"parameter":{
					"column":[
						"STUID",
						"STUNAME"
					],
					"connection":[
						{
							"jdbcUrl":[
								"jdbc:oracle:thin:@192.168.48.201:1521:orcl"
							],
							"table":[
								"STUDY"
							]
						}
					],
					"password":"****",
					"username":"nice"
				}
			},
			"writer":{
				"name":"oraclewriter",
				"parameter":{
					"column":[
						"STUID",
						"STUNAME"
					],
					"connection":[
						{
							"jdbcUrl":"jdbc:oracle:thin:@192.168.48.130:1521:orcl1",
							"table":[
								"DATAX"
							]
						}
					],
					"password":"**",
					"preSql":[
						"delete from DATAX"
					],
					"username":"hr"
				}
			}
		}
	],
	"setting":{
		"speed":{
			"channel":"4"
		}
	}
}

2021-03-11 11:11:15.409 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2021-03-11 11:11:15.411 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2021-03-11 11:11:15.412 [main] INFO  JobContainer - DataX jobContainer starts job.
2021-03-11 11:11:15.414 [main] INFO  JobContainer - Set jobId = 0
2021-03-11 11:11:15.747 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@192.168.48.201:1521:orcl.
2021-03-11 11:11:15.938 [job-0] INFO  OriginalConfPretreatmentUtil - table:[STUDY] has columns:[STUID,STUNAME].
2021-03-11 11:11:16.190 [job-0] INFO  OriginalConfPretreatmentUtil - table:[DATAX] all columns:[
STUID,STUNAME
].
2021-03-11 11:11:16.226 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (STUID,STUNAME) VALUES(?,?)
], which jdbcUrl like:[jdbc:oracle:thin:@192.168.48.130:1521:orcl1]
2021-03-11 11:11:16.228 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2021-03-11 11:11:16.229 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do prepare work .
2021-03-11 11:11:16.231 [job-0] INFO  JobContainer - DataX Writer.Job [oraclewriter] do prepare work .
2021-03-11 11:11:16.263 [job-0] INFO  CommonRdbmsWriter$Job - Begin to execute preSqls:[delete from DATAX]. context info:jdbc:oracle:thin:@192.168.48.130:1521:orcl1.
2021-03-11 11:11:16.270 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2021-03-11 11:11:16.271 [job-0] INFO  JobContainer - Job set Channel-Number to 4 channels.
2021-03-11 11:11:16.283 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
2021-03-11 11:11:16.287 [job-0] INFO  JobContainer - DataX Writer.Job [oraclewriter] splits to [1] tasks.
2021-03-11 11:11:16.326 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2021-03-11 11:11:16.334 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2021-03-11 11:11:16.337 [job-0] INFO  JobContainer - Running by standalone Mode.
2021-03-11 11:11:16.349 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2021-03-11 11:11:16.434 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2021-03-11 11:11:16.434 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2021-03-11 11:11:16.447 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2021-03-11 11:11:16.455 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select STUID,STUNAME from STUDY 
] jdbcUrl:[jdbc:oracle:thin:@192.168.48.201:1521:orcl].
2021-03-11 11:11:16.499 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select STUID,STUNAME from STUDY 
] jdbcUrl:[jdbc:oracle:thin:@192.168.48.201:1521:orcl].
2021-03-11 11:11:16.749 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[306]ms
2021-03-11 11:11:16.750 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2021-03-11 11:11:26.438 [job-0] INFO  StandAloneJobContainerCommunicator - Total 2 records, 18 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-03-11 11:11:26.438 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2021-03-11 11:11:26.438 [job-0] INFO  JobContainer - DataX Writer.Job [oraclewriter] do post work.
2021-03-11 11:11:26.438 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do post work.
2021-03-11 11:11:26.438 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2021-03-11 11:11:26.439 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /datax/datax/hook
2021-03-11 11:11:26.440 [job-0] INFO  JobContainer - 
	 [total cpu info] => 
		averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
		-1.00%                         | -1.00%                         | -1.00%
                        

	 [total gc info] => 
		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
		 PS MarkSweep         | 1                  | 1                  | 1                  | 0.048s             | 0.048s             | 0.048s             
		 PS Scavenge          | 1                  | 1                  | 1                  | 0.025s             | 0.025s             | 0.025s             

2021-03-11 11:11:26.441 [job-0] INFO  JobContainer - PerfTrace not enable!
2021-03-11 11:11:26.442 [job-0] INFO  StandAloneJobContainerCommunicator - Total 2 records, 18 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-03-11 11:11:26.443 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2021-03-11 11:11:15
任务结束时刻                    : 2021-03-11 11:11:26
任务总计耗时                    :                 11s
任务平均流量                    :                1B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   2
读写失败总数                    :                   0

[datax@shuaige job]$

执行成功,查看目标端数据库

SQL> select * from datax;

     STUID STUNAME
---------- --------------------
	 1 xiaohong
	 2 xiaoming

SQL> 

数据传输成功,如果在执行json文件报错,一定要先看好json文件格式,用户密码大小写,[ ]," "等,一定要加并行数,"channel"参数至少为“1”,“preSql”:[“delete from DATAX”] SQL条件可以不加,但是有主键冲突的行会报错传输失败,也可以换其他sql。

至此结束,谢谢。

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值