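DataX3 is quite convenient to use. Here are the official resources:
The DataX3 GitHub repository, https://github.com/alibaba/DataX, contains an introduction to DataX3 and the download link.
How to use DataX3: https://github.com/alibaba/DataX/wiki/Quick-Start
Configuration parameters for each reader and writer: https://github.com/alibaba/DataX/wiki/DataX-all-data-channels
Basic usage
Check that Python is available on the system. Linux distributions generally ship with Python, and the official recommendation is Python 2.
Download DataX directly from http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
After downloading, extract it to a local directory, set the permissions to 755, and change into the bin directory. You can then run the sample sync job: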
$ tar zxvf datax.tar.gz
$ sudo chmod -R 755 datax
$ cd datax/bin
$ python datax.py ../job/job.json
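The sample job runs to completion, so DataX is working.
Next, create your own configuration file.
The data sources DataX currently supports are listed under DataX all data channels.
Taking mysqlreader and mysqlwriter as an example, print the configuration template: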
$ cd datax/bin
$ python datax.py -r mysqlreader -w mysqlwriter
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the mysqlreader document:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
Please refer to the mysqlwriter document:
https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": "",
                                "table": []
                            }
                        ],
                        "password": "",
                        "preSql": [],
                        "session": [],
                        "username": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
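One detail in the generated template is easy to miss: the reader's jdbcUrl is a list, while the writer's jdbcUrl is a single string. The sketch below is a small pre-flight check for that shape. The function name check_mysql_job is made up for illustration and is not part of DataX. The column-count check assumes that reader and writer columns are matched by position.

```python
def check_mysql_job(job):
    """Return a list of problems found in a mysqlreader -> mysqlwriter job dict.

    Hypothetical helper for illustration only; it is not part of DataX.
    """
    problems = []
    content = job["job"]["content"][0]
    reader = content["reader"]["parameter"]
    writer = content["writer"]["parameter"]

    # In the generated template the reader's jdbcUrl is a list of URLs ...
    for conn in reader["connection"]:
        if not isinstance(conn["jdbcUrl"], list):
            problems.append("reader jdbcUrl should be a list of URLs")

    # ... while the writer's jdbcUrl is a single string.
    for conn in writer["connection"]:
        if not isinstance(conn["jdbcUrl"], str):
            problems.append("writer jdbcUrl should be a single string")

    # Assumption: columns are matched by position, so the counts should agree.
    if len(reader["column"]) != len(writer["column"]):
        problems.append("reader and writer column counts differ")

    return problems
```

An empty list means the shape matches the template. Load a job file with json.load and pass the resulting dict to the function before starting a long sync.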
Use the template above to create your own JSON configuration file.
Name it mysql2mysql.json and put it in the job directory:
$ vim datax/job/mysql2mysql.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["id","name","location","age"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://192.168.1.130:3306/people?useUnicode=true&characterEncoding=UTF-8"],
                                "table": ["test"]
                            }
                        ],
                        "password": "123456",
                        "username": "root",
                        "where": "test.id <= 50000"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["id","name","location","age"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.1.131:3306/people?useUnicode=true&characterEncoding=UTF-8",
                                "table": ["test"]
                            }
                        ],
                        "password": "123456",
                        "preSql": [],
                        "session": [],
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "100"
            }
        }
    }
}
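If you need several jobs of this shape, writing each JSON file by hand gets tedious. The sketch below builds the same structure from a few arguments. build_mysql_job and its parameters are hypothetical names chosen for this example. It assumes the same username and password on both sides, as in the configuration above, and it leaves preSql and session empty.

```python
import json


def build_mysql_job(src_url, dst_url, table, columns, username, password,
                    where="", channel="1", write_mode="insert"):
    """Build a mysqlreader -> mysqlwriter job dict shaped like the template.

    Hypothetical helper for illustration only; it is not part of DataX.
    """
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "column": list(columns),
            # The reader takes a list of JDBC URLs.
            "connection": [{"jdbcUrl": [src_url], "table": [table]}],
            "password": password,
            "username": username,
            "where": where,
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "column": list(columns),
            # The writer takes a single JDBC URL string.
            "connection": [{"jdbcUrl": dst_url, "table": [table]}],
            "password": password,
            "preSql": [],
            "session": [],
            "username": username,
            "writeMode": write_mode,
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": channel}},
        }
    }


if __name__ == "__main__":
    job = build_mysql_job(
        "jdbc:mysql://192.168.1.130:3306/people?useUnicode=true&characterEncoding=UTF-8",
        "jdbc:mysql://192.168.1.131:3306/people?useUnicode=true&characterEncoding=UTF-8",
        table="test",
        columns=["id", "name", "location", "age"],
        username="root",
        password="123456",
        where="test.id <= 50000",
        channel="100",
    )
    # Redirect this output to datax/job/mysql2mysql.json to get a job file.
    print(json.dumps(job, indent=4))
```

Running the script prints a job equivalent to the one above, so the output can be saved straight into the job directory.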
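Detailed explanations of each parameter can be found under DataX all data channels. Other data sources are configured in the same way.
Then create a script in the datax directory to start DataX: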
#!/bin/bash
# Send stderr to the same log file so error messages are captured as well
python ./bin/datax.py ./job/mysql2mysql.json > ./log/mysql2mysql.log 2>&1 &
You can follow the run in datax/log/mysql2mysql.log. I won't paste the output here.
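The same launch can also be driven from Python, which may be handy if the job is started by another program. This is only a sketch: datax_command and run_job are hypothetical helper names, the directory layout is the one used above, and the wrapper itself needs Python 3. It still calls datax.py through whatever interpreter you pass as python_bin. Unlike the shell script, it waits for the job to finish.

```python
import subprocess
from pathlib import Path


def datax_command(datax_home, job_name, python_bin="python"):
    """Build the argument list for running one job file with datax.py."""
    home = Path(datax_home)
    return [
        python_bin,
        str(home / "bin" / "datax.py"),
        str(home / "job" / (job_name + ".json")),
    ]


def run_job(datax_home, job_name, python_bin="python"):
    """Run the job, writing stdout and stderr to <datax_home>/log/<job_name>.log.

    Returns the process exit code and the log path.
    """
    log_dir = Path(datax_home) / "log"
    log_dir.mkdir(parents=True, exist_ok=True)  # make sure the log directory exists
    log_path = log_dir / (job_name + ".log")
    with open(log_path, "w") as log:
        result = subprocess.run(
            datax_command(datax_home, job_name, python_bin),
            stdout=log,
            stderr=subprocess.STDOUT,  # keep error output in the same file
        )
    return result.returncode, log_path
```

For example, run_job("datax", "mysql2mysql") runs the job defined earlier and returns once DataX exits.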
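Overall, DataX3 is convenient to use and reasonably fast: within the same network, with 100 concurrent channels, syncing 50 million rows (about 15 GB) took roughly 80 minutes.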
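To put those figures in perspective, here is the arithmetic as a short script. It assumes 1 GB = 1024 MB, takes "100 concurrent" to mean the channel setting of 100, and assumes the work was spread evenly across channels. All three are simplifications, and the inputs are approximate, so treat the results as rough estimates.

```python
rows = 50_000_000       # rows synced
size_mb = 15 * 1024     # about 15 GB, assuming 1 GB = 1024 MB
seconds = 80 * 60       # about 80 minutes
channels = 100          # the "channel" setting from the job file

rows_per_second = rows / seconds
mb_per_second = size_mb / seconds
rows_per_channel_second = rows_per_second / channels  # assumes an even split

print("rows/s overall:     %.0f" % rows_per_second)          # 10417
print("MB/s overall:       %.1f" % mb_per_second)            # 3.2
print("rows/s per channel: %.1f" % rows_per_channel_second)  # 104.2
```

So the run averaged a little over ten thousand rows per second in total.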