Getting Started with DataX3 for Offline Data Migration

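DataX3 is fairly easy to use. Here are the official resources:
GitHub repository, with an introduction to DataX3 and download links: https://github.com/alibaba/DataX
How to use DataX3: https://github.com/alibaba/DataX/wiki/Quick-Start
Configuration parameters for the various readers and writers: https://github.com/alibaba/DataX/wiki/DataX-all-data-channels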

Basic usage

Check that Python is available on the system. Linux distributions generally ship with Python; the official documentation recommends Python 2.
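One way to run this check is a tiny script executed with the same `python` command that will later launch datax.py. This is only a sketch: the helper name `python_major_minor` is made up for illustration and is not part of DataX.

```python
import sys


def python_major_minor(version_info=sys.version_info):
    """Return the interpreter version as a 'major.minor' string."""
    return "%d.%d" % (version_info[0], version_info[1])


# Show which interpreter the `python` command resolves to.
print("Python %s at %s" % (python_major_minor(), sys.executable))
if sys.version_info[0] != 2:
    # The recommendation comes from the official DataX documentation.
    print("Note: the official DataX documentation recommends Python 2.")
```

Saving this as, say, check_python.py and running `python check_python.py` prints the version and the interpreter path.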

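Download DataX directly from http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
After downloading, extract it to a local directory, set the permissions to 755, and change into the bin directory to run the sample sync job: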

$ tar zxvf datax.tar.gz
$ sudo chmod -R 755 datax
$ cd datax/bin
$ python datax.py ../job/job.json

The sample job runs successfully, so the installation works.


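Next, create your own configuration file.
The data sources DataX currently supports are listed on the DataX all data channels page (https://github.com/alibaba/DataX/wiki/DataX-all-data-channels).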

Taking mysqlreader and mysqlwriter as an example, view the configuration template:

$ cd datax/bin
$ python datax.py -r mysqlreader -w mysqlwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the mysqlreader document:
     https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md 

Please refer to the mysqlwriter document:
     https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md 

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": [], 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "username": "", 
                        "where": ""
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": "", 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "preSql": [], 
                        "session": [], 
                        "username": "", 
                        "writeMode": ""
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

Use the template above to create your own JSON configuration file.
Name it mysql2mysql.json and put it in the job directory.

$ vim datax/job/mysql2mysql.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": ["id","name","location","age"], 
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://192.168.1.130:3306/people?useUnicode=true&characterEncoding=UTF-8"], 
                                "table": ["test"]
                            }
                        ], 
                        "password": "123456", 
                        "username": "root", 
                        "where": "test.id <= 50000"
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": ["id","name","location","age"], 
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.1.131:3306/people?useUnicode=true&characterEncoding=UTF-8", 
                                "table": ["test"]
                            }
                        ], 
                        "password": "123456", 
                        "preSql": [], 
                        "session": [], 
                        "username": "root", 
                        "writeMode": "insert"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "100"
            }
        }
    }
}
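Before launching a long migration it can help to sanity-check the file. The sketch below is not part of DataX; `check_mysql_job` is a hypothetical helper. It relies on two things visible in the generated template: the reader's jdbcUrl is a list while the writer's is a single string. It also assumes that reader and writer columns are matched by position, so the two column lists should have the same length. The sample job is deliberately broken and uses placeholder connection values.

```python
import json


def check_mysql_job(job):
    """Return a list of problems found in a mysqlreader-to-mysqlwriter job dict."""
    problems = []
    content = job["job"]["content"][0]
    reader = content["reader"]["parameter"]
    writer = content["writer"]["parameter"]

    # Assumption: columns are matched by position, so the counts should agree.
    if len(reader["column"]) != len(writer["column"]):
        problems.append("reader and writer column counts differ")

    # In the generated template the reader takes a list of URLs,
    # while the writer takes a single URL string.
    if not isinstance(reader["connection"][0]["jdbcUrl"], list):
        problems.append("reader jdbcUrl should be a list")
    if isinstance(writer["connection"][0]["jdbcUrl"], list):
        problems.append("writer jdbcUrl should be a single string")

    for side, params in (("reader", reader), ("writer", writer)):
        if not params["username"]:
            problems.append(side + " username is empty")
        if not params["connection"][0]["table"]:
            problems.append(side + " table list is empty")
    return problems


# A deliberately broken sample: the writer lists one column fewer than the reader.
sample = json.loads("""
{"job": {"content": [{
    "reader": {"name": "mysqlreader", "parameter": {
        "column": ["id", "name"], "username": "root", "password": "",
        "connection": [{"jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/people"],
                        "table": ["test"]}]}},
    "writer": {"name": "mysqlwriter", "parameter": {
        "column": ["id"], "username": "root", "password": "",
        "connection": [{"jdbcUrl": "jdbc:mysql://127.0.0.1:3306/people",
                        "table": ["test"]}]}}
}]}}
""")
print(check_mysql_job(sample))
```

To check a real file, load it with `json.load(open("job/mysql2mysql.json"))` and pass the result to the same function; an empty list means none of these checks failed, not that the job is guaranteed to run.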

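Detailed explanations of each parameter are available on the DataX all data channels page; other data sources are configured in much the same way.
Then create a script in the datax directory to start DataX: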

#!/bin/bash
python ./bin/datax.py ./job/mysql2mysql.json > ./log/mysql2mysql.log &

You can follow the run in datax/log/mysql2mysql.log. The run output is not pasted here.
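If you would rather wait for the job and inspect its exit status than leave it running in the background, a small Python wrapper can do the same thing as the script. This is a sketch with made-up helper names (`datax_command`, `run_job`). Unlike the shell script it runs in the foreground and also sends stderr to the log, and it assumes datax.py passes the job's exit status through.

```python
import os
import subprocess


def datax_command(job_file, datax_home=".", python_bin="python"):
    """Build the same command line the shell script runs."""
    return [python_bin, os.path.join(datax_home, "bin", "datax.py"), job_file]


def run_job(job_file, log_file, datax_home="."):
    """Run one job in the foreground and return its exit status.

    Both stdout and stderr are written to log_file.
    """
    with open(log_file, "w") as log:
        return subprocess.call(
            datax_command(job_file, datax_home),
            stdout=log,
            stderr=subprocess.STDOUT,
        )


# Show the command that would be executed from the datax directory.
print(" ".join(datax_command("./job/mysql2mysql.json")))
```

From the datax directory, `run_job("./job/mysql2mysql.json", "./log/mysql2mysql.log")` would run the migration and return once it finishes.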


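Overall, DataX3 is convenient to use and reasonably fast: within the same network, with a concurrency of 100, migrating 50 million rows (about 15 GB) took roughly 80 minutes.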
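To put those figures in perspective, the average throughput can be worked out with a few lines of arithmetic. The inputs are the approximate numbers quoted above, and 1 GB is assumed to mean 1024 MB, so the results are rough estimates rather than measurements.

```python
rows = 50 * 1000 * 1000   # 50 million rows, from the text
size_mb = 15 * 1024.0     # about 15 GB, assuming 1 GB = 1024 MB
seconds = 80 * 60         # roughly 80 minutes

rows_per_sec = rows / float(seconds)
mb_per_sec = size_mb / seconds

print("%.0f rows/s, %.1f MB/s" % (rows_per_sec, mb_per_sec))
```

That works out to an average of a little over ten thousand rows per second, or about 3 MB per second.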
