1. Prerequisites: first install the JDK, Maven, and a Python environment locally.
(Note: use Python 2.x here. The stock datax.py launcher script targets Python 2, so running it with Python 3.x fails; the details are not covered here.)
2. Download the archive from the official DataX site and extract it to a folder of your choice.
3. Press Win+R, type cmd, and press Enter to open a command prompt.
Then cd into the bin directory of your DataX installation and run python datax.py ../job/json/job.json. (Note: I created a json subfolder under the job folder; it does not exist by default, so adjust the command to match your own file layout.)
4. If the console output is garbled, run CHCP 65001 (switches the console code page to UTF-8) and re-run python datax.py ../job/json/job.json
It runs successfully!
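Before launching a job it can save a round trip to check that the job file is at least well-formed JSON, since DataX aborts with a parse error otherwise. Below is a minimal sketch of such a pre-flight check; `validate_job` is a hypothetical helper of my own, not part of DataX, and it is written for Python 3 (unlike datax.py itself):

```python
import json

def validate_job(path):
    """Hypothetical pre-flight check: confirm the DataX job file is
    syntactically valid JSON and declares at least one reader/writer pair.
    Note: json.load rejects // comments, so annotation comments must be
    stripped from the real file first."""
    with open(path, encoding="utf-8") as f:
        job = json.load(f)
    content = job["job"]["content"]
    assert content, "job.content must list at least one reader/writer pair"
    return job

# Example usage (path is illustrative):
# validate_job("../job/json/job.json")
```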
5. Basic usage
5.1 Read from stream and print to the console
First, look at the official JSON configuration template:
// print the streamreader --> streamwriter template
E:\DataX\datax\bin>datax.py -r streamreader -w streamwriter
// the template looks like this:
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the streamreader document:
https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md
Please refer to the streamwriter document:
https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [],
                        "sliceRecordCount": ""
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
Write a JSON file based on the template (the // comments below are annotations for this post; strip them from the actual file):
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {
                                "type": "string",
                                "value": "qijianing, hello world!"
                            },
                            {
                                "type": "string",
                                "value": "齐家宁, 你好!"
                            }
                        ],
                        "sliceRecordCount": "10" // how many times to print
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "utf-8", // output encoding
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": { // controls concurrency
                "channel": "2" // 2 channels --> since this job only prints, you get sliceRecordCount * channel = 20 lines; against a real MySQL target this would be actual concurrency
            }
        }
    }
}
Create a JSON file named stream2stream.json under job/json, then run python datax.py ../job/json/stream2stream.json from the bin directory. The result is shown in the screenshot below:
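The record count in that run follows directly from the config: each channel replays the reader's records sliceRecordCount times, so the job prints channel × sliceRecordCount = 20 records, each record being the configured columns joined by tabs. A minimal simulation of that arithmetic (not DataX code, just the counting logic):

```python
# Simulate how streamwriter output scales with the speed settings:
# total records printed = channel * sliceRecordCount.
channel = 2
slice_record_count = 10
columns = ["qijianing, hello world!", "齐家宁, 你好!"]

lines = []
for _ in range(channel):                 # each channel runs the slice independently
    for _ in range(slice_record_count):  # each slice emits the record this many times
        lines.append("\t".join(columns)) # streamwriter prints columns tab-separated

print(len(lines))  # 20
```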
5.2 Batch insert from MySQL to MySQL
5.2.1 View the MySQL-to-MySQL JSON template
E:\DataX\datax\bin>datax.py -r mysqlreader -w mysqlwriter
// the template configuration:
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the mysqlreader document:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
Please refer to the mysqlwriter document:
https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", // reader plugin name, fixed by DataX
                    "parameter": {
                        "column": [], // columns to read on the source side
                        "connection": [
                            {
                                "jdbcUrl": [], // source connection info
                                "table": [] // source table(s)
                            }
                        ],
                        "password": "", // source password
                        "username": "", // source username
                        "where": "" // filter condition
                    }
                },
                "writer": {
                    "name": "mysqlwriter", // writer plugin name, fixed by DataX
                    "parameter": {
                        "column": [], // columns to write on the target side
                        "connection": [
                            {
                                "jdbcUrl": "", // target connection info
                                "table": [] // target table(s)
                            }
                        ],
                        "password": "", // target password
                        "preSql": [], // SQL to run before writing starts
                        "session": [],
                        "username": "", // target username
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "" // number of channels
            }
        }
    }
}
My configuration:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://localhost:3306/eesy2?characterEncoding=utf-8"],
                                "table": ["account1"]
                            }
                        ],
                        "password": "****", // fill in your own username and password
                        "username": "****",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://localhost:3306/eesy1?characterEncoding=utf-8",
                                "table": ["account"]
                            }
                        ],
                        "password": "****", // fill in your own username and password
                        "preSql": [],
                        "session": [],
                        "username": "****",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
Then cd to the bin directory and run the job.
The data is synced to MySQL successfully.
6. While syncing data from MySQL to MySQL, a problem came up:
Both tables define a primary key, so during the sync, rows whose primary key already exists in the target table could not be inserted. Only the rows whose primary keys did not collide were synced successfully.
So, modify the JSON file as follows.
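This failure is ordinary primary-key behavior rather than anything DataX-specific: an INSERT of a row whose key already exists is rejected by the database. A small sketch using sqlite3 from the standard library (standing in for MySQL here; the constraint logic is the same, though DataX itself writes in batches rather than row by row) shows why only the non-conflicting rows arrive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT, money REAL)")
conn.execute("INSERT INTO account VALUES (1, 'aaa', 1000)")  # already in the target

synced, skipped = 0, 0
for row in [(1, 'bbb', 2000), (2, 'ccc', 3000)]:  # rows arriving from the source
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        synced += 1
    except sqlite3.IntegrityError:  # duplicate primary key -> insert rejected
        skipped += 1

print(synced, skipped)  # 1 1
```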
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["name","money"], // sync only name and money; skip the primary-key column id and let the target generate it via auto-increment
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://localhost:3306/eesy2?characterEncoding=utf-8"],
                                "table": ["account1"]
                            }
                        ],
                        "password": "root",
                        "username": "root",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["name","money"], // the writer's column list must match the reader's
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://localhost:3306/eesy1?characterEncoding=utf-8",
                                "table": ["account"]
                            }
                        ],
                        "password": "root",
                        "preSql": [],
                        "session": [],
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
The result:
Second sync run:
As you can see, the data is not overwritten; instead a duplicate copy of the rows is appended below the existing ones.
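That append-instead-of-overwrite result follows directly from the config: with the id column excluded and writeMode "insert", every run issues plain INSERTs and the target assigns fresh auto-increment ids, so nothing ever collides. A sketch of that behavior, again using sqlite3 as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, money REAL)"
)

source_rows = [("aaa", 1000), ("bbb", 2000)]

def sync(rows):
    # writeMode "insert": plain INSERT; id comes from auto-increment each time
    conn.executemany("INSERT INTO account (name, money) VALUES (?, ?)", rows)

sync(source_rows)  # first run
sync(source_rows)  # second run: nothing overwritten, rows appended again

count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(count)  # 4
```

If overwriting is what you actually want, the mysqlwriter documentation describes other writeMode values ("replace", "update") that map to REPLACE INTO / ON DUPLICATE KEY UPDATE, but those require the key column to be synced.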