First, you need to know how to use DataX.
It is actually quite simple: prepare a JDK and a Python environment, write a JSON job file (say mysqlTomysql.json), and then run it on the server or locally with
python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_JSON_PATH}/mysqlTomysql.json
and DataX will transfer the data between heterogeneous data sources.
The key part, of course, is preparing that JSON file. How do you write it?
A job file consists of two main parts: a reader and a writer. In the DataX project, every reader and writer ships with a doc file that explains how to use it, gives a configuration template, and documents each parameter. You only need to take that template and fill in the parameters to produce your own reader and writer sections, then join the two together.
Below is a MySQL-to-MySQL example:
1. Print the configuration template with: python {YOUR_DATAX_HOME}/bin/datax.py -r mysqlreader -w mysqlwriter
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": "",
                                "table": []
                            }
                        ],
                        "password": "",
                        "preSql": [],
                        "session": [],
                        "username": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
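The fill-in-the-blanks step can also be done programmatically, which is handy once you generate many similar jobs. The sketch below is a minimal example, not an official DataX API: it builds the same template shape as a Python dict (credentials and URLs are placeholders) and writes it out as mysqlTomysql.json. Note that, matching the template above, the reader's jdbcUrl is a list while the writer's is a plain string.

```python
import json

# Skeleton matching the template printed by: datax.py -r mysqlreader -w mysqlwriter
job = {
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "root",      # placeholder credentials
                    "password": "root",
                    "column": ["id", "name"],
                    "connection": [{
                        "table": ["test_output"],
                        # reader jdbcUrl is a LIST in the template
                        "jdbcUrl": ["jdbc:mysql://localhost:3306/test_output"],
                    }],
                },
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "writeMode": "insert",
                    "username": "root",
                    "password": "root",
                    "column": ["id", "name"],
                    "connection": [{
                        # writer jdbcUrl is a plain STRING in the template
                        "jdbcUrl": "jdbc:mysql://localhost:3306/test_input",
                        "table": ["test_input"],
                    }],
                },
            },
        }],
        "setting": {"speed": {"channel": 5}},
    }
}

# Write the job file that datax.py will consume
with open("mysqlTomysql.json", "w") as f:
    json.dump(job, f, indent=4)
```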
2. Consult the doc files under mysqlreader and mysqlwriter to understand what each parameter means, then fill in the blanks:
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.01
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "column": [
                            "id",
                            "name",
                            "age",
                            "sex",
                            "remake",
                            "name"
                        ],
                        "connection": [
                            {
                                "table": ["test_output"],
                                "jdbcUrl": ["jdbc:mysql://localhost:3306/test_output"]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "root",
                        "column": [
                            "id",
                            "name",
                            "age",
                            "sex",
                            "remake",
                            "abc"
                        ],
                        "session": [],
                        "preSql": [],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://localhost:3306/test_input",
                                "table": ["test_input"]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
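One easy mistake here is a mismatch between the reader's and the writer's column lists: DataX maps the two by position, so the lists must be the same length (in the example above both have six entries, which is why the reader's second "name" column lands in the writer's "abc" column). A small sanity check you can run on a job dict before submitting it (this helper is my own sketch, not part of DataX):

```python
def check_columns(job):
    """DataX maps reader columns to writer columns by position,
    so every content entry must have equal-length column lists."""
    for entry in job["job"]["content"]:
        reader_cols = entry["reader"]["parameter"]["column"]
        writer_cols = entry["writer"]["parameter"]["column"]
        if len(reader_cols) != len(writer_cols):
            raise ValueError(
                f"column count mismatch: reader has {len(reader_cols)}, "
                f"writer has {len(writer_cols)}"
            )
    return True
```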
3. Place mysqlTomysql.json on the server under /datax/test/ and run
python /datax/bin/datax.py /datax/test/mysqlTomysql.json
to kick off the transfer:
2020-05-06 20:15:49.716 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
2020-05-06 20:15:49.717 [job-0] INFO JobContainer - PerfTrace not enable!
2020-05-06 20:15:49.717 [job-0] INFO StandAloneJobContainerCommunicator - Total 14 records, 420 bytes | Speed 42B/s, 1 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2020-05-06 20:15:49.718 [job-0] INFO JobContainer -
Job start time        : 2020-05-06 20:15:38
Job end time          : 2020-05-06 20:15:49
Total elapsed time    : 10s
Average throughput    : 42B/s
Record write speed    : 1rec/s
Total records read    : 14
Total read/write failures : 0
And that's it: the data has been pulled from one MySQL database into another.
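For scheduled or repeated runs, it helps to wrap the step-3 invocation in a small script. A minimal sketch, assuming DataX lives under /datax (adjust the paths for your install); DataX exits with a non-zero code when the job fails, so the return code is enough to detect failure:

```python
import subprocess

def build_datax_cmd(datax_home, job_json):
    """Assemble the same command line used in step 3."""
    return ["python", f"{datax_home}/bin/datax.py", job_json]

def run_datax(datax_home, job_json):
    """Run a DataX job; returns True when DataX exits with code 0."""
    result = subprocess.run(build_datax_cmd(datax_home, job_json))
    return result.returncode == 0
```

Usage would be `run_datax("/datax", "/datax/test/mysqlTomysql.json")`, optionally called from cron or any job scheduler.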