Preface
Notes on DataX.
I. JSON configuration
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@127.0.0.1:1521:test"
                                ],
                                "querySql": [
                                    "select name,card_id from student"
                                ]
                            }
                        ],
                        "password": "123456",
                        "username": "testapp"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "name",
                                "type": "string"
                            },
                            {
                                "name": "card_id",
                                "type": "string"
                            }
                        ],
                        // TODO: look this up in core-site.xml (fs.defaultFS)
                        "defaultFS": "hdfs://mytest",
                        "fieldDelimiter": " ",
                        "fileName": "file_name.txt",
                        "fileType": "text",
                        "hadoopConfig": {
                            // TODO: look up these HA settings in hdfs-site.xml
                            "dfs.client.failover.proxy.provider.mytest": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.namenodes.mytest": "nn1,nn2",
                            "dfs.namenode.rpc-address.mytest.nn1": "192.168.1.100:9000",
                            "dfs.namenode.rpc-address.mytest.nn2": "192.168.1.101:9000",
                            "dfs.nameservices": "mytest"
                        },
                        "path": "/",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "errorLimit": {
                "percentage": 0.02,
                "record": 0
            },
            "speed": {
                "channel": 1
            }
        }
    }
}
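The hadoopConfig entries for an HA cluster follow a naming pattern keyed by the nameservice (mytest above) and the NameNode IDs (nn1, nn2). As a minimal sketch, assuming you have already read the nameservice, the NameNode IDs and their RPC addresses from hdfs-site.xml, the block can be generated in Python. The function name build_ha_config is hypothetical and not part of DataX.

```python
import json
from typing import Dict


def build_ha_config(nameservice: str, namenodes: Dict[str, str]) -> Dict[str, str]:
    """Build an hdfswriter hadoopConfig dict for one HDFS HA nameservice.

    namenodes maps each NameNode ID (e.g. "nn1") to its "host:port" RPC address,
    in the order the IDs should be listed.
    """
    config = {
        "dfs.nameservices": nameservice,
        # Comma-separated NameNode IDs, e.g. "nn1,nn2".
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Client-side class that fails over between the listed NameNodes.
        f"dfs.client.failover.proxy.provider.{nameservice}": (
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
        ),
    }
    # One RPC address entry per NameNode ID.
    for nn_id, address in namenodes.items():
        config[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    return config


if __name__ == "__main__":
    ha = build_ha_config("mytest", {"nn1": "192.168.1.100:9000", "nn2": "192.168.1.101:9000"})
    # Paste the printed object into writer.parameter.hadoopConfig.
    print(json.dumps(ha, indent=4, sort_keys=True))
```

Keeping the nameservice in one variable avoids the most common mistake with this block: renaming the nameservice in dfs.nameservices but not in the keys that embed it.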
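II. Usage steps
1. Upload the file
Replace the values in the JSON above with your own connection details, delete the // TODO comments (standard JSON does not allow comments), save the result as a .json file (for example oraclereader.json, the name used in the command below), and upload it to the job directory of your DataX installation.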
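Before uploading, it can help to confirm that the edited file actually parses. The following is a minimal sketch, assuming comments only ever occupy whole lines (as the TODO lines in the template do) and that every content entry must name a reader and a writer. load_job_config is a hypothetical helper, not a DataX tool.

```python
import json
import sys


def load_job_config(text: str) -> dict:
    """Parse a DataX job template, skipping whole-line // comments.

    Assumption: comments sit on their own lines, so values such as
    "hdfs://mytest" inside strings are left untouched.
    """
    lines = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    config = json.loads("\n".join(lines))
    # Sanity check: every content entry needs a named reader and writer.
    for entry in config["job"]["content"]:
        for role in ("reader", "writer"):
            if "name" not in entry.get(role, {}):
                raise ValueError(f"content entry is missing {role}.name")
    return config


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_job.py oraclereader.json
    with open(sys.argv[1], encoding="utf-8") as f:
        cfg = load_job_config(f.read())
    for e in cfg["job"]["content"]:
        print("OK:", e["reader"]["name"], "->", e["writer"]["name"])
```

A json.JSONDecodeError here points at the line and column of the syntax error, which is usually quicker to act on than a failure after the job has started.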
2. Run the job
From the DataX installation directory, run the job with the following command:
python bin/datax.py job/oraclereader.json
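If the job is launched from another Python program (for example a scheduling script), the same command can be issued with subprocess. This is a minimal sketch, assuming datax_home is the DataX installation directory (the one containing bin/ and job/) and that `python` on the PATH is the interpreter you use for datax.py; build_datax_command and run_datax_job are hypothetical names.

```python
import subprocess
import sys
from typing import List


def build_datax_command(job_file: str, python: str = "python") -> List[str]:
    """Return the argv for `python bin/datax.py <job_file>`, relative to DataX home."""
    return [python, "bin/datax.py", job_file]


def run_datax_job(datax_home: str, job_file: str) -> int:
    """Run a DataX job from the DataX home directory and return its exit code."""
    cmd = build_datax_command(job_file)
    # cwd=datax_home makes bin/ and job/ resolve exactly as in the shell command.
    result = subprocess.run(cmd, cwd=datax_home)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python run_job.py /opt/datax
    sys.exit(run_datax_job(sys.argv[1], "job/oraclereader.json"))
```

Returning the exit code instead of raising lets a caller decide whether a failed sync should stop the whole pipeline.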
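III. Parameterization
Property values in the job file can be parameterized. For example, change the value of the password property to $password (that is, "password": "$password") and pass the actual value on the command line when running the job: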
python bin/datax.py -p "-Dpassword=123456" job/oraclereader.json
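When a job has several parameters, the -p value can be assembled programmatically. This is a minimal sketch, assuming multiple parameters are passed space-separated inside the single -p string (as in -p "-Da=1 -Db=2") and that values contain no spaces; build_param_arg and build_command are hypothetical helpers.

```python
import shlex
from typing import Dict, List


def build_param_arg(params: Dict[str, str]) -> str:
    """Turn {"password": "123456"} into the string "-Dpassword=123456".

    Assumption: several parameters are joined with spaces inside one -p value,
    so the values themselves must not contain spaces.
    """
    for key, value in params.items():
        if " " in value:
            raise ValueError(f"value for {key!r} contains a space")
    return " ".join(f"-D{key}={value}" for key, value in params.items())


def build_command(job_file: str, params: Dict[str, str]) -> List[str]:
    """argv equivalent of: python bin/datax.py -p "<params>" <job_file>"""
    return ["python", "bin/datax.py", "-p", build_param_arg(params), job_file]


if __name__ == "__main__":
    cmd = build_command("job/oraclereader.json", {"password": "123456"})
    # shlex.join prints a shell-safe form, quoting only where needed (Python 3.8+).
    print(shlex.join(cmd))
```

Passing the argv list to subprocess.run avoids shell quoting entirely; shlex.join is only used here to display the equivalent command line.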