{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "id",
                            "type_id",
                            "type",
                            "sale_type",
                            "trademark",
                            "company",
                            "seating_capacity",
                            "power_type",
                            "charge_type",
                            "category",
                            "weight_kg",
                            "warranty"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://hadoop102:3306/car_data"
                                ],
                                "table": [
                                    "car_info"
                                ]
                            }
                        ],
                        "password": "000000",
                        "splitPk": "",
                        "username": "root"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "id",
                                "type": "string"
                            },
                            {
                                "name": "type_id",
                                "type": "string"
                            },
                            {
                                "name": "type",
                                "type": "string"
                            },
                            {
                                "name": "sale_type",
                                "type": "string"
                            },
                            {
                                "name": "trademark",
                                "type": "string"
                            },
                            {
                                "name": "company",
                                "type": "string"
                            },
                            {
                                "name": "seating_capacity",
                                "type": "bigint"
                            },
                            {
                                "name": "power_type",
                                "type": "string"
                            },
                            {
                                "name": "charge_type",
                                "type": "string"
                            },
                            {
                                "name": "category",
                                "type": "string"
                            },
                            {
                                "name": "weight_kg",
                                "type": "bigint"
                            },
                            {
                                "name": "warranty",
                                "type": "string"
                            }
                        ],
                        "hadoopConfig": {
                            "dfs.nameservices": "mycluster",
                            "dfs.namenode.rpc-address.mycluster.nn2": "hadoop103:8020",
                            "dfs.namenode.rpc-address.mycluster.nn1": "hadoop102:8020",
                            "dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.namenodes.mycluster": "nn1,nn2"
                        },
                        "compress": "gzip",
                        "defaultFS": "hdfs://mycluster",
                        "fieldDelimiter": "\t",
                        "fileName": "car_info",
                        "fileType": "text",
                        "path": "${targetdir}",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
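The hadoopConfig block must mirror the HA settings in the cluster's hdfs-site.xml; if DataX cannot resolve the mycluster nameservice, these values are the first thing to check. One quick way to compare them against the live cluster (a sketch, assuming the hdfs client is available on the node running DataX):

# print the live HA settings and compare with the job's hadoopConfig
hdfs getconf -confKey dfs.nameservices
hdfs getconf -confKey dfs.ha.namenodes.mycluster
hdfs getconf -confKey dfs.namenode.rpc-address.mycluster.nn1
hdfs getconf -confKey dfs.namenode.rpc-address.mycluster.nn2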
1. Test the DataX job
When DataX writes to HDFS, the destination directory must already exist, so before running the DataX job we first create the target directory:
hadoop fs -mkdir -p /origin_data/car_info/2023-05-01
Then run the following command:
bin/datax.py job/car_info.json -p"-Dtargetdir=/origin_data/car_info/2023-05-01"
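The -p"-Dname=value" flag substitutes every ${name} placeholder in the job file before the job starts, which is why the writer's path can stay parameterized as ${targetdir}. The same job file can therefore be reused for any date partition, for example (a hypothetical run for the following day):

# hypothetical run for 2023-05-02, reusing the same job file
hadoop fs -mkdir -p /origin_data/car_info/2023-05-02
bin/datax.py job/car_info.json -p"-Dtargetdir=/origin_data/car_info/2023-05-02"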
After the job finishes, check whether data has appeared in the /origin_data/car_info/2023-05-01 directory on HDFS.
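Note that because compress is set to gzip in the writer, the output files are not directly readable with hadoop fs -cat alone. A quick sanity check (a sketch, assuming the standard Hadoop CLI and zcat are available):

# list the output files, then show a few decompressed rows
# (fields are tab-separated, per the job's fieldDelimiter)
hadoop fs -ls /origin_data/car_info/2023-05-01
hadoop fs -cat /origin_data/car_info/2023-05-01/* | zcat | head -n 5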
Note: for a quick test, you can also skip the parameter and write an absolute path directly in the job file, e.g. "path": "/origin_data/car_info/2023-05-01".
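In day-to-day use, the two steps above (create the directory, then launch DataX) are typically wrapped in a small date-keyed script. A minimal sketch follows; the script name mysql_to_hdfs.sh and the install path /opt/module/datax are assumptions, not part of the original setup:

#!/bin/bash
# mysql_to_hdfs.sh -- hypothetical wrapper: sync car_info for one day
# usage: mysql_to_hdfs.sh [yyyy-mm-dd]   (defaults to yesterday)
DATAX_HOME=/opt/module/datax

if [ -n "$1" ]; then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi

# hdfswriter requires the target directory to exist before the job runs
hadoop fs -mkdir -p /origin_data/car_info/$do_date

python $DATAX_HOME/bin/datax.py $DATAX_HOME/job/car_info.json \
    -p"-Dtargetdir=/origin_data/car_info/$do_date"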