If this is your first time doing this, you can refer to my earlier blog post; after you have gone through it once, the later runs are basically the same.
See section 3 of: https://blog.csdn.net/Zsigner/article/details/108362724
1. Build the hdfswriter plugin: modify pom.xml
<!-- the original pom.xml lists every module; usually you only keep and install the ones you need -->
<modules>
    <module>common</module>
    <module>core</module>
    <module>transformer</module>
    <!-- reader -->
    <module>mysqlreader</module>
    <!-- writer -->
    <module>hdfswriter</module>
    <!-- common support module -->
    <module>plugin-rdbms-util</module>
    <module>plugin-unstructured-storage-util</module>
    <module>hbase20xsqlreader</module>
    <module>hbase20xsqlwriter</module>
</modules>
2. Compile and generate the hdfswriter plugin
mvn clean install -Dmaven.test.skip=true
3. Copy the generated files into /datax/plugin/, keeping reader and writer plugins in their separate directories
cp -r /usr/local/DataX-master/hdfswriter/target/datax/plugin/writer/hdfswriter /usr/local/data/datax/datax/plugin/writer
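After the copy, it is worth a quick check that the plugin landed where DataX will look for it. The exact jar name depends on your build, but the directory should contain the plugin descriptor and its dependency libs, roughly like this:

ls /usr/local/data/datax/datax/plugin/writer/hdfswriter
# roughly expected: plugin.json  plugin_job_template.json  hdfswriter-0.0.1-SNAPSHOT.jar  libs/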
4. Write the job script test.json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "datax",
                        "password": "123456",
                        "where": "updated_at>='${start_time}' and updated_at<='${end_time}'",
                        "column": [
                            "app_id",
                            "created_at",
                            "id",
                            "in_id",
                            "is_deleted",
                            "resource_id"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://127.0.0.1:3306/test?com.mysql.jdbc.faultInjection.serverCharsetIndex=45"
                                ],
                                "table": [
                                    "test"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://127.0.0.1:4007",
                        "fileType": "orc",
                        "path": "/usr/hive/warehouse/db/test/statdate=${statdate}",
                        "fileName": "part-m-",
                        "column": [
                            {
                                "name": "app_id",
                                "type": "String"
                            },
                            {
                                "name": "created_at",
                                "type": "String"
                            },
                            {
                                "name": "id",
                                "type": "int"
                            },
                            {
                                "name": "in_id",
                                "type": "String"
                            },
                            {
                                "name": "is_deleted",
                                "type": "int"
                            },
                            {
                                "name": "resource_id",
                                "type": "String"
                            }
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "\u0001",
                        "compress": "NONE"
                    }
                }
            }
        ]
    }
}
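In my experience hdfswriter expects the target HDFS path to already exist, and if the data is meant to show up in a Hive partitioned table the partition also has to be registered. A minimal sketch, assuming the Hive table is db.test partitioned by statdate and its location matches the path in the job above (adjust the names to your environment):

# create the partition directory the writer path points at, then register it in Hive
statdate=2020-09-05
hdfs dfs -mkdir -p /usr/hive/warehouse/db/test/statdate=${statdate}
hive -e "ALTER TABLE db.test ADD IF NOT EXISTS PARTITION (statdate='${statdate}')"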
5. Run the job
Note that the writer path references ${statdate}, so it has to be passed along with start_time and end_time:
python /usr/local/datax/bin/datax.py ./test.json -p "-Dstart_time=2020-09-05 -Dend_time=2020-09-05 -Dstatdate=2020-09-05"
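For daily scheduling it helps to wrap the call in a small shell script that derives the date window; a minimal sketch, assuming a one-day window on yesterday's date (the window and the json path are assumptions, adjust as needed):

#!/bin/bash
# Derive yesterday's date (GNU date) and feed it to all three job variables.
statdate=$(date -d "1 day ago" +%Y-%m-%d)
python /usr/local/datax/bin/datax.py ./test.json \
  -p "-Dstart_time=${statdate} -Dend_time=${statdate} -Dstatdate=${statdate}"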