1. Download DataX
cd /data
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
tar -zxvf datax.tar.gz
# Delete the hidden ._* files (important: otherwise DataX fails while loading plugins)
rm -rf /data/datax/plugin/*/._*
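2. After extracting, check whether the plugins for your source and target data stores are included. If both are supported, the following directories exist: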
/data/datax/plugin/reader/mysqlreader
/data/datax/plugin/writer/elasticsearchwriter
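The check in step 2 can be scripted. The following is a minimal sketch, assuming DataX was extracted to /data/datax as above; the function name `check_plugins` and the `DATAX_HOME` variable are illustrative and not part of DataX.

```shell
# Report which of the required plugin directories exist under a DataX home.
# check_plugins and DATAX_HOME are hypothetical names used only for this sketch.
check_plugins() {
  datax_home="$1"
  for plugin in reader/mysqlreader writer/elasticsearchwriter; do
    if [ -d "$datax_home/plugin/$plugin" ]; then
      echo "found:   $plugin"
    else
      echo "missing: $plugin"
    fi
  done
}

# Defaults to the install path used in step 1.
check_plugins "${DATAX_HOME:-/data/datax}"
```

Any plugin reported as missing has to be built from source, as described in the following steps.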
3. The extracted package did not include the elasticsearchwriter plugin in my case, so I had to build it from source.
4. Download the DataX source code
https://gitee.com/mirrors/DataX.git
5. Remove the modules you do not need from the root pom.xml. I kept only the elasticsearchwriter I needed:
<modules>
    <module>common</module>
    <module>core</module>
    <module>transformer</module>
    <!-- reader -->
    <!-- writer -->
    <module>elasticsearchwriter</module>
    <!-- common support module -->
    <module>plugin-rdbms-util</module>
    <module>plugin-unstructured-storage-util</module>
</modules>
6. Build elasticsearchwriter by running the following in the DataX root directory:
mvn clean install '-Dmaven.test.skip=true'
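7. The built plugin is generated under DataX\elasticsearchwriter\target\datax\plugin\writer. Copy it to /data/datax/plugin/writer.
8. Write tidb-es.json. In the reader, querySql can be used instead of table, column, and where, but the reader columns must correspond to the writer columns, preferably in the same order.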
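The copy in step 7 might look like the sketch below. It assumes the built elasticsearchwriter directory has already been transferred to the machine that runs DataX; the function name `install_writer_plugin` is hypothetical. The check for plugin.json relies on each DataX plugin directory carrying that descriptor file.

```shell
# Copy a built writer plugin into a DataX installation and confirm plugin.json arrived.
# install_writer_plugin is a hypothetical helper. Both arguments are assumptions:
#   $1 = built plugin directory (already present on this machine)
#   $2 = DataX writer plugin directory, e.g. /data/datax/plugin/writer
install_writer_plugin() {
  src="$1"
  dest="$2"
  name=$(basename "$src")
  mkdir -p "$dest"
  cp -r "$src" "$dest/"
  if [ -f "$dest/$name/plugin.json" ]; then
    echo "installed: $dest/$name"
  else
    echo "warning: $dest/$name/plugin.json not found"
  fi
}
```

A call would then be `install_writer_plugin /path/to/elasticsearchwriter /data/datax/plugin/writer`, where the first path is a placeholder for wherever the build output was placed.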
{
    "job": {
        "setting": {
            "speed": {
                "channel": 8
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "column": [
                            "id as pk",
                            "id",
                            "indicators_name",
                            "indicators_code",
                            "indicators_region_name",
                            "create_time"
                        ],
                        "where": "",
                        "splitPk": "id",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://IP:PORT/schema?characterEncoding=utf-8"
                                ],
                                "table": [
                                    "table_name"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "elasticsearchwriter",
                    "parameter": {
                        "endpoint": "http://IP:PORT",
                        "index": "datax_table_name",
                        "type": "_doc",
                        "settings": {
                            "index": {
                                "number_of_shards": 5,
                                "number_of_replicas": 1
                            }
                        },
                        "writeMode": "insert",
                        "cleanup": false,
                        "discovery": false,
                        "batchSize": 10000,
                        "splitter": ",",
                        "column": [
                            {
                                "name": "pk",
                                "type": "id"
                            },
                            {
                                "name": "id",
                                "type": "keyword"
                            },
                            {
                                "name": "indicators_name",
                                "type": "text"
                            },
                            {
                                "name": "indicators_code",
                                "type": "text"
                            },
                            {
                                "name": "indicators_region_name",
                                "type": "text"
                            },
                            {
                                "name": "create_time",
                                "type": "date",
                                "format": "yyyy-MM-dd HH:mm:ss"
                            }
                        ]
                    }
                }
            }
        ]
    }
}
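A job file of this size is easy to break with a stray comma or bracket, so it can help to confirm that it parses before running it. The sketch below uses Python's standard `json.tool` module. The function name `validate_job` is hypothetical, and the check only covers JSON syntax, not whether DataX accepts the parameters.

```shell
# Check that a DataX job file is well-formed JSON before running it.
# validate_job is a hypothetical helper; it does not validate DataX semantics.
validate_job() {
  job="$1"
  # Use whichever Python interpreter is on PATH (an assumption of this sketch).
  py=$(command -v python || command -v python3)
  if "$py" -m json.tool "$job" > /dev/null 2>&1; then
    echo "valid JSON: $job"
  else
    echo "invalid JSON: $job"
  fi
}
```

For the file above, the call would be `validate_job /data/datax/job/tidb-es.json`.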
9. Run DataX
DataX depends on a Python environment; install Python yourself if it is missing. My CentOS came with Python 2.7.5.
python /data/datax/bin/datax.py /data/datax/job/tidb-es.json
10. The run completed without further problems, and the migrated data can be queried in Elasticsearch.
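One way to confirm the migration is to ask Elasticsearch for the document count of the target index through its `_count` API. This is a minimal sketch: the function name `es_count` is hypothetical, and it assumes the cluster is reachable without authentication at the same endpoint configured in the writer.

```shell
# Ask Elasticsearch how many documents an index holds via the _count API.
# es_count is a hypothetical helper; no authentication is assumed.
es_count() {
  endpoint="$1"   # e.g. http://IP:PORT, the same value as the writer's "endpoint"
  index="$2"      # e.g. datax_table_name
  curl -s "$endpoint/$index/_count"
}
```

Calling `es_count http://IP:PORT datax_table_name` returns a JSON body whose `count` field can be compared with a `SELECT COUNT(*)` on the source table.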
## DataX is an offline data migration tool and is not well suited to real-time migration. For scheduled runs, pair it with crontab.
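A scheduled run could be set up roughly as follows. The sketch prints a crontab entry built from the DataX command used earlier; the function name `datax_cron_entry`, the 02:00 daily schedule, and the log path under /tmp are all illustrative assumptions.

```shell
# Print a crontab entry that runs a DataX job on a schedule and appends output to a log.
# datax_cron_entry, the schedule, and the log path are assumptions of this sketch.
datax_cron_entry() {
  schedule="$1"   # five-field cron schedule, e.g. "0 2 * * *" for 02:00 every day
  job="$2"        # path to the DataX job file
  log="$3"        # file that collects stdout and stderr of each run
  printf '%s python /data/datax/bin/datax.py %s >> %s 2>&1\n' "$schedule" "$job" "$log"
}

datax_cron_entry "0 2 * * *" /data/datax/job/tidb-es.json /tmp/tidb-es.cron.log
```

The printed line can be pasted into the editor opened by `crontab -e`. Because each run re-executes the full job, a `where` condition in the reader is one way to limit a scheduled run to new rows.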