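- DataX installation
Prerequisites:
Python 2.x or later
JDK 1.7 or later
Maven 3.x (only needed to build DataX from source)
Check the environment
# Check the Python version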
python --version
# Check the Java version
java -version
# Check the Maven version
mvn -v
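The three checks above can also be scripted. The following is a minimal sketch, assuming the commands are named `python`, `java`, and `mvn` on your PATH and that the helper itself runs under Python 3 (`shutil.which` is not available in Python 2). The function name `find_missing_tools` is made up for illustration.

```python
import shutil


def find_missing_tools(tools=("python", "java", "mvn")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All prerequisite commands were found on PATH.")
```

This only confirms that the commands exist; the version numbers still need to be read from the output of the commands above.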
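- Download DataX
http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
Extract and install DataX
# Create the target directory first (tar -C fails if it does not exist), then extract
mkdir -p /opt/install/datax
tar -zxvf datax.tar.gz -C /opt/install/datax
# Run the bundled sample job to verify the installation (run from the bin directory)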
python datax.py ../job/job.json
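If you want to launch jobs from another script rather than by hand, the command above can be wrapped as follows. This is a sketch only: `build_datax_command` and `run_job` are hypothetical helper names, `datax_home` is whichever directory contains `bin/datax.py` on your machine, and treating a nonzero exit code as failure is an assumption about the launcher.

```python
import os
import subprocess
import sys


def build_datax_command(datax_home, job_file, python_exe=None):
    """Build the argument list for launching bin/datax.py with a job file."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    # Fall back to the interpreter running this script if none is given.
    return [python_exe or sys.executable, launcher, job_file]


def run_job(datax_home, job_file, python_exe=None):
    """Run the job and return the launcher's exit code."""
    return subprocess.call(build_datax_command(datax_home, job_file, python_exe))
```

Passing the arguments as a list avoids shell quoting problems when the job path contains spaces.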
- Verify the installation with a stream2stream job
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
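As a sanity check on the run summary, you can estimate how many records this job should emit. The sketch below assumes that each channel runs one streamreader slice and that every slice emits `sliceRecordCount` records; that is my understanding of streamreader, not something these notes state. The function name is hypothetical.

```python
def expected_stream_records(job):
    """Estimate the records emitted by a streamreader job.

    Assumption: one slice per channel, sliceRecordCount records per slice.
    """
    content = job["job"]["content"][0]
    per_slice = int(content["reader"]["parameter"]["sliceRecordCount"])
    channels = int(job["job"]["setting"]["speed"]["channel"])
    return per_slice * channels
```

Under that assumption the job above (10 records per slice, 5 channels) yields 50 records. A different total in the final summary suggests the job file was not picked up as written.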
- Test: import from MySQL into MySQL
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "root",
                    "password": "root",
                    "column": ["id", "name"],
                    "connection": [{
                        "table": [
                            "mysql_user"
                        ],
                        "jdbcUrl": [
                            "jdbc:mysql://192.168.1.164:3306/sqoop"
                        ]
                    }]
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "writeMode": "insert",
                    "username": "root",
                    "password": "root",
                    "column": ["id", "name"],
                    "connection": [{
                        "jdbcUrl": "jdbc:mysql://192.168.1.164:3306/wb",
                        "table": [
                            "mysql_user"
                        ]
                    }]
                }
            }
        }]
    }
}
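When several tables need the same kind of job, the JSON can be generated instead of copied by hand. The following sketch reproduces the structure of the job above. The function name and its parameters are made up for illustration, and it assumes source and target share one set of credentials, as in the example. Note the asymmetry it preserves: the reader takes `jdbcUrl` as a list, the writer as a single string.

```python
import json


def build_mysql_to_mysql_job(source_url, target_url, table, columns,
                             username, password, channel=3):
    """Build a mysqlreader -> mysqlwriter job dict shaped like the example."""
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": username,
            "password": password,
            "column": list(columns),
            # Reader side: jdbcUrl is a list of URLs.
            "connection": [{"table": [table], "jdbcUrl": [source_url]}],
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "writeMode": "insert",
            "username": username,
            "password": password,
            "column": list(columns),
            # Writer side: jdbcUrl is a single string.
            "connection": [{"jdbcUrl": target_url, "table": [table]}],
        },
    }
    return {
        "job": {
            "setting": {
                "speed": {"channel": channel},
                "errorLimit": {"record": 0, "percentage": 0.02},
            },
            "content": [{"reader": reader, "writer": writer}],
        }
    }


if __name__ == "__main__":
    job = build_mysql_to_mysql_job(
        "jdbc:mysql://192.168.1.164:3306/sqoop",
        "jdbc:mysql://192.168.1.164:3306/wb",
        "mysql_user", ["id", "name"], "root", "root",
    )
    print(json.dumps(job, indent=4))
```

Redirect the printed JSON to a file and pass that file to `datax.py` as usual.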
- Test: import from MySQL into Hive
{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "column": ["id", "name"],
                    "where": "1=1",
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://192.168.1.164:3306/sqoop"],
                        "table": ["mysql_user"]
                    }],
                    "username": "root",
                    "password": "root"
                }
            },
            "writer": {
                "name": "hdfswriter",
                "parameter": {
                    "column": [{
                            "name": "id",
                            "type": "int"
                        }, {
                            "name": "name",
                            "type": "STRING"
                        }
                    ],
                    "defaultFS": "hdfs://myhbase:8020",
                    "fieldDelimiter": "\t",
                    "fileName": "mysql_user",
                    "fileType": "text",
                    "path": "/user/hive/warehouse/myhive.db/mysql_user",
                    "writeMode": "append"
                }
            }
        }],
        "setting": {
            "speed": {
                "byte": 10485760,
                "channel": 5
            }
        }
    }
}
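The hdfswriter only writes files under `path`, so the Hive table has to exist already and must use the same field delimiter and file format. The sketch below renders a matching `CREATE TABLE` statement from the writer's `column` list. The function name is hypothetical, and it assumes a plain text table stored in Hive's default warehouse location, so that the table directory coincides with the `path` used in the job.

```python
def hive_create_table_sql(database, table, columns, field_delimiter="\\t"):
    """Render a Hive CREATE TABLE statement for a delimited text table.

    `columns` is a list of {"name": ..., "type": ...} dicts, the same shape
    as the hdfswriter "column" setting.
    """
    column_sql = ", ".join(
        "`{0}` {1}".format(col["name"], col["type"].upper()) for col in columns
    )
    return (
        "CREATE TABLE IF NOT EXISTS {db}.{tbl} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delim}' "
        "STORED AS TEXTFILE"
    ).format(db=database, tbl=table, cols=column_sql, delim=field_delimiter)


if __name__ == "__main__":
    print(hive_create_table_sql(
        "myhive", "mysql_user",
        [{"name": "id", "type": "int"}, {"name": "name", "type": "STRING"}],
    ))
```

Run the printed statement in Hive before starting the DataX job. If the delimiter in the table definition differs from `fieldDelimiter`, queries typically return NULL columns.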
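- Notes:
1. DataX cannot synchronize table schemas, so the target table named in the writer must already exist before any data is synchronized.
2. Avoid using "*" for the column setting; list exactly the columns you want to synchronize. Otherwise DataX logs a warning to this effect (translated): WARN: the column configuration in your job file carries some risk. Because the columns to read from the table are not configured, a change in the number or types of the table's columns may affect the correctness of the job or even cause it to fail.
3. When writing to a database, using more than 32 channels is generally not recommended.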
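Notes 2 and 3 can be checked mechanically before a job is submitted. This is a rough sketch, not part of DataX: `lint_job` is a hypothetical helper, and it applies the 32-channel limit to every job rather than only to jobs that write to a database.

```python
def lint_job(job, max_channels=32):
    """Return warnings for '*' column lists and an overly high channel count."""
    warnings = []
    for index, item in enumerate(job["job"]["content"]):
        for role in ("reader", "writer"):
            columns = item[role].get("parameter", {}).get("column", [])
            # Column entries may be strings or dicts; only a literal "*" matches.
            if "*" in columns:
                warnings.append(
                    "content[{0}].{1}: column uses '*'; list the columns "
                    "explicitly".format(index, role)
                )
    speed = job["job"].get("setting", {}).get("speed", {})
    channel = int(speed.get("channel", 1))
    if channel > max_channels:
        warnings.append(
            "setting.speed.channel is {0}, above the suggested maximum "
            "of {1}".format(channel, max_channels)
        )
    return warnings
```

An empty list means neither pitfall was detected. Whether the target table exists (note 1) still has to be verified against the database itself.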