1. Environment Preparation
- Linux
- JDK (1.8 or later; 1.8 recommended)
- Python (Python 2.6.x recommended)
Installing Python with yum: https://www.cnblogs.com/kaishirenshi/p/11858655.html
# CentOS 7
# Switch to the Alibaba Cloud yum mirror first
yum -y install epel-release
yum repolist
yum -y install python36
- Download DataX: https://github.com/alibaba/DataX
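
As a quick way to confirm the JDK prerequisite listed above, here is a minimal Python sketch that parses the first line printed by `java -version` and compares it with the 1.8 minimum. The helper names `parse_java_version` and `java_is_supported` are hypothetical and not part of DataX, and the sample version strings are only illustrative.

```python
import re


def parse_java_version(version_line):
    """Extract (major, minor) from a line such as: java version "1.8.0_212"."""
    match = re.search(r'"(\d+)(?:\.(\d+))?', version_line)
    if match is None:
        raise ValueError("unrecognised version line: %r" % version_line)
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return major, minor


def java_is_supported(version_line, minimum=(1, 8)):
    """Return True when the reported JDK is at least `minimum`."""
    return parse_java_version(version_line) >= minimum
```

Note that `java -version` writes to standard error, so when capturing the line from a script, read stderr rather than stdout.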
2. Hands-on Example
2.1 Installation
1) Upload the downloaded datax.tar.gz to /opt/software on hadoop102
2) Extract datax.tar.gz to /opt/module
[atguigu@hadoop102 software]$ tar -zxvf datax.tar.gz -C /opt/module/
3) Run the self-check script
[atguigu@hadoop102 bin]$ cd /opt/module/datax/bin/
[atguigu@hadoop102 bin]$ python datax.py /opt/module/datax/job/job.json
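
The self-check can also be driven from a script. The following is a rough sketch, assuming the install path used above and assuming that datax.py returns a non-zero exit code when a job fails (worth verifying for your DataX version). `build_datax_command` and `run_datax` are hypothetical helper names.

```python
import subprocess
import sys


def build_datax_command(datax_home, job_file, python_exe=None):
    """Build the argument list for running one DataX job file."""
    launcher = datax_home.rstrip("/") + "/bin/datax.py"
    # Fall back to the interpreter running this script if none is given.
    return [python_exe or sys.executable, launcher, job_file]


def run_datax(datax_home, job_file, python_exe=None):
    """Run the job and return the exit code of the datax.py process."""
    return subprocess.call(build_datax_command(datax_home, job_file, python_exe))
```

For example, `run_datax("/opt/module/datax", "/opt/module/datax/job/job.json", "python")` launches the same command as step 3.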
2.2 View the Official Template
python /opt/module/datax/bin/datax.py -r mysqlreader -w hdfswriter
{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "column": [],
                    "connection": [{
                        "jdbcUrl": [],
                        "table": []
                    }],
                    "password": "",
                    "username": "",
                    "where": ""
                }
            },
            "writer": {
                "name": "hdfswriter",
                "parameter": {
                    "column": [],
                    "compress": "",
                    "defaultFS": "",
                    "fieldDelimiter": "",
                    "fileName": "",
                    "fileType": "",
                    "path": "",
                    "writeMode": ""
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
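
To see at a glance which template fields are still blank, a small sketch like the one below can walk the JSON and list every value that is still "" or []. The function name `empty_fields` is hypothetical, and the `template` value is a trimmed copy of the reader half of the template, used only for illustration. Not every blank field is necessarily mandatory; check the plugin documentation for which ones may stay empty.

```python
import json


def empty_fields(node, prefix=""):
    """Return dotted paths of every value that is still "" or []."""
    if node == "" or node == []:
        return [prefix]
    paths = []
    if isinstance(node, dict):
        for key in sorted(node):
            child = prefix + "." + key if prefix else key
            paths.extend(empty_fields(node[key], child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            paths.extend(empty_fields(item, "%s[%d]" % (prefix, index)))
    return paths


# Trimmed copy of the reader part of the template (illustration only).
template = json.loads("""
{"reader": {"name": "mysqlreader",
            "parameter": {"column": [],
                          "connection": [{"jdbcUrl": [], "table": []}],
                          "username": "", "password": "", "where": ""}}}
""")

for path in empty_fields(template):
    print(path)
```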
2.3 Prepare the Data
1) Create the student table
mysql> create database datax;
mysql> use datax;
mysql> create table student(id int,name varchar(20));
2) Insert data
mysql> insert into student values(1001,'zhangsan'),(1002,'lisi'),(1003,'wangwu');
2.4 Write the Configuration File
vim /opt/module/datax/job/mysql2hdfs.json
{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "column": [
                        "id",
                        "name"
                    ],
                    "connection": [{
                        "jdbcUrl": [
                            "jdbc:mysql://hadoop102:3306/datax"
                        ],
                        "table": [
                            "student"
                        ]
                    }],
                    "username": "root",
                    "password": "000000"
                }
            },
            "writer": {
                "name": "hdfswriter",
                "parameter": {
                    "column": [{
                            "name": "id",
                            "type": "int"
                        },
                        {
                            "name": "name",
                            "type": "string"
                        }
                    ],
                    "defaultFS": "hdfs://hadoop102:9000",
                    "fieldDelimiter": "\t",
                    "fileName": "student.txt",
                    "fileType": "text",
                    "path": "/",
                    "writeMode": "append"
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
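Remember to change the MySQL and HDFS addresses to your own. The HDFS port can be found in the Hadoop configuration file (typically the fs.defaultFS property in core-site.xml).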
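
Because the addresses differ per cluster, it can be convenient to generate the job file rather than edit it by hand. The sketch below builds the same structure as the configuration above from a few arguments. `build_mysql2hdfs_job` is a hypothetical helper, not a DataX API, and it hard-codes the fieldDelimiter, fileType, writeMode, and channel values used in this example. Deriving the reader and writer column lists from one input keeps the two in step.

```python
import json


def build_mysql2hdfs_job(jdbc_url, table, columns, username, password,
                         default_fs, path, file_name):
    """Return a DataX job dict; `columns` is a list of (name, hdfs_type) pairs."""
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "column": [name for name, _ in columns],
            "connection": [{"jdbcUrl": [jdbc_url], "table": [table]}],
            "username": username,
            "password": password,
        },
    }
    writer = {
        "name": "hdfswriter",
        "parameter": {
            # Same columns as the reader, with the HDFS-side type attached.
            "column": [{"name": n, "type": t} for n, t in columns],
            "defaultFS": default_fs,
            "fieldDelimiter": "\t",
            "fileName": file_name,
            "fileType": "text",
            "path": path,
            "writeMode": "append",
        },
    }
    return {"job": {"content": [{"reader": reader, "writer": writer}],
                    "setting": {"speed": {"channel": "1"}}}}
```

The result can be written out with `json.dump(job, open("mysql2hdfs.json", "w"), indent=4)` and passed to datax.py like any hand-written job file.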
2.5 Run the Job
[atguigu@hadoop102 datax]$ bin/datax.py job/mysql2hdfs.json
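Note: at run time, HdfsWriter appends a random suffix to the configured file name, and the result is the actual name of the file each thread writes.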
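
Since the final file names are not known in advance, downstream scripts should match on the configured fileName as a prefix rather than on an exact name. The following is a minimal sketch of that idea. The helper names are hypothetical, and the suffixes in the usage example are made up, because the exact suffix format depends on the DataX version.

```python
def match_output_files(listing, file_name):
    """Return the names in `listing` that start with the configured fileName."""
    return sorted(name for name in listing if name.startswith(file_name))


def parse_text_line(line, delimiter="\t"):
    """Split one line of a text-format output file into its column values."""
    return line.rstrip("\n").split(delimiter)
```

For instance, given a directory listing such as `["student.txt__a1b2", "other.log", "student.txt__c3d4"]` (hypothetical names), `match_output_files(listing, "student.txt")` keeps only the two student files, and `parse_text_line("1001\tzhangsan\n")` yields the id and name columns of one row.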