Datax是一个开源的数据交换工具,支持关系型数据库(MySQL、Oracle等)、HDFS、Hive、ODPS、HBase、FTP等各种异构数据源之间稳定高效的数据同步,很多商业化的大数据平台也会将Datax作为ETL组件。
本文介绍将datax集成到springboot中进行开发。
一、下载官方安装包
1.下载地址:http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
2.解压后保留conf、job、lib、plugin目录,其余目录和文件可以删掉。其中plugin目录只保留需要的插件。
3.将datax目录移动到springboot项目目录下
二、导入必要的依赖
pom.xml中加入以下依赖,其中版本号为0.0.1-SNAPSHOT的依赖包可从datax压缩包中获取,需要安装到本地仓库或上传到私有仓库。
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-core</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-common</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-transformer</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>mysqlreader</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>mysqlwriter</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>oraclereader</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>plugin-rdbms-util</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>ojdbc6</artifactId>
<version>11.2.0.3</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
</dependency>
<dependency>
<groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId>
<version>1.2</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.4</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.4</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.3.2</version>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid</artifactId>
<version>1.0.15</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.73</version>
</dependency>
三、执行函数
@Value("${datax.home}")
private String dataxHome;
public void execute(String jobConfig) {
System.setProperty("datax.home", dataxHome);
System.setProperty("now", LocalTime.now().toString());
String[] dataxArgs = {"-job", dataxHome + "/job/" + jobConfig, "-mode", "standalone", "-jobid", "-1"};
try {
Engine.entry(dataxArgs);
} catch (Throwable e) {
e.printStackTrace();
}
}
- dataxHome参数为前面下载的datax目录路径;
- jobConfig 参数是配置数据同步任务的json文件,放在${dataxHome}/job目录下,格式为xxx.json;