ETL Series, Part 2: Integrating DataX with Spring Boot

I. Introduction

Some projects need to schedule DataX job scripts with their own task scheduler (such as xxl-job). In that case, DataX has to be integrated into the Spring Boot application itself.

II. Integration Approaches

There are generally two fairly simple integration options:

(1) Executing the DataX command as an external process

(2) Calling the DataX job engine directly

III. Integration in Practice

1. Executing the DataX command as an external process

This approach only requires a small utility class, but the runtime environment must have Python available, because DataX is launched through datax.py. The utility class is shown below, followed by a usage sketch.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;

/**
 * Utility class for executing external commands (e.g. datax.py) and streaming their output to the log.
 */
@Component
public class ExecCommandUtil {
    private static Logger log = LoggerFactory.getLogger(ExecCommandUtil.class);

    // Charset used to decode the child process output (defaults to GBK)
    private static String CHARSET;

    @Value("${spring.datax.command.charset:GBK}")
    public void setCharset(String charset) {
        CHARSET = charset;
    }

    public static void execCommand(String param) throws Exception {
        int exitValue = -1;
        String[] command = param.split(" ");
        log.info(Arrays.toString(command));
        BufferedReader bufferedReader = null;
        try {
            long startTime = System.currentTimeMillis();
            // command process
            ProcessBuilder processBuilder = new ProcessBuilder();
            processBuilder.command(command);
            processBuilder.redirectErrorStream(true);

            Process process = processBuilder.start();

            BufferedInputStream bufferedInputStream = new BufferedInputStream(process.getInputStream());

            // Decode the process output stream with the configured charset
            bufferedReader = new BufferedReader(new InputStreamReader(bufferedInputStream, CHARSET));

            // command log
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                log.info(line);
            }

            // command exit
            process.waitFor();
            long endTime = System.currentTimeMillis();
            log.debug("command execute spend time: {} ms", endTime - startTime);
            exitValue = process.exitValue();
        } finally {
            if (bufferedReader != null) {
                bufferedReader.close();
            }
        }

        // An exit value other than 0 or 3 means the command did not complete successfully
        if (exitValue != 0 && exitValue != 3) {
            throw new Exception(String.format("command is failed, exit value=%s.", exitValue));
        }
    }
}
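A minimal sketch of how this utility might be wired into a scheduled task. The datax.py and job paths below are placeholders for your own installation, and @EnableScheduling is assumed to be present on a configuration class (an xxl-job handler would work the same way):

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// plus an import for ExecCommandUtil from your own util package

@Component
public class DataxCommandJob {

    // Example command; adjust the datax.py and job paths to your own installation
    private static final String DATAX_COMMAND =
            "python /data/datax/datax/bin/datax.py /data/datax/datax/job/balfund-1.json";

    // Runs every day at 01:00
    @Scheduled(cron = "0 0 1 * * *")
    public void runDataxJob() throws Exception {
        ExecCommandUtil.execCommand(DATAX_COMMAND);
    }
}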

2. Calling the DataX job engine directly

(1) Add dependencies

Note: before adding these dependencies, the datax-common and datax-core jars below must be built and uploaded to your private repository (or installed into your local repository), as described in step (2).

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-core</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>

(2) Build the DataX dependency jars

  • Download the source code

git clone git@github.com:alibaba/DataX.git

  • Build the project, then install the datax-core and datax-common jars into your local Maven repository; the coordinates must match the dependencies declared in the pom.xml above (a deploy:deploy-file sketch for pushing to a private repository follows these commands):

mvn install:install-file -DgroupId=com.alibaba.datax -DartifactId=datax-core -Dversion=0.0.1-SNAPSHOT -Dpackaging=jar -Dfile=datax-core-0.0.1-SNAPSHOT.jar

mvn install:install-file -DgroupId=com.alibaba.datax -DartifactId=datax-common -Dversion=0.0.1-SNAPSHOT -Dpackaging=jar -Dfile=datax-common-0.0.1-SNAPSHOT.jar
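To push the same jars to a private repository (as required by the dependency note above), a minimal sketch using Maven's deploy:deploy-file goal; the repository id and URL here are placeholders and must match a server entry in your settings.xml:

mvn deploy:deploy-file -DgroupId=com.alibaba.datax -DartifactId=datax-core -Dversion=0.0.1-SNAPSHOT -Dpackaging=jar -Dfile=datax-core-0.0.1-SNAPSHOT.jar -DrepositoryId=private-snapshots -Durl=https://nexus.example.com/repository/maven-snapshots/

mvn deploy:deploy-file -DgroupId=com.alibaba.datax -DartifactId=datax-common -Dversion=0.0.1-SNAPSHOT -Dpackaging=jar -Dfile=datax-common-0.0.1-SNAPSHOT.jar -DrepositoryId=private-snapshots -Durl=https://nexus.example.com/repository/maven-snapshots/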

(3) Add configuration

The home path is the DataX installation directory described in the previous article of this series; a sketch of its typical layout follows the property below.

## DataX installation (home) path
spring.datax.homepath=/data/datax/datax
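For reference, the embedded engine resolves its core configuration and plugins relative to this directory, so it should point at a full DataX installation. A typical layout looks roughly like this (exact contents depend on your DataX build):

/data/datax/datax
├── bin/       # datax.py and helper scripts
├── conf/      # core.json, logback.xml
├── job/       # job JSON definitions
├── lib/       # engine jars
└── plugin/    # reader/ and writer/ plugin directories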

(4) Code

  • DataxHomePathUtil: sets the datax.home system property to the configured DataX working directory
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;


/**
 * Utility class for the DataX working (home) directory.
 */
@Component
public class DataxHomePathUtil {
    private static Logger logger = LoggerFactory.getLogger(DataxHomePathUtil.class);

    /**
     * DataX home directory, which holds the plugins and job definition files.
     */
    private static String DATAX_PLUGIN_PATH;

    @Value("${spring.datax.homepath:}")
    public void setDataxPluginPath(String dataxPluginPath) {
        DATAX_PLUGIN_PATH = dataxPluginPath;
    }

    /**
     * Expose the configured home directory to the DataX engine via the
     * "datax.home" system property, which the engine uses to locate conf/ and plugin/.
     */
    public static void setDataxHomePath() {
        logger.debug("DataX home (plugin install) directory: {}", DATAX_PLUGIN_PATH);
        System.setProperty("datax.home", DATAX_PLUGIN_PATH);
    }
}
  • EngineHelper: invokes the DataX job engine (a scheduler usage sketch follows the class)
import com.alibaba.datax.core.Engine;
import org.springframework.stereotype.Component;

/**
 * Helper for invoking the DataX job engine.
 */
@Component
public class EngineHelper {
    /**
     * Run a DataX job through the embedded engine in standalone mode.
     * @param jobJson path to the job JSON definition file
     * @throws Throwable if the job fails
     */
    public static void entry(String jobJson) throws Throwable {
        DataxHomePathUtil.setDataxHomePath();
        String[] dataxArgs = {"-job", jobJson, "-mode", "standalone", "-jobid", "-1"};
        Engine.entry(dataxArgs);
    }
}
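As noted in the introduction, the usual reason for embedding the engine is to trigger it from the project's own scheduler. A minimal sketch of an xxl-job handler delegating to EngineHelper, assuming xxl-job-core (with method-based @XxlJob handlers) is on the classpath and a job named syncBalfundJobHandler is configured in xxl-job-admin:

import com.xxl.job.core.handler.annotation.XxlJob;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

// plus an import for EngineHelper from your own util package

@Component
public class DataxJobHandler {

    @Value("${spring.datax.job.balfund}")
    private String jobJsonBalfund;

    // Invoked by the xxl-job scheduler; the handler name must match the one configured in xxl-job-admin
    @XxlJob("syncBalfundJobHandler")
    public void syncBalfund() {
        try {
            EngineHelper.entry(jobJsonBalfund);
        } catch (Throwable e) {
            // Rethrow so the execution is recorded as failed on the scheduler side
            throw new RuntimeException(e);
        }
    }
}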

3. Testing

(1) Add configuration

Before adding this configuration, prepare the data-sync job scripts and upload them to the corresponding paths (a minimal smoke-test job definition is sketched after the properties).

## DataX data-sync job script (used by the embedded engine)
spring.datax.job.balfund=/data/datax/datax/job/balfund-1.json
## DataX data-sync command (datax.py invocation)
spring.datax.command.py-balfund=python /data/datax/datax/bin/datax.py -p"-Dversion='8'" /data/datax/datax/job/balfund-clickhouse2.json
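If you only want to smoke-test the integration before wiring up real readers and writers, a minimal job definition using DataX's built-in streamreader and streamwriter can be used instead of the project-specific balfund jobs; save it under the job/ directory and point the job property above at it:

{
  "job": {
    "setting": { "speed": { "channel": 1 } },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [ { "type": "string", "value": "hello datax" } ],
            "sliceRecordCount": 10
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": { "print": true }
        }
      }
    ]
  }
}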

(2) Write a test controller

import com.***.datax.util.EngineHelper;
import com.***.datax.util.ExecCommandUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;


// @RestController (rather than @Controller) so the returned strings are written
// to the response body instead of being resolved as view names
@RestController
@RequestMapping("/datax")
public class DataxController {
    Logger log = LoggerFactory.getLogger(DataxController.class);

    @Value("${spring.datax.job.balfund}")
    private String jobJsonBalfund;

    @Value("${spring.datax.command.py-balfund}")
    private String pyJobBalfund;

    // Approach 2: run the job through the embedded DataX engine
    @GetMapping("/test-1")
    public String test1() {
        log.info("------------{}", jobJsonBalfund);
        try {
            EngineHelper.entry(jobJsonBalfund);
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
        return "Execution finished";
    }

    // Approach 1: run the job by executing the datax.py command
    @GetMapping("/test-2")
    public String test2() {
        log.info("------------{}", pyJobBalfund);
        try {
            ExecCommandUtil.execCommand(pyJobBalfund);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return "Execution finished";
    }
}
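With the application started (default port 8080 assumed here), both approaches can then be exercised directly from the command line:

curl http://localhost:8080/datax/test-1
curl http://localhost:8080/datax/test-2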
