Java项目集成Kettle实现简易抽取加载功能

 

 

本项目基于springboot来实现。

1.首先导入kettle相关jar包

将kettle安装目录中data-integration\lib中的kettle-core.jar、kettle-engine.jar、metastore.jar导入工程(可以使用maven命令打包到maven仓库来实现)。当然还需要导入使用这三个jar包其他依赖。

完整依赖如下:

        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-vfs2</artifactId>
            <version>2.3</version>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>23.0</version>
        </dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
			<version>5.1.47</version><!--$NO-MVN-MAN-VER$-->
		</dependency>
		<dependency>
			<groupId>org.pentaho</groupId>
			<artifactId>kettle-core</artifactId>
			<version>8.2.0.0-342</version>
		</dependency>
		<dependency>
			<groupId>org.pentaho</groupId>
			<artifactId>kettle-engine</artifactId>
			<version>8.2.0.0-342</version>
		</dependency>
		<dependency>
			<groupId>org.pentaho</groupId>
			<artifactId>metastore</artifactId>
			<version>8.2.0.0-342</version>
		</dependency>
		<dependency>
			<groupId>commons-io</groupId>
			<artifactId>commons-io</artifactId>
			<version>2.6</version>
		</dependency>
		<dependency>
			<groupId>commons-lang</groupId>
			<artifactId>commons-lang</artifactId>
			<version>2.6</version>
		</dependency>
		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
			<version>1.18.4</version><!--$NO-MVN-MAN-VER$-->
		</dependency>
		<dependency>
    		<groupId>org.owasp.encoder</groupId>
    		<artifactId>encoder</artifactId>
    		<version>1.2</version>
		</dependency>

2.数据源配置

配置如下:

@Configuration
@ConfigurationProperties("kettle")
public class DatabaseMetasProperties {
	private List<DatabaseMetaInfo> databaseMetaList;

	public List<DatabaseMetaInfo> getDatabaseMetaList() {
		return databaseMetaList;
	}

	public void setDatabaseMetaList(List<DatabaseMetaInfo> databaseMetaList) {
		this.databaseMetaList = databaseMetaList;
	}
	
}

数据源工具类:

@Component
public class DatabaseMetaTool {
	
	@Autowired
	private DatabaseMetasProperties databaseMetasProperties;

	public List<DatabaseMeta> getDatabaseMetas() {
		List<DatabaseMeta> databaseMetaList = new ArrayList<>();
		databaseMetasProperties.getDatabaseMetaList().forEach((databaseMetaInfo) -> {
			DatabaseMeta databaseMeta = new DatabaseMeta(databaseMetaInfo.getName(),
					databaseMetaInfo.getDatabaseType(),
					null,
					databaseMetaInfo.getHost(),
					databaseMetaInfo.getDBName(),
					databaseMetaInfo.getPort(),
					databaseMetaInfo.getUsername(),
					databaseMetaInfo.getPassword());
//			//设置字符集
			databaseMeta.addExtraOption(databaseMetaInfo.getDatabaseType(), "characterEncoding", "utf-8");
//			databaseMeta.addExtraOption(databaseMetaInfo.getDatabaseType(), "useUnicode", "true");
//			databaseMeta.addExtraOption(databaseMetaInfo.getDatabaseType(), "useSSL", "false");
			databaseMetaList.add(databaseMeta);
		});
		return databaseMetaList;
	}
}

 

对于kettle功能的使用都需要初始化。即实现如下代码:

KettleEnvironment.init();

功能调用完后

KettleEnvironment.shutdown();

3.测试数据源是否配置成功

代码如下:

@Test
	public void test() {
		try {
			KettleEnvironment.init();
			List<DatabaseMeta> databaseMetaList = databaseMetaTool.getDatabaseMetas();
			databaseMetaList.forEach(databaseMeta -> {
				System.out.println(databaseMeta.testConnection()); 
			});
		} catch (KettleException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} finally {
			KettleEnvironment.shutdown();
		}
	}

结果如下:

说明数据库配置成功。

当然DatabaseMeta下还有其他方法获取相关信息。

4.创建ktr对象

ktr对象由TransMeta.class来实现。但是实现TransMeta对象要先创建步骤来实现抽取加载等功能,然后设置好步骤顺序。

而创建步骤由StepMeta.class来完成,而创建StepMeta需实现StepMetaInterface接口。而此接口主要用来实现此步骤为表输入还是输出还是脚本等核心对象操作(通过kettle图像化界面可知核心对象有很多)。

TableInputMeta为表输入核心对象

TableOutputMeta为表输出核心对象

ExecSQLMeta为sql脚本核心对象

InsertUpdateMeta为插入/更新核心对象

kettle-engine.jar还有很多相关核心对象可以通过源码查看。

5.创建TableInputMeta对象

private TableInputMeta createTableInput() {
		TableInputMeta tableInputMeta = new TableInputMeta();
		tableInputMeta.setDatabaseMeta(databaseMetaTool.getDatabaseMetas().get(0));
		tableInputMeta.setSQL("SELECT id, create_by, create_time, update_by, update_time, address, avatar, description" + 
				", email, mobile, nick_name, password, sex, status, type, username, del_flag, department_id, street" + 
				", pass_strength FROM t_user");
		
		return tableInputMeta;
	}

这里数据源和sql是必须设置的,然后其他相关设置需调用其它方法,可通过源码查看其他方法

6.创建TableOutputMeta对象

private TableOutputMeta createTableOutput() {
		TableOutputMeta tableOutputMeta = new TableOutputMeta();
		//设置数据源
		tableOutputMeta.setDatabaseMeta(databaseMetaTool.getDatabaseMetas().get(1));
		
		tableOutputMeta.setFieldDatabase(new String[] {"id", "create_by", "create_time", "update_by", "update_time", 
				"address", "avatar", "description", "email", "mobile", "nick_name", "password", "sex", "status", "type",
				"username", "del_flag", "department_id", "street", "pass_strength"});
		
		tableOutputMeta.setFieldStream(new String[] {"id", "create_by", "create_time", "update_by", "update_time", 
				"address", "avatar", "description", "email", "mobile", "nick_name", "password", "sex", "status", "type",
				"username", "del_flag", "department_id", "street", "pass_strength"});
		tableOutputMeta.setTableName("t_user");
		
		tableOutputMeta.setCommitSize(1000);
		return tableOutputMeta;
	}

表输出需要设置流字段和目标表的字段一一对应,而且这里设置1000条提交一次

7.创建ExecSQLMeta对象

private ExecSQLMeta getExecSQL() {
		ExecSQLMeta execSQLMeta = new ExecSQLMeta();
		execSQLMeta.setDatabaseMeta(databaseMetaTool.getDatabaseMetas().get(1));
		execSQLMeta.setSql("INSERT into t_log(create_time) VALUES(SYSDATE())");
		execSQLMeta.setExecutedEachInputRow(false);
		execSQLMeta.setArguments(new String[] {});
//		execSQLMeta.setSingleStatement(false);
//		execSQLMeta.setVariableReplacementActive(false);
//		execSQLMeta.setQuoteString(false);
//		execSQLMeta.setParams(false);
		return execSQLMeta;
	}

sql脚本如果没有参数需要调用execSQLMeta.setArguments(new String[] {});不然会报空指针异常

8.生成ktr文件并执行

@Test
	public void contextLoads() {
		try {
            KettleEnvironment.init();
    		
    		//转换文件
    		TransMeta transMeta = new TransMeta();
    		transMeta.setDatabases(databaseMetaTool.getDatabaseMetas());
    		
    		StepMeta t_user_Input = new StepMeta("t_user_Input", createTableInput());
    		t_user_Input.setDraw(true);
    		t_user_Input.setLocation(100, 100);
    		//添加步骤
    		transMeta.addStep(t_user_Input);
    		
    		StepMeta t_user_Output = new StepMeta("t_user_Output", createTableOutput());
    		t_user_Output.setDraw(true);
    		t_user_Output.setLocation(250, 100);
    		transMeta.addStep(t_user_Output);
    		
    		StepMeta t_log = new StepMeta("t_log", getExecSQL());
    		t_log.setDraw(true);
    		t_log.setLocation(400, 100);
    		
    		System.out.println(t_log.getXML());
    		transMeta.addStep(t_log);
    		
    		//设置步骤顺序
    		transMeta.addTransHop(new TransHopMeta(t_user_Input, t_user_Output));
    		transMeta.addTransHop(new TransHopMeta(t_user_Output, t_log));
    		
    		String str = transMeta.getXML();
    		System.out.println(str);
    		File file = new File("D:/etlTest/test111.ktr");
    		FileUtils.writeStringToFile(file, str, "utf-8");
    		
    		Trans trans = new Trans(transMeta);
    		//执行转换
    		trans.execute(null);
    		trans.waitUntilFinished();
            KettleEnvironment.shutdown();
        } catch (KettleException e) {
            e.printStackTrace();
        } catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}

生成的ktr文件可以通过kettle图像化界面打开查看效果。

执行结果如下:

其他功能正在探索中

  • 2
    点赞
  • 13
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值