背景
最近在做数据库数据迁移,选择TTL工具,发现dataX这个开源工具gitlab上很活跃,又是阿里出品,处于支持国货心态,依然选择了它,但是这个工具还有有很多遗留问题,比如我们要做oracle的update模式的时候它是不支持的,所以下载源码动手干吧。
源码分析修改
源码下载
github上下地址https://github.com/alibaba/DataX
源码目录结构
源码目录结构很清晰,一眼就能定位到想要的源码实现,我需要修改的是oralce相关所以就只关注oracle的目录
源码分析
-
代码很简单,一下就定位到OracleWriter这个类,并且定位到update模式不支持的判断
-
但是代码太简单了,没有找到使用的信息,肯定是依赖了其他的工具包,决定去安装目录看看lib依赖了哪些包,看到了plugin-rdbms-util让我比较怀疑,决定去看看源码
-
plugin-dbms-util源码也不是很复杂,耐心看util或者write相关的代码,终于让我找到了核心位置代码WriterUtil,判断if(mysql) else other的地方,所以决定这里再加个orcle的判断
源码修改
- 既然核心位置已经找到,就要想用什么方法修改,mysql因为有天然的replace方案,所以update模式实现起来很方便,但是oracle没有,只能用merge into xx using xx on when matched then xx when not matched then xx去实现了,所以相对麻烦许多,不过好在也可以实现
- WriterUtil的getWriteTemplate增加Object objKey参数,这个主要是传的是write那边配置的pk,根据什么字段做update,然后下面增加一个else if(oralce)的逻辑代码
else if(dataBaseType == DataBaseType.Oracle && writeMode.trim().toLowerCase().startsWith("update")){
if(!(columnHolders.contains("id") || columnHolders.contains("ID")) && null == objKey){
throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
String.format("您所配置的 writeMode:%s 错误. 因为DataX 对oracle的update必须要有id主键或者指定的splitKey. 请检查您的配置并作出修改.", writeMode));
}
List<String> pkList = new ArrayList<>();
if(null != objKey){
pkList.addAll(Arrays.stream(objKey.toString().trim().split(",")).map(String::toUpperCase).collect(Collectors.toList()));
}else {
pkList.add("ID");
}
String pkCondition = StringUtils.join(pkList.stream()
.map(data -> new StringBuilder()
.append("(t1.").append(data.trim()).append("=").append("t2.").append(data.trim()).append(")").toString())
.collect(Collectors.toList()),
",");
writeDataSqlTemplate = new StringBuilder()
.append("merge into %s t1 using ( select ")
.append(StringUtils.join(buildOracleUpdateColumn(columnHolders,valueHolders,writeMode), ","))
.append(" from dual ) t2 on ")
.append(pkCondition)
.append(" when matched then update set ")
.append(StringUtils.join(columnHolders.stream().filter(c -> !pkList.contains(c.toUpperCase())).map(data ->
new StringBuilder().append("t1.").append(data).append("=").append("t2.").append(data).toString()
).collect(Collectors.toList()), ","))
.append(" when not matched then insert (")
.append(StringUtils.join(columnHolders, ","))
.append(") values(")
.append(StringUtils.join(valueHolders, ","))
.append(") ")
.toString();
}
- 修改getWriteTemplate方法调用的地方CommonRdbmsWriter.calcWriteRecordSql()和OriginalConfPretreatmentUtil.dealWriteMode()
private void calcWriteRecordSql() {
if (!VALUE_HOLDER.equals(calcValueHolder(""))) {
List<String> valueHolders = new ArrayList<String>(columnNumber);
for (int i = 0; i < columns.size(); i++) {
String type = resultSetMetaData.getRight().get(i);
valueHolders.add(calcValueHolder(type));
}
boolean forceUseUpdate = false;
//ob10的处理
if (dataBaseType != null && dataBaseType == DataBaseType.MySql && OriginalConfPretreatmentUtil.isOB10(jdbcUrl)) {
forceUseUpdate = true;
}
INSERT_OR_REPLACE_TEMPLATE = WriterUtil.getWriteTemplate(columns, valueHolders, writeMode, dataBaseType, forceUseUpdate, null);
String[] tabs = formatTable(INSERT_OR_REPLACE_TEMPLATE);
writeRecordSql = String.format(INSERT_OR_REPLACE_TEMPLATE, tabs);
LOG.info("=====writeRecordSql==="+writeRecordSql);
}
}
public static void dealWriteMode(Configuration originalConfig, DataBaseType dataBaseType) {
List<String> columns = originalConfig.getList(Key.COLUMN, String.class);
String jdbcUrl = originalConfig.getString(String.format("%s[0].%s",
Constant.CONN_MARK, Key.JDBC_URL, String.class));
// 默认为:insert 方式
String writeMode = originalConfig.getString(Key.WRITE_MODE, "INSERT");
List<String> valueHolders = new ArrayList<String>(columns.size());
for (int i = 0; i < columns.size(); i++) {
valueHolders.add("?");
}
boolean forceUseUpdate = false;
//ob10的处理
if (dataBaseType == DataBaseType.MySql && isOB10(jdbcUrl)) {
forceUseUpdate = true;
}
Object objKey = originalConfig.get(com.alibaba.datax.plugin.rdbms.reader.Key.SPLIT_PK);
String writeDataSqlTemplate = WriterUtil.getWriteTemplate(columns, valueHolders, writeMode,dataBaseType, forceUseUpdate, objKey);
LOG.info("Write data [\n{}\n], which jdbcUrl like:[{}]", writeDataSqlTemplate, jdbcUrl);
originalConfig.set(Constant.INSERT_OR_REPLACE_TEMPLATE_MARK, writeDataSqlTemplate);
}
- 因为我们merge的sql里面拼接?值绑定把List valueHolders多拼接了一遍,所以需要在绑定值那里也需要修改,多绑定一遍CommonRdbmsWriter.doBatchInsert()和CommonRdbmsWriter.doOneInsert()我增加了int valNum = this.writeRecordSql.indexOf("?");判断拼接了多少个?,传到值绑定方法fillPreparedStatement里面去
protected void doBatchInsert(Connection connection, List<Record> buffer)
throws SQLException {
PreparedStatement preparedStatement = null;
try {
connection.setAutoCommit(false);
preparedStatement = connection
.prepareStatement(this.writeRecordSql);
int valNum = StringUtils.countMatches(this.writeRecordSql,"?");
for (Record record : buffer) {
preparedStatement = fillPreparedStatement(
preparedStatement, record,valNum);
preparedStatement.addBatch();
}
preparedStatement.executeBatch();
connection.commit();
} catch (SQLException e) {
LOG.warn("回滚此次写入, 采用每次写入一行方式提交. 因为:" + e.getMessage());
connection.rollback();
doOneInsert(connection, buffer);
} catch (Exception e) {
throw DataXException.asDataXException(
DBUtilErrorCode.WRITE_DATA_ERROR, e);
} finally {
DBUtil.closeDBResources(preparedStatement, null);
}
}
protected void doOneInsert(Connection connection, List<Record> buffer) {
PreparedStatement preparedStatement = null;
try {
connection.setAutoCommit(true);
preparedStatement = connection
.prepareStatement(this.writeRecordSql);
int valNum = this.writeRecordSql.indexOf("?");
for (Record record : buffer) {
try {
preparedStatement = fillPreparedStatement(
preparedStatement, record, valNum);
preparedStatement.execute();
} catch (SQLException e) {
LOG.debug(e.toString());
this.taskPluginCollector.collectDirtyRecord(record, e);
} finally {
// 最后不要忘了关闭 preparedStatement
preparedStatement.clearParameters();
}
}
} catch (Exception e) {
throw DataXException.asDataXException(
DBUtilErrorCode.WRITE_DATA_ERROR, e);
} finally {
DBUtil.closeDBResources(preparedStatement, null);
}
}
- 值绑定方法PreparedStatement 修改,这里修改写得不是很好,我做了判断如果?的个数大于列个数,并且是列的2倍再把值绑定一遍,写得有点死了,可是我拼接merge的时候确实是把?多拼接了一遍,这里没想到特别好的办法修改,暂时这样改了,如果你有更好的修改办法,欢迎指教
// 直接使用了两个类变量:columnNumber,resultSetMetaData
protected PreparedStatement fillPreparedStatement(PreparedStatement preparedStatement, Record record, int valNum)
throws SQLException {
for (int i = 0; i < this.columnNumber; i++) {
int columnSqltype = this.resultSetMetaData.getMiddle().get(i);
preparedStatement = fillPreparedStatementColumnType(preparedStatement, i, columnSqltype, record.getColumn(i));
//处理oracle merge in using select value的拼值判断
if(valNum > this.columnNumber && valNum/this.columnNumber == 2){
int m = this.columnNumber+i;
preparedStatement = fillPreparedStatementColumnType(preparedStatement, m, columnSqltype, record.getColumn(i));
}
}
return preparedStatement;
}
- 支持,源码修改完成了后面就是mvn clean install打包,然后替换安装包里面的plugin-rdbms-util.jar包
测试应用
jar包替换好了后,就是写json脚本执行更新数据任务了,注意:因为oralce的update不能友好的支持clob字段更新,所有这里是不支持clob字段的,如果你的表字段有clob字段,最好不要用,尽量把要更新的字段列出来。*