文章目录
前文回顾:
《DataX 及 DataX-Web 安装使用详解》
《DataX 源码调试及打包》
《DataX-Web 源码调试及打包》
目前很多主流数据库支持 on duplicate key update(当主键冲突update数据)模式,DataX 也支持通过配置 writeMode 来配置写入模式。但是目前Oracle只支持insert配置项。
如何适配 Oracle 数据库 on duplicate key update 模式,今天大佬超就来带你进行二次开发。
1、原理
writeMode 的 insert、replace、update 配置项底层采用 INSERT INTOREPLACE INTO/INSERT INTO … ON DUPLICATE KEY UPDATE
语句:
其中 insert into 当主键/唯一性索引冲突时会写不进去冲突的行;后两者没有遇到主键/唯一性索引冲突时与 insert into 行为一致,遇到冲突时会用新行替换原有行所有字段。
oracle 不支持类似 MySQL的 REPLACE INTO
和 INSERT … ON DUPLICATE KEY UPDATE
,所以只支持 insert 配置项。要实现此功能,需要利用 Oracle 的 merge 语句,先来看下 merge 语法。
MERGE INTO [target-table] A USING [source-table sql] B
ON([conditional expression] and [...]...)
WHEN MATCHED THEN
[UPDATE sql]
WHEN NOT MATCHED THEN
[INSERT sql]
merge 语法其实就是存在就更新,不存在就插入。
示例:
MERGE INTO USERS A USING ( SELECT 18 AS "ID",'chaodev' AS "USER_ID" FROM DUAL ) TMP
ON (TMP."ID" = A."ID" AND TMP."USER_ID" = A."USER_ID" )
WHEN MATCHED THEN
UPDATE SET "USER_NAME" = '大佬超',"USER_PHONE" = '18000000000',"LASTUPDATETIME" = SYSDATE
WHEN NOT MATCHED THEN
INSERT ("ID","USER_ID","USER_NAME","USER_PHONE","LASTUPDATETIME") VALUES(18,'chaodev','大佬超','18000000000',SYSDATE)
所以最终实现原理就是:更改datax 的 oraclewriter 源码,通过 merge into 语句,实现 UPSERT 语义。
2、源码修改
涉及修改的类和方法如下:
oraclewriter包:
com.alibaba.datax.plugin.writer.oraclewriter.OracleWriter
:修改允许用户配置 writeMode。
plugin-dbms-util包:
com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil
:增加oralce的逻辑代码。
com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter
:CommonRdbmsWriter.Task类替换 startWriteWithConnection() 、doBatchInsert() 和fillPreparedStatement() 方法。
2.1 OracleWriter注释对writeMode的限制
2.2 WriterUtil,增加oracle逻辑
增加oracle的逻辑代码,如下
/**
* 新增oracle update模块
* @author 程序员大佬超
* @date 20221202
* @param columnHolders
* @param valueHolders
* @param writeMode
* @param dataBaseType
* @param forceUseUpdate
* @return
*/
public static String getWriteTemplate(List<String> columnHolders, List<String> valueHolders,
String writeMode, DataBaseType dataBaseType, boolean forceUseUpdate)
{
String mode = writeMode.trim().toLowerCase();
boolean isWriteModeLegal = mode.startsWith("insert") || mode.startsWith("replace") || mode.startsWith("update");
if (!isWriteModeLegal) {
throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
String.format("您所配置的 writeMode:%s 错误. 因为DataX 目前仅支持replace,update 或 insert 方式. 请检查您的配置并作出修改.", writeMode));
}
String writeDataSqlTemplate;
if (forceUseUpdate || mode.startsWith("update")) {
if (dataBaseType == DataBaseType.MySql || dataBaseType == DataBaseType.Tddl) {
writeDataSqlTemplate = new StringBuilder()
.append("INSERT INTO %s (").append(StringUtils.join(columnHolders, ","))
.append(") VALUES(").append(StringUtils.join(valueHolders, ","))
.append(")")
.append(onDuplicateKeyUpdateString(columnHolders))
.toString();
}
else if (dataBaseType == DataBaseType.Oracle) {
writeDataSqlTemplate = new StringBuilder().append(onMergeIntoDoString(writeMode, columnHolders, valueHolders)).append("INSERT (")
.append(StringUtils.join(columnHolders, ","))
.append(") VALUES(").append(StringUtils.join(valueHolders, ","))
.append(")").toString();
}
else {
throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
String.format("当前数据库不支持 writeMode:%s 模式.", writeMode));
}
}
else {
//这里是保护,如果其他错误的使用了update,需要更换为replace
if (writeMode.trim().toLowerCase().startsWith("update")) {
writeMode = "replace";
}
writeDataSqlTemplate = new StringBuilder().append(writeMode)
.append(" INTO %s (").append(StringUtils.join(columnHolders, ","))
.append(") VALUES(").append(StringUtils.join(valueHolders, ","))
.append(")").toString();
}
return writeDataSqlTemplate;
}
调用的方法新增如下,主要就是拼接merge语句:
public static String onMergeIntoDoString(String merge, List<String> columnHolders, List<String> valueHolders) {
String[] sArray = getStrings(merge);
StringBuilder sb = new StringBuilder();
sb.append("MERGE INTO %s A USING ( SELECT ");
boolean first = true;
boolean first1 = true;
StringBuilder str = new StringBuilder();
StringBuilder update = new StringBuilder();
for (String columnHolder : columnHolders) {
if (Arrays.asList(sArray).contains(columnHolder)) {
if (!first) {
sb.append(",");
str.append(" AND ");
} else {
first = false;
}
str.append("TMP.").append(columnHolder);
sb.append("?");
str.append(" = ");
sb.append(" AS ");
str.append("A.").append(columnHolder);
sb.append(columnHolder);
}
}
for (String columnHolder : columnHolders) {
if (!Arrays.asList(sArray).contains(columnHolder)) {
if (!first1) {
update.append(",");
} else {
first1 = false;
}
update.append(columnHolder);
update.append(" = ");
update.append("?");
}
}
sb.append(" FROM DUAL ) TMP ON (");
sb.append(str);
sb.append(" ) WHEN MATCHED THEN UPDATE SET ");
sb.append(update);
sb.append(" WHEN NOT MATCHED THEN ");
return sb.toString();
}
public static String[] getStrings(String merge) {
merge = merge.replace("update", "");
merge = merge.replace("(", "");
merge = merge.replace(")", "");
merge = merge.replace(" ", "");
return merge.split(",");
}
2.3 CommonRdbmsWriter.Task修改
修改 startWriteWithConnection() 方法
/**
* 更改适配oracle update
* @author 程序员大佬超
* @date 20221202
* @param recordReceiver
* @param taskPluginCollector
* @param connection
*/
public void startWriteWithConnection(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector, Connection connection)
{
this.taskPluginCollector = taskPluginCollector;
List<String> mergeColumns = new ArrayList<>();
if (this.dataBaseType == DataBaseType.Oracle && !"insert".equalsIgnoreCase(this.writeMode)) {
LOG.info("write oracle using {} mode", this.writeMode);
List<String> columnsOne = new ArrayList<>();
List<String> columnsTwo = new ArrayList<>();
String merge = this.writeMode;
String[] sArray = WriterUtil.getStrings(merge);
for (String s : this.columns) {
if (Arrays.asList(sArray).contains(s)) {
columnsOne.add(s);
}
}
for (String s : this.columns) {
if (!Arrays.asList(sArray).contains(s)) {
columnsTwo.add(s);
}
}
int i = 0;
for (String column : columnsOne) {
mergeColumns.add(i++, column);
}
for (String column : columnsTwo) {
mergeColumns.add(i++, column);
}
}
mergeColumns.addAll(this.columns);
// 用于写入数据的时候的类型根据目的表字段类型转换
this.resultSetMetaData = DBUtil.getColumnMetaData(connection,
this.table, StringUtils.join(mergeColumns, ","));
// 写数据库的SQL语句
calcWriteRecordSql();
List<Record> writeBuffer = new ArrayList<>(this.batchSize);
int bufferBytes = 0;
try {
Record record;
while ((record = recordReceiver.getFromReader()) != null) {
if (record.getColumnNumber() != this.columnNumber) {
// 源头读取字段列数与目的表字段写入列数不相等,直接报错
throw DataXException
.asDataXException(
DBUtilErrorCode.CONF_ERROR,
String.format(
"列配置信息有错误. 因为您配置的任务中,源头读取字段数:%s 与 目的表要写入的字段数:%s 不相等. 请检查您的配置并作出修改.",
record.getColumnNumber(),
this.columnNumber));
}
writeBuffer.add(record);
bufferBytes += record.getMemorySize();
if (writeBuffer.size() >= batchSize || bufferBytes >= batchByteSize) {
doBatchInsert(connection, writeBuffer);
writeBuffer.clear();
bufferBytes = 0;
}
}
if (!writeBuffer.isEmpty()) {
doBatchInsert(connection, writeBuffer);
writeBuffer.clear();
}
}
catch (Exception e) {
throw DataXException.asDataXException(
DBUtilErrorCode.WRITE_DATA_ERROR, e);
}
finally {
writeBuffer.clear();
DBUtil.closeDBResources(null, null, connection);
}
}
修改 doBatchInsert() 方法
/**
* 更改适配oracle update
* @author 程序员大佬超
* @date 20221202
* @param connection
* @param buffer
* @throws SQLException
*/
protected void doBatchInsert(Connection connection, List<Record> buffer)
throws SQLException
{
PreparedStatement preparedStatement = null;
try {
connection.setAutoCommit(false);
preparedStatement = connection
.prepareStatement(this.writeRecordSql);
if (this.dataBaseType == DataBaseType.Oracle && !"insert".equalsIgnoreCase(this.writeMode)) {
String merge = this.writeMode;
String[] sArray = WriterUtil.getStrings(merge);
for (Record record : buffer) {
List<Column> recordOne = new ArrayList<>();
for (int j = 0; j < this.columns.size(); j++) {
if (Arrays.asList(sArray).contains(this.columns.get(j))) {
recordOne.add(record.getColumn(j));
}
}
for (int j = 0; j < this.columns.size(); j++) {
if (!Arrays.asList(sArray).contains(this.columns.get(j))) {
recordOne.add(record.getColumn(j));
}
}
for (int j = 0; j < this.columns.size(); j++) {
recordOne.add(record.getColumn(j));
}
for (int j = 0; j < recordOne.size(); j++) {
record.setColumn(j, recordOne.get(j));
}
preparedStatement = fillPreparedStatement(
preparedStatement, record);
preparedStatement.addBatch();
}
}
else {
for (Record record : buffer) {
preparedStatement = fillPreparedStatement(
preparedStatement, record);
preparedStatement.addBatch();
}
}
preparedStatement.executeBatch();
connection.commit();
}
catch (SQLException e) {
LOG.warn("回滚此次写入, 采用每次写入一行方式提交. 因为: {}", e.getMessage());
connection.rollback();
doOneInsert(connection, buffer);
}
catch (Exception e) {
throw DataXException.asDataXException(
DBUtilErrorCode.WRITE_DATA_ERROR, e);
}
finally {
DBUtil.closeDBResources(preparedStatement, null);
}
}
修改 fillPreparedStatement() 方法
/**
* 更改适配oracle update
* @author 程序员大佬超
* @date 20221202
* @param preparedStatement
* @param record
* @return
* @throws SQLException
*/
protected PreparedStatement fillPreparedStatement(PreparedStatement preparedStatement, Record record)
throws SQLException
{
for (int i = 0; i < record.getColumnNumber(); i++) {
int columnSqltype = this.resultSetMetaData.getMiddle().get(i);
preparedStatement = fillPreparedStatementColumnType(preparedStatement, i,
columnSqltype, record.getColumn(i));
}
return preparedStatement;
}
2.4 测试
重新打包后,测试job
{
"job": {
"setting": {
"speed": {
"channel": 3,
"byte": 1048576
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "123456",
"column": [
"`id`",
"`user_id`",
"`user_password`",
"`user_name`",
"`user_phone`",
"`email`",
"`nick_name`",
"`head_url`",
"`sex`",
"`state`",
"`create_time`",
"`create_user`",
"`lastUpdateTime`"
],
"splitPk": "",
"connection": [
{
"table": [
"users"
],
"jdbcUrl": [
"jdbc:mysql://127.0.0.1:3306/im"
]
}
]
}
},
"writer": {
"name": "oraclewriter",
"parameter": {
"username": "yxc",
"password": "123456",
"column": [
"\"ID\"",
"\"USER_ID\"",
"\"USER_PASSWORD\"",
"\"USER_NAME\"",
"\"USER_PHONE\"",
"\"EMAIL\"",
"\"NICK_NAME\"",
"\"HEAD_URL\"",
"\"SEX\"",
"\"STATE\"",
"\"CREATE_TIME\"",
"\"CREATE_USER\"",
"\"LASTUPDATETIME\""
],
"writeMode": "update(\"ID\",\"USER_ID\")",
"connection": [
{
"table": [
"USERS"
],
"jdbcUrl": "jdbc:oracle:thin:@//192.168.157.142:1521/orcl"
}
]
}
}
}
]
}
}
注:writeMode的update括号里字段要加上\"
查看运行日志,可以看到已经正确拼接了MERGE语句:
任务执行成功。切换 insert 和 update 模式检验,无异常。
更多技术干货,请持续关注程序员大佬超。
原创不易,转载请注明出处。