一.现象
- 配置seatunnel从mysql数据同步到doris数据库,出现了某些字段数据为NULL或格式错误的问题。
- jdbc连接mysql,通过sql查出来的数据是正常返回的
- 执行seatunnel脚本结果报错:
Caused by: org.apache.seatunnel.connectors.doris.exception.DorisConnectorException: ErrorCode:[COMMON-10], ErrorDescription:[Flush data operation that in sink connector failed] - Failed to flush data to Doris.
errCode = 2, detailMessage = Column has no default value. column: school_code
{"TxnId":12346,"Label":"55f7b756-cea9-48dc-8e40-410dea3daabb","TwoPhaseCommit":"false","Status":"Fail","Message":"errCode = 2, detailMessage = Column has no default value. column: school_code","NumberTotalRows":0,"NumberLoadedRows":0,"NumberFilteredRows":0,"NumberUnselectedRows":0,"LoadBytes":0,"LoadTimeMs":0,"BeginTxnTimeMs":0,"StreamLoadPutTimeMs":0,"ReadDataTimeMs":0,"WriteDataTimeMs":0,"CommitAndPublishTimeMs":0}
at org.apache.seatunnel.connectors.doris.client.DorisStreamLoadVisitor.doStreamLoad(DorisStreamLoadVisitor.java:95)
at org.apache.seatunnel.connectors.doris.client.DorisSinkManager.flush(DorisSinkManager.java:109)
... 16 more
以下是我出错的配置
seatunnel config:
env {
execution.parallelism = 1
job.mode = "BATCH"
}
source {
#获取学校信息
Jdbc {
url = "jdbc:mysql://192.168.1.172/test?serverTimezone=GMT%2b8"
driver = "com.mysql.cj.jdbc.Driver"
connection_check_timeout_sec = 100
user = "root"
password = "root"
result_table_name = dws_school
query = "SELECT business_code AS school_code, school_name FROM basic_school WHERE del_flag = 0"
}
}
transform {
}
sink {
doris {
source_table_name = dws_school
nodeUrls = ["192.168.1.173:8030"]
username = "root"
password = "root"
database = "doris_table_name"
table = "dtn_school"
batch_max_rows = 100
sink.properties.format = "csv"
sink.properties.column_separator = ","
}
}
二.解决方案
据我分析应该还是seatunnel对doris的兼容性不足
可以看到 读取不到的字段都是sql里面起别名的字段
我们只需要把sql改成:
再包一层即可
修改后的配置:
env {
execution.parallelism = 1
job.mode = "BATCH"
}
source {
#获取学校信息
Jdbc {
url = "jdbc:mysql://192.168.1.172/edusys?serverTimezone=GMT%2b8"
driver = "com.mysql.cj.jdbc.Driver"
connection_check_timeout_sec = 100
user = "root"
password = "root"
result_table_name = dws_school
query = "SELECT * FROM (SELECT business_code as school_code,school_name from basic_school where del_flag = 0)t;"
}
}
transform {
}
sink {
doris {
sink {
doris {
source_table_name = dws_school
nodeUrls = ["192.168.1.173:8030"]
username = "root"
password = "root"
database = "doris_database"
table = "dd_school"
batch_max_rows = 100
sink.properties.format = "csv"
sink.properties.column_separator = ","
}
#Console {
#
# }
#jdbc {
# url = "jdbc:mysql://192.168.1.10/demo?serverTimezone=GMT%2b8"
# driver = "com.mysql.cj.jdbc.Driver"
# user = "root"
# password = "root"
# query = "INSERT INTO `table_name` VALUES (?,?, ?, ?, ?, ?);"
#}
# Redis {
# host = 192.168.1.10
# port = 6379
# key = age
# auth = "root"
# data_type = list
# }
}
三.其他
除此之外还有好多需要注意的点:
- 在seatunnel配置文件中,sql不能换行,需要在一行写,否则会报一些关于newline的错。
- jdbc读取到一个整数,比如说是10,seatunnel在sinkToDoris的时候可能会将结果解析成10.000,变成了一个小数,结果当然是赋值不进去导致报错,在sql中使用cast()函数对应的字段强转成对应类型即可
- 虽然官网说seatunnel可以在jdk11环境使用,但是公司环境实测11不行,换成了jdk8才可以