1. Flink CDC error when reading from MySQL
Caused by: java.lang.IllegalStateException: The connector is trying to read binlog starting at Struct{version=1.5.4.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1668071717703,db=,server_id=0,file=mysql-bin.096881,pos=53976934,row=0}, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.
Solutions:
1. Configure the MySqlSource in Flink CDC to start from the latest binlog position: .startupOptions(StartupOptions.latest()); (see the sketch below)
2. (possible fix) Increase how long MySQL retains its binlog;
3. (possible factor) MySQL's default connection timeout is 30 min.
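A minimal sketch of fix 1, assuming the Ververica flink-connector-mysql-cdc 2.x API (hostname, credentials, and table names are placeholders):

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;

MySqlSource<String> source = MySqlSource.<String>builder()
        .hostname("mysql-host")              // placeholder
        .port(3306)
        .databaseList("test_db")             // placeholder
        .tableList("test_db.orders")         // placeholder
        .username("root")
        .password("password")
        .deserializer(new JsonDebeziumDeserializationSchema())
        // read from the current end of the binlog instead of the stored,
        // already-purged offset; historical rows are skipped
        .startupOptions(StartupOptions.latest())
        .build();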
2. Using the Druid connection pool (usually together with a thread pool)
Use it when you need multithreading and asynchronous I/O; otherwise a plain JDBC connection is enough (a sketch follows).
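A minimal sketch of a shared Druid pool, assuming the com.alibaba.druid dependency (URL and credentials are placeholders):

import com.alibaba.druid.pool.DruidDataSource;
import java.sql.Connection;

DruidDataSource dataSource = new DruidDataSource();
dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
dataSource.setUrl("jdbc:mysql://mysql-host:3306/test_db"); // placeholder
dataSource.setUsername("root");
dataSource.setPassword("password");
dataSource.setInitialSize(5);  // connections created up front
dataSource.setMaxActive(20);   // cap shared by all worker threads

// each worker thread borrows a connection and returns it on close
try (Connection conn = dataSource.getConnection()) {
    // ... run statements ...
}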
3. Flink checkpoints on HDFS contain only metadata
Possible causes:
1. The checkpointed state is very small, so only the metadata file is written; once more state accumulates, the directory looks normal. This is not an error.
2. HDFS may not be reachable remotely; add the following to hdfs-site.xml:
<!-- Access a cluster built on an Alibaba Cloud private network via its public IP -->
<property>
    <description>only config in clients</description>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>
and put core-site.xml and hdfs-site.xml into the resources folder of the IDEA project.
4. After classes are packaged from IDEA, the checkpoint paths of multiple jobs stop working and all checkpoints end up under the same HDFS path
Possible cause:
The Maven packaging plugin in pom.xml has a main class bound to it.
5. Flink CDC error when reading from PostgreSQL (an account with the right privileges and logical replication are required):
replication slot "flink" is active for PID 627067
Solution:
1. Set .slotName("flink_test") and .decodingPluginName("pgoutput") on the PostgreSQLSource (see the sketch after the queries below).
Useful queries:
-- replication slot view
select * from pg_replication_slots;
-- active replication connections
select * from pg_stat_replication;
select * from pg_publication_tables;
-- create a physical replication slot
SELECT * FROM pg_create_physical_replication_slot('test_slot');
-- create a logical replication slot
select * from pg_create_logical_replication_slot('test_logical_slot_81_72','wal2json');
-- drop a replication slot
SELECT * FROM pg_drop_replication_slot('flink_test');
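A minimal sketch of fix 1, assuming the Ververica flink-connector-postgres-cdc API (host, credentials, and table names are placeholders):

import com.ververica.cdc.connectors.postgres.PostgreSQLSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

SourceFunction<String> source = PostgreSQLSource.<String>builder()
        .hostname("pg-host")                 // placeholder
        .port(5432)
        .database("test_db")                 // placeholder
        .schemaList("public")
        .tableList("public.orders")          // placeholder
        .username("postgres")
        .password("password")
        .decodingPluginName("pgoutput")      // logical decoding plugin
        .slotName("flink_test")              // a slot name not held by another PID
        .deserializer(new JsonDebeziumDeserializationSchema())
        .build();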
6. When Flink CDC reads a database column of DECIMAL type, the value is printed as a string
Solution:
Map<String, Object> config = new HashMap<>();
// emit decimals as plain numbers instead of the default encoded form
config.put(JsonConverterConfig.DECIMAL_FORMAT_CONFIG, DecimalFormat.NUMERIC.name());
JsonDebeziumDeserializationSchema jdd = new JsonDebeziumDeserializationSchema(false, config);
7. Flink reads a numeric database field as null; how do you write that null into a numeric column of the target table?
Solution:
(Recommended) 1.
if (data.getProject_id() == null) {
    statement.setNull(2, Types.NULL); // or pass the column's concrete type, e.g. Types.INTEGER
} else {
    statement.setInt(2, data.getProject_id());
}
(Not recommended) 2. Check for null in a map or process operator and skip writing that column during the JDBC write.
How to write string data into a MySQL DATETIME column?
Solution: string-typed datetime values can be written into a MySQL DATETIME column via JDBC; either call works:
ps.setObject(9, obj.getString("add_date"));
ps.setString(9, obj.getString("add_date"));
Guard against null or zero values first:
if (obj.getString("add_date") == null || "0".equals(obj.getString("add_date"))) {
    ps.setString(9, "0000-00-00 00:00:00");
} else {
    ps.setString(9, obj.getString("add_date"));
}
8. Inserting into Phoenix via JDBC raises no error, but no data is written
Solution:
phoenixConnection.commit(); // Phoenix connections are not auto-commit by default
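A minimal sketch of the full write path (ZooKeeper quorum and table are placeholders); the upsert stays buffered client-side until commit() is called:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181"); // placeholder
     PreparedStatement ps = conn.prepareStatement(
             "UPSERT INTO TEST_TABLE (ID, NAME) VALUES (?, ?)")) {               // placeholder
    ps.setInt(1, 1);
    ps.setString(2, "demo");
    ps.executeUpdate();
    conn.commit(); // without this, nothing reaches the server
}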
9. Error when logging into the Phoenix client
Error: org.apache.hadoop.hbase.DoNotRetryIOException: Unable to load configured region split policy 'org.apache.phoenix.schema.MetaDataSplitPolicy' for table 'SYSTEM.CATALOG' Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks
Solution:
Add the following to hbase-site.xml:
<property>
    <name>hbase.table.sanity.checks</name>
    <value>false</value>
</property>
10. A Flink job's Kafka sink cannot write data out
Cause: the Hadoop/Flink cluster and the Kafka cluster are not on the same machines; they talk to each other by hostname, and those hostnames cannot be resolved to the right servers.
Solution: add IP-to-hostname mappings for the other servers to /etc/hosts on the Flink (Hadoop) cluster.
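For example (hypothetical IPs and hostnames), on every Flink/Hadoop node:

# /etc/hosts
192.168.1.101  kafka-node1
192.168.1.102  kafka-node2
192.168.1.103  kafka-node3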
11. A Flink job fails at startup with a classloader.xxx error
Cause:
A known bug in Flink 1.13.x; it does not affect the running program.
Solution:
Add classloader.check-leaked-classloader: false to flink-conf.yaml.
12. A Flink job works in local IDEA but produces garbled Chinese when run on the server
Solutions:
(Recommended) 1. Add env.java.opts: "-Dfile.encoding=UTF-8" to flink-conf.yaml
2. Pass a parameter when starting the Flink job: -yD env.java.opts="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"
3. Modify the YARN configuration: https://www.freesion.com/article/86431040769/
13. A Flink job that uses HBase fails at startup
Error message:
Exception in thread "Thread-2" java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (null), this version is 2.0.0
Solution:
Add the following to hbase-site.xml:
<property>
    <name>hbase.defaults.for.version.skip</name>
    <value>true</value>
    <description>
        Set to true to skip the 'hbase.defaults.for.version' check.
        Setting this to true can be useful in contexts other than
        the other side of a maven generation; i.e. running in an
        ide. You'll want to set this boolean to true to avoid
        seeing the RuntimException complaint: "hbase-default.xml file
        seems to be for and old version of HBase (@@@VERSION@@@), this
        version is X.X.X-SNAPSHOT"
    </description>
</property>
14. Flink in local IDEA times out connecting to StarRocks, while the same code packaged and run on the cloud server works fine
Cause:
StarRocks rejects remote writes.
Solution:
Enable remote writes in fe.conf:
# enable remote load: 0 = off, 1 = on; default 0
remote_load_enable=1
# verify a hash signature on loaded data: 0 = off, 1 = on; default 0
remote_load_verify_hash=0
# enable the remote query executor
remote_query_executor_enable=true
Then configure the URL in code:
// local IDEA: port 18040 (default 8040)
// public static String STARROCKS_LOAD_URL = "10.206.65.215:18040;10.206.66.52:18040;10.206.64.55:18040";
// cloud server: port 18030 (default 8030)
public static String STARROCKS_LOAD_URL = "10.206.65.215:18030;10.206.66.52:18030;10.206.64.55:18030";
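A minimal sketch of how this load URL is typically handed to the flink-connector-starrocks sink (database, table, and credentials are placeholders; the exact builder package can differ between connector versions):

import com.starrocks.connector.flink.StarRocksSink;
import com.starrocks.connector.flink.table.sink.StarRocksSinkOptions;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

SinkFunction<String> sink = StarRocksSink.sink(
        StarRocksSinkOptions.builder()
                .withProperty("jdbc-url", "jdbc:mysql://10.206.65.215:9030")   // FE query port
                .withProperty("load-url", STARROCKS_LOAD_URL)                  // the list above
                .withProperty("database-name", "ods")                          // placeholder
                .withProperty("table-name", "ods_example")                     // placeholder
                .withProperty("username", "root")
                .withProperty("password", "password")
                .withProperty("sink.properties.format", "json")
                .withProperty("sink.properties.strip_outer_array", "true")
                .build());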
15. With StarRocks as the data warehouse, writing a row into an ODS table via Flink fails when too many of its columns are null
Tip: after you create a dynamically partitioned table in StarRocks, the corresponding partitions are not generated immediately (the delay varies); inserting into a partition that does not exist yet raises an error.
Error message:
ERROR com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor - Stream Load response:
{"Status":"Fail","BeginTxnTimeMs":101,"Message":"too many filtered rows","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"061f0edd-7a68-445f-a4d5-079aa79d8d23","LoadBytes":245396,"StreamLoadPlanTimeMs":102,"NumberTotalRows":984,"WriteDataTimeMs":265,"TxnId":13787,"LoadTimeMs":469,"ErrorURL":"http://172.30.16.26:18040/api/_load_error_log?file=error_log_5a4aad49f7900ad7_9dec92739f8027b7","ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":984}
Cause:
StarRocks enables strict mode by default, which filters the rows out.
Solutions:
1) When creating the ODS table in StarRocks, declare the before_xx columns as DEFAULT NULL; then the before_xx columns stay null for r and c records, and carry the corresponding before values for u records;
2) Always populate the before_xx columns, copying the corresponding after values; -- not recommended
16. When Flink CDC reads a MySQL DATE column, the value returned is the number of days since 1970-01-01
Solution:
Convert the value yourself:
if (after.getLong("entryDate") != null) {
    // days since epoch -> milliseconds, then format as a date string
    after.put("entryDate", ParseDateTime.longToDate(after.getLong("entryDate") * 24 * 3600 * 1000));
}
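A dependency-free variant of the same conversion using java.time (after and entryDate are from the snippet above):

Long days = after.getLong("entryDate");
if (days != null) {
    // days since 1970-01-01 -> "yyyy-MM-dd"
    after.put("entryDate", java.time.LocalDate.ofEpochDay(days).toString());
}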
17. DATE and DATETIME values are null when Flink CDC writes them into StarRocks
Solution:
Convert the data to a string before writing to StarRocks:
after.put("last_login", ParseDateTime.longToDateTime(after.getLong("last_login") - 8 * 3600 * 1000)); // adjust for the UTC+8 offset
18. When Flink CDC reads MySQL temporal types, use a custom converter class. Known bug: timestamps synchronized from the historical snapshot gain 8 hours, while timestamps synchronized from the binlog are correct.
MySqlDateTimeConverter (full source in the appendix at the end of these notes)
19. Some systems' shell scripts cannot handle backticks
If a SQL statement inside the shell script contains backticks, the script cannot parse it; the only workaround found was to delete the backticks.
20. When a single Flink stream carries several kinds of data, split it with side outputs
Solution:
Use a process operator and branch inside it: emit to the main stream with out.collect(value); and to a side output with ctx.output(new OutputTag<HzdxPojo>("cycle_mark20yyMM") {}, value); so the records end up in different streams (a sketch follows).
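A minimal sketch of the pattern, reusing the HzdxPojo type and tag name from above; input and isCycleMark are placeholders:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

OutputTag<HzdxPojo> cycleTag = new OutputTag<HzdxPojo>("cycle_mark20yyMM") {};

SingleOutputStreamOperator<HzdxPojo> mainStream = input.process(
        new ProcessFunction<HzdxPojo, HzdxPojo>() {
            @Override
            public void processElement(HzdxPojo value, Context ctx, Collector<HzdxPojo> out) {
                if (isCycleMark(value)) {        // placeholder routing condition
                    ctx.output(cycleTag, value); // side output
                } else {
                    out.collect(value);          // main stream
                }
            }
        });

DataStream<HzdxPojo> cycleStream = mainStream.getSideOutput(cycleTag);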
21. Flume reportedly supports real-time monitoring of ES, HBase, Doris and similar systems
Learned from a ChatGPT answer, not yet verified in practice; Doris can enable binlog.
22. When querying MySQL via JDBC, a TINYINT column can come back as BIT, i.e. true or false; if the column holds values such as 0, 1, or -1, the results are inconsistent
Solution:
// inside the loop over result-set columns: i is the column index, t the target bean
String columnTypeName = metaData.getColumnTypeName(i);
String columnName = metaData.getColumnLabel(i);
if ("BIT".equals(columnTypeName)) {
    // read TINYINT-reported-as-BIT columns as int to preserve 0/1/-1
    int v = resultSet.getInt(i);
    BeanUtils.setProperty(t, columnName, v);
} else {
    Object v = resultSet.getObject(i);
    BeanUtils.setProperty(t, columnName, v);
}
23. Using MyBatis inside a Flink job throws exceptions on temporal columns
Solution:
Add to mybatis_config.xml (the DateTimeTypeHandler source is at the end of these notes):
<typeHandlers>
    <typeHandler handler="com.lhjsdt.flink.utils.DateTimeTypeHandler"/>
</typeHandlers>
MySqlDateTimeConverter (the converter from item 18):
import io.debezium.spi.converter.CustomConverter;
import io.debezium.spi.converter.RelationalColumn;
import org.apache.kafka.connect.data.SchemaBuilder;

import java.time.*;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

/**
 * Time zone / format handling for MySQL temporal columns in Flink CDC.
 * Known bug: timestamps synchronized from the historical snapshot gain 8 hours;
 * timestamps synchronized from the binlog are fine.
 */
public class MySqlDateTimeConverter implements CustomConverter<SchemaBuilder, RelationalColumn> {

    private DateTimeFormatter dateFormatter = DateTimeFormatter.ISO_DATE;
    private DateTimeFormatter timeFormatter = DateTimeFormatter.ISO_TIME;
    private DateTimeFormatter datetimeFormatter = DateTimeFormatter.ISO_DATE_TIME;
    private DateTimeFormatter timestampFormatter = DateTimeFormatter.ISO_DATE_TIME;
    private ZoneId timestampZoneId = ZoneId.systemDefault();

    @Override
    public void configure(Properties props) {
    }

    @Override
    public void converterFor(RelationalColumn column, ConverterRegistration<SchemaBuilder> registration) {
        String sqlType = column.typeName().toUpperCase();
        SchemaBuilder schemaBuilder = null;
        Converter converter = null;
        if ("DATE".equals(sqlType)) {
            schemaBuilder = SchemaBuilder.string().optional().name("com.darcytech.debezium.date.string");
            converter = this::convertDate;
        }
        if ("TIME".equals(sqlType)) {
            schemaBuilder = SchemaBuilder.string().optional().name("com.darcytech.debezium.time.string");
            converter = this::convertTime;
        }
        if ("DATETIME".equals(sqlType)) {
            schemaBuilder = SchemaBuilder.string().optional().name("com.darcytech.debezium.datetime.string");
            converter = this::convertDateTime;
        }
        if ("TIMESTAMP".equals(sqlType)) {
            schemaBuilder = SchemaBuilder.string().optional().name("com.darcytech.debezium.timestamp.string");
            converter = this::convertTimestamp;
        }
        if (schemaBuilder != null) {
            registration.register(schemaBuilder, converter);
        }
    }

    private String convertDate(Object input) {
        if (input == null) {
            return null;
        }
        if (input instanceof LocalDate) {
            return dateFormatter.format((LocalDate) input);
        }
        if (input instanceof Integer) {
            // DATE arrives as days since the epoch
            LocalDate date = LocalDate.ofEpochDay((Integer) input);
            return dateFormatter.format(date);
        }
        return String.valueOf(input);
    }

    private String convertTime(Object input) {
        if (input == null) {
            return null;
        }
        if (input instanceof Duration) {
            // TIME arrives as a duration since midnight
            Duration duration = (Duration) input;
            long seconds = duration.getSeconds();
            int nano = duration.getNano();
            LocalTime time = LocalTime.ofSecondOfDay(seconds).withNano(nano);
            return timeFormatter.format(time);
        }
        return String.valueOf(input);
    }

    private String convertDateTime(Object input) {
        if (input == null) {
            return null;
        }
        if (input instanceof LocalDateTime) {
            return datetimeFormatter.format((LocalDateTime) input).replaceAll("T", " ");
        }
        return String.valueOf(input);
    }

    private String convertTimestamp(Object input) {
        if (input == null) {
            return null;
        }
        if (input instanceof ZonedDateTime) {
            // MySQL stores TIMESTAMP in UTC; the ZonedDateTime here is UTC,
            // so shift it to the local zone before formatting
            ZonedDateTime zonedDateTime = (ZonedDateTime) input;
            LocalDateTime localDateTime = zonedDateTime.withZoneSameInstant(timestampZoneId).toLocalDateTime();
            return timestampFormatter.format(localDateTime).replaceAll("T", " ");
        }
        return String.valueOf(input);
    }
}
DateTimeTypeHandler (the type handler from item 23):

import org.apache.flink.table.shaded.org.joda.time.DateTime;
import org.apache.ibatis.type.BaseTypeHandler;
import org.apache.ibatis.type.JdbcType;

import java.sql.CallableStatement;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * MyBatis type handler mapping SQL TIMESTAMP columns to Joda-Time DateTime.
 */
public class DateTimeTypeHandler extends BaseTypeHandler<DateTime> {

    @Override
    public void setNonNullParameter(PreparedStatement ps, int i, DateTime parameter, JdbcType jdbcType)
            throws SQLException {
        ps.setTimestamp(i, new java.sql.Timestamp(parameter.getMillis()));
    }

    @Override
    public DateTime getNullableResult(ResultSet rs, String columnName) throws SQLException {
        java.sql.Timestamp timestamp = rs.getTimestamp(columnName);
        if (timestamp != null) {
            return new DateTime(timestamp.getTime());
        }
        return null;
    }

    @Override
    public DateTime getNullableResult(ResultSet rs, int columnIndex) throws SQLException {
        java.sql.Timestamp timestamp = rs.getTimestamp(columnIndex);
        if (timestamp != null) {
            return new DateTime(timestamp.getTime());
        }
        return null;
    }

    @Override
    public DateTime getNullableResult(CallableStatement cs, int columnIndex) throws SQLException {
        java.sql.Timestamp timestamp = cs.getTimestamp(columnIndex);
        if (timestamp != null) {
            return new DateTime(timestamp.getTime());
        }
        return null;
    }
}