Hudi: Troubleshooting Summary (2): Consuming Kafka with Flink 1.13.1 and Writing to Hudi

Joseph25

Last modified 2023-03-20 11:31:13

First published 2023-03-10 15:36:33

Article link: https://blog.csdn.net/Joseph25/article/details/129443193

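Problem 1: java.lang.ClassNotFoundException: com.google.protobuf.MessageOrBuilder

Solution: As the message says, the class was not found. Put protobuf-java-3.2.0.jar into flink/lib/.

If you then get an error related to commons-cli, put commons-cli-1.4.jar into flink/lib/ as well.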

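Problem 2: Size of the state is larger than the maximum permitted memory-backed state. Size=5269207 , maxSize=5242880

Solution: No checkpoint state backend was configured, so checkpoint state was kept in memory, where it is capped at maxSize (5242880 bytes, i.e. 5 MB). Configure a file-system state backend so that checkpoints are written to HDFS instead: env.setStateBackend(new FsStateBackend("hdfs:///user/xx/flink_checkpoint"));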
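The same state backend can also be set in the Flink configuration file rather than in job code. The fragment below is only a sketch, assuming the Flink 1.13 option names; the checkpoint interval is an illustrative value that does not come from the original setup, and the path is the same placeholder used above.

```yaml
# flink-conf.yaml (sketch, adjust to your cluster)
state.backend: filesystem
state.checkpoints.dir: hdfs:///user/xx/flink_checkpoint
# Assumed interval, for illustration only
execution.checkpointing.interval: 60s
```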

Problem 3: The heartbeat of TaskManager with id container ....... timed out

Solution: Raise the following timeouts in flink-conf.yaml (web.timeout and heartbeat.timeout are in milliseconds)

akka.ask.timeout: 100s

web.timeout: 100000

heartbeat.timeout: 500000

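Problem 4: Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster

Solution: As the message says, no resources were allocated within 60 seconds. Either wait a while and let YARN allocate them, or, if the job still has not started after a long time, check the YARN cluster.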

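Problem 5: The job runs normally, but no data shows up in Hudi for a long time

Solution: The flink-shaded-hadoop jar version does not match. Hudi 0.9.0 uses 2.7.5-10.0; Hudi 0.10 uses 2.8.3-10.0.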

Problem 6:

Failed to rollback hdfs://ns1/hudi/xx commits 20230310104805769

Cannot use marker based rollback strategy on completed instant:[20230310104805769__deltacommit__COMPLETED]

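Solution: Delete the files of that instant from the table's .hoodie directory: hdfs dfs -rm -f /hudi/xx/.hoodie/20230310104805769*. After restarting the job, the problem was gone.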

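Problem 7: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

Solution: The ts field was added to the Hudi table with the timestamp type. Hive expects a Parquet timestamp to be stored as INT96, but the value here is read back as a long (the LongWritable in the exception), so declaring the Hive column as timestamp causes this cast error at query time. If the Hive column is changed to bigint, the query returns the time as a long integer: 10 digits, i.e. the number of seconds since January 1, 1970. That long value still has to be converted before use to get a format such as "yyyyMMdd". The strategy we adopted is therefore to convert timestamp columns to string at the point where the data is extracted from the source.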
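To make the conversion concrete, here is a minimal sketch of turning a 10-digit epoch-seconds value into a "yyyyMMdd" string, or into a full date-time string of the kind one might emit at the source. The class name, method names, sample value, and the Asia/Shanghai time zone are all assumptions for illustration; none of them come from the original job.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hudi, Flink, or Hive.
class EpochSecondsToDate {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter TEXT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Format seconds since 1970-01-01T00:00:00Z as yyyyMMdd in the given zone.
    static String toDay(long epochSeconds, ZoneId zone) {
        return Instant.ofEpochSecond(epochSeconds).atZone(zone).format(DAY);
    }

    // Format the same value as a readable string, e.g. for a string-typed column.
    static String toText(long epochSeconds, ZoneId zone) {
        return Instant.ofEpochSecond(epochSeconds).atZone(zone).format(TEXT);
    }

    public static void main(String[] args) {
        long ts = 1678406400L;                      // assumed sample: 2023-03-10T00:00:00Z
        ZoneId zone = ZoneId.of("Asia/Shanghai");   // assumed time zone
        System.out.println(toDay(ts, zone));        // prints 20230310
        System.out.println(toText(ts, zone));       // prints 2023-03-10 08:00:00
    }
}
```

The time zone matters: the same epoch value can fall on different calendar days in different zones, so it is worth fixing the zone explicitly rather than relying on the machine default.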