Spark fails to start locally

老王的隔壁

Article link: https://blog.csdn.net/alvin_010/article/details/106562118

A streaming job that reads from a local Kafka broker and writes its results to the local console fails.

Output code:

StreamingQuery query = strRs.writeStream()
                .outputMode("update")
                .format("console")
                .trigger(Trigger.ProcessingTime(Duration.apply(batchDuration, TimeUnit.SECONDS)))
                .option("checkpointLocation", PropertiesUtils.get("spark.sql.streaming.checkpointLocation"))
                .start();

Error:

Caused by: org.apache.spark.SparkException: Writing job aborted.

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost, executor driver): org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition mts_m_rt-27 could be determined

Cause:

The query sets a checkpoint location: .option("checkpointLocation", PropertiesUtils.get("spark.sql.streaming.checkpointLocation"))
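Because the failure comes from reusing one fixed checkpoint directory across different Kafka sources, one option is to derive the checkpoint path from the source itself. The sketch below is a hypothetical helper, not part of Spark or of the original code: `checkpointDirFor`, the base directory, and the `prod-kafka:9092` address are assumptions for illustration. It appends a short SHA-256 hash of the bootstrap servers to the topic name, so switching from the production cluster to `localhost:9092` gives a fresh, empty checkpoint directory instead of resuming from offsets recorded against the other cluster.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CheckpointPaths {

    // Hypothetical helper: returns a checkpoint directory that is unique per
    // (Kafka cluster, topic) pair, so a query pointed at a different cluster
    // never resumes from offsets recorded for the old one.
    public static String checkpointDirFor(String baseDir, String bootstrapServers, String topic) {
        String clusterId = shortHash(bootstrapServers);
        return Paths.get(baseDir, topic + "-" + clusterId).toString();
    }

    // First 8 bytes of SHA-256, hex-encoded (16 characters).
    private static String shortHash(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                sb.append(String.format("%02x", digest[i]));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "/tmp/spark-checkpoints"; // hypothetical base directory
        System.out.println(checkpointDirFor(base, "prod-kafka:9092", "mts_m_rt"));
        System.out.println(checkpointDirFor(base, "localhost:9092", "mts_m_rt"));
    }
}
```

The result would be passed to `.option("checkpointLocation", ...)` in place of the fixed property value. Simply deleting the old checkpoint directory before switching sources also works, but it discards the recorded progress for the original cluster.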

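Analysis:

Before switching to the local Kafka source, I had tested against the production Kafka source, but its data volume was too large for the job to keep up with, so I switched to a local Kafka broker.
However, checkpointing was enabled, so the partitions and offsets of the production Kafka topic had already been recorded in the checkpoint, and on restart the query resumes from the offsets stored in the last checkpoint.
The local Kafka topic has only one partition, so on startup the query tries to consume from partitions that do not exist locally (such as mts_m_rt-27 in the error above), and eventually times out.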
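The mismatch described above can be checked before starting the query. The sketch below is a simplified model, not a parser for Spark's actual checkpoint files: the checkpointed offsets are represented as a plain `Map<Integer, Long>` (partition to offset) for one topic, and the partitions available on the new broker as a `Set<Integer>`. In practice that set could come from something like `KafkaConsumer.partitionsFor(topic)`; the offset values in `main` are made up, and only partition 27 is taken from the error message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class CheckpointPartitionCheck {

    // Returns the partitions the checkpoint expects to resume from that do not
    // exist on the cluster the query now points at. Each of these is a partition
    // whose position the consumer cannot determine, which ends in a timeout.
    public static List<Integer> missingPartitions(Map<Integer, Long> checkpointedOffsets,
                                                  Set<Integer> availablePartitions) {
        List<Integer> missing = new ArrayList<>();
        // Iterate in ascending partition order so the result is stable.
        for (Integer partition : new TreeMap<>(checkpointedOffsets).keySet()) {
            if (!availablePartitions.contains(partition)) {
                missing.add(partition);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative offsets recorded while reading the production topic.
        Map<Integer, Long> fromProdCheckpoint = Map.of(0, 1500L, 1, 1480L, 27, 1720L);
        // The local copy of the topic has a single partition.
        Set<Integer> localPartitions = Set.of(0);
        // prints "Missing partitions: [1, 27]"
        System.out.println("Missing partitions: " + missingPartitions(fromProdCheckpoint, localPartitions));
    }
}
```

A non-empty result means the existing checkpoint cannot be reused against this broker, and the query should be given a new checkpoint location.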
