Common Spark exception: Failed to get broadcast_32_piece0 of broadcast_32


The exception looks similar to this:

org.apache.spark.SparkException: Failed to get broadcast_32_piece0 of broadcast_32
I found a description of this exception at https://issues.apache.org/jira/browse/SPARK-5594:

In my case, this issue was happening when spark context doesn’t close successfully. When spark context closes abruptly, the files in spark-local and spark-worker directories are left uncleaned. The next time any job is run, the broadcast exception occurs. I managed a workaround by redirecting spark-worker and spark-local outputs to specific folders and cleaning them up in case the spark context doesn’t close successfully.

The gist: the previous program's SparkContext was not closed before another one was opened.
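To reduce the chance of hitting this, make sure the context is stopped even when the job fails, so the spark-local and spark-worker directories get cleaned up. A minimal sketch (the app name and job body are placeholders, not from the original post):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("safe-shutdown-example").getOrCreate()
try {
  spark.range(100).count() // placeholder for the real job
} finally {
  spark.stop() // always stop the context, even when the job throws
}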

Solutions:
1. Remove the spark.cleaner.ttl setting from the spark-defaults.conf file. Reference:

With the introduction of ContextCleaner, I think there’s no longer any reason for most users to enable the MetadataCleaner / spark.cleaner.ttl (except perhaps for super-long-lived Spark REPLs where you’re worried about orphaning RDDs or broadcast variables in your REPL history and having them never get cleaned up, although I think this is an uncommon use-case). I think that this property used to be relevant for Spark Streaming jobs, but I think that’s no longer the case since the latest Streaming docs have removed all mentions of spark.cleaner.ttl
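To verify that the property really is gone from the running application, you can inspect the effective configuration (a quick sketch; spark here is an existing SparkSession):

val ttlSetting = spark.sparkContext.getConf.getOption("spark.cleaner.ttl")
println(ttlSetting.getOrElse("spark.cleaner.ttl is not set")) // should report "not set" after removal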

2. Check whether the code creates a SparkContext or SparkSession more than once; also remove or reconfigure spark.sql.warehouse.dir. A user comment describes this exact fix (a sketch of the failure pattern follows it):

Thank you for your awesome comment!!
It worked!!
Previously I was getting the same issue: "Caused by: org.apache.spark.SparkException: Failed to get broadcast_2_piece0 of broadcast_2".
The problem was that there were multiple contexts running.
Why multiple contexts? A SparkSession created with the command below starts everything except the StreamingContext, and I made the mistake of starting a new context for streaming.
I started the StreamingContext from the created SparkSession's SparkContext instead, and the problem went away.
Here is the working code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
val spark = SparkSession.builder().appName("somename").config("spark.sql.warehouse.dir", warehouseLocation).getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(1)) // streaming context built from the Spark session's SparkContext

I haven't used this parameter: spark.cleaner.ttl.
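For contrast, the mistake described above looks roughly like this (a sketch, not the commenter's original code): passing a SparkConf to the StreamingContext constructor makes it create its own, second SparkContext alongside the one the SparkSession already owns:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Anti-pattern: this constructor builds a brand-new SparkContext internally,
// so the application ends up with two contexts.
val conf = new SparkConf().setAppName("somename")
val badSsc = new StreamingContext(conf, Seconds(1))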

3. Others have reported that this is a Spark bug, fixed in versions 2.2.3, 2.3.2, and 2.4.0:

There is a bug before 2.2.3/2.3.0.
If you hit "Failed to get broadcast" and the method call stack comes from MapOutputTracker, try upgrading your Spark. The bug is that the driver removes the broadcast but still sends the broadcast id to the executor (method MapOutputTrackerMaster.getSerializedMapOutputStatuses). It has been fixed by https://issues.apache.org/jira/browse/SPARK-23243.
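If you suspect this bug, first confirm which Spark version the job actually runs on (a one-line sketch; spark is an existing SparkSession):

println(s"Running Spark ${spark.version}") // upgrade to 2.2.3 / 2.3.2 / 2.4.0 or later if older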
