Spark ERROR: on the spark.sql.autoBroadcastJoinThreshold setting

Today I used Spark to merge a large dataset with a join, and the job kept failing with the following error:

Exception in thread "broadcast-exchange-0" java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes

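I thought it over. I had run this kind of job before and never hit anything like this. Judging from the message, it is an out-of-memory error, which means there was not enough memory for the broadcast. But I kept adjusting the job's memory resources, doubling both the executor and the driver memory, and it still made no difference. So I went back to look at the tuning parameters I had configured: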

spark.config=
spark.reducer.maxSizeInFlight:64m,
spark.shuffle.file.buffer:128k,
spark.shuffle.io.maxRetries:10,
spark.shuffle.io.numConnectionsPerPeer:10,
spark.shuffle.io.retryWait:30s,
spark.rdd.compress:true,
spark.io.compression.codec:org.apache.spark.io.SnappyCompressionCodec,
spark.io.compression.snappy.blockSize:18k,
spark.serializer:org.apache.spark.serializer.KryoSerializer,
spark.sql.shuffle.partitions:300,
spark.default.parallelism:50,
spark.rpc.numRetries:3,
spark.rpc.retry.wait:4s,
spark.locality.wait.process:10,
spark.locality.wait.node:5,
spark.locality.wait.rack:3,
spark.speculation:true,
spark.speculation.multiplier:20,
spark.shuffle.consolidateFiles:true,
spark.sql.autoBroadcastJoinThreshold:209715200
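
To put the last entry in perspective, here is a small Python sketch (not part of my original job) that parses settings written in the "key:value," style used above and converts the broadcast threshold from bytes to MiB. The helper names parse_conf and to_mib are hypothetical, and the 10 MB default is an assumption taken from the Spark SQL configuration documentation, so check it against your Spark version.

```python
# Assumed default of spark.sql.autoBroadcastJoinThreshold (10 MB per the Spark SQL docs).
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024


def parse_conf(text):
    """Parse 'key:value,' lines into a dict; lines without ':' are skipped."""
    conf = {}
    for line in text.splitlines():
        line = line.strip().rstrip(",")
        if not line or ":" not in line:
            continue
        # Split on the first ':' only, so values may themselves contain ':'.
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


# Two entries copied from the configuration listed above.
sample = """
spark.sql.shuffle.partitions:300,
spark.sql.autoBroadcastJoinThreshold:209715200
"""

conf = parse_conf(sample)
threshold = int(conf["spark.sql.autoBroadcastJoinThreshold"])

print(f"configured threshold: {to_mib(threshold):.0f} MiB")
print(f"assumed default:      {to_mib(DEFAULT_THRESHOLD_BYTES):.0f} MiB")
print(f"ratio:                {threshold // DEFAULT_THRESHOLD_BYTES}x")
```

The configured value works out to 200 MiB, twenty times the assumed default. With a threshold that high, Spark may choose to broadcast any table whose estimated size is below 200 MiB, which is a plausible source of the broadcast out-of-memory error above. According to the Spark documentation, setting this value to -1 disables automatic broadcast joins.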