org.apache.spark.SparkException: Kryo serialization failed

A map join was explicitly specified (via a MAPJOIN hint) in Spark SQL, which forced a very large table to be broadcast; serializing it exceeded the Kryo buffer limit. The fix is to remove the explicit MAPJOIN hint.

The SQL is as follows:

with einfo as (
    select E6.EMP_NO,
           E6.TEAM_ID,
           E6.TEAM_NAME
      from mids.sys_org_cnl_emp_info E6
     where E6.dt = '20190220' AND E6.EMP_NO is not null
)
SELECT /*+ MAPJOIN(E2) */
       C.CUST_NO          AS CUST_NO,
       C.CUST_MANAGER     AS CUST_MANAGER,
       C.CUST_COMMENDER   AS CUST_COMMENDER,
       E2.TEAM_ID         AS CUST_COMMENDER_TEAM,
       E2.TEAM_NAME       AS CUST_COMMENDER_TEAM_NAME,
       C.PROMOTION_ROLE   AS PROMOTION_ROLE,
       ''                 AS PROMOTION_ROLE_QY
  FROM ODS.CUST_ADVALLINFO21 C
  LEFT JOIN einfo E2
    ON C.CUST_COMMENDER = E2.EMP_NO
   AND E2.EMP_NO in ('V99', 'VM2101')
 WHERE C.dt = '20190220'
   AND C.CUST_COMMENDER in ('V99', 'V101')
UNION ALL
SELECT /*+ MAPJOIN(E2) */
       C.CUST_NO          AS CUST_NO,
       C.CUST_MANAGER     AS CUST_MANAGER,
       C.CUST_COMMENDER   AS CUST_COMMENDER,
       E2.TEAM_ID         AS CUST_COMMENDER_TEAM,
       E2.TEAM_NAME       AS CUST_COMMENDER_TEAM_NAME,
       C.PROMOTION_ROLE   AS PROMOTION_ROLE,
       ''                 AS PROMOTION_ROLE_QY
  FROM ODS.CUST_ADVALLINFO21 C
  LEFT JOIN einfo E2
    ON C.CUST_COMMENDER = E2.EMP_NO
 WHERE C.dt = '20190220'
   AND C.CUST_COMMENDER <> 'V99'

The exception is as follows:

19/02/21 21:24:29 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, CHDD183, executor 1): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 134217728. To avoid this, increase spark.kryoserializer.buffer.max value.
   at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:350)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 134217728
   at com.esotericsoftware.kryo.io.Output.require(Output.java:163)
   at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:246)
   at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:232)
   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:54)
   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:43)
   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
   at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:37)
   at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:366)
   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:307)
   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
   at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:347)
   ... 4 more
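The numbers in the error message explain the failure directly. A quick check (the byte count comes from the stack trace; 64m is Spark's documented default for spark.kryoserializer.buffer.max):

```python
# "Buffer overflow. Available: 0, required: 134217728" means Kryo needed
# 134217728 more bytes than the output buffer was allowed to grow to.
required_bytes = 134_217_728
required_mb = required_bytes // (1024 * 1024)

# Spark's default spark.kryoserializer.buffer.max is 64m, so a serialized
# value of this size cannot fit without raising the limit.
default_buffer_max_mb = 64

print(f"required: {required_mb} MB, default max: {default_buffer_max_mb} MB")
print(f"exceeds default: {required_mb > default_buffer_max_mb}")
```

So the failing serialization needed 128 MB at that point, double the default ceiling, and the broadcast table is far larger still.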

Problem analysis

The MAPJOIN in the second branch of the UNION ALL broadcasts 15,824,019 rows (about 3,856 MB). That is far too much data for a map join. After removing the MAPJOIN hint from the second branch, the job ran successfully without changing any serializer settings.
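For context (assuming default settings): without an explicit hint, Spark only auto-broadcasts a join side whose estimated size is below spark.sql.autoBroadcastJoinThreshold, which defaults to 10 MB, so a ~3,856 MB table would never be chosen for broadcast on its own. The explicit MAPJOIN hint bypasses that size check:

```sql
-- Spark's size guard for automatic broadcast joins (10 MB is the default,
-- shown explicitly here). A MAPJOIN/BROADCAST hint overrides this check.
SET spark.sql.autoBroadcastJoinThreshold = 10485760;
```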

The MAPJOIN in the first branch broadcasts only 2 rows, which is harmless.

Solutions

1. Increase the Kryo serializer buffer size (not a root-cause fix: what if the broadcast data grows to 10 GB or more?)
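If option 1 were taken anyway, the limit can be raised at submit time; 512m below is an arbitrary illustrative value, not a recommendation:

```
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryoserializer.buffer.max=512m \
  ...
```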

2. Remove the MAPJOIN hint from the second branch (the root-cause fix).
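With the hint removed, the second branch becomes a plain join and Spark chooses a shuffle-based join strategy on its own:

```sql
-- Second branch with the explicit MAPJOIN hint removed
SELECT C.CUST_NO          AS CUST_NO,
       C.CUST_MANAGER     AS CUST_MANAGER,
       C.CUST_COMMENDER   AS CUST_COMMENDER,
       E2.TEAM_ID         AS CUST_COMMENDER_TEAM,
       E2.TEAM_NAME       AS CUST_COMMENDER_TEAM_NAME,
       C.PROMOTION_ROLE   AS PROMOTION_ROLE,
       ''                 AS PROMOTION_ROLE_QY
  FROM ODS.CUST_ADVALLINFO21 C
  LEFT JOIN einfo E2
    ON C.CUST_COMMENDER = E2.EMP_NO
 WHERE C.dt = '20190220'
   AND C.CUST_COMMENDER <> 'V99'
```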

Author: zxl333

原创不容易

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值