Spark Problem 13: Total size of serialized results of 30 tasks (2.0 GB) is bigger than spark.driver.maxResultSize

More code is available at: https://github.com/xubo245/SparkLearning

Learning Alluxio in the Spark ecosystem. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0

1. Problem description

When cs-bwamem writes its output to a local SAM file and the file is very large, the job breaks: the serialized task results collected back to the driver exceed spark.driver.maxResultSize (2047 MB in this run), and Spark aborts the job with the error shown in the run log below. The usual immediate fix is to raise that limit, as in the sketch that follows.
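
A minimal sketch of raising the limit before the SparkContext is created. The setting spark.driver.maxResultSize is a standard Spark configuration (default 1g; "0" disables the check), but the driver object and the 4g value below are illustrative assumptions, not cs-bwamem's actual driver code; size the value to the expected output and give the driver enough memory to hold it.

import org.apache.spark.{SparkConf, SparkContext}

object RaiseMaxResultSize {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cs-bwamem")
      // Cap on the total size of serialized task results returned to the
      // driver. Default is 1g; "0" removes the check entirely, at the risk
      // of an OutOfMemoryError on the driver. 4g is an assumed example value.
      .set("spark.driver.maxResultSize", "4g")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 10).count())  // placeholder action for the real job
    sc.stop()
  }
}

The same setting can also be passed at submit time, without code changes:

spark-submit --conf spark.driver.maxResultSize=4g <other options>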

2. Run log:

hadoop@Master:~/disk2/xubo/project/alignment/cs-bwamem$ ./csbwamemAlignP1Test10sam.sh > csbwamemAlignP1Test10samtime201702281803.txt
[Stage 2:========================>                               (28 + 18) / 64]17/02/28 18:19:07 ERROR TaskSetManager: Total size of serialized results of 30 tasks (2.0 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:08 ERROR TaskSetManager: Total size of serialized results of 31 tasks (2.1 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:08 ERROR TaskSetManager: Total size of serialized results of 32 tasks (2.2 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:30 ERROR TaskSetManager: Total size of serialized results of 33 tasks (2.2 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:30 ERROR TaskSetManager: Total size of serialized results of 34 tasks (2.3 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
Exception in thread "main" 17/02/28 18:19:31 ERROR TaskSetManager: Total size of serialized results of 35 tasks (2.4 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 30 tasks (2.0 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:909)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:908)
    at cs.ucla.edu.bwaspark.FastMap$.memPairEndMapping(FastMap.scala:397)
    at cs.ucla.edu.bwaspark.FastMap$.memMain(FastMap.scala:144)
    at cs.ucla.edu.bwaspark.BWAMEMSpark$.main(BWAMEMSpark.scala:320)
    at cs.ucla.edu.bwaspark.BWAMEMSpark.main(BWAMEMSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/28 18:19:31 ERROR TaskSetManager: Total size of serialized results of 36 tasks (2.4 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:32 ERROR TaskSetManager: Total size of serialized results of 37 tasks (2.5 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:32 ERROR TaskSetManager: Total size of serialized results of 38 tasks (2.6 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:32 ERROR TaskSetManager: Total size of serialized results of 39 tasks (2.6 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:33 ERROR TaskSetManager: Total size of serialized results of 40 tasks (2.7 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:34 ERROR TaskSetManager: Total size of serialized results of 41 tasks (2.8 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
17/02/28 18:19:36 ERROR TaskSetManager: Total size of serialized results of 42 tasks (2.9 GB) is bigger than spark.driver.maxResultSize (2047.0 MB)
[Stage 1:>                                                        (0 + 16) / 64]^CException in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:703)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:702)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1514)
    at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1438)
    at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1724)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1723)
    at org.apache.spark.SparkContext$$anonfun$3.apply$mcV$sp(SparkContext.scala:587)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:909)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:908)
    at cs.ucla.edu.bwaspark.FastMap$.memPairEndMapping(FastMap.scala:303)
    at cs.ucla.edu.bwaspark.FastMap$.memMain(FastMap.scala:144)
    at cs.ucla.edu.bwaspark.BWAMEMSpark$.main(BWAMEMSpark.scala:320)
    at cs.ucla.edu.bwaspark.BWAMEMSpark.main(BWAMEMSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
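
Note where the job dies: both traces end in RDD.collect (RDD.scala:908) called from cs.ucla.edu.bwaspark.FastMap.memPairEndMapping, i.e. cs-bwamem gathers all aligned records onto the driver before writing the local SAM file, so the driver-side limit grows with the input. Raising spark.driver.maxResultSize only buys headroom; the more scalable pattern is to let the executors write the output directly. Below is a minimal sketch of that pattern, with a stand-in RDD and an illustrative HDFS path; it is not cs-bwamem's actual API.

import org.apache.spark.{SparkConf, SparkContext}

object WriteFromExecutors {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("write-from-executors"))
    // Stand-in for the aligned SAM records produced by the mapping stage.
    val samRecords = sc.parallelize(Seq("r1\t99\tchr1\t10000", "r2\t147\tchr1\t10100"))
    // saveAsTextFile writes one part file per partition directly from the
    // executors, so nothing has to pass through spark.driver.maxResultSize.
    // The path (host and port included) is an assumed example; the part files
    // can be merged afterwards (e.g. hdfs dfs -getmerge) if a single SAM
    // file is required.
    samRecords.saveAsTextFile("hdfs://Master:9000/user/hadoop/output-sam")
    sc.stop()
  }
}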
