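Original article: https://alphablacktan.github.io/bigdata/2018/08/13/Spark任务偶现Task卡住很长时间导致Stage整体耗时长/
Symptom
When a large number of Spark jobs are submitted, an individual task occasionally hangs for a while, which makes the total duration of its stage abnormally long.
Analysis
Sampled job: Job 836
Abnormal stage: Stage 2249 -> stuck task: Task 8
Corresponding executor log: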
...
INFO | [Executor task launch worker-78] | Running task 8.0 in stage 2249.0 (TID 222920) | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
ERROR | [shuffle-client-1] | Connection is dead; please adjust spark.network.timeout if this is wrong | org.apache.spark.network.server.TransportChannelHandler.userEventTriggered(TransportChannelHandler.java:128)
ERROR | [shuffle-client-1] |
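Matching a stuck task against the shuffle errors that follow it can be done by hand, but over many sampled jobs it is easier to script. Below is a minimal sketch, assuming each executor log record has the pipe-delimited "LEVEL | [thread] | message | source" shape of the excerpt above. The helpers parse_line, find_task_start and shuffle_errors_after are hypothetical and not part of Spark. The excerpt carries no timestamps, so the sketch can only correlate records by their order in the log. It does not prove that the shuffle errors caused the hang.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

class LogRecord(NamedTuple):
    level: str
    thread: str
    message: str

# Assumed record shape: "LEVEL | [thread] | message | source" (the source column may be missing).
LINE_RE = re.compile(r"^(?P<level>[A-Z]+) \| \[(?P<thread>[^\]]+)\] \|\s*(?P<rest>.*)$")
# Matches Spark's "Running task 8.0 in stage 2249.0 (TID 222920)" message.
TASK_RE = re.compile(r"Running task (?P<task>[\d.]+) in stage (?P<stage>[\d.]+) \(TID (?P<tid>\d+)\)")

def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into level, thread and message; return None for non-record lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Keep only the text before the trailing " | source" column, if there is one.
    message = m.group("rest").split(" | ")[0].strip()
    return LogRecord(m.group("level"), m.group("thread"), message)

def find_task_start(lines: List[str], task: str, stage: str) -> Optional[Tuple[int, str]]:
    """Return (line index, worker thread) of the 'Running task' record for task/stage."""
    for i, line in enumerate(lines):
        rec = parse_line(line)
        if rec is None:
            continue
        m = TASK_RE.search(rec.message)
        if m and m.group("task") == task and m.group("stage") == stage:
            return i, rec.thread
    return None

def shuffle_errors_after(lines: List[str], start: int) -> List[LogRecord]:
    """Collect ERROR records from shuffle-client threads at or after line index start."""
    errors = []
    for line in lines[start:]:
        rec = parse_line(line)
        if rec and rec.level == "ERROR" and rec.thread.startswith("shuffle-client"):
            errors.append(rec)
    return errors
```

For the excerpt above, find_task_start(lines, "8.0", "2249.0") locates the record on the "Executor task launch worker-78" thread. shuffle_errors_after then returns the two shuffle-client-1 ERROR records. The first is the "Connection is dead; please adjust spark.network.timeout if this is wrong" message, and the second has an empty message because that line is truncated in the excerpt.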