Errors after running spark-submit: NodeManager shuts down abnormally

The CDH cluster had been behaving erratically for a while. After submitting Spark code, execution would hang at one point for a long time, with errors like the following:

ERROR cluster.YarnScheduler: Lost executor 12 on nw-data-3: Container marked as failed: container_1571923398993_0015_01_000013 on host: nw-data-3. Exit status: -100. Diagnostics: Container released on a *lost* node
19/10/24 21:57:27 WARN scheduler.TaskSetManager: Lost task 44.0 in stage 0.0 (TID 44, nw-data-3, executor 12): ExecutorLostFailure (executor 12 exited caused by one of the running tasks) Reason: Container marked as failed: container_1571923398993_0015_01_000013 on host: nw-data-3. Exit status: -100. Diagnostics: Container released on a *lost* node
19/10/24 21:57:27 WARN server.TransportChannelHandler: Exception in connection from /ip:62032
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
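
An exit status of -100 with the diagnostic "Container released on a *lost* node" means YARN lost contact with the NodeManager itself, not that the task failed on its own. Before digging further, it is worth confirming from the ResourceManager's side which nodes are lost or unhealthy; a minimal check with the standard YARN CLI:

# List every node the ResourceManager tracks, including dead ones
yarn node -list -all

# Or filter directly for the problem states
yarn node -list -states LOST,UNHEALTHY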

In the NodeManager logs I found errors like this:

19/10/24 20:33:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/10/24 20:33:00 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/10/24 20:33:00 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/10/24 20:33:01 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
19/10/24 20:33:01 INFO storage.DiskBlockManager: Shutdown hook called
19/10/24 20:33:01 INFO util.ShutdownHookManager: Shutdown hook called
19/10/24 20:33:01 INFO util.ShutdownHookManager: Deleting directory /data/yarn/nm/usercache/root/appcache/application_1571911737584_0039/spark-7f495f70-9aef-4c15-bb75-1bbe480a4372
19/10/24 20:33:01 ERROR util.Utils: Aborting task
java.nio.channels.ClosedChannelException
at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1993)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:87)
at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:392)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
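
Note the ordering here: the executor first receives SIGTERM, and only then does the write fail with ClosedChannelException. The exception is fallout from the JVM being killed mid-write, not the root cause. If log aggregation is enabled, you can pull all container logs for the application in one place (the application ID is the one from the log above; the output file name is just an example):

# Fetch the aggregated logs for the failed application
yarn logs -applicationId application_1571911737584_0039 > app.log

# Find the earliest ERROR / SIGTERM across all containers
grep -n "RECEIVED SIGNAL\|ERROR" app.log | head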

Searching Baidu for these errors turned up several theories: data skew, the need to disable executor dynamic allocation, or insufficient memory (for the dynamic-allocation option, see the sketch below).
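
None of those guesses turned out to be the cause here, but for the record, disabling executor dynamic allocation is a one-flag change at submit time. A sketch only: the JAR name and resource numbers are placeholders, and with dynamic allocation off you must pin the executor count yourself:

# your-job.jar and the resource sizes below are illustrative placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 10 \
  --executor-memory 4g \
  your-job.jar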

In the end the problem turned out to be insufficient disk space (found by checking usage with hdfs dfs -du -h /user/hive/warehouse). HDFS keeps deleted files in the trash for one day by default before purging them, so the frequent table operations from earlier had left a huge amount of stale data occupying disk space in the trash. After emptying the trash manually (hdfs dfs -expunge), the job ran normally again. You can also shorten the automatic trash purge interval: in Cloudera Manager, lower the Filesystem Trash Checkpoint Interval (fs.trash.checkpoint.interval) from the default of 1 day to 10 minutes.
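
Put together, the diagnosis and fix came down to a few commands. A minimal sequence, assuming the jobs run as root so the trash lives under /user/root/.Trash (adjust the path for your user):

# Overall HDFS capacity and per-DataNode usage
hdfs dfsadmin -report

# How much the warehouse data itself occupies
hdfs dfs -du -h /user/hive/warehouse

# How much is still sitting in the trash
hdfs dfs -du -h /user/root/.Trash

# Purge expired trash checkpoints immediately
hdfs dfs -expunge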
