1. Problem Description
The following error was reported while running a Spark job:
17/11/03 10:27:54 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) from 192.168.1.16:37205
java.io.IOException: Failed to connect to /192.168.1.16:37205
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:97)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.1.16:37205
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:257)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:640)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
... 2 more
2. Analysis & Resolution
1. No memory headroom was reserved when Spark was installed
One likely cause is memory pressure. If Spark's memory use was not capped when it was installed, the Spark UI reports the machine's entire physical memory as available, meaning all of it can be handed to Spark jobs. When submitting a job, you should leave some memory headroom for the operating system and other processes. In my case the machine had 480 GB of RAM in total and the Spark job was using 450 GB of it; reducing the executor memory and the number of executors resolved the problem.
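As a sketch of the fix above, executor resources can be capped when building the session; the concrete numbers and the app name below are illustrative placeholders, not recommendations (`spark.executor.memoryOverhead` assumes Spark 2.3+; older versions use the `spark.yarn.executor.memoryOverhead` key):

```scala
import org.apache.spark.sql.SparkSession

object HeadroomExample {
  def main(args: Array[String]): Unit = {
    // Illustrative values only: the point is to leave the machine memory
    // headroom by capping per-executor memory and the executor count,
    // rather than letting Spark claim nearly all physical RAM.
    val spark = SparkSession.builder()
      .appName("my-job")
      .config("spark.executor.instances", "20")      // fewer executors
      .config("spark.executor.memory", "16g")        // smaller heap per executor
      .config("spark.executor.memoryOverhead", "2g") // off-heap headroom per executor
      .getOrCreate()
    // ... job logic ...
    spark.stop()
  }
}
```

The same limits can equally be passed on the command line via `spark-submit --num-executors`, `--executor-memory`, and `--conf spark.executor.memoryOverhead=...`, which is more common in practice than hard-coding them.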
2. The code performs a shuffle while persisting with MEMORY_ONLY
When persisted data overflows the available memory under the MEMORY_ONLY storage level, some cached blocks are evicted; the next stage then cannot find those blocks and the fetch fails with the error above. The fix is to change the storage level to MEMORY_AND_DISK, so that overflow spills to disk instead of being dropped.
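A minimal sketch of that storage-level change, assuming a local session for illustration (the data and transformation are placeholders; real jobs run on a cluster):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("persist-demo")
      .getOrCreate()

    val rdd = spark.sparkContext
      .parallelize(1 to 1000000)
      .map(_ * 2) // placeholder transformation
      // MEMORY_AND_DISK spills partitions that do not fit in memory to disk
      // instead of evicting them, so a later shuffle stage can still fetch
      // blocks that MEMORY_ONLY would have silently dropped.
      .persist(StorageLevel.MEMORY_AND_DISK)

    println(rdd.count())
    spark.stop()
  }
}
```

The trade-off is extra disk I/O for the spilled partitions, which is usually far cheaper than recomputing lost blocks or failing the stage outright.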
3. Other causes
My experience here is limited; these are the only two causes I have actually encountered, and there may well be others. If neither resolves your problem, my apologies.