Spark job submission fails with java.nio.channels.ClosedChannelException

1. Submit the job

./spark-submit --master "yarn" --driver-memory 1g --executor-memory 1g --class KeyCount /root/IdeaProjects/SparkApp/out/artifacts/SparkApp_jar/SparkApp.jar

The submission fails with the following error:

17/08/25 14:47:03 ERROR client.TransportClient: Failed to send RPC 6159851572252707613 to /192.168.2.6:39986: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
17/08/25 14:47:03 ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 6159851572252707613 to /192.168.2.6:39986: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249)
	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
	at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
	at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException



2. Since this is Spark on YARN, check the ResourceManager log

2017-08-25 14:45:19,990 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:806)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:803)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:784)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
	at java.lang.Thread.run(Thread.java:745)


3. Check the NodeManager log

2017-08-25 14:47:03,147 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1503641152441_0014_02_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2540118016
2017-08-25 14:47:03,147 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=14043,containerID=container_1503641152441_0014_02_000001] is running beyond virtual memory limits. Current usage: 360.4 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1503641152441_0014_02_000001 :
	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
	|- 14043 14041 14043 14043 (bash) 0 0 115847168 730 /bin/bash -c /usr/java/jdk1.8.0_73/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001/tmp -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1503641152441_0014/container_1503641152441_0014_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '192.168.2.6:45439' --properties-file /tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001/__spark_conf__/__spark_conf__.properties 1> /usr/local/hadoop/logs/userlogs/application_1503641152441_0014/container_1503641152441_0014_02_000001/stdout 2> /usr/local/hadoop/logs/userlogs/application_1503641152441_0014/container_1503641152441_0014_02_000001/stderr 
	|- 14047 14043 14043 14043 (java) 628 24 2424270848 91542 /usr/java/jdk1.8.0_73/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001/tmp -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1503641152441_0014/container_1503641152441_0014_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 192.168.2.6:45439 --properties-file /tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001/__spark_conf__/__spark_conf__.properties 

2017-08-25 14:47:03,148 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 14043
2017-08-25 14:47:03,148 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1503641152441_0014_02_000001 transitioned from RUNNING to KILLING
2017-08-25 14:47:03,148 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1503641152441_0014_02_000001
2017-08-25 14:47:03,152 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1503641152441_0014_02_000001 is : 143
2017-08-25 14:47:03,163 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1503641152441_0014_02_000001 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root	OPERATION=Container Finished - Killed	TARGET=ContainerImpl	RESULT=SUCCESS	APPID=application_1503641152441_0014	CONTAINERID=container_1503641152441_0014_02_000001
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1503641152441_0014_02_000001 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1503641152441_0014_02_000001 from application application_1503641152441_0014
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1503641152441_0014
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001
2017-08-25 14:47:03,196 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1503641152441_0014_000002 (auth:SIMPLE)
2017-08-25 14:47:03,207 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1503641152441_0014_02_000001
2017-08-25 14:47:03,207 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root	IP=192.168.2.6	OPERATION=Stop Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS	APPID=application_1503641152441_0014	CONTAINERID=container_1503641152441_0014_02_000001
2017-08-25 14:47:04,172 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1503641152441_0014_02_000003 is : 1
2017-08-25 14:47:04,172 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1503641152441_0014_02_000003 and exit code: 1
ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
	at org.apache.hadoop.util.Shell.run(Shell.java:479)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

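The cause is clear from this line:
Current usage: 360.4 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.
The container used 2.4 GB of virtual memory, more than its 2.1 GB limit, so the NodeManager killed it (exit code 143, i.e. terminated by SIGTERM). This container, container_1503641152441_0014_02_000001, runs org.apache.spark.deploy.yarn.ExecutorLauncher, the Spark ApplicationMaster in yarn-client mode. Once it is killed, the driver can no longer reach the AM, which is the ClosedChannelException from step 1.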
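To check a kill message like this one without converting units by hand, the following Python sketch pulls the four figures out of the NodeManager line and compares usage against the limit. The regular expression is written against the message format in this post's log only; other Hadoop versions may word it differently, and the helper names are hypothetical.

```python
import re

# Unit suffixes the NodeManager may print, converted to MB (1 GB = 1024 MB).
UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024}

# Written against the message format seen in this post's NodeManager log.
USAGE_RE = re.compile(
    r"Current usage: ([\d.]+) ([KMGT]B) of ([\d.]+) ([KMGT]B) physical memory used; "
    r"([\d.]+) ([KMGT]B) of ([\d.]+) ([KMGT]B) virtual memory used"
)


def to_mb(number: str, unit: str) -> float:
    """Convert a figure such as ('2.4', 'GB') into MB."""
    return float(number) * UNIT_TO_MB[unit]


def parse_usage(line: str) -> dict:
    """Extract used/limit figures (in MB) for physical and virtual memory."""
    m = USAGE_RE.search(line)
    if m is None:
        raise ValueError("not a container memory usage line")
    v = m.groups()
    return {
        "pmem_used_mb": to_mb(v[0], v[1]),
        "pmem_limit_mb": to_mb(v[2], v[3]),
        "vmem_used_mb": to_mb(v[4], v[5]),
        "vmem_limit_mb": to_mb(v[6], v[7]),
    }


line = ("Container [pid=14043,containerID=container_1503641152441_0014_02_000001] "
        "is running beyond virtual memory limits. Current usage: 360.4 MB of 1 GB "
        "physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.")
usage = parse_usage(line)
for kind in ("pmem", "vmem"):
    used, limit = usage[f"{kind}_used_mb"], usage[f"{kind}_limit_mb"]
    status = "OVER LIMIT" if used > limit else "ok"
    print(f"{kind}: {used:.1f} MB of {limit:.1f} MB -> {status}")
```

Only the virtual memory figure is over its limit here; physical memory use is about a third of the allocation.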


So the question is: where does this virtual memory limit come from?

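It is derived from the settings in yarn-site.xml. Each container's virtual memory limit is its allocated physical memory multiplied by yarn.nodemanager.vmem-pmem-ratio, and with the default CapacityScheduler the allocated memory is the requested amount rounded up to a multiple of yarn.scheduler.minimum-allocation-mb. For a small request like this one, the limit therefore works out to yarn.scheduler.minimum-allocation-mb × yarn.nodemanager.vmem-pmem-ratio. If the container's processes use more virtual memory than this, the NodeManager kills it ("Killing container").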
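A minimal Python sketch of that rule, assuming the default CapacityScheduler (the FairScheduler rounds to yarn.scheduler.increment-allocation-mb instead). The function names are hypothetical; the cap at yarn.scheduler.maximum-allocation-mb mirrors YARN's normalization, while a request that is larger than the maximum to begin with is rejected rather than shrunk.

```python
import math


def container_memory_mb(requested_mb: int, min_alloc_mb: int = 1024,
                        max_alloc_mb: int = 8192) -> int:
    """Memory a container is granted: the request rounded up to a multiple of
    yarn.scheduler.minimum-allocation-mb, capped at maximum-allocation-mb."""
    if requested_mb > max_alloc_mb:
        # YARN rejects such requests instead of shrinking them.
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    at_least_min = max(requested_mb, min_alloc_mb)
    rounded = math.ceil(at_least_min / min_alloc_mb) * min_alloc_mb
    return min(rounded, max_alloc_mb)


def vmem_limit_mb(container_mb: int, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory the NodeManager allows before it kills the container."""
    return container_mb * vmem_pmem_ratio


for request_mb in (896, 1500, 4096):
    granted = container_memory_mb(request_mb)
    print(f"request {request_mb} MB -> container {granted} MB, "
          f"vmem limit {vmem_limit_mb(granted):.1f} MB")
```

Because of the rounding, the limit grows in steps of minimum-allocation-mb × ratio, not in proportion to what the application actually asked for.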

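In my case neither property was set, so yarn.scheduler.minimum-allocation-mb defaulted to 1024 MB (1 GB) and yarn.nodemanager.vmem-pmem-ratio defaulted to 2.1. That explains the log above: the container used 360 MB of its 1 GB of physical memory, but 2.4 GB of its 2.1 GB of virtual memory.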
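Where the 1 GB container comes from can be worked through in Python. The process tree shows the AM JVM (ExecutorLauncher) started with -Xmx512m; the overhead rule below (the larger of 384 MB and 10% of the AM memory) is an assumption based on Spark 2.x defaults for spark.yarn.am.memoryOverhead, not something visible in this log. The sketch also assumes the NodeManager multiplies by the ratio as a 32-bit float, which would explain why the log prints Limit=2254857728 rather than 2254857830.

```python
import math
import struct

# Assumed Spark 2.x defaults for the yarn-client ApplicationMaster (ExecutorLauncher):
# spark.yarn.am.memory = 512m, overhead = max(384 MB, 10% of AM memory).
AM_MEMORY_MB = 512
OVERHEAD_MIN_MB = 384
OVERHEAD_FACTOR = 0.10

# YARN settings in effect in this post (both left at their defaults).
MIN_ALLOC_MB = 1024
VMEM_PMEM_RATIO = 2.1

overhead_mb = max(OVERHEAD_MIN_MB, int(AM_MEMORY_MB * OVERHEAD_FACTOR))
requested_mb = AM_MEMORY_MB + overhead_mb                               # 896 MB
container_mb = math.ceil(requested_mb / MIN_ALLOC_MB) * MIN_ALLOC_MB    # 1024 MB
limit_mb = container_mb * VMEM_PMEM_RATIO                               # about 2150.4 MB

# Round the ratio to a 32-bit float, as the NodeManager appears to do.
ratio_f32 = struct.unpack("f", struct.pack("f", VMEM_PMEM_RATIO))[0]
limit_bytes = int(container_mb * 1024 * 1024 * ratio_f32)

observed_vmem_bytes = 2540118016  # "current usage" from the NodeManager log
print(f"requested {requested_mb} MB -> container {container_mb} MB")
print(f"vmem limit {limit_mb:.1f} MB ({limit_bytes} bytes)")
print("killed" if observed_vmem_bytes > limit_bytes else "within limit")
```

The JVM reserves far more virtual address space than it touches, so a 512 MB heap can still map about 2.4 GB of virtual memory while using only 360 MB of RAM.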


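I then changed the following settings in yarn-site.xml (YARN must be restarted for the ResourceManager and NodeManagers to pick them up):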

<property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>9216</value>
        <description>Maximum memory per container request, in MB (default 8192 MB)</description>
</property>
<property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>4000</value>
        <description>Minimum memory per container request, in MB (default 1024 MB)</description>
</property>
<property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>4.1</value>
</property>
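Since the effective values are a mix of what yarn-site.xml sets and what it leaves at the default, a small Python helper can print both. This is a sketch: the defaults listed are the Hadoop 2.x yarn-default.xml values for these four properties, the sample XML is a hypothetical file containing only the changes above, and the helper ignores every other configuration source.

```python
import xml.etree.ElementTree as ET

# Hadoop 2.x yarn-default.xml values for the properties discussed in this post.
DEFAULTS = {
    "yarn.scheduler.minimum-allocation-mb": "1024",
    "yarn.scheduler.maximum-allocation-mb": "8192",
    "yarn.nodemanager.vmem-pmem-ratio": "2.1",
    "yarn.nodemanager.vmem-check-enabled": "true",
}


def effective_settings(xml_text: str) -> dict:
    """Return the memory-related YARN settings, falling back to the defaults."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name in DEFAULTS:
            found[name] = value
    return {name: found.get(name, default) for name, default in DEFAULTS.items()}


# Hypothetical yarn-site.xml containing only the three properties changed above.
SAMPLE = """<configuration>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>9216</value></property>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>4000</value></property>
  <property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>4.1</value></property>
</configuration>"""

settings = effective_settings(SAMPLE)
for name, value in settings.items():
    print(f"{name} = {value}")
min_mb = int(settings["yarn.scheduler.minimum-allocation-mb"])
ratio = float(settings["yarn.nodemanager.vmem-pmem-ratio"])
print(f"smallest container: {min_mb} MB, vmem limit: {min_mb * ratio:.0f} MB")
```

Pointing it at a real file would only need ET.parse(path).getroot() in place of ET.fromstring.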

The error above disappears, and the NodeManager log now shows:

2017-08-25 15:53:27,670 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26478 for container-id container_1503646903552_0001_01_000001: 334.4 MB of 3.9 GB physical memory used; 2.4 GB of 16.0 GB virtual memory used
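The figures in that line follow directly from the new settings. A quick Python check, assuming the container received exactly the 4000 MB minimum allocation and converting with 1 GB = 1024 MB:

```python
MIN_ALLOC_MB = 4000      # yarn.scheduler.minimum-allocation-mb after the change
VMEM_PMEM_RATIO = 4.1    # yarn.nodemanager.vmem-pmem-ratio after the change
VMEM_USED_GB = 2.4       # virtual memory used, from the log line above

container_gb = MIN_ALLOC_MB / 1024
vmem_limit_gb = MIN_ALLOC_MB * VMEM_PMEM_RATIO / 1024
headroom_gb = vmem_limit_gb - VMEM_USED_GB
print(f"{container_gb:.1f} GB physical, {vmem_limit_gb:.1f} GB virtual, "
      f"{headroom_gb:.1f} GB virtual headroom")
```

This prints 3.9 GB physical and 16.0 GB virtual, matching the NodeManager line.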


Many other posts solve this by disabling the virtual memory check altogether; personally I don't recommend that.

The yarn-site.xml setting for that is:

<property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
</property>



================

迷途小运维随笔 (Notes of a Lost Junior Ops Engineer)

Please credit the source when reposting.
