Spark version: spark-2.4.5-bin-hadoop2.7.tgz
Hadoop version: hadoop-2.7.3.tar.gz
Submit one of Spark's bundled examples to YARN with spark-submit:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
/opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
The job fails with the following error:
Application application_1591690063321_0006 failed 2 times due to AM Container for appattempt_1591690063321_0006_000002 exited with exitCode: -103
For more detailed output, check application tracking page:http://single:8088/cluster/app/application_1591690063321_0006Then, click on links to logs of each attempt.
Diagnostics: Container [pid=5335,containerID=container_1591690063321_0006_02_000001] is running beyond virtual memory limits. Current usage: 164.3 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1591690063321_0006_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 5340 5335 5335 5335 (java) 416 15 2347429888 41765 /opt/jdk/bin/java -server -Xmx512m -Djava.io.tmpdir=/opt/hadoop/data/nm-local-dir/usercache/root/appcache/application_1591690063321_0006/container_1591690063321_0006_02_000001/tmp -Dspark.yarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1591690063321_0006/container_1591690063321_0006_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg single:43289 --properties-file /opt/hadoop/data/nm-local-dir/usercache/root/appcache/application_1591690063321_0006/container_1591690063321_0006_02_000001/__spark_conf__/__spark_conf__.properties
|- 5335 5334 5335 5335 (bash) 1 0 115851264 304 /bin/bash -c /opt/jdk/bin/java -server -Xmx512m -Djava.io.tmpdir=/opt/hadoop/data/nm-local-dir/usercache/root/appcache/application_1591690063321_0006/container_1591690063321_0006_02_000001/tmp -Dspark.yarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1591690063321_0006/container_1591690063321_0006_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'single:43289' --properties-file /opt/hadoop/data/nm-local-dir/usercache/root/appcache/application_1591690063321_0006/container_1591690063321_0006_02_000001/__spark_conf__/__spark_conf__.properties 1> /opt/hadoop/logs/userlogs/application_1591690063321_0006/container_1591690063321_0006_02_000001/stdout 2> /opt/hadoop/logs/userlogs/application_1591690063321_0006/container_1591690063321_0006_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
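To dig into a failure like this, the standard YARN CLI can pull the aggregated container logs by application id (the id below comes from the error above; this assumes log aggregation is enabled on the cluster):

```shell
# Fetch all container logs for the failed application
yarn logs -applicationId application_1591690063321_0006
```

If log aggregation is off, the same logs live on the NodeManager host under the directory shown in the error, e.g. /opt/hadoop/logs/userlogs/application_1591690063321_0006/.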
The key diagnostic:
Diagnostics: Container [pid=5335,containerID=container_1591690063321_0006_02_000001] is running beyond virtual memory limits. Current usage: 164.3 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
From the error: the container was allowed 2.1 GB of virtual memory but actually used 2.3 GB, so YARN killed it.
Fix:
Raise the virtual memory allowance by increasing the yarn.nodemanager.vmem-pmem-ratio property in Hadoop's yarn-site.xml, for example:
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
The default is 2.1, i.e. a container may use virtual memory up to 2.1 times its physical memory. The error shows the container had 1 GB of physical memory, so its virtual memory limit was 1 GB × 2.1 = 2.1 GB, while it actually used 2.3 GB. Raising the ratio to 4 gives the container 4 GB of virtual memory.
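Another common workaround, often seen on single-node or development setups, is to disable the virtual memory check altogether via yarn.nodemanager.vmem-check-enabled. This avoids the kill but also removes the safety limit, so it is a judgment call:

```xml
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```

Either change requires restarting YARN (e.g. stop-yarn.sh then start-yarn.sh) before it takes effect.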
Other important yarn-site.xml settings:
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
Description: the minimum and maximum memory a single container may request. An application cannot request more than the maximum; a request below the minimum is rounded up to the minimum, so in this respect the minimum behaves somewhat like a page size in an operating system. The minimum is also used to compute the maximum number of containers on a node. Note: neither value can be changed dynamically, i.e. while applications are running.
Defaults: 1024 MB / 8192 MB
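A sketch of the corresponding yarn-site.xml entries, using the default values above (illustrative, not a recommendation):

```xml
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
```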
yarn.scheduler.minimum-allocation-vcores
yarn.scheduler.maximum-allocation-vcores
Description: the minimum/maximum number of virtual CPUs a single container may request. For example, with values 1 and 4, each task of a MapReduce job can request at least 1 and at most 4 vcores.
Defaults: 1 / 32
yarn.nodemanager.resource.memory-mb
yarn.nodemanager.vmem-pmem-ratio
Description: yarn.nodemanager.resource.memory-mb is the total memory a node offers to containers; the two RM scheduler limits above should not exceed it. Dividing it by the scheduler minimum gives the maximum number of containers on that node. It cannot be changed dynamically once set, and its default is 8192 MB (8 GB), which YARN will advertise even on machines with less than 8 GB of RAM. yarn.nodemanager.vmem-pmem-ratio is the virtual-to-physical memory ratio (a multiplier, not a percentage), default 2.1.
Defaults: 8 GB / 2.1
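The two calculations above can be checked with plain shell arithmetic; the figures below match the defaults just described and the error message in this post:

```shell
# Max containers per node = NodeManager memory / scheduler minimum
node_mem_mb=8192     # yarn.nodemanager.resource.memory-mb (default)
min_alloc_mb=1024    # yarn.scheduler.minimum-allocation-mb (default)
echo $(( node_mem_mb / min_alloc_mb ))     # 8 containers

# Virtual memory limit = physical memory * vmem-pmem-ratio
# 1 GB physical * 2.1 = 2.1 GB, which the AM container exceeded (2.3 GB used)
pmem_mb=1024
vmem_limit_mb=$(( pmem_mb * 21 / 10 ))     # shell integer arithmetic: 2150 MB
echo "$vmem_limit_mb"
```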
yarn.nodemanager.resource.cpu-vcores
Description: the total number of virtual CPUs the NodeManager offers to containers.
Default: 8
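Back to the original failure: alongside tuning yarn-site.xml, the submission itself can size its containers explicitly with standard spark-submit flags. A sketch (the values here are illustrative):

```shell
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --driver-memory 512m \
  --executor-memory 512m \
  --num-executors 1 \
  /opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
```

Note, though, that virtual memory overuse usually comes from JVM and native overhead rather than heap (the failing AM already ran with -Xmx512m), so raising the vmem-pmem ratio or disabling the vmem check tends to be the more reliable fix.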