Problem:
A production site recently reported that after submitting Spark applications in yarn-cluster mode, a YARN client process remains on the submitting node and never exits. Because all of these applications are Spark Structured Streaming jobs (they run continuously for months on end), the lingering processes eventually exhaust the resources of the submitting server, and other operations on that node then fail with errors such as:
[dx@my-linux-01 bin]$ yarn logs -applicationId application_15644802175503_0189
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c000000, 702021632, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 702021632 bytes to committing reserved memory.
# An error report file with more information is saved as:
# /home/dx/myProj/appApp/bin/hs_err_pid53561.log
[dx@my-linux-01 bin]$
Analysis of the Spark application submitting node showed that the processes occupying its resources were mainly the lingering YARN client processes:
[dx@my-linux-01 bin]$ top
   PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
122236 dx    20   0 20.629g 1.347g  3520 S   0.3  2.1  7:02.42 java
122246 dx    20   0 20.629g 1.311g  3520 S   0.3  2.0  7:03.42 java
122236 dx    20   0 20.629g 1.288g  3520 S   0.3  2.2  7:05.83 java
122346 dx    20   0 20.629g 1.344g  3520 S   0.3  2.1  7:10.42 java
121246 dx    20   0 20.629g 1.343g  3520 S   0.3  2.3  7:01.42 java
122346 dx    20   0 20.629g 1.341g  3520 S   0.3  2.4  7:03.39 java
112246 dx    20   0 20.629g 1.344g  3520 S   0.3  2.0  7:02.42 java
............
112260 dx    20   0 20.629g 1.344g  3520 S   0.3  2.0  7:02.02 java
112260 dx    20   0  113116    200     0 S   0.0  0.0  0:00.00 sh
............
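To confirm that these java processes really are the YARN client JVMs left behind by spark-submit (and not some other workload), they can be matched against the launcher's main class on the submitting node. This is only a diagnostic sketch; the grep pattern assumes the standard Spark launcher, which starts the client JVM with org.apache.spark.deploy.SparkSubmit as its main class:

# List running JVMs together with their fully-qualified main class
jps -l | grep SparkSubmit

# Same idea with ps, also showing the full command line of each client process
ps -ef | grep org.apache.spark.deploy.SparkSubmit | grep -v grep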
Analysis of submitting Spark jobs through YARN:
There are two ways to submit a Spark application through YARN:
1) yarn-client (spark-submit --master yarn --deploy-mode client ...):
In this mode, after the application is submitted the driver runs on the submitting node, inside the YARN client process. Killing that client process on the submitting node therefore kills the driver, and with it the application.
2) yarn-cluster (spark-submit --master yarn --deploy-mode cluster ...):
In this mode, after the application is submitted the driver runs inside a container allocated by YARN: the container hosts the AM (ApplicationMaster) process and the SparkContext (driver) runs inside that AM. However, the submission still starts a YARN client process on the submitting node, and by default that client process keeps running and waits until the application terminates (FAILED, FINISHED, etc.), as illustrated by the sketch below.
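For illustration, a cluster-mode submission of one of the streaming jobs might look roughly like this (the main class and jar path are hypothetical placeholders):

# Default behaviour: the client JVM on the submitting node stays alive,
# periodically reporting the application's state, until the job ends.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.StreamingApp \
  /home/dx/myProj/appApp/streaming-app.jar

Since a Structured Streaming job normally never finishes, every such submission leaves one client JVM behind indefinitely, which is exactly the resource exhaustion observed above.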
Solution:
The relevant parameter of the YARN client is
spark.yarn.submit.waitAppCompletion
If this parameter is set to true, the client keeps running and reports the application's status until the application exits (for whatever reason);
if it is set to false, the client process exits as soon as the application has been submitted.
Add the setting to the spark-submit arguments:
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  ....
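If every cluster-mode submission from this gateway node should behave this way, the setting can also be placed in conf/spark-defaults.conf on the submitting node instead of being repeated on every command line (spark-submit reads that file before contacting YARN). With the client gone right after submission, the application's state is still available from YARN itself:

# conf/spark-defaults.conf on the submitting node
spark.yarn.submit.waitAppCompletion  false

# The client now exits immediately after submission; check the state via YARN:
yarn application -status application_15644802175503_0189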
The corresponding code in the org.apache.spark.deploy.yarn.Client class:
/**
 * Submit an application to the ResourceManager.
 * If set spark.yarn.submit.waitAppCompletion to true, it will stay alive
 * reporting the application's status until the application has exited for any reason.
 * Otherwise, the client process will exit after submission.
 * If the application finishes with a failed, killed, or undefined status,
 * throw an appropriate SparkException.
 */
def run(): Unit = {
  this.appId = submitApplication()
  if (!launcherBackend.isConnected() && fireAndForget) {
    val report = getApplicationReport(appId)
    val state = report.getYarnApplicationState
    logInfo(s"Application report for $appId (state: $state)")
    logInfo(formatReportDetails(report))
    if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
      throw new SparkException(s"Application $appId finished with status: $state")
    }
  } else {
    val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
    if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
      throw new SparkException(s"Application $appId finished with failed status")
    }
    if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
      throw new SparkException(s"Application $appId is killed")
    }
    if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
      throw new SparkException(s"The final status of application $appId is undefined")
    }
  }
}
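The fireAndForget flag checked at the top of run() is what connects this code path to the configuration. In Spark 2.x versions of Client.scala it is derived from the deploy mode and spark.yarn.submit.waitAppCompletion roughly as in the paraphrase below (not a verbatim quote of the source): only a cluster-mode submission with waiting disabled skips monitorApplication() and lets the client process return right after submitApplication().

// Paraphrase of how Client.scala derives the flag; the field name matches the
// source, the lookup of the config value is simplified here.
private val fireAndForget =
  isClusterMode && !sparkConf.getBoolean("spark.yarn.submit.waitAppCompletion", true)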