Spark 2.1 on YARN -- Container Shell Analysis

I set the following in spark-defaults.conf:

spark.serializer            org.apache.spark.serializer.KryoSerializer
spark.master                yarn
spark.executor.instances    2
spark.executor.cores        1
spark.executor.memory       512m

When spark-shell is executed, it creates two executors.

$ jps
32412 CoarseGrainedExecutorBackend
32444 CoarseGrainedExecutorBackend
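
The same two executors can also be confirmed from inside spark-shell itself. A minimal check using the SparkContext (sc) that spark-shell creates; note that sc.statusTracker lists the driver alongside the executors:

// Settings actually picked up from spark-defaults.conf.
sc.getConf.getAll
  .filter { case (k, _) => k.startsWith("spark.executor") }
  .foreach { case (k, v) => println(s"$k=$v") }

// Entries registered with the driver; the driver itself shows up in this list too.
sc.statusTracker.getExecutorInfos.foreach { info =>
  println(s"${info.host()}:${info.port()} runningTasks=${info.numRunningTasks()}")
}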

Look at the command line of one of the executors:

$ ps aux | grep 32412
houzhiz+   374  0.0  0.0 112668   976 pts/1    R+   14:08   0:00 grep --color=auto 32412
houzhiz+ 32412 15.1  4.3 2371448 342156 ?      Sl   14:03   0:46 /usr/local/java/bin/java -server -Xmx512m -Djava.io.tmpdir=/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002/tmp -Dspark.driver.port=35736 -Dspark.yarn.app.container.log.dir=/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.122.1:35736 --executor-id 1 --hostname localhost --cores 1 --app-id application_1495532285542_0005 --user-class-path file:/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002/__app__.jar

Look at the container's working directory:

$ cd /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002
[houzhizhen@localhost container_1495532285542_0005_01_000002]$ ll
total 20
-rw-rw-r--. 1 houzhizhen houzhizhen   86 May 24 14:03 container_tokens
-rwx------. 1 houzhizhen houzhizhen  703 May 24 14:03 default_container_executor_session.sh
-rwx------. 1 houzhizhen houzhizhen  757 May 24 14:03 default_container_executor.sh
-rwx------. 1 houzhizhen houzhizhen 3590 May 24 14:03 launch_container.sh
lrwxrwxrwx. 1 houzhizhen houzhizhen   89 May 24 14:03 __spark_conf__ -> /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip
lrwxrwxrwx. 1 houzhizhen houzhizhen  108 May 24 14:03 __spark_libs__ -> /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/16/__spark_libs__7172508084572895679.zip
drwx--x---. 2 houzhizhen houzhizhen    6 May 24 14:03 tmp
[houzhizhen@localhost container_1495532285542_0005_01_000002]$ 
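
Each executor runs with its container directory as the JVM working directory, which can be verified from inside a job. A minimal sketch (the exact container paths will differ):

// Print each executor's working directory; expect one container_..._000N dir per executor.
sc.parallelize(1 to 4, 4)
  .map(_ => System.getProperty("user.dir"))
  .distinct()
  .collect()
  .foreach(println)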

Open the Spark configuration and you can see spark.executor.id=driver. Also, since __spark_conf__ is a symlink to /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip in the NodeManager's per-user file cache, we can safely conclude that the configuration file is shared across executors of the same Spark application.

$ cat __spark_conf__/__spark_conf__.properties
#Spark configuration.
#Wed May 24 14:03:27 CST 2017
spark.yarn.cache.visibilities=PRIVATE
spark.yarn.cache.timestamps=1495605805866
spark.executor.memory=512m
spark.executor.id=driver
spark.driver.host=192.168.122.1
spark.yarn.cache.confArchive=hdfs\://localhost\:8020/user/houzhizhen/.sparkStaging/application_1495532285542_0005/__spark_conf__.zip
spark.files.ignoreCorruptFiles=true
spark.yarn.cache.sizes=200756074
spark.jars=
spark.sql.catalogImplementation=hive
spark.home=/usr/local/spark
spark.submit.deployMode=client
spark.executor.heartbeatInterval=2
spark.master=yarn
spark.yarn.cache.filenames=hdfs\://localhost\:8020/user/houzhizhen/.sparkStaging/application_1495532285542_0005/__spark_libs__7172508084572895679.zip\#__spark_libs__
spark.executor.cores=1
spark.yarn.cache.types=ARCHIVE
spark.driver.appUIAddress=http\://192.168.122.1\:4040
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.repl.class.outputDir=/tmp/spark-caaf86f0-267d-4b39-9bfe-833d97db838e/repl-e03f92dd-176d-42b5-9ebd-a1e3d66c7e1c
spark.executor.instances=2
spark.app.name=Spark shell
spark.repl.class.uri=spark\://192.168.122.1\:35736/classes
spark.driver.port=35736
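
The file is in plain java.util.Properties format, which is why the colons are escaped. A minimal sketch for reading it programmatically, assuming it is run from inside a container directory like the one above:

import java.io.FileInputStream
import java.util.Properties

// Load the serialized Spark configuration shipped to the container.
val props = new Properties()
props.load(new FileInputStream("__spark_conf__/__spark_conf__.properties"))
println(props.getProperty("spark.executor.id"))     // "driver"
println(props.getProperty("spark.executor.memory")) // "512m"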

Open launch_container.sh and you can see that $PWD/__spark_conf__ and $PWD/__spark_libs__/* are included in the CLASSPATH. From the final exec command you can also see that the executor id is overridden by --executor-id 1, so even though the shared properties file says driver, each executor receives its own id at launch.
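
The override can be observed at runtime: the driver's SparkEnv reports driver, while each executor reports the id passed on its command line. A quick check from spark-shell (SparkEnv is a developer API, so this is for inspection only):

import org.apache.spark.SparkEnv

// On the driver (the spark-shell JVM) the id is still "driver".
println(SparkEnv.get.executorId)

// On the executors, the value set by --executor-id wins.
sc.parallelize(1 to 4, 4)
  .map(_ => SparkEnv.get.executorId)
  .distinct()
  .collect()
  .foreach(println)   // e.g. "1", "2"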

$ cat launch_container.sh
#!/bin/bash

export SPARK_YARN_STAGING_DIR="hdfs://localhost:8020/user/houzhizhen/.sparkStaging/application_1495532285542_0005"
export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"
export JAVA_HOME="/usr/local/java"
export SPARK_LOG_URL_STDOUT="http://localhost:8042/node/containerlogs/container_1495532285542_0005_01_000002/houzhizhen/stdout?start=-4096"
export NM_HOST="localhost"
export SPARK_HOME="/usr/local/spark"
export HADOOP_HDFS_HOME="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2"
export LOGNAME="houzhizhen"
export JVM_PID="$$"
export PWD="/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002"
export HADOOP_COMMON_HOME="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2"
export LOCAL_DIRS="/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005"
export NM_HTTP_PORT="8042"
export LOG_DIRS="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
"
export NM_PORT="33996"
export USER="houzhizhen"
export HADOOP_YARN_HOME="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2"
export CLASSPATH="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*"
export SPARK_YARN_MODE="true"
export HADOOP_TOKEN_FILE_LOCATION="/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002/container_tokens"
export SPARK_USER="houzhizhen"
export SPARK_LOG_URL_STDERR="http://localhost:8042/node/containerlogs/container_1495532285542_0005_01_000002/houzhizhen/stderr?start=-4096"
export HOME="/home/"
export CONTAINER_ID="container_1495532285542_0005_01_000002"
export MALLOC_ARENA_MAX="4"
ln -sf "/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip" "__spark_conf__"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
ln -sf "/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/16/__spark_libs__7172508084572895679.zip" "__spark_libs__"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx512m -Djava.io.tmpdir=$PWD/tmp '-Dspark.driver.port=35736' -Dspark.yarn.app.container.log.dir=/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.122.1:35736 --executor-id 1 --hostname localhost --cores 1 --app-id application_1495532285542_0005 --user-class-path file:$PWD/__app__.jar 1>/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002/stdout 2>/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002/stderr"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
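
All of the variables exported above are inherited by the executor JVM, since the script exec's the java command in the same environment. For example, each executor can read its own container id, as a quick check from spark-shell shows:

// CONTAINER_ID is exported by launch_container.sh before the JVM starts.
sc.parallelize(1 to 4, 4)
  .map(_ => System.getenv("CONTAINER_ID"))
  .distinct()
  .collect()
  .foreach(println)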