A Brief Look at the spark-shell Script (Spark 2.0.0)

I. First, here is the spark-shell script itself, located at $SPARK_HOME/bin/spark-shell:

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# Shell script for starting the Spark Shell REPL
cygwin=false
case "`uname`" in
  CYGWIN*) cygwin=true;;
esac
# Enter posix mode for bash
set -o posix
if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]"
# SPARK-4161: scala does not assume use of the java classpath,
# so we need to add the "-Dscala.usejavacp=true" flag manually. We
# do this specifically for the Spark shell because the scala REPL
# has its own class loader, and any additional classpath specified
# through spark.driver.extraClassPath is not automatically propagated.
SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
function main() {
  if $cygwin; then
    # Workaround for issue involving JLine and Cygwin
    # (see http://sourceforge.net/p/jline/bugs/40/).
    # If you're using the Mintty terminal emulator in Cygwin, may need to set the
    # "Backspace sends ^H" setting in "Keys" section of the Mintty options
    # (see https://github.com/sbt/sbt/issues/562).
    stty -icanon min 1 -echo > /dev/null 2>&1
    export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
    stty icanon echo > /dev/null 2>&1
  else
    export SPARK_SUBMIT_OPTS
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
  fi
}
# Copy restore-TTY-on-exit functions from Scala script so spark-shell exits properly even in
# binary distribution of Spark where Scala is not installed
exit_status=127
saved_stty=""
# restore stty settings (echo in particular)
function restoreSttySettings() {
  stty $saved_stty
  saved_stty=""
}
function onExit() {
  if [[ "$saved_stty" != "" ]]; then
    restoreSttySettings
  fi
  exit $exit_status
}
# to reenable echo if we are interrupted before completing.
trap onExit INT
# save terminal settings
saved_stty=$(stty -g 2>/dev/null)
# clear on error so we don't later try to restore them
if [[ ! $? ]]; then
  saved_stty=""
fi
main "$@"
# record the exit status lest it be overwritten:
# then reenable echo and propagate the code.
exit_status=$?
onExit

II. Run spark-shell with bash -x to observe the steps being executed:

xxxx@Master:/opt/spark/bin$ bash -x spark-shell
+ bash -x spark-shell
+ cygwin=false
+ case "`uname`" in
++ uname
+ set -o posix
+ '[' -z /opt/spark ']'
+ export '_SPARK_CMD_USAGE=Usage: ./bin/spark-shell [options]'
+ _SPARK_CMD_USAGE='Usage: ./bin/spark-shell [options]'
+ SPARK_SUBMIT_OPTS=' -Dscala.usejavacp=true'
+ exit_status=127
+ saved_stty=
+ trap onExit INT
++ stty -g
+ saved_stty=500:5:bf:8a3b:3:1c:7f:15:4:0:1:0:11:13:1a:0:12:f:17:16:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0
+ [[ ! -n 0 ]]
+ main
+ false
+ export SPARK_SUBMIT_OPTS
+ /opt/spark/bin/spark-submit --class org.apache.spark.repl.Main --name 'Spark shell'

The result is, of course, the familiar welcome screen:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/10/13 10:40:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/13 10:40:24 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.100.3.88:4040
Spark context available as 'sc' (master = local[*], app id = local-1476326423649).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
scala> 

Now let's go through the executed lines one by one and see what each of them means.
1) Step 1 (lines 69-72): determine whether we are running under Cygwin. On a Linux system, uname prints Linux, which does not match the CYGWIN* pattern, so cygwin stays false:

cygwin=false
case "`uname`" in
  CYGWIN*) cygwin=true;;
esac
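
As an aside, here is a minimal standalone sketch of the same case-on-uname idiom; the extra Linux/Darwin branches are illustrative and not part of the original script:

#!/usr/bin/env bash
# Detect the platform the same way spark-shell does: match the output
# of uname against glob patterns in a case statement.
os="unknown"
case "$(uname)" in
  CYGWIN*) os="cygwin" ;;   # e.g. CYGWIN_NT-10.0
  Linux)   os="linux"  ;;
  Darwin)  os="macos"  ;;
esac
echo "detected platform: $os"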

2) Step 2 (line 75): enable POSIX mode for bash:

set -o posix
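
For reference, a small sketch showing how to inspect and toggle POSIX mode in a bash session:

#!/usr/bin/env bash
# List the current state of the posix option, turn it on, then list it again.
set -o | grep posix    # usually reports "posix  off"
set -o posix           # bash now follows POSIX semantics more strictly
set -o | grep posix    # now reports "posix  on"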

3) Step 3 (lines 77-79): check whether the SPARK_HOME environment variable is set; if it is empty, derive it from the script's own location:

if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
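
The cd/dirname/pwd idiom computes an absolute path to the script's parent directory no matter where the script is invoked from. A minimal sketch (save it as a file and run it from different directories):

#!/usr/bin/env bash
# dirname "$0"     -> directory part of the path the script was invoked with
# cd .../..; pwd   -> its parent directory as an absolute path
script_dir="$(cd "$(dirname "$0")"; pwd)"
parent_dir="$(cd "$(dirname "$0")"/..; pwd)"
echo "script lives in: $script_dir"
echo "its parent is:   $parent_dir"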

4) Step 4 (line 81): export the _SPARK_CMD_USAGE environment variable:

export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]"
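
The export matters because spark-submit runs as a child process, and only exported variables are copied into a child's environment. A quick sketch:

#!/usr/bin/env bash
# Only exported variables cross the process boundary to children.
PLAIN_VAR="not exported"
export EXPORTED_VAR="exported"
bash -c 'echo "plain:    [$PLAIN_VAR]"'      # prints empty brackets
bash -c 'echo "exported: [$EXPORTED_VAR]"'   # prints [exported]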

5) Step 5 (line 88): append -Dscala.usejavacp=true to the SPARK_SUBMIT_OPTS variable; per the SPARK-4161 comment in the script, the Scala REPL has its own class loader and needs this flag to pick up the java classpath:

SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
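
SPARK_SUBMIT_OPTS may well be unset at this point; the assignment simply appends to whatever is already there, which is why the trace above shows a value with a leading space (' -Dscala.usejavacp=true'). A tiny sketch of the pattern:

#!/usr/bin/env bash
# Appending to a possibly-unset variable: an unset OPTS expands to the
# empty string, so the result simply starts with a space.
OPTS="$OPTS -Dscala.usejavacp=true"
echo "OPTS=[$OPTS]"    # OPTS=[ -Dscala.usejavacp=true]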

6) Step 6 (lines 109-110): initialize the exit bookkeeping: exit_status defaults to 127 and saved_stty to the empty string:

# Copy restore-TTY-on-exit functions from Scala script so spark-shell exits properly even in
# binary distribution of Spark where Scala is not installed
exit_status=127
saved_stty=""
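
127 is the conventional shell exit status for "command not found", so it serves as a pessimistic default that is only replaced once main has actually run. The command name below is deliberately made up:

#!/usr/bin/env bash
# The shell itself returns 127 when a command cannot be found.
no_such_command_xyz 2>/dev/null
echo "exit status: $?"    # prints 127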

7) Step 7 (line 126): register the onExit handler for SIGINT, so that echo can be re-enabled if we are interrupted before completing:

# to reenable echo if we are interrupted before completing.
trap onExit INT
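
trap registers a function to run when the shell receives a given signal; here onExit fires on SIGINT (Ctrl-C). A minimal standalone sketch of the same mechanism:

#!/usr/bin/env bash
# Run a cleanup function if the user presses Ctrl-C during the sleep.
cleanup() {
  echo "caught SIGINT, cleaning up..."
  exit 130    # 128 + signal number 2, the usual convention
}
trap cleanup INT
echo "press Ctrl-C within 10 seconds..."
sleep 10
echo "no interrupt received"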

8) Step 8 (lines 128-133): save the terminal settings. Note that only stderr is discarded to /dev/null here; stdout, which carries the settings string, is captured by the command substitution. Incidentally, the guard [[ ! $? ]] can never fire: $? expands to a non-empty word such as 0, so the test is always false (the trace above shows [[ ! -n 0 ]]), and saved_stty is never actually cleared on this path:

# save terminal settings
saved_stty=$(stty -g 2>/dev/null)
# clear on error so we don't later try to restore them
if [[ ! $? ]]; then
  saved_stty=""
fi
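
stty -g prints the current terminal settings in a machine-readable form that can later be passed back to stty, which is exactly what restoreSttySettings does. A small sketch, safe to run in an interactive terminal:

#!/usr/bin/env bash
# Save the terminal settings, disable echo, then restore everything.
saved=$(stty -g 2>/dev/null)   # settings string on stdout, errors discarded
stty -echo                     # typed characters no longer appear
read -r -p "type something (it will be invisible): " input
stty "$saved"                  # put the terminal back the way it was
echo
echo "you typed: $input"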

Note: 2>&1 redirects stderr to stdout, while 1>&2 redirects stdout to stderr.
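
A quick sketch of these redirections in action:

#!/usr/bin/env bash
# 2>/dev/null discards stderr; 2>&1 merges stderr into stdout.
ls /no/such/path 2>/dev/null         # error message vanishes
ls /no/such/path > /dev/null 2>&1    # both streams discarded
ls /no/such/path 2>&1 | wc -l        # error now flows through the pipe: 1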

9) Step 9 (lines 90-105): call the main function, passing along all of the script's arguments:

main "$@"

Then, since cygwin is false (we are not running under Cygwin), execution takes the else branch at lines 102-103:

export SPARK_SUBMIT_OPTS
"${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"

As you can see, spark-shell ultimately just invokes spark-submit, setting --class to org.apache.spark.repl.Main and --name to "Spark shell", and forwarding whatever arguments were given to spark-shell on to spark-submit.
Note: in bash, "$@" expands to all of the script's positional parameters, each preserved as a separate word.
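
A minimal sketch of argument forwarding with "$@" (the function and file names are made up for illustration):

#!/usr/bin/env bash
# "$@" preserves each positional parameter as a separate word,
# even when an argument contains spaces.
show_args() {
  echo "got $# argument(s):"
  for arg in "$@"; do
    echo "  [$arg]"
  done
}
show_args "$@"
# $ ./demo.sh --master local[2] "Spark shell"
# got 3 argument(s):
#   [--master]
#   [local[2]]
#   [Spark shell]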

10) Step 10 (lines 137-140): record main's exit status and hand control to onExit:

# record the exit status lest it be overwritten:
# then reenable echo and propagate the code.
exit_status=$?
onExit
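
$? only holds the status of the most recent command, so it must be saved immediately, before anything else runs. A tiny sketch:

#!/usr/bin/env bash
# Capture $? right away; the very next command will overwrite it.
false                               # exits with status 1
exit_status=$?
echo "saved status: $exit_status"   # prints 1
echo "current status: $?"           # 0, from the echo just above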

III. Quit the spark-shell REPL and watch what is executed on the way out:

scala> :quit
+ exit_status=0
+ onExit
+ [[ 500:5:bf:8a3b:3:1c:7f:15:4:0:1:0:11:13:1a:0:12:f:17:16:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 != '' ]]
+ restoreSttySettings
+ stty 500:5:bf:8a3b:3:1c:7f:15:4:0:1:0:11:13:1a:0:12:f:17:16:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0
+ saved_stty=
+ exit 0

As the trace shows, the REPL exits normally with main returning 0; onExit then sees a non-empty saved_stty, calls restoreSttySettings to restore the saved terminal settings, and finally exits with status 0.
