First, let's look at the SPARK_HOME/bin/spark-shell script. Its contents:
function main() {
  if $cygwin; then
    stty -icanon min 1 -echo > /dev/null 2>&1
    export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
    "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
    stty icanon echo > /dev/null 2>&1
  else
    export SPARK_SUBMIT_OPTS
    "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
  fi
}
As you can see, both branches simply forward all arguments ("$@") to spark-submit, with org.apache.spark.repl.Main as the entry class.
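spark-submit then loads that class and invokes its main method. As a rough, hedged sketch of the mechanism (this is not the actual SparkSubmit code; InvokeSketch and its argument name are made up for illustration), the net effect is a reflective call to the static main(String[]) of the class passed via --class:

object InvokeSketch {
  def main(cmdLineArgs: Array[String]): Unit = {
    // Load the class named by --class and call its static main(String[]) reflectively.
    val clazz = Class.forName("org.apache.spark.repl.Main")
    val mainMethod = clazz.getMethod("main", classOf[Array[String]])
    // The whole String array is passed as a single reflective argument.
    mainMethod.invoke(null, cmdLineArgs)
  }
}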
1. Reading the repl.Main source code, it mainly involves the following:
1.1 It defines the fields conf, sparkContext, sparkSession, and interp (a SparkILoop), plus a private hasErrors = false flag used to record errors; the code is shown below.
The main method calls object Main's private doMain method, passing it a newly created SparkILoop. That class provides the REPL functionality: a BufferedReader supplies the read side, a JPrintWriter the write side, and the run and processLine methods inherited from its parent class ILoop implement the command evaluation loop. The source of object Main, followed by SparkILoop and ILoop, is as follows:

object Main extends Logging {
  val conf = new SparkConf()
  var sparkContext: SparkContext = _
  var sparkSession: SparkSession = _
  // this is a public var because tests reset it.
  var interp: SparkILoop = _

  private var hasErrors = false

  private def scalaOptionError(msg: String): Unit = {
    hasErrors = true
    Console.err.println(msg)
  }

  def main(args: Array[String]) {
    doMain(args, new SparkILoop)
  }
}
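Before looking at SparkILoop itself, here is a minimal, self-contained sketch of the same pattern using the stock Scala 2.11/2.12 ILoop (MiniRepl is a hypothetical name, and scala-compiler is assumed to be on the classpath): an entry object whose main hands interpreter Settings to an ILoop, which then runs the read-eval-print loop, much as doMain ultimately does with a SparkILoop.

import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.ILoop

object MiniRepl {
  def main(args: Array[String]): Unit = {
    val settings = new Settings
    settings.usejavacp.value = true // let the interpreter reuse the JVM classpath
    // process() starts the loop: read a line, compile it, evaluate it, print the result.
    // This is the machinery SparkILoop inherits from ILoop and drives for spark-shell.
    new ILoop().process(settings)
  }
}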
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter)
    extends ILoop(in0, out) {

  def this(in0: BufferedReader, out: JPrintWriter) = this(Some(in0), out)
  def initializeSpark() {
    intp.beQuietDuring {
      // Create the SparkSession; the @transient modifier marks a member that the
      // serialization subsystem should ignore.
      // processLine is inherited from the parent class ILoop.
      processLine("""
        @transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) {
            org.apache.spark.repl.Main.sparkSession
          } else {
            org.apache.spark.repl.Main.createSparkSession()
          }
        // Create sc
        @transient val sc = {
          val _sc = spark.sparkContext
          _sc.uiWebUrl.foreach(webUrl => println(s"Spark context Web UI available at ${webUrl}"))
          println("Spark context available as 'sc' " +
            s"(master = ${_sc.master}, app id = ${_sc.applicationId}).")
          println("Spark session available as 'spark'.")
          _sc
        }
        """)
      processLine("import org.apache.spark.SparkContext._")
      processLine("import spark.implicits._")
      processLine("import spark.sql")
      processLine("import org.apache.spark.sql.functions._")
      replayCommandStack = Nil // remove above commands from session history.
    }
  }
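  // Side note (not part of SparkILoop): once initializeSpark has run, a spark-shell user
  // already has 'spark', 'sc' and the imports above in scope. A hypothetical session that
  // relies only on those injected bindings might look like:
  //   val df = spark.range(5).toDF("n")
  //   df.select(col("n") * 2).show()       // col comes from org.apache.spark.sql.functions._
  //   sc.parallelize(1 to 3).sum()         // sc was bound by the @transient val sc block
  //   spark.sql("SELECT 1 AS one").show()  // spark and spark.sql were injected above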
  /** Add repl commands that needs to be blocked. e.g. reset */
  private val blockedCommands = Set[String]()
  /** Standard commands */
  lazy val spark