Spark 1.3 from Creation to Submission: 2) Source Code Analysis of spark-submit and SparkContext

The spark-submit script

Suppose the application is submitted with the following command:

bin/spark-submit --class cn.spark.WordCount --master spark://master:7077 --executor-memory 2g --total-executor-cores 4 lib/spark-examples-1.3.1-hadoop2.6.0.jar

The core execution statement of the shell script is:

exec "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"
As with starting the Master node in the previous section, it ultimately goes through spark-class as well:

 exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"


org.apache.spark.deploy.SparkSubmit

First, look at the main method of the SparkSubmit object:

def main(args: Array[String]): Unit = {
    val appArgs = new SparkSubmitArguments(args)
    if (appArgs.verbose) {
      printStream.println(appArgs)
    }
    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs)
      case SparkSubmitAction.KILL => kill(appArgs)
      case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    }
  }
The arguments identify the action as SUBMIT, so submit(appArgs) is called, which eventually reaches the important method runMain:
private def runMain(
      childArgs: Seq[String],
      childClasspath: Seq[String],
      sysProps: Map[String, String],
      childMainClass: String,
      verbose: Boolean): Unit = {
    /* only the core code is kept */
    // childMainClass is the --class cn.spark.WordCount passed to spark-submit above
    mainClass = Class.forName(childMainClass, true, loader)
    // look up the main method of the WordCount class
    val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
    // invoke WordCount.main via reflection
    mainMethod.invoke(null, childArgs.toArray)
}
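
To see the reflection step in isolation, here is a minimal standalone sketch, not taken from Spark; DemoApp and ReflectionSketch are made-up names standing in for the --class target and for runMain. It goes through the same Class.forName / getMethod / invoke sequence:

// DemoApp stands in for the user's --class target; both names are illustrative.
object DemoApp {
  def main(args: Array[String]): Unit =
    println("DemoApp.main invoked with: " + args.mkString(", "))
}

object ReflectionSketch {
  def main(args: Array[String]): Unit = {
    // Same three steps as runMain: load the class, look up main, invoke it.
    val loader = getClass.getClassLoader
    val mainClass = Class.forName("DemoApp", true, loader)
    // classOf[Array[String]] is what new Array[String](0).getClass evaluates to
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    // main is compiled to a static forwarder, so the receiver is null,
    // exactly as in SparkSubmit.runMain
    mainMethod.invoke(null, Array("hello", "spark"))
  }
}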

Once runMain executes, the code we wrote is invoked via reflection. For example, my WordCount looks like this:

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("/tmp/word.txt")
    val rdd1 = rdd.flatMap(_.split("\\s")).map((_, 1)).reduceByKey(_ + _)
    rdd1.collect()
    sc.stop()
  }
}


SparkContext

SparkContext is the single entry point into Spark. When a SparkContext is initialized, it mainly does the following (a short usage sketch follows the list below):

private[spark] val env = createSparkEnv(conf, isLocal, listenerBus)
private[spark] var (schedulerBackend, taskScheduler) = SparkContext.createTaskScheduler(this, master)
@volatile private[spark] var dagScheduler: DAGScheduler = _
dagScheduler = new DAGScheduler(this)
taskScheduler.start()

1. Creates the SparkEnv (ActorSystem), which is responsible for communication between the driver and the executors

2. Creates the DAGScheduler

3. Creates the TaskScheduler and starts it
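
Seen from application code, a minimal, hedged sketch of driving these three steps looks like this (the local[2] master, the app name and ContextLifecycleSketch are made up for illustration): constructing the SparkContext triggers exactly the initialization above, and stop() tears it down again.

import org.apache.spark.{SparkConf, SparkContext}

object ContextLifecycleSketch {
  def main(args: Array[String]): Unit = {
    // Constructing the SparkContext creates the SparkEnv, the DAGScheduler
    // and the TaskScheduler, and starts the TaskScheduler, as listed above.
    val conf = new SparkConf().setMaster("local[2]").setAppName("ContextLifecycleSketch")
    val sc = new SparkContext(conf)

    // A trivial job just to show that the schedulers are up and running.
    val sum = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"sum = $sum")

    // stop() shuts the schedulers and the SparkEnv (ActorSystem) back down.
    sc.stop()
  }
}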


SparkEnv

SparkEnv is mainly responsible for communication within Spark. Spark 1.3 uses Akka for this; from Spark 1.6 onward it was replaced by Netty (Akka itself is built on top of Netty anyway).

  private[spark] def createSparkEnv(
      conf: SparkConf,
      isLocal: Boolean,
      listenerBus: LiveListenerBus): SparkEnv = {
    SparkEnv.createDriverEnv(conf, isLocal, listenerBus)
  }
The call eventually reaches the create method in SparkEnv, where a familiar face shows up again, the ActorSystem:
private def create(
      conf: SparkConf,
      executorId: String,
      hostname: String,
      port: Int,
      isDriver: Boolean,
      isLocal: Boolean,
      listenerBus: LiveListenerBus = null,
      numUsableCores: Int = 0,
      mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv = {

    // Listener bus is only used on the driver
    if (isDriver) {
      assert(listenerBus != null, "Attempted to create driver SparkEnv with null listener bus!")
    }
    val securityManager = new SecurityManager(conf)
    // Create the ActorSystem for Akka and get the port it binds to.
    val (actorSystem, boundPort) = {
      val actorSystemName = if (isDriver) driverActorSystemName else executorActorSystemName
      AkkaUtils.createActorSystem(actorSystemName, hostname, port, conf, securityManager)
    }
    //......
 }
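
For readers who have not used Akka directly, here is a minimal sketch of a bare ActorSystem with a single actor, roughly the primitive that AkkaUtils.createActorSystem wraps with Spark-specific remoting configuration (hostname, port, serialization, security). This is not Spark code; the names EchoActor, ActorSystemSketch and sparkDriverSketch are made up.

import akka.actor.{Actor, ActorSystem, Props}

// A trivial actor that just prints whatever it receives.
class EchoActor extends Actor {
  def receive = {
    case msg => println(s"received: $msg")
  }
}

object ActorSystemSketch {
  def main(args: Array[String]): Unit = {
    // AkkaUtils.createActorSystem ultimately builds on something like this.
    val system = ActorSystem("sparkDriverSketch")
    val echo = system.actorOf(Props[EchoActor], name = "echo")
    echo ! "hello from the driver"
    Thread.sleep(500)   // give the actor time to process before shutting down
    system.shutdown()   // Akka 2.3 API, the version bundled with Spark 1.3
  }
}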


Below is a diagram of the work involved when a Spark application is submitted; some of its contents will be covered in later chapters.


A quick aside: I have been busy with my thesis proposal recently and have had no time to play with Spark, and on top of that I caught a bit of a cold, so there has been no time for blogging either.

