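Spark deployment and application submission are classified along two dimensions: Cluster Manager and Deploy Mode. Until now I only knew that Spark cluster deployments came in three flavors: Standalone, YARN and Mesos.
The combinations of Cluster Manager and Deploy Mode are handled in fairly detailed logic in createLaunchEnv in SparkSubmit.scala.
There are basically three Cluster Managers: Standalone, YARN and Mesos. The Cluster Manager specifies which resource manager the cluster uses. That means that whichever deploy mode is chosen, Client or Cluster (the two possible values of deployMode), one of these three acts as the cluster manager. So is Client also a kind of cluster deployment?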
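To make the two dimensions concrete, here is a minimal standalone sketch of how a master URL and a deploy mode can be classified independently. It is only an illustration: the object name LaunchModeSketch, the sealed traits, and the host name in the sample URLs are hypothetical and are not part of SparkSubmit, which uses Int constants and printErrorAndExit instead.

```scala
// Hypothetical, simplified sketch; not the actual SparkSubmit code.
object LaunchModeSketch {
  sealed trait ClusterManager
  case object Yarn extends ClusterManager
  case object Standalone extends ClusterManager
  case object Mesos extends ClusterManager
  case object Local extends ClusterManager

  sealed trait DeployMode
  case object Client extends DeployMode
  case object Cluster extends DeployMode

  // The cluster manager is chosen from the prefix of the master URL only.
  def clusterManagerOf(master: String): Either[String, ClusterManager] = master match {
    case m if m.startsWith("yarn")  => Right(Yarn)
    case m if m.startsWith("spark") => Right(Standalone)
    case m if m.startsWith("mesos") => Right(Mesos)
    case m if m.startsWith("local") => Right(Local)
    case _ => Left("Master must start with yarn, spark, mesos, or local")
  }

  // The deploy mode is a separate setting; a missing value means client.
  def deployModeOf(mode: Option[String]): Either[String, DeployMode] = mode match {
    case None | Some("client") => Right(Client)
    case Some("cluster")       => Right(Cluster)
    case Some(_) => Left("Deploy mode must be either client or cluster")
  }

  def main(args: Array[String]): Unit = {
    // Sample inputs; the host name is made up for illustration.
    val samples: Seq[(String, Option[String])] = Seq(
      ("spark://host:7077", None),
      ("spark://host:7077", Some("cluster")),
      ("yarn-client", None),
      ("local[2]", None)
    )
    for ((master, mode) <- samples) {
      println(s"$master / $mode -> ${clusterManagerOf(master)} / ${deployModeOf(mode)}")
    }
  }
}
```

Seen this way, Client is not a third kind of cluster. It is one of the two values of the deploy-mode setting, and it can be paired with any cluster manager.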
/**
* @return a tuple containing
* (1) the arguments for the child process,
* (2) a list of classpath entries for the child,
* (3) a list of system properties and env vars, and
* (4) the main class for the child
*/
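// Return value of createLaunchEnv:
// 1. The arguments for the child process, a list of strings (ArrayBuffer[String])
// 2. The classpath entries for the child JVM, a list of strings (ArrayBuffer[String])
// 3. The system properties and environment variables for the child process, built as a HashMap
// 4. The main class for the child JVM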
private[spark] def createLaunchEnv(args: SparkSubmitArguments)
    : (ArrayBuffer[String], ArrayBuffer[String], Map[String, String], String) = {
  // Values to return
  val childArgs = new ArrayBuffer[String]()
  val childClasspath = new ArrayBuffer[String]()
  val sysProps = new HashMap[String, String]()
  var childMainClass = ""
  // Set the cluster manager
  // Four cluster managers are recognized here: YARN, STANDALONE, MESOS and LOCAL.
  // Open question: why does LOCAL count as a cluster manager, and what does "cluster" mean in that case?
  // clusterManager is derived from the value of args.master; note that the match is case-sensitive.
  // Only the prefix of master (yarn, spark, mesos or local) is checked here. In practice a master that starts
  // with yarn may be yarn-client, yarn-cluster or yarn-standalone, so the code further down inspects master again.
  val clusterManager: Int = args.master match {
    case m if m.startsWith("yarn") => YARN
    case m if m.startsWith("spark") => STANDALONE
    case m if m.startsWith("mesos") => MESOS
    case m if m.startsWith("local") => LOCAL
    // If master starts with none of these four prefixes, exit with an error saying that the master must start with yarn, spark, mesos, or local
    case _ => printErrorAndExit("Master must start with yarn, spark, mesos, or local"); -1
  }
  // Set the deploy mode; default is client mode
  // There are two deploy modes: client and cluster.
  // If deployMode is not set (its value is null), client mode is assumed.
  var deployMode: Int = args.deployMode match {
    case "client" | null => CLIENT
    case "cluster" => CLUSTER
    case _ => printErrorAndExit("Deploy mode must be either client or cluster"); -1
  }
  // Because "yarn-cluster" and "yarn-client" encapsulate both the master
  // and deploy mode, we have some logic to infer the master and deploy mode
  // from each other if only one is specified, or exit early if they are at odds.
  // Since yarn-cluster and yarn-client each encode both the master and the deploy mode, these two combinations get special handling here.
  // Knowing either one is enough to infer the other.
  // If args.master starts with yarn (so that clusterManager == YARN is tr