Spark-deploy

@(spark)[deploy|yarn]

A note before we begin

Please refer to Spark源码分析之-deploy模块 (an earlier analysis of the deploy module in the Spark source). Although it is a 2013 article, its author explains things far more clearly than I do, so I only got about halfway through writing my own version...

The previous article, Spark源码分析之-scheduler模块, noted that Spark takes the same approach to resource management and scheduling as Hadoop YARN: an outer resource manager combined with a task scheduler inside each application, and it analyzed the task-scheduling module within a Spark application. This article analyzes Spark's outer resource manager, the deploy module, and examines how Spark coordinates resource scheduling and management across applications.

Spark originally relied on Mesos for resource management. To open Spark up to more users, including those who had never touched Mesos, the Spark developers added the Standalone deployment mode, i.e. the deploy module. The deploy module therefore only concerns deployments that do not use Mesos for resource management.

Overall architecture of the deploy module

The deploy module consists of three sub-modules: master, worker, and client. Each of them extends Actor, and they communicate with one another via actor messages; a minimal sketch of this message flow follows the list below.

Master: accepts registrations from workers and manages all of them, accepts applications submitted by clients, schedules waiting applications (FIFO), and dispatches them to workers.

Worker: registers itself with the master, sets up the process environment according to the application information sent by the master, and launches StandaloneExecutorBackend.
Client: registers the application with the master and monitors it. When the user creates a SparkContext, a SparkDeploySchedulerBackend is instantiated; instantiating SparkDeploySchedulerBackend in turn starts a client. The launch parameters and application information are passed to the client, which then asks the master to register the application and to start StandaloneExecutorBackend on the slave nodes.
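Here is a minimal, self-contained sketch of that message flow using plain Akka actors. The message names (RegisterWorker, RegisterApplication, RegisteredApplication) are simplified stand-ins for the real ones in org.apache.spark.deploy.DeployMessages, and the bookkeeping is deliberately toy-sized.

import akka.actor.{Actor, ActorSystem, Props}

// Simplified, hypothetical messages; the real ones live in
// org.apache.spark.deploy.DeployMessages.
case class RegisterWorker(workerId: String, cores: Int, memoryMb: Int)
case class RegisterApplication(appName: String)
case class RegisteredApplication(appId: String)

// A toy "master": it records worker registrations and assigns an id to each
// application, mirroring (in spirit) the FIFO bookkeeping described above.
class ToyMaster extends Actor {
  private var workers = Map.empty[String, (Int, Int)]
  private var nextAppId = 0

  def receive = {
    case RegisterWorker(id, cores, mem) =>
      workers += id -> (cores, mem)
    case RegisterApplication(_) =>
      nextAppId += 1
      sender() ! RegisteredApplication(s"app-$nextAppId")
  }
}

object ToyDeploy extends App {
  val system = ActorSystem("toy-deploy")
  val master = system.actorOf(Props[ToyMaster], "master")
  master ! RegisterWorker("worker-1", cores = 4, memoryMb = 4096)
  master ! RegisterApplication("my-app")  // a real client would wait for RegisteredApplication
  Thread.sleep(1000)
  system.shutdown()  // Akka 2.3.x API, the version bundled with Spark 1.x
}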

ClientArguments

Command-line parser for the driver client.

  var master: String = ""                             // master URL, e.g. spark://host:7077
  var jarUrl: String = ""                             // where the worker can fetch the driver jar
  var mainClass: String = ""                          // the driver's main class
  var supervise: Boolean = DEFAULT_SUPERVISE          // restart the driver on failure?
  var memory: Int = DEFAULT_MEMORY                    // driver memory (MB)
  var cores: Int = DEFAULT_CORES                      // cores for the driver
  private var _driverOptions = ListBuffer[String]()   // options passed through to the driver
  def driverOptions = _driverOptions.toSeq
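As a rough illustration (not the actual ClientArguments parser, and MiniClientArguments is a hypothetical name), these fields could be filled from arguments shaped roughly like launch <master> <jarUrl> <mainClass> [driver options]:

// Simplified illustration of how the fields above map onto command-line arguments.
object MiniClientArguments {
  def parse(args: List[String]): (String, String, String, Seq[String]) = args match {
    case "launch" :: master :: jarUrl :: mainClass :: driverOptions =>
      (master, jarUrl, mainClass, driverOptions)
    case _ =>
      sys.error("usage: launch <master> <jarUrl> <mainClass> [driver options]")
  }
}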

Command

private[spark] case class Command(
    mainClass: String,
    arguments: Seq[String],
    environment: Map[String, String],
    classPathEntries: Seq[String],
    libraryPathEntries: Seq[String],
    javaOpts: Seq[String]) {
}

ApplicationDescription

private[spark] class ApplicationDescription(
    val name: String,
    val maxCores: Option[Int],
    val memoryPerSlave: Int,
    val command: Command,
    var appUiUrl: String,
    val eventLogDir: Option[URI] = None,
    // short name of compression codec used when writing event logs, if any (e.g. lzf)
    val eventLogCodec: Option[String] = None)
  extends Serializable {
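A minimal sketch of how these two classes fit together: the scheduler backend builds a Command describing the executor backend process and wraps it in an ApplicationDescription, which the client later registers with the master. All values below are illustrative, not taken from a real configuration (CoarseGrainedExecutorBackend is the later name of the StandaloneExecutorBackend mentioned above).

// Illustrative values only; a real Command also carries placeholders that the
// worker substitutes at launch time.
val executorCommand = Command(
  mainClass = "org.apache.spark.executor.CoarseGrainedExecutorBackend",
  arguments = Seq.empty,
  environment = Map("SPARK_LOCAL_DIRS" -> "/tmp/spark"),
  classPathEntries = Seq.empty,
  libraryPathEntries = Seq.empty,
  javaOpts = Seq("-Xms512m"))

val appDesc = new ApplicationDescription(
  name = "my-app",
  maxCores = Some(8),        // None would mean "take whatever cores are available"
  memoryPerSlave = 1024,     // MB of memory per executor
  command = executorCommand,
  appUiUrl = "http://driver-host:4040")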

DriverDescription

private[spark] class DriverDescription(
    val jarUrl: String,
    val mem: Int,
    val cores: Int,
    val supervise: Boolean,
    val command: Command)
  extends Serializable {

ExecutorState

private[spark] object ExecutorState extends Enumeration {

  val LAUNCHING, LOADING, RUNNING, KILLED, FAILED, LOST, EXITED = Value

  type ExecutorState = Value

  def isFinished(state: ExecutorState): Boolean = Seq(KILLED, FAILED, LOST, EXITED).contains(state)
}
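For example, callers can use isFinished to check whether an executor has reached a terminal state:

ExecutorState.isFinished(ExecutorState.RUNNING)  // false: the executor is still alive
ExecutorState.isFinished(ExecutorState.EXITED)   // true: KILLED, FAILED, LOST and EXITED are terminal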

ExecutorDescription

private[spark] class ExecutorDescription(
    val appId: String,
    val execId: Int,
    val cores: Int,
    val state: ExecutorState.Value)
  extends Serializable {

SparkSubmitArguments

/**
 * Parses and encapsulates arguments from the spark-submit script.
 * The env argument is used for testing.
 */
private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, String] = sys.env) {

SparkSubmitDriverBootstrapper

/**
 * Launch an application through Spark submit in client mode with the appropriate classpath,
 * library paths, java options and memory. These properties of the JVM must be set before the
 * driver JVM is launched. The sole purpose of this class is to avoid handling the complexity
 * of parsing the properties file for such relevant configs in Bash.
 *
 * Usage: org.apache.spark.deploy.SparkSubmitDriverBootstrapper <submit args>
 */
private[spark] object SparkSubmitDriverBootstrapper {

JsonProtocol

Converts the various XxxInfo and XxxDescription objects into JSON.

private[spark] object JsonProtocol {
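As a rough sketch of what such a conversion looks like, assuming the json4s DSL that this protocol is built on; the field names and the set of write* methods are simplified here, not the exact output format.

import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods.{compact, render}

// Illustrative only; the real JsonProtocol defines one write* method per
// Info/Description class and may emit a different field set.
def writeApplicationDescription(desc: ApplicationDescription) = {
  ("name" -> desc.name) ~
  ("cores" -> desc.maxCores.getOrElse(0)) ~
  ("memoryperslave" -> desc.memoryPerSlave) ~
  ("command" -> desc.command.toString)
}

// compact(render(writeApplicationDescription(appDesc))) produces the JSON string
// for an ApplicationDescription such as the appDesc built earlier.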

Client

AppClientListener

/**
 * Callbacks invoked by deploy client when various events happen. There are currently four events:
 * connecting to the cluster, disconnecting, being given an executor, and having an executor
 * removed (either due to failure or due to revocation).
 *
 * Users of this API should *not* block inside the callback methods.
 */
private[spark] trait AppClientListener {
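A minimal listener might simply log each event. The callback names and signatures below follow the trait as found in Spark 1.x, but treat the exact parameter lists as an assumption; the real consumer of these callbacks is SparkDeploySchedulerBackend.

// Sketch of an AppClientListener implementation that only logs.
class LoggingAppClientListener extends AppClientListener {
  def connected(appId: String): Unit =
    println(s"Registered with master, application id = $appId")
  def disconnected(): Unit =
    println("Disconnected from master (may be transient during fail-over)")
  def dead(reason: String): Unit =
    println(s"Application marked dead: $reason")
  def executorAdded(fullId: String, workerId: String, hostPort: String,
                    cores: Int, memory: Int): Unit =
    println(s"Granted executor $fullId on $hostPort ($cores cores, $memory MB)")
  def executorRemoved(fullId: String, message: String, exitStatus: Option[Int]): Unit =
    println(s"Lost executor $fullId: $message")
}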

AppClient

/**
 * Interface allowing applications to speak with a Spark deploy cluster. Takes a master URL,
 * an app description, and a listener for cluster events, and calls back the listener when various
 * events occur.
 *
 * @param masterUrls Each url should look like spark://host:port.
 */
private[spark] class AppClient(
    actorSystem: ActorSystem,
    masterUrls: Array[String],
    appDescription: ApplicationDescription,
    listener: AppClientListener,
    conf: SparkConf)
  extends Logging {
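Putting the pieces together, a client-side sketch looks roughly like this, assuming AppClient's start()/stop() lifecycle and reusing the appDesc and listener sketched above. This is hypothetical wiring; in Spark itself SparkDeploySchedulerBackend performs the equivalent steps when the SparkContext is created.

import akka.actor.ActorSystem
import org.apache.spark.SparkConf

val conf = new SparkConf().setAppName("my-app")
val actorSystem = ActorSystem("driverClient")   // Spark builds this via its own AkkaUtils helper

val appClient = new AppClient(
  actorSystem,
  Array("spark://master-host:7077"),
  appDesc,                            // the ApplicationDescription sketched earlier
  new LoggingAppClientListener(),
  conf)

appClient.start()   // asks the master to register the application
// ... run the job ...
appClient.stop()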

ClientActor

/**
 * Proxy that relays messages to the driver.
 */
private class ClientActor(driverArgs: ClientArguments, conf: SparkConf)
  extends Actor with ActorLogReceive with Logging {

PythonRunner

/**
 * A main class used to launch Python applications. It executes python as a
 * subprocess and then has it connect back to the JVM to access system properties, etc.
 */
object PythonRunner {
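A simplified illustration of that idea (not PythonRunner itself): launch python as a subprocess, hand it the information it needs through an environment variable, and forward its output. The environment variable name used here is hypothetical.

import scala.collection.JavaConverters._

// Simplified stand-in for PythonRunner: run a script in a python subprocess
// and let it know (via a hypothetical env var) how to reach the JVM side.
def runPythonScript(script: String, gatewayPort: Int): Int = {
  val builder = new ProcessBuilder(Seq("python", script).asJava)
  builder.environment().put("JVM_GATEWAY_PORT", gatewayPort.toString)  // hypothetical name
  builder.redirectErrorStream(true)
  val process = builder.start()
  scala.io.Source.fromInputStream(process.getInputStream).getLines().foreach(println)
  process.waitFor()
}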