[Spark 36] Spark on YARN: yarn-client Deployment

axxbc123

Category: Spark. Tags: big data, java, ui

Article link: https://blog.csdn.net/axxbc123/article/details/84699419

According to Spark's deployment settings, there are four ways to run Spark on YARN (in essence only two), given here as master + deploy mode:

yarn-client+client
yarn-cluster+cluster
yarn-client (the deploy mode defaults to client)
yarn-cluster (the deploy mode defaults to cluster)

The combinations yarn-client+cluster and yarn-cluster+client are invalid; Spark reports an error and exits.
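The rule above can be sketched as a small check. This is a minimal sketch and not Spark's own validation code: `YarnModeCheck` and `resolveDeployMode` are hypothetical names, and the deploy mode is modelled as an `Option` that is empty when the user does not specify one.

```scala
// Hypothetical helper (not part of Spark): resolves the effective deploy mode
// for the two YARN master strings discussed in this article.
object YarnModeCheck {
  def resolveDeployMode(master: String, deployMode: Option[String]): Either[String, String] =
    (master, deployMode) match {
      // yarn-client, alone or with an explicit client deploy mode
      case ("yarn-client", None | Some("client")) => Right("client")
      // yarn-cluster, alone or with an explicit cluster deploy mode
      case ("yarn-cluster", None | Some("cluster")) => Right("cluster")
      // the two mixed combinations are rejected
      case ("yarn-client", Some("cluster")) | ("yarn-cluster", Some("client")) =>
        Left(s"Incompatible combination: master=$master, deploy mode=${deployMode.get}")
      // anything else is outside the scope of this sketch
      case _ =>
        Left(s"Not covered by this sketch: master=$master, deploy mode=$deployMode")
    }
}
```

For example, `YarnModeCheck.resolveDeployMode("yarn-client", None)` yields the client deploy mode, while passing `"yarn-client"` together with `Some("cluster")` yields a `Left` carrying an error message.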

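This article first walks through the code execution flow for the yarn-client+client deployment of Spark on YARN.

Submitting the program to the YARN runtime environment

When the deploy mode is client, the main function of SparkSubmit invokes the application's main method through reflection.
The application's main method creates a SparkContext instance.
While the SparkContext instance is being created, the following statement creates the scheduler and backend instances: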

  private[spark] var (schedulerBackend, taskScheduler) =  SparkContext.createTaskScheduler(this, master)

Because the current deployment is the yarn-client and client combination:

1. taskScheduler is an instance of org.apache.spark.scheduler.cluster.YarnClientClusterScheduler, a subclass of TaskSchedulerImpl. Its documentation comment reads:

/**
 * This scheduler launches executors through Yarn - by calling into Client to launch the Spark AM.
 */

2. schedulerBackend is an instance of org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend, a subclass of CoarseGrainedSchedulerBackend. It has no documentation comment.
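The selection described in the two points above can be sketched as follows. The classes here are empty stand-ins that mirror only the names and inheritance relationships mentioned in this article; the body of `createTaskScheduler` is a hypothetical simplification, not the code in SparkContext.

```scala
object SchedulerSelectionSketch {
  // Stand-ins: only the class names and the inheritance mirror the article.
  class TaskSchedulerImpl
  class YarnClientClusterScheduler extends TaskSchedulerImpl
  class CoarseGrainedSchedulerBackend
  class YarnClientSchedulerBackend extends CoarseGrainedSchedulerBackend

  // Hypothetical simplification: choose the backend and scheduler pair
  // from the master string, covering only the "yarn-client" case.
  def createTaskScheduler(master: String): (CoarseGrainedSchedulerBackend, TaskSchedulerImpl) =
    master match {
      case "yarn-client" =>
        (new YarnClientSchedulerBackend, new YarnClientClusterScheduler)
      case other =>
        throw new IllegalArgumentException(s"Master not covered by this sketch: $other")
    }
}
```

The pair is returned in the same order as in the statement quoted earlier, backend first and scheduler second.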

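As the creation of the SparkContext instance continues, it calls the start method of taskScheduler, that is, of YarnClientClusterScheduler. Because YarnClientClusterScheduler does not override the start method of TaskSchedulerImpl, execution enters TaskSchedulerImpl's start method.
TaskSchedulerImpl's start method calls the start method of backend. Since backend here is a YarnClientSchedulerBackend, execution enters YarnClientSchedulerBackend's start method.
YarnClientSchedulerBackend's start method creates the YARN client, which submits the user's application to YARN's ResourceManager.
Concretely, it creates a yarn.Client object, calls submitApplication, waits for the application to reach the RUNNING state, and then monitors it, as the following code shows: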

    client = new Client(args, conf) // yarn.Client
    appId = client.submitApplication()
    waitForApplication() // blocks until the application enters the RUNNING state
    asyncMonitorApplication() // monitors the application asynchronously; if the application has exited for any reason, stop the SparkContext

Execution now enters yarn.Client's submitApplication:

1. The submitApplication code (Spark)

  /**
   * Submit an application running our ApplicationMaster to the ResourceManager.
   *
   * The stable Yarn API provides a convenience method (YarnClient#createApplication) for
   * creating applications and setting up the application submission context. This was not
   * available in the alpha API.
   */
  // The application is submitted through the API that Hadoop YARN provides, namely Hadoop YARN's YarnClient.
  override def submitApplication(): ApplicationId = {
    // yarnClient is created via YarnClient.createYarnClient; YarnClient is a Hadoop API, so yarnClient is a Hadoop API object as well
    yarnClient.init(yarnConf)
    yarnClient.start() // start the YarnClient service

    logInfo("Requesting a new application from cluster with %d NodeManagers" // What is a NodeManager?
      .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

    // Get a new application from our RM
    // At this point nothing has actually been submitted to YARN to run; YarnClient only creates an application of type YarnClientApplication
    val newApp = yarnClient.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse() // returns a GetNewApplicationResponse
    val appId = newAppResponse.getApplicationId() // obtain the applicationId

    // Verify whether the cluster has enough resources for our AM
    verifyClusterResources(newAppResponse)

    // Set up the appropriate contexts to launch our AM
    val containerContext = createContainerLaunchContext(newAppResponse) // create the launch Ap…
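The listing above only names the resource check, so the following is a minimal sketch of what such a check might look like, assuming it compares the memory requested for the ApplicationMaster against the largest container the cluster will grant. The object name, the three parameters, and the error message are all hypothetical, and the values are plain integers in MB rather than the response object used in Spark.

```scala
object ResourceCheckSketch {
  // Hypothetical, simplified stand-in for a cluster resource check.
  // All values are in MB.
  def verifyClusterResources(maxContainerMemMb: Int, amMemoryMb: Int, amOverheadMb: Int): Unit = {
    val requiredMb = amMemoryMb + amOverheadMb
    if (requiredMb > maxContainerMemMb) {
      // Fail fast before anything is submitted to the ResourceManager.
      throw new IllegalArgumentException(
        s"Required AM memory (${amMemoryMb} + ${amOverheadMb} MB) is above the " +
          s"maximum container size (${maxContainerMemMb} MB) of this cluster.")
    }
  }
}
```

With a maximum container size of 8192 MB, a request of 512 MB plus 384 MB overhead passes silently, while the same request against a 512 MB maximum throws an IllegalArgumentException.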
