LoadingCache in the Spark HistoryServer

Previous posts analyzed how the Spark HistoryServer builds its web UI and parses data in the background. This post walks through how a web request is served on the backend, and the caching strategy the HistoryServer uses to speed up queries.

Binding routes
When the HistoryServer is instantiated, it binds the routes under the /api/v1/ prefix:

attachHandler(ApiRootResource.getServletHandler(this))

ApiRootResource defines the routing rules; accessing a route by URL returns the corresponding query result.
Apart from /api/v1/applications and /api/v1/applications/{appId}, which fetch their results directly from memory, most other routes obtain their results through a caching layer.
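As an illustration of the idea (not Spark's actual routing code; ApiRootResource uses JAX-RS annotations, and the handlers below are hypothetical), prefix-based dispatch from a URL to a query handler can be sketched as:

```scala
object RouteSketch {
  // Hypothetical routing table: longest prefixes first, each mapping the
  // remainder of the path to a response. Spark's real resource classes are
  // annotated JAX-RS endpoints, not a hand-rolled table like this.
  private val routes: Seq[(String, String => String)] = Seq(
    "/api/v1/applications/" -> (rest => s"details for app $rest"),
    "/api/v1/applications"  -> (_ => "all applications")
  )

  // Return the first handler whose prefix matches, applied to the path tail.
  def dispatch(path: String): Option[String] =
    routes.collectFirst {
      case (prefix, handler) if path.startsWith(prefix) =>
        handler(path.stripPrefix(prefix))
    }
}
```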

Querying the cache

def withSparkUI[T](appId: String, attemptId: Option[String])(f: SparkUI => T): T = {
  val appKey = attemptId.map(appId + "/" + _).getOrElse(appId)
  getSparkUI(appKey) match {
    case Some(ui) =>
      f(ui)
    case None => throw new NotFoundException("no such app: " + appId)
  }
}
def getSparkUI(appKey: String): Option[SparkUI] = {
  appCache.getSparkUI(appKey)
}
private val appCache = new ApplicationCache(this, retainedApplications, new SystemClock())
private[history] class ApplicationCache(
    val operations: ApplicationCacheOperations,
    val retainedApplications: Int,
    val clock: Clock) extends Logging {

  private val appLoader = new CacheLoader[CacheKey, CacheEntry] {
    /** the cache key doesn't match a cached entry, or the entry is out-of-date, so load it. */
    override def load(key: CacheKey): CacheEntry = {
      loadApplicationEntry(key.appId, key.attemptId)
    }

  }
  // ... (other members elided)
  protected val appCache: LoadingCache[CacheKey, CacheEntry] = {
    CacheBuilder.newBuilder()
        .maximumSize(retainedApplications)
        .removalListener(removalListener)
        .build(appLoader)
  }
  // ... (other members elided)

The ApplicationCache class holds an appCache member of type LoadingCache, a caching utility from Google's Guava library. With it, the result of the first query is cached; if a client sends the same URL request again, the cached result is returned directly, saving both resources and time.
A query goes through the withSparkUI interface exposed by UIRoot (which actually runs the appCache instance's getSparkUI method) and ultimately executes the lookupAndUpdate function, shown below:
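LoadingCache's core contract is: compute an entry via the CacheLoader on the first miss, serve subsequent hits from memory, and evict entries once maximumSize is exceeded. The behavior can be sketched with the Scala standard library alone (a simplified, single-threaded illustration, not Guava's implementation; Guava is concurrent and uses LRU-style eviction, while this sketch evicts the oldest insertion):

```scala
import scala.collection.mutable

// Simplified LoadingCache-style sketch: load on miss, evict oldest past maxSize.
class SimpleLoadingCache[K, V](maxSize: Int, loader: K => V) {
  private val entries = mutable.LinkedHashMap.empty[K, V]
  var loadCount = 0 // how many times the loader actually ran

  def get(key: K): V = entries.get(key) match {
    case Some(v) => v // cache hit: no load
    case None =>      // cache miss: load, evict if full, then store
      loadCount += 1
      val v = loader(key)
      if (entries.size >= maxSize) entries.remove(entries.head._1)
      entries.update(key, v)
      v
  }

  // Recompute an entry in place, mirroring LoadingCache.refresh.
  def refresh(key: K): Unit = entries.update(key, loader(key))
}
```

A second `get` for the same key returns the cached value without invoking the loader again, which is exactly why repeated requests for the same application UI are cheap.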

private def lookupAndUpdate(appId: String, attemptId: Option[String]): (CacheEntry, Boolean) = {
  metrics.lookupCount.inc()
  val cacheKey = CacheKey(appId, attemptId)
  var entry = appCache.getIfPresent(cacheKey)
  var updated = false
  if (entry == null) {
    // no entry, so fetch without any post-fetch probes for out-of-dateness
    // this will trigger a callback to loadApplicationEntry()
    entry = appCache.get(cacheKey)
  } else if (!entry.completed) {
    val now = clock.getTimeMillis()
    log.debug(s"Probing at time $now for updated application $cacheKey -> $entry")
    metrics.updateProbeCount.inc()
    updated = time(metrics.updateProbeTimer) {
      entry.updateProbe()
    }
    if (updated) {
      logDebug(s"refreshing $cacheKey")
      metrics.updateTriggeredCount.inc()
      appCache.refresh(cacheKey)
      // and repeat the lookup
      entry = appCache.get(cacheKey)
    } else {
      // update the probe timestamp to the current time
      entry.probeTime = now
    }
  }
  (entry, updated)
}

This function first calls appCache.getIfPresent(cacheKey). If the cache holds an entry, it then checks whether the application has completed: a completed entry is returned directly, while an incomplete one is probed for updates, and if updates exist the entry is refreshed before being returned; otherwise it is returned as-is. If the cache holds no entry, the value is computed and returned.
The computation is simply a call to the load function of the CacheLoader (the appLoader instance), which in turn calls loadApplicationEntry:

def loadApplicationEntry(appId: String, attemptId: Option[String]): CacheEntry = {
  logDebug(s"Loading application Entry $appId/$attemptId")
  metrics.loadCount.inc()
  time(metrics.loadTimer) {
    operations.getAppUI(appId, attemptId) match {
      // ... (the match cases and the rest of the function are elided)

Inside loadApplicationEntry, getAppUI is invoked on operations (which here is the HistoryServer itself):

override def getAppUI(appId: String, attemptId: Option[String]): Option[LoadedAppUI] = {
  provider.getAppUI(appId, attemptId)
}

This delegates one level further: the provider instance is actually an FsHistoryProvider, whose relevant code is:

override def getAppUI(appId: String, attemptId: Option[String]): Option[LoadedAppUI] = {
  try {
    applications.get(appId).flatMap { appInfo =>
      appInfo.attempts.find(_.attemptId == attemptId).flatMap { attempt =>
        val replayBus = new ReplayListenerBus()
        val ui = {
          val conf = this.conf.clone()
          val appSecManager = new SecurityManager(conf)
          SparkUI.createHistoryUI(conf, replayBus, appSecManager, appInfo.name,
            HistoryServer.getAttemptURI(appId, attempt.attemptId), attempt.startTime)
          // Do not call ui.bind() to avoid creating a new server for each application
        }
        val fileStatus = fs.getFileStatus(new Path(logDir, attempt.logPath))
        val appListener = replay(fileStatus, isApplicationCompleted(fileStatus), replayBus)
        if (appListener.appId.isDefined) {
          val uiAclsEnabled = conf.getBoolean("spark.history.ui.acls.enable", false)
          ui.getSecurityManager.setAcls(uiAclsEnabled)
          // make sure to set admin acls before view acls so they are properly picked up
          ui.getSecurityManager.setAdminAcls(appListener.adminAcls.getOrElse(""))
          ui.getSecurityManager.setViewAcls(attempt.sparkUser,
            appListener.viewAcls.getOrElse(""))
          ui.getSecurityManager.setAdminAclsGroups(appListener.adminAclsGroups.getOrElse(""))
          ui.getSecurityManager.setViewAclsGroups(appListener.viewAclsGroups.getOrElse(""))
          Some(LoadedAppUI(ui, updateProbe(appId, attemptId, attempt.fileSize)))
        } else {
          None
        }
      }
    }
  } catch {
    case e: FileNotFoundException => None
  }
}
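Note the updateProbe(appId, attemptId, attempt.fileSize) closure wrapped into LoadedAppUI above: it is what later lets lookupAndUpdate ask whether an incomplete application's log has changed. A hedged sketch of the idea (hypothetical names; the real probe compares the file size recorded at load time against the current size of the event log on the file system):

```scala
object ProbeSketch {
  // Build a probe that reports "updated" once the observed size exceeds the
  // size recorded when the entry was loaded. File access is replaced by a
  // passed-in function so the sketch stays self-contained.
  def makeUpdateProbe(sizeAtLoad: Long, currentSize: () => Long): () => Boolean =
    () => currentSize() > sizeAtLoad
}
```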

In this function, SparkUI.createHistoryUI builds the relevant web pages and registers each module's listeners (EnvironmentListener, StorageStatusListener, ExecutorsListener, StorageListener, RDDOperationGraphListener, and so on) with the SparkListenerBus.
The replay function is then called: it parses the event log file entry by entry and notifies the listeners registered on the SparkListenerBus, so each module can pick up the data it needs for display. Finally, the constructed UI is returned and used to render the web front end.
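The replay mechanism described above can be reduced to a minimal sketch (hypothetical names; Spark's ReplayListenerBus actually deserializes JSON-encoded SparkListenerEvents, whereas events here are plain strings for illustration):

```scala
// A listener that reacts to each replayed event.
trait SketchListener {
  def onEvent(event: String): Unit
}

// Minimal replay bus: feed every logged event line to every registered
// listener, in order. Each listener keeps only the data it cares about.
class SketchReplayBus {
  private var listeners = List.empty[SketchListener]

  def addListener(l: SketchListener): Unit = listeners ::= l

  def replay(eventLogLines: Iterator[String]): Unit =
    eventLogLines.foreach(line => listeners.foreach(_.onEvent(line)))
}
```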

This completes the flow from URL request to returned data.
