LoadingCache in the Spark HistoryServer

Previous posts analyzed how the Spark HistoryServer builds its web UI and parses data in the background. This post walks through how a web request is served on the backend, and the caching strategy the HistoryServer uses to speed up queries.

Binding routes
When the HistoryServer is instantiated, it binds the routes under the /api/v1/ prefix:

attachHandler(ApiRootResource.getServletHandler(this))

ApiRootResource defines the routing rules; accessing a route by URL returns the corresponding query result.
Apart from /api/v1/applications and /api/v1/applications/{appId}, which fetch their results directly from memory, most other routes obtain their results through a caching layer.
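As an illustration of the idea (not Spark's actual routing code; ApiRootResource uses JAX-RS annotations, and the handlers below are hypothetical), prefix-based dispatch from a URL to a query handler can be sketched as:

```scala
object RouteSketch {
  // Hypothetical routing table: longest prefixes first, each mapping the
  // remainder of the path to a response. Spark's real resource classes are
  // annotated JAX-RS endpoints, not a hand-rolled table like this.
  private val routes: Seq[(String, String => String)] = Seq(
    "/api/v1/applications/" -> (rest => s"details for app $rest"),
    "/api/v1/applications"  -> (_ => "all applications")
  )

  // Return the first handler whose prefix matches, applied to the path tail.
  def dispatch(path: String): Option[String] =
    routes.collectFirst {
      case (prefix, handler) if path.startsWith(prefix) =>
        handler(path.stripPrefix(prefix))
    }
}
```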

Querying the cache

def withSparkUI[T](appId: String, attemptId: Option[String])(f: SparkUI => T): T = {
  val appKey = attemptId.map(appId + "/" + _).getOrElse(appId)
  getSparkUI(appKey) match {
    case Some(ui) =>
      f(ui)
    case None => throw new NotFoundException("no such app: " + appId)
  }
}
def getSparkUI(appKey: String): Option[SparkUI] = {
  appCache.getSparkUI(appKey)
}
private val appCache = new ApplicationCache(this, retainedApplications, new SystemClock())
private[history] class ApplicationCache(
    val operations: ApplicationCacheOperations,
    val retainedApplications: Int,
    val clock: Clock) extends Logging {

  private val appLoader = new CacheLoader[CacheKey, CacheEntry] {
    /** the cache key doesn't match a cached entry, or the entry is out-of-date, so load it. */
    override def load(key: CacheKey): CacheEntry = {
      loadApplicationEntry(key.appId, key.attemptId)
    }

  }
  // ... (other members elided)
  protected val appCache: LoadingCache[CacheKey, CacheEntry] = {
    CacheBuilder.newBuilder()
        .maximumSize(retainedApplications)
        .removalListener(removalListener)
        .build(appLoader)
  }
  // ... (other members elided)

The ApplicationCache class holds an appCache member of type LoadingCache, a caching utility from Google's Guava library. With it, the result of the first query is cached; if a client sends the same URL request again, the cached result is returned directly, saving both resources and time.
A query goes through the withSparkUI interface exposed by UIRoot (which actually runs the appCache instance's getSparkUI method) and ultimately executes the lookupAndUpdate function, shown below:
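LoadingCache's core contract is: compute an entry via the CacheLoader on the first miss, serve subsequent hits from memory, and evict entries once maximumSize is exceeded. The behavior can be sketched with the Scala standard library alone (a simplified, single-threaded illustration, not Guava's implementation; Guava is concurrent and uses LRU-style eviction, while this sketch evicts the oldest insertion):

```scala
import scala.collection.mutable

// Simplified LoadingCache-style sketch: load on miss, evict oldest past maxSize.
class SimpleLoadingCache[K, V](maxSize: Int, loader: K => V) {
  private val entries = mutable.LinkedHashMap.empty[K, V]
  var loadCount = 0 // how many times the loader actually ran

  def get(key: K): V = entries.get(key) match {
    case Some(v) => v // cache hit: no load
    case None =>      // cache miss: load, evict if full, then store
      loadCount += 1
      val v = loader(key)
      if (entries.size >= maxSize) entries.remove(entries.head._1)
      entries.update(key, v)
      v
  }

  // Recompute an entry in place, mirroring LoadingCache.refresh.
  def refresh(key: K): Unit = entries.update(key, loader(key))
}
```

A second `get` for the same key returns the cached value without invoking the loader again, which is exactly why repeated requests for the same application UI are cheap.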

private def lookupAndUpdate(appId: String, attemptId: Option[String]): (CacheEntry, Boolean) = {
  metrics.lookupCount.inc()
  val cacheKey = CacheKey(appId, attemptId)
  var entry = appCache.getIfPresent(cacheKey)
  var updated = false
  if (entry == null) {
    // no entry, so fetch without any post-fetch probes for out-of-dateness
    // this will trigger a callback to loadApplicationEntry()
    entry = appCache.get(cacheKey)
  } else if (!entry.completed) {
    val now = clock.getTimeMillis()
    log.debug(s"Probing at time $now for updated application $cacheKey -> $entry")
    metrics.updateProbeCount.inc()
    updated = time(metrics.updateProbeTimer) {
      entry.updateProbe()
    }
    if (updated) {
      logDebug(s"refreshing $cacheKey")
      metrics.updateTriggeredCount.inc()
      appCache.refresh(cacheKey)
      // and repeat the lookup
      entry = appCache.get(cacheKey)
    } else {
      // update the probe timestamp to the current time
      entry.probeTime = now
    }
  }
  (entry, updated)
}

This function first calls appCache.getIfPresent(cacheKey). If the cache holds an entry, it then checks whether the application has completed: a completed entry is returned directly, while an incomplete one is probed for updates, and if updates exist the entry is refreshed before being returned; otherwise it is returned as-is. If the cache holds no entry, the value is computed and returned.
The computation is simply a call to the load function of the CacheLoader (the appLoader instance), which in turn calls loadApplicationEntry:

def loadApplicationEntry(appId: String, attemptId: Option[String]): CacheEntry = {
  logDebug(s"Loading application Entry $appId/$attemptId")
  metrics.loadCount.inc()
  time(metrics.loadTimer) {
    operations.getAppUI(appId, attemptId) match {
      // ... (the match cases and the rest of the function are elided)

Inside loadApplicationEntry, getAppUI is invoked on operations (which here is the HistoryServer itself):

override def getAppUI(appId: String, attemptId: Option[String]): Option[LoadedAppUI] = {
  provider.getAppUI(appId, attemptId)
}

This delegates one level further: the provider instance is actually an FsHistoryProvider, whose relevant code is:

override def getAppUI(appId: String, attemptId: Option[String]): Option[LoadedAppUI] = {
  try {
    applications.get(appId).flatMap { appInfo =>
      appInfo.attempts.find(_.attemptId == attemptId).flatMap { attempt =>
        val replayBus = new ReplayListenerBus()
        val ui = {
          val conf = this.conf.clone()
          val appSecManager = new SecurityManager(conf)
          SparkUI.createHistoryUI(conf, replayBus, appSecManager, appInfo.name,
            HistoryServer.getAttemptURI(appId, attempt.attemptId), attempt.startTime)
          // Do not call ui.bind() to avoid creating a new server for each application
        }
        val fileStatus = fs.getFileStatus(new Path(logDir, attempt.logPath))
        val appListener = replay(fileStatus, isApplicationCompleted(fileStatus), replayBus)
        if (appListener.appId.isDefined) {
          val uiAclsEnabled = conf.getBoolean("spark.history.ui.acls.enable", false)
          ui.getSecurityManager.setAcls(uiAclsEnabled)
          // make sure to set admin acls before view acls so they are properly picked up
          ui.getSecurityManager.setAdminAcls(appListener.adminAcls.getOrElse(""))
          ui.getSecurityManager.setViewAcls(attempt.sparkUser,
            appListener.viewAcls.getOrElse(""))
          ui.getSecurityManager.setAdminAclsGroups(appListener.adminAclsGroups.getOrElse(""))
          ui.getSecurityManager.setViewAclsGroups(appListener.viewAclsGroups.getOrElse(""))
          Some(LoadedAppUI(ui, updateProbe(appId, attemptId, attempt.fileSize)))
        } else {
          None
        }
      }
    }
  } catch {
    case e: FileNotFoundException => None
  }
}
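Note the updateProbe(appId, attemptId, attempt.fileSize) closure wrapped into LoadedAppUI above: it is what later lets lookupAndUpdate ask whether an incomplete application's log has changed. A hedged sketch of the idea (hypothetical names; the real probe compares the file size recorded at load time against the current size of the event log on the file system):

```scala
object ProbeSketch {
  // Build a probe that reports "updated" once the observed size exceeds the
  // size recorded when the entry was loaded. File access is replaced by a
  // passed-in function so the sketch stays self-contained.
  def makeUpdateProbe(sizeAtLoad: Long, currentSize: () => Long): () => Boolean =
    () => currentSize() > sizeAtLoad
}
```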

In this function, SparkUI.createHistoryUI builds the relevant web pages and registers each module's listeners (EnvironmentListener, StorageStatusListener, ExecutorsListener, StorageListener, RDDOperationGraphListener, and so on) with the SparkListenerBus.
The replay function is then called: it parses the event log file entry by entry and notifies the listeners registered on the SparkListenerBus, so each module can pick up the data it needs for display. Finally, the constructed UI is returned and used to render the web front end.
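The replay mechanism described above can be reduced to a minimal sketch (hypothetical names; Spark's ReplayListenerBus actually deserializes JSON-encoded SparkListenerEvents, whereas events here are plain strings for illustration):

```scala
// A listener that reacts to each replayed event.
trait SketchListener {
  def onEvent(event: String): Unit
}

// Minimal replay bus: feed every logged event line to every registered
// listener, in order. Each listener keeps only the data it cares about.
class SketchReplayBus {
  private var listeners = List.empty[SketchListener]

  def addListener(l: SketchListener): Unit = listeners ::= l

  def replay(eventLogLines: Iterator[String]): Unit =
    eventLogLines.foreach(line => listeners.foreach(_.onEvent(line)))
}
```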

This completes the flow from URL request to returned data.
