How Does Spark Implement Remote Thread Dumps?

Latest recommended article published 2023-10-09 14:46:07

wankunde

Article link: https://blog.csdn.net/wankunde/article/details/89792008

As usual, the investigation starts from the web UI page, ExecutorThreadDumpPage. The actual entry point for producing a thread dump is the getExecutorThreadDump method of SparkContext.

// 1. Entry point: the executor thread dump page reads executorId from the request
//    and asks SparkContext for that executor's thread dump
private[ui] class ExecutorThreadDumpPage(
    parent: SparkUITab,
    sc: Option[SparkContext]) extends WebUIPage("threadDump") {

  def render(request: HttpServletRequest): Seq[Node] = {
    val executorId =
      Option(UIUtils.stripXSS(request.getParameter("executorId"))).map { executorId =>
        UIUtils.decodeURLParameter(executorId)
      }.getOrElse {
        throw new IllegalArgumentException("Missing executorId parameter")
      }
    val time = System.currentTimeMillis()
    val maybeThreadDump = sc.get.getExecutorThreadDump(executorId)
    // ... (HTML rendering of maybeThreadDump omitted)
  }
}
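
The page's parameter handling can be tried outside Spark. The sketch below is a hypothetical stand-in for the executorId parsing: the object ExecutorIdParam is ours and skips the XSS stripping. It also assumes, rather than quotes, that UIUtils.decodeURLParameter keeps decoding until the value stops changing. That is one way to cope with a parameter that was URL-encoded twice.

```scala
import java.net.URLDecoder

object ExecutorIdParam {
  // Decode until the string is stable: "exec%252F1" -> "exec%2F1" -> "exec/1".
  // Note that URLDecoder also turns '+' into a space.
  def decodeFully(raw: String): String = {
    var current = raw
    var decoded = URLDecoder.decode(current, "UTF-8")
    while (decoded != current) {
      current = decoded
      decoded = URLDecoder.decode(current, "UTF-8")
    }
    current
  }

  // A missing parameter (getParameter returns null) becomes an
  // IllegalArgumentException, following the Option(...).getOrElse(throw ...) pattern.
  def parse(raw: String): String =
    Option(raw).map(decodeFully).getOrElse {
      throw new IllegalArgumentException("Missing executorId parameter")
    }
}
```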

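The basic idea is to obtain ThreadInfo objects from the JVM's ThreadMXBean and post-process them into a more readable form. For a driver-side dump, this method is called directly in the driver process. For an executor-side dump, the driver makes an RPC call, the executor runs the same method locally, and the executor sends the result back.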
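
The JDK side of this idea needs nothing from Spark. The following is a minimal sketch in which the object LocalThreadDump and its one-line output format are our own, not Spark's. It asks the platform ThreadMXBean for every live thread, passing the same (true, true) flags Spark passes, and turns each ThreadInfo into a compact readable line.

```scala
import java.lang.management.{ManagementFactory, ThreadInfo}

object LocalThreadDump {
  // lockedMonitors = true, lockedSynchronizers = true: also report held locks.
  def dump(): Seq[ThreadInfo] =
    ManagementFactory.getThreadMXBean
      .dumpAllThreads(true, true)
      .filter(_ != null) // defensive, as in Spark's Utils.getThreadDump
      .sortBy(_.getThreadId)
      .toSeq

  // One line per thread: id, quoted name, state and the top stack frame if any.
  def summary(): Seq[String] =
    dump().map { t =>
      val top = t.getStackTrace.headOption.map(_.toString).getOrElse("<no frames>")
      s"""${t.getThreadId} "${t.getThreadName}" ${t.getThreadState} at $top"""
    }
}
```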


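  // 2. If executorId refers to the driver itself, call Utils.getThreadDump() directly; otherwise
  //    send a TriggerThreadDump RPC to the corresponding executor endpointRef. On receiving the
  //    request, the executor endpoint also just calls Utils.getThreadDump().
  /**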
   * Called by the web UI to obtain executor thread dumps.  This method may be expensive.
   * Logs an error and returns None if we failed to obtain a thread dump, which could occur due
   * to an executor being dead or unresponsive or due to network issues while sending the thread
   * dump message back to the driver.
   */
  private[spark] def getExecutorThreadDump(executorId: String): Option[Array[ThreadStackTrace]] = {
    try {
      if (executorId == SparkContext.DRIVER_IDENTIFIER) {
        Some(Utils.getThreadDump())
      } else {
        val endpointRef = env.blockManager.master.getExecutorEndpointRef(executorId).get
        Some(endpointRef.askSync[Array[ThreadStackTrace]](TriggerThreadDump))
      }
    } catch {
      case e: Exception =>
        logError(s"Exception getting thread dump from executor $executorId", e)
        None
    }
  }
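
With Spark's types stripped away, getExecutorThreadDump is a dispatch between a local path and a remote path, and any failure becomes None. Several things in the sketch below are assumptions made for illustration. DumpDispatch is a hypothetical name. The askExecutor function parameter stands in for endpointRef.askSync. Plain thread names replace ThreadStackTrace. Only the "driver" id matches Spark, whose SparkContext.DRIVER_IDENTIFIER has that value.

```scala
import java.lang.management.ManagementFactory

object DumpDispatch {
  val DriverId = "driver" // same value as SparkContext.DRIVER_IDENTIFIER

  // Local path: names of all live threads in this JVM.
  def localThreadNames(): Seq[String] =
    ManagementFactory.getThreadMXBean.dumpAllThreads(false, false)
      .filter(_ != null).map(_.getThreadName).toSeq

  // The remote path is injected as a function. A dead executor or a network
  // error surfaces as an exception and is turned into None, as in Spark.
  def dump(executorId: String, askExecutor: String => Seq[String]): Option[Seq[String]] =
    try {
      if (executorId == DriverId) Some(localThreadNames())
      else Some(askExecutor(executorId))
    } catch {
      case e: Exception =>
        System.err.println(s"Exception getting thread dump from executor $executorId: $e")
        None
    }
}
```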

  // The remote RPC call: the message is declared in BlockManagerMessages.scala
  /**
   * Driver to Executor message to trigger a thread dump.
   */
  case object TriggerThreadDump extends ToBlockManagerSlave

  // BlockManagerSlaveEndpoint.scala, inside receiveAndReply
  case TriggerThreadDump =>
    context.reply(Utils.getThreadDump())
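
The TriggerThreadDump exchange follows an ordinary ask/reply pattern. It has three parts: a message type that the endpoint pattern-matches on, a handler that computes the reply, and a caller that blocks with a timeout. The in-process sketch below imitates that shape with scala.concurrent futures. The names MiniRpc, WorkerEndpoint and Ping, and the 10-second default timeout, are hypothetical. The sketch does not model Spark's actual RpcEnv, serialization or network transport.

```scala
import java.lang.management.ManagementFactory
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object MiniRpc {
  // Messages the "worker" understands, loosely like ToBlockManagerSlave.
  sealed trait ToWorker
  case object TriggerDump extends ToWorker
  case object Ping extends ToWorker // deliberately left unhandled

  // The endpoint runs its handler asynchronously and completes a Future
  // with the reply. Unknown messages fail the Future.
  final class WorkerEndpoint(handler: PartialFunction[ToWorker, Seq[String]]) {
    def ask(msg: ToWorker): Future[Seq[String]] = Future {
      handler.applyOrElse(msg, (m: ToWorker) =>
        throw new IllegalArgumentException(s"Unhandled message: $m"))
    }
  }

  // Worker side: reply to TriggerDump with a local thread dump (names only).
  val endpoint = new WorkerEndpoint({
    case TriggerDump =>
      ManagementFactory.getThreadMXBean.dumpAllThreads(false, false)
        .filter(_ != null).map(_.getThreadName).toSeq
  })

  // Caller side: block for the reply, like askSync. Await.result rethrows failures.
  def askSync(ep: WorkerEndpoint, msg: ToWorker, timeout: FiniteDuration = 10.seconds): Seq[String] =
    Await.result(ep.ask(msg), timeout)
}
```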

  // 3. (Utils.scala) Get all threads from ThreadMXBean, then make each one easier to read
  //    with threadInfoToThreadStackTrace()
  /** Return a thread dump of all threads' stacktraces.  Used to capture dumps for the web UI */
  def getThreadDump(): Array[ThreadStackTrace] = {
    // We need to filter out null values here because dumpAllThreads() may return null array
    // elements for threads that are dead / don't exist.
    val threadInfos = ManagementFactory.getThreadMXBean.dumpAllThreads(true, true).filter(_ != null)
    threadInfos.sortBy(_.getThreadId).map(threadInfoToThreadStackTrace)
  }

  def getThreadDumpForThread(threadId: Long): Option[ThreadStackTrace] = {
    if (threadId <= 0) {
      None
    } else {
      // The Int.MaxValue here requests the entire untruncated stack trace of the thread:
      val threadInfo =
        Option(ManagementFactory.getThreadMXBean.getThreadInfo(threadId, Int.MaxValue))
      threadInfo.map(threadInfoToThreadStackTrace)
    }
  }
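
getThreadDumpForThread relies on two documented behaviours of ThreadMXBean.getThreadInfo(id, maxDepth). The method throws IllegalArgumentException for an id that is not positive, which explains the threadId <= 0 guard. It returns null when the thread is no longer alive, which explains the Option(...) wrapper. Here is a minimal standalone sketch of the same guards, using a hypothetical SingleThreadDump object that returns the raw frames as text.

```scala
import java.lang.management.ManagementFactory

object SingleThreadDump {
  // Int.MaxValue asks for the full, untruncated stack.
  def stackFor(threadId: Long): Option[String] =
    if (threadId <= 0) None // getThreadInfo would throw IllegalArgumentException
    else
      Option(ManagementFactory.getThreadMXBean.getThreadInfo(threadId, Int.MaxValue))
        .map(_.getStackTrace.mkString("\n")) // null (dead or unknown thread) -> None
}
```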

  // 4. Convert a ThreadInfo object into a ThreadStackTrace object
  private def threadInfoToThreadStackTrace(threadInfo: ThreadInfo): ThreadStackTrace = {
    val monitors = threadInfo.getLockedMonitors.map(m => m.getLockedStackFrame -> m).toMap
    val stackTrace = threadInfo.getStackTrace.map { frame =>
      monitors.get(frame) match {
        case Some(monitor) =>
          monitor.getLockedStackFrame.toString + s" => holding ${monitor.lockString}"
        case None =>
          frame.toString
      }
    }.mkString("\n")

    // use a set to dedup re-entrant locks that are held at multiple places
    val heldLocks =
      (threadInfo.getLockedSynchronizers ++ threadInfo.getLockedMonitors).map(_.lockString).toSet

    ThreadStackTrace(
      threadId = threadInfo.getThreadId,
      threadName = threadInfo.getThreadName,
      threadState = threadInfo.getThreadState,
      stackTrace = stackTrace,
      blockedByThreadId =
        if (threadInfo.getLockOwnerId < 0) None else Some(threadInfo.getLockOwnerId),
      blockedByLock = Option(threadInfo.getLockInfo).map(_.lockString).getOrElse(""),
      holdingLocks = heldLocks.toSeq)
  }
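
The interesting part of threadInfoToThreadStackTrace is matching each held monitor to the frame that acquired it, which is what produces the "=> holding ..." suffix. The sketch below reproduces that idea standalone, with two deliberate differences. First, it keys monitors by MonitorInfo.getLockedStackDepth (the frame index) instead of by StackTraceElement. Second, its label format for a monitor is our own, because Spark's lockString helper is not shown here. LockAnnotation, monitorString and dumpWhileHolding are hypothetical names.

```scala
import java.lang.management.{LockInfo, ManagementFactory, ThreadInfo}

object LockAnnotation {
  // Our own label format for a monitor (hypothetical; not Spark's lockString).
  def monitorString(lock: LockInfo): String =
    s"Monitor(${lock.getClassName}@${Integer.toHexString(lock.getIdentityHashCode)})"

  // Append "=> holding ..." to every frame in which a monitor was acquired.
  def annotate(info: ThreadInfo): String = {
    val byDepth = info.getLockedMonitors.map(m => m.getLockedStackDepth -> m).toMap
    info.getStackTrace.zipWithIndex.map { case (frame, depth) =>
      byDepth.get(depth) match {
        case Some(m) => s"$frame => holding ${monitorString(m)}"
        case None    => frame.toString
      }
    }.mkString("\n")
  }

  // Dump the current thread, with locked monitors, while it holds `lock`.
  def dumpWhileHolding(lock: AnyRef): String = lock.synchronized {
    val info = ManagementFactory.getThreadMXBean
      .getThreadInfo(Array(Thread.currentThread.getId), true, false)(0)
    annotate(info)
  }
}
```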

private[spark] case class ThreadStackTrace(
  threadId: Long,
  threadName: String,
  threadState: Thread.State,
  stackTrace: String,
  blockedByThreadId: Option[Long],
  blockedByLock: String,
  holdingLocks: Seq[String])
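
ThreadStackTrace keeps blockedByThreadId, so a consumer can answer "who is waiting on whom" without re-parsing stack text. Spark's class is private[spark]. The sketch below therefore declares a trimmed local copy under the hypothetical name Trace, keeping only the fields it needs, and builds the blocked-thread edges from a hand-made sample. The sample data is invented for illustration.

```scala
object WaitsFor {
  // Trimmed, hypothetical copy of the relevant ThreadStackTrace fields.
  final case class Trace(
      threadId: Long,
      threadName: String,
      threadState: Thread.State,
      blockedByThreadId: Option[Long],
      blockedByLock: String)

  // For every thread blocked by another thread, describe the edge.
  def edges(traces: Seq[Trace]): Seq[String] = {
    val names = traces.map(t => t.threadId -> t.threadName).toMap
    traces.flatMap { t =>
      t.blockedByThreadId.map { owner =>
        val ownerName = names.getOrElse(owner, s"thread $owner")
        s"${t.threadName} (${t.threadState}) waits on ${t.blockedByLock} held by $ownerName"
      }
    }
  }

  // Invented sample: two threads blocked on each other's monitors, one idle thread.
  val sample = Seq(
    Trace(1L, "task-a", Thread.State.BLOCKED, Some(2L), "Monitor(java.lang.Object@1f)"),
    Trace(2L, "task-b", Thread.State.BLOCKED, Some(1L), "Monitor(java.lang.Object@2e)"),
    Trace(3L, "idle", Thread.State.WAITING, None, ""))
}
```

A pair of edges pointing at each other, as in the sample, is the signature of a deadlock.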
