What Utils.getCallSite() does in Spark

Without further ado, here is the source code:

def getCallSite(skipClass: String => Boolean = sparkInternalExclusionFunction): CallSite = {
  // Keep crawling up the stack trace until we find the first function not inside of the spark
  // package. We track the last (shallowest) contiguous Spark method. This might be an RDD
  // transformation, a SparkContext function (such as parallelize), or anything else that leads
  // to instantiation of an RDD. We also track the first (deepest) user method, file, and line.
  var lastSparkMethod = "<unknown>"
  var firstUserFile = "<unknown>"
  var firstUserLine = 0
  var insideSpark = true
  var callStack = new ArrayBuffer[String]() :+ "<unknown>"

  Thread.currentThread.getStackTrace().foreach { ste: StackTraceElement =>
    // When running under some profilers, the current stack trace might contain some bogus
    // frames. This is intended to ensure that we don't crash in these situations by
    // ignoring any frames that we can't examine.
    if (ste != null && ste.getMethodName != null
      && !ste.getMethodName.contains("getStackTrace")) {
      if (insideSpark) {
        if (skipClass(ste.getClassName)) {
          lastSparkMethod = if (ste.getMethodName == "<init>") {
            // Spark method is a constructor; get its class name
            ste.getClassName.substring(ste.getClassName.lastIndexOf('.') + 1)
          } else {
            ste.getMethodName
          }
          callStack(0) = ste.toString // Put last Spark method on top of the stack trace.
        } else {
          if (ste.getFileName != null) {
            firstUserFile = ste.getFileName
            if (ste.getLineNumber >= 0) {
              firstUserLine = ste.getLineNumber
            }
          }
          callStack += ste.toString
          insideSpark = false
        }
      } else {
        callStack += ste.toString
      }
    }
  }

  val callStackDepth = System.getProperty("spark.callstack.depth", "20").toInt
  val shortForm =
    if (firstUserFile == "HiveSessionImpl.java") {
      // To be more user friendly, show a nicer string for queries submitted from the JDBC
      // server.
      "Spark JDBC Server Query"
    } else {
      s"$lastSparkMethod at $firstUserFile:$firstUserLine"
    }
  val longForm = callStack.take(callStackDepth).mkString("\n")

  CallSite(shortForm, longForm)
}
 
First of all, this method returns a CallSite object. CallSite is a case class defined in the same source file as Utils (Utils.scala). Here is its source:
 
/** CallSite represents a place in user code. It can have a short and a long form. */
private[spark] case class CallSite(shortForm: String, longForm: String)

private[spark] object CallSite {
  val SHORT_FORM = "callSite.short"
  val LONG_FORM = "callSite.long"
  val empty = CallSite("", "")
}
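
To make the split between the two fields and the two constants concrete, here is a minimal standalone sketch. It copies the definition above without the private[spark] modifier; the toProperties helper and the sample strings are made up for illustration and are not part of Spark. It assumes the constants are meant to be used as property keys, which is what their values suggest.

```scala
import java.util.Properties

object CallSiteSketch {
  // Standalone copy of the case class shown above (private[spark] dropped).
  case class CallSite(shortForm: String, longForm: String)

  object CallSite {
    val SHORT_FORM = "callSite.short"
    val LONG_FORM = "callSite.long"
    val empty: CallSite = CallSite("", "")
  }

  // Hypothetical helper (not a Spark API): store both forms under the
  // keys defined in the companion object.
  def toProperties(site: CallSite): Properties = {
    val props = new Properties()
    props.setProperty(CallSite.SHORT_FORM, site.shortForm)
    props.setProperty(CallSite.LONG_FORM, site.longForm)
    props
  }

  def main(args: Array[String]): Unit = {
    // Sample values are invented for the demo.
    val site = CallSite("map at MyJob.scala:12", "frame 1\nframe 2")
    val props = toProperties(site)
    println(props.getProperty(CallSite.SHORT_FORM)) // prints "map at MyJob.scala:12"
  }
}
```

The data lives in shortForm and longForm; SHORT_FORM and LONG_FORM only name where that data is stored.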

 
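CallSite is a case class. Case classes are typically used as data carriers, much like a VO (value object) in Java. This one holds two fields, shortForm and longForm, literally a short format and a long format; SHORT_FORM and LONG_FORM in the companion object are only the string keys "callSite.short" and "callSite.long".
So what exactly do the two fields contain? Look at the loop inside getCallSite():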
Thread.currentThread.getStackTrace().foreach { ste: StackTraceElement =>
  // When running under some profilers, the current stack trace might contain some bogus
  // frames. This is intended to ensure that we don't crash in these situations by
  // ignoring any frames that we can't examine.
  if (ste != null && ste.getMethodName != null
    && !ste.getMethodName.contains("getStackTrace")) {
    if (insideSpark) {
      if (skipClass(ste.getClassName)) {
        lastSparkMethod = if (ste.getMethodName == "<init>") {
          // Spark method is a constructor; get its class name
          ste.getClassName.substring(ste.getClassName.lastIndexOf('.') + 1)
        } else {
          ste.getMethodName
        }
        callStack(0) = ste.toString // Put last Spark method on top of the stack trace.
      } else {
        if (ste.getFileName != null) {
          firstUserFile = ste.getFileName
          if (ste.getLineNumber >= 0) {
            firstUserLine = ste.getLineNumber
          }
        }
        callStack += ste.toString
        insideSpark = false
      }
    } else {
      callStack += ste.toString
    }
  }
}
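Reading the code, the method takes the current thread's stack trace and walks it from the innermost frame outward. Every frame whose class name matches a certain rule is written to slot 0 of callStack, the top of the recorded stack.
The rule is implemented as follows: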
/** Default filtering function for finding call sites using `getCallSite`. */
private def sparkInternalExclusionFunction(className: String): Boolean = {
  // A regular expression to match classes of the internal Spark API's
  // that we want to skip when finding the call site of a method.
  val SPARK_CORE_CLASS_REGEX =
    """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.broadcast)?\.[A-Z]""".r
  val SPARK_SQL_CLASS_REGEX = """^org\.apache\.spark\.sql.*""".r
  val SCALA_CORE_CLASS_PREFIX = "scala"
  val isSparkClass = SPARK_CORE_CLASS_REGEX.findFirstIn(className).isDefined ||
    SPARK_SQL_CLASS_REGEX.findFirstIn(className).isDefined
  val isScalaClass = className.startsWith(SCALA_CORE_CLASS_PREFIX)
  // If the class is a Spark internal class or a Scala class, then exclude.
  isSparkClass || isScalaClass
}
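In other words, as long as the walk is still inside Spark, every frame whose class name matches either of the two regular expressions
org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.broadcast)?\.[A-Z] and
org\.apache\.spark\.sql.*
(or whose class name starts with scala) has its method name assigned to lastSparkMethod, or its simple class name if the method is a constructor, and the frame is written to the top of the stack. Note that this happens on every match and overwrites the previous entry, so what survives is the last matching method, the Spark method that user code called directly. Take the official LogQuery example: its package org.apache.spark.examples matches neither expression, so LogQuery counts as user code, and the resulting stack information starts with the Spark method it called, followed by the user frames.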
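
To see which class names the rule actually catches, the filter can be tried out on its own. The sketch below reuses the two patterns and the scala prefix check from the source above; the object name ExclusionSketch, the method name isSkipped and the sample class com.example.MyJob are made up for the demo.

```scala
object ExclusionSketch {
  // Same patterns as in sparkInternalExclusionFunction above.
  private val SparkCoreClassRegex =
    """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.broadcast)?\.[A-Z]""".r
  private val SparkSqlClassRegex = """^org\.apache\.spark\.sql.*""".r

  // True when the class counts as Spark-internal or Scala, i.e. is skipped
  // while looking for the first user frame.
  def isSkipped(className: String): Boolean = {
    val isSparkClass = SparkCoreClassRegex.findFirstIn(className).isDefined ||
      SparkSqlClassRegex.findFirstIn(className).isDefined
    isSparkClass || className.startsWith("scala")
  }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "org.apache.spark.SparkContext",       // core class: skipped
      "org.apache.spark.rdd.RDD",            // rdd package: skipped
      "org.apache.spark.sql.Dataset",        // sql package: skipped
      "scala.collection.immutable.List",     // Scala library: skipped
      "org.apache.spark.examples.LogQuery$", // lowercase package: user code
      "com.example.MyJob"                    // hypothetical user class
    )
    samples.foreach(c => println(s"$c -> ${isSkipped(c)}"))
  }
}
```

The core pattern only accepts a handful of packages followed directly by a capitalized class name, which is why anything under org.apache.spark.examples is treated as user code.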

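The first frame that matches neither regular expression (and is not a Scala class) is treated as user code: its file name is stored in firstUserFile, its line number in firstUserLine, and insideSpark is set to false.
From then on, every remaining frame is simply appended to the end of callStack.
If that first user file is HiveSessionImpl.java, meaning the query was submitted through the JDBC server, shortForm is set to "Spark JDBC Server Query"; otherwise it is "lastSparkMethod at firstUserFile:firstUserLine". longForm is the first spark.callstack.depth entries of callStack (20 by default), joined with newlines.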
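
The whole walk can be replayed on a hand-built stack to see what the two forms end up looking like. This is only a sketch: callSiteOf is a made-up name, it takes the frames and the depth as parameters instead of reading the thread and the system property, the skip rule is simplified to a package-prefix check, and the frames, class names and line numbers are invented.

```scala
import scala.collection.mutable.ArrayBuffer

object CallSiteWalkSketch {
  case class CallSite(shortForm: String, longForm: String)

  // Same logic as getCallSite above, but over an explicit list of frames
  // (innermost first) so it can be run without Spark.
  def callSiteOf(
      frames: Seq[StackTraceElement],
      skipClass: String => Boolean,
      depth: Int = 20): CallSite = {
    var lastSparkMethod = "<unknown>"
    var firstUserFile = "<unknown>"
    var firstUserLine = 0
    var insideSpark = true
    val callStack = ArrayBuffer[String]("<unknown>")

    frames.foreach { ste =>
      if (ste != null && ste.getMethodName != null &&
          !ste.getMethodName.contains("getStackTrace")) {
        if (insideSpark && skipClass(ste.getClassName)) {
          // Still inside Spark: remember this frame, overwriting the last one.
          lastSparkMethod =
            if (ste.getMethodName == "<init>") {
              ste.getClassName.substring(ste.getClassName.lastIndexOf('.') + 1)
            } else {
              ste.getMethodName
            }
          callStack(0) = ste.toString
        } else if (insideSpark) {
          // First user frame: record where the user called into Spark.
          if (ste.getFileName != null) {
            firstUserFile = ste.getFileName
            if (ste.getLineNumber >= 0) firstUserLine = ste.getLineNumber
          }
          callStack += ste.toString
          insideSpark = false
        } else {
          callStack += ste.toString
        }
      }
    }

    val shortForm =
      if (firstUserFile == "HiveSessionImpl.java") "Spark JDBC Server Query"
      else s"$lastSparkMethod at $firstUserFile:$firstUserLine"
    CallSite(shortForm, callStack.take(depth).mkString("\n"))
  }

  // Invented frames, innermost first; line numbers are arbitrary.
  val demoFrames: Seq[StackTraceElement] = Seq(
    new StackTraceElement("java.lang.Thread", "getStackTrace", "Thread.java", 1),
    new StackTraceElement("org.apache.spark.util.Utils$", "getCallSite", "Utils.scala", 100),
    new StackTraceElement("org.apache.spark.rdd.RDD", "map", "RDD.scala", 200),
    new StackTraceElement("com.example.MyJob$", "run", "MyJob.scala", 12),
    new StackTraceElement("com.example.MyJob$", "main", "MyJob.scala", 30)
  )

  def main(args: Array[String]): Unit = {
    val site = callSiteOf(demoFrames, _.startsWith("org.apache.spark."))
    println(site.shortForm) // prints "map at MyJob.scala:12"
    println(site.longForm)  // RDD.map frame first, then the two MyJob frames
  }
}
```

The getCallSite frame is visited before RDD.map and then overwritten by it, which is why the short form names map rather than getCallSite.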