附录A Spark2.1核心工具类Utils

最新推荐文章于 2021-08-02 10:18:11 发布

泰山不老生

最新推荐文章于 2021-08-02 10:18:11 发布

阅读量3.5k

点赞数 6

分类专栏： Spark 大数据 Scala Java 深入理解Spark 文章标签： Spark2.0 Spark2 Spark2.1 Utils.scala Spark core

本文链接：https://blog.csdn.net/beliefer/article/details/77449771

版权

本文详细介绍了Spark2.1中的核心工具类Utils，包括getSystemProperties、localHostName等方法，这些方法在Spark Core中广泛使用，提供基础功能。文章旨在辅助《Spark内核设计的艺术架构设计与实现》一书的读者更好地理解和查阅相关内容。

摘要由CSDN通过智能技术生成

注：本文是为了配合《Spark内核设计的艺术架构设计与实现》一书的内容而编写，目的是为了节省成本、方便读者查阅。书中附录A的内容都在本文呈现。

Utils是Spark最常用的工具类之一，Spark Core大量使用了此类提供的基础功能。即使不关心其实现也不会对理解本书对Spark源码的分析有太多影响。下面将逐个介绍Utils提供的方法。

getSystemProperties

功能描述：获取系统属性的键值对。

  def getSystemProperties: Map[String, String] = {
    System.getProperties.stringPropertyNames().asScala
      .map(key => (key, System.getProperty(key))).toMap
  }

localHostName

功能描述：获取本地机器名。

  def localHostName(): String = {
    customHostname.getOrElse(localIpAddress.getHostAddress)
  }

getDefaultPropertiesFile

功能描述：获取默认的Spark属性文件。

  def getDefaultPropertiesFile(env: Map[String, String] = sys.env): String = {
    env.get("SPARK_CONF_DIR")
      .orElse(env.get("SPARK_HOME").map { t => s"$t${File.separator}conf" })
      .map { t => new File(s"$t${File.separator}spark-defaults.conf")}
      .filter(_.isFile)
      .map(_.getAbsolutePath)
      .orNull
  }

getPropertiesFromFile

功能描述：从文件中获取属性。

  def getPropertiesFromFile(filename: String): Map[String, String] = {
    val file = new File(filename)
    require(file.exists(), s"Properties file $file does not exist")
    require(file.isFile(), s"Properties file $file is not a normal file")

    val inReader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)
    try {
      val properties = new Properties()
      properties.load(inReader)
      properties.stringPropertyNames().asScala.map(
        k => (k, properties.getProperty(k).trim)).toMap
    } catch {
      case e: IOException =>
        throw new SparkException(s"Failed when loading Spark properties from $filename", e)
    } finally {
      inReader.close()
    }
  }

loadDefaultSparkProperties

功能描述：加载指定文件中的Spark属性，如果没有指定文件，则加载默认Spark属性文件的属性。

  def loadDefaultSparkProperties(conf: SparkConf, filePath: String = null): String = {
    val path = Option(filePath).getOrElse(getDefaultPropertiesFile())
    Option(path).foreach { confFile =>
      getPropertiesFromFile(confFile).filter { case (k, v) =>
        k.startsWith("spark.")
      }.foreach { case (k, v) =>
        conf.setIfMissing(k, v)
        sys.props.getOrElseUpdate(k, v)
      }
    }
    path
  }

getCallSite

功能描述：获取当前SparkContext的当前调用堆栈，将栈里最靠近栈底的属于Spark或者Scala核心的类压入callStack的栈顶，并将此类的方法存入lastSparkMethod；将栈里最靠近栈顶的用户类放入callStack，将此类的行号存入firstUserLine，类名存入firstUserFile，最终返回的样例类CallSite存储了最短栈和长度默认为20的最长栈的样例类。在JavaWordCount例子中，获得的数据如下：

最短栈：getOrCreate atJavaWordCount.java:48；

最长栈：org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)

org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:48)

  def getCallSite(skipClass: String => Boolean = sparkInternalExclusionFunction): CallSite = {
    var lastSparkMethod = "<unknown>"
    var firstUserFile = "<unknown>"
    var firstUserLine = 0
    var insideSpark = true
    var callStack = new ArrayBuffer[String]() :+ "<unknown>"

    Thread.currentThread.getStackTrace().foreach { ste: StackTraceElement =>
      if (ste != null && ste.getMethodName != null
        && !ste.getMethodName.contains("getStackTrace")) {
        if (insideSpark) {
          if (skipClass(ste.getClassName)) {
            lastSparkMethod = if (ste.getMethodName == "<init>") {
              // Spark method is a constructor; get its class name
              ste.getClassName.substring(ste.getClassName.lastIndexOf('.') + 1)
            } else {
              ste.getMethodName
            }
            callStack(0) = ste.toString // Put last Spark method on top of the stack trace.
          } else {
            if (ste.getFileName != null) {
              firstUserFile = ste.getFileName
              if (ste.getLineNumber >= 0) {
                firstUserLine = ste.getLineNumber
              }
            }
            callStack += ste.toString
            insideSpark = false
          }
        } else {
          callStack += ste.toString
        }
      }
    }

    val callStackDepth = System.getProperty("spark.callstack.depth", "20").toInt
    val shortForm =
      if (firstUserFile == "HiveSessionImpl.java") {
        // To be more user friendly, show a nicer string for queries submitted from the JDBC
        // server.
        "Spark JDBC Server Query"
      } else {
        s"$lastSparkMethod at $firstUserFile:$firstUserLine"
      }
    val longForm = callStack.take(callStackDepth).mkString("\n")

    CallSite(shortForm, longForm)
  }

tryOrStopSparkContext

功能描述：用于在执行目标方法抛出异常后新启一个用于异步停止SparkContext的线程。

  def tryOrStopSparkContext(sc: SparkContext)(block: => Unit) {
    try {
      block
    } catch {
      case e: ControlThrowable => throw e
      case t: Throwable =>
        val currentThreadName = Thread.currentThread().getName
        if (sc != null) {
          logError(s"uncaught error in thread $currentThreadName, stopping SparkContext", t)
          sc.stopInNewThread()
        }
        if (!NonFatal(t)) {
          logError(s"throw uncaught fatal error in thread $currentThreadName", t)
          throw t
        }
    }
  }

getCurrentUserName

功能描述：用于获取当前用户的用户名。此用户名默认为当前系统的登录用户，也可以通过系统环境变量SPARK_USER进行设置。

  def getCurrentUserName(): String = {
    Option(System.getenv("SPARK_USER"))
      .getOrElse(UserGroupInformation.getCurrentUser().getShortUserName())
  }