SparkSQL中DataFrame registerTempTable源码浅析

最新推荐文章于 2021-12-15 12:05:42 发布

u011180846

最新推荐文章于 2021-12-15 12:05:42 发布

阅读量512

点赞数

分类专栏： Spark 文章标签：大数据实时计算 SparkSQL

Spark 专栏收录该内容

6 篇文章 0 订阅

订阅专栏

dataFrame.registerTempTable(tableName);
最近在使用SparkSQL时想到1万条数据注册成临时表和1亿条数据注册成临时表时，效率上是否会有很大的差距，也对DataFrame注册成临时表到底做了哪些比较好奇，拿来源码拜读了下相关部分，记录一下。

临时表的生命周期是和创建该DataFrame的SQLContext有关系的，SQLContext生命周期结束，该临时表的生命周期也结束了

DataFrame.scala相关源码
/**
   * Registers this [[DataFrame]] as a temporary table using the given name. The lifetime of this
   * temporary table is tied to the [[SQLContext]] that was used to create this DataFrame.
   *
   * @group basic
   * @since 1.3.0
   */
def registerTempTable(tableName: String): Unit = {
    sqlContext.registerDataFrameAsTable(this, tableName)
}

DataFrame中的registerTempTable调用SQLContext中的registerDataFrameAsTable,
SQLContext中使用SimpleCatalog类去实现Catalog接口中的registerTable方法.

SQLContext.scala相关源码
@transient
protected[sql] lazy val catalog: Catalog = new SimpleCatalog(conf)
/**
   * Registers the given [[DataFrame]] as a temporary table in the catalog. Temporary tables exist
   * only during the lifetime of this instance of SQLContext.
   */
private[sql] def registerDataFrameAsTable(df: DataFrame, tableName: String): Unit = {
    catalog.registerTable(Seq(tableName), df.logicalPlan)
}

    在SimpleCatalog中定义了Map，registerTable中按tableIdentifier为key，logicalPlan为Value注册到名为tables的map中

Catalog.scala相关源码
val tables = new mutable.HashMap[String, LogicalPlan]()
override def registerTable(
      tableIdentifier: Seq[String],
      plan: LogicalPlan): Unit = {
    val tableIdent = processTableIdentifier(tableIdentifier)
    tables += ((getDbTableName(tableIdent), plan))
}

protected def processTableIdentifier(tableIdentifier: Seq[String]): Seq[String] = {
    if (conf.caseSensitiveAnalysis) {
      tableIdentifier
    } else {
      tableIdentifier.map(_.toLowerCase)
    }
}

protected def getDbTableName(tableIdent: Seq[String]): String = {
    val size = tableIdent.size
    if (size <= 2) {
      tableIdent.mkString(".")
    } else {
      tableIdent.slice(size - 2, size).mkString(".")
    }
}

阅读以上代码，最终registerTempTable是将表名(或表的标识)和对应的逻辑计划加载到Map中，并随着SQLContext的消亡而消亡

u011180846

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
SparkSQL中DataFrame registerTempTable源码浅析

dataFrame.registerTempTable(tableName); 最近在使用SparkSQL时想到1万条数据注册成临时表和1亿条数据注册成临时表时，效率上是否会有很大的差距，也对DataFrame注册成临时表到底做了哪些比较好奇，拿来源码拜读了下相关部分，记录一下。临时表的生命周期是和创建该DataFrame的SQLContext有关系的，SQLContext生命周期结...
复制链接

扫一扫

专栏目录