Spark SQL Source Code Analysis, Part 3. Analyzer: Unresolved logical plan -> analyzed logical plan

This article looks in detail at the Analyzer phase of Spark SQL: ResolveRelations resolves data sources, ResolveReferences resolves output attributes, ResolveSortReferences handles sort references, ResolveFunctions resolves user-defined functions, GlobalAggregates resolves global aggregate functions, and UnresolvedHavingClauseAttributes resolves the filter conditions of a HAVING clause. Walking through these steps shows how Spark turns an unresolved logical plan into an analyzed logical plan.

The Analyzer's main responsibility is to resolve the parts of the logical plan that the SQL parser left unresolved.

lazy val analyzed: LogicalPlan = analyzer.execute(logical) // the analyzed LogicalPlan
protected[sql] lazy val analyzer: Analyzer =
  new Analyzer(catalog, functionRegistry, conf) {
    override val extendedResolutionRules =
      ExtractPythonUdfs ::
      sources.PreInsertCastAndRename ::
      Nil
    override val extendedCheckRules = Seq(
      sources.PreWriteCheck(catalog)
    )
  }
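
Before stepping into the Analyzer internals, a small usage sketch shows where these two plans can be observed from user code. This assumes the Spark 1.x SQLContext API; the file people.json and the temp table name are hypothetical, purely for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object AnalyzerDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AnalyzerDemo").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    // "people.json" and the temp table name are hypothetical, only for illustration
    sqlContext.read.json("people.json").registerTempTable("people")

    val df = sqlContext.sql("SELECT name FROM people")
    println(df.queryExecution.logical)   // unresolved logical plan produced by the SQL parser
    println(df.queryExecution.analyzed)  // analyzed logical plan after the Analyzer batches have run
    sc.stop()
  }
}

The Analyzer itself looks like this: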
class Analyzer(
    catalog: Catalog,
    registry: FunctionRegistry,
    conf: CatalystConf,
    maxIterations: Int = 100)
  extends RuleExecutor[LogicalPlan] with HiveTypeCoercion with CheckAnalysis {
  def resolver: Resolver = {
    if (conf.caseSensitiveAnalysis) {
      caseSensitiveResolution
    } else {
      caseInsensitiveResolution
    }
  }

  val fixedPoint = FixedPoint(maxIterations)

  /**
   * Override to provide additional rules for the "Resolution" batch.
   */
  val extendedResolutionRules: Seq[Rule[LogicalPlan]] = Nil

  lazy val batches: Seq[Batch] = Seq(  // each Batch represents a different strategy
    Batch("Substitution", fixedPoint,
      CTESubstitution ::
      WindowsSubstitution ::
      Nil : _*),
    Batch("Resolution", fixedPoint,
      // resolve table names through the catalog
      ResolveRelations ::
      // resolve attributes produced by child operators, usually introduced by aliases, e.g. a.id
      ResolveReferences ::
      ResolveGroupingAnalytics ::
      // ORDER BY attributes are often missing from the SELECT list; they have to be added so the
      // query can sort on them, and are dropped again once sorting is done
      ResolveSortReferences ::
      ResolveGenerate ::
      // resolve functions
      ResolveFunctions ::
      ExtractWindowExpressions ::
      // resolve global aggregate functions, e.g. select sum(score) from table
      GlobalAggregates ::
      // resolve aggregate filter conditions after HAVING, e.g. having sum(score) > 400
      UnresolvedHavingClauseAttributes ::
      TrimGroupingAliases ::
      // typeCoercionRules are Hive's type coercion rules
      typeCoercionRules ++
      extendedResolutionRules : _*)
  )
…
}
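
The extendedResolutionRules hook shown earlier is how extra rules such as ExtractPythonUdfs get plugged into the Resolution batch. As a rough sketch of the contract every rule in these batches satisfies (illustrative only, not code from Spark), a rule is just a function from LogicalPlan to LogicalPlan; the no-op rule below merely logs the plan it sees, whereas a real rule pattern-matches on unresolved nodes and rewrites them, as ResolveRelations does later:

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// hypothetical no-op rule, only for illustrating the Rule[LogicalPlan] contract
object NoOpRule extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = {
    // logInfo comes from the Logging trait that Rule mixes in in this version of the code
    logInfo(s"NoOpRule saw plan:\n${plan.treeString}")
    plan  // returning the plan unchanged means the rule has no effect
  }
}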

In val analyzed: LogicalPlan = analyzer.execute(logical), logical is the unresolved logical plan produced by the SQL parser and analyzed is the analyzed logical plan. So what exactly happens inside execute?

def execute(plan: TreeType): TreeType = {
  var curPlan = plan
  batches.foreach { batch => // process each Batch in turn
    val batchStartPlan = curPlan
    var iteration = 1
    var lastPlan = curPlan
    var continue = true
    // Run until fix point (or the max number of iterations as specified in the strategy).
    while (continue) { // keep iterating until applying all of this batch's rules no longer changes the plan; as long as the plan keeps changing, run the batch again
      // fold controls the traversal order: foldLeft starts on the left and moves right, foldRight starts on the right and moves left
      curPlan = batch.rules.foldLeft(curPlan) {
        case (plan, rule) =>
          val result = rule(plan) // apply rule.apply to the plan, transforming the TreeNodes inside it
          logInfo(s"plan (${plan}) \n result (${result}) \n rule (${rule})") // extra log line so the result of applying each rule can be seen; it is used in the walkthrough below
          if (!result.fastEquals(plan)) {
            logTrace(
              s"""
                |=== Applying Rule ${rule.ruleName} ===
                |${sideBySide(plan.treeString, result.treeString).mkString("\n")}
              """.stripMargin)
          }
          result
      }
      iteration += 1
      if (iteration > batch.strategy.maxIterations) {
        // Only log if this is a rule that is supposed to run more than once.
        if (iteration != 2) {
          logInfo(s"Max iterations (${iteration - 1}) reached for batch ${batch.name}")
        }
        continue = false
      }
      if (curPlan.fastEquals(lastPlan)) {
        logTrace(
          s"Fixed point reached for batch ${batch.name} after ${iteration - 1} iterations.")
        continue = false
      }
      lastPlan = curPlan
    }
    if (!batchStartPlan.fastEquals(curPlan)) {
      logDebug(
        s"""
        |=== Result of Batch ${batch.name} ===
        |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")}
      """.stripMargin)
    } else {
      logTrace(s"Batch ${batch.name} has no effect.")
    }
  }
  curPlan
}
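
The foldLeft in the loop above is worth making concrete. A minimal, self-contained sketch in plain Scala (not Spark code): each rule behaves like a function from plan to plan, and foldLeft threads the current plan through the rules from left to right, so every rule sees the output of the one before it:

object FoldLeftSketch {
  type Plan = String                                   // stand-in for LogicalPlan, just for illustration
  val rules: Seq[Plan => Plan] = Seq(
    p => p + " +ResolveRelations",
    p => p + " +ResolveReferences"
  )

  def main(args: Array[String]): Unit = {
    // foldLeft starts with the initial plan and applies every rule in order
    val result = rules.foldLeft("UnresolvedPlan") { case (plan, rule) => rule(plan) }
    println(result)  // UnresolvedPlan +ResolveRelations +ResolveReferences
  }
}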

The key step is the following call:

val result = rule(plan) // apply rule.apply to the plan, transforming the TreeNodes inside it

rule(plan) invokes the apply method of the corresponding Rule[LogicalPlan] object, for example ResolveRelations and ResolveReferences:

object ResolveRelations extends Rule[LogicalPlan] {
  def getTable(u: UnresolvedRelation): LogicalPlan = {
    try {
      catalog.lookupRelation(u.tableIdentifier, u.alias)
    } catch {
      case _: NoSuchTableException =>
        u.failAnalysis(s"no such table ${u.tableName}")
    }
  }
  // takes a logical plan and returns a logical plan; transform walks every node and applies this rule to each one
  def apply(plan: LogicalPlan): LogicalPlan = plan transform { // calls transformDown, essentially a pre-order traversal of the tree
    case i@InsertIntoTable(u: UnresolvedRelation, _, _, _, _) =>
      i.copy(table = EliminateSubQueries(getTable(u)))
    case u: UnresolvedRelation =>
      getTable(u)
  }
}
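
The comments on transform (pre-order) here and on transformUp (post-order) below mark the main difference between the two rules. A minimal, self-contained sketch in plain Scala (not the Catalyst TreeNode API) of the two traversal orders:

object TraversalSketch {
  // a toy tree node carrying a name and its children, standing in for a logical plan
  case class Node(name: String, children: Node*)

  def preOrder(n: Node)(f: Node => Unit): Unit  = { f(n); n.children.foreach(preOrder(_)(f)) }   // like transform/transformDown
  def postOrder(n: Node)(f: Node => Unit): Unit = { n.children.foreach(postOrder(_)(f)); f(n) }  // like transformUp

  def main(args: Array[String]): Unit = {
    val plan = Node("Project", Node("Filter", Node("UnresolvedRelation")))
    preOrder(plan)(n => print(n.name + " "))   // Project Filter UnresolvedRelation
    println()
    postOrder(plan)(n => print(n.name + " "))  // UnresolvedRelation Filter Project
  }
}

With that distinction in mind, ResolveReferences walks the tree bottom-up: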

object ResolveReferences extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { // transformUp is essentially a post-order traversal of the tree
    case p: LogicalPlan if !p.childrenResolved => p

    // If the projection list contains Stars, expand it.
    case p @ Project(projectList, child) if containsStar(projectList) =>
      …
  }
}
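
To make "expand it" concrete for the Star case above, here is a minimal sketch in plain Scala (not Spark's actual implementation) of what star expansion amounts to: the * in the projection list is replaced by every output attribute of the child node:

object StarExpansionSketch {
  def expandStar(projectList: Seq[String], childOutput: Seq[String]): Seq[String] =
    projectList.flatMap {
      case "*"   => childOutput   // the star is replaced by all attributes of the child
      case other => Seq(other)    // ordinary attributes pass through unchanged
    }

  def main(args: Array[String]): Unit = {
    // for "SELECT *, age FROM people", assuming the hypothetical table people has (name, age)
    println(expandStar(Seq("*", "age"), Seq("name", "age")))  // List(name, age, age)
  }
}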