Spark SQL internals

Basic components of a SQL query

Projection, Data Source, Filter
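A hedged illustration (the table name people and the SparkSession value spark are assumptions): the three components map onto a query like this.

// Projection  -> SELECT name
// Data Source -> FROM people
// Filter      -> WHERE age > 18
val df = spark.sql("SELECT name FROM people WHERE age > 18")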

SQL parsing

The SQL statement is first parsed by the Parser module into an Unresolved Logical Plan;
the Analyzer module then resolves it into a Logical Plan, using the table metadata in the Catalog;
the Optimizer applies a series of rule-based optimizations to produce an Optimized Logical Plan;
the optimized plan is still logical and cannot be executed by Spark directly, so it finally has to be converted into a Physical Plan (each stage can be inspected as shown in the sketch below).
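A quick way to look at each of these stages is Dataset.explain(true) or the QueryExecution object behind every Dataset. The sketch below assumes a SparkSession named spark and an existing table people.

val df = spark.sql("SELECT name FROM people WHERE age > 18")
df.explain(true)          // prints the Parsed, Analyzed and Optimized Logical Plans plus the Physical Plan

val qe = df.queryExecution
println(qe.logical)       // Unresolved Logical Plan
println(qe.analyzed)      // Analyzed Logical Plan
println(qe.optimizedPlan) // Optimized Logical Plan
println(qe.sparkPlan)     // Physical Plan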

Source code walkthrough

SessionCatalog

An internal catalog that is used by a Spark Session. This internal catalog serves as a proxy to the underlying metastore (e.g. Hive Metastore) and it also manages temporary views and functions of the Spark Session that it belongs to.
This class must be thread-safe.
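Although SessionCatalog is internal, it can be reached through sessionState when exploring; a small sketch (assumes a SparkSession named spark):

val catalog = spark.sessionState.catalog        // org.apache.spark.sql.catalyst.catalog.SessionCatalog
println(catalog.getCurrentDatabase)             // e.g. "default"
catalog.listTables("default").foreach(println)  // Seq[TableIdentifier]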

UnresolvedRelation
/**
 * Holds the name of a relation that has yet to be looked up in a catalog.
 *
 * @param tableIdentifier table name
 */
case class UnresolvedRelation(tableIdentifier: TableIdentifier)
  extends LeafNode {

  /** Returns a `.` separated name for this relation. */
  def tableName: String = tableIdentifier.unquotedString

  override def output: Seq[Attribute] = Nil

  override lazy val resolved = false
}
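The parsed, still unresolved plan of a simple query contains such an UnresolvedRelation leaf; a sketch (the table name people is an assumption, and the exact tree rendering differs between Spark versions):

val qe = spark.sql("SELECT * FROM people").queryExecution
println(qe.logical)
// 'Project [*]
// +- 'UnresolvedRelation ... people ...   (the leading ' marks unresolved nodes)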
sql

The sql() entry point

SparkSession.scala
SparkSession is usually the entry point of an application.

/**
   * Executes a SQL query using Spark, returning the result as a `DataFrame`.
   * The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
   *
   * @since 2.0.0
   */
  def sql(sqlText: String): DataFrame = {
    Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
  }

ParseDriver.scala
parsePlan
Generates the unresolved LogicalPlan.

// abstract class AbstractSqlParser
/** Creates LogicalPlan for a given SQL string. */
  override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
    astBuilder.visitSingleStatement(parser.singleStatement()) match {
      case plan: LogicalPlan => plan
      case _ =>
        val position = Origin(None, None)
        throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
    }
  }
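The parser can also be called directly; the result is the unresolved LogicalPlan (a sketch, the table t is an assumption):

val plan = spark.sessionState.sqlParser.parsePlan("SELECT a FROM t WHERE b > 1")
println(plan.resolved)   // false -- nothing has been looked up in the catalog yet
println(plan.treeString)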

Back to ofRows

def ofRows(sparkSession: SparkSession, logicalPlan: LogicalPlan): DataFrame = {
    val qe = sparkSession.sessionState.executePlan(logicalPlan)
    qe.assertAnalyzed()
    new Dataset[Row](sparkSession, qe, RowEncoder(qe.analyzed.schema))
  }

sparkSession.sessionState.executePlan(logicalPlan)

//SessionState.scala
def executePlan(plan: LogicalPlan): QueryExecution = createQueryExecution(plan)
//BaseSessionStateBuilder.scala
/**
   * Create a query execution object.
   */
  protected def createQueryExecution: LogicalPlan => QueryExecution = { plan =>
    new QueryExecution(session, plan)
  }

There are three stages below: Analyzed, Optimized and Physical.
Note: because these are lazy vals, the actual evaluation order at runtime is backtracking: sparkPlan -> optimizedPlan -> analyzed.

//QueryExecution.scala
class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan)
{
...
lazy val analyzed: LogicalPlan = {
    SparkSession.setActiveSession(sparkSession)
    sparkSession.sessionState.analyzer.executeAndCheck(logical)
  }

  lazy val withCachedData: LogicalPlan = {
    assertAnalyzed()
    assertSupported()
    sparkSession.sharedState.cacheManager.useCachedData(analyzed)
  }

  lazy val optimizedPlan: LogicalPlan = sparkSession.sessionState.optimizer.execute(withCachedData)

  lazy val sparkPlan: SparkPlan = {
    SparkSession.setActiveSession(sparkSession)
    // TODO: We use next(), i.e. take the first plan returned by the planner, here for now,
    //       but we will implement to choose the best plan.
    planner.plan(ReturnAnswer(optimizedPlan)).next()
  }
}

SparkPlanner extends SparkStrategies, and SparkStrategies extends QueryPlanner.

planner.plan

Generates the physical execution plan.

strategies
//SparkPlanner.scala
override def strategies: Seq[Strategy] =
    experimentalMethods.extraStrategies ++
      extraPlanningStrategies ++ (
      DataSourceV2Strategy ::
      FileSourceStrategy ::
      DataSourceStrategy(conf) ::
      SpecialLimits ::
      Aggregation ::
      JoinSelection ::
      InMemoryScans ::
      BasicOperators :: Nil)
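The extraStrategies hook above makes it easy to plug a custom Strategy into this list. A minimal sketch (LoggingStrategy is a made-up name; it plans nothing itself and only logs what it is offered):

import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// Returning Nil means "no candidate from me", so the built-in strategies still plan the node.
object LoggingStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = {
    println(s"LoggingStrategy saw: ${plan.nodeName}")
    Nil
  }
}

// Registration:
// spark.experimental.extraStrategies = Seq(LoggingStrategy)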
plan
// QueryPlanner.scala
def plan(plan: LogicalPlan): Iterator[PhysicalPlan] = {

    // Collect physical plan candidates.
    val candidates: Iterator[PhysicalPlan] = strategies.iterator.flatMap(_(plan))

    // The candidates may contain placeholders marked as [[planLater]],
    // so try to replace them by their child plans.
    val plans = candidates.flatMap { candidate =>
      val placeholders = collectPlaceholders(candidate)

      if (placeholders.isEmpty) {
        // Take the candidate as is because it does not contain placeholders.
        Iterator(candidate)
      } else {
        // Plan the logical plan marked as [[planLater]] and replace the placeholders.
        placeholders.iterator.foldLeft(Iterator(candidate)) {
          case (candidatesWithPlaceholders, (placeholder, logicalPlan)) =>
            // Plan the logical plan for the placeholder.
            val childPlans = this.plan(logicalPlan)

            candidatesWithPlaceholders.flatMap { candidateWithPlaceholders =>
              childPlans.map { childPlan =>
                // Replace the placeholder by the child plan
                candidateWithPlaceholders.transformUp {
                  case p if p == placeholder => childPlan
                }
              }
            }
        }
      }
    }

    val pruned = prunePlans(plans)
    assert(pruned.hasNext, s"No plan for $plan")
    pruned
  }
SQL statement execution flow

[Figure: overall SQL statement execution flow]

Generally speaking, in Spark SQL the path from a SQL statement to the RDDs executed by Spark goes through two major phases: the logical plan (LogicalPlan) and the physical plan (PhysicalPlan).

The logical plan phase turns the user's SQL statement into a tree data structure (the logical operator tree); the logic contained in the SQL statement is mapped onto different nodes of this tree. As the name suggests, the logical operator tree produced in this phase is not submitted for execution directly; it is only an intermediate stage. The final logical operator tree is produced in three sub-stages, corresponding to the unresolved logical operator tree (Unresolved LogicalPlan, which is just a data structure and carries no data information yet), the analyzed logical operator tree (Analyzed LogicalPlan, whose nodes have been bound to the relevant metadata) and the optimized logical operator tree (Optimized LogicalPlan, where various optimization rules rewrite inefficient logical plans).

The physical plan phase further transforms the logical operator tree produced in the previous phase into a physical operator tree. The nodes of the physical operator tree directly generate RDDs or apply transformations on RDDs (note: every physical plan node implements an execute method that performs the RDD transformation). The physical plan phase also has three sub-stages: first, a list of physical operator trees, Iterator[PhysicalPlan], is generated from the logical operator tree (the same logical operator tree may correspond to several physical operator trees); then the best physical operator tree (SparkPlan) is chosen from the list according to some strategy; finally, the chosen physical operator tree is prepared for submission, e.g. making sure partitioning is correct, reusing physical operator tree nodes and generating code, which yields the prepared physical operator tree (Prepared SparkPlan). After these steps, running an action (such as show in the example) on the RDD produced by the physical operator tree submits it for execution.

From parsing the SQL statement up to submission, the whole conversion happens on the Driver of the Spark cluster and does not touch the distributed environment. The sql method of SparkSession invokes the objects held in SessionState, including the SparkSqlParser, Analyzer, Optimizer and SparkPlanner classes corresponding to the stages above, and finally wraps everything into a QueryExecution object. This makes it easy, when developing with Spark SQL, to peel off and analyze the plan produced at each step.

unresolved table
  override def visitShowPartitions(ctx: ShowPartitionsContext): LogicalPlan = withOrigin(ctx) {
    val partitionKeys = Option(ctx.partitionSpec).map { specCtx =>
      UnresolvedPartitionSpec(visitNonOptionalPartitionSpec(specCtx), None)
    }
    ShowPartitions(
      createUnresolvedTable(ctx.multipartIdentifier(), "SHOW PARTITIONS"),
      partitionKeys)
  }
  private def createUnresolvedTable(
      ctx: MultipartIdentifierContext,
      commandName: String,
      relationTypeMismatchHint: Option[String] = None): UnresolvedTable = withOrigin(ctx) {
    UnresolvedTable(visitMultipartIdentifier(ctx), commandName, relationTypeMismatchHint)
  }
override def visitMultipartIdentifier(ctx: MultipartIdentifierContext): Seq[String] =
    withOrigin(ctx) {
      ctx.parts.asScala.map(_.getText).toSeq
    }
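Parsing such a statement shows the UnresolvedTable being created with the multi-part name; a sketch (db1.t1 is an assumption, and the printed shape varies by Spark version):

val plan = spark.sessionState.sqlParser.parsePlan("SHOW PARTITIONS db1.t1")
println(plan)
// ShowPartitions ...
// +- 'UnresolvedTable [db1, t1], SHOW PARTITIONS ...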
lookupTableOrView
      case u @ UnresolvedTable(identifier, cmd, relationTypeMismatchHint) =>
        lookupTableOrView(identifier).map {
          case v: ResolvedView =>
            throw QueryCompilationErrors.expectTableNotViewError(
              v, cmd, relationTypeMismatchHint, u)
          case table => table
        }.getOrElse(u)
        

Resolving a bare table name into db.table

    private def lookupTableOrView(identifier: Seq[String]): Option[LogicalPlan] = {
      lookupTempView(identifier).map { _ =>
        ResolvedView(identifier.asIdentifier, isTemp = true)
      }.orElse {
        expandIdentifier(identifier) match {
          case CatalogAndIdentifier(catalog, ident) =>
            CatalogV2Util.loadTable(catalog, ident).map {
              case v1Table: V1Table if CatalogV2Util.isSessionCatalog(catalog) &&
                v1Table.v1Table.tableType == CatalogTableType.VIEW =>
                ResolvedView(ident, isTemp = false)
              case table =>
                ResolvedTable.create(catalog.asTableCatalog, ident, table)
            }
          case _ => None
        }
      }
    }

  // If we are resolving database objects (relations, functions, etc.) insides views, we may need to
  // expand single or multi-part identifiers with the current catalog and namespace of when the
  // view was created.
  private def expandIdentifier(nameParts: Seq[String]): Seq[String] = {
    if (!isResolvingView || isReferredTempViewName(nameParts)) return nameParts

    if (nameParts.length == 1) {
      AnalysisContext.get.catalogAndNamespace :+ nameParts.head
    } else if (catalogManager.isCatalogRegistered(nameParts.head)) {
      nameParts
    } else {
      AnalysisContext.get.catalogAndNamespace.head +: nameParts
    }
  }
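  // Worked illustration of expandIdentifier (an assumption-laden sketch; it only kicks in while
  // resolving a view). Suppose the view was created when the current catalog/namespace was
  // spark_catalog.db1, i.e. AnalysisContext.get.catalogAndNamespace == Seq("spark_catalog", "db1"):
  //   expandIdentifier(Seq("t"))               => Seq("spark_catalog", "db1", "t")
  //   expandIdentifier(Seq("db2", "t"))        => Seq("spark_catalog", "db2", "t")   // "db2" is not a registered catalog
  //   expandIdentifier(Seq("cat2", "db", "t")) => unchanged, when "cat2" is a registered catalog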
  /**
   * Extract catalog and identifier from a multi-part name with the current catalog if needed.
   * Catalog name takes precedence over identifier, but for a single-part name, identifier takes
   * precedence over catalog name.
   *
   * Note that, this pattern is used to look up permanent catalog objects like table, view,
   * function, etc. If you need to look up temp objects like temp view, please do it separately
   * before calling this pattern, as temp objects don't belong to any catalog.
   */
  object CatalogAndIdentifier {
    import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper

    private val globalTempDB = SQLConf.get.getConf(StaticSQLConf.GLOBAL_TEMP_DATABASE)

    def unapply(nameParts: Seq[String]): Option[(CatalogPlugin, Identifier)] = {
      assert(nameParts.nonEmpty)
      if (nameParts.length == 1) {
        Some((currentCatalog, Identifier.of(catalogManager.currentNamespace, nameParts.head)))
      } else if (nameParts.head.equalsIgnoreCase(globalTempDB)) {
        // Conceptually global temp views are in a special reserved catalog. However, the v2 catalog
        // API does not support view yet, and we have to use v1 commands to deal with global temp
        // views. To simplify the implementation, we put global temp views in a special namespace
        // in the session catalog. The special namespace has higher priority during name resolution.
        // For example, if the name of a custom catalog is the same with `GLOBAL_TEMP_DATABASE`,
        // this custom catalog can't be accessed.
        Some((catalogManager.v2SessionCatalog, nameParts.asIdentifier))
      } else {
        try {
          Some((catalogManager.catalog(nameParts.head), nameParts.tail.asIdentifier))
        } catch {
          case _: CatalogNotFoundException =>
            Some((currentCatalog, nameParts.asIdentifier))
        }
      }
    }
  }

The key is this line, which assembles a new Identifier:

Some((currentCatalog, Identifier.of(catalogManager.currentNamespace, nameParts.head)))
  def currentNamespace: Array[String] = {
    val defaultNamespace = if (currentCatalog.name() == SESSION_CATALOG_NAME) {
      Array(v1SessionCatalog.getCurrentDatabase)
    } else {
      currentCatalog.defaultNamespace()
    }

    this.synchronized {
      _currentNamespace.getOrElse {
        defaultNamespace
      }
    }
  }
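A worked example (hedged; db1 and t are assumptions): with the default session catalog and current database db1, a bare table name t effectively becomes db1.t.

// nameParts = Seq("t"), currentCatalog = the session catalog (spark_catalog)
// catalogManager.currentNamespace == Array("db1")   // from v1SessionCatalog.getCurrentDatabase
// CatalogAndIdentifier(Seq("t")) == Some((spark_catalog, Identifier.of(Array("db1"), "t")))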
listPartitionNames automatically checks whether the database exists
  test("new partitions should be added to catalog after writing to catalog table") {
    val table = "partitioned_catalog_table"
    val tempTable = "partitioned_catalog_temp_table"
    val numParts = 210
    withTable(table) {
      withTempView(tempTable) {
        val df = (1 to numParts).map(i => (i, i)).toDF("part", "col1")
        df.createOrReplaceTempView(tempTable)
        sql(s"CREATE TABLE $table (part Int, col1 Int) USING parquet PARTITIONED BY (part)")
        sql(s"INSERT INTO TABLE $table SELECT * from $tempTable")
        val partitions = spark.sessionState.catalog.listPartitionNames(TableIdentifier(table))
        assert(partitions.size == numParts)
      }
    }
  }

listPartitionNames can fall back to getCurrentDatabase:

val db = formatDatabaseName(tableName.database.getOrElse(getCurrentDatabase))

  def listPartitionNames(
      tableName: TableIdentifier,
      partialSpec: Option[TablePartitionSpec] = None): Seq[String] = {
    val db = formatDatabaseName(tableName.database.getOrElse(getCurrentDatabase))
    val table = formatTableName(tableName.table)
    requireDbExists(db)
    requireTableExists(TableIdentifier(table, Option(db)))
    partialSpec.foreach { spec =>
      requirePartialMatchedPartitionSpec(Seq(spec), getTableMetadata(tableName))
      requireNonEmptyValueInPartitionSpec(Seq(spec))
    }
    externalCatalog.listPartitionNames(db, table, partialSpec)
  }
case class TableIdentifier(table: String, database: Option[String])
  extends IdentifierWithDatabase {

  override val identifier: String = table

  def this(table: String) = this(table, None)
}
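A small sketch of TableIdentifier with and without an explicit database (db1 is an assumption):

import org.apache.spark.sql.catalyst.TableIdentifier

val t1 = TableIdentifier("student")               // database = None; callers such as listPartitionNames fall back to getCurrentDatabase
val t2 = TableIdentifier("student", Some("db1"))  // explicit database
println(t1.unquotedString)  // student
println(t2.unquotedString)  // db1.student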

Example
  protected def uncacheTable(tableName: String): Unit = {
    val tableIdent: TableIdentifier = spark.sessionState.sqlParser.parseTableIdentifier(tableName)
    val cascade = !spark.sessionState.catalog.isTempView(tableIdent)
    spark.sharedState.cacheManager.uncacheQuery(
      spark,
      spark.table(tableName).logicalPlan,
      cascade = cascade,
      blocking = true)
  }
Copying data between two tables
insert into student SELECT * FROM odps.mc_test.student;
CREATE TABLE student USING CSV AS SELECT * FROM odps.mc_test.student;
Unresolved-related classes
UnresolvedTable
/**
 * Holds the name of a table that has yet to be looked up in a catalog. It will be resolved to
 * [[ResolvedTable]] during analysis.
 */
case class UnresolvedTable(
    multipartIdentifier: Seq[String],
    commandName: String,
    relationTypeMismatchHint: Option[String]) extends LeafNode {
  override lazy val resolved: Boolean = false

  override def output: Seq[Attribute] = Nil
}
UnresolvedAttribute

Note the single quote in toString.

/**
 * Holds the name of an attribute that has yet to be resolved.
 */
case class UnresolvedAttribute(nameParts: Seq[String]) extends Attribute with Unevaluable {

  def name: String =
    nameParts.map(n => if (n.contains(".")) s"`$n`" else n).mkString(".")
    
  override def toString: String = s"'$name"

Constructing UnresolvedAttribute instances

  private val i: Expression = UnresolvedAttribute("i")
  private val d1: Expression = UnresolvedAttribute("d1")
  private val d2: Expression = UnresolvedAttribute("d2")
  private val u: Expression = UnresolvedAttribute("u")
  private val f: Expression = UnresolvedAttribute("f")
  private val b: Expression = UnresolvedAttribute("b")
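The single quote mentioned above shows up as soon as these expressions are printed; a quick sketch:

import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute

println(UnresolvedAttribute("d1"))           // 'd1
println(UnresolvedAttribute(Seq("t", "d1"))) // 't.d1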
Constructing a ScalarSubquery instance
  override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressionsWithPruning(
    _.containsPattern(EXISTS_SUBQUERY)) {
    case exists: Exists if exists.children.isEmpty =>
      IsNotNull(
        ScalarSubquery(
          plan = Limit(Literal(1), Project(Seq(Alias(Literal(1), "col")()), exists.plan)),
          exprId = exists.exprId))
  }

The corresponding unit test

  test("scalar sub-query") {
    assertEqual(
      "select (select max(b) from s) ss from t",
      table("t").select(ScalarSubquery(table("s").select('max.function('b))).as("ss")))

The second argument to assertEqual expands into the following logical plan:
[Figure: expanded logical plan of the scalar sub-query]

Effect of the single quote
  table("t").select(ScalarSubquery(table("s").
    select('max.function(UnresolvedAttribute("b")))).as("ss"))

The above is equivalent to

table("t").select(ScalarSubquery(table("s").select('max.function('b))).as("ss"))

max can also be constructed as follows:

[Figure: alternative construction of max]

Common logical plan manipulations
      // Normalize the vendor-specific provider name "aliorc" to the built-in "orc" provider.
      provider = provider.get match {
        case name if name.equalsIgnoreCase("aliorc") => Some("orc")
        case _ => provider
      }
Pattern matching
    // If the right-hand side is an unresolved call to max_pt(<table>), replace it with a
    // scalar sub-query that selects max(<left expression>) from that table.
    if (right.isInstanceOf[UnresolvedFunction] &&
      right.asInstanceOf[UnresolvedFunction].nameParts.head.equalsIgnoreCase("max_pt")) {
      val unresolvedFunction = right.asInstanceOf[UnresolvedFunction]
      println("unresolvedFunction.arguments.head.toString:" + unresolvedFunction.arguments.head.toString)
      right = ScalarSubquery(table(unresolvedFunction.arguments.head.asInstanceOf[UnresolvedAttribute].nameParts.head)
        .select('max.function(left)))
    }
How sparkSession resolves a dot-separated tableName
  def table(tableName: String): DataFrame = {
    assertNoSpecifiedSchema("table")
    val multipartIdentifier =
      sparkSession.sessionState.sqlParser.parseMultipartIdentifier(tableName)
    Dataset.ofRows(sparkSession, UnresolvedRelation(multipartIdentifier,
      new CaseInsensitiveStringMap(extraOptions.toMap.asJava)))
  }
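A sketch of the effect (catalog1.db1.t1 is an assumption): the dot-separated name is split into its parts instead of being treated as one literal string.

val parts = spark.sessionState.sqlParser.parseMultipartIdentifier("catalog1.db1.t1")
println(parts)   // e.g. List(catalog1, db1, t1)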
Spark hint grammar (.g4)
hint
    : '/*+' hintStatements+=hintStatement (','? hintStatements+=hintStatement)* '*/'
    ;

hintStatement
    : hintName=identifier
    | hintName=identifier '(' parameters+=primaryExpression (',' parameters+=primaryExpression)* ')'
    ;
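The grammar accepts several hints inside one comment; a sketch (t1/t2 are assumed tables; BROADCAST and REPARTITION are built-in hint names in Spark 3.x):

spark.sql("SELECT /*+ BROADCAST(t2), REPARTITION(8) */ * FROM t1 JOIN t2 ON t1.id = t2.id")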
Spark read path: coalescing small files (maxSplitBytes)
  def maxSplitBytes(
                     sparkSession: SparkSession,
                     selectedPartitions: Seq[PartitionDirectory]): Long = {
    val defaultMaxSplitBytes = sparkSession.sessionState.conf.filesMaxPartitionBytes
    val openCostInBytes = sparkSession.sessionState.conf.filesOpenCostInBytes
    val minPartitionNum = sparkSession.sessionState.conf.filesMinPartitionNum
      .getOrElse(sparkSession.leafNodeDefaultParallelism)
    val totalBytes = selectedPartitions.flatMap(_.files.map(_.getLen + openCostInBytes)).sum
    val bytesPerCore = totalBytes / minPartitionNum

    Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
  }
  • Math.max(openCostInBytes, bytesPerCore): when totalBytes is small and minPartitionNum is large, bytesPerCore becomes very small, so the larger of the two is taken.
  • Math.min(defaultMaxSplitBytes, ...): the outer min is needed because bytesPerCore can be very large, so the result must not exceed the configured upper bound defaultMaxSplitBytes.
  • Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore)): min on the outside, max on the inside, which guarantees the data is cut into at least roughly minPartitionNum slices (see the worked example below).
  • When totalBytes is very large, raising defaultMaxSplitBytes far enough keeps the number of slices close to minPartitionNum.
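A worked example with made-up numbers:

// Hypothetical inputs:
//   defaultMaxSplitBytes = 128 MB, openCostInBytes = 4 MB, minPartitionNum = 8
//   selectedPartitions hold 10 files of 1 MB each
//   totalBytes    = 10 * (1 MB + 4 MB) = 50 MB
//   bytesPerCore  = 50 MB / 8 = 6.25 MB
//   maxSplitBytes = min(128 MB, max(4 MB, 6.25 MB)) = 6.25 MB
// The small files are therefore packed into splits of up to ~6 MB (each file also
// "costs" 4 MB of open cost) instead of producing one tiny task per file.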

For a large data set, see the figure below.

[Figure: split sizing on a large data set]

Detailed debug call stacks

Hive tables (in non-text formats) and data source tables both go through the following path.
[Figure: debug call stack for the file scan path]

One parquet file contains a single row group, yet it was split into 29 parts. As can be seen later, only one task does useful work.
[Figure: 29 splits generated from a single-row-group parquet file]

Because the row group is parquet's smallest unit of work, the other tasks just spin without doing anything.
[Figure: only one task reads data; the others are idle]

FILES_MIN_PARTITION_NUM
  val FILES_MIN_PARTITION_NUM = buildConf("spark.sql.files.minPartitionNum")
    .doc("The suggested (not guaranteed) minimum number of split file partitions. " +
      "If not set, the default value is `spark.default.parallelism`. This configuration is " +
      "effective only when using file-based sources such as Parquet, JSON and ORC.")
    .version("3.1.0")
    .intConf
    .checkValue(v => v > 0, "The min partition number must be a positive integer.")
    .createOptional
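Setting it explicitly (a sketch):

spark.conf.set("spark.sql.files.minPartitionNum", "200")
// or at submit time: --conf spark.sql.files.minPartitionNum=200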
spark.default.parallelism

In local mode:

org.apache.spark.scheduler.local.LocalSchedulerBackend.scala

  override def defaultParallelism(): Int =
    scheduler.conf.getInt("spark.default.parallelism", totalCores)

In cluster mode:
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.scala

  override def defaultParallelism(): Int = {
    conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
  }

          addressToExecutorId(executorAddress) = executorId
          totalCoreCount.addAndGet(cores)
          totalRegisteredExecutors.addAndGet(1)

