Spark file reading source code analysis - Part 2


This post continues from the previous one and focuses on the source code analysis of job1.

1. Source analysis of when job1 is triggered

job1 is triggered on the driver side after job0 has finished and execution continues, so let's first review the call chain of job0.


DataFrameReader.load()
DataFrameReader.loadV1Source()
DataSource.resolveRelation()
DataSource.getOrInferFileFormatSchema()
new InMemoryFileIndex(sparkSession, globbedPaths, options, None, fileStatusCache)
InMemoryFileIndex.refresh0()
InMemoryFileIndex.listLeafFiles()
InMemoryFileIndex.bulkListLeafFiles()

The call that produces job1 sits in DataSource.getOrInferFileFormatSchema().
Here is the method once more:

1. DataSource.getOrInferFileFormatSchema()


  private def getOrInferFileFormatSchema(
      format: FileFormat,
      fileStatusCache: FileStatusCache = NoopCache): (StructType, StructType) = {
    // the operations below are expensive therefore try not to do them if we don't need to, e.g.,
    // in streaming mode, we have already inferred and registered partition columns, we will
    // never have to materialize the lazy val below
    // job0: this is declared as a lazy val, so it is only initialized when it is actually used
    lazy val tempFileIndex = {
      val allPaths = caseInsensitiveOptions.get("path") ++ paths
      val hadoopConf = sparkSession.sessionState.newHadoopConf()
      val globbedPaths = allPaths.toSeq.flatMap { path =>
        val hdfsPath = new Path(path)
        val fs = hdfsPath.getFileSystem(hadoopConf)
        val qualified = hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
        SparkHadoopUtil.get.globPathIfNecessary(fs, qualified)
      }.toArray
      // job0: the InMemoryFileIndex is constructed here, and this construction produces the first job
      new InMemoryFileIndex(sparkSession, globbedPaths, options, None, fileStatusCache)
    }
    val partitionSchema = if (partitionColumns.isEmpty) {
      // Try to infer partitioning, because no DataSource in the read path provides the partitioning
      // columns properly unless it is a Hive DataSource
      // job0: the first real use of the lazy tempFileIndex, which forces the InMemoryFileIndex to initialize
      combineInferredAndUserSpecifiedPartitionSchema(tempFileIndex)
    } else {
      // maintain old behavior before SPARK-18510. If userSpecifiedSchema is empty used inferred
      // partitioning
      if (userSpecifiedSchema.isEmpty) {
        val inferredPartitions = tempFileIndex.partitionSchema
        inferredPartitions
      } else {
        val partitionFields = partitionColumns.map { partitionColumn =>
          userSpecifiedSchema.flatMap(_.find(c => equality(c.name, partitionColumn))).orElse {
            val inferredPartitions = tempFileIndex.partitionSchema
            val inferredOpt = inferredPartitions.find(p => equality(p.name, partitionColumn))
            if (inferredOpt.isDefined) {
              logDebug(
                s"""Type of partition column: $partitionColumn not found in specified schema
                   |for $format.
                   |User Specified Schema
                   |=====================
                   |${userSpecifiedSchema.orNull}
                   |
                   |Falling back to inferred dataType if it exists.
                 """.stripMargin)
            }
            inferredOpt
          }.getOrElse {
            throw new AnalysisException(s"Failed to resolve the schema for $format for " +
              s"the partition column: $partitionColumn. It must be specified manually.")
          }
        }
        StructType(partitionFields)
      }
    }

    // by this point job0 has finished and returned
    val dataSchema = userSpecifiedSchema.map { schema =>
      StructType(schema.filterNot(f => partitionSchema.exists(p => equality(p.name, f.name))))
    }.orElse {
      // job1 starts from this call
      format.inferSchema(
        sparkSession,
        caseInsensitiveOptions,
        tempFileIndex.allFiles())
    }.getOrElse {
      throw new AnalysisException(
        s"Unable to infer schema for $format. It must be specified manually.")
    }

    // We just print a waring message if the data schema and partition schema have the duplicate
    // columns. This is because we allow users to do so in the previous Spark releases and
    // we have the existing tests for the cases (e.g., `ParquetHadoopFsRelationSuite`).
    // See SPARK-18108 and SPARK-21144 for related discussions.
    try {
      SchemaUtils.checkColumnNameDuplication(
        (dataSchema ++ partitionSchema).map(_.name),
        "in the data schema and the partition schema",
        equality)
    } catch {
      case e: AnalysisException => logWarning(e.getMessage)
    }

    (dataSchema, partitionSchema)
  }


This method first triggers job0, and only after job0 returns does execution move on to the steps below. job0 is responsible for file discovery: it collects how many files there are in total, each file's block information, and so on.
The main task of job1 is to extract the data schema. It is triggered in the fragment below; since the user did not specify a schema, execution falls into the orElse {} block:

    // by this point job0 has finished and returned
    val dataSchema = userSpecifiedSchema.map { schema =>
      StructType(schema.filterNot(f => partitionSchema.exists(p => equality(p.name, f.name))))
    }.orElse {
      // job1 starts from this call
      format.inferSchema(
        sparkSession,
        caseInsensitiveOptions,
        tempFileIndex.allFiles())
    }.getOrElse {
      throw new AnalysisException(
        s"Unable to infer schema for $format. It must be specified manually.")
    }
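To make it concrete why execution lands in the orElse {} block, here is a tiny self-contained sketch of the same Option chain, with made-up string values standing in for the real schema objects (illustration only, not Spark source):

    // userSpecifiedSchema is None because .schema(...) was never called on the reader.
    val userSpecifiedSchema: Option[String] = None
    // Stand-in for format.inferSchema(...).
    def inferSchema(): Option[String] = Some("schema-from-parquet-footer")

    val dataSchema = userSpecifiedSchema
      .map(identity)            // map on None stays None, so this branch contributes nothing
      .orElse(inferSchema())    // orElse takes a by-name argument, evaluated only because the previous step is None -- this is where job1 fires
      .getOrElse(throw new IllegalArgumentException("Unable to infer schema"))
    // dataSchema == "schema-from-parquet-footer"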

The format here is a ParquetFileFormat, so this ends up calling ParquetFileFormat.inferSchema().

2. ParquetFileFormat.inferSchema

  /**
   * When possible, this method should return the schema of the given `files`.  When the format
   * does not support inference, or no valid files are given should return None.  In these cases
   * Spark will require that user specify the schema manually.
   */

  override def inferSchema(
      sparkSession: SparkSession,
      parameters: Map[String, String],
      files: Seq[FileStatus]): Option[StructType] = {
    val parquetOptions = new ParquetOptions(parameters, sparkSession.sessionState.conf)

    // Should we merge schemas from all Parquet part-files?
    val shouldMergeSchemas = parquetOptions.mergeSchema

    val mergeRespectSummaries = sparkSession.sessionState.conf.isParquetSchemaRespectSummaries

    val filesByType = splitFiles(files)

    // Sees which file(s) we need to touch in order to figure out the schema.
    //
    // Always tries the summary files first if users don't require a merged schema.  In this case,
    // "_common_metadata" is more preferable than "_metadata" because it doesn't contain row
    // groups information, and could be much smaller for large Parquet files with lots of row
    // groups.  If no summary file is available, falls back to some random part-file.
    //
    // NOTE: Metadata stored in the summary files are merged from all part-files.  However, for
    // user defined key-value metadata (in which we store Spark SQL schema), Parquet doesn't know
    // how to merge them correctly if some key is associated with different values in different
    // part-files.  When this happens, Parquet simply gives up generating the summary file.  This
    // implies that if a summary file presents, then:
    //
    //   1. Either all part-files have exactly the same Spark SQL schema, or
    //   2. Some part-files don't contain Spark SQL schema in the key-value metadata at all (thus
    //      their schemas may differ from each other).
    //
    // Here we tend to be pessimistic and take the second case into account.  Basically this means
    // we can't trust the summary files if users require a merged schema, and must touch all part-
    // files to do the merge.
    val filesToTouch =
      if (shouldMergeSchemas) {
        // Also includes summary files, 'cause there might be empty partition directories.

        // If mergeRespectSummaries config is true, we assume that all part-files are the same for
        // their schema with summary files, so we ignore them when merging schema.
        // If the config is disabled, which is the default setting, we merge all part-files.
        // In this mode, we only need to merge schemas contained in all those summary files.
        // You should enable this configuration only if you are very sure that for the parquet
        // part-files to read there are corresponding summary files containing correct schema.

        // As filed in SPARK-11500, the order of files to touch is a matter, which might affect
        // the ordering of the output columns. There are several things to mention here.
        //
        //  1. If mergeRespectSummaries config is false, then it merges schemas by reducing from
        //     the first part-file so that the columns of the lexicographically first file show
        //     first.
        //
        //  2. If mergeRespectSummaries config is true, then there should be, at least,
        //     "_metadata"s for all given files, so that we can ensure the columns of
        //     the lexicographically first file show first.
        //
        //  3. If shouldMergeSchemas is false, but when multiple files are given, there is
        //     no guarantee of the output order, since there might not be a summary file for the
        //     lexicographically first file, which ends up putting ahead the columns of
        //     the other files. However, this should be okay since not enabling
        //     shouldMergeSchemas means (assumes) all the files have the same schemas.

        val needMerged: Seq[FileStatus] =
          if (mergeRespectSummaries) {
            Seq.empty
          } else {
            filesByType.data
          }
        needMerged ++ filesByType.metadata ++ filesByType.commonMetadata
      } else {
        // Tries any "_common_metadata" first. Parquet files written by old versions or Parquet
        // don't have this.
        filesByType.commonMetadata.headOption
            // Falls back to "_metadata"
            .orElse(filesByType.metadata.headOption)
            // Summary file(s) not found, the Parquet file is either corrupted, or different part-
            // files contain conflicting user defined metadata (two or more values are associated
            // with a same key in different files).  In either case, we fall back to any of the
            // first part-file, and just assume all schemas are consistent.
            .orElse(filesByType.data.headOption)
            .toSeq
      }
    ParquetFileFormat.mergeSchemasInParallel(filesToTouch, sparkSession)
  }


This method looks long, but most of it is comments. Essentially it decides whether the schema of every Parquet part-file has to be read to determine the final schema, and how that decision is made.
Let's strip the code down.

1. Simplified code

  override def inferSchema(
      sparkSession: SparkSession,
      parameters: Map[String, String],
      files: Seq[FileStatus]): Option[StructType] = {
    val parquetOptions = new ParquetOptions(parameters, sparkSession.sessionState.conf)

    // whether to merge schemas from all Parquet part-files; defaults to false
    val shouldMergeSchemas = parquetOptions.mergeSchema

    // when merging, whether all part-files are assumed to be consistent with the summary files; defaults to false (not assumed consistent). Only relevant when shouldMergeSchemas is true
    val mergeRespectSummaries = sparkSession.sessionState.conf.isParquetSchemaRespectSummaries

    val filesByType = splitFiles(files)

    val filesToTouch =
      // false here
      if (shouldMergeSchemas) {

        val needMerged: Seq[FileStatus] =
          if (mergeRespectSummaries) {
            Seq.empty
          } else {
            filesByType.data
          }
        needMerged ++ filesByType.metadata ++ filesByType.commonMetadata
      } else {
        // execution takes this branch
        filesByType.commonMetadata.headOption
            .orElse(filesByType.metadata.headOption)
            .orElse(filesByType.data.headOption)
            .toSeq
      }
    // and finally ends up calling this
    ParquetFileFormat.mergeSchemasInParallel(filesToTouch, sparkSession)
  }

2. parquetOptions.mergeSchema is false

The default value behind parquetOptions.mergeSchema is false:

  val PARQUET_SCHEMA_MERGING_ENABLED = buildConf("spark.sql.parquet.mergeSchema")
    .doc("When true, the Parquet data source merges schemas collected from all data files, " +
         "otherwise the schema is picked from the summary file or a random data file " +
         "if no summary file is available.")
    .booleanConf
    .createWithDefault(false)
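For reference, here is a small sketch (not from the article) of two ways to flip this default on, so that shouldMergeSchemas becomes true and every part-file is touched; the path reuses the directory from the earlier example:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("merge-schema-demo").getOrCreate()

    // 1) per-read option, picked up by ParquetOptions as "mergeSchema"
    val df1 = spark.read.option("mergeSchema", "true").parquet("hdfs:///user/daily/20200828/*.parquet")

    // 2) session-wide configuration
    spark.conf.set("spark.sql.parquet.mergeSchema", "true")
    val df2 = spark.read.parquet("hdfs:///user/daily/20200828/*.parquet")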

3. isParquetSchemaRespectSummaries defaults to false

The default value of sparkSession.sessionState.conf.isParquetSchemaRespectSummaries is also false:


  val PARQUET_SCHEMA_RESPECT_SUMMARIES = buildConf("spark.sql.parquet.respectSummaryFiles")
    .doc("When true, we make assumption that all part-files of Parquet are consistent with " +
         "summary files and we will ignore them when merging schema. Otherwise, if this is " +
         "false, which is the default, we will merge all part-files. This should be considered " +
         "as expert-only option, and shouldn't be enabled before knowing what it means exactly.")
    .booleanConf
    .createWithDefault(false)
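A companion sketch for this flag: it only has an effect when schema merging is enabled, and per the doc string above it should be treated as an expert-only option. The values below are assumptions for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("respect-summary-demo")
      .config("spark.sql.parquet.mergeSchema", "true")
      .config("spark.sql.parquet.respectSummaryFiles", "true")
      .getOrCreate()

    // With both flags true, needMerged above becomes Seq.empty: only the _metadata /
    // _common_metadata summary files are read when merging; the data part-files are skipped.
    val df = spark.read.parquet("hdfs:///user/daily/20200828/*.parquet")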

4. What filesByType contains

filesByType is a wrapper that groups all the input files. The filesToTouch produced from it contains:

LocatedFileStatus{path=hdfs://test.com:9000/user/daily/20200828/part-00000-0e0dc5b5-5061-41ca-9fa6-9fb7b3e09e98-c000.snappy.parquet; isDirectory=false; length=193154555; replication=3; blocksize=134217728; modification_time=1599052749676; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}

That is, just the first of all the files in filesByType. This single file is then passed to ParquetFileFormat.mergeSchemasInParallel(filesToTouch, sparkSession).
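For intuition, splitFiles buckets the input by file name. The following is a rough, simplified sketch of that grouping (not the exact Spark source; the FileTypes shape is modeled on the real one):

    import org.apache.hadoop.fs.FileStatus

    case class FileTypes(
        data: Seq[FileStatus],
        metadata: Seq[FileStatus],
        commonMetadata: Seq[FileStatus])

    def splitFilesSketch(allFiles: Seq[FileStatus]): FileTypes = {
      val sorted = allFiles.sortBy(_.getPath.toString)
      FileTypes(
        data = sorted.filterNot(f =>
          f.getPath.getName == "_metadata" || f.getPath.getName == "_common_metadata"),
        metadata = sorted.filter(_.getPath.getName == "_metadata"),
        commonMetadata = sorted.filter(_.getPath.getName == "_common_metadata"))
    }

Since the directory in this example has no summary files, commonMetadata.headOption and metadata.headOption are both None, and the selection falls through to data.headOption, i.e. the single part-file shown above.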

3. ParquetFileFormat.mergeSchemasInParallel


  /**
   * Figures out a merged Parquet schema with a distributed Spark job.
   *
   * Note that locality is not taken into consideration here because:
   *
   *  1. For a single Parquet part-file, in most cases the footer only resides in the last block of
   *     that file.  Thus we only need to retrieve the location of the last block.  However, Hadoop
   *     `FileSystem` only provides API to retrieve locations of all blocks, which can be
   *     potentially expensive.
   *
   *  2. This optimization is mainly useful for S3, where file metadata operations can be pretty
   *     slow.  And basically locality is not available when using S3 (you can't run computation on
   *     S3 nodes).
   */
  def mergeSchemasInParallel(
      filesToTouch: Seq[FileStatus],
      sparkSession: SparkSession): Option[StructType] = {
    val assumeBinaryIsString = sparkSession.sessionState.conf.isParquetBinaryAsString
    val assumeInt96IsTimestamp = sparkSession.sessionState.conf.isParquetINT96AsTimestamp
    val serializedConf = new SerializableConfiguration(sparkSession.sessionState.newHadoopConf())

    // !! HACK ALERT !!
    //
    // Parquet requires `FileStatus`es to read footers.  Here we try to send cached `FileStatus`es
    // to executor side to avoid fetching them again.  However, `FileStatus` is not `Serializable`
    // but only `Writable`.  What makes it worse, for some reason, `FileStatus` doesn't play well
    // with `SerializableWritable[T]` and always causes a weird `IllegalStateException`.  These
    // facts virtually prevents us to serialize `FileStatus`es.
    //
    // Since Parquet only relies on path and length information of those `FileStatus`es to read
    // footers, here we just extract them (which can be easily serialized), send them to executor
    // side, and resemble fake `FileStatus`es there.
    val partialFileStatusInfo = filesToTouch.map(f => (f.getPath.toString, f.getLen))

    // Set the number of partitions to prevent following schema reads from generating many tasks
    // in case of a small number of parquet files.
    val numParallelism = Math.min(Math.max(partialFileStatusInfo.size, 1),
      sparkSession.sparkContext.defaultParallelism)

    val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles

    // Issues a Spark job to read Parquet schema in parallel.
    val partiallyMergedSchemas =
      sparkSession
        .sparkContext
        .parallelize(partialFileStatusInfo, numParallelism)
        .mapPartitions { iterator =>
          // Resembles fake `FileStatus`es with serialized path and length information.
          val fakeFileStatuses = iterator.map { case (path, length) =>
            new FileStatus(length, false, 0, 0, 0, 0, null, null, null, new Path(path))
          }.toSeq

          // Reads footers in multi-threaded manner within each task
          val footers =
            ParquetFileFormat.readParquetFootersInParallel(
              serializedConf.value, fakeFileStatuses, ignoreCorruptFiles)

          // Converter used to convert Parquet `MessageType` to Spark SQL `StructType`
          val converter = new ParquetToSparkSchemaConverter(
            assumeBinaryIsString = assumeBinaryIsString,
            assumeInt96IsTimestamp = assumeInt96IsTimestamp)
          if (footers.isEmpty) {
            Iterator.empty
          } else {
            var mergedSchema = ParquetFileFormat.readSchemaFromFooter(footers.head, converter)
            footers.tail.foreach { footer =>
              val schema = ParquetFileFormat.readSchemaFromFooter(footer, converter)
              try {
                mergedSchema = mergedSchema.merge(schema)
              } catch { case cause: SparkException =>
                throw new SparkException(
                  s"Failed merging schema of file ${footer.getFile}:\n${schema.treeString}", cause)
              }
            }
            Iterator.single(mergedSchema)
          }
        }.collect()

    if (partiallyMergedSchemas.isEmpty) {
      None
    } else {
      var finalSchema = partiallyMergedSchemas.head
      partiallyMergedSchemas.tail.foreach { schema =>
        try {
          finalSchema = finalSchema.merge(schema)
        } catch { case cause: SparkException =>
          throw new SparkException(
            s"Failed merging schema:\n${schema.treeString}", cause)
        }
      }
      Some(finalSchema)
    }
  }


ParquetFileFormat.mergeSchemasInParallel takes two parameters: a SparkSession and filesToTouch, a list of files. As noted above, by default only a single file is actually passed in.
This method launches job1, which parses the schema out of the files it is given. The part of the method that makes up job1 is:

    val partialFileStatusInfo = filesToTouch.map(f => (f.getPath.toString, f.getLen))

    val numParallelism = Math.min(Math.max(partialFileStatusInfo.size, 1),
      sparkSession.sparkContext.defaultParallelism)

    val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles

    val partiallyMergedSchemas =
      sparkSession
        .sparkContext
        .parallelize(partialFileStatusInfo, numParallelism)
        .mapPartitions { iterator =>
		...
		...

            Iterator.single(mergedSchema)
        }.collect()
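What each task in job1 conceptually does per (path, length) pair is: rebuild a FileStatus, read only the Parquet footer, and convert the footer's schema. A standalone sketch of the footer-reading part using the parquet-hadoop API might look roughly like this (the file path is a placeholder; Spark itself goes through ParquetFileFormat.readParquetFootersInParallel and ParquetToSparkSchemaConverter rather than this exact call):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.format.converter.ParquetMetadataConverter
    import org.apache.parquet.hadoop.ParquetFileReader

    val conf = new Configuration()
    // Read just the footer of one part-file (placeholder path).
    val footer = ParquetFileReader.readFooter(
      conf,
      new Path("hdfs:///user/daily/20200828/part-00000-xxx.snappy.parquet"),
      ParquetMetadataConverter.NO_FILTER)
    // The Parquet MessageType that ParquetToSparkSchemaConverter turns into a Spark StructType.
    val parquetSchema = footer.getFileMetaData.getSchema
    println(parquetSchema)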

The parallelism of job1, numParallelism:

    val numParallelism = Math.min(Math.max(partialFileStatusInfo.size, 1),
      sparkSession.sparkContext.defaultParallelism)


partialFileStatusInfo.size comes from the input; here it is 1.
sparkSession.sparkContext.defaultParallelism returns 1 in this run. Its value differs between environments, but it is never smaller than 1.

So numParallelism is 1, which means job1 has exactly one partition.
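A tiny worked example of that formula, with the values from this run (defaultParallelism assumed to be 1, as in local mode with a single core):

    val partialFileStatusInfoSize = 1   // only one file in filesToTouch
    val defaultParallelism = 1          // sparkContext.defaultParallelism; varies by environment, never < 1
    val numParallelism = math.min(math.max(partialFileStatusInfoSize, 1), defaultParallelism)
    // numParallelism == 1, so parallelize() builds an RDD with exactly one partition
    // and job1 runs as a single task.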

2. Summary of the job1 call chain

The method call chain is:


DataFrameReader.load()
DataFrameReader.loadV1Source()
DataSource.resolveRelation()
DataSource.getOrInferFileFormatSchema()
ParquetFileFormat.inferSchema()
ParquetFileFormat.mergeSchemasInParallel()

3. Summary of spark read

Let's look back at our code.

public class UserProfileTest {

    static String filePath = "hdfs:///user/daily/20200828/*.parquet";
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf()
                .setMaster("local")
                .setAppName("user_profile_test")
                .set(ConfigurationOptions.ES_NODES, "")
                .set(ConfigurationOptions.ES_PORT, "")
                .set(ConfigurationOptions.ES_MAPPING_ID, "uid");

        // the main question: why does reading here produce extra jobs
        SparkSession sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
        Dataset<Row> userProfileSource = sparkSession.read().parquet(filePath);
        userProfileSource.count();
        userProfileSource.write().parquet("hdfs:///user/daily/result2020082808/");
    }
}

When the number of paths matched by filePath exceeds 32, a dedicated job is launched to recursively list the files and collect the information for every matching file (including block sizes and so on, in preparation for the later schema inference and for partitioning the RDD operations); the 32-path threshold is configurable, see the sketch below.
After all the files have been found, if the files being read are Parquet, a new job is launched to work out the corresponding schema.
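The "32" above is the parallel partition discovery threshold. A hedged sketch of tuning it (the property key and its default of 32 come from SQLConf; double-check the key on your Spark version):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("listing-threshold-demo")
      // Raise the threshold so the file listing stays on the driver (no separate listing job)
      // even when the glob expands to a few hundred paths.
      .config("spark.sql.sources.parallelPartitionDiscovery.threshold", "512")
      .getOrCreate()

    val df = spark.read.parquet("hdfs:///user/daily/20200828/*.parquet")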

Based on the analysis above, suppose we change the filePath in the code, for example to:
static String filePath = "hdfs://bj3-dev-search-01.tencn:9000/user/daily/20200828/part-00057-0e0dc5b5-5061-41ca-9fa6-9fb7b3e09e98-c000.snappy.parquet";

Then the program produces no job0 at all: only one path is matched, which is below the 32-path threshold, so the file listing runs directly on the driver.
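In the same spirit, job1 (schema inference) can be avoided as well by supplying the schema up front: userSpecifiedSchema is then non-empty and the orElse { format.inferSchema(...) } branch is never evaluated. A sketch with made-up column names:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val spark = SparkSession.builder().master("local[*]").appName("explicit-schema-demo").getOrCreate()

    // Hypothetical columns for illustration only.
    val userSchema = StructType(Seq(
      StructField("uid", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))

    // With an explicit schema, Spark skips the schema-inference job entirely.
    val df = spark.read.schema(userSchema).parquet("hdfs:///user/daily/20200828/*.parquet")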
