Why does textFile return its data as String?

Latest recommended article published on 2022-08-19 09:45:27

CoreDao

Category: Spark. Tags: spark, big data, scala, string

Article link: https://blog.csdn.net/Jin_Lemon/article/details/114546524

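Because Spark's textFile reuses the file-reading machinery of Hadoop MapReduce (MR). MR reads text files with a line record reader (TextInputFormat), which emits one record per line as a pair: a LongWritable key holding the byte offset of the line and a Text value holding the line itself. textFile discards the key and converts each Text value to a String, so the result can only be an RDD[String].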

Source code:

  /**
   * Read a text file from HDFS, a local file system (available on all nodes), or any
   * Hadoop-supported file system URI, and return it as an RDD of Strings.
   * @param path path to the text file on a supported file system
   * @param minPartitions suggested minimum number of partitions for the resulting RDD
   * @return RDD of lines of the text file
   */
  def textFile(
      path: String,
      minPartitions: Int = defaultMinPartitions): RDD[String] = withScope {
    assertNotStopped()
    hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
      minPartitions).map(pair => pair._2.toString).setName(path)
  }
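
To see the (offset, line) pairs that textFile throws away, one can call hadoopFile directly with the same arguments. The following is only a minimal sketch, assuming Spark and Hadoop are on the classpath and a local master is acceptable; the object name TextFileSketch, the helper readWithOffsets, and the temporary file are hypothetical and not part of Spark.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object TextFileSketch {

  // Hypothetical helper: same hadoopFile call as textFile, but it keeps the key.
  // The key is the byte offset at which the line starts, the value is the line.
  def readWithOffsets(sc: SparkContext, path: String): Array[(Long, String)] =
    sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
      // Hadoop reuses the same Writable objects for every record,
      // so copy them into plain Scala values before collecting.
      .map { case (offset, line) => (offset.get, line.toString) }
      .collect()

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with a single thread.
    val conf = new SparkConf().setAppName("textFile-sketch").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      // Write a small two-line file to read back.
      val tmp = Files.createTempFile("textfile-sketch", ".txt")
      Files.write(tmp, "hello\nspark\n".getBytes("UTF-8"))
      val path = tmp.toUri.toString

      // Pairs of (byte offset, line), as produced by the line record reader.
      readWithOffsets(sc, path).foreach(println)

      // textFile keeps only the line, which is why it is an RDD[String].
      sc.textFile(path).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The second loop prints only the lines, which matches the `map(pair => pair._2.toString)` step in the source above.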