Converting a Spark RDD to a List and a Set

Latest recommended article published 2024-06-11 10:37:15

我是浣熊的微笑

Article link: https://blog.csdn.net/gz1993/article/details/100004751

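I needed to convert an RDD to a List. The approaches I found online were all quite complicated, but it turns out to be very simple: collect() already returns an Array. Here is the source: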

  /**
   * Return an array that contains all of the elements in this RDD.
   *
   * @note This method should only be used if the resulting array is expected to be small, as
   * all the data is loaded into the driver's memory.
   */
  def collect(): Array[T] = withScope {
    val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
    Array.concat(results: _*)
  }
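
To make the last line of that method concrete, here is a minimal plain-Scala sketch (no Spark required). It assumes the job returned one array per partition; the object name `CollectSketch` and the partition contents are hypothetical stand-ins, and a real RDD may split the data differently.

```scala
// Plain-Scala sketch: simulate the per-partition arrays that sc.runJob
// returns, then merge them with the same call that collect() uses.
object CollectSketch {
  // Hypothetical per-partition results; real partition boundaries may differ.
  val partitionResults: Array[Array[Int]] =
    Array(Array(1, 2), Array(3), Array(4, 5))

  // Same call as in collect(): concatenate the arrays in partition order.
  val collected: Array[Int] = Array.concat(partitionResults: _*)

  def main(args: Array[String]): Unit = {
    println(collected.mkString(", ")) // prints "1, 2, 3, 4, 5"
  }
}
```

The result is an ordinary Scala Array living in the driver's memory, which is why the usual collection conversions can be applied to it directly.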

Once collect() has returned the Array, just call toList or toSet on it. Here is the code:

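import org.apache.spark.{SparkConf, SparkContext}

object Spark_kudu {
  def main(args: Array[String]): Unit = {
    // The app name is a placeholder. The master falls back to local[*]
    // only when spark-submit has not already supplied one.
    val conf = new SparkConf()
      .setAppName("RddToListAndSet")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)

    val arr = Array(1, 2, 3, 4, 5)
    val rdd = sc.parallelize(arr)
    // collect() returns an Array[Int] on the driver;
    // toList and toSet are ordinary Scala collection conversions.
    val list: List[Int] = rdd.collect().toList
    val set: Set[Int] = rdd.collect().toSet

    // Stop the SparkContext (the original called close() on an undefined `spark`).
    sc.stop()
  }
}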
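
Each call to collect() is a Spark action and runs its own job, so when both a List and a Set are needed it may be worth collecting once and converting the same Array twice. The following is a sketch under that assumption, not a definitive implementation. The object name `CollectOnce`, the helper `toListAndSet`, and the sample data are all hypothetical, and the `local[*]` fallback is only there so the example can run without spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectOnce {
  // Hypothetical helper: derive both collections from one collected Array.
  def toListAndSet[T](collected: Array[T]): (List[T], Set[T]) =
    (collected.toList, collected.toSet[T])

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CollectOnce")
      .setIfMissing("spark.master", "local[*]") // assumption: local test run
    val sc = new SparkContext(conf)
    try {
      // Sample data with duplicates, to show the difference between List and Set.
      val rdd = sc.parallelize(Seq(3, 1, 2, 3, 1))

      // A single action: the data is brought to the driver only once.
      val collected: Array[Int] = rdd.collect()
      val (list, set) = toListAndSet(collected)

      println(s"list size = ${list.size}") // 5: the List keeps duplicates
      println(s"set size = ${set.size}")   // 3: the Set removes duplicates
    } finally {
      sc.stop()
    }
  }
}
```

As the note in the collect() source says, this only makes sense when the result is expected to be small enough to fit in the driver's memory.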
