Spark运算:统计RDD分区中的元素及数量

Spark RDD是被分区的,在生成RDD时候,一般可以指定分区的数量,如果不指定分区数量,当RDD从集合创建时候,则默认为该程序所分配到的资源的CPU核数,如果是从HDFS文件创建,默认为文件的Block数。

 

def main(args: Array[String]) {
  //默认分区12个
  val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("test").set("spark.default.parallelism", "12"))

  //创建一个RDD,分区3个
  val rdd = sc.makeRDD(1 to 128, 3)
  println("RDD partition size:" + rdd.partitions.size)

  //默认分区12个
  val rdd1 = sc.makeRDD(1 to 128)
  println("RDD1 partition size:" + rdd1.partitions.size)

  //从HDFS文件创建的RDD,包含76个分区,因为该文件由76个Block
  val hdfs = sc.textFile("C:\\Windows\\Logs\\CBS\\CbsPersist_20151208074613.log")
  println("RDD partition size:" + hdfs.partitions.size)
}

 

RDD partition size:3
RDD1 partition size:12

16/12/19 15:47:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 229.9 KB, free 900.4 MB)
16/12/19 15:47:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.1 KB, free 900.4 MB)
16/12/19 15:47:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.56.1:61774 (size: 22.1 KB, free: 900.6 MB)
16/12/19 15:47:56 INFO SparkContext: Created broadcast 0 from textFile at ShellTest.scala:31
16/12/19 15:47:57 INFO FileInputFormat: Total input paths to process : 1

RDD partition size:76

 

统计分区元素数量

def main(args: Array[String]): Unit = {
  //默认分区12个
  val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("test").set("spark.default.parallelism", "12"))
  var rdd1 = sc.makeRDD(1 to 1128)
  rdd1.mapPartitions{x => Iterator(x.length)}.collect.foreach(println(_))
}

16/12/20 11:24:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/12/20 11:24:46 INFO DAGScheduler: Job 0 finished: collect at ShellTest.scala:22, took 1.109165 s
94
94
94
94
94
94
94
94
94
94
94
94

16/12/20 11:24:46 INFO SparkContext: Invoking stop() from shutdown hook

 

 

转载于:https://my.oschina.net/chensanti234/blog/808431

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值