Spark Operators: Actions


Introduction to Actions

Action | Description | Summary
reduce(func) | Aggregate the elements of the dataset using a function func (which takes two arguments and returns one). The function should be commutative and associative so that it can be computed correctly in parallel. | Aggregates the elements of the dataset with func, which takes two arguments and returns one value.
collect() | Return all the elements of the dataset as an array at the driver program. This is usually useful after a filter or other operation that returns a sufficiently small subset of the data. | Returns every element to the driver as an array. Fine for small results; a large result can make the driver run out of memory (OOM).
count() | Return the number of elements in the dataset. | Returns the number of elements.
first() | Return the first element of the dataset (similar to take(1)). | Returns the first element of the dataset.
take(n) | Return an array with the first n elements of the dataset. | Returns the first n elements of the dataset.
foreach(func) | Run a function func on each element of the dataset. This is usually done for side effects such as updating an Accumulator or interacting with external storage systems. Note: modifying variables other than Accumulators outside of the foreach() may result in undefined behavior. See Understanding closures for more details. | Iterates over every element.
takeSample(withReplacement, num, [seed]) | Return an array with a random sample of num elements of the dataset, with or without replacement, optionally pre-specifying a random number generator seed. | Returns a random sample.
takeOrdered(n, [ordering]) | Return the first n elements of the RDD using either their natural order or a custom comparator. | Returns the first n elements after sorting.
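
The foreach entry above warns that only Accumulators are safe to update from inside foreach(). The following is a minimal sketch of that pattern, assuming Spark 2.x or later is on the classpath. The object name ForeachAccumulatorSketch and the helper sumWithAccumulator are hypothetical names chosen for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object, not part of the original demo.
object ForeachAccumulatorSketch {

  // Sums the values through a LongAccumulator updated inside foreach.
  // A plain local `var` captured by the closure would be copied to each
  // task, so updates to it would not reliably reach the driver.
  def sumWithAccumulator(sc: SparkContext, values: Seq[Int]): Long = {
    val acc = sc.longAccumulator("sum")
    sc.parallelize(values).foreach(x => acc.add(x.toLong))
    acc.value // read the merged result on the driver
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ForeachAccumulatorSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(sumWithAccumulator(sc, Seq(1, 2, 3, 4, 5))) // 15
    } finally {
      sc.stop()
    }
  }
}
```

The accumulator is read only on the driver after the action has finished; tasks can add to it but should not read its value.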

DEMO


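import org.apache.spark.{SparkConf, SparkContext}

object ActionApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("ActionApp").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val data1 = sc.parallelize(Array(1, 2, 3, 4, 5))

    // reduce: aggregate all elements with a two-argument function
    val reduceData = data1.reduce(_ + _)
    println(reduceData) // 15

    // Caution: subtraction is neither commutative nor associative. reduce runs
    // inside each partition first and then merges the partial results, so use a
    // single partition if you need the sequential result.
    // With 2 partitions: (1,2) (3,4,5) => (-1), (-6) => -1 - (-6) = 5 or -6 - (-1) = -5
    println(data1.reduce(_ - _)) // 5 or -5

    // collect: gather all elements at the driver.
    // Use it only when the result is small, because everything is loaded into driver memory.
    data1.collect().foreach(println(_))

    // count: number of elements
    println(data1.count()) // 5

    // first: the first element
    println(data1.first()) // 1

    // take: the first 2 elements (small results only, loaded into driver memory)
    data1.take(2).foreach(println(_)) // 1 2

    // takeSample: 2 random elements, with replacement; output varies between runs
    // (small results only, loaded into driver memory)
    data1.takeSample(true, 2).foreach(println(_))

    // takeOrdered: the 2 smallest elements in natural order
    // (small results only, loaded into driver memory)
    data1.takeOrdered(2).foreach(println(_)) // 1 2

    // foreach: run a function on every element; output order is not guaranteed
    val data2 = sc.parallelize(Array("a", "b", "c", "d", "e"))
    data2.foreach(println(_))

    sc.stop()
  }
}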
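
The reduce(_ - _) result in the demo depends on how the data is partitioned and on the order in which partial results are merged. The sketch below reproduces that arithmetic with plain Scala collections, so it runs without Spark. It is an illustration under the assumption that each inner Seq stands in for one partition; PartitionedReduceSketch and partitionedReduce are hypothetical names, and this is not how Spark is implemented internally.

```scala
// Hypothetical helper that mimics "reduce per partition, then merge".
object PartitionedReduceSketch {

  // Reduce each non-empty partition on its own, then combine the partial
  // results from left to right with the same function.
  def partitionedReduce(partitions: Seq[Seq[Int]], f: (Int, Int) => Int): Int =
    partitions.filter(_.nonEmpty).map(_.reduce(f)).reduce(f)

  def main(args: Array[String]): Unit = {
    val twoPartitions = Seq(Seq(1, 2), Seq(3, 4, 5))
    val onePartition = Seq(Seq(1, 2, 3, 4, 5))

    // Addition is commutative and associative: partitioning does not matter.
    println(partitionedReduce(twoPartitions, _ + _)) // 15
    println(partitionedReduce(onePartition, _ + _))  // 15

    // Subtraction is neither: the result depends on partitioning and merge order.
    println(partitionedReduce(twoPartitions, _ - _))         // -1 - (-6) = 5
    println(partitionedReduce(twoPartitions.reverse, _ - _)) // -6 - (-1) = -5
    println(partitionedReduce(onePartition, _ - _))          // 1-2-3-4-5 = -13
  }
}
```

This is why the table asks for a commutative and associative function: only then does every partitioning and merge order give the same answer.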