Actions: operations that trigger execution

喜洋洋学习

Article link: https://blog.csdn.net/weixin_45097166/article/details/103922265

package Scala

import org.apache.spark._
import org.apache.spark.rdd.RDD

object Actions {
def main(args: Array[String]): Unit = {

val conf = new SparkConf()
  .setAppName("WordCount2000")
  .setMaster("local")


val sc = new SparkContext(conf)

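// textFile: by default, one HDFS block maps to one Spark partition, and each partition is processed by one task
val lines = sc.textFile("hdfs://192.168.0.101:8020/goods.txt")
// The three elements of a function:
// 1> parameters  2> function name  3> return value
// println(_)
// f = x => { println(x) } is also a function: x is the parameter and the return type is Unit (like void). It is an anonymous function.
// new User().Longin() is an anonymous object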

// Actions trigger the actual computation

//foreach

    lines.foreach(print(_))

// Iterate over the lines, split each one on \t, take the field at index 0, append "aaa", and print it

    lines.foreach(x =>{
      println(x.split("\t")(0)+"aaa")
    })

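// reduce: aggregate

// Convert each line of text to an integer (toInt), then add them all together (reduce). This assumes every line holds a single integer.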

    val intRDD = lines.map(_.toInt)
    val sum = intRDD.reduce((x,y) =>{
      x + y
    })
    println(sum)

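// collect: mostly used while debugging, and only on small datasets, because it pulls the distributed result from the cluster into a single local array on the driver

    // e.g. arr = Array(1, 2, 3, 4, 5)
    val arr = lines.collect()
    arr.foreach(println(_))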

// count: returns the number of elements in the RDD

// For a five-line input such as the example above, this prints 5

    val total = intRDD.count()
    println(total)

// first: returns the first element; it is a wrapper around take

    val fir = lines.first()
    println(fir)

// take: fetch the first three lines and print each one

    lines.take(3).foreach(println(_))

// saveAsTextFile: writes the data out to HDFS in parallel

// lines.saveAsTextFile("output data path")

//countByKey()

// Split each line on \t, take the first field as the key, and pair it with the value 1

    val tuple = lines.map(_.split("\t")(0)).map(x => (x, 1))
    // Count how many times each key occurs
    val map = tuple.countByKey()
    map.foreach(x =>{
      println(x._1 )
      println(x._2)
    })

// top: returns the 100 largest elements in descending order (by natural ordering, so lexicographic for these strings)

    lines.top(100).foreach(println(_))

lines.collect() // brings the result back to the driver as a local array

}
}
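
The actions above depend on an HDFS file that is not shown, so their output is hard to check by reading alone. As a rough sketch, the same actions can be tried on small in-memory data. Everything named here (`ActionsSketch`, `Results`, `run`, and the sample values) is hypothetical and chosen only for illustration. It assumes Spark is on the classpath and uses a `local[2]` master instead of the cluster address from the post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ActionsSketch {
  // Plain container so the results can be inspected after the context is stopped.
  final case class Results(
      sum: Int,
      count: Long,
      first: Int,
      firstThree: Seq[Int],
      topTwo: Seq[Int],
      keyCounts: Map[String, Long])

  def run(): Results = {
    val conf = new SparkConf().setAppName("ActionsSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for a file with one integer per line.
      val numbers = sc.parallelize(Seq(3, 1, 4, 1, 5))
      // Hypothetical stand-in for tab-separated records: name<TAB>quantity.
      val records = sc.parallelize(Seq("apple\t3", "pear\t1", "apple\t4"))

      // countByKey is an action: it returns a key -> occurrences map to the driver.
      val keyCounts = records
        .map(line => (line.split("\t")(0), 1))
        .countByKey()

      Results(
        sum = numbers.reduce(_ + _),         // adds all elements together
        count = numbers.count(),             // number of elements
        first = numbers.first(),             // first element
        firstThree = numbers.take(3).toSeq,  // first three elements, in RDD order
        topTwo = numbers.top(2).toSeq,       // two largest elements, descending
        keyCounts = keyCounts.toMap)
    } finally {
      sc.stop() // release the local Spark context even if an action fails
    }
  }

  def main(args: Array[String]): Unit = {
    println(run())
  }
}
```

Each field of `Results` is filled by one action, and nothing is computed until that action runs. Only `take`, `top`, `collect`, and `countByKey` return collections to the driver, so those are the ones to keep small on real data.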
