Spark Scenario Exercises (Part 1)

Note:

Just make up some simple data yourself; once you're done, paste your code below!

Problem 1:

There is a data file test.txt, delimited by "\t", with fields id, time, and url. Use Spark Core to implement a grouped top-N: find the three most-used search engines.

Sample data:

id  time         url
2	11:08:23	google
3	12:09:11	baidu
1	08:45:56	sohu
2	16:42:17	yahoo
1	23:10:34	baidu
5	06:23:05	google
6	07:45:56	sohu
4	18:42:17	yahoo
5	24:10:34	baidu
1	04:23:05	google
7	16:42:17	yahoo
8	23:10:34	baidu
10	06:23:05	google
11	07:45:56	sohu
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("test"))
    val rdd: RDD[String] = sc.textFile("datas/test.txt") // data rows only, no header line
    // Count each url, sort by count descending, and take the top 3 straight off
    // the RDD (no need to collect() the whole sorted result first).
    val tuples: Array[(String, Int)] = rdd.map(x => (x.split("\t")(2), 1))
      .reduceByKey(_ + _).sortBy(_._2, ascending = false).take(3)
    tuples.foreach(println)
  }
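
As an aside, `sortBy` shuffles the whole counted RDD just to keep three rows. RDD's built-in `top` selects candidates per partition and merges them on the driver, so it skips the sort-by-value shuffle. A minimal variant, reusing the same tokenizing and counting as above:

    // top(n) with an Ordering on the count avoids a full sort of the RDD.
    val top3: Array[(String, Int)] = rdd.map(x => (x.split("\t")(2), 1))
      .reduceByKey(_ + _)
      .top(3)(Ordering.by(_._2))
    top3.foreach(println)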

Problem 2:

For the passage below, count:

1) the number of times "Spark" appears

2) which word appears most often

Get Spark from the [downloads page](http://spark.apache.org/downloads.html) of the project website This documentation is for Spark version Spark uses Hadoop s client libraries for HDFS and YARN.Downloads are pre packaged for a handful of popular Hadoop versions Users can also download a Hadoop free binary and run Spark with any Hadoop version [by augmenting Spark s classpath](http://spark.apache.org/docs/latest/hadoop-provided.html) Scala and Java users can include Spark in their projects using its Maven coordinates and in the future Python users can also install Spark from PyPI
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("test"))
    val rdd: RDD[String] = sc.textFile("datas/words.txt")
    // Tokenize on spaces and punctuation, drop empty tokens; cache because
    // two actions reuse this RDD.
    val words: RDD[String] = rdd.flatMap(_.split(" |/|\\[|\\]|:|\\.|\\(|\\)|-"))
      .filter(_.nonEmpty).cache()
    // 1) Occurrences of "Spark", case-insensitive.
    println(words.filter(_.equalsIgnoreCase("spark")).count())
    // 2) The single most frequent word.
    val tuples: Array[(String, Int)] = words.map((_, 1)).reduceByKey(_ + _)
      .sortBy(_._2, ascending = false).take(1)
    tuples.foreach(println)
  }
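
One wrinkle: part 1 matches "Spark" case-insensitively, while part 2 treats "Spark" and the lowercase "spark" tokens from the URLs as different words. If the two parts should agree, a small variant of part 2 (reusing the `words` RDD from the block above) can lowercase tokens before counting:

    // Lowercase every token so "Spark" and "spark" collapse into one key.
    val topWord: Array[(String, Int)] = words.map(_.toLowerCase)
      .map((_, 1)).reduceByKey(_ + _)
      .sortBy(_._2, ascending = false).take(1)
    topWord.foreach(println)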

Problem 3

The data file peopleinfo.txt in the HDFS directory /data contains three columns: id, gender, and height, in the following form:

1    F    170
2    M    178
3    M    174
4    F    165
5    M    179
6    F    160

Write a Spark application that processes peopleinfo.txt on HDFS and computes the total number of males, the total number of females, the maximum male height, the maximum female height, the minimum male height, and the minimum female height.

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("test"))
    // Local stand-in for hdfs:///data/peopleinfo.txt, per the note at the top.
    val rdd: RDD[String] = sc.textFile("datas/peopleinfo.txt")
    val result: Array[(String, Int, Int, Int)] = rdd.map { x =>
      // Split on runs of whitespace so the column padding doesn't matter.
      val f: Array[String] = x.split("\\s+")
      (f(1), f(2).toInt)
    }.groupBy(_._1).map { case (gender, rows) =>
      val heights = rows.map(_._2)
      // (gender, count, max height, min height)
      (gender, heights.size, heights.max, heights.min)
    }.collect()
    result.foreach(println)
  }
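
`groupBy` ships every individual height across the shuffle. The same four statistics per gender can instead be folded with `reduceByKey`, which merges partial (count, max, min) triples on the map side first; a small variant, reusing `rdd` from the block above:

    // Seed each row as (count = 1, max = height, min = height), then merge pairwise.
    val stats: Array[(String, (Int, Int, Int))] = rdd.map { x =>
      val f = x.split("\\s+")
      (f(1), (1, f(2).toInt, f(2).toInt))
    }.reduceByKey((a, b) => (a._1 + b._1, a._2 max b._2, a._3 min b._3))
      .collect()
    stats.foreach(println)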

Problem 4:

Suppose we have stock data in a CSV file with columns stock code (sid), date/time (time), and trade price (price) (table name t1):

sh600794,2019-09-02 09:00:10,25.51
sh603066,2019-09-02 09:00:10,15.51
sh600794,2019-09-02 09:00:20,25.72
sh603066,2019-09-02 09:00:20,15.72
sh600794,2019-09-02 09:00:30,25.83
sh603066,2019-09-02 09:00:30,15.83
sh600794,2019-09-02 09:00:40,25.94
sh603066,2019-09-02 09:00:40,15.94
sh600794,2019-09-02 09:00:50,26.00
sh603066,2019-09-02 09:00:50,16.00
sh600794,2019-09-02 09:10:00,25.98
sh603066,2019-09-02 09:10:00,25.98
sh600794,2019-09-02 15:00:00,25.50
sh603066,2019-09-02 15:00:00,16.00

Using Spark Core, find every peak and trough value for each stock on each day.

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("test"))
    val rdd: RDD[String] = sc.textFile("datas/test.csv")
    // Group by (sid, day) rather than sid alone, then report the daily high and low.
    rdd.map { str =>
      val f: Array[String] = str.split(",")
      (f(0), f(1).split(" ")(0), f(2).toDouble)
    }.groupBy(x => (x._1, x._2)).map { case (key, rows) =>
      (key, rows.maxBy(_._3)._3, rows.minBy(_._3)._3)
    }.collect().foreach(println)
  }
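
The block above collapses each day to a single high and low. If "all peaks and troughs" means every local maximum and minimum inside a day's tick sequence, one possible sketch is to sort each (sid, day) group by time and scan it with a width-3 sliding window. Here a tick counts as a peak when strictly above both neighbours and a trough when strictly below, so a day's first and last ticks are never reported; that is one reading of the requirement, not the only one.

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("extrema"))
    val rdd: RDD[String] = sc.textFile("datas/test.csv")
    rdd.map { str =>
      val f: Array[String] = str.split(",")
      // key by (sid, day); keep (timestamp, price) so ticks can be ordered
      ((f(0), f(1).split(" ")(0)), (f(1), f(2).toDouble))
    }.groupByKey().flatMap { case (key, ticks) =>
      val prices = ticks.toArray.sortBy(_._1).map(_._2) // chronological prices
      prices.sliding(3).collect {
        case Array(a, b, c) if b > a && b > c => (key, "peak", b)
        case Array(a, b, c) if b < a && b < c => (key, "trough", b)
      }
    }.collect().foreach(println)
  }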