Classic Spark Case: Finding the Top N Values

Requirements analysis
Each record has the format orderid,userid,payment,productid.
The goal is to find the top N payment values (the third field) across all input files.
a.txt
1,9819,100,121
2,8918,2000,111
3,2813,1234,22
4,9100,10,1101
5,3210,490,111
6,1298,28,1211
7,1010,281,90
8,1818,9000,20

b.txt
100,3333,10,100
101,9321,1000,293
102,3881,701,20
103,6791,910,30
104,8888,11,39

Scala code

package ClassicCase

import org.apache.spark.{SparkConf, SparkContext}

/**
  * Business scenario: find the top N values
  * Created by YJ on 2017/2/8.
  */


object case6 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("reduce")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")
    val six = sc.textFile("hdfs://192.168.109.130:8020//user/flume/ClassicCase/case6/*", 2)
    var idx = 0
    six.filter(x => (x.trim().length > 0) && (x.split(",").length == 4)) // drop blank or malformed lines
      .map(_.split(",")(2))      // keep only the payment field (third column)
      .map(x => (x.toInt, ""))   // pair with a dummy value so sortByKey can be used
      .sortByKey(false)          // false -> descending order
      .map(x => x._1)
      .take(5)                   // take(5) is an action: the result is a local Array on the driver
      .foreach(x => {
        idx = idx + 1
        println(idx + "\t" + x)
      })
  }

}
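As an alternative to sorting the whole data set, Spark's RDD also offers top(n), which keeps only the n largest elements per partition before merging them on the driver. Below is a minimal sketch of the same top-5 logic; the records from a.txt and b.txt are inlined so it runs without the HDFS path above, and the object name Case6Top is only for illustration.

package ClassicCase

import org.apache.spark.{SparkConf, SparkContext}

object Case6Top {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("top")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")

    // The records from a.txt and b.txt, inlined so the sketch runs without HDFS
    val lines = sc.parallelize(Seq(
      "1,9819,100,121", "2,8918,2000,111", "3,2813,1234,22",
      "4,9100,10,1101", "5,3210,490,111", "6,1298,28,1211",
      "7,1010,281,90", "8,1818,9000,20",
      "100,3333,10,100", "101,9321,1000,293", "102,3881,701,20",
      "103,6791,910,30", "104,8888,11,39"
    ))

    // top(5) keeps only the 5 largest payments per partition before merging,
    // so the full data set never needs to be sorted
    val top5 = lines
      .filter(x => x.trim.nonEmpty && x.split(",").length == 4)
      .map(_.split(",")(2).toInt)
      .top(5)

    // top(5) returns a sorted local Array, so zipWithIndex gives the rank directly
    top5.zipWithIndex.foreach { case (payment, i) => println(s"${i + 1}\t$payment") }

    sc.stop()
  }
}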

Method 2

package com.neusoft

import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by Administrator on 2019/3/4.
  */
object FileTopN {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("FileOrder").setMaster("local")

    val sc = new SparkContext(sparkConf)

    val rdd = sc.textFile("demo5/*")

    var idx = 0
    rdd.map(x => x.split(",")(2))   // payment field (third column)
      .map(x => (x.toInt, ""))
      .sortByKey(false)             // descending order
      .take(5)                      // brings the five largest pairs to the driver
      .map(x => x._1)
      .foreach(x => {
        idx += 1
        println(idx + " " + x)
      })
  }
}

Output:
1 9000
2 2000
3 1234
4 1000
5 910
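Note that mutating the idx counter works in both methods only because take(5) has already pulled the results to the driver as a local array. For reference, the same ranking can also be expressed with the DataFrame API; the following is a minimal sketch, assuming the same four-column CSV layout and the demo5/* input used in method 2 (FileTopNDF is just an illustrative name).

package com.neusoft

import org.apache.spark.sql.SparkSession

object FileTopNDF {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FileTopNDF").master("local").getOrCreate()
    import spark.implicits._

    // Read the raw CSV; columns come in as _c0.._c3, payment is _c2
    val payments = spark.read.csv("demo5/*")
      .select($"_c2".cast("int").as("payment"))
      .na.drop()                       // drop rows whose payment could not be parsed

    // Sort descending, keep the five largest values, and print them with a rank
    payments.orderBy($"payment".desc).limit(5).collect()
      .zipWithIndex
      .foreach { case (row, i) => println(s"${i + 1} ${row.getInt(0)}") }

    spark.stop()
  }
}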



 
