【Data Algorithms: Recipes for Scaling up with Hadoop and Spark】Chapter 1: Secondary Sort

I recently read *Data Algorithms: Recipes for Scaling up with Hadoop and Spark*. The algorithms in the book are implemented in Java; the source code can be downloaded from

https://github.com/mahmoudparsian/data-algorithms-book/

For learning purposes, here is a Scala version of the book's Secondary Sort algorithm.

package com.bbw5.dataalgorithms.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.Logging

/**
 * SparkSecondarySort implements the secondary sort design pattern
 * by sorting each reducer's values in memory/RAM.
 *
 *
 * Input:
 *
 *    name, time, value
 *    x,2,9
 *    y,2,5
 *    x,1,3
 *    y,1,7
 *    y,3,1
 *    x,3,6
 *    z,1,4
 *    z,2,8
 *    z,3,7
 *    z,4,0
 *
 * Output: generate a time-series looking like this:
 *
 *       t1 t2 t3 t4
 *  x => [3, 9, 6]
 *  y => [7, 5, 1]
 *  z => [4, 8, 7, 0]
 *
 *  x => [(1,3), (2,9), (3,6)]
 *  y => [(1,7), (2,5), (3,1)]
 *  z => [(1,4), (2,8), (3,7), (4,0)]
 *
 *
 * @author bbw5
 *
 */
case class TestObj(name: String, time: Int, value: Int)

object SparkSecondarySort extends Logging {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("SparkSecondarySort")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    val filename = "D:/temp/data/ss2.txt"
    val textFile = sc.textFile(filename)
    // Secondary sort with the RDD API: group by name, then sort each
    // group's (time, value) pairs by time in memory. Parsing time as Int
    // gives a numeric sort (a string sort would put "10" before "2").
    val outputs = textFile.map { line =>
      val array = line.split(",")
      (array(0), (array(1).toInt, array(2).toInt))
    }.groupByKey().mapValues(iter => iter.toArray.sortBy(_._1).toList)
    outputs.collect.foreach(println)

    // this is used to implicitly convert an RDD to a DataFrame.
    import sqlContext.implicits._
    val df = textFile.map(_.split(",")).map { t => TestObj(t(0), t(1).toInt, t(2).toInt) }.toDF()
    df.show()
    df.printSchema()
    df.groupBy("name").count().show()
    
    df.registerTempTable("test")
    // Secondary sort with Spark SQL: ORDER BY name, time.
    val ssDf = sqlContext.sql("SELECT name, time, value FROM test ORDER BY name, time")
    // Note: groupByKey re-shuffles the rows, so the within-group order
    // produced by the SQL ORDER BY is not guaranteed to survive the grouping.
    ssDf.map(r => (r(0), (r(1), r(2)))).groupByKey().collect.foreach(println)
  }
}
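One caveat with the approach above: `groupByKey` materializes every value for a key in memory before sorting, which defeats the point of the secondary sort pattern for large groups. Spark can instead sort during the shuffle itself via `repartitionAndSortWithinPartitions`, using a composite `(name, time)` key and a partitioner that hashes on `name` only. A minimal sketch, assuming the same input format (`NamePartitioner` is a helper defined here, not part of Spark):

```scala
import org.apache.spark.Partitioner

// Partition on name only, so all records for one name land in the
// same partition; the composite (name, time) key is then sorted
// during the shuffle, never fully in memory.
class NamePartitioner(partitions: Int) extends Partitioner {
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int = key match {
    case (name: String, _) => math.abs(name.hashCode % numPartitions)
  }
}

val pairs = textFile.map { line =>
  val a = line.split(",")
  ((a(0), a(1).toInt), a(2).toInt) // composite key: (name, time)
}
// The implicit Ordering[(String, Int)] sorts by name, then time.
val sorted = pairs.repartitionAndSortWithinPartitions(new NamePartitioner(2))
sorted.collect.foreach(println)
```

Within each partition the records now arrive already ordered by `(name, time)`, so a time series per name can be assembled with a single streaming pass (e.g. `mapPartitions`) instead of an in-memory sort per key.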



