Left Outer Join Solutions in Spark



1. Left outer join in Spark without leftOuterJoin()

The idea: tag each users.txt record with "L" (location) and each transactions.txt record with "P" (product), union the two RDDs keyed by userID, group by key, and then emit a (product, location) pair for every product in the group, falling back to "UNKNOWN" for users with no location record.


import org.apache.spark.{SparkConf, SparkContext}

object LeftOutJoinTest {
  def main(args: Array[String]): Unit = {
    // Connect to the Spark master
    val conf = new SparkConf().setAppName("Chenjie's first spark App").setMaster("local")
    val sc = new SparkContext(conf)

    // Read the input files from HDFS and create tagged pair RDDs keyed by userID:
    // users.txt        -> (userID, ("L", location))   "L" marks a location record
    // transactions.txt -> (userID, ("P", productID))  "P" marks a product record
    val users = sc.textFile("hdfs://pc1:9000/input/users.txt")
    val user_map = users.map { line =>
      val fields = line.split("\t")
      (fields(0), ("L", fields(1)))
    }

    val transactions = sc.textFile("hdfs://pc1:9000/input/transactions.txt")
    val transaction_map = transactions.map { line =>
      val fields = line.split("\t")
      (fields(2), ("P", fields(1)))
    }

    // Union the two RDDs and gather all records of each user together
    val all = transaction_map.union(user_map)
    val groupedRDD = all.groupByKey()

    // For every user, pick out the location (if any) and emit a
    // (product, location) pair for each of the user's products;
    // users without a location record fall back to "UNKNOWN"
    val productLocationsRDD = groupedRDD.flatMap { tuple =>
      val pairs = tuple._2
      var location = "UNKNOWN"
      val products = new scala.collection.mutable.ArrayBuffer[String]()
      pairs.foreach { t2 =>
        if (t2._1 == "L")
          location = t2._2
        else
          products += t2._2
      }
      products.map(product => (product, location))
    }

    // Deduplicate, then collect the distinct locations per product and count them
    productLocationsRDD.distinct().groupByKey().map { pair =>
      val key = pair._1
      val locations = pair._2
      (key, (locations, locations.size))
    }.saveAsTextFile("hdfs://pc1:9000/output/leftoutjoin_1")
  }
}

Run result: (output screenshot not preserved)
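For readers without the author's HDFS cluster, here is a minimal local sketch of the same pipeline (my addition; the object name LeftOutJoinLocalSketch and the sample records are hypothetical, and the input layouts userID<TAB>location and transactionID<TAB>productID<TAB>userID are inferred from the column indexes used above):

import org.apache.spark.{SparkConf, SparkContext}

object LeftOutJoinLocalSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local"))

    // Hypothetical sample data in the assumed input layouts
    val users = sc.parallelize(Seq("u1\tUT", "u2\tGA"))
    val transactions = sc.parallelize(Seq("t1\tp3\tu1", "t2\tp1\tu2", "t3\tp1\tu3"))

    val userMap = users.map { line => val f = line.split("\t"); (f(0), ("L", f(1))) }
    val transactionMap = transactions.map { line => val f = line.split("\t"); (f(2), ("P", f(1))) }

    // u3 has no user record, so its product p1 keeps the default "UNKNOWN" location
    transactionMap.union(userMap).groupByKey().flatMap { case (_, pairs) =>
      val location = pairs.collectFirst { case ("L", loc) => loc }.getOrElse("UNKNOWN")
      pairs.collect { case ("P", product) => (product, location) }
    }.distinct().groupByKey().mapValues(locs => (locs, locs.size))
      .collect().foreach(println)
    // Prints something like (order may vary):
    //   (p3,(CompactBuffer(UT),1))
    //   (p1,(CompactBuffer(GA, UNKNOWN),2))
    sc.stop()
  }
}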

2. Left outer join in Spark with leftOuterJoin(): avoiding the hassle of flag tags

import org.apache.spark.{SparkConf, SparkContext}

object LeftOutJoinTest {
  def main(args: Array[String]): Unit = {
    // Connect to the Spark master
    val conf = new SparkConf().setAppName("Chenjie's first spark App").setMaster("local")
    val sc = new SparkContext(conf)

    // Read the input files from HDFS and create pair RDDs keyed by userID
    val users = sc.textFile("hdfs://pc1:9000/input/users.txt")
    val user_map = users.map { line =>
      val fields = line.split("\t")
      (fields(0), fields(1)) // (userID, location)
    }

    val transactions = sc.textFile("hdfs://pc1:9000/input/transactions.txt")
    val transaction_map = transactions.map { line =>
      val fields = line.split("\t")
      (fields(2), fields(1)) // (userID, productID)
    }

    // joined: (userID, (productID, Option[location]))
    val joined = transaction_map.leftOuterJoin(user_map)

    // getOrElse keeps transactions whose user has no location record;
    // .get would throw NoSuchElementException on such rows
    joined.map(line => (line._2._1, line._2._2.getOrElse("UNKNOWN")))
      .distinct().groupByKey().map { pair =>
        val key = pair._1
        val locations = pair._2
        (key, (locations, locations.size))
      }.saveAsTextFile("hdfs://pc1:9000/output/leftoutjoin_2")
  }
}
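One note on the Option (my addition): RDD[(K, V)].leftOuterJoin(RDD[(K, W)]) returns RDD[(K, (V, Option[W]))], so unmatched keys arrive as None instead of being dropped. A made-up two-record example:

    val users = sc.parallelize(Seq(("u1", "UT")))
    val txns  = sc.parallelize(Seq(("u1", "p3"), ("u3", "p1")))
    txns.leftOuterJoin(users).collect().foreach(println)
    // (u1,(p3,Some(UT)))   -- matched key
    // (u3,(p1,None))       -- .get would throw NoSuchElementException here

This is why the map above uses getOrElse("UNKNOWN") rather than .get: with .get, the job fails on the first transaction whose userID is absent from users.txt, defeating the point of a left outer join.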



