RDD Transform
- join
- union
- groupByKey
RDD Action
- reduce
- lookup
join, union, and groupByKey belong to the Transform (transformation) part of the RDD API, while reduce and lookup belong to the Action part.
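To make the Transform/Action split concrete, here is a minimal sketch. The object name, the sample data, and the local master setting are assumptions made for illustration, not part of the original session. It assumes a Spark version where pair-RDD methods such as groupByKey and lookup are available without extra imports (older releases may also need `import org.apache.spark.SparkContext._`).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TransformVsAction {
  // Returns (sum of all values, values stored under key "a", number of distinct keys).
  def run(sc: SparkContext): (Int, Seq[Int], Long) = {
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // Transformation: only describes a new RDD; no job runs here.
    val grouped = pairs.groupByKey() // RDD[(String, Iterable[Int])]

    // Actions: each one runs a job and returns a plain value to the driver.
    val total = pairs.map(_._2).reduce(_ + _)
    val valuesForA = pairs.lookup("a").sorted
    (total, valuesForA, grouped.count())
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" and the app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("TransformVsAction")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

With this sample data the sum is 6, key "a" holds the values 1 and 3, and there are 2 distinct keys. Note that groupByKey returns another RDD, whereas reduce, lookup, and count hand ordinary Scala values back to the driver.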
Union
union combines all the elements of two RDDs and returns the result as a new RDD.
scala> var rdd1 = sc.parallelize(List("MQ", "Zookeeper"));
rdd1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[27] at parallelize at <console>:12
scala> var rdd2 = sc.parallelize(List("Redis", "MongoDB"));
rdd2: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[28] at parallelize at <console>:12
scala> val result = rdd1 union rdd2
result: org.apache.spark.rdd.RDD[String] = UnionRDD[30] at union at <console>:16
scala> result.collect
/// Result
res15: Array[String] = Array(MQ, Zookeeper, Redis, MongoDB)
scala> result.count
/// Result
res17: Long = 4
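One detail worth noting: union on RDDs is not a mathematical set union, because elements that appear in both inputs are kept twice. The sketch below illustrates this with hypothetical data ("Redis" is added to the first list on purpose) and uses distinct to drop the duplicates. The object name and local master setting are assumptions for a standalone run.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UnionDuplicates {
  // Returns (element count after union, element count after union + distinct).
  def run(sc: SparkContext): (Long, Long) = {
    val rdd1 = sc.parallelize(List("MQ", "Zookeeper", "Redis"))
    val rdd2 = sc.parallelize(List("Redis", "MongoDB"))

    // union simply appends rdd2's elements to rdd1's; "Redis" now appears twice.
    val merged = rdd1.union(rdd2)

    // distinct is a separate transformation that removes the repeated element.
    val unique = merged.distinct()

    (merged.count(), unique.count())
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" and the app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("UnionDuplicates")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

Here the merged RDD has 5 elements and the deduplicated one has 4. If set semantics are needed, chaining distinct after union is one straightforward option, at the cost of a shuffle.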
Join
As shown below, join connects the data of two RDDs, much like a join in SQL; records are matched on their key.