Author: Xie Biao, big data technology researcher
1. The example code for this case study is as follows:
package com.dt.spark.sparkstreaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object OnlineBlackListFilter {
  def main(args: Array[String]): Unit = {
    // Configure the application and create the StreamingContext
    // (master URL and the 30-second batch interval are typical values for this setup)
    val conf = new SparkConf().setAppName("OnlineBlackListFilter").setMaster("spark://Master:7077")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Blacklisted users; in production this would normally come from an external store
    val blackList = Array(("hadoop", true), ("mahout", true))
    val blackListRDD = ssc.sparkContext.parallelize(blackList, 8)

    // Receive click records of the form "time userName" from a socket on Master:9999
    val adsClickStream = ssc.socketTextStream("Master", 9999)

    // Key each click record by user name so it can be joined against the blacklist
    val adsClickStreamFormatted = adsClickStream.map { ads => (ads.split(" ")(1), ads) }

    adsClickStreamFormatted.transform(userClickRDD => {
      // leftOuterJoin keeps every click record on the left side while also attaching,
      // for each record, whether its user appears on the blacklist
      val joinedBlackListRDD = userClickRDD.leftOuterJoin(blackListRDD)

      // Keep only the clicks whose user is NOT flagged on the blacklist
      val validClicked = joinedBlackListRDD.filter(joinedItem => !joinedItem._2._2.getOrElse(false))

      validClicked.map(validClick => validClick._2._1)
    }).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
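The heart of the transform step above is the leftOuterJoin/filter pair. Its semantics can be illustrated without a cluster using plain Scala collections; this is a minimal sketch, and the object name, helper function, and sample records here are illustrative, not part of the original program:

```scala
object BlackListFilterSketch {
  // Hypothetical in-memory blacklist, mirroring blackListRDD in the example
  val blackList: Map[String, Boolean] = Map("hadoop" -> true, "mahout" -> true)

  // Simulates leftOuterJoin + filter: each click is (userName, fullLine);
  // a click survives only if its user carries no `true` blacklist flag
  def filterClicks(clicks: Seq[(String, String)]): Seq[String] =
    clicks
      .map { case (user, line) => (line, blackList.get(user)) } // Option[Boolean] plays the role of the join's optional right side
      .filter { case (_, flag) => !flag.getOrElse(false) }
      .map { case (line, _) => line }

  def main(args: Array[String]): Unit = {
    val clicks = Seq(
      ("hadoop", "0001 hadoop"), // blacklisted, dropped
      ("spark",  "0002 spark"),  // kept
      ("flink",  "0003 flink")   // kept
    )
    println(filterClicks(clicks)) // prints List(0002 spark, 0003 flink)
  }
}
```

A user absent from the blacklist joins to `None`, and `None.getOrElse(false)` yields `false`, so the record passes the filter, which is exactly why `leftOuterJoin` (rather than an inner join) is used in the streaming code.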
2. After running the example code above, the dependency (lineage) relationships of the underlying RDDs can be inspected in the Spark web UI.
An Analysis of the Spark Streaming Runtime Mechanism and Architecture
-
Source: DT Big Data Dream Factory (Spark release customization series)
-
DT Big Data Dream Factory WeChat official account: DT_Spark
-
Sina Weibo: http://www.weibo.com/ilovepains
-
Wang Jialin gives free hands-on big data sessions every evening at 20:00
on YY Live, channel 68917580