Transform
Transformation operators
1 map
val streamMap = stream.map { x => x * 2 }
2 flatMap
val streamFlatMap = stream.flatMap { x => x.split(" ") }
3 Filter
val streamFilter = stream.filter { x => x == 1 }
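The three one-liners above assume streams of different element types (numbers for map and filter, strings for flatMap). As a rough illustration of what each operator does to its elements, the sketch below applies the same functions to plain Scala collections instead of a Flink stream; the sample values are made up for the example.

```scala
// Plain Scala collections stand in for the stream here; no Flink dependency.
val numbers = List(1, 2, 3)
val doubled = numbers.map { x => x * 2 }            // exactly one output per input

val lines = List("hello flink", "hello scala")
val words = lines.flatMap { x => x.split(" ") }     // zero or more outputs per input

val ones = List(1, 2, 1, 3).filter { x => x == 1 }  // keeps only the matching elements
```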
4 KeyBy
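DataStream → KeyedStream: Logically splits a stream into disjoint partitions, each containing the elements that share the same key; internally this is implemented with hash partitioning. When the key is specified by field position, as in keyBy(0), the input must be a Tuple type.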
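The idea of hash partitioning can be sketched with plain Scala collections. This is a simplified model, not Flink's actual key-group assignment: the `partitionFor` helper, the partition count of 2, and the channel names are all assumptions made for the illustration.

```scala
// Simplified model of hash partitioning; NOT Flink's real key-group algorithm.
// partitionFor is a hypothetical helper introduced for this sketch.
def partitionFor(key: String, numPartitions: Int): Int =
  java.lang.Math.floorMod(key.hashCode, numPartitions)  // floorMod keeps the index non-negative

// Hypothetical (channel, count) records
val events = List(("appstore", 1), ("huawei", 1), ("appstore", 1), ("xiaomi", 1))

// Group the records by the partition their key hashes to
val partitions: Map[Int, List[(String, Int)]] =
  events.groupBy { case (ch, _) => partitionFor(ch, 2) }
```

Because the partition depends only on the key, all records with the same key land in the same partition, while one partition may still hold several different keys.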
5 Reduce
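KeyedStream → DataStream: An aggregation over a keyed stream. It combines the current element with the result of the previous aggregation to produce a new value. The returned stream contains the result of every aggregation step, not just the final result.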
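import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.streaming.api.scala._

// Compute the cumulative count for each channel
val startUplogDstream: DataStream[StartUpLog] = dstream.map { JSON.parseObject(_, classOf[StartUpLog]) }
// Map each log to (channel, 1) and key by the first tuple field
val keyedStream: KeyedStream[(String, Int), Tuple] = startUplogDstream.map(startuplog => (startuplog.ch, 1)).keyBy(0)
// reduce: rolling sum of the counts for each channel
keyedStream.reduce { (ch1, ch2) => (ch1._1, ch1._2 + ch2._2) }.print().setParallelism(1)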
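To make the "every aggregation step is emitted" behavior concrete, here is a minimal model of a keyed rolling reduce on plain Scala collections. It does not use Flink; `rollingReduce` and the sample records are hypothetical names introduced only for this sketch.

```scala
// Plain-collection model of a keyed rolling reduce; no Flink dependency.
// rollingReduce is a hypothetical helper, not a Flink API.
def rollingReduce[K, V](events: List[(K, V)])(f: (V, V) => V): List[(K, V)] = {
  val state = scala.collection.mutable.Map.empty[K, V]  // last aggregate seen per key
  events.map { case (key, value) =>
    // First element of a key passes through; later ones are merged with the stored aggregate
    val updated = state.get(key).map(prev => f(prev, value)).getOrElse(value)
    state(key) = updated
    (key, updated)                                       // emit every intermediate result
  }
}

val counts = rollingReduce(
  List(("appstore", 1), ("other", 1), ("appstore", 1), ("appstore", 1))
)(_ + _)
// counts == List(("appstore", 1), ("other", 1), ("appstore", 2), ("appstore", 3))
```

The output has one record per input record, so a downstream consumer sees the running total for a channel grow over time rather than a single final count.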
6 Split and Select
Split
Figure: Split
DataStream → SplitStream: Splits one DataStream into two or more DataStreams according to some characteristic of the elements.
Select
Figure: Select
SplitStream → DataStream: Retrieves one or more DataStreams from a SplitStream.
Requirement: separate the appstore data from the data of the other channels, producing two streams.
// Split appstore from the other channels into two independent streams
val splitStream: SplitStream[StartUpLog] = startUplogDstream.split { startUplog =>
  // Tag each record with the name of the output it belongs to
  val flags: List[String] =
    if ("appstore" == startUplog.ch) List(startUplog.ch) else List("other")
  flags
}
val appStoreStream: DataStream[StartUpLog] = splitStream.select("appstore")
appStoreStream.print("apple:").setParallelism(1)
val otherStream: DataStream[StartUpLog] = splitStream.select("other")
otherStream.print("other:").setParallelism(1)
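The tagging-then-selecting logic can be modeled on plain Scala collections, as a sketch under stated assumptions: `Log` is a hypothetical stand-in for StartUpLog, and the `split` and `select` functions below are illustrative helpers, not the Flink methods of the same name. Note also that `split`/`select` were deprecated in later Flink releases in favor of side outputs, so the example above applies to the older API.

```scala
// Plain-collection model of split/select; no Flink dependency.
// Log is a hypothetical stand-in for StartUpLog.
case class Log(ch: String)

// "split": pair each record with the output names chosen by the selector
def split[T](in: List[T])(selector: T => List[String]): List[(T, List[String])] =
  in.map(x => (x, selector(x)))

// "select": keep the records tagged with any of the requested names
def select[T](tagged: List[(T, List[String])], names: String*): List[T] =
  tagged.collect { case (x, tags) if tags.exists(t => names.contains(t)) => x }

// Hypothetical input records
val logs = List(Log("appstore"), Log("huawei"), Log("appstore"))
val tagged = split(logs)(log => if (log.ch == "appstore") List("appstore") else List("other"))

val appStore = select(tagged, "appstore")  // only the appstore records
val other = select(tagged, "other")        // everything else
```

Since the selector returns a list of names, a record could in principle be tagged for several outputs at once; in this requirement each record gets exactly one tag.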