CC00022.flink——|Hadoop&Flink.V06|——|Flink.v06|API in Detail|Flink DataStream|

I. Common Flink DataStream APIs: Transformation
### --- Transformation

~~~     Flink provides a large number of ready-made operators for DataStream.
II. Common Flink DataStream APIs: Transformation operators
### --- 1、Map

~~~     # DataStream → DataStream
~~~     Takes one element and produces one element. A map function that doubles the values of the input stream:
DataStream<Integer> dataStream = //...
dataStream.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) throws Exception {
        return 2 * value;
    }
});
### --- 2、FlatMap

~~~     # DataStream → DataStream
~~~     Takes one element and produces zero, one, or more elements. A flatmap function that splits sentences into words:
dataStream.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String value, Collector<String> out)
        throws Exception {
        for(String word: value.split(" ")){
            out.collect(word);
        }
    }
});
### --- 3、Filter

~~~     # DataStream → DataStream
~~~     Evaluates a boolean function for each element and retains those for which the function returns
~~~     true. A filter that filters out zero values:
dataStream.filter(new FilterFunction<Integer>() {
    @Override
    public boolean filter(Integer value) throws Exception {
        return value != 0;
    }
});
### --- 4、KeyBy

~~~     # DataStream → KeyedStream
~~~     Logically partitions a stream into disjoint partitions. All records with the same key are assigned to
~~~     the same partition. Internally, keyBy() is implemented with hash partitioning. There are different ways to specify keys.

~~~     This transformation returns a KeyedStream, which is, among other things, required to use keyed state.
dataStream.keyBy(value -> value.getSomeKey()); // Key by field "someKey"
dataStream.keyBy(value -> value.f0); // Key by the first element of a Tuple
~~~     # Attention: a type cannot be a key if:

~~~     1. it is a POJO type but does not override the hashCode() method and relies on the Object.hashCode() implementation.
~~~     2. it is an array of any type.
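~~~     Both restrictions follow from hash partitioning: the key's hashCode() decides where a record goes, so it has to depend on the key's content. The sketch below is plain Java with no Flink dependency. partitionFor and hashesAlike are hypothetical helpers that only mimic the idea of hashing a key to a partition index; Flink's real key-group assignment is more involved.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch (not Flink code) of why a key needs a content-based hashCode(). */
final class KeyHashSketch {

    /** Hypothetical helper: maps a key to one of `parallelism` partitions by its hash. */
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** True when both objects report the same hash code. */
    static boolean hashesAlike(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Strings hash by content: equal keys always map to the same partition.
        System.out.println(partitionFor("user-1", 4) == partitionFor(new String("user-1"), 4));

        // Lists also hash by content, so equal lists hash alike.
        System.out.println(hashesAlike(List.of(1, 2), Arrays.asList(1, 2)));

        // Arrays inherit Object.hashCode(), which ignores the contents, so two
        // arrays with equal contents are not guaranteed to hash alike.
        int[] first = {1, 2};
        int[] second = {1, 2};
        System.out.println(Arrays.equals(first, second));
        System.out.println(hashesAlike(first, second));
    }
}
```

~~~     The first three lines print true. The last line depends on identity hash codes and is normally false, which is the reason an array (or a POJO without its own hashCode()) cannot serve as a key.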
### --- 5、Reduce

~~~     # KeyedStream → DataStream
~~~     A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.
~~~     A reduce function that creates a stream of partial sums:
keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});
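~~~     To make "rolling" concrete, here is a minimal plain-Java simulation (no Flink dependency). It assumes the last reduced value is kept separately per key, with a HashMap standing in for keyed state; RollingReduceSketch and rollingReduce are illustrative names, not Flink API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Plain-Java sketch (not Flink code) of a rolling reduce on a keyed stream. */
final class RollingReduceSketch {

    /**
     * Feeds (key, value) records through a per-key rolling reduce and returns
     * every emitted value in arrival order. The first value of each key is
     * emitted unchanged because there is nothing to combine it with yet.
     */
    static List<Integer> rollingReduce(List<Map.Entry<String, Integer>> input,
                                       BinaryOperator<Integer> reducer) {
        Map<String, Integer> lastReduced = new HashMap<>(); // stand-in for keyed state
        List<Integer> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> record : input) {
            // merge() stores the value for a new key, otherwise reducer(old, new).
            Integer next = lastReduced.merge(record.getKey(), record.getValue(), reducer);
            emitted.add(next);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("a", 1), Map.entry("b", 10), Map.entry("a", 2),
                Map.entry("a", 3), Map.entry("b", 20));
        System.out.println(rollingReduce(input, Integer::sum));
    }
}
```

~~~     For this input the sketch prints [1, 10, 3, 6, 30]: one output per input record, each being the partial sum for that record's key.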
### --- 6、Fold

~~~     # KeyedStream → DataStream
~~~     A "rolling" fold on a keyed data stream with an initial value. Combines the current element with the last folded value and emits the new value.
~~~     A fold function that, when applied on the sequence (1,2,3,4,5), emits the sequence "start-1","start-1-2", "start-1-2-3", ...
DataStream<String> result =
  keyedStream.fold("start", new FoldFunction<Integer, String>() {
    @Override
    public String fold(String current, Integer value) {
        return current + "-" + value;
    }
});
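~~~     The difference from reduce is the initial value and the freedom to change the output type. A minimal plain-Java sketch for a single key follows (no Flink dependency); RollingFoldSketch and rollingFold are illustrative names, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Plain-Java sketch (not Flink code) of a rolling fold for one key. */
final class RollingFoldSketch {

    /**
     * Starts from `initial`, combines it with each value in turn and records
     * every intermediate result, mirroring "emits the new value" per element.
     */
    static <T, R> List<R> rollingFold(R initial, List<T> values, BiFunction<R, T, R> folder) {
        List<R> emitted = new ArrayList<>();
        R current = initial;
        for (T value : values) {
            current = folder.apply(current, value);
            emitted.add(current);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> result =
                rollingFold("start", List.of(1, 2, 3, 4, 5), (acc, v) -> acc + "-" + v);
        result.forEach(System.out::println);
    }
}
```

~~~     The sketch prints start-1, start-1-2, start-1-2-3 and so on, one line per input element. The input is Integer while the output is String, which a reduce cannot express.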
### --- 7、Aggregations

~~~     # KeyedStream → DataStream
~~~     Rolling aggregations on a keyed data stream. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).
keyedStream.sum(0);
keyedStream.sum("key");
keyedStream.min(0);
keyedStream.min("key");
keyedStream.max(0);
keyedStream.max("key");
keyedStream.minBy(0);
keyedStream.minBy("key");
keyedStream.maxBy(0);
keyedStream.maxBy("key");
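~~~     The min versus minBy distinction is easier to see on data. The plain-Java sketch below (no Flink dependency) simulates both for one key. It assumes that min updates only the aggregated field and keeps the remaining fields of the first record it saw; treat that detail as an assumption about behavior and not as a guarantee. Reading, rollingMin and rollingMinBy are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch (not Flink code) contrasting rolling min and minBy. */
final class MinVsMinBySketch {

    /** Minimal element type: a label plus the numeric field being aggregated. */
    static final class Reading {
        final String label;
        final int value;

        Reading(String label, int value) {
            this.label = label;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + label + ", " + value + ")";
        }
    }

    /** Rolling min: only the aggregated field changes; the label stays (assumption). */
    static List<Reading> rollingMin(List<Reading> input) {
        List<Reading> emitted = new ArrayList<>();
        Reading current = null;
        for (Reading r : input) {
            if (current == null) {
                current = r;
            } else if (r.value < current.value) {
                current = new Reading(current.label, r.value);
            }
            emitted.add(current);
        }
        return emitted;
    }

    /** Rolling minBy: the whole element holding the minimum is kept. */
    static List<Reading> rollingMinBy(List<Reading> input) {
        List<Reading> emitted = new ArrayList<>();
        Reading current = null;
        for (Reading r : input) {
            if (current == null || r.value < current.value) {
                current = r;
            }
            emitted.add(current);
        }
        return emitted;
    }

    /** Sample input used by main and by the usage note. */
    static List<Reading> sample() {
        return List.of(new Reading("a", 5), new Reading("b", 3),
                new Reading("c", 7), new Reading("d", 1));
    }

    public static void main(String[] args) {
        System.out.println(rollingMin(sample()));
        System.out.println(rollingMinBy(sample()));
    }
}
```

~~~     On the sample input, rollingMin prints [(a, 5), (a, 3), (a, 3), (a, 1)] while rollingMinBy prints [(a, 5), (b, 3), (b, 3), (d, 1)]: the minimum is the same, but only minBy carries the element it came from.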
### --- 8、Window

~~~     # KeyedStream → WindowedStream
~~~     Windows can be defined on already partitioned KeyedStreams. Windows group the data in each key according to some characteristic (e.g., the data that arrived within the last 5 seconds). See windows for a complete description of windows.
dataStream
    .keyBy(value -> value.f0)
    .window(TumblingEventTimeWindows.of(Time.seconds(5))); // Last 5 seconds of data
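~~~     A tumbling window is a fixed, non-overlapping interval [start, start + size), so each timestamp belongs to exactly one window. The plain-Java sketch below (no Flink dependency) shows that assignment for one key, assuming non-negative millisecond timestamps and a window offset of 0. windowStart and assign are illustrative helpers, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch (not Flink code) of tumbling-window assignment. */
final class TumblingWindowSketch {

    /** Start of the tumbling window that contains the timestamp (offset 0 assumed). */
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    /** Groups timestamps by window start, ordered by window. */
    static Map<Long, List<Long>> assign(List<Long> timestamps, long sizeMillis) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : timestamps) {
            windows.computeIfAbsent(windowStart(ts, sizeMillis), k -> new ArrayList<>()).add(ts);
        }
        return windows;
    }

    public static void main(String[] args) {
        // 5-second windows: [0, 5000), [5000, 10000), [10000, 15000), ...
        System.out.println(assign(List.of(0L, 4999L, 5000L, 12000L), 5000L));
    }
}
```

~~~     The sketch prints {0=[0, 4999], 5000=[5000], 10000=[12000]}. Timestamps 4999 and 5000 are one millisecond apart but fall into different windows, because the boundaries are aligned to multiples of the window size.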
### --- 9、WindowAll

~~~     # DataStream → AllWindowedStream
~~~     Windows can be defined on regular DataStreams. Windows group all the stream events according to some characteristic (e.g., the data that arrived within the last 5 seconds). See windows for a complete description of windows.
~~~     WARNING: This is in many cases a non-parallel transformation. All records will be gathered in one task for the windowAll operator.
dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5))); // Last 5 seconds of data
### --- 10、Window Apply

~~~     # WindowedStream → DataStream
~~~     # AllWindowedStream → DataStream
~~~     Applies a general function to the window as a whole. Below is a function that manually sums the elements of a window.
~~~     Note: If you are using a windowAll transformation, you need to use an AllWindowFunction instead.
windowedStream.apply(new WindowFunction<Tuple2<String, Integer>, Integer, Tuple, Window>() {
    @Override
    public void apply(Tuple tuple,
            Window window,
            Iterable<Tuple2<String, Integer>> values,
            Collector<Integer> out) throws Exception {
        int sum = 0;
        // Add up the second field of every element in the window.
        for (Tuple2<String, Integer> t : values) {
            sum += t.f1;
        }
        out.collect(sum);
    }
});

// applying an AllWindowFunction on non-keyed window stream
allWindowedStream.apply(new AllWindowFunction<Tuple2<String, Integer>, Integer, Window>() {
    @Override
    public void apply(Window window,
            Iterable<Tuple2<String, Integer>> values,
            Collector<Integer> out) throws Exception {
        int sum = 0;
        // Add up the second field of every element in the window.
        for (Tuple2<String, Integer> t : values) {
            sum += t.f1;
        }
        out.collect(sum);
    }
});
### --- 11、Window Reduce

~~~     # WindowedStream → DataStream
~~~     Applies a functional reduce function to the window and returns the reduced value.
windowedStream.reduce(new ReduceFunction<Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1,
            Tuple2<String, Integer> value2) throws Exception {
        return new Tuple2<String, Integer>(value1.f0, value1.f1 + value2.f1);
    }
});
### --- 12、Window Fold

~~~     # WindowedStream → DataStream
~~~     Applies a functional fold function to the window and returns the folded value. The example function, when applied on the sequence (1,2,3,4,5), folds the sequence into the string "start-1-2-3-4-5":
windowedStream.fold("start", new FoldFunction<Integer, String>() {
    public String fold(String current, Integer value) {
        return current + "-" + value;
    }
});
### --- 13、Aggregations on windows

~~~     # WindowedStream → DataStream
~~~     Aggregates the contents of a window. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).
windowedStream.sum(0);
windowedStream.sum("key");
windowedStream.min(0);
windowedStream.min("key");
windowedStream.max(0);
windowedStream.max("key");
windowedStream.minBy(0);
windowedStream.minBy("key");
windowedStream.maxBy(0);
windowedStream.maxBy("key");
### --- 14、Union

~~~     # DataStream → DataStream
~~~     Union of two or more data streams creating a new stream containing all the elements from all the streams. Note: If you union a data stream with itself you will get each element twice in the resulting stream.
dataStream.union(otherStream1, otherStream2, ...);
### --- 15、Window Join

~~~     # DataStream,DataStream → DataStream
~~~     Join two data streams on a given key and a common window.
dataStream.join(otherStream)
    .where(<key selector>).equalTo(<key selector>)
    .window(TumblingEventTimeWindows.of(Time.seconds(3)))
    .apply (new JoinFunction () {...});
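~~~     A window join behaves like an inner join: a pair is produced only when both elements share the key and fall into the same window. The plain-Java sketch below (no Flink dependency) assumes tumbling windows with offset 0 and a join function that concatenates the two payloads. Event, windowStart and join are illustrative names, and the output order in a real job may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch (not Flink code) of an inner join on key and tumbling window. */
final class WindowJoinSketch {

    /** Minimal event: join key, event timestamp in milliseconds, and a payload. */
    static final class Event {
        final String key;
        final long timestamp;
        final String payload;

        Event(String key, long timestamp, String payload) {
            this.key = key;
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    /** Start of the tumbling window containing the timestamp (offset 0 assumed). */
    static long windowStart(long timestamp, long sizeMillis) {
        return timestamp - Math.floorMod(timestamp, sizeMillis);
    }

    /** Emits "left|right" for every pair with equal key and equal window. */
    static List<String> join(List<Event> left, List<Event> right, long sizeMillis) {
        List<String> joined = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.key.equals(r.key);
                boolean sameWindow =
                        windowStart(l.timestamp, sizeMillis) == windowStart(r.timestamp, sizeMillis);
                if (sameKey && sameWindow) {
                    joined.add(l.payload + "|" + r.payload);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Event> left = List.of(new Event("k1", 1000, "L1"),
                new Event("k1", 3500, "L2"), new Event("k2", 1500, "L3"));
        List<Event> right = List.of(new Event("k1", 2000, "R1"),
                new Event("k1", 2500, "R2"), new Event("k3", 1000, "R3"));
        System.out.println(join(left, right, 3000));
    }
}
```

~~~     With 3-second windows the sketch prints [L1|R1, L1|R2]. L2 has the right key but sits in the next window, and L3 and R3 have no partner with the same key, so none of them appears in the output.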
### --- 16、Interval Join

~~~     # KeyedStream,KeyedStream → DataStream
~~~     Join two elements e1 and e2 of two keyed streams with a common key over a given time interval, so that e1.timestamp + lowerBound <= e2.timestamp <= e1.timestamp + upperBound
// This joins the two streams so that
// key1 == key2 && leftTs - 2 < rightTs < leftTs + 2
keyedStream.intervalJoin(otherKeyedStream)
    .between(Time.milliseconds(-2), Time.milliseconds(2)) // lower and upper bound
    .upperBoundExclusive() // optional
    .lowerBoundExclusive() // optional
    .process(new ProcessJoinFunction() {...});
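~~~     The bound condition, including the effect of the two optional "exclusive" switches, can be written as a small predicate. The sketch below is plain Java with no Flink dependency; inInterval is a hypothetical helper that only restates the inequality given above.

```java
/** Plain-Java sketch (not Flink code) of the interval-join time condition. */
final class IntervalJoinSketch {

    /**
     * True when rightTs lies in [leftTs + lowerBound, leftTs + upperBound],
     * with either end turned into a strict inequality when marked exclusive.
     */
    static boolean inInterval(long leftTs, long rightTs,
                              long lowerBound, long upperBound,
                              boolean lowerExclusive, boolean upperExclusive) {
        long low = leftTs + lowerBound;
        long high = leftTs + upperBound;
        boolean aboveLower = lowerExclusive ? rightTs > low : rightTs >= low;
        boolean belowUpper = upperExclusive ? rightTs < high : rightTs <= high;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Bounds of -2 ms and +2 ms around a left element at t = 10.
        System.out.println(inInterval(10, 8, -2, 2, false, false));  // boundary, inclusive
        System.out.println(inInterval(10, 8, -2, 2, true, false));   // boundary, exclusive
        System.out.println(inInterval(10, 11, -2, 2, true, true));   // strictly inside
    }
}
```

~~~     The three calls print true, false and true. A right element exactly on a boundary joins only while that boundary is inclusive.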
### --- 17、Window CoGroup

~~~     #  DataStream,DataStream → DataStream
~~~     Cogroups two data streams on a given key and a common window.
dataStream.coGroup(otherStream)
    .where(0).equalTo(1)
    .window(TumblingEventTimeWindows.of(Time.seconds(3)))
    .apply (new CoGroupFunction () {...});
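~~~     Unlike a join, a cogroup hands the function both groups for a key, even when one side is empty. The plain-Java sketch below (no Flink dependency) assumes every element already belongs to the same window and uses a function that simply formats the two groups. CoGroupSketch and coGroup are illustrative names, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java sketch (not Flink code) of cogrouping two inputs within one window. */
final class CoGroupSketch {

    /** Collects the values of each key, ordered by key. */
    private static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> input) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> e : input) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return groups;
    }

    /** Produces one line per key seen on either side, with both groups. */
    static List<String> coGroup(List<Map.Entry<String, Integer>> left,
                                List<Map.Entry<String, Integer>> right) {
        Map<String, List<Integer>> leftGroups = groupByKey(left);
        Map<String, List<Integer>> rightGroups = groupByKey(right);
        Set<String> keys = new TreeSet<>(leftGroups.keySet());
        keys.addAll(rightGroups.keySet());

        List<String> out = new ArrayList<>();
        for (String key : keys) {
            out.add(key + ": left=" + leftGroups.getOrDefault(key, List.of())
                    + " right=" + rightGroups.getOrDefault(key, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> left =
                List.of(Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 3));
        List<Map.Entry<String, Integer>> right =
                List.of(Map.entry("a", 10), Map.entry("c", 30));
        coGroup(left, right).forEach(System.out::println);
    }
}
```

~~~     The sketch prints "a: left=[1, 2] right=[10]", "b: left=[3] right=[]" and "c: left=[] right=[30]". Keys b and c would be dropped by a join, which is why cogroup is the usual building block for outer-join style logic.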
### --- 18、Connect

~~~     # DataStream,DataStream → ConnectedStreams
~~~     "Connects" two data streams retaining their types. Connecting allows for shared state between the two streams.
DataStream<Integer> someStream = //...
DataStream<String> otherStream = //...
    
ConnectedStreams<Integer, String> connectedStreams = someStream.connect(otherStream);
### --- 19、CoMap, CoFlatMap

~~~     # ConnectedStreams → DataStream
~~~     Similar to map and flatMap on a connected data stream
connectedStreams.map(new CoMapFunction<Integer, String, Boolean>() {
    @Override
    public Boolean map1(Integer value) {
        return true;
    }

    @Override
    public Boolean map2(String value) {
        return false;
    }
});
connectedStreams.flatMap(new CoFlatMapFunction<Integer, String, String>() {
    @Override
    public void flatMap1(Integer value, Collector<String> out) {
        out.collect(value.toString());
    }

    @Override
    public void flatMap2(String value, Collector<String> out) {
        for (String word: value.split(" ")) {
            out.collect(word);
        }
    }
});
### --- 20、Split

~~~     # DataStream → SplitStream
~~~     Split the stream into two or more streams according to some criterion.
SplitStream<Integer> split = someDataStream.split(new OutputSelector<Integer>() {
    @Override
    public Iterable<String> select(Integer value) {
        List<String> output = new ArrayList<String>();
        if (value % 2 == 0) {
            output.add("even");
        }
        else {
            output.add("odd");
        }
        return output;
    }
});
### --- 21、Select

~~~     # SplitStream → DataStream
~~~     Select one or more streams from a split stream.
SplitStream<Integer> split;
DataStream<Integer> even = split.select("even");
DataStream<Integer> odd = split.select("odd");
DataStream<Integer> all = split.select("even","odd");
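~~~     Split and select together amount to tagging each element with output names and then filtering by those names. The plain-Java sketch below (no Flink dependency) mirrors the even/odd example. It assumes an element is emitted once even if it matches several requested names; select and evenOdd are illustrative helpers, not Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Plain-Java sketch (not Flink code) of split-by-name followed by select. */
final class SplitSelectSketch {

    /** Output selector from the example: tags each value as "even" or "odd". */
    static List<String> evenOdd(Integer value) {
        return List.of(value % 2 == 0 ? "even" : "odd");
    }

    /** Keeps the elements whose tags include at least one of the requested names. */
    static List<Integer> select(List<Integer> input,
                                Function<Integer, List<String>> selector,
                                String... names) {
        Set<String> wanted = new HashSet<>(Arrays.asList(names));
        List<Integer> selected = new ArrayList<>();
        for (Integer value : input) {
            for (String tag : selector.apply(value)) {
                if (wanted.contains(tag)) {
                    selected.add(value); // emitted once, even if several tags match (assumption)
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4);
        System.out.println(select(input, SplitSelectSketch::evenOdd, "even"));
        System.out.println(select(input, SplitSelectSketch::evenOdd, "odd"));
        System.out.println(select(input, SplitSelectSketch::evenOdd, "even", "odd"));
    }
}
```

~~~     The sketch prints [2, 4], [1, 3] and [1, 2, 3, 4], matching the even, odd and all streams in the example.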
### --- 22、Iterate

~~~     # DataStream → IterativeStream → DataStream
~~~     Creates a "feedback" loop in the flow, by redirecting the output of one operator to some previous operator. This is especially useful for defining algorithms that continuously update a model. The following code starts with a stream and applies the iteration body continuously. Elements that are greater than 0 are sent back to the feedback channel, and the rest of the elements are forwarded downstream. See iterations for a complete description.
IterativeStream<Long> iteration = initialStream.iterate();
DataStream<Long> iterationBody = iteration.map (/*do something*/);
DataStream<Long> feedback = iterationBody.filter(new FilterFunction<Long>(){
    @Override
    public boolean filter(Long value) throws Exception {
        return value > 0;
    }
});
iteration.closeWith(feedback);
DataStream<Long> output = iterationBody.filter(new FilterFunction<Long>(){
    @Override
    public boolean filter(Long value) throws Exception {
        return value <= 0;
    }
});
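~~~     The feedback loop can be simulated with a queue that holds both the initial elements and whatever is fed back. The sketch below is plain Java with no Flink dependency. Because the example leaves the iteration body open, the "subtract 2" body used here is purely an assumption, and IterateSketch and run are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongUnaryOperator;

/** Plain-Java sketch (not Flink code) of an iteration with a feedback channel. */
final class IterateSketch {

    /**
     * Applies `body` to each pending element. Results greater than 0 go back
     * into the queue (feedback); all other results are forwarded downstream.
     * The loop only ends if the body eventually makes every value non-positive.
     */
    static List<Long> run(List<Long> initial, LongUnaryOperator body) {
        Deque<Long> pending = new ArrayDeque<>(initial); // initial stream plus feedback
        List<Long> output = new ArrayList<>();
        while (!pending.isEmpty()) {
            long result = body.applyAsLong(pending.poll());
            if (result > 0) {
                pending.add(result);   // feedback edge: iterate again
            } else {
                output.add(result);    // leaves the loop
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical iteration body: subtract 2 on every pass.
        System.out.println(run(List.of(3L, 1L, 4L), v -> v - 2));
    }
}
```

~~~     With this body the sketch prints [-1, -1, 0]: 1 leaves after one pass, while 3 and 4 go around the loop twice before dropping to -1 and 0.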