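Flink (5): DataStream API (2): transformations, parallel operations, and sinks

Common transformations

Each entry lists the transformation, its input → output stream types, a description, and an example.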
Map
DataStream → DataStream

Takes one element and returns one element. Use it to clean or convert each record along the way.

DataStream<Integer> dataStream = //...
dataStream.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) throws Exception {
        return 2 * value;
    }
});
    
FlatMap
DataStream → DataStream

Takes one element and returns zero, one, or more elements.

dataStream.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String value, Collector<String> out)
        throws Exception {
        for(String word: value.split(" ")){
            out.collect(word);
        }
    }
});
    
Filter
DataStream → DataStream

A filter function: it evaluates a condition for each incoming element and keeps only the elements that satisfy it.

dataStream.filter(new FilterFunction<Integer>() {
    @Override
    public boolean filter(Integer value) throws Exception {
        return value != 0;
    }
});
    
KeyBy
DataStream → KeyedStream

Groups the stream by a key, so that all elements with the same key end up in the same group.

dataStream.keyBy("someKey") // Key by field "someKey"
dataStream.keyBy(0) // Key by the first element of a Tuple
    

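The following two kinds of types cannot be used as keys:

1. A POJO class that does not override hashCode() and relies on Object.hashCode().

2. An array of any type.

Basic types such as int and long are not affected by this restriction; they can be used as keys (boxed as Integer and Long).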
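The restriction on arrays and on classes that rely on Object.hashCode() follows from how grouping by key works: equal keys must compare equal and hash to the same value. The sketch below uses plain Java only (no Flink), with a HashMap standing in for the grouping step. The names KeyHashDemo and countGroups are illustrative and not part of Flink.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class KeyHashDemo {

    // Counts how many distinct groups a HashMap creates for the given keys.
    static int countGroups(Object... keys) {
        Map<Object, Integer> groups = new HashMap<>();
        for (Object key : keys) {
            groups.merge(key, 1, Integer::sum);
        }
        return groups.size();
    }

    public static void main(String[] args) {
        int[] a = {1, 2};
        int[] b = {1, 2};
        // Arrays inherit identity-based equals/hashCode from Object,
        // so two arrays with the same contents form two groups.
        System.out.println(countGroups(a, b)); // 2
        // Lists compare by content, so both keys fall into one group.
        System.out.println(countGroups(Arrays.asList(1, 2), Arrays.asList(1, 2))); // 1
    }
}
```

A class that does not override hashCode() behaves like the array case here: two logically identical instances are treated as different keys.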

Reduce
KeyedStream → DataStream

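A rolling aggregation on a keyed stream: it combines the current element with the value returned by the previous reduce call and emits the new value.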

keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});
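Because the reduce is rolling, it emits one result per input element rather than a single final value. The plain-Java sketch below (no Flink dependency) mimics that behavior for a keyed sequence of integers, using the same summing function. RollingReduceDemo and rollingReduce are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

final class RollingReduceDemo {

    // Applies `reducer` per key and records the value emitted after every element.
    // keys.get(i) is the key of values.get(i).
    static List<Integer> rollingReduce(List<String> keys, List<Integer> values,
                                       BinaryOperator<Integer> reducer) {
        Map<String, Integer> state = new HashMap<>();
        List<Integer> emitted = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            // The first value of a key is stored as is; later values are
            // combined with the previously reduced value for that key.
            Integer current = state.merge(keys.get(i), values.get(i), reducer);
            emitted.add(current);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "a", "a", "b");
        List<Integer> values = Arrays.asList(1, 10, 2, 3, 20);
        System.out.println(rollingReduce(keys, values, Integer::sum)); // [1, 10, 3, 6, 30]
    }
}
```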
            

 

Aggregations
KeyedStream → DataStream

Rolling aggregations on a keyed stream.

keyedStream.sum(0);
keyedStream.sum("key");
keyedStream.min(0);
keyedStream.min("key");
keyedStream.max(0);
keyedStream.max("key");
keyedStream.minBy(0);
keyedStream.minBy("key");
keyedStream.maxBy(0);
keyedStream.maxBy("key");
    
Aggregations on windows
WindowedStream → DataStream

Aggregates the contents of a window. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).

windowedStream.sum(0);
windowedStream.sum("key");
windowedStream.min(0);
windowedStream.min("key");
windowedStream.max(0);
windowedStream.max("key");
windowedStream.minBy(0);
windowedStream.minBy("key");
windowedStream.maxBy(0);
windowedStream.maxBy("key");
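The difference between min and minBy is easiest to see on records that carry more than one field. The following plain-Java sketch (no Flink) models both on a small in-memory "window". Reading, min, and minBy are hypothetical names, and the way the min-style result fills the non-aggregated field (it keeps the first record's value) is a modeling assumption of this sketch; only the aggregated field should be relied on.

```java
import java.util.Arrays;
import java.util.List;

final class MinVsMinByDemo {

    // A record with one field to aggregate on and one extra payload field.
    static final class Reading {
        final int temperature;
        final String sensor;

        Reading(int temperature, String sensor) {
            this.temperature = temperature;
            this.sensor = sensor;
        }

        @Override
        public String toString() {
            return "(" + temperature + ", " + sensor + ")";
        }
    }

    // min-style: only the aggregated field is tracked. Keeping the payload of
    // the first record is a modeling assumption of this sketch.
    static Reading min(List<Reading> window) {
        Reading result = window.get(0);
        for (Reading r : window) {
            if (r.temperature < result.temperature) {
                result = new Reading(r.temperature, result.sensor);
            }
        }
        return result;
    }

    // minBy-style: the whole record holding the minimum is returned.
    static Reading minBy(List<Reading> window) {
        Reading result = window.get(0);
        for (Reading r : window) {
            if (r.temperature < result.temperature) {
                result = r;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Reading> window = Arrays.asList(
                new Reading(30, "s1"), new Reading(10, "s2"), new Reading(20, "s3"));
        System.out.println(min(window));   // (10, s1)
        System.out.println(minBy(window)); // (10, s2)
    }
}
```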
    
Union
DataStream* → DataStream

Merges two or more streams into one. Note that all of the streams must have the same element type.

 

dataStream.union(otherStream1, otherStream2, ...);
    
Connect
DataStream,DataStream → ConnectedStreams

Similar to union, but it connects exactly two streams, and the two streams may have different element types.

DataStream<Integer> someStream = //...
DataStream<String> otherStream = //...

ConnectedStreams<Integer, String> connectedStreams = someStream.connect(otherStream);
    
CoMap, CoFlatMap
ConnectedStreams → DataStream

Similar to map and flatMap on a connected data stream

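These functions are typically applied right after two streams have been joined with connect.

// dataStreamSource1 is assumed to be a DataStream<Long> defined earlier (not shown in the original).
// MyNoParalleSource is a user-defined source; each of its elements is turned into a String here.
DataStream<String> dataStreamSource2 = streamExecutionEnvironment.addSource(new MyNoParalleSource()).map(t -> t + "str");
ConnectedStreams<Long,String> connect = dataStreamSource1.connect(dataStreamSource2);
SingleOutputStreamOperator<Object> env2 =  connect.map(new CoMapFunction<Long, String, Object>() {
    // Called for every element of the first (Long) stream
    @Override
    public Object map1(Long aLong) throws Exception {
        return aLong;
    }

    // Called for every element of the second (String) stream
    @Override
    public Object map2(String s) throws Exception {
        return s;
    }
});
env2.print();
streamExecutionEnvironment.execute();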
Split
DataStream → SplitStream

Splits one stream into several streams according to a rule.

SplitStream<Integer> split = someDataStream.split(new OutputSelector<Integer>() {
    @Override
    public Iterable<String> select(Integer value) {
        List<String> output = new ArrayList<String>();
        if (value % 2 == 0) {
            output.add("even");
        }
        else {
            output.add("odd");
        }
        return output;
    }
});
                
DataStream<Integer> even = split.select("even");
Select
SplitStream → DataStream

Select one or more streams from a split stream.

SplitStream<Integer> split;
DataStream<Integer> even = split.select("even");
DataStream<Integer> odd = split.select("odd");
DataStream<Integer> all = split.select("even","odd");
                

 

Parallel (partitioning) operations

Each entry lists the transformation, its input → output stream types, a description, and an example.
Custom partitioning
DataStream → DataStream

Uses a user-defined Partitioner to select the target task for each element.

dataStream.partitionCustom(partitioner, "someKey");
dataStream.partitionCustom(partitioner, 0);
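A user-defined partitioner boils down to a function from a key and the number of partitions to a partition index. The sketch below shows one possible strategy (hash modulo) in plain Java. The KeyPartitioner interface is declared locally so the example compiles without Flink; it only mirrors the shape of a partition(key, numPartitions) method, and the names ModuloPartitionerDemo and HASH_MODULO are hypothetical.

```java
final class ModuloPartitionerDemo {

    // Locally declared stand-in with a partition(key, numPartitions) method,
    // so the sketch compiles without Flink on the classpath.
    interface KeyPartitioner<K> {
        int partition(K key, int numPartitions);
    }

    // Hash-modulo strategy; floorMod keeps the index non-negative
    // even when hashCode() is negative.
    static final KeyPartitioner<Object> HASH_MODULO =
            (key, numPartitions) -> Math.floorMod(key.hashCode(), numPartitions);

    public static void main(String[] args) {
        System.out.println(HASH_MODULO.partition(10, 4)); // 2
        System.out.println(HASH_MODULO.partition(-7, 4)); // 1
    }
}
```

Using Math.floorMod rather than the % operator matters here, because a negative index would not identify a valid target partition.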
            

 

Random partitioning
DataStream → DataStream

Distributes elements randomly across partitions.

dataStream.shuffle();
            

 

Rebalancing (Round-robin partitioning)
DataStream → DataStream

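Rebalances the stream by redistributing elements round-robin across partitions, which evens out the load and removes data skew.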

dataStream.rebalance();
            

 

Rescaling
DataStream → DataStream

 

 

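If the upstream operation has parallelism 2 and the downstream operation has parallelism 4, one upstream subtask distributes its output to two of the downstream subtasks, and the other upstream subtask distributes its output to the remaining two. Conversely, if the upstream parallelism is 4 and the downstream parallelism is 2, two upstream subtasks send their output to one downstream subtask and the other two send theirs to the other downstream subtask.

If the two parallelism values are not multiples of each other, one or more downstream subtasks will receive input from a different number of upstream subtasks.

The difference between rescaling and rebalancing is that rebalance repartitions across all downstream subtasks, whereas rescale only distributes each upstream subtask's output to a subset of them.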

dataStream.rescale();
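The rescale mapping described above can be sketched in plain Java (no Flink) by computing which downstream subtasks each upstream subtask sends to. This is a simplified model: it assumes one parallelism is an exact multiple of the other and that subtasks are grouped contiguously by index, which the text does not specify. RescaleDemo, rescaleTargets, and rebalanceTargets are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class RescaleDemo {

    // Downstream subtasks that upstream subtask `up` sends to under rescale.
    // Assumes one parallelism is an exact multiple of the other.
    static List<Integer> rescaleTargets(int up, int upstream, int downstream) {
        List<Integer> targets = new ArrayList<>();
        if (downstream >= upstream) {
            // Fan-out: each upstream subtask owns a contiguous block of downstream subtasks.
            int fanOut = downstream / upstream;
            for (int d = up * fanOut; d < (up + 1) * fanOut; d++) {
                targets.add(d);
            }
        } else {
            // Fan-in: several upstream subtasks share a single downstream subtask.
            int fanIn = upstream / downstream;
            targets.add(up / fanIn);
        }
        return targets;
    }

    // Under rebalance, every upstream subtask sends to all downstream subtasks.
    static List<Integer> rebalanceTargets(int downstream) {
        List<Integer> targets = new ArrayList<>();
        for (int d = 0; d < downstream; d++) {
            targets.add(d);
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(rescaleTargets(0, 2, 4)); // [0, 1]
        System.out.println(rescaleTargets(1, 2, 4)); // [2, 3]
        System.out.println(rescaleTargets(3, 4, 2)); // [1]
        System.out.println(rebalanceTargets(4));     // [0, 1, 2, 3]
    }
}
```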
            

 

Broadcasting
DataStream → DataStream

Broadcasts elements to every partition.

dataStream.broadcast();
            

 

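An example that uses Redis as a sink

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.redis.RedisSink;
import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommand;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommandDescription;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisMapper;

// The class name is a placeholder; the original snippet omitted the class and main declarations.
public class RedisSinkExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
        // Read newline-delimited text from a local socket
        DataStream<String> text = env.socketTextStream("localhost", 9000, "\n");
        // Wrap every line as (key, value); all values share the Redis key "l_word"
        DataStream<Tuple2<String, String>> l_word = text.map(new MapFunction<String, Tuple2<String, String>>() {
            @Override
            public Tuple2<String, String> map(String s) throws Exception {
                return new Tuple2<String, String>("l_word", s);
            }
        });
        // Connection settings for a Redis instance on localhost
        FlinkJedisPoolConfig localhost = new FlinkJedisPoolConfig.Builder().setHost("localhost").setPort(6379)
            .build();
        final RedisSink<Tuple2<String, String>> tuple2RedisSink = new RedisSink<Tuple2<String, String>>(localhost,
            new MyRedisMapper());
        l_word.addSink(tuple2RedisSink);
        env.execute();
    }

    // Maps each tuple to a Redis command: LPUSH <f0> <f1>
    public static class MyRedisMapper implements RedisMapper<Tuple2<String, String>> {

        @Override
        public RedisCommandDescription getCommandDescription() {
            return new RedisCommandDescription(RedisCommand.LPUSH);
        }

        @Override
        public String getKeyFromData(Tuple2<String, String> stringStringTuple2) {
            return stringStringTuple2.f0;
        }

        @Override
        public String getValueFromData(Tuple2<String, String> stringStringTuple2) {
            return stringStringTuple2.f1;
        }
    }
}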

 
