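1. Obtaining the execution environment
An environment is obtained through one of the following static factory methods:
getExecutionEnvironment()
createLocalEnvironment()
createRemoteEnvironment(String host, int port, String... jarFiles)
Batch example: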
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> text = env.readTextFile("file:///e:\\wordcounts.txt");
text.print();
Streaming example:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.readTextFile("file:///e:\\wordcounts.txt");
text.print();
env.execute();
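2. One-to-one transformations
One-to-one operations transform the source data record by record, for example map, flatMap, and filter. Strictly speaking, only map emits exactly one output per input; flatMap may emit zero or more outputs, and filter either keeps or drops each record.
Batch example: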
DataSet<Tuple2<String, Integer>> words = text.map(new MapFunction<String, Tuple2<String, Integer>>() {
@Override
public Tuple2<String, Integer> map(String s) throws Exception {
return new Tuple2<>(s, 1);
}
});
words.print();
Streaming example:
DataStream<Tuple2<String, Integer>> words = text.map(new MapFunction<String, Tuple2<String, Integer>>() {
@Override
public Tuple2<String, Integer> map(String s) throws Exception {
return new Tuple2<>(s, 1);
}
});
words.print();
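The section names flatMap and filter but only demonstrates map. As a rough illustration of how the three differ in output cardinality, the sketch below uses the standard java.util.stream API as a stand-in; it is not Flink code. The class name ElementWiseDemo, its method names, and the sample lines are all made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: java.util.stream stands in for Flink here.
class ElementWiseDemo {
    // map: exactly one output per input (line -> its length).
    static List<Integer> mapToLength(List<String> lines) {
        return lines.stream().map(String::length).collect(Collectors.toList());
    }

    // flatMap: zero or more outputs per input (line -> its words).
    static List<String> flatMapToWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> line.isEmpty()
                        ? Stream.<String>empty()
                        : Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // filter: zero or one output per input (empty lines are dropped).
    static List<String> filterNonEmpty(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello flink", "", "hello world");
        System.out.println(mapToLength(lines));
        System.out.println(flatMapToWords(lines));
        System.out.println(filterNonEmpty(lines));
    }
}
```

Flink's map, flatMap, and filter follow the same per-record pattern, with the difference that Flink's flatMap hands results to a Collector rather than returning a stream.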
3. Grouping by key with KeyBy/GroupBy
KeyBy is the stream-processing (DataStream) interface; GroupBy is its batch-processing (DataSet) counterpart.
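KeyedStream<Tuple2<String, Integer>, Tuple> keyed = words.keyBy(0); // 0 selects the first field of the Tuple2 as the key
KeyedStream<Tuple2<String, Integer>, Tuple> keyedByBoth = words.keyBy(0, 1); // 0, 1 selects the first and second fields together as a composite key
For a nested tuple, the index refers to the outermost level only:
DataStream<Tuple3<Tuple2<Integer, Float>, String, Long>> ds;
ds.keyBy(0); // uses the whole Tuple2<Integer, Float> as the key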
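To make the effect of the field indices more concrete, here is a minimal sketch that models key grouping with plain Java collections. It is an analogy only, not Flink code: Flink partitions records across parallel tasks, whereas this collects them into an in-memory map. The class KeyGroupingDemo, the helper pair, and the sample data are assumptions introduced for illustration, and Map.Entry stands in for Tuple2.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical demo class: models key grouping with collections, not Flink.
class KeyGroupingDemo {
    // Stand-in for new Tuple2<>(word, count).
    static Map.Entry<String, Integer> pair(String word, int count) {
        return new SimpleEntry<>(word, count);
    }

    // Analogue of keyBy(0) / groupBy(0): field 0 alone is the key.
    static Map<String, List<Integer>> groupByFirstField(List<Map.Entry<String, Integer>> words) {
        return words.stream().collect(Collectors.groupingBy(
                e -> e.getKey(),
                TreeMap::new,
                Collectors.mapping(e -> e.getValue(), Collectors.toList())));
    }

    // Analogue of keyBy(0, 1): fields 0 and 1 together form a composite key.
    static Map<List<Object>, Long> countByBothFields(List<Map.Entry<String, Integer>> words) {
        return words.stream().collect(Collectors.groupingBy(
                e -> Arrays.<Object>asList(e.getKey(), e.getValue()),
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> words = Arrays.asList(
                pair("flink", 1), pair("hello", 1), pair("flink", 2), pair("flink", 1));
        System.out.println(groupByFirstField(words));
        System.out.println(countByBothFields(words));
    }
}
```

With a single-field key, every record for "flink" lands in one group regardless of its count; with the composite key, ("flink", 1) and ("flink", 2) are separate groups.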