Requirement: type data into an nc -lk 8888 window; the program converts the input words to uppercase and prints them to the console.
Code:
package cn._51doit.flink.day02;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Using the map transformation
 */
public class MapDemo1 {
    public static void main(String[] args) throws Exception {
        // StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // create a local environment with a web UI so the local parallelism can be inspected
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());
        // read a text stream from the socket (Source)
        DataStreamSource<String> words = env.socketTextStream("Master", 8888);
        // convert each input word to uppercase (Transformation)
        SingleOutputStreamOperator<String> upperWords = words.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) throws Exception {
                return value.toUpperCase();
            }
        });
        upperWords.print();
        env.execute();
    }
}
How map is implemented (source walkthrough)
(1) Step into map: it takes a MapFunction, which is a standard single-method interface. The function receives an element of type T and returns one of type O; T and O may be the same type or different types.
(2) map first obtains the TypeInformation of the output type and calls clean(mapper), which checks that the closure (and everything it references) is serializable.
(3) Under the hood, map delegates to transform, passing an operator name, the outputType, and new StreamMap<>(clean(mapper)); in other words, the user function's logic is wrapped in an operator and handed on.
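The wrapping in step (3) can be illustrated with a small toy model. This is plain Java, not Flink's real classes: MyMapFunction, MyStreamMap, and the list-based transform below are simplified stand-ins that only mimic the delegation structure.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the delegation described above: `map` does not process data
// itself; it wraps the user function in an operator and hands it to `transform`.
public class MapWrappingSketch {
    // stand-in for MapFunction<T, O>
    interface MyMapFunction<T, O> { O map(T value); }

    // stand-in for StreamMap: an operator that owns the user function
    static class MyStreamMap<T, O> {
        final MyMapFunction<T, O> userFunction;
        MyStreamMap(MyMapFunction<T, O> f) { this.userFunction = f; }
        O processElement(T element) { return userFunction.map(element); }
    }

    // stand-in for DataStream.transform(name, outType, operator):
    // runs every element through the operator
    static <T, O> List<O> transform(List<T> input, MyStreamMap<T, O> operator) {
        List<O> out = new ArrayList<>();
        for (T e : input) out.add(operator.processElement(e));
        return out;
    }

    // stand-in for DataStream.map: just wraps the function and delegates
    static <T, O> List<O> map(List<T> input, MyMapFunction<T, O> mapper) {
        return transform(input, new MyStreamMap<>(mapper));
    }

    public static void main(String[] args) {
        System.out.println(map(List.of("flink", "map"), String::toUpperCase)); // [FLINK, MAP]
    }
}
```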
Using a Lambda expression
package cn._51doit.flink.day02;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Writing the transformation as a Lambda expression
 * (Lambda expressions are supported since Java 8)
 *
 * Requirement: type data into an nc -lk 8888 window; the program converts the
 * input words to uppercase and prints them to the console.
 */
public class LambdaDemo1 {
    public static void main(String[] args) throws Exception {
        // StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // create a local environment with a web UI so the local parallelism can be inspected
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());
        // read a text stream from the socket (Source)
        DataStreamSource<String> words = env.socketTextStream("Master", 8888);
        SingleOutputStreamOperator<String> upperWord = words.map(w -> w.toUpperCase());
        upperWord.print();
        env.execute();
    }
}
Console output: (screenshot not included)
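The lambda in LambdaDemo1 is just a shorter spelling of the anonymous MapFunction in MapDemo1. The standalone illustration below shows the three interchangeable forms using java.util.function.Function from the JDK (not Flink's API), so it can be run without a cluster:

```java
import java.util.function.Function;

// Three equivalent ways to express the same "to uppercase" function.
public class LambdaForms {
    // anonymous inner class, like the MapFunction in MapDemo1
    static final Function<String, String> anon = new Function<String, String>() {
        @Override
        public String apply(String w) { return w.toUpperCase(); }
    };
    // lambda expression, like in LambdaDemo1
    static final Function<String, String> lambda = w -> w.toUpperCase();
    // method reference, the shortest equivalent form
    static final Function<String, String> ref = String::toUpperCase;

    public static void main(String[] args) {
        System.out.println(anon.apply("flink"));   // FLINK
        System.out.println(lambda.apply("flink")); // FLINK
        System.out.println(ref.apply("flink"));    // FLINK
    }
}
```

One caveat worth knowing: because Java erases generic types, Flink sometimes cannot infer the output type of a lambda (for example when it returns a generic type such as Tuple2) and requires an explicit returns(...) type hint; with a plain String-to-String map, as here, inference works without it.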
How the map transformation is implemented under the hood
Internally map calls map(mapper, outType), where mapper is the MapFunction instance and outType is the output type obtained via reflection.
Code (bounded stream):
package cn._51doit.flink.day02;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.operators.StreamMap;

/**
 * How the map transformation is implemented under the hood
 * Internally map calls map(mapper, outType), where mapper is the MapFunction
 * instance and outType is the output type obtained via reflection.
 *
 * Requirement: apply the map logic to a user-defined collection of elements.
 */
public class MapDemo2 {
    public static void main(String[] args) throws Exception {
        // StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // create a local environment with a web UI so the local parallelism can be inspected
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());
        // define the elements
        DataStreamSource<Integer> nums = env.fromElements(1, 2, 3, 4, 5, 6, 7, 8, 9);
        // call transform directly [1st parameter: operator name; 2nd: output type; 3rd: the operator that processes the data]
        SingleOutputStreamOperator<Integer> doubled = nums.transform("MyMap", TypeInformation.of(Integer.class), new StreamMap<>(new MapFunction<Integer, Integer>() {
            @Override
            public Integer map(Integer value) throws Exception {
                return value * 2;
            }
        }));
        doubled.print();
        env.execute();
    }
}
Console output: (screenshot not included)
Using the lowest-level API
Extend the AbstractStreamOperator class, implement the OneInputStreamOperator interface, and override the processElement method.
package cn._51doit.flink.day02;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

/**
 * How the map transformation is implemented under the hood
 *
 * Requirement: apply the map logic to a user-defined collection of elements
 * using the lowest-level API (the data is a bounded stream).
 */
public class MapDemo3 {
    public static void main(String[] args) throws Exception {
        // StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // create a local environment with a web UI so the local parallelism can be inspected
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());
        // define the elements
        DataStreamSource<Integer> nums = env.fromElements(1, 2, 3, 4, 5, 6, 7, 8, 9);
        // call transform directly [1st parameter: operator name; 2nd: output type; 3rd: the custom operator that processes the data]
        SingleOutputStreamOperator<Integer> doubled = nums.transform("MyMap", TypeInformation.of(Integer.class), new MyStreamMap());
        doubled.print();
        env.execute();
    }

    public static class MyStreamMap extends AbstractStreamOperator<Integer> implements OneInputStreamOperator<Integer, Integer> {
        // override processElement
        @Override
        public void processElement(StreamRecord<Integer> element) throws Exception {
            // read the input value
            Integer i = element.getValue();
            Integer j = i * 2;
            // put the result back into the element (replace by itself does not emit anything)
            element.replace(j);
            // emit the record downstream
            output.collect(element);
        }
    }
}
Console output: (screenshot not included)
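The key point of processElement, that replace only swaps the value inside the record while only output.collect actually emits it, can be sketched with another runnable toy model. Record and Output below are simplified stand-ins, not Flink's StreamRecord and Output classes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the low-level operator in MapDemo3.
public class ProcessElementSketch {
    // stand-in for StreamRecord<T>: a mutable carrier for one element
    static class Record<T> {
        private T value;
        Record(T value) { this.value = value; }
        T getValue() { return value; }
        void replace(T newValue) { this.value = newValue; }
    }

    // stand-in for the operator's Output: collects emitted values
    static class Output<T> {
        final List<T> collected = new ArrayList<>();
        void collect(Record<T> record) { collected.add(record.getValue()); }
    }

    // mirrors MyStreamMap.processElement
    static void processElement(Record<Integer> element, Output<Integer> output) {
        Integer i = element.getValue();
        // replace only swaps the value inside the record; nothing is emitted yet
        element.replace(i * 2);
        // collect is what actually sends the record downstream
        output.collect(element);
    }

    public static void main(String[] args) {
        Output<Integer> out = new Output<>();
        for (int n : new int[]{1, 2, 3}) processElement(new Record<>(n), out);
        System.out.println(out.collected); // [2, 4, 6]
    }
}
```

Reusing the record object via replace instead of allocating a new one is the same object-reuse trick Flink's own operators apply to reduce garbage on the hot path.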