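In Flink, the filter operator filters a data stream: only elements that satisfy the given condition pass through it.
As the name suggests, the filter() transformation evaluates a boolean condition for every element in the stream. If the condition is true, the element is emitted unchanged; if it is false, the element is dropped.
The stream produced by filter() has the same data type as the input stream. The argument passed to filter() must implement the FilterFunction interface, whose filter() method acts as a condition expression that returns a boolean.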
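The contract described above can be modeled without a Flink dependency. The following is a minimal sketch, not Flink's implementation: ElementFilter and applyFilter are hypothetical names that stand in for FilterFunction and for the operator's per-element loop, a List stands in for the stream, and the stand-in interface omits the throws Exception clause for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterContractSketch {
    // Hypothetical stand-in for Flink's FilterFunction: one method returning a boolean.
    interface ElementFilter<T> {
        boolean filter(T value);
    }

    // Models what a filter operator does: keep an element only when the condition is true.
    // The result has the same element type T as the input.
    static <T> List<T> applyFilter(List<T> input, ElementFilter<T> condition) {
        List<T> output = new ArrayList<>();
        for (T element : input) {
            if (condition.filter(element)) {
                output.add(element); // true: emitted unchanged
            }
            // false: the element is dropped
        }
        return output;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<Integer> even = applyFilter(numbers, n -> n % 2 == 0);
        System.out.println(even); // prints [2, 4, 6]
    }
}
```

Because ElementFilter has a single method, the condition can be written as a lambda; the input and output lists share the same element type, which mirrors the statement that filter() does not change the data type.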
Example 1: keep only the even numbers
package flink.transform.filter;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
/**
 * Keeps only the even numbers.
 */
public class Test {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Integer> numbers = env.fromElements(1, 2, 3, 4, 5, 6);
        DataStream<Integer> filteredNumbers = numbers.filter(new FilterFunction<Integer>() {
            @Override
            public boolean filter(Integer number) throws Exception {
                // true keeps the element, false drops it
                return number % 2 == 0;
            }
        });
        filteredNumbers.print();
        env.execute("Flink Filter Example");
    }
}
Example 2: read numbers from netcat (for example 1, 3, 5, 7, 9); even numbers are output and odd numbers are not
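import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
/**
 * Uses map to convert the incoming strings to integers before filtering.
 */
public class Test2 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> numbersString = env.socketTextStream("127.0.0.1", 9999);
        DataStream<Integer> mapNumbers = numbersString.map(new MapFunction<String, Integer>() {
            @Override
            public Integer map(String s) throws Exception {
                // trim() tolerates stray whitespace; non-numeric input still throws NumberFormatException
                return Integer.parseInt(s.trim());
            }
        });
        DataStream<Integer> filteredNumbers = mapNumbers.filter(new FilterFunction<Integer>() {
            @Override
            public boolean filter(Integer number) throws Exception {
                boolean isEven = number % 2 == 0;
                if (!isEven) {
                    System.out.println("Not an even number, not emitted");
                }
                return isEven;
            }
        });
        filteredNumbers.print();
        env.execute("Flink Filter Example");
    }
}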
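In Example 2, a line that is not a valid integer makes Integer.parseInt throw NumberFormatException inside the map function, which would typically fail the task. One possible guard is to validate the text before parsing. The sketch below uses hypothetical helper names (isInteger, isEven) and plain Java only; placing such a check in a filter() ahead of the map() is an assumption, not part of the original example.

```java
class SafeParseSketch {
    // Returns true when the text, ignoring surrounding whitespace, parses as an int.
    static boolean isInteger(String text) {
        if (text == null) {
            return false;
        }
        try {
            Integer.parseInt(text.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Same condition as the filter in Example 2: keep even numbers only.
    static boolean isEven(int number) {
        return number % 2 == 0;
    }

    public static void main(String[] args) {
        String[] lines = {"4", " 7 ", "abc", "", "-2"};
        for (String line : lines) {
            if (!isInteger(line)) {
                System.out.println("skipped (not a number): '" + line + "'");
            } else if (isEven(Integer.parseInt(line.trim()))) {
                System.out.println("emitted: " + line.trim());
            }
        }
    }
}
```

Keeping validation separate from the even/odd condition lets each check be tested on its own, and malformed input is dropped instead of raising an exception.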
Example 3: enter the following records in netcat:
zhangsan,m,18
lisi,w,12
Keep and output only the records whose gender field is m (male)
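import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
/**
 * Sample records:
 * 张三,m,12
 * 李四,w,13
 * Keeps only the records whose gender field is "m" (male).
 */
public class Test3 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> peopleString = env.socketTextStream("127.0.0.1", 9999);
        DataStream<String> out = peopleString.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String s) throws Exception {
                String[] fields = s.split(",");
                // Drop malformed lines instead of failing on a missing gender field
                if (fields.length < 2) {
                    return false;
                }
                return "m".equalsIgnoreCase(fields[1].trim());
            }
        });
        out.print();
        env.execute("Flink Filter Example");
    }
}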
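The condition in Example 3 can be pulled out into a small method so it can be checked without a socket or a running job. This is a sketch under assumptions: isMaleRecord is a hypothetical helper name, and treating lines with fewer than two fields as non-matching is one possible choice rather than something the original states.

```java
class GenderFilterSketch {
    // Expects records shaped like name,gender,age and keeps those whose gender is "m".
    static boolean isMaleRecord(String line) {
        if (line == null) {
            return false;
        }
        String[] fields = line.split(",");
        if (fields.length < 2) {
            return false; // malformed record: no gender field
        }
        return "m".equalsIgnoreCase(fields[1].trim());
    }

    public static void main(String[] args) {
        String[] lines = {"zhangsan,m,18", "lisi,w,12", "broken"};
        for (String line : lines) {
            if (isMaleRecord(line)) {
                System.out.println(line);
            }
        }
    }
}
```

A FilterFunction<String> could simply return the result of such a method, which keeps the anonymous class short and the condition easy to unit test.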
Example 4
Output the strings longer than 3 characters from the input abc,tomcj,lisi4, using flatMap together with filter
package flink.transform.filter;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
/**
 * Sample input: abc,tomcj,lisi4
 *
 * Outputs the strings whose length is greater than 3.
 */
public class Test4 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> peopleString = env.socketTextStream("127.0.0.1", 9999);
        SingleOutputStreamOperator<String> sds = peopleString.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String s, Collector<String> collector) throws Exception {
                // Split one comma-separated line into individual strings
                String[] array = s.split(",");
                for (String t : array) {
                    collector.collect(t);
                }
            }
        });
        DataStream<String> out = sds.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String s) throws Exception {
                return s.length() > 3;
            }
        });
        out.print();
        env.execute("Flink Filter Example");
    }
}
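Here flatMap splits each input line into individual strings, and filter then keeps only those longer than 3 characters.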
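To see what Example 4 should emit for the sample line, the same two steps can be modeled with java.util.stream. This is only an analogy, assuming a List stands in for the stream; splitAndFilter is a hypothetical helper and the Flink operators themselves are not involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapFilterSketch {
    // Splits each comma-separated line into tokens, then keeps tokens longer than 3 characters.
    static List<String> splitAndFilter(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(","))) // one line becomes many tokens
                .filter(token -> token.length() > 3)             // keep tokens longer than 3
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = splitAndFilter(Arrays.asList("abc,tomcj,lisi4"));
        System.out.println(result); // prints [tomcj, lisi4]
    }
}
```

The token abc has exactly 3 characters, so the strict comparison length() > 3 drops it, while tomcj and lisi4 pass.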