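When you call filter, every element of the RDD is passed in turn to your function. All you need to write is the test inside the call method that decides whether an element is one you want: return true to keep it, or false to drop it.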
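import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;

private static void myFilter() {
    List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    SparkConf conf = new SparkConf()
            .setMaster("local")
            .setAppName("myFilter");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<Integer> listRdd = sc.parallelize(list);
    // call() runs once per element: return true to keep the element, false to drop it
    JavaRDD<Integer> listFiltered = listRdd.filter(new Function<Integer, Boolean>() {
        private static final long serialVersionUID = 1L;

        @Override
        public Boolean call(Integer num) throws Exception {
            // Keep only the numbers greater than 5
            return num > 5;
        }
    });
    // Print every element that passed the filter
    listFiltered.foreach(new VoidFunction<Integer>() {
        private static final long serialVersionUID = 1L;

        @Override
        public void call(Integer n) throws Exception {
            System.out.println("n:" + n);
        }
    });
    // Release the Spark context once the job is done
    sc.close();
}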
Output:
n:6
n:7
n:8
n:9
n:10
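If you want to check the predicate logic without starting a Spark context, the same keep-if-true contract can be exercised with plain Java streams. This is a minimal sketch that swaps in a different technique: `java.util.stream.Stream.filter` stands in for `JavaRDD.filter`, and it runs on a single JVM with no partitioning or cluster. The class name `FilterSketch`, the constant `GREATER_THAN_FIVE` and the helper `applyFilter` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterSketch {
    // Same contract as the call method in the Spark example:
    // true keeps the element, false drops it.
    static final Predicate<Integer> GREATER_THAN_FIVE = num -> num > 5;

    // Hypothetical helper: tests every element in order and keeps the matches.
    static List<Integer> applyFilter(List<Integer> input, Predicate<Integer> keep) {
        return input.stream().filter(keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        for (Integer n : applyFilter(list, GREATER_THAN_FIVE)) {
            System.out.println("n:" + n);
        }
    }
}
```

Running `main` prints the same five lines as the Spark version, n:6 through n:10. Once the predicate behaves as expected, the same expression can be handed to Spark as a lambda, for example `listRdd.filter(num -> num > 5)`, because Spark's `Function` interface has a single abstract method.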