Flink (1.11) Core Programming (sample data partially omitted)

5. Core Programming

5.1 Environment

(image: https://i.loli.net/2020/11/30/tM8sr2XA1LUZiwV.jpg)

# Submitting a jar to Flink
bin/flink run
-m hadoop102:6123
-c Flink02_WordCount_BoundStream  # main class
/opt/module/data/flink-wc.jar     # path to the jar

Before a Flink job can be submitted for execution, it must first establish a connection to the Flink framework, i.e. obtain the current Flink runtime environment. Only with this environment information can tasks be scheduled onto the different TaskManagers. Obtaining the environment object is straightforward:

//batch execution environment
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

//streaming execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

5.2 Source: where data is read (consumed) from


5.2.1 Reading file data

--main
    1. Create the environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);  //set parallelism

	2. Source
        //readTextFile: read data from a file
        DataStreamSource<String> fileDS = env.readTextFile("input/word.txt");
        //fromCollection: read data from a collection
	   DataStreamSource<String> collectionDS = env.fromCollection(Arrays.asList("1", "2", "3"));
        //fromElements: read data from individual elements
	   DataStreamSource<Integer> elementDS = env.fromElements(1, 2, 3, 4, 5, 6, 7, 8);

	   env.execute("source job");

5.2.2 Reading from Kafka

--main
    1. Create the environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);  //set parallelism

	2. Read data from Kafka
		Properties properties = new Properties();
		properties.setProperty("bootstrap.servers", "hadoop102:9092");
		properties.setProperty("group.id", "aaaaaa");

//add the source: create a Kafka consumer and start consuming from the earliest offset
		DataStreamSource<String> kafkaSource = env.addSource(
        	new FlinkKafkaConsumer<String>(
            	"sensor0621",               //topic
                new SimpleStringSchema(),   //deserialization schema
                properties                  //connection properties
            ).setStartFromEarliest()
        );

		kafkaSource.print();

		env.execute();

5.2.3 Custom source

--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Custom source
		DataStreamSource<WaterSensor> inputDS = env.addSource(new MySourceFunction());

		inputDS.print();
		
		env.execute();

--Custom class implementing SourceFunction, with the element type as the generic parameter
    public static class MySourceFunction implements SourceFunction<WaterSensor> {
        //volatile for visibility across threads
        private volatile boolean isRunning = true;
        
        @Override
        public void run(SourceContext<WaterSensor> ctx) throws Exception {
            Random random = new Random();
            while (isRunning) {
                ctx.collect(
                	new WaterSensor("sensor_" + random.nextInt(3),   //random id within 3
                    				System.currentTimeMillis(),
                    				random.nextInt(10) + 40)
                );
                Thread.sleep(1000);   //sleep one second
            }
        }
        
        @Override
        public void cancel() {
            isRunning = false;
        }
    }

5.3 Transform: processing data

(image: https://i.loli.net/2020/11/30/YtQFk4oXsxT1en8.jpg)

5.3.1 Map

Plain function:  MapFunction
Rich function:   RichMapFunction

--Rich functions
1. Lifecycle methods: useful for managing external connections
	open() is called on initialization
	close() is called when there is no more data
    (when reading a file, close is called twice)

2. Runtime context
	gives access to environment information, state, etc.
--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
	DataStreamSource<Integer> numDS = env.fromElements(1, 2, 3, 4, 5);

	//TODO  Map
	SingleOutputStreamOperator<String> resultDS = numDS.map(new MyMapFunction());

	resultDS.print();

	env.execute();

--Custom class
    public static class MyMapFunction implements MapFunction<Integer, String> {
    	@Override
        public String map(Integer value) throws Exception {
    		return String.valueOf(value * 2) + " =========== ";
		}
    }

5.3.2 RichMapFunction

--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
	DataStreamSource<Integer> numDS = env.fromElements(1, 2, 3, 4, 5);
	/*
	Reading a file is handled specially:
		env.readTextFile("input/word.txt")
			.map(new MyRichMapFunction())
			.print();
	*/

	//TODO  RichFunction
	//1. lifecycle methods open/close => useful for managing external connections
	//2. runtime context (RuntimeContext) => access to environment information, state, ...
	SingleOutputStreamOperator<String> resultDS = numDS.map(new MyRichMapFunction());

	resultDS.print();

	env.execute();

--Custom class
    public static class MyRichMapFunction extends RichMapFunction<Integer, String> {
        
        @Override
        public void open(Configuration parameters) throws Exception {
            System.out.println("open ...");
        }
        
        @Override
        public void close() throws Exception {
            System.out.println("close ...");
        }
        
        @Override
        public String map(Integer value) throws Exception {
            return value + "===========" + getRuntimeContext().getTaskNameWithSubtasks();
        }
    }

5.3.3 FlatMap

1. Output is emitted through the collector (collect).
2. Can act like a filter: records that fail the condition are simply not sent downstream.
--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
	//TODO flatMap  flatten: one-in-one-out, or one-in-zero-out
    	//1. can act like a filter: records that fail the condition are not sent downstream
        
       	env
        	.fromElements(
    			Arrays.asList(1, 2, 3, 4)
    		)
        	.flatMap(new FlatMapFunction<List<Integer>, String>() {
                @Override
                public void flatMap(List<Integer> value, Collector<String> out) throws Exception {
                    for (Integer num : value) {
                        if (num % 2 == 0) {
                            out.collect(num + " ");
                        }
                    }
                }
            })
			.print();
		
		env.execute();

5.3.4 Filter

--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
    DataStreamSource<Integer> numDS = env.fromElements(1, 2, 3, 4, 5);

	//TODO Filter
	numDS
        /*
        .filter(new FilterFunction<Integer>() {
            @Override
            public boolean filter(Integer value) throws Exception {
                return value % 2 == 0;
            }
        })
        */
        .filter(data -> data % 2 == 0)
        .print();

	env.execute();
        

5.3.5 KeyBy

The return type carries two generic parameters; with index- or name-based keys it is fixed to Tuple:
with a positional index, only the position is known, so the key type cannot be inferred;
with a field name, only the name is known, so the key type cannot be inferred either.

--Correct usage:
		KeyedStream<WaterSensor, String> sensorKS = sensorDS.keyBy(sensor -> sensor.getId());

--Source-code notes:
        	the key is hashed twice:
        			first:  the key's own hashCode()
        			second: a murmur hash
        	the result is taken modulo the default maxParallelism of 128, giving a key-group ID
        	key-group ID * parallelism / 128 gives the channel used by selectChannel
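The two-step hashing above can be sketched in plain Java. This is illustrative only: the murmur-style mixing below is a stand-in for Flink's real MathUtils.murmurHash, and the class and method names are made up; Flink's actual logic lives in KeyGroupRangeAssignment.

```java
// Illustrative sketch of keyBy channel selection (not Flink's real code).
class KeyGroupSketch {
    static final int DEFAULT_MAX_PARALLELISM = 128;

    // stand-in for the second (murmur-style) hash
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h;
    }

    static int selectChannel(Object key, int parallelism) {
        // first hash: the key's own hashCode(); second hash: murmur-style mixing;
        // then modulo the default maxParallelism of 128 => key-group ID
        int keyGroupId = Math.abs(mix(key.hashCode()) % DEFAULT_MAX_PARALLELISM);
        // key-group ID * parallelism / maxParallelism => target channel
        return keyGroupId * parallelism / DEFAULT_MAX_PARALLELISM;
    }

    public static void main(String[] args) {
        // every key lands on a stable channel in [0, parallelism)
        System.out.println(selectChannel("sensor_1", 4));
    }
}
```

Because the key-group ID, not the raw hash, is mapped to channels, rescaling a job only moves whole key groups between subtasks.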

--keyBy is a logical grouping; it is not tightly bound to resources
--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
     SingleOutputStreamOperator<WaterSensor> sensorDS = env
        .readTextFile("input/sensor-data.log")
        .map(new MapFunction<String, WaterSensor>() {
            @Override
            public WaterSensor map(String value) throws Exception {
                String[] datas = value.split(",");
                return new WaterSensor(datas[0],
                                       Long.valueOf(datas[1]),
                                       Integer.valueOf(datas[2]));
            }
        });
        
	//TODO KeyBy
        /*
        keyBy groups the data => records of the same group go together
        keyBy is a logical grouping; it is not tightly bound to resources
        */
        
        /*
        Source-code notes:
        	the key is hashed twice:
        			first:  the key's own hashCode()
        			second: a murmur hash
        	the result is taken modulo the default maxParallelism of 128, giving a key-group ID
        	key-group ID * parallelism / 128 gives the channel used by selectChannel
        */
        
	SingleOutputStreamOperator<Tuple3<String, Long, Integer>> sensorTupleDS = sensorDS
        .map(new MapFunction<WaterSensor, Tuple3<String, Long, Integer>>() {
            @Override
            public Tuple3<String, Long, Integer> map(WaterSensor value) throws Exception {
            	return Tuple3.of(value.getId(), value.getTs(), value.getVc());
            }
        });

	//positional index: only the position is known, the type cannot be inferred, so Tuple is used
	KeyedStream<Tuple3<String, Long, Integer>, Tuple> sensorKs = sensorTupleDS.keyBy(0);
	//field name: only the name is known, the type cannot be inferred, so Tuple is used
	KeyedStream<WaterSensor, Tuple> sensorKS = sensorDS.keyBy("id");


/*
Correct approaches
*/
//Approach 1: KeySelector
	KeyedStream<WaterSensor, String> sensorKS = sensorDS.keyBy(new KeySelector<WaterSensor, String>() {
        //extract (specify) the key from the record
        @Override
        public String getKey(WaterSensor value) throws Exception {
            return value.getId();
        }
    });
//Approach 2: lambda
	KeyedStream<WaterSensor, String> sensorKS = sensorDS.keyBy(sensor -> sensor.getId());


	sensorKS.print();

	env.execute();

--Custom class
    public static class MyMapFunction implements MapFunction<Integer, String> {
        @Override
        public String map(Integer value) throws Exception {
            return String.valueOf(value * 2) + " === ";
        }
    }

5.3.6 Shuffle

--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
        /*
        read the file,
        use map
        to wrap each line into a POJO
        */
     SingleOutputStreamOperator<WaterSensor> sensorDS = env
        .readTextFile("input/sensor-data.log")
        .map(new MapFunction<String,WaterSensor>(){
            @Override
            public WaterSensor map(String value) throws Exception {
                String[] datas = value.split(",");
                return new WaterSensor(datas[0],
                                       Long.valueOf(datas[1]),
                                       Integer.valueOf(datas[2]) );
            }
        });
	
	sensorDS.print("sensor");

	//TODO   Shuffle
	DataStream<WaterSensor> shuffleDS = sensorDS.shuffle();

	shuffleDS.print("shuffle");

	env.execute();

--Custom class
      public static class MyMapFunction implements MapFunction<Integer, String> {

        @Override
        public String map(Integer value) throws Exception {
            return String.valueOf(value * 2) + "===================";
        }
    }

5.3.7 Split (new OutputSelector)

--Split tags records to split the stream logically.

--The records actually stay in one stream; use select(tag) to pull out what you need.
--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
     SingleOutputStreamOperator<WaterSensor> sensorDS = env
        .readTextFile("input/sensor-data.log")
        .map(new MapFunction<String, WaterSensor>() {
            @Override
            public WaterSensor map(String value) throws Exception {
                String[] datas = value.split(",");
                return new WaterSensor(datas[0], Long.valueOf(datas[1]), Integer.valueOf(datas[2]));
            }
        });

	//TODO Split & Select
	//logical split: records are tagged but still live in one stream
	//use select(tag) to retrieve the records you need
	//(note: split/select is deprecated in Flink 1.11; side outputs are the recommended replacement)

	SplitStream<WaterSensor> sensorSS = sensorDS.split(new OutputSelector<WaterSensor>() {
        @Override
        public Iterable<String> select(WaterSensor value) {
           if (value.getVc() < 5) {
               return Arrays.asList("low", "hahaha");
           } else if (value.getVc() < 8) {
               return Arrays.asList("middle", "hahaha");
           } else {
               return Arrays.asList("high");
           }
        }
    });

	sensorSS.select("hahaha").print();

	env.execute();

Merging streams

5.3.8 Connect

--connect joins two streams whose element types may differ
	1. It can connect only two streams.
	2. The two streams may have different element types.
--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
     SingleOutputStreamOperator<WaterSensor> sensorDS = env
        .readTextFile("input/sensor-data.log")
        .map(new MapFunction<String, WaterSensor>() {
            @Override
            public WaterSensor map(String value) throws Exception {
                String[] datas = value.split(",");
                return new WaterSensor(datas[0],
                                       Long.valueOf(datas[1]),
                                       Integer.valueOf(datas[2]));
            }
        }).setParallelism(2);

	DataStreamSource<Integer> numDS = env.fromElements(1, 2, 3, 4);
        
     //connect
	ConnectedStreams<WaterSensor, Integer> sensorNumCS = sensorDS.connect(numDS);

	//TODO connect: join two streams
	//1. only two streams can be connected
	//2. the element types of the two streams may differ

	sensorNumCS
        .map(new CoMapFunction<WaterSensor, Integer, Object>() {
            @Override
            public Object map1(WaterSensor value) throws Exception {
                return "I am a WaterSensor: " + value;
            }
            
            @Override
            public Object map2(Integer value) throws Exception {
                return "I am a number: " + value;
            }
        })
        .print("aaa");

	env.execute();

5.3.9 Union

 --union merges multiple streams of the same element type
	1. It can merge more than two streams.
	2. All streams must have the same element type.
--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

    2. Read data
    SingleOutputStreamOperator<WaterSensor> sensorDS = env
            .readTextFile("input/sensor-data.log")
            .map(new MapFunction<String, WaterSensor>() {
                @Override
                public WaterSensor map(String value) throws Exception {
                    String[] datas = value.split(",");
                    return new WaterSensor(datas[0], Long.valueOf(datas[1]), Integer.valueOf(datas[2]));
                }
            });

 	DataStreamSource<Integer> numDS = env.fromElements(1, 2, 3, 4);
    DataStreamSource<Integer> numDS1 = env.fromElements(11, 12, 13, 14);
    DataStreamSource<Integer> numDS2 = env.fromElements(111, 112, 113, 114);


	//connect (for contrast): joins two streams of different element types
	ConnectedStreams<WaterSensor, Integer> sensorNumCS = sensorDS.connect(numDS);

	//TODO union
	//1. can merge more than two streams
	//2. all streams must have the same element type
	DataStream<Integer> resultDS = numDS.union(numDS1).union(numDS2);

	resultDS.print();

	env.execute();

5.4 Operators: computation

5.4.1 RollAgg (rolling aggregations)

--These aggregations must be called after keyBy.

--Only the aggregated field is updated; the other fields keep the values of the first record.
--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
    SingleOutputStreamOperator<WaterSensor> sensorDS = env
            .readTextFile("input/sensor-data.log")
            .map(new MapFunction<String, WaterSensor>() {
                @Override
                public WaterSensor map(String value) throws Exception {
                    String[] datas = value.split(",");
                    return new WaterSensor(datas[0], Long.valueOf(datas[1]), Integer.valueOf(datas[2]));
                }
    });

	3. Group by key
     KeyedStream<WaterSensor, String> sensorKS = sensorDS.keyBy(sensor -> sensor.getId());

	4. Sum, max, min
    sensorKS.sum("vc").print("sum");
	sensorKS.max("vc").print("max");
	sensorKS.min("vc").print("min");

	env.execute();
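The point that only the aggregated field is updated can be checked without a cluster. The pure-Java sketch below (class and method names are made up for illustration) mimics what a rolling max("vc") emits versus maxBy("vc"), which Flink also provides and which keeps the whole record:

```java
import java.util.Arrays;
import java.util.List;

// Pure-Java sketch of rolling max("vc") vs maxBy("vc") semantics (no Flink needed).
class RollingAggSketch {
    static class Sensor {
        final String id; final long ts; final int vc;
        Sensor(String id, long ts, int vc) { this.id = id; this.ts = ts; this.vc = vc; }
    }

    // max("vc"): only vc is updated; the other fields stay frozen at the first record
    static Sensor rollingMax(List<Sensor> input) {
        Sensor acc = input.get(0);
        for (Sensor s : input.subList(1, input.size())) {
            if (s.vc > acc.vc) {
                acc = new Sensor(acc.id, acc.ts, s.vc); // only vc changes
            }
        }
        return acc;
    }

    // maxBy("vc"): the whole record holding the max vc is kept
    static Sensor rollingMaxBy(List<Sensor> input) {
        Sensor acc = input.get(0);
        for (Sensor s : input.subList(1, input.size())) {
            if (s.vc > acc.vc) {
                acc = s; // the entire record is replaced
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Sensor> in = Arrays.asList(
            new Sensor("sensor_1", 1000L, 5),
            new Sensor("sensor_1", 2000L, 9),
            new Sensor("sensor_1", 3000L, 7));
        System.out.println(rollingMax(in).ts);   // 1000: ts frozen at the first record
        System.out.println(rollingMaxBy(in).ts); // 2000: ts of the record with max vc
    }
}
```

So when the other fields matter (e.g. the timestamp of the max reading), maxBy/minBy is the right choice.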
        

5.4.2 Reduce (new ReduceFunction)

--The return type is the same as the input type: whatever the records look like, reduce returns that type.

1. [The return type must match the input type.]

2. Records of the same group are reduced together.

3. The first record of each group does not go through the reduce method.
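These three points can be simulated in plain Java (a sketch with made-up names; Flink's keyed state handling is of course more involved). Note how the first record of each key only seeds the state and never enters the reducer:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Pure-Java sketch of keyed reduce semantics.
class KeyedReduceSketch {
    // For each incoming (key, value) pair, emit the rolling reduced value for that key.
    static List<Integer> keyedReduce(List<String[]> input, BinaryOperator<Integer> reducer) {
        Map<String, Integer> state = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        for (String[] kv : input) {
            String key = kv[0];
            int value = Integer.parseInt(kv[1]);
            Integer prev = state.get(key);
            // the first record of a group seeds the state; the reducer is NOT called for it
            int next = (prev == null) ? value : reducer.apply(prev, value);
            state.put(key, next);
            out.add(next); // output type == input type, as with Flink's reduce
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> in = Arrays.asList(
            new String[]{"sensor_1", "1"},
            new String[]{"sensor_2", "10"},
            new String[]{"sensor_1", "2"},
            new String[]{"sensor_1", "3"});
        System.out.println(keyedReduce(in, Integer::sum)); // [1, 10, 3, 6]
    }
}
```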
--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
    SingleOutputStreamOperator<WaterSensor> sensorDS = env
        .readTextFile("input/sensor-data.log")
            .map(new MapFunction<String, WaterSensor>() {
                @Override
                public WaterSensor map(String value) throws Exception {
                    String[] datas = value.split(",");
                    return new WaterSensor(datas[0], Long.valueOf(datas[1]), Integer.valueOf(datas[2]));
                }
            });

	3. Group by key
     KeyedStream<WaterSensor, String> sensorKS = sensorDS.keyBy(sensor -> sensor.getId());

	//TODO  Reduce
	/*
	the return type must match the input type
	records of the same group are reduced together
	the first record of each group does not go through the reduce method
	*/
	SingleOutputStreamOperator<WaterSensor> resultDS = sensorKS.reduce(
    	new ReduceFunction<WaterSensor>() {
            @Override
            public WaterSensor reduce(WaterSensor value1, WaterSensor value2) throws Exception {
                System.out.println(value1 + " -------- " + value2);
                //keep the latest ts and sum the vc, as an example reduction
                return new WaterSensor(value1.getId(), value2.getTs(), value1.getVc() + value2.getVc());
            }
        }
    );

	resultDS.print();

	env.execute();

5.4.3 Process

Generic parameters:

<key type, input type, output type>

--the environment (env) calls the source

--the data stream calls the sink
--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
    SingleOutputStreamOperator<WaterSensor> sensorDS = env
        .readTextFile("input/sensor-data.log")
            .map(new MapFunction<String, WaterSensor>() {
                @Override
                public WaterSensor map(String value) throws Exception {
                    String[] datas = value.split(",");
                    return new WaterSensor(datas[0], Long.valueOf(datas[1]), Integer.valueOf(datas[2]));
                }
            });

	3. Group by key
     KeyedStream<WaterSensor, String> sensorKS = sensorDS.keyBy(sensor -> sensor.getId());

	//TODO  Process
	sensorKS
        .process(new MyKeyedProcessFunction())
        .print();

	env.execute();

--Custom class
    //generics: <key type, input type, output type>
    public static class MyKeyedProcessFunction extends KeyedProcessFunction<String, WaterSensor, String> {

        @Override
        public void processElement(WaterSensor value, Context ctx, Collector<String> out) throws Exception {
            out.collect(value + ",key=" + ctx.getCurrentKey());
        }
    }

5.5 Sink: where data is written

(image: https://i.loli.net/2020/11/30/UGSAwI8un2ydOH3.jpg)

5.5.1 Kafka

--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
     SingleOutputStreamOperator<String> sensorDS = env
        .readTextFile("input/sensor-data.log");

	sensorDS.addSink(
        //create a Kafka producer
    	new FlinkKafkaProducer<String>(
            //broker list
        	"hadoop102:9092,hadoop103:9092",
            //topic
            "sensor0621",
             new SimpleStringSchema())
        );

	env.execute();

5.5.2 Redis

--main
    1. Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.setParallelism(1);   //set parallelism

	2. Read data
    SingleOutputStreamOperator<WaterSensor> sensorDS = env.readTextFile("input/sensor-data.log")
            .map(new MapFunction<String, WaterSensor>() {
                @Override
                public WaterSensor map(String value) throws Exception {
                    String[] datas = value.split(",");
                    return new WaterSensor(datas[0],
                            Long.valueOf(datas[1]),
                            Integer.valueOf(datas[2]));
                }
            });

//TODO Sink Redis
        //first argument: the connection config
        FlinkJedisPoolConfig config = new FlinkJedisPoolConfig.Builder()
                .setHost("hadoop102")
                .setPort(6379)
                .build();

        //second argument: the mapper
        MyRedisMapper myRedisMapper = new MyRedisMapper();

        sensorDS.addSink(
                new RedisSink<WaterSensor>(config, myRedisMapper)
        );

        env.execute();
    }

--Custom class
    public static class MyRedisMapper implements RedisMapper<WaterSensor> {

        @Override
        public RedisCommandDescription getCommandDescription() {
            //command and additional key (the hash name)
            return new RedisCommandDescription(RedisCommand.HSET, "sensor0621");
        }

        // for a Hash, this supplies the hash field (key)
        @Override
        public String getKeyFromData(WaterSensor data) {
            return data.getTs().toString();
        }

        // for a Hash, this supplies the hash value
        @Override
        public String getValueFromData(WaterSensor data) {
            return data.getVc().toString();
        }
    }

5.5.3 ES

--main
	1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

	2. Read data
        SingleOutputStreamOperator<WaterSensor> sensorDS = env.socketTextStream("localhost", 9999)
                .map(new MapFunction<String, WaterSensor>() {
                    @Override
                    public WaterSensor map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new WaterSensor(datas[0],
                                Long.valueOf(datas[1]),
                                Integer.valueOf(datas[2]));
                    }
                });

        //TODO Sink ElasticSearch
	3. First Builder argument: the ES hosts (ES serves REST on port 9200 by default)
        List<HttpHost> httpHosts = new ArrayList<>();
        httpHosts.add(new HttpHost("hadoop102", 9200));
        httpHosts.add(new HttpHost("hadoop103", 9200));
        httpHosts.add(new HttpHost("hadoop104", 9200));

	4. Second Builder argument: the sink function
        MyElasticSearchSinkSFunction myElasticSearchSinkSFunction = new MyElasticSearchSinkSFunction();

        ElasticsearchSink.Builder<WaterSensor> esBuilder = new ElasticsearchSink.Builder<>(httpHosts, myElasticSearchSinkSFunction);

        //bulk flush after every single record
        //TODO do not use 1 in production (it hurts performance); it is only to see unbounded-stream writes in ES quickly
        esBuilder.setBulkFlushMaxActions(1);

        sensorDS.addSink(esBuilder.build());

        env.execute();

    }

--Custom class
    public static class MyElasticSearchSinkSFunction implements ElasticsearchSinkFunction<WaterSensor> {

        @Override
        public void process(WaterSensor element, RuntimeContext ctx, RequestIndexer indexer) {
            Map<String, String> sourceMap = new HashMap<>();
            sourceMap.put("data", element.toString());
            //build an IndexRequest
            IndexRequest indexRequest = Requests.indexRequest("sensor0621")
                    .type("read")
                    .source(sourceMap);

            //hand it to the indexer
            indexer.add(indexRequest);

        }
    }

5.5.4 MySQL

--main
	1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

	2. Read data
        SingleOutputStreamOperator<WaterSensor> sensorDS = env
                .readTextFile("input/sensor-data.log")
    		// .socketTextStream("localhost", 9999)
                .map(new MapFunction<String, WaterSensor>() {
                    @Override
                    public WaterSensor map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new WaterSensor(datas[0],
                                Long.valueOf(datas[1]),
                                Integer.valueOf(datas[2]));
                    }
                });

        //TODO Sink MySQL
        sensorDS.addSink(new MySQLSink());

        env.execute();
    }

--Custom class
    public static class MySQLSink extends RichSinkFunction<WaterSensor> {
        Connection conn = null;
        PreparedStatement pstmt = null;

        @Override
        public void open(Configuration parameters) throws Exception {
            // create the MySQL connection
            conn = DriverManager.getConnection("jdbc:mysql://hadoop102:3306/test", "root", "000000");
            pstmt = conn.prepareStatement("INSERT INTO sensor VALUES (?,?,?)");
        }

        @Override
        public void invoke(WaterSensor value, Context context) throws Exception {
            // bind the parameters and write one row per record
            pstmt.setString(1, value.getId());
            pstmt.setLong(2, value.getTs());
            pstmt.setInt(3, value.getVc());
            pstmt.execute();
        }

        @Override
        public void close() throws Exception {
            pstmt.close();
            conn.close();
        }
    }

5.6 Hands-on cases

5.6.1 Traffic statistics from tracking-log data

5.6.1.1 Case_PV

--main
1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

2. Read data
        SingleOutputStreamOperator<UserBehavior> userbehaviorDS = env
                .readTextFile("input/UserBehavior.csv")
                .map(new MapFunction<String, UserBehavior>() {
                    @Override
                    public UserBehavior map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new UserBehavior(
                                Long.valueOf(datas[0]),
                                Long.valueOf(datas[1]),
                                Integer.valueOf(datas[2]),
                                datas[3],
                                Long.valueOf(datas[4])
                        );
                    }
                });
        
3. Process the data
3.1 Filter as early as possible
    SingleOutputStreamOperator<UserBehavior> pvDS = userbehaviorDS.filter(sensor -> "pv".equals(sensor.getBehavior()));
3.2 Following the wordcount idea, convert each record into (pv, 1)
    SingleOutputStreamOperator<Tuple2<String, Integer>> pvAndOneDS = pvDS.map(new MapFunction<UserBehavior, Tuple2<String, Integer>>() {
        @Override
        public Tuple2<String, Integer> map(UserBehavior userBehavior) throws Exception {
            return Tuple2.of("pv", 1);
        }
    });

3.3 Group by the pv behavior
        KeyedStream<Tuple2<String, Integer>, String> pvAndOneKS = pvAndOneDS.keyBy(data -> data.f0);

3.4 Aggregate
        SingleOutputStreamOperator<Tuple2<String, Integer>> pv = pvAndOneKS.sum(1);

4. Output
        pv.print();
        
        env.execute();

5.6.1.2 Case_PVByFlatMap

--main
1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

2. Process
        env .readTextFile("input/UserBehavior.csv")
                .flatMap(new FlatMapFunction<String, Tuple2<String,Integer>>() {
                    @Override
                    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                        String[] datas = value.split(",");
                        //for pv records, convert to (pv, 1) and send downstream
                        if ("pv".equals(datas[3])){
                            out.collect(Tuple2.of("pv",1));
                        }
                    }
                })
                .keyBy(r -> r.f0)
                .sum(1)
                .print();
        
        env.execute();

5.6.1.3 Case_PVByProcess

--main
1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(10);

2. Read data
        SingleOutputStreamOperator<UserBehavior> userbehaviorDS = env
                .readTextFile("input/UserBehavior.csv")
                .map(new MapFunction<String, UserBehavior>() {
                    @Override
                    public UserBehavior map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new UserBehavior(
                                Long.valueOf(datas[0]),
                                Long.valueOf(datas[1]),
                                Integer.valueOf(datas[2]),
                                datas[3],
                                Long.valueOf(datas[4])
                        );
                    }
                });

3. Process the data
3.1 Filter as early as possible
        SingleOutputStreamOperator<UserBehavior> pvDS = userbehaviorDS.filter(sensor -> "pv".equals(sensor.getBehavior()));
3.2 Group by the pv behavior
        KeyedStream<UserBehavior, String> userbehavioKS = pvDS.keyBy(r -> r.getBehavior());
3.3 Count
        SingleOutputStreamOperator<Long> pv = userbehavioKS.process(new KeyedProcessFunction<String, UserBehavior, Long>() {

            private long pvCount = 0L;

            @Override
            public void processElement(UserBehavior value, Context ctx, Collector<Long> out) throws Exception {
                // count each incoming record
                pvCount++;
                out.collect(pvCount);
            }
        });

        pv.print();

        env.execute();

5.6.1.4 Case_PVByAcc

--main
1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(3);

2. Read data
        SingleOutputStreamOperator<UserBehavior> userbehaviorDS = env
//                .readTextFile("input/UserBehavior.csv")
                .socketTextStream("localhost", 9999)
                .map(new MapFunction<String, UserBehavior>() {
                    @Override
                    public UserBehavior map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new UserBehavior(
                                Long.valueOf(datas[0]),
                                Long.valueOf(datas[1]),
                                Integer.valueOf(datas[2]),
                                datas[3],
                                Long.valueOf(datas[4])
                        );
                    }
                });

3. Process the data
3.1 Filter as early as possible
        SingleOutputStreamOperator<UserBehavior> pvDS = userbehaviorDS.filter(sensor -> "pv".equals(sensor.getBehavior()));
3.2 Count with an accumulator
        pvDS
                .map(
                        new RichMapFunction<UserBehavior, UserBehavior>() {

                            // TODO 1. create the accumulator
                            private LongCounter pvCount = new LongCounter();

                            @Override
                            public void open(Configuration parameters) throws Exception {
                                // TODO 2. register the accumulator
                                getRuntimeContext().addAccumulator("pvCount", pvCount);
                            }

                            @Override
                            public UserBehavior map(UserBehavior value) throws Exception {
                                // TODO 3. count with the accumulator
                                pvCount.add(1L);
                                System.out.println(value + " <-------------------> " + pvCount.getLocalValue());
                                return value;
                            }
                        });

4. Get the accumulator value from the job execution result
        JobExecutionResult result = env.execute();
        Object pvCount = result.getAccumulatorResult("pvCount");
        System.out.println("PV count: " + pvCount);
    }


5.6.1.5 Case_UV

--Deduplicate by user id
--main
1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

2. Read data
        SingleOutputStreamOperator<UserBehavior> userbehaviorDS = env
                .readTextFile("input/UserBehavior.csv")
                .map(new MapFunction<String, UserBehavior>() {
                    @Override
                    public UserBehavior map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new UserBehavior(
                                Long.valueOf(datas[0]),
                                Long.valueOf(datas[1]),
                                Integer.valueOf(datas[2]),
                                datas[3],
                                Long.valueOf(datas[4])
                        );
                    }
                });

3. Process the data
3.1 Filter as early as possible
        SingleOutputStreamOperator<UserBehavior> pvDS = userbehaviorDS.filter(sensor -> "pv".equals(sensor.getBehavior()));

        // deduplicate by userId
3.2 Convert to (uv, userId)
        // => first element: the fixed string "uv", used for keyBy
        // => second element: the userId, added to a Set for dedup; Set.size then gives the UV count
        SingleOutputStreamOperator<Tuple2<String, Long>> uvDS = pvDS.map(new MapFunction<UserBehavior, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(UserBehavior value) throws Exception {
                return Tuple2.of("uv", value.getUserId());
            }
        });
3.3 Group by "uv"
        KeyedStream<Tuple2<String, Long>, String> uvKS = uvDS.keyBy(r -> r.f0);
3.4 Add each userId to a Set
        SingleOutputStreamOperator<Long> uvCount = uvKS.process(
                new KeyedProcessFunction<String, Tuple2<String, Long>, Long>() {
                    // a Set holding the userIds, for dedup
                    Set<Long> uvSet = new HashSet<>();

                    @Override
                    public void processElement(Tuple2<String, Long> value, Context ctx, Collector<Long> out) throws Exception {
                        // extract the userId
                        Long userId = value.f1;
                        uvSet.add(userId);
                        out.collect(Long.valueOf(uvSet.size()));
                    }
                }
        );

4. Output
        uvCount.print();

        env.execute();

5.6.2 Marketing business-metric analysis

5.6.2.1 Case_APPMarketingAnalysis (by channel)

--main
1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

2. Read data
        DataStreamSource<MarketingUserBehavior> appDS = env.addSource(new AppMarketingLog());

3. Process the data
3.1 Group by the statistics dimensions (behavior, channel) => sum
        appDS
                .map(new MapFunction<MarketingUserBehavior, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(MarketingUserBehavior value) throws Exception {
                        return Tuple2.of(value.getBehavior() + "_" + value.getChannel(), 1);
                    }
                })
                .keyBy(r -> r.f0)
                .sum(1)
                .print();


        env.execute();

--自定义类
    public static class AppMarketingLog implements SourceFunction<MarketingUserBehavior> {

        private volatile boolean isRunning = true;
        private List<String> behaviorList = Arrays.asList("DOWNLOAD", "INSTALL", "UPDATE", "UNINSTALL");
        private List<String> channelList = Arrays.asList("XIAOMI", "HUAWEI", "OPPO", "VIVO", "APPSTORE");

        @Override
        public void run(SourceContext<MarketingUserBehavior> ctx) throws Exception {
            Random random = new Random();
            while (isRunning) {
                ctx.collect(
                        new MarketingUserBehavior(
                                random.nextLong(),
                                behaviorList.get(random.nextInt(behaviorList.size())),
                                channelList.get(random.nextInt(channelList.size())),
                                System.currentTimeMillis())
                );
                Thread.sleep(1000L);
            }
        }

        @Override
        public void cancel() {
            isRunning = false;
        }
    }
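keyBy + sum 的滚动累加语义,可以用一个 HashMap 在纯 Java 里示意。下面几条 "行为_渠道" 记录是假设的样例值:

```java
import java.util.HashMap;
import java.util.Map;

public class ChannelCountDemo {
    public static void main(String[] args) {
        // 假设的几条 "行为_渠道" key(对应上面 map 算子的输出 f0)
        String[] keys = {"INSTALL_HUAWEI", "INSTALL_HUAWEI", "DOWNLOAD_XIAOMI"};
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            // merge 等价于 keyBy(r -> r.f0).sum(1) 的滚动累加
            counts.merge(key, 1, Integer::sum);
            System.out.println(key + " -> " + counts.get(key));
        }
        // 最终 INSTALL_HUAWEI=2, DOWNLOAD_XIAOMI=1
    }
}
```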

5.6.2.2 Case_APPMarketingAnalysisWithoutChannel ——不分渠道

--main
1.创建执行环境
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

2.读取数据
        DataStreamSource<MarketingUserBehavior> appDS = env.addSource(new AppMarketingLog());

3.处理数据
3.1 按照 统计维度(行为) 分组 => 求和
        appDS
                .map(new MapFunction<MarketingUserBehavior, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(MarketingUserBehavior value) throws Exception {
                        return Tuple2.of(value.getBehavior(), 1);
                    }
                })
                .keyBy(r -> r.f0)
                .sum(1)
                .print();


        env.execute();

--自定义类
    public static class AppMarketingLog implements SourceFunction<MarketingUserBehavior> {

        private volatile boolean isRunning = true;
        private List<String> behaviorList = Arrays.asList("DOWNLOAD", "INSTALL", "UPDATE", "UNINSTALL");
        private List<String> channelList = Arrays.asList("XIAOMI", "HUAWEI", "OPPO", "VIVO", "APPSTORE");

        @Override
        public void run(SourceContext<MarketingUserBehavior> ctx) throws Exception {
            Random random = new Random();
            while (isRunning) {
                ctx.collect(
                        new MarketingUserBehavior(
                                random.nextLong(),
                                behaviorList.get(random.nextInt(behaviorList.size())),
                                channelList.get(random.nextInt(channelList.size())),
                                System.currentTimeMillis())
                );
                Thread.sleep(1000L);
            }
        }

        @Override
        public void cancel() {
            isRunning = false;
        }
    }

5.6.3 页面广告分析

5.6.3.1 Case_AdClickAnalysis

--不同省份,不同广告的点击量

1.获取环境

2.读取数据,获取的数据封装成样例类return

3.分析数据

	使用 MapFunction 封装成 Tuple

	keyBy

	求和

	输出

4.执行


--main
1.创建执行环境
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

2.读取数据
        SingleOutputStreamOperator<AdClickLog> adClickDS = env
                .readTextFile("input/AdClickLog.csv")
                .map(new MapFunction<String, AdClickLog>() {
                    @Override
                    public AdClickLog map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new AdClickLog(
                                Long.valueOf(datas[0]),
                                Long.valueOf(datas[1]),
                                datas[2],
                                datas[3],
                                Long.valueOf(datas[4])
                        );
                    }
                });


3.处理数据
3.1 按照 统计维度 (省份、广告) 分组
        adClickDS
                .map(new MapFunction<AdClickLog, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(AdClickLog value) throws Exception {
                        return Tuple2.of(value.getProvince() + "_" + value.getAdId(), 1);
                    }
                })
                .keyBy(r -> r.f0)
                .sum(1)
                .print();

        env.execute();
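从 CSV 行到统计维度 key 的转换可以单独验证。字段顺序按上面 map 中的下标假设为 userId,adId,province,city,timestamp,样例行是假设值:

```java
public class AdClickParseDemo {
    public static void main(String[] args) {
        // 一条假设的样例数据:userId,adId,province,city,timestamp
        String line = "543462,1715,beijing,beijing,1511658000";
        String[] datas = line.split(",");
        // 统计维度 key:省份_广告ID(对应上面 map 算子里的拼接)
        String key = datas[2] + "_" + Long.valueOf(datas[1]);
        System.out.println(key); // beijing_1715
    }
}
```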

5.6.4 订单支付实时监控

5.6.4.1 Case_OrderTxAnalysis

1.创建环境
2.获取数据,两条流,都封装成样例类return返回
3.处理数据  process(new 连接后的数据流对象)
	连接两条流 collect
	keyBy(key1,key2)
	（说明:也可以先 keyBy 再 connect 连接）
	使用 process,
	CoProcessFunction<第一条流输入类型, 第二条流输入类型, 输出类型>
	谁调用 process 方法,谁就是第一条流
	重写两个方法
		方法一:processElement1
			判断对应交易码的交易数据是否来过
				来过 => 对账成功,清除交易数据的缓存 remove(交易码)
				没来过 => 把自己(业务数据)缓存起来 put(交易码【value.getTxId】, value)

		方法二:processElement2
			判断对应交易码的业务数据是否来过
				来过 => 对账成功,清除业务数据的缓存 remove(交易码)
				没来过 => 把自己(交易数据)缓存起来 put(交易码, value)
--main
1.创建执行环境
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);

2.读取数据
2.1 读取 业务系统 数据
        SingleOutputStreamOperator<OrderEvent> orderDS = env
                .readTextFile("input/OrderLog.csv")
                .map(new MapFunction<String, OrderEvent>() {
                    @Override
                    public OrderEvent map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new OrderEvent(
                                Long.valueOf(datas[0]),
                                datas[1],
                                datas[2],
                                Long.valueOf(datas[3]));
                    }
                });

2.2 读取 交易系统 数据
        SingleOutputStreamOperator<TxEvent> txDS = env
                .readTextFile("input/ReceiptLog.csv")
                .map(new MapFunction<String, TxEvent>() {
                    @Override
                    public TxEvent map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new TxEvent(
                                datas[0],
                                datas[1],
                                Long.valueOf(datas[2]));
                    }
                });

3. 处理数据: 对两条流进行关联,匹配 交易码
3.1 连接两条流, 正常 使用 connect,要考虑 做 keyby
        ConnectedStreams<OrderEvent, TxEvent> orderTxCS = orderDS
                .keyBy(order -> order.getTxId())
                .connect(txDS.keyBy(tx -> tx.getTxId()));

3.2 使用 Process
        SingleOutputStreamOperator<String> resultDS = orderTxCS
	//   .keyBy(order -> order.getTxId(), tx -> tx.getTxId())
                .process(new OrderTxDetect());

4.输出
        resultDS.print();

        env.execute();
    }

--自定义类
    public static class OrderTxDetect extends CoProcessFunction<OrderEvent, TxEvent, String> {

        // 缓存 交易系统 的数据
        private Map<String, TxEvent> txEventMap = new HashMap<>();
        // 缓存 业务系统 的数据
        private Map<String, OrderEvent> orderEventMap = new HashMap<>();

        /**
         * 处理 业务系统 数据
         *
         * @param value
         * @param ctx
         * @param out
         * @throws Exception
         */
        @Override
        public void processElement1(OrderEvent value, Context ctx, Collector<String> out) throws Exception {
            // 说明来的是 业务数据
            // 判断 对应交易码的交易数据 是否来过?
            if (value.getTxId() != null) {
                if (txEventMap.containsKey(value.getTxId())) {
                    // 1.交易数据 来过 => 对账成功,清除 交易码对应的 交易数据 的缓存
                    out.collect("订单" + value.getOrderId() + "对账成功!!!");
                    txEventMap.remove(value.getTxId());
                } else {
                    // 2.交易数据 没来过 => 把 自己(业务数据) 缓存起来
                    orderEventMap.put(value.getTxId(), value);
                }
            }
        }

        /**
         * 处理 交易系统 数据
         *
         * @param value
         * @param ctx
         * @param out
         * @throws Exception
         */
        @Override
        public void processElement2(TxEvent value, Context ctx, Collector<String> out) throws Exception {
            // 说明来的是 交易数据
            // 判断 对应交易码的业务数据 是否来过
            if (orderEventMap.containsKey(value.getTxId())) {
                // 1.说明 业务数据 来过 => 对账成功, 清除 业务数据的缓存
                out.collect("订单" + orderEventMap.get(value.getTxId()).getOrderId() + "对账成功!!!");
                orderEventMap.remove(value.getTxId());
            } else {
                // 2.说明 业务数据 没来过 => 把自己(交易数据)缓存起来
                txEventMap.put(value.getTxId(), value);
            }
        }
    }
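对账的双向缓存逻辑可以脱离 Flink 单独验证:哪条流的数据先到就先缓存,后到的一方命中缓存即对账成功。下面按上面 processElement1/2 的思路写一个最小示意,交易码和数据都是假设值:

```java
import java.util.HashMap;
import java.util.Map;

public class ReconcileDemo {
    // 交易数据缓存: 交易码 -> 数据
    static Map<String, String> txCache = new HashMap<>();
    // 业务数据缓存: 交易码 -> 数据
    static Map<String, String> orderCache = new HashMap<>();

    // 业务数据到达(对应 processElement1 的思路)
    static String onOrder(String txId, String order) {
        if (txCache.containsKey(txId)) { txCache.remove(txId); return "对账成功"; }
        orderCache.put(txId, order);
        return "等待交易数据";
    }

    // 交易数据到达(对应 processElement2 的思路)
    static String onTx(String txId, String tx) {
        if (orderCache.containsKey(txId)) { orderCache.remove(txId); return "对账成功"; }
        txCache.put(txId, tx);
        return "等待业务数据";
    }

    public static void main(String[] args) {
        System.out.println(onOrder("tx1", "order1")); // 等待交易数据
        System.out.println(onTx("tx1", "pay1"));      // 对账成功
    }
}
```

同样地,案例里的两个 HashMap 是普通成员变量,重启后丢失;生产上一般用 MapState 等 keyed state 来缓存未匹配的数据。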