Flink Practice, Part 2 (with a First Try at Streaming Data)

Flink official website: https://flink.apache.org/
Software used: IntelliJ IDEA Community Edition
Core APIs:

DataSet: designed for batch (offline) data, with many batch-oriented APIs. env: ExecutionEnvironment
DataStream: generally used for streaming data, but can also process batch data. env: StreamExecutionEnvironment
[This time we use the DataStream API]
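
For contrast, below is a minimal DataSet (batch) sketch of the same kind of pipeline; the class name DataSetSketch and its package are made up for illustration. Note that for DataSet jobs, print() itself triggers execution, so no explicit env.execute() is needed.

package cn.tedu.dataset;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class DataSetSketch {
    public static void main(String[] args) throws Exception {
        // batch jobs use ExecutionEnvironment instead of StreamExecutionEnvironment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Integer> data = env.fromElements(1, 2, 3, 4, 5);
        // print() runs the batch job and writes the result to stdout
        data.map(x -> x * 10).print();
    }
}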

Create SourceTest

package cn.tedu.datastream;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceTest {
    public static void main(String[] args) throws Exception {
        //1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Get the data source
        DataStreamSource<Integer> source = env.fromElements(1, 2, 3, 4, 5);
        //3. Transform the data
        source.map(x -> x*10)   // multiply each incoming number by 10
        //4. Print the results
        .print();
        //5. Trigger program execution
        env.execute();
    }
}

Points to note:

  • Unlike the DataSet output, each result line is prefixed with a number and >. This is the index of the subtask (thread) that produced the record, and you don't need to pay much attention to it; see the example below.
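
For example, printing the stream above might produce output like this (the exact prefixes depend on which subtask handles each record, so they vary from run to run):

2> 10
4> 20
1> 30
3> 40
2> 50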

To save the output as a txt file instead, change step 4:

.writeAsText("a.txt").setParallelism(1);// setParallelism(1) sets the parallelism so that only one thread writes the output; without it, Flink creates a directory named a.txt containing one file per parallel subtask
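
In context, steps 3 to 5 of SourceTest would then read as follows (a sketch; the relative path a.txt resolves against the project's working directory):

        //3.-4. Transform the data and write it to a file
        source.map(x -> x*10)
                .writeAsText("a.txt")
                .setParallelism(1);// parallelism 1 -> a single file named a.txt
        //5. Trigger program execution
        env.execute();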

Create TransformationTest

Goal: take the even numbers from the data, multiply them by 10, and print them.

package cn.tedu.datastream;

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TransformationTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<Integer> source = env.fromElements(1, 2, 3, 4);
        source.filter(new FilterFunction<Integer>() {
            @Override
            public boolean filter(Integer value) throws Exception {
                return value % 2 == 0;// remainder 0 -> even number
            }
        }).map(x -> x * 10).print();
        env.execute();
    }
}
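
Since FilterFunction has a single abstract method, the same pipeline can also be written with lambdas; the following sketch is equivalent to the anonymous-class version above:

        source.filter(x -> x % 2 == 0)  // keep the even numbers
                .map(x -> x * 10)       // multiply each by 10
                .print();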

Processing Streaming Data

Exercise 1

Prerequisite: open three terminal windows and start Kafka, a console producer, and a console consumer.
Requirement: whatever is typed into the producer should arrive in both the consumer window and the IDEA console; for every number typed into the producer, IDEA should print the running total of all numbers so far.

package cn.tedu.datastream;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.Properties;

public class ConnKafkaTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Get the data source: read from Kafka
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.65.161:9092");
        properties.setProperty("group.id", "test");
        DataStream<String> source = env
                .addSource(new FlinkKafkaConsumer<>("flux", new SimpleStringSchema(), properties));
        //3. Transform the data
        source.map(new MapFunction<String, Tuple2<String,Integer>>() {// convert to a tuple: keyBy needs the fields f0/f1, which only tuples have
            @Override
            public Tuple2<String, Integer> map(String value) throws Exception {
                return new Tuple2<>("num",Integer.parseInt(value));
                //("num",1)
                //("num",2)
                //("num",3)
                //("num",4)
            }
        }).keyBy(0).sum(1)// keyBy(0) keys on field 0 ("num"), so all records form one group; keying on field 1 (the values 1,2,3,4) would split the stream into four groups
        //4. Print the results
        .print();
        //5. Trigger program execution
        env.execute();
    }
}
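
One caveat: in newer Flink releases, keyBy with a field index is deprecated; if your version warns about keyBy(0), the equivalent key-selector form is the sketch below:

        .keyBy(t -> t.f0)   // key by the first tuple field instead of the index 0
                .sum(1)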

Written this way, however, the job throws an exception as soon as a non-numeric string is entered, so we can improve it
by adding a filter in front of the map:

//3. Transform the data
        source.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                return value.matches("[0-9]+");// use + rather than *: with * an empty line would match and Integer.parseInt("") would then throw
            }
        })
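
A quick plain-Java illustration of why the regex uses + rather than *:

        "123".matches("[0-9]+");  // true  -> passes the filter
        "abc".matches("[0-9]+");  // false -> filtered out
        "".matches("[0-9]*");     // true  -> would pass, and Integer.parseInt("") then throws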
Exercise 2

Using window-based computation:

package cn.tedu.datastream;

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.Properties;

public class ConnKafkaTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Get the data source: read from Kafka
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.65.161:9092");
        properties.setProperty("group.id", "test");
        DataStream<String> source = env
                .addSource(new FlinkKafkaConsumer<>("flux", new SimpleStringSchema(), properties));
        //3. Transform the data
        source.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                return value.matches("[0-9]+");// use + rather than *: with * an empty line would match and Integer.parseInt("") would then throw
            }
        })

        .map(new MapFunction<String, Tuple2<String,Integer>>() {// convert to a tuple: keyBy needs the fields f0/f1, which only tuples have
            @Override
            public Tuple2<String, Integer> map(String value) throws Exception {
                return new Tuple2<>("num",Integer.parseInt(value));
                //("num",1)
                //("num",2)
                //("num",3)
                //("num",4)
            }
        }).keyBy(0).timeWindow(Time.seconds(5))// 5-second tumbling window; the aggregation happens within each window
                .sum(1)
        //4. Print the results
        .print();
        //5. Trigger program execution
        env.execute();
    }
}

You can see that after typing a number into the producer, the result appears within five seconds, and numbers from earlier input are no longer added in, unless two numbers are typed quickly enough to fall into the same window.
They are not added together because separate windows are aggregated independently of each other.
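
If overlapping windows are wanted instead, timeWindow also accepts a slide parameter; for example, the sketch below keeps a 10-second sum that is re-emitted every 5 seconds:

        }).keyBy(0).timeWindow(Time.seconds(10), Time.seconds(5))// sliding window: size 10s, slide 5s
                .sum(1)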

Exercise 3

Requirement: sum the odd numbers and the even numbers separately.

package cn.tedu.datastream;

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.Properties;

public class ConnKafkaTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Get the data source: read from Kafka
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.65.161:9092");
        properties.setProperty("group.id", "test");
        DataStream<String> source = env
                .addSource(new FlinkKafkaConsumer<>("flux", new SimpleStringSchema(), properties));
        //3. Transform the data
        source.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                return value.matches("[0-9]+");// use + rather than *: with * an empty line would match and Integer.parseInt("") would then throw
            }
        })

        .map(new MapFunction<String, Tuple2<String,Integer>>() {// convert to a tuple: keyBy needs the fields f0/f1, which only tuples have
            @Override
            public Tuple2<String, Integer> map(String value) throws Exception {
                int v = Integer.parseInt(value); // parse the String into an int first so the modulo can be taken
                if (v % 2 == 0){
                    return new Tuple2<>("even",v);
                }else {
                    return new Tuple2<>("odd",v);
                }
            }
        }).keyBy(0).timeWindow(Time.seconds(5))// 5-second tumbling window; the aggregation happens within each window
                .sum(1)
        //4. Print the results
        .print();
        //5. Trigger program execution
        env.execute();
    }
}
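
For example, if 1, 2, 3 and 4 are typed into the producer inside the same 5-second window, the window emits one sum per key:

(odd,4)   // 1 + 3
(even,6)  // 2 + 4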
Exercise 4

Sum the amounts of records that share the same place name in the input data:

张飞|河北|1500
孙悟空|湖北|1550
唐僧|河北|2200
辛普森|河南|1900
奥特曼|河南|5000
蜘蛛侠|湖北|2200
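
With the aggregation enabled (the keyBy(0).sum(1) step in the code below), the running sums per province should end at: 河北 1500 + 2200 = 3700, 湖北 1550 + 2200 = 3750, and 河南 1900 + 5000 = 6900.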

package cn.tedu.datastream;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.Properties;

public class KafkaTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.65.161:9092");
        properties.setProperty("group.id", "test");
        DataStream<String> source = env
                .addSource(new FlinkKafkaConsumer<>("flux", new SimpleStringSchema(), properties));

        source.map(new MapFunction<String, Tuple2<String,Integer>>() {
            @Override
            public Tuple2<String,Integer> map(String value) throws Exception {
                String[] s = value.split("\\|");
                return new Tuple2<>(s[1],Integer.parseInt(s[2]));
            }
        })
        .keyBy(0).sum(1)// group by the province (field 0) and keep a running sum of the amounts (field 1)
        .print();
        env.execute();


    }
}
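
A note on the split call: | is a regex metacharacter, so it must be escaped as \\| in Java source. A plain-Java illustration:

        String[] s = "张飞|河北|1500".split("\\|");
        // s[0] = "张飞", s[1] = "河北", s[2] = "1500"
        String[] wrong = "张飞|河北|1500".split("|"); // an unescaped | splits between every character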

Type data into the producer, and the results are printed in IDEA.
