Next, let's learn Flink's WordCount.

First, we need to add the required dependencies to the pom file:

<properties>
    <flink.version>1.12.0</flink.version>
    <java.version>1.8</java.version>
    <scala.binary.version>2.11</scala.binary.version>
    <slf4j.version>1.7.30</slf4j.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>${slf4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>${slf4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-to-slf4j</artifactId>
        <version>2.14.0</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.3.0</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
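With the maven-assembly-plugin configured, running mvn package will additionally produce a jar-with-dependencies artifact that can be submitted to a cluster. The flink-runtime-web dependency, meanwhile, is only needed if you want Flink's Web UI while running jobs inside the IDE; a minimal sketch of how that is enabled (port 8081 is Flink's default UI port, the rest is illustrative):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    // Create a local environment that also serves the Flink Web UI
    // (by default at http://localhost:8081); requires flink-runtime-web
    // on the classpath. For normal runs, getExecutionEnvironment() is enough.
    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());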

Next, we need to add a log4j.properties file under src/main/resources:

log4j.rootLogger=error, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
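The batch and bounded-stream examples below read from input/words.txt (relative to the project root), so create that file first. Its exact content doesn't matter; any space-separated words will do, for example (hypothetical content):

    hello flink
    hello world
    hello java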

Now let's see how to write the batch-processing WordCount.

        //1. Create the execution environment
        ExecutionEnvironment executionEnvironment = ExecutionEnvironment.getExecutionEnvironment();
        //2. Read the data from a file, line by line
        DataSource<String> line = executionEnvironment.readTextFile("input/words.txt");
        //3. Convert the data format (from String to a two-element tuple)
        FlatMapOperator<String, Tuple2<String, Long>> wordAndNum = line.flatMap((String lines, Collector<Tuple2<String, Long>> out) -> {
            String[] split = lines.split(" ");
            for (String word : split) {
                out.collect(Tuple2.of(word, 1L));
            }
        })
                // When a lambda uses Java generics, the type information is erased,
                // so the result type has to be declared explicitly
                .returns(Types.TUPLE(Types.STRING, Types.LONG));
        //4. Group by word
        UnsortedGrouping<Tuple2<String, Long>> tuple2UnsortedGrouping = wordAndNum.groupBy(0);
        //5. Aggregate
        AggregateOperator<Tuple2<String, Long>> sum = tuple2UnsortedGrouping.sum(1);
        //6. Print the result
        sum.print();

Run result:
(screenshot of the console output)
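The .returns(...) call in step 3 deserves a note: it is only needed because the lambda's generic types are erased at compile time. If you prefer, you can implement FlatMapFunction (from org.apache.flink.api.common.functions) with an anonymous class instead, in which case Flink can extract the output type on its own and no returns(...) is required; a sketch of the same step under that approach:

        //3. (alternative) Convert the data format with an anonymous class,
        // so no explicit type declaration is needed
        FlatMapOperator<String, Tuple2<String, Long>> wordAndNum = line
                .flatMap(new FlatMapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public void flatMap(String lines, Collector<Tuple2<String, Long>> out) {
                        // Split each line on spaces and emit a (word, 1) pair per word
                        for (String word : lines.split(" ")) {
                            out.collect(Tuple2.of(word, 1L));
                        }
                    }
                });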
Now that we've covered the batch WordCount, let's look at the streaming WordCount. Streaming data comes in two kinds: bounded streams and unbounded streams.

Bounded stream:

        //1. First, create the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Read the file
        DataStreamSource<String> lines = env.readTextFile("input/words.txt");
        //3. Convert the data format
        SingleOutputStreamOperator<Tuple2<String, Long>> words = lines.flatMap((String line, Collector<String> out) -> {
            Arrays.stream(line.split(" ")).forEach(out::collect);
        }).returns(Types.STRING)
                .map(word -> Tuple2.of(word, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG));
        //4. Group
        KeyedStream<Tuple2<String, Long>, String> tuple2StringKeyedStream = words.keyBy(t -> t.f0);
        //5. Aggregate by key
        SingleOutputStreamOperator<Tuple2<String, Long>> sum = tuple2StringKeyedStream.sum(1);
        //6. Print
        sum.print();
        //7. Execute
        env.execute();

Run result:
(screenshot of the console output)
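One thing to notice in the streaming output: each printed record is prefixed with the index of the parallel subtask that produced it (e.g. 3> (hello,1)), so lines from different subtasks can interleave. If you want single-threaded, deterministic output while testing, you can pin the parallelism; a small sketch:

        // Force a single parallel subtask so printed results are not
        // interleaved across threads (only sensible for local testing)
        env.setParallelism(1);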

Unbounded stream:

        //1. Create the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Read the data (from a socket)
        DataStreamSource<String> test101 = env.socketTextStream("test101", 9080);
        //3. Convert the data structure
        SingleOutputStreamOperator<Tuple2<String, Long>> wordAndOne = test101.flatMap((String line, Collector<String> words) -> {
            Arrays.stream(line.split(" ")).forEach(words::collect);
        })
                .returns(Types.STRING)
                .map(word -> Tuple2.of(word, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG));
        //4. Group by key
        KeyedStream<Tuple2<String, Long>, String> tuple2StringKeyedStream = wordAndOne.keyBy(t -> t.f0);
        //5. Aggregate by key
        SingleOutputStreamOperator<Tuple2<String, Long>> sum = tuple2StringKeyedStream.sum(1);
        //6. Print
        sum.print();
        //7. Execute
        env.execute();
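To test this, first start a socket server on the test101 host, for example with nc -lk 9080, then type space-separated words into it and watch the counts update. Also, rather than hardcoding the hostname and port, you can read them from the program arguments with Flink's ParameterTool; a sketch (the --host/--port flag names and the defaults are my own choice):

        import org.apache.flink.api.java.utils.ParameterTool;

        // Parse --host and --port from the command line, with fallbacks
        ParameterTool params = ParameterTool.fromArgs(args);
        String host = params.get("host", "test101");
        int port = params.getInt("port", 9080);
        DataStreamSource<String> stream = env.socketTextStream(host, port);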

!!! Problems you may run into

(screenshot of the error)
1. If reading the txt file fails (as in the screenshot above), check where the test txt file actually lives; if you're not sure, try switching to an absolute path.
2. Generics show up a lot in this code, so pay close attention to the data types and formats (see the sketch below).
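On point 2: besides Types.TUPLE(...), the returns(...) method also accepts a TypeHint (from org.apache.flink.api.common.typeinfo), which some find more readable for nested generics; a sketch of the equivalent declaration:

        // Equivalent to .returns(Types.TUPLE(Types.STRING, Types.LONG)):
        // the anonymous TypeHint subclass captures the full generic type
        .returns(new TypeHint<Tuple2<String, Long>>() {})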
