Flink WordCount (Java Version)

This article shows how to write WordCount programs for both batch and stream processing with Apache Flink. The batch example reads a text file through `ExecutionEnvironment`, while the streaming examples read a text file or receive data from nc (netcat) through `StreamExecutionEnvironment`. The programs cover reading, transforming, grouping, and summing the data.

1 pom

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.10.1</version>
        </dependency>
    </dependencies>
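With Flink 1.10.1 these two dependencies are enough to run the examples below directly from the IDE. If you upgrade to Flink 1.11 or later, you will most likely also need to add the `flink-clients` dependency for local execution, since it was split out of `flink-streaming-java` around that release; check the release notes for your exact version.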

2 word.txt

flink flink flink flink flink flink flink flink flink flink flink flink flink
spark spark spark spark spark spark spark spark spark spark spark spark spark spark spark spark
hadoop hadoop hadoop hadoop hadoop hadoop hadoop hadoop hadoop hadoop hadoop
yarn yarn yarn yarn yarn yarn yarn
kafka kafka kafka kafka kafka kafka kafka kafka
flume flume flume flume flume flume

3 WordCount

package com.rosh.flink.wc;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.AggregateOperator;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount {

    public static void main(String[] args) throws Exception {
        // Create the batch execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Read the input file as a DataSet of lines
        String inputPath = "data/input/word.txt";
        DataSource<String> dataSource = env.readTextFile(inputPath);
        // Split each line into (word, 1) pairs, group by the word, and sum the counts
        AggregateOperator<Tuple2<String, Integer>> result = dataSource.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String line, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] words = line.split(" ");
                for (String word : words) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        }).groupBy(0).sum(1);

        // Print the result
        result.print();
    }


}

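Running WordCount from the IDE prints one (word, count) tuple per distinct word. With the word.txt shown above, the totals work out to the following (the line order can differ between runs, since the print order is not guaranteed):

(spark,16)
(flink,13)
(hadoop,11)
(kafka,8)
(yarn,7)
(flume,6)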

4 StreamWordCount

package com.rosh.flink.wc;


import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamWordCount {

    public static void main(String[] args) throws Exception {

        // Create the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        System.out.println("Parallelism: " + env.getParallelism());
        // Input file path
        String inputPath = "data/input/word.txt";
        DataStreamSource<String> dataStreamSource = env.readTextFile(inputPath);
        // Transformation: split each line into (word, 1) pairs, key by the word, and sum the counts
        SingleOutputStreamOperator<Tuple2<String, Integer>> result = dataStreamSource.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) throws Exception {
                        String[] words = line.split(" ");
                        for (String word : words) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                }).keyBy(0)
                .sum(1);
        // Print the result stream
        result.print();

        // Submit and execute the job
        env.execute();
    }

}
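Unlike the batch job, a keyed sum() on a DataStream emits an updated running total for every record it receives, and print() prefixes each line with the producing subtask index when the parallelism is greater than 1. A run over the same word.txt therefore prints an increasing sequence per word that ends in the same totals as the batch job, for example (the subtask prefixes and interleaving will differ on your machine):

4> (flink,1)
4> (flink,2)
...
4> (flink,13)
2> (spark,1)
...
2> (spark,16)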

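A side note: keying by tuple position with keyBy(0) works on Flink 1.10.x but is deprecated in later releases in favor of key selectors. A minimal sketch of the same pipeline with a key selector (it compiles on 1.10.1 as well):

        // Same flatMap as above; only the keyBy call changes
        SingleOutputStreamOperator<Tuple2<String, Integer>> result = dataStreamSource
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.split(" ")) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                .keyBy(tuple -> tuple.f0) // key by the word itself instead of tuple position 0
                .sum(1);

Keying by tuple.f0 makes the key type (String) explicit instead of relying on the positional API.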

5 NC

# On Windows 10, start a netcat listener on port 9999
nc -L -p 9999
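On Linux or macOS the usual equivalent is `nc -lk 9999`; the exact flags depend on which netcat variant is installed.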



package com.rosh.flink.wc;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class NCWordCount {


    public static void main(String[] args) throws Exception {

        // Create the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        System.out.println("Parallelism: " + env.getParallelism());
        // Read from the nc (netcat) listener on localhost:9999
        DataStreamSource<String> dataStreamSource = env.socketTextStream("127.0.0.1", 9999);
        // Transformation: split each line into (word, 1) pairs, key by the word, and sum the counts
        SingleOutputStreamOperator<Tuple2<String, Integer>> result = dataStreamSource.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) throws Exception {
                        String[] words = line.split(" ");
                        for (String word : words) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                }).keyBy(0)
                .sum(1);
        // Print the result stream
        result.print();

        // Submit and execute the job
        env.execute();
    }
}

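With the nc listener from above running, start NCWordCount and type space-separated words into the nc console. Because keyBy(0).sum(1) keeps a running total per word, typing for example:

flink spark
flink

should print something along the lines of (subtask prefixes depend on your local parallelism):

3> (flink,1)
1> (spark,1)
3> (flink,2)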
