1 Introduction to Flink
- Apache Flink is an open-source, distributed, high-performance, highly available, and accurate stream processing framework.
- It is implemented mainly in Java.
- It supports both real-time stream (Stream) processing and batch (Batch) processing; batch data is just a bounded special case of streaming data.
- Flink natively supports iterative computation, memory management, and program optimization.
The figure above summarizes these features of Flink.
For background on batch versus stream processing, see my earlier post: https://blog.csdn.net/GoSaint/article/details/100085835
2 Installing Flink
- Download the release archive
- Extract it under /usr/local
- Start Flink:
```shell
caozg@caozg-PC:~/Desktop$ cd /usr/local/flink-1.9.0/bin/
caozg@caozg-PC:/usr/local/flink-1.9.0/bin$ ls
config.sh                flink-daemon.sh       mesos-taskmanager.sh       start-cluster.bat         stop-zookeeper-quorum.sh
find-flink-home.sh       historyserver.sh      pyflink-gateway-server.sh  start-cluster.sh          taskmanager.sh
flink                    jobmanager.sh         pyflink-shell.sh           start-scala-shell.sh      yarn-session.sh
flink.bat                mesos-appmaster-job.sh  sql-client.sh            start-zookeeper-quorum.sh  zookeeper.sh
flink-console.sh         mesos-appmaster.sh    standalone-job.sh          stop-cluster.sh
caozg@caozg-PC:/usr/local/flink-1.9.0/bin$ ./start-cluster.sh
```
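If the cluster came up cleanly, you can verify it before submitting any job. The checks below are a sketch assuming a local standalone setup with Flink's default REST port (8081); the process names shown by `jps` are those used by the 1.9 standalone scripts:

```shell
# The JobManager and TaskManager should appear as JVM processes
jps
# e.g. StandaloneSessionClusterEntrypoint and TaskManagerRunner

# The JobManager web UI listens on port 8081 by default
curl -s http://localhost:8081 | head -n 3
```

If the `curl` returns HTML, the web UI is up and you can also watch running jobs at http://localhost:8081 in a browser.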
3 A WordCount Program
- pom.xml dependencies:
```xml
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.12</artifactId>
        <version>1.9.0</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-java -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-nop</artifactId>
        <version>1.7.2</version>
    </dependency>
</dependencies>
```
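With these dependencies in place, one way to compile and launch the job from the project root is via the Maven exec plugin. This invocation is an assumption, not part of the original post (it presumes `exec-maven-plugin` is configured and the class sits in the default package):

```shell
# Compile, then run the WordCount main class locally,
# passing the socket port as a program argument
mvn -q compile exec:java -Dexec.mainClass=WordCount -Dexec.args="--port 9998"
```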
- Java code:
```java
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WordCount {

    public static void main(String[] args) {
        // Socket port; falls back to 9998 when no --port argument is given
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            port = 9998;
        }

        // Obtain the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Connect to the socket and read the input lines
        DataStreamSource<String> text = env.socketTextStream("localhost", port, "\n");

        // Process the data
        DataStream<WordWithCount> windowCount = text
                .flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                        String[] splits = value.split("\\s");
                        for (String word : splits) {
                            out.collect(new WordWithCount(word, 1L));
                        }
                    }
                })                                              // flatten each line into <word, count> records
                .keyBy("word")                                  // group records with the same word
                .timeWindow(Time.seconds(2), Time.seconds(1))   // window size 2s, slide interval 1s
                .sum("count");

        // Print the results to the console, using a single parallel task
        windowCount.print().setParallelism(1);

        // Note: Flink builds the job lazily, so nothing above runs
        // until execute() is called
        try {
            JobExecutionResult result = env.execute("streaming word count");
            System.out.println(result);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Holds a word together with its occurrence count.
     */
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}
```
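The flatMap step above splits each line on whitespace with `value.split("\\s")`. That tokenization can be sketched as plain Java without any Flink dependency; note that `Tokenize` is a hypothetical helper for illustration, and the empty-token guard is an addition not present in the original flatMap (`"\\s"` matches a single whitespace character, so consecutive spaces would otherwise yield empty words):

```java
import java.util.ArrayList;
import java.util.List;

public class Tokenize {
    // Mirrors the flatMap logic: split a line on whitespace characters
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.split("\\s")) {
            if (!w.isEmpty()) {   // skip empty tokens from repeated spaces
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello world"));  // [hello, world]
    }
}
```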
- On the server, run `nc -l 9998` and type some words:

```shell
[root@caozg]# nc -l 9998
hello world
hello Flink
```
The console output is as follows:
```
WordWithCount{word='world', count=1}
WordWithCount{word='hello', count=1}
WordWithCount{word='world', count=1}
WordWithCount{word='hello', count=1}
WordWithCount{word='hello', count=1}
WordWithCount{word='Flink', count=1}
WordWithCount{word='hello', count=1}
WordWithCount{word='Flink', count=1}
```
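Notice that every word is printed twice. With a 2-second window sliding every 1 second, each record falls into size/slide = 2 overlapping windows, so each count is emitted once per window. The window-start arithmetic can be sketched in plain Java; `windowStarts` is a hypothetical helper (a simplified version of what a sliding-window assigner computes, assuming non-negative timestamps and no offset), not a Flink API:

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindows {
    // Returns the start times of all sliding windows [start, start + size)
    // that contain the given timestamp (all values in milliseconds).
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);  // most recent window start
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // size=2000ms, slide=1000ms: a record at t=1500ms belongs to the
        // windows starting at 1000 and 0, i.e. it is counted twice.
        System.out.println(windowStarts(1500L, 2000L, 1000L));  // [1000, 0]
    }
}
```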