[Flink from Scratch] Installing Flink and Running WordCount

Author: 古老的屋檐下

Column: Big Data from Scratch (零基础学大数据). Tags: big data, flink, learning stream-computing frameworks

Article link: https://blog.csdn.net/liewen_/article/details/89530457

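I. Installation

At the time of writing, the latest Flink release is 1.8.0.
Download link:
https://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-1.8.0/flink-1.8.0-bin-scala_2.11.tgz

You can download whichever version you need from the Flink website; this article uses 1.8.0 as the example.
List of Flink releases: https://flink.apache.org/zh/downloads.html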

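1 Download the file

Pick the Flink version you need and download it.

2 Extract the archive

Go to the directory that contains the download (the file is named flink-1.8.0-bin-scala_2.11.tgz) and run the following commands to extract it: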

cd ~/Downloads        # Go to download directory
tar -xzvf flink-1.8.0-bin-scala_2.11.tgz
cd flink-1.8.0

II. Verifying the Installation

1 Start Flink
Run the following command to start Flink:

./bin/start-cluster.sh  # Start Flink

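2 Check the web UI
Open one of the following addresses to check Flink's status.
If Flink was started on the local machine:
http://localhost:8081
If Flink was started on a remote machine (replace ip with that machine's address):
http://ip:8081
When Flink is running normally, the page looks like this:
[Figure: the Flink web dashboard for a healthy cluster]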

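III. WordCount in Practice

Just as the first lesson in any programming language is "hello world", the first Flink project is usually a small "word count".
Concretely, word count here means computing word frequencies in real time. For example, Baidu publishes a search index covering the past hour. Stripped down, that is a word count: tally the keywords users searched for within each hour, then rank them by count.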
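
The ranking half of that idea can be sketched in plain Java, without Flink. This is only an illustration under stated assumptions: the class name KeywordRanking, the method topKeywords, and the sample keywords are hypothetical, and the per-hour counts are assumed to have been computed already.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks keywords by how often they were searched.
class KeywordRanking {

    // Returns up to topN keywords, most frequent first; ties are broken alphabetically.
    static List<String> topKeywords(Map<String, Long> counts, int topN) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts for one hour of searches.
        Map<String, Long> counts = new HashMap<>();
        counts.put("flink", 3L);
        counts.put("spark", 5L);
        counts.put("kafka", 3L);
        System.out.println(topKeywords(counts, 2)); // prints [spark, flink]
    }
}
```

In a real job the counts would come from a windowed aggregation like the one shown later in this article, rather than from a hand-built map.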

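1 Data source
Data can come from many places, such as the Flume log collector or message middleware like Kafka. This article uses netcat, a powerful Linux networking utility, for the demonstration: netcat listens on a port and passes whatever you type to the Flink job connected to it, and Flink continuously counts the words seen in the last 5 seconds, moving forward by 1 second between rounds.
That is, the first round covers seconds 1, 2, 3, 4, 5 and the second round covers seconds 2, 3, 4, 5, 6: the window size is 5 seconds and the slide is 1 second. Both values are configurable.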
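
To make the window arithmetic concrete, here is a minimal plain-Java sketch that lists every sliding window a given timestamp falls into. It is an illustration, not Flink's own window assigner: the class name SlidingWindowSketch and the method windowStarts are hypothetical, time is measured in whole seconds, and each window is assumed to be the half-open range [start, start + size).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of sliding-window membership, independent of Flink.
class SlidingWindowSketch {

    // Returns the start times of all windows [start, start + size) that contain
    // the timestamp, newest window first. Window starts are multiples of slide.
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // A word arriving at second 5, with a 5-second window sliding every second.
        System.out.println(windowStarts(5, 5, 1)); // prints [5, 4, 3, 2, 1]
    }
}
```

With a 5-second window and a 1-second slide, every word is therefore counted in five overlapping windows, which is why the same word shows up in several consecutive rounds of output.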

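2 Prepare the data
Note: netcat must be installed before you continue. Installation is simple, and guides are easy to find online.
For this step, just start netcat listening on a port: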

$ nc -l 6100

3 Start the Flink WordCount job
Flink ships this example prebuilt as a jar, so all we need to do is run it:

$ ./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 6100

4 View the results
Follow the output file:

$ tail -f log/flink-*-taskexecutor-*.out

5 Generate data
Send data to the port with netcat: in the terminal from step 2, type or paste any text, for example:

菜鸟名企梦 大菜鸟
电子 科技 大学 电子 学院

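The results are as follows:
[Figure: input typed into the netcat terminal]
[Figure: word counts in the Flink output]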

So how is word count actually implemented? The core code is analyzed below; the complete code is available by sending "word count" to the WeChat official account 菜鸟名企梦.
WordCount core code:

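import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {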

    public static void main(String[] args) throws Exception {

        // the port to connect to
        final int port;
        try {
            final ParameterTool params = ParameterTool.fromArgs(args);
            port = params.getInt("port");
        } catch (Exception e) {
            System.err.println("No port specified. Please run 'SocketWindowWordCount --port <port>'");
            return;
        }

        // get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        DataStream<String> text = env.socketTextStream("localhost", port, "\n");

        // parse the data, group it, window it, and aggregate the counts
        DataStream<WordWithCount> windowCounts = text
            .flatMap(new FlatMapFunction<String, WordWithCount>() {
                @Override
                public void flatMap(String value, Collector<WordWithCount> out) {
                    for (String word : value.split("\\s")) {
                        out.collect(new WordWithCount(word, 1L));
                    }
                }
            })
            .keyBy("word")
            .timeWindow(Time.seconds(5), Time.seconds(1))
            .reduce(new ReduceFunction<WordWithCount>() {
                @Override
                public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                    return new WordWithCount(a.word, a.count + b.count);
                }
            });

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    // Data type for words with count
    public static class WordWithCount {

        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}

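The code above is fairly simple. Its main steps are:

Read the port from the arguments and read data from that port
Split each line on whitespace and turn every piece into a (word, 1) pair
Key the pairs by the word field and set the window, i.e. the time span each count covers
Within each window, add up the counts of pairs with the same word, which is the reduce step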
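
The split and reduce steps can be tried out without a Flink cluster. The sketch below is a plain-Java stand-in for what happens to the lines inside a single window; the class name WordCountSteps and the method countWords are hypothetical. Skipping empty tokens is an addition made here so that repeated spaces do not produce empty words; the Flink example above does not do this.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the flatMap + keyBy + reduce steps on one window of input.
class WordCountSteps {

    // Splits every line on whitespace and sums a count of 1 per word.
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s")) {   // same split as the Flink example
                if (word.isEmpty()) {
                    continue;                          // assumption: ignore empty tokens
                }
                counts.merge(word, 1L, Long::sum);     // (word, 1) merged by key, like reduce
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> window = Arrays.asList("to be or not", "to be");
        System.out.println(countWords(window)); // prints {to=2, be=2, or=1, not=1}
    }
}
```

Flink performs the same merge, but keeps a separate running result per key and per window, and emits it when the window closes.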

The author is a big data engineer at Toutiao (今日头条).
