Flink Learning Series, Part 1: Flink Fundamentals, Installation, and a WordCount Program

1 Flink Overview

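  1. Apache Flink is an open-source stream processing framework that is distributed, high-performance, highly available, and accurate.
  2. It is implemented mainly in Java.
  3. It supports both real-time stream processing and batch processing; batch data is simply a limiting special case of stream data, namely a stream that ends.
  4. Flink natively supports iterative computation, memory management, and program optimization.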
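The idea that a batch is a special case of a stream can be illustrated without Flink at all. The following is a minimal plain-Java sketch, not Flink code: the class and method names (`BoundedStreamDemo`, `runningTotals`) are made up for this illustration. It processes records one at a time, the way a streaming operator would, and the "batch" answer is simply the last value emitted once the bounded input runs out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class BoundedStreamDemo {

    /** Streaming view: emit an updated total after every record that arrives. */
    static List<Long> runningTotals(Iterator<Long> records) {
        List<Long> emitted = new ArrayList<>();
        long total = 0L;
        while (records.hasNext()) {      // an unbounded source would never exhaust this loop
            total += records.next();
            emitted.add(total);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Long> batch = Arrays.asList(3L, 4L, 5L);   // a bounded input, i.e. a "batch"
        List<Long> totals = runningTotals(batch.iterator());
        System.out.println("streaming results: " + totals);                    // [3, 7, 12]
        System.out.println("batch result: " + totals.get(totals.size() - 1));  // 12
    }
}
```

The only difference between the two views is whether the input ever ends; the processing logic is the same.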

The figure above summarizes Flink's features.

For background on batch versus stream processing, see my earlier post: https://blog.csdn.net/GoSaint/article/details/100085835

2 Installing Flink

  1. Download the installation package.
  2. Extract it under /usr/local.
  3. Start Flink:
caozg@caozg-PC:~/Desktop$ cd /usr/local/flink-1.9.0/bin/
caozg@caozg-PC:/usr/local/flink-1.9.0/bin$ ls
config.sh           flink-daemon.sh         mesos-taskmanager.sh       start-cluster.bat          stop-zookeeper-quorum.sh
find-flink-home.sh  historyserver.sh        pyflink-gateway-server.sh  start-cluster.sh           taskmanager.sh
flink               jobmanager.sh           pyflink-shell.sh           start-scala-shell.sh       yarn-session.sh
flink.bat           mesos-appmaster-job.sh  sql-client.sh              start-zookeeper-quorum.sh  zookeeper.sh
flink-console.sh    mesos-appmaster.sh      standalone-job.sh          stop-cluster.sh     
caozg@caozg-PC:/usr/local/flink-1.9.0/bin$ ./start-cluster.sh 

  4. Open http://localhost:8081 in a browser to view the Flink web dashboard.
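If no browser is handy, one rough way to confirm that the cluster came up is to check that something is listening on port 8081. The sketch below is only a TCP reachability check under that assumption; `PortCheck` and `isListening` are hypothetical names chosen for this example, and a successful connection does not by itself prove that the listener is Flink.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        // 8081 is the web UI port used in step 4 above
        System.out.println("Port 8081 reachable: " + isListening("localhost", 8081, 1000));
    }
}
```

If this prints `false` right after running start-cluster.sh, the log directory of the Flink installation is a reasonable next place to look.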

3 The WordCount Program

  1. pom.xml dependencies
    <dependencies>
            <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_2.12</artifactId>
                <version>1.9.0</version>
                <!--<scope>provided</scope>-->
            </dependency>
    
            <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-java -->
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-java</artifactId>
                <version>1.9.0</version>
            </dependency>
    
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-nop</artifactId>
                <version>1.7.2</version>
            </dependency>
    
    </dependencies>

  2. Java code:

    import org.apache.flink.api.common.JobExecutionResult;
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.utils.ParameterTool;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;
    
    
    public class WordCount {
    
        public static void main(String[] args) {
            // Port of the socket to read from; falls back to 9998 if --port is not supplied
            int port;
            try{
                ParameterTool parameterTool = ParameterTool.fromArgs(args);
                port = parameterTool.getInt("port");
            }catch (Exception e){
                port = 9998;
            }
    
            // Obtain the execution environment
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
            // Connect to the socket and read the input as newline-delimited text
            DataStreamSource<String> text = env.socketTextStream("localhost", port, "\n");
    
            // Compute the word counts
            DataStream<WordWithCount> windowCount = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
                @Override
                public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                    String[] splits = value.split("\\s");
                    for (String word : splits) {
                        out.collect(new WordWithCount(word, 1L));
                    }
                }
            }) // flatten: turn each line into <word, count> records
                    .keyBy("word") // group records that share the same word
                    .timeWindow(Time.seconds(2), Time.seconds(1)) // sliding window: 2-second size, 1-second slide
                    .sum("count");
    
            // Print the results to the console
            windowCount.print()
                    .setParallelism(1); // use a parallelism of 1
            // Note: Flink programs are evaluated lazily, so execute() must be called for the code above to run
            try {
                JobExecutionResult streaming_word_count = env.execute("streaming word count");
                System.out.println(streaming_word_count);
            } catch (Exception e) {
                e.printStackTrace();
            }
    
        }
    
        /**
         * Holds a word and the number of times it occurs.
         */
        public static class WordWithCount{
            public String word;
            public long count;
            public WordWithCount(){}
            public WordWithCount(String word, long count) {
                this.word = word;
                this.count = count;
            }
    
            @Override
            public String toString() {
                return "WordWithCount{" +
                        "word='" + word + '\'' +
                        ", count=" + count +
                        '}';
            }
        }
    }

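To make the flatMap / keyBy / sum chain in the Java code above more concrete, here is a plain-Java sketch of what it computes for the lines that fall into a single window. It does not use Flink; `WindowCountSketch` and `countWords` are hypothetical names for this illustration, and the map stands in for Flink's keyed state.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WindowCountSketch {

    /** Splits each line on whitespace and sums a count of 1 per word, grouped by word. */
    static Map<String, Long> countWords(List<String> linesInWindow) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : linesInWindow) {
            for (String word : line.split("\\s")) {   // flatMap: one line becomes many (word, 1) records
                counts.merge(word, 1L, Long::sum);     // keyBy("word") followed by sum("count")
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Both lines are assumed to arrive within the same window
        System.out.println(countWords(Arrays.asList("hello world", "hello Flink")));
        // prints {hello=2, world=1, Flink=1}
    }
}
```

In the real job the grouping happens per window, so lines typed more than a window apart are counted separately.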
  3. On the server, run nc -l 9998, then type some words:

    [root@caozg]# nc -l 9998
    hello world
    hello Flink
    

    The console output looks like this:

    WordWithCount{word='world', count=1}
    WordWithCount{word='hello', count=1}
    WordWithCount{word='world', count=1}
    WordWithCount{word='hello', count=1}
    WordWithCount{word='hello', count=1}
    WordWithCount{word='Flink', count=1}
    WordWithCount{word='hello', count=1}
    WordWithCount{word='Flink', count=1}
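Each word shows up twice in the output above, each time with count=1. A likely explanation is the window definition: with a 2-second window that slides every 1 second, every record falls into two overlapping windows, and the two input lines were presumably typed far enough apart that they never shared one. The sketch below works out which windows a timestamp belongs to. It is a simplified illustration (non-negative millisecond timestamps, window starts aligned to multiples of the slide), not Flink's own code, and `SlidingWindowDemo` and `windowStarts` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SlidingWindowDemo {

    /** Returns the start times (ms) of every sliding window [start, start + size) containing the timestamp. */
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);   // latest window starting at or before the timestamp
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // A word arriving at t = 2500 ms, with size = 2000 ms and slide = 1000 ms
        for (long start : windowStarts(2500L, 2000L, 1000L)) {
            System.out.println("[" + start + ", " + (start + 2000L) + ")");
        }
        // prints [2000, 4000) and then [1000, 3000)
    }
}
```

In general a record belongs to size / slide windows, which is 2 for the settings used in this program.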

     
