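Flink is a big-data compute engine. What sets it apart from many other engines is that it supports both stream processing and batch processing, so let me start with those two concepts. Stream processing: picture a river. The Yangtze flows from west to east and eventually into the sea, in one long, continuous current. Applied to data, this means records keep arriving and the data set is unbounded. Batch processing: picture a lake, naturally formed and still. Applied to data, this is like processing a static data set, which is bounded.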
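To make the bounded/unbounded distinction concrete, here is a minimal plain-Java sketch. It does not use Flink at all; the class name `BoundedVsUnbounded` and the `limit` parameter are my own assumptions, and `limit` exists only so the demo terminates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class BoundedVsUnbounded {
    /** Batch style: the whole data set is available, so one result is produced at the end. */
    static int batchSum(List<Integer> dataset) {
        int total = 0;
        for (int value : dataset) {
            total += value;
        }
        return total;
    }

    /**
     * Stream style: records are consumed one at a time and an updated result is
     * emitted after each record. The limit only keeps this demo from running forever.
     */
    static List<Integer> streamingSums(Iterator<Integer> source, int limit) {
        List<Integer> emitted = new ArrayList<>();
        int running = 0;
        while (source.hasNext() && emitted.size() < limit) {
            running += source.next();
            emitted.add(running);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Bounded input: a finite list, one final answer.
        System.out.println(batchSum(Arrays.asList(1, 2, 3, 4))); // prints 10

        // Unbounded input: an iterator that never runs out (1, 2, 3, ...).
        Iterator<Integer> endless = new Iterator<Integer>() {
            private int next = 1;
            @Override public boolean hasNext() { return true; }
            @Override public Integer next() { return next++; }
        };
        System.out.println(streamingSums(endless, 5)); // prints [1, 3, 6, 10, 15]
    }
}
```

The batch method can return a single answer because its input ends. The streaming method can never be "finished", so it reports a running result after every record, which is the same pattern the word-count job later in this post follows.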
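I have two reasons for learning Flink. First, the logs our business systems produce can feed a monitoring system that tracks the software while it runs, so problems get resolved faster. Second, studying AI and big-data technology together should give me better ideas and solutions for building the products that will combine the two.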
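See https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/libs/ml/quickstart.html : Flink ships its own machine learning library (FlinkML) for AI-related processing, which is very helpful when studying the two fields together.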
-
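Installing Flink: I'm on a Mac, so with Homebrew available it takes a single terminal command (the Homebrew formula is named apache-flink): brew install apache-flink On my machine, the installation ended up in the following directory: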
-
**Start a single-node Flink cluster** Change into the libexec directory and run: ./start-cluster.sh Then open http://localhost:8081 , where you should see the following page:
-
Step three: write the job for Flink to run. I use Java; the code is as follows:
    package com.flink.demo;

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    /**
     * Counts words read from a text socket.
     *
     * @author fxl
     * @version 1.0.0
     * @createTime 2019-07-17 17:23:00
     */
    public class App {

        public static void main(String[] args) throws Exception {
            // Check the arguments
            if (args.length != 2) {
                System.err.println("USAGE:\nSocketTextStreamWordCount <hostname> <port>");
                return;
            }
            String hostname = args[0];
            int port = Integer.parseInt(args[1]);

            // Set up the streaming execution environment
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Read lines of text from the socket
            DataStreamSource<String> stream = env.socketTextStream(hostname, port);

            // Split lines into (word, 1) pairs, group by the word (field 0), sum the counts (field 1)
            SingleOutputStreamOperator<Tuple2<String, Integer>> sum = stream
                    .flatMap(new LineSplitter())
                    .keyBy(0)
                    .sum(1);

            sum.print();
            env.execute("Java WordCount from SocketTextStream Example");
        }

        /** Lower-cases a line, splits it on non-word characters, and emits (word, 1) per word. */
        public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) {
                String[] tokens = s.toLowerCase().split("\\W+");
                for (String token : tokens) {
                    if (token.length() > 0) {
                        collector.collect(new Tuple2<>(token, 1));
                    }
                }
            }
        }
    }
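To see what the flatMap / keyBy(0) / sum(1) chain computes without starting a cluster, the following plain-Java sketch imitates it with a HashMap. It is only an approximation under my own assumptions: the class `RunningWordCount` and its `processLine` method are hypothetical names, and it ignores Flink's parallelism, so a real job may prefix each printed record with a subtask index and interleave records for different words in a different order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RunningWordCount {
    // One running total per key, standing in for Flink's keyed state.
    private final Map<String, Integer> counts = new HashMap<>();

    /** Tokenizes a line the same way LineSplitter does, then emits one updated "(word,count)" per token. */
    List<String> processLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty first token, e.g. for a line starting with punctuation
            }
            int updated = counts.merge(token, 1, Integer::sum);
            emitted.add("(" + token + "," + updated + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        RunningWordCount job = new RunningWordCount();
        System.out.println(job.processLine("Hello world hello")); // prints [(hello,1), (world,1), (hello,2)]
        System.out.println(job.processLine("world"));             // prints [(world,2)]
    }
}
```

The point to notice is that every incoming word produces a new output record carrying the count so far. The job never prints a single "final" table, because the input stream has no end.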
The Maven dependencies are as follows:
    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>

      <groupId>com.flink.demo</groupId>
      <artifactId>flinkDemo</artifactId>
      <version>1.0-SNAPSHOT</version>
      <name>flinkDemo</name>

      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>4.11</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-core</artifactId>
          <version>1.8.1</version>
        </dependency>
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-clients_2.12</artifactId>
          <version>1.8.1</version>
        </dependency>
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-streaming-java_2.12</artifactId>
          <version>1.8.1</version>
          <scope>provided</scope>
        </dependency>
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-java</artifactId>
          <version>1.8.1</version>
        </dependency>
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-examples</artifactId>
          <version>1.8.1</version>
          <type>pom</type>
        </dependency>
        <dependency>
          <groupId>org.projectlombok</groupId>
          <artifactId>lombok</artifactId>
          <version>1.18.8</version>
          <scope>provided</scope>
        </dependency>
        <dependency>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
          <version>23.0</version>
        </dependency>
      </dependencies>

      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.5.1</version>
            <configuration>
              <source>1.8</source>
              <target>1.8</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </project>
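Step four: change into the bin folder under the Flink version directory and run: flink run -c com.flink.demo.App *.jar ip port
-c: the entry class of the program; mine is com.flink.demo.App
*.jar: the path to the jar built from the project
ip: the host to read from; here the local machine, 127.0.0.1
port: the port the job reads its input from, passed in when the job starts
The full command: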
fxl-2:bin fxl$ flink run -c com.flink.demo.App /Users/fxl/IdeaProjects/flinkDemo/target/flinkDemo-1.0-SNAPSHOT.jar 127.0.0.1 9008
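The job only checks that two arguments were passed, so a mistyped port surfaces as a NumberFormatException or a failed connection. A stricter check could look like the sketch below; `JobArgs` is a hypothetical helper of my own, not part of the original project or of Flink.

```java
class JobArgs {
    final String hostname;
    final int port;

    private JobArgs(String hostname, int port) {
        this.hostname = hostname;
        this.port = port;
    }

    /** Parses "<hostname> <port>" and rejects anything that is not a valid TCP port. */
    static JobArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("expected: <hostname> <port>");
        }
        int port;
        try {
            port = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port must be a number: " + args[1], e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range (1-65535): " + port);
        }
        return new JobArgs(args[0], port);
    }

    public static void main(String[] args) {
        // Same values as the full command above.
        JobArgs parsed = JobArgs.parse(new String[] {"127.0.0.1", "9008"});
        System.out.println(parsed.hostname + ":" + parsed.port); // prints 127.0.0.1:9008
    }
}
```

In the job, the hostname and port obtained this way would be handed to env.socketTextStream(hostname, port) exactly as before.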
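Note that the listener has to be running before the job is submitted, because the job connects to the port as a client: nc -l 9008 This terminal is also where the test data gets typed in.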
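If nc is not available, the listener side can be approximated in a few lines of Java. This is a rough sketch under my own assumptions, not a replacement for nc: `LineServer` and `serveOnce` are hypothetical names, it accepts exactly one client, and it sends newline-terminated UTF-8 text, which is what the socket source in the job reads by default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class LineServer {
    /** Waits for one client on the given server socket and sends it each line, newline-terminated. */
    static void serveOnce(ServerSocket server, Iterable<String> lines) throws IOException {
        try (Socket client = server.accept();
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.print(line);
                out.print('\n');
                out.flush(); // push each line out immediately, as an interactive nc session would
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the port used in this post; pass another port as the first argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9008;
        try (ServerSocket server = new ServerSocket(port);
             BufferedReader stdin = new BufferedReader(
                     new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            System.out.println("Listening on port " + port + "; submit the job, then type lines here.");
            // Forward whatever is typed on stdin, line by line, until end of input.
            Iterable<String> typed = () -> stdin.lines().iterator();
            serveOnce(server, typed);
        }
    }
}
```

Start it first, submit the job second, then type words into its terminal. Ending the input (Ctrl-D) closes the connection, just as quitting nc would.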
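The second screenshot shows the output: a running count for each word entered. For a job submitted to the cluster like this, the print() output is written to the TaskManager's .out file in Flink's log directory and can also be viewed in the web UI.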