https://blog.csdn.net/boling_cavalry/article/details/85059168
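In "Flink 1.7: From Installation to Hands-On" (《Flink1.7从安装到体验》) we installed Flink and tried it out. Today we will use Java to develop a simple Flink application.
Steps
This walkthrough goes through the following steps:
- Create the application;
- Write the code;
- Build;
- Submit the job to Flink and verify that it works.
Environment
- Flink: 1.7;
- Operating system of the machine running Flink: CentOS Linux release 7.5.1804;
- JDK on the development machine: 1.8.0_181;
- Maven on the development machine: 3.5.0.
Application overview
In "Flink 1.7: From Installation to Hands-On" we ran SocketWindowWordCount.jar on Flink. It reads strings from a socket and counts how many times each word occurs. Today we will write an application that implements the same functionality.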
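Before bringing Flink into the picture, it may help to see the core computation on its own. The following is a minimal plain-Java sketch, not part of the project: the class name `WordCountSketch` and the method `countWords` are made up for illustration, and it ignores sockets and time windows entirely. It only splits one line on whitespace and counts the tokens, using the same `split("\\s")` call that the job in this article uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch (no Flink involved): count the words in one line of text. */
class WordCountSketch {

    /** Splits the line on single whitespace characters and counts each token. */
    static Map<String, Long> countWords(String line) {
        // LinkedHashMap keeps the words in first-seen order
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String word : line.split("\\s")) {
            // merge() stores 1 for a new word, otherwise adds 1 to the existing count
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints:
        // to : 2
        // be : 2
        // or : 1
        // not : 1
        countWords("to be or not to be")
                .forEach((word, count) -> System.out.println(word + " : " + count));
    }
}
```

One detail worth knowing: because the pattern `\\s` matches a single whitespace character, two consecutive spaces yield an empty token, so `countWords("a  b")` also contains an entry for the empty string. Punctuation is not stripped either, so "streams." and "streams" count as different words.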
Creating the application
- The skeleton code is generated with the mvn command. Enter the following on the command line:
```shell
mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.7.0
```
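- Following the console prompts, enter groupId, artifactId, version, package and so on, pressing Enter to confirm each one. A folder with the same name as the artifactId you entered is then generated, and it contains a Maven project: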
```shell
Define value for property 'groupId': com.bolingcavalry
Define value for property 'artifactId': socketwordcountdemo
Define value for property 'version' 1.0-SNAPSHOT: :
Define value for property 'package' com.bolingcavalry: :
Confirm properties configuration:
groupId: com.bolingcavalry
artifactId: socketwordcountdemo
version: 1.0-SNAPSHOT
package: com.bolingcavalry
```
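- Import this Maven project into IDEA. As the figure below shows, it already contains two classes, BatchJob and StreamingJob. BatchJob is for batch processing and is not needed in this walkthrough, so it can be deleted, leaving only StreamingJob for stream processing:
The application has been created, so we can start coding.
Coding
You can also download the source code of this project directly from GitHub. The addresses are listed in the table below:
Name | Link | Notes |
---|---|---|
Project home page | https://github.com/zq2599/blog_demos | The project's home page on GitHub |
Git repository (https) | https://github.com/zq2599/blog_demos.git | Repository address of the source code, over HTTPS |
Git repository (ssh) | git@github.com:zq2599/blog_demos.git | Repository address of the source code, over SSH |
This Git project contains several folders. The source code for this article is in the socketwordcountdemo folder, marked with a red box in the figure below:
Now let's start coding:
- Add a static nested class WordWithCount to the StreamingJob class. It is a POJO that holds one word and how often it occurs: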
```java
    /**
     * POJO that records a word and how often it occurs
     */
    public static class WordWithCount {
        /**
         * The word itself
         */
        public String word;
        /**
         * Number of occurrences
         */
        public long count;

        public WordWithCount() {
            super();
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        /**
         * Renders the word and its count
         * @return the word and its count in the form "word : count"
         */
        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
```
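The empty constructor and the public fields in WordWithCount are not just decoration. Flink's documentation describes a POJO as a public class with a public no-argument constructor whose fields are either public or reachable through getters and setters, and referring to a field by name, as `keyBy("word")` does later in this article, relies on the type being recognized as such. The sketch below is a rough plain-Java approximation of those rules using reflection. It is not Flink's own type analysis: `PojoRuleSketch`, `looksLikePojo` and `NotAPojo` are hypothetical names, and the getter/setter case is deliberately left out.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Rough, simplified approximation of Flink's POJO rules (assumption: public fields only). */
class PojoRuleSketch {

    /** Same shape as the article's WordWithCount. */
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }

    /** Hypothetical counter-example: private field and no no-argument constructor. */
    public static class NotAPojo {
        private final String word;

        public NotAPojo(String word) {
            this.word = word;
        }
    }

    /** Checks: public class, public no-arg constructor, all instance fields public. */
    static boolean looksLikePojo(Class<?> type) {
        if (!Modifier.isPublic(type.getModifiers())) {
            return false;
        }
        try {
            // getConstructor() only finds public constructors and throws if none matches
            type.getConstructor();
        } catch (NoSuchMethodException e) {
            return false;
        }
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            // Static, transient and compiler-generated fields are not part of the record
            if (Modifier.isStatic(modifiers) || Modifier.isTransient(modifiers) || field.isSynthetic()) {
                continue;
            }
            if (!Modifier.isPublic(modifiers)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("WordWithCount: " + looksLikePojo(WordWithCount.class));
        System.out.println("NotAPojo: " + looksLikePojo(NotAPojo.class));
    }
}
```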
- Put all the business logic in the main method of the StreamingJob class, as shown below. The key places are commented:
```java
// Imports needed at the top of StreamingJob.java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

    public static void main(String[] args) throws Exception {
        // Execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The data source is port 9999 on the local machine, delimited by newlines;
        // you could also pass hostname and port in through the arguments of main
        DataStream<String> text = env.socketTextStream("localhost", 9999, "\n");
        // Transform text into a new DataStream:
        // split each string and create a WordWithCount object for every word obtained
        DataStream<WordWithCount> windowCounts = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            @Override
            public void flatMap(String s, Collector<WordWithCount> collector) throws Exception {
                for (String word : s.split("\\s")) {
                    collector.collect(new WordWithCount(word, 1L));
                }
            }
        })
        .keyBy("word") // key by the word field
        .timeWindow(Time.seconds(5)) // tumbling time window of five seconds
        .reduce(new ReduceFunction<WordWithCount>() { // reduce strategy
            @Override
            public WordWithCount reduce(WordWithCount a, WordWithCount b) throws Exception {
                return new WordWithCount(a.word, a.count + b.count);
            }
        });
        // Print the results with a single thread
        windowCounts.print().setParallelism(1);
        // Execute
        env.execute("Flink Streaming Java API Skeleton");
    }
```
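To get a feel for what the `keyBy`, `timeWindow` and `reduce` chain computes, here is a small plain-Java simulation that does not depend on Flink. It is only a sketch under stated assumptions: the class `TumblingWindowSketch`, the type `TimedWord` and the timestamps are invented for illustration, and windows are assumed to be aligned to multiples of five seconds with no offset. Each word is assigned to the window that contains its timestamp, and counts are summed per window and per word, which is the effect of keying by word and reducing with `a.count + b.count`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of "key by word, 5-second tumbling window, sum the counts". */
class TumblingWindowSketch {

    static final long WINDOW_MILLIS = 5_000L;

    /** One word together with the (hypothetical) time at which it arrived. */
    static final class TimedWord {
        final long timestampMillis;
        final String word;

        TimedWord(long timestampMillis, String word) {
            this.timestampMillis = timestampMillis;
            this.word = word;
        }
    }

    /** Start of the window containing the timestamp; assumes non-negative timestamps. */
    static long windowStart(long timestampMillis) {
        return timestampMillis - (timestampMillis % WINDOW_MILLIS);
    }

    /** Returns window start -> (word -> count). */
    static Map<Long, Map<String, Long>> countPerWindow(List<TimedWord> events) {
        Map<Long, Map<String, Long>> result = new TreeMap<>();
        for (TimedWord event : events) {
            result.computeIfAbsent(windowStart(event.timestampMillis), start -> new TreeMap<>())
                    // Same idea as the ReduceFunction: add the counts of equal words
                    .merge(event.word, 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<TimedWord> events = Arrays.asList(
                new TimedWord(1_000L, "flink"),
                new TimedWord(2_500L, "flink"),
                new TimedWord(4_999L, "java"),
                new TimedWord(5_000L, "flink"));
        // prints {0={flink=2, java=1}, 5000={flink=1}}
        System.out.println(countPerWindow(events));
    }
}
```

Note that the word arriving at 5000 ms lands in the second window: tumbling windows do not overlap, so every element belongs to exactly one of them, and each window reports its own counts rather than a running total.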
Building
- Run the following command in the directory that contains pom.xml:
```shell
mvn clean package -U
```
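- When the command finishes, the socketwordcountdemo-1.0-SNAPSHOT.jar file in the target directory is the built jar.
Verifying on Flink
- For installing and starting Flink, see "Flink 1.7: From Installation to Hands-On";
- Log in to the machine running Flink and run the following command: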
```shell
nc -l 9999
```
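The job connects to port 9999 as a client, so something has to be listening there first, which is what `nc -l 9999` does. If nc is not installed on your machine, a small Java program can play the same role. The following is a minimal sketch under that assumption, not something from the original project: the class name `LineServerSketch` and the method `serveOnce` are made up, it accepts a single connection only, and it forwards every line it reads to that client terminated by `\n`, the delimiter the job expects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Minimal stand-in for "nc -l <port>": forwards input lines to one connected client. */
class LineServerSketch {

    /** Waits for one client, then sends it every line read from input. */
    static void serveOnce(ServerSocket server, Reader input) throws IOException {
        try (Socket client = server.accept();
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8));
             BufferedReader in = new BufferedReader(input)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.print(line);
                // Always terminate with \n, regardless of the platform line separator
                out.print('\n');
                out.flush();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The port defaults to 9999, the value hard-coded in the job
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9999;
        try (ServerSocket server = new ServerSocket(port)) {
            System.out.println("Listening on port " + server.getLocalPort());
            serveOnce(server, new InputStreamReader(System.in, StandardCharsets.UTF_8));
        }
    }
}
```

Start it before submitting the job, then type sentences into its console just as you would with nc.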
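- On my setup the IP address of the machine running Flink is 192.168.1.103, so the Flink web UI is reached in a browser at http://192.168.1.103:8081;
- Select the jar file just built as a new job, as shown below:
- Click "upload" in the red box in the figure below to submit the file:
- So far the jar file has only been uploaded. Next, set the entry class manually and start the job, as shown below. Red box 2 is filled with the fully qualified name of the StreamingJob class written earlier (com.bolingcavalry.StreamingJob with the package chosen above):
- After submission the page looks like the figure below, showing that a job is now running:
- Go back to the console of the machine running Flink. In the window where nc -l 9999 was entered earlier, type some English sentences and press Enter, for example: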
```shell
[root@vostro flink-1.7.0]# ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host vostro.
Starting taskexecutor daemon on host vostro.
[root@vostro flink-1.7.0]# nc -l 9999
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
```
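- Now look at what our job produced. As shown below, click "Task Managers" on the left. The list on the right contains a single entry; click it:
- The page that opens has three tabs. Click the "Stdout" tab to see the job's counts for the words in the sentence entered earlier, as shown below:
With that, our first and simplest Flink application is complete.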