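I. Task Overview
This lab builds a basic Flink application. The program it runs is WordCount, the "Hello world" of distributed computing.
Concretely, netcat publishes text on a specified port, and the Flink application connects to that port, receives the incoming data, and counts word frequencies within each time window. The experiment code is available at [Link]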
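The core computation described above (split each incoming line into words, then count words within a time window) can be modeled without Flink. The plain-Java sketch below is a hedged illustration, not the author's job: it assumes each line carries a millisecond arrival timestamp and uses tumbling (fixed-size, non-overlapping) windows. The names `TumblingWindowWordCount`, `Line`, `tokenize`, and `count` are hypothetical. A typical Flink WordCount expresses the same steps as a flatMap over the socket lines, a keyBy on the word, a time window, and a sum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowWordCount {

    // One input record (hypothetical): a text line plus its arrival time in milliseconds.
    static final class Line {
        final long timestampMs;
        final String text;

        Line(long timestampMs, String text) {
            this.timestampMs = timestampMs;
            this.text = text;
        }
    }

    // Lowercases the line and splits it on runs of non-word characters
    // (anything other than ASCII letters, digits, and underscore), dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Assigns each line to the tumbling window [start, start + sizeMs) that contains
    // its timestamp, then counts words separately per window.
    // Outer keys are window start times, sorted ascending; inner maps are sorted by word.
    static Map<Long, Map<String, Integer>> count(List<Line> lines, long sizeMs) {
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        for (Line line : lines) {
            long windowStart = line.timestampMs - Math.floorMod(line.timestampMs, sizeMs);
            Map<String, Integer> counts = result.computeIfAbsent(windowStart, k -> new TreeMap<>());
            for (String w : tokenize(line.text)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Line> input = new ArrayList<>();
        input.add(new Line(1_000, "hello world"));
        input.add(new Line(3_000, "hello flink"));
        input.add(new Line(6_000, "flink flink"));
        // 5-second tumbling windows: [0, 5000) and [5000, 10000)
        System.out.println(count(input, 5_000));
        // prints {0={flink=1, hello=2, world=1}, 5000={flink=2}}
    }
}
```

Flooring each timestamp to a multiple of the window size keeps windows aligned and non-overlapping, so every line is counted in exactly one window.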
II. Experimental Environment
Host:
OS: Linux (Manjaro)
Java: OpenJDK 1.8
Scala: 2.11.11
Maven: 3.6.3 (version not critical)
IDEA: 2020.3 (version not critical)
———————————————————
Flink cluster:
Scala: 2.11.11
Flink: 1.11.2
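The Scala version matters even for a Java job: in Flink 1.11, modules that depend on Scala, such as `flink-streaming-java`, are published with a Scala binary-version suffix (`_2.11` or `_2.12`), which the pom fills in from `${scala.binary.version}`. The small sketch below, with hypothetical helper names, shows how the 2.11.11 version listed above maps to that suffix. It assumes a Scala 2.x version string of the form major.minor.patch.

```java
public class ScalaSuffixSketch {

    // Scala 2.x binary versions are "major.minor", e.g. "2.11.11" -> "2.11".
    static String scalaBinaryVersion(String fullVersion) {
        String[] parts = fullVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected major.minor[.patch]: " + fullVersion);
        }
        return parts[0] + "." + parts[1];
    }

    // Appends the Scala suffix to a Flink module name (hypothetical helper).
    static String flinkArtifact(String baseName, String scalaVersion) {
        return baseName + "_" + scalaBinaryVersion(scalaVersion);
    }

    public static void main(String[] args) {
        System.out.println(flinkArtifact("flink-streaming-java", "2.11.11"));
        // prints flink-streaming-java_2.11
    }
}
```

Depending on `_2.12` artifacts while the cluster runs Scala 2.11 can lead to binary-incompatibility errors at runtime, so the suffix should match the cluster's Scala version.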
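III. Procedure and Troubleshooting
The experiment has two parts: local testing, and packaging the job and submitting it to the cluster. Each part is covered below.
1. Local testing
Before submitting the job to the cluster, test the code locally. The IDE used here is IntelliJ IDEA, and the project is built with Maven. The project skeleton can be generated with the following command: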
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.6.1 \
-DgroupId=my-flink-project \
-DartifactId=my-flink-project \
-Dversion=0.1 \
-Dpackage=myflink \
-DinteractiveMode=false
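Alternatively, create the project directly in IDEA:
1. Create a new Maven project.
2. Fill in the project name and other details.
After these steps, the basic project skeleton is in place.
3. Under the java source directory, create a package, and create the Java program inside it (using the experiment code linked above).
4. Configure the pom file.
IDEA's Maven environment should already be configured before this step; that setup is not covered here.
Modify the pom file as follows: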
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>my-flink-project</groupId>
<artifactId>my-flink-project</artifactId>
<version>0.1</version>
<packaging>jar</packaging>
<name>Flink Quickstart Job</name>
<url>http://www.myorganization.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.11.2</flink.version>
<java.version>1.8</java.version>
<scala.binary.version>2.11</scala.binary.version>
<maven.compiler.source>${java.version}</maven.compiler.source>
<maven.compiler.target>${java.version}</maven.compiler.target>
</properties>
<repositories>
<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
<dependencies>
<!-- Apache Flink dependencies -->
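<!-- These dependencies would normally have "provided" scope so that they are not packaged into the JAR file. The scope is commented out below so that they are on the classpath when the job runs inside the IDE; consider restoring it before packaging for the cluster. -->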
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<!-- <scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-core</artifactId>
<version>${flink.version}</version>
<!-- <scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<!-- <scope>provided</scope>-->
</dependency>
<!-- Add connector dependencies here. They must be in the default scope (compile). -->
<!-- Example:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
-->
<!-- Add logging framework, to produce console output when running in the IDE. -->
<!-- These dependencies are excluded from the application JAR by default. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId<

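This post builds a Flink application that runs WordCount: netcat publishes text on a port, and Flink reads from that port to count word frequencies. It describes the experimental setup on Linux, including the steps for local testing and cluster submission and how the problems encountered were solved. It also walks through code that uses product-sales data as a source to find the most popular product categories.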