- Set up a Maven project: Flink
- pom file
- Add the Scala framework and a scala source folder
- Batch processing: wordcount
- Stream processing: StreamWordCount
pom file
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.lxh</groupId>
    <artifactId>flink</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>apache.snapshots</id>
            <name>Apache Development Snapshot Repository</name>
            <url>https://repository.apache.org/content/repositories/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.10.0</flink.version>
        <scala.binary.version>2.12</scala.binary.version>
        <scala.version>2.12.12</scala.version>
    </properties>

    <dependencies>
        <!-- Apache Flink dependencies -->
        <!-- These dependencies would normally be provided, because they should not be packaged
             into the JAR file; provided is commented out here so the jobs can run in the IDE. -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <!-- <scope>provided</scope> -->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <!-- <scope>provided</scope> -->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.12</artifactId>
            <version>1.10.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.12</artifactId>
            <version>1.10.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>1.10.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
            <version>1.10.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.bahir</groupId>
            <artifactId>flink-connector-redis_2.11</artifactId>
            <version>1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-elasticsearch7_2.12</artifactId>
            <version>1.10.0</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.44</version>
        </dependency>
        <!-- Scala Library, provided by Flink as well. -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
            <!-- <scope>provided</scope> -->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-statebackend-rocksdb_2.12</artifactId>
            <version>1.10.0</version>
        </dependency>
        <!-- Add connector dependencies here. They must be in the default scope (compile). -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <!-- Add logging framework, to produce console output when running in the IDE. -->
        <!-- These dependencies are excluded from the application JAR by default. -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.7</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
            <scope>runtime</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- This plugin compiles the Scala code into class files -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <!-- Bind to Maven's compile phases; without the compile goal,
                             the main Scala sources are never compiled by Maven -->
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```
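With the pom in place, the job jar can be built with a standard Maven invocation (since the Flink and Scala dependencies are left in the default compile scope, with provided commented out, the programs can also be run directly inside the IDE):

```
mvn clean package
```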
Add the Scala framework and a scala source folder

In IDEA: right-click the project → Add Framework Support → Scala, then create a src/main/scala directory and mark it as a Sources Root.
Batch processing: wordcount
```scala
import org.apache.flink.api.scala._

object wordCount {
  def main(args: Array[String]): Unit = {
    // Create the batch execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Read data from a file
    val inputPath = "D:\\flink\\src\\main\\resources\\hello.txt"
    val inputDataSet = env.readTextFile(inputPath)

    // Split into words, group by the word, then aggregate with sum
    val wordCountDataSet = inputDataSet.flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(0)
      .sum(1)

    // Print the result
    wordCountDataSet.print()
  }
}
```
My hello.txt file sits at the path used above. Running wordCount prints the word counts to the console.
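For illustration, suppose hello.txt contains the two lines below (the file contents here are an assumption, not the original file):

```
hello world
hello flink
```

With that input, the batch job would print tuples such as (hello,2), (world,1) and (flink,1); the order may vary between runs.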
Note: Flink programs can be written in either Java or Scala; this post uses Scala.
Stream processing: StreamWordCount
```scala
import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala._

object StreamWordCount {
  def main(args: Array[String]): Unit = {
    // Read hostname and port from the program arguments
    val params = ParameterTool.fromArgs(args)
    val hostname: String = params.get("hostname")
    val port: Int = params.getInt("port")

    // Create a stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Receive a socket text stream
    val textDataStream = env.socketTextStream(hostname, port)

    // Split each incoming line on whitespace, then do a word count
    val wordCountDataStream = textDataStream.flatMap(_.split("\\s"))
      .filter(_.nonEmpty)
      .map((_, 1))
      .keyBy(0)
      .sum(1)

    // Print the result (parallelism 1 so the output is not interleaved)
    wordCountDataStream.print().setParallelism(1)
    /* wordCountDataStream.writeAsCsv(...) */

    // Execute the job
    env.execute("Stream word count job")
  }
}
```
Set the launch parameters
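In IDEA this is done under Run → Edit Configurations → Program arguments. The values below are an example that assumes netcat will listen on port 7777 of the local machine; replace localhost with your Linux host's address if it runs elsewhere:

```
--hostname localhost --port 7777
```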
Test: on a Linux machine, use the netcat command to send data.
nc -lk 7777
Run the StreamWordCount project, then type some lines into the netcat session on the Linux machine. The console prints the running word counts.
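As an illustration (assuming you type `hello flink` and then `hello world` into the netcat session), the keyed sum emits an updated count for every incoming word, so the console output would look something like:

```
(hello,1)
(flink,1)
(hello,2)
(world,1)
```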
I'm writing this blog to record my learning journey; pointers and corrections are welcome.