Flink 1.12: A Unified Stream and Batch Hello World


保护我方胖虎



Article link: https://blog.csdn.net/leilei1366615/article/details/115363500


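This article introduces unified stream and batch processing in Flink 1.12 and uses a simple WordCount example to show how the same program runs in streaming mode and in batch mode. In streaming mode, records are processed one at a time and intermediate results are printed; in batch mode, the data is transformed as a batch and only the final results are printed. This lets developers use a single codebase for both stream and batch scenarios, which improves development efficiency.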


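Environment:

Java: 1.8

Flink: 1.12.2

IDE: IntelliJ IDEA, Maven project

To develop a Flink program, we first need to add the dependencies. The required pom.xml is shown below.

(1) Core dependencies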

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>flink-learn-1-demo</artifactId>
    <version>1.0</version>
    <!-- Repository locations, in order: aliyun, apache, and cloudera -->
    <repositories>
        <repository>
            <id>aliyun</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        </repository>
        <repository>
            <id>apache</id>
            <url>https://repository.apache.org/content/repositories/snapshots/</url>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <properties>
        <encoding>UTF-8</encoding>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <java.version>1.8</java.version>
        <scala.version>2.12</scala.version>
        <flink.version>1.12.2</flink.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <plugins>
            <!-- Compiler plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <!--<encoding>${project.build.sourceEncoding}</encoding>-->
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.18.1</version>
                <configuration>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>
            <!-- Packaging plugin (bundles all dependencies into the jar) -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <!-- Entry-point class of the jar (optional) -->
                                    <mainClass></mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

(2) Testing unified stream and batch execution

package com.leilei;

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * @author lei
 * @version 1.0
 * @date 2021/3/7 15:51
 * @desc Word count with the DataStream API, using anonymous inner classes
 */
public class WordCountDataStream1 {
    public static void main(String[] args) throws Exception {
        // 1. Set up the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Set the runtime mode to streaming
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING);
        // 2. Prepare the data source
        DataStreamSource<String> elementsSource = env.fromElements("java,scala,php,c++",
                "java,scala,php", "java,scala", "java");
        // 3. Transform the data: split into words, map to (word, 1), key by word
        KeyedStream<Tuple2<String, Integer>, String> streamResult = elementsSource.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String element, Collector<String> out) throws Exception {
                String[] wordArr = element.split(",");
                for (String word : wordArr) {
                    out.collect(word);
                }
            }
        }).map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String word) throws Exception {
                return Tuple2.of(word, 1);
            }
        }).keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> value) throws Exception {
                return value.f0;
            }
        });
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = streamResult.sum(1);
        // 4. Print the result
        sum.print();
        // 5. Execute the job
        env.execute("flink-hello-world");

    }
}

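In the code above, the Flink program loads its source data from a fixed set of elements, and the runtime mode is set to STREAMING. (Of course, this data source is really bounded, since we can see and count every element.)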

env.setRuntimeMode(RuntimeExecutionMode.STREAMING);

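Conclusion: records are processed one at a time as they arrive, and the output contains every intermediate step of the running count. This is unmistakably stream processing.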
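The running-count behaviour can be imitated in plain Java, without Flink, to make the shape of the streaming output concrete. This is only a sketch of the semantics, not of how Flink implements keyed state; the class `StreamingCountSketch` and the method `runningCounts` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java imitation of keyed running counts (hypothetical names, no Flink involved). */
public class StreamingCountSketch {

    /** Emits one "(word,count)" entry per incoming word, i.e. one update per record. */
    public static List<String> runningCounts(String... lines) {
        Map<String, Integer> state = new HashMap<>(); // stands in for per-key state
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(",")) {
                // Increment the count for this key and emit the updated value immediately.
                int count = state.merge(word, 1, Integer::sum);
                emitted.add("(" + word + "," + count + ")");
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        runningCounts("java,scala,php,c++", "java,scala,php", "java,scala", "java")
                .forEach(System.out::println);
    }
}
```

For the four input lines used in this article, the sketch emits ten updates, with `java` climbing from 1 to 4. In a real Flink job the lines are also prefixed with a subtask index, and the order across parallel subtasks may differ.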

Set the runtime mode to batch:

env.setRuntimeMode(RuntimeExecutionMode.BATCH);

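Conclusion: the data is processed as a batch. After the records for each word are aggregated, only the final result is printed, with no intermediate output.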
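For comparison, the batch result can also be imitated in plain Java. This is a sketch assuming only the JDK stream API, not a description of Flink's batch runtime; `BatchCountSketch` and `finalCounts` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Plain-Java imitation of the batch result: one final count per word (hypothetical names). */
public class BatchCountSketch {

    /** Splits every line on commas, groups by word, and returns only the final totals. */
    public static Map<String, Integer> finalCounts(String... lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(line.split(",")))
                .collect(Collectors.groupingBy(
                        Function.identity(),                 // key: the word itself
                        TreeMap::new,                        // sorted map, for stable printing
                        Collectors.summingInt(word -> 1)));  // value: number of occurrences
    }

    public static void main(String[] args) {
        finalCounts("java,scala,php,c++", "java,scala,php", "java,scala", "java")
                .forEach((word, count) -> System.out.println("(" + word + "," + count + ")"));
    }
}
```

Each word appears exactly once in the result, which matches the observation that batch mode prints no intermediate counts.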

Set the runtime mode to automatic detection:

env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

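Conclusion: the data is processed as a batch. After the records for each word are aggregated, only the final result is printed, with no intermediate output. Because `fromElements` produces a bounded source, AUTOMATIC selects batch execution here.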
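The rule behind AUTOMATIC is that batch execution is chosen only when every source is bounded; otherwise the job runs in streaming mode. The decision can be sketched in plain Java as follows. The enums and the `resolve` method are hypothetical stand-ins written for this illustration, not Flink classes, and the sketch does not model any validation that Flink performs on explicitly configured modes.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of how AUTOMATIC could be resolved from source boundedness (hypothetical names). */
public class RuntimeModeSketch {

    /** Local stand-in for the boundedness of a source. */
    public enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    /** Local stand-in for the three runtime modes discussed in this article. */
    public enum Mode { STREAMING, BATCH, AUTOMATIC }

    /** Returns the mode a job would effectively run in under this simplified rule. */
    public static Mode resolve(Mode configured, List<Boundedness> sources) {
        if (configured != Mode.AUTOMATIC) {
            return configured; // explicit modes are returned unchanged in this sketch
        }
        boolean allBounded = sources.stream().allMatch(b -> b == Boundedness.BOUNDED);
        return allBounded ? Mode.BATCH : Mode.STREAMING;
    }

    public static void main(String[] args) {
        // A fixed set of elements is bounded, so AUTOMATIC resolves to BATCH.
        System.out.println(resolve(Mode.AUTOMATIC, Arrays.asList(Boundedness.BOUNDED)));
        // A single unbounded source is enough to fall back to STREAMING.
        System.out.println(resolve(Mode.AUTOMATIC,
                Arrays.asList(Boundedness.BOUNDED, Boundedness.CONTINUOUS_UNBOUNDED)));
    }
}
```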

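In summary, Flink 1.12 can run the same DataStream program in either streaming or batch mode and can choose between them automatically. When the same logic has to serve both stream and batch scenarios, Flink saves development time: one codebase and one compute framework are enough.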
