Flink: Packaging a Program and Submitting a Job

A quick note: a colleague kept hitting problems setting this up in their own environment and could not even get a simple WordCount streaming job packaged and deployed. Online material is scattered across Flink versions, so I'm writing down my own setup for 1.11.x to avoid wasting more time on this.

Where relevant I've linked the corresponding page of the official documentation; the official docs are the best docs.

Environment

Maven 3.6.x

Flink 1.11.x

JDK 1.8

Scala 2.11.x or 2.12.x (either works)

IntelliJ IDEA 2021

Project

Job code

A few SQL statements: the job reads from Kafka, runs some computations, and prints the results to the console.

package com.mym.api.tableapi;

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;


public class TestSQL_savepoint_restore {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        env.setStateBackend(new FsStateBackend("file:///opt/module/checkpoint/"));
        CheckpointConfig config = env.getCheckpointConfig();
        config.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        config.setCheckpointInterval(1000);
        // checkpointing mode: exactly-once
        config.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        // ensure at least 500 ms between checkpoints (minimum pause between checkpoints)
        config.setMinPauseBetweenCheckpoints(500);
        // a checkpoint must complete within one minute or it is discarded (checkpoint timeout)
        config.setCheckpointTimeout(60000);
        // allow only one checkpoint in flight at a time
        config.setMaxConcurrentCheckpoints(1);

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        tableEnv.sqlUpdate("CREATE TABLE t1 ( " +
                "    tid STRING, " +
                "    tname STRING, " +
                "    age INT, " +
                "    height BIGINT " +
                " ) WITH ( " +
                "    'connector' = 'kafka-0.11', " +
                "    'topic' = 'testsavepoint', " +
                "    'scan.startup.mode' = 'latest-offset', " + //optional: valid modes are "earliest-offset","latest-offset", "group-offsets",or "specific-offsets"
                "    'properties.group.id' = 'testGroup', " +
                "    'properties.zookeeper.connect' = '192.168.31.1:2181', " +
                "    'properties.bootstrap.servers' = '192.168.31.1:9092', " +
                "    'format' = 'json' " +
                ")"
        );

        String sql1 = "select * from t1 where age > 1 and tid = '1'";
        Table sql1Table = tableEnv.sqlQuery(sql1);
        tableEnv.toRetractStream(sql1Table, Row.class).print("sql1Table");

        tableEnv.createTemporaryView("t3", sql1Table);
        String sql3 = "select tid, age  from t3 where age > 1 and tid = '1' ";
        Table sql3Table = tableEnv.sqlQuery(sql3);
        tableEnv.toRetractStream(sql3Table, Row.class).print("sql3Table");

        tableEnv.createTemporaryView("t2", sql3Table);
        String sql2 = "select tid, count(*) as cnt from t2 where age > 10 group by tid ";
        Table sql2Table = tableEnv.sqlQuery(sql2);
        tableEnv.toRetractStream(sql2Table, Row.class).print("sql2Table");


        env.execute("test");
    }
}

pom file

The important parts of the pom: properties defines version numbers, dependencies declares the dependencies, and the build plugins control how the project is compiled and packaged.

Dependencies

The key question to be clear about: which dependencies should have provided scope and which should not?

If the packaged job fails with the error below, you have a dependency conflict: classes that the Flink runtime already ships were packaged into your jar as well.

Caused by: java.lang.ClassCastException: org.codehaus.janino.CompilerFactory cannot be cast to org.codehaus.commons.compiler.ICompilerFactory

So how should the scopes be set? Dependencies that Flink does not provide, such as the MySQL driver, must be packaged into the jar, and so must connectors and other plugin-style extensions of Flink. See the official documentation for your Flink version for the details.

See here: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/project-configuration.html

Below are the dependencies I tested myself; packaged this way, the job runs fine.

<properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.11.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.11.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
            <version>1.11.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.bahir</groupId>
            <artifactId>flink-connector-redis_2.11</artifactId>
            <version>1.0</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.44</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-elasticsearch6_2.12</artifactId>
            <version>1.11.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-json</artifactId>
            <version>1.11.1</version>
<!--            <scope>test</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-statebackend-rocksdb_2.12</artifactId>
            <version>1.11.1</version>
<!--            <scope>test</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.12</artifactId>
            <version>1.11.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.12</artifactId>
            <version>1.11.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>1.11.1</version>
<!--            <scope>test</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>1.11.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-cep_2.11</artifactId>
            <version>1.11.1</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

Packaging plugins

There are really only two plugins involved: the compiler plugin and a fat-jar packaging plugin.

  • Compiler plugin: compiles the sources.

  • Packaging plugin: bundles the compiled classes and the dependencies into a single jar.

    There are actually several plugins that can build the fat jar, such as shade and assembly; any of them works, each with its own configuration style. Here I just use whatever packages correctly, as simply as possible. The assembly plugin can also take an assembly.xml descriptor for fine-grained control over which files get packaged, which main class to set, and so on. Flink does not require a main class in the manifest (you pass it at submission time), and I have no fine-grained packaging requirements, so packaging by dependency scope is enough: all non-provided dependencies go into the jar, which can then be submitted to Flink.

  • For fine-grained control over packaging, look up the detailed configuration docs for the assembly or shade plugin.

<build>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <!-- fat-jar packaging plugin -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>
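
Since the assembly plugin above is configured only through <configuration>, with no <executions> binding it to a lifecycle phase, the assembly goal has to be invoked explicitly. A minimal sketch of how the jar can be built and sanity-checked; the artifact name follows the default jar-with-dependencies naming and depends on your own artifactId and version:

# build the fat jar; assembly:single is invoked explicitly because no <execution> binding is configured
mvn clean package assembly:single

# sanity check: provided dependencies (e.g. the table planner, which bundles janino) should NOT end up in the jar;
# if this prints anything, you will likely hit the ClassCastException shown earlier
jar tf target/Flink_second-1.0-SNAPSHOT-jar-with-dependencies.jar | grep -i janino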

Submitting the job to Flink

Two ways to submit are covered here.

Official documentation first: https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/yarn_setup.html#submit-job-to-flink

Submitting via the web UI

Local cluster mode; this is how I test on a single node.

Once the Flink cluster services are up, open the web UI; for a single-node setup it listens on port 8081.

On the page you will see Submit New Job. Upload the jar, fill in the entry class, set the other options, and start the job. No screenshots here.
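
If no cluster is running yet, a minimal sketch for a single-node standalone setup, assuming a standard Flink 1.11 distribution unpacked under flink/:

# start a local standalone cluster (JobManager + TaskManager on this machine)
flink/bin/start-cluster.sh

# the web UI is then available at http://localhost:8081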

Submitting via the command line

Here are just a few commonly used commands; check the official docs for more.

Submit a job

flink/bin/flink run -m localhost:8081 -c com.mym.api.tableapi.TestSQL_savepoint_restore -p 1 /opt/module/Flink_second-1.0-SNAPSHOT-jar-with-dependencies.jar

  • -m specifies the master (JobManager) address
  • -c specifies the fully qualified entry class
  • -p sets the parallelism

Trigger a savepoint

Command format: ./bin/flink savepoint <jobId> [savepointDirectory]

Example: flink/bin/flink savepoint 5ab6b4bca845b83ca81d531f16da8ba9 /opt/module/savepoint/

Start from a savepoint

flink/bin/flink run -s /opt/module/savepoint/savepoint-5ab6b4-5231ed7090f0 -m localhost:8081 -c com.mym.api.tableapi.TestSQL_savepoint_restore -p 1 /opt/module/xxx-jar-with-dependencies.jar

-s means start from a savepoint; it is followed by the savepoint path.

There is an optional flag -n (long form --allowNonRestoredState). It allows the restore to skip state belonging to operators that have been removed from the program; the flag can be placed after the savepoint path.
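
A sketch of that, reusing the command from above with -n added right after the savepoint path (paths and class names are the examples used earlier):

# restore from a savepoint, skipping state for operators that no longer exist in the job
flink/bin/flink run -s /opt/module/savepoint/savepoint-5ab6b4-5231ed7090f0 -n -m localhost:8081 -c com.mym.api.tableapi.TestSQL_savepoint_restore -p 1 /opt/module/xxx-jar-with-dependencies.jar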
