A quick note for the record: a colleague kept hitting problems in his own environment and couldn't even get a WordCount streaming job packaged and deployed. Online material covers a patchwork of Flink versions, so here is my own setup for 1.11.x, to save anyone else from wasting time on this.
Each section links to the corresponding page of the official documentation; the best documentation is always the official documentation.
Environment
Maven 3.6.x
Flink 1.11.x
JDK 1.8
Scala 2.11.x or 2.12.x (either works)
IntelliJ IDEA 2021.x
Project
Job code
A few SQL statements: consume from Kafka, run the computation, and print the results to the console.
package com.mym.api.tableapi;

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TestSQL_savepoint_restore {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.setStateBackend(new FsStateBackend("file:///opt/module/checkpoint/"));

        CheckpointConfig config = env.getCheckpointConfig();
        // keep checkpoints when the job is cancelled, so they remain usable for recovery
        config.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        config.setCheckpointInterval(1000);
        // exactly-once checkpointing mode
        config.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        // at least 500 ms between checkpoints (minimum pause between checkpoints)
        config.setMinPauseBetweenCheckpoints(500);
        // a checkpoint must finish within one minute or it is discarded (checkpoint timeout)
        config.setCheckpointTimeout(60000);
        // only one checkpoint may be in flight at a time
        config.setMaxConcurrentCheckpoints(1);

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        // note: sqlUpdate() is deprecated in 1.11; executeSql() is the non-deprecated equivalent for DDL
        tableEnv.sqlUpdate("CREATE TABLE t1 ( " +
                " tid STRING, " +
                " tname STRING, " +
                " age INT, " +
                " height BIGINT " +
                " ) WITH ( " +
                " 'connector' = 'kafka-0.11', " +
                " 'topic' = 'testsavepoint', " +
                // optional: valid modes are "earliest-offset", "latest-offset", "group-offsets" or "specific-offsets"
                " 'scan.startup.mode' = 'latest-offset', " +
                " 'properties.group.id' = 'testGroup', " +
                " 'properties.zookeeper.connect' = '192.168.31.1:2181', " +
                " 'properties.bootstrap.servers' = '192.168.31.1:9092', " +
                " 'format' = 'json' " +
                ")"
        );

        String sql1 = "select * from t1 where age > 1 and tid = '1'";
        Table sql1Table = tableEnv.sqlQuery(sql1);
        tableEnv.toRetractStream(sql1Table, Row.class).print("sql1Table");
        tableEnv.createTemporaryView("t3", sql1Table);

        String sql3 = "select tid, age from t3 where age > 1 and tid = '1' ";
        Table sql3Table = tableEnv.sqlQuery(sql3);
        tableEnv.toRetractStream(sql3Table, Row.class).print("sql3Table");
        tableEnv.createTemporaryView("t2", sql3Table);

        String sql2 = "select tid, count(*) as cnt from t2 where age > 10 group by tid ";
        Table sql2Table = tableEnv.sqlQuery(sql2);
        tableEnv.toRetractStream(sql2Table, Row.class).print("sql2Table");

        env.execute("test");
    }
}
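To smoke-test the job, push a JSON record matching the t1 schema into the topic. A minimal sketch, assuming a standard Kafka installation and the broker address from the DDL above (the record values are made up; tid = '1' and age > 1 so the row flows through all three queries):

bin/kafka-console-producer.sh --broker-list 192.168.31.1:9092 --topic testsavepoint
# then type a record, e.g.:
{"tid":"1","tname":"tom","age":20,"height":180}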
The pom file
Three parts of the pom matter here: properties defines a few version settings; dependencies declares the dependencies; and the build plugins handle packaging the project.
Dependencies
The key question for each dependency: should it be provided or not?
If the packaged job fails at runtime with the error below, you have a dependency conflict: you bundled something the Flink runtime already ships.
Caused by: java.lang.ClassCastException: org.codehaus.janino.CompilerFactory cannot be cast to org.codehaus.commons.compiler.ICompilerFactory
So how should the scopes be set? Anything unrelated to Flink itself must be bundled, e.g. the MySQL driver, and so must plugin-style extensions such as connectors and formats. Also keep the Scala suffix (_2.11 / _2.12) consistent across artifacts and matching your cluster. For details, see the official docs for your Flink version:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/project-configuration.html
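A quick way to check whether planner classes leaked into your fat jar (jar name assumed from the run command later in this post):

jar tf target/Flink_second-1.0-SNAPSHOT-jar-with-dependencies.jar | grep -i janino
# any hits mean the table planner got bundled and should be switched to provided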
Below are the dependencies I tested with, for reference; the packaged jar runs fine.
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.11.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.12</artifactId>
        <version>1.11.1</version>
        <scope>provided</scope>
    </dependency>
    <!-- connectors and formats are not part of the runtime image, so they get bundled -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
        <version>1.11.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.bahir</groupId>
        <artifactId>flink-connector-redis_2.11</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.44</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-elasticsearch6_2.12</artifactId>
        <version>1.11.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-json</artifactId>
        <version>1.11.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-statebackend-rocksdb_2.12</artifactId>
        <version>1.11.1</version>
    </dependency>
    <!-- the planners ship with the Flink distribution, so keep them provided
         (bundling them causes the janino ClassCastException above) -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner_2.12</artifactId>
        <version>1.11.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner-blink_2.12</artifactId>
        <version>1.11.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-csv</artifactId>
        <version>1.11.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.12</artifactId>
        <version>1.11.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <!-- note: _2.11 here while the artifacts above use _2.12; keep Scala suffixes consistent with your cluster -->
        <artifactId>flink-cep_2.11</artifactId>
        <version>1.11.1</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
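To audit what will end up bundled versus provided, Maven can print every dependency together with its scope:

mvn dependency:tree
# artifacts marked :provided stay out of the fat jar; everything else gets bundled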
Build plugins
Only two plugins are needed: a compiler plugin and an assembly plugin.
- The compiler plugin handles compilation.
- The assembly plugin bundles the compiled classes together with the dependencies into a single jar.
There are actually several packaging plugins that would work here, e.g. shade or assembly, each with its own configuration style. I just use whatever produces a working jar with the least fuss. The assembly plugin can also take an assembly.xml descriptor for fine-grained control over what gets packaged, which class is the main class, and so on. For Flink there is no need to set a main class (the entry class is given at submission time), and I have no fine-grained packaging needs, so packaging by dependency scope is enough: everything scoped for bundling goes into the jar, which can then be submitted to Flink.
- For fine-grained packaging control, look up the detailed assembly or shade configuration docs.
<build>
    <plugins>
        <!-- compiler plugin -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <!-- packaging (assembly) plugin -->
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
    </plugins>
</build>
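With this setup the fat jar is built with the standard Maven goals. Note that since no <executions> binding is configured above, the assembly goal has to be invoked explicitly (alternatively, bind the single goal to the package phase):

mvn clean package assembly:single
# output: target/<artifactId>-<version>-jar-with-dependencies.jar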
Submitting to Flink
Two ways to submit.
Official docs first: https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/yarn_setup.html#submit-job-to-flink
Submitting via the web UI
Local cluster mode, i.e. single-node testing.
Once the Flink cluster is up, open the web UI; for a single node the port is 8081.
On the page you will see Submit New Job: upload the jar, fill in the entry class and the other fields, and start the job. No screenshots here.
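For reference, a standalone local cluster is started like this (assuming the same flink/ directory layout as in the run commands below):

flink/bin/start-cluster.sh
# the web UI is then reachable at http://localhost:8081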
Submitting via the command line
Here are the commands I use most often; for anything else, check the official docs.
Submit a job
flink/bin/flink run -m localhost:8081 -c com.mym.api.tableapi.TestSQL_savepoint_restore -p 1 /opt/module/Flink_second-1.0-SNAPSHOT-jar-with-dependencies.jar
-m: address of the master (JobManager) to submit to
-c: fully qualified name of the entry class
-p: the parallelism
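The savepoint commands below need the job ID, which you can look up with flink list (same -m as above):

flink/bin/flink list -m localhost:8081
# prints the running jobs together with their job IDs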
Trigger a savepoint
Command format: ./bin/flink savepoint <jobId> [savepointDirectory]
Example: flink/bin/flink savepoint 5ab6b4bca845b83ca81d531f16da8ba9 /opt/module/savepoint/
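To take a savepoint and stop the job in one step, the CLI also offers flink stop (a sketch; confirm the flags with flink stop --help on your version):

flink/bin/flink stop -p /opt/module/savepoint/ 5ab6b4bca845b83ca81d531f16da8ba9
# -p / --savepointPath: target directory for the savepoint taken on stop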
Restore from a savepoint
flink/bin/flink run -s /opt/module/savepoint/savepoint-5ab6b4-5231ed7090f0 -m localhost:8081 -c com.mym.api.tableapi.TestSQL_savepoint_restore -p 1 /opt/module/xxx-jar-with-dependencies.jar
-s: start from a savepoint, followed by the savepoint path
-n (long form --allowNonRestoredState): optional; allows the restore to skip state belonging to operators that have been removed from the job. This flag can follow the savepoint path.
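For example, restoring the same job while tolerating removed operators, with -n placed right after the savepoint path as noted above:

flink/bin/flink run -s /opt/module/savepoint/savepoint-5ab6b4-5231ed7090f0 -n -m localhost:8081 -c com.mym.api.tableapi.TestSQL_savepoint_restore -p 1 /opt/module/xxx-jar-with-dependencies.jar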