Background
Our team has recently been working on real-time computing, using Flink as the engine and running jobs in Flink on YARN mode. We hit a number of pitfalls along the way that I never wrote up in full, so here I will record the problems that left the deepest impression, in the hope that they help others too:
1. Manage jars in the cluster's lib directory, or build a fat jar with Maven?
2. The job fails immediately with: Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSourceFactory' in the classpath.
3. The JobManager reports: Could not resolve ResourceManager address akka.tcp://flink@xxxxx, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@xxxxx
Let's go through them one by one.
I. Manage jars in the cluster's lib directory, or build a fat jar with Maven?
Since this was our first Flink real-time computing project, we were not entirely clear on some of the best practices. Our initial approach was to drop every jar into Flink's lib directory, but after reflecting on our experience and consulting some seasoned Flink users, we settled on building a fat jar with Maven as the best practice. Here is the official position:
The official documentation discusses exactly this question: see the Flink documentation's section on project dependencies.
As you can see, the docs weigh both approaches and recommend the fat-jar one. A moment's thought shows why: if every dependency goes into lib, a single cluster can end up with several copies of the same component at different versions (our company, for example, runs both HBase 1.x and 2.x), which inevitably leads to dependency conflicts. If lib holds only Flink's common components and each application ships its own specialized dependencies, you get effective dependency isolation instead.
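In pom terms, dependency isolation means marking everything that the cluster's lib directory already provides with `provided` scope, while connectors specific to the application keep the default compile scope and get bundled into the fat jar. A minimal sketch, using the same Flink 1.11 artifact names as the full pom at the end of this post:

```xml
<!-- Already in the Flink distribution's lib/: keep it out of the fat jar -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.12</artifactId>
  <version>1.11.2</version>
  <scope>provided</scope>
</dependency>
<!-- Application-specific connector: default (compile) scope, so it is bundled -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-sql-connector-kafka_2.12</artifactId>
  <version>1.11.2</version>
</dependency>
```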
II. The job fails immediately with: Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSourceFactory' in the classpath.
These links cover the same problem:
1. https://blog.csdn.net/zhanghuolei/article/details/105767190
2. https://stackoverflow.com/questions/52500048/flink-could-not-find-a-suitable-table-factory-for-org-apache-flink-table-facto
The fix is to add the following line to the shade plugin configuration in your pom:
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
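For context: Flink discovers table sources, sinks, and formats through Java's ServiceLoader, which reads the META-INF/services files shipped in each connector jar. When the shade plugin merges several jars without this transformer, only one jar's copy of a given service file survives, and the other jars' factory entries are silently dropped, which produces exactly the exception above. The transformer goes inside the shade plugin's <transformers> block, for example:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Concatenate META-INF/services files from all jars,
               so every connector's factories stay discoverable -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```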
P.S. The odd thing is that the official Flink project scaffolding does not generate this line for you. If anyone knows why, I'd love to hear it.
III. The JobManager reports: Could not resolve ResourceManager address akka.tcp://flink@xxxxx, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@xxxxx
Concretely, in Flink on YARN per-job mode the job submits successfully but stays in the ACCEPTED state forever. The JobManager logs a few errors, but nothing conclusive, and Flink's internal ResourceManager never comes up.
The specific JobManager error is shown in the figure below.
At first I assumed this was a Flink deployment problem, but running the job with Flink on Kubernetes produced the same error, while Flink's official demo ran fine on YARN. That pointed to the pom. After a day of searching and asking around, I consulted the Flink expert zhisheng, and he solved it in seconds. The root cause: the pom depended on flink-connector-hbase, and when Flink starts it also loads the cluster's Hadoop jars via the environment, which conflicted with the Hadoop jars pulled in by flink-connector-hbase. Once I excluded the Hadoop dependencies, the job ran normally. Thank goodness for zhisheng; the error message is so uninformative that this would be hard to diagnose without a lot of experience.
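In the pom, the fix is to exclude the transitive Hadoop artifacts from the HBase connector, so that the cluster-provided Hadoop classpath is the only source of those classes. The full pom below lists the exclusions individually; with Maven 3.2.1 or later, a wildcard exclusion is a more compact equivalent:

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-hbase_2.12</artifactId>
  <version>1.11.2</version>
  <exclusions>
    <!-- The cluster's HADOOP_CLASSPATH already supplies these classes;
         bundling a second copy is what caused the conflict -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```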
Finally, here is the full pom we use for Flink on YARN, for reference:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.getui.flink</groupId>
  <artifactId>flink-sql-submit</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <flink.version>1.11.2</flink.version>
    <java.version>1.8</java.version>
    <scala.binary.version>2.12</scala.binary.version>
    <maven.compiler.source>${java.version}</maven.compiler.source>
    <maven.compiler.target>${java.version}</maven.compiler.target>
  </properties>

  <dependencies>
    <!-- Flink modules -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-api-java</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-json</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>
    <!-- Add logging framework, to produce console output when running in the IDE. -->
    <!-- These dependencies are excluded from the application JAR by default. -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.7</version>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.17</version>
    </dependency>
    <!-- CLI dependencies -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>
    <!-- Connectors -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-hbase_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
      <exclusions>
        <!-- Exclude transitive Hadoop jars; the cluster environment provides them (see problem III) -->
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-core</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-yarn-common</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-core</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-auth</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-sql-connector-kafka_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-jdbc_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.38</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Java Compiler -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>${java.version}</source>
          <target>${java.version}</target>
        </configuration>
      </plugin>
      <!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
      <!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.0.0</version>
        <executions>
          <!-- Run shade goal on package phase -->
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <artifactSet>
                <excludes>
                  <exclude>org.apache.flink:force-shading</exclude>
                  <exclude>com.google.code.findbugs:jsr305</exclude>
                  <exclude>org.slf4j:*</exclude>
                  <exclude>log4j:*</exclude>
                </excludes>
              </artifactSet>
              <filters>
                <filter>
                  <!-- Do not copy the signatures in the META-INF folder.
                       Otherwise, this might cause SecurityExceptions when using the JAR. -->
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>com.getui.flink.core.SqlSubmit</mainClass>
                </transformer>
                <!-- Merge META-INF/services files so connector factories stay discoverable (see problem II) -->
                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>

    <pluginManagement>
      <plugins>
        <!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
        <plugin>
          <groupId>org.eclipse.m2e</groupId>
          <artifactId>lifecycle-mapping</artifactId>
          <version>1.0.0</version>
          <configuration>
            <lifecycleMappingMetadata>
              <pluginExecutions>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <versionRange>[3.0.0,)</versionRange>
                    <goals>
                      <goal>shade</goal>
                    </goals>
                  </pluginExecutionFilter>
                  <action>
                    <ignore/>
                  </action>
                </pluginExecution>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <versionRange>[3.1,)</versionRange>
                    <goals>
                      <goal>testCompile</goal>
                      <goal>compile</goal>
                    </goals>
                  </pluginExecutionFilter>
                  <action>
                    <ignore/>
                  </action>
                </pluginExecution>
              </pluginExecutions>
            </lifecycleMappingMetadata>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>