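I. Background
I have recently been working on real-time analysis of vehicle driving data. As groundwork for later unified batch and stream development, I am laying the technical foundation first, using Flink as the entry point for unified batch and stream processing. This post describes a port conflict that occurred when submitting a Flink SQL job in YARN mode.
II. Reproducing the Problem
1. I am using Flink 1.12.0. The configuration files are as follows.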
flink-conf.yaml
masters and workers configuration
vi masters
bj-pan.com-04:11057
vi workers
bj-pan.com-02
bj-pan.com-03
bj-pan.com-04
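2. Start the Flink cluster
3. Submit the Flink SQL job in YARN mode
4. Exception log details
The exception is recorded in flink-1.12.0/log/flink-root-client-bj-pan.com-04.log. Taken at face value, the log below looks like a memory problem: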
2021-04-01 09:37:46,081 ERROR com.flink.streaming.core.JobApplication [] - 任务执行失败:
org.apache.flink.table.api.TableException: Failed to execute sql
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:696) ~[flink-table_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.table.api.internal.StatementSetImpl.execute(StatementSetImpl.java:97) ~[flink-table_2.11-1.12.0.jar:1.12.0]
at com.flink.streaming.core.JobApplication.main(JobApplication.java:66) ~[flink-streaming-core-1.2.0.RELEASE.jar:1.2.0.RELEASE]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_141]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_141]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_141]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_141]
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:316) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:743) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:242) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:971) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_141]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_141]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.0.jar:?]
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) [flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047) [flink-dist_2.11-1.12.0.jar:1.12.0]
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:460) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1940) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:128) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:57) ~[flink-table-blink_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:680) ~[flink-table_2.11-1.12.0.jar:1.12.0]
... 18 more
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1616133445012_0210 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1616133445012_0210_000001 exited with exitCode: 1
Failing this attempt.Diagnostics: [2021-04-01 09:37:45.847]Exception from container-launch.
Container id: container_1616133445012_0210_01_000001
Exit code: 1
[2021-04-01 09:37:45.848]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
[2021-04-01 09:37:45.849]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
For more detailed output, check the application tracking page: http://bj-pan.com-02:8088/cluster/app/application_1616133445012_0210 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1616133445012_0210
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1078) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:558) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:453) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1940) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:128) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:57) ~[flink-table-blink_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:680) ~[flink-table_2.11-1.12.0.jar:1.12.0]
... 18 more
2021-04-01 09:37:46,095 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Cancelling deployment from Deployment Failure Hook
2021-04-01 09:37:46,096 INFO org.apache.hadoop.yarn.client.RMProxy [] - Connecting to ResourceManager at bj-pan.com-02/172.17.112.108:8032
2021-04-01 09:37:46,101 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Killing YARN application
2021-04-01 09:37:46,117 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl [] - Killed application application_1616133445012_0210
2021-04-01 09:37:46,218 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deleting files in hdfs://bj-pan.com-01:8020/user/root/.flink/application_1616133445012_0210.
Use the following command to view the YARN logs: yarn logs -applicationId application_1616133445012_0210
2021-04-01 09:37:45,490 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Could not start cluster entrypoint YarnJobClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:191) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:529) [flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:95) [flink-dist_2.11-1.12.0.jar:1.12.0]
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:257) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:220) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:173) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_141]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_141]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:172) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
... 2 more
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 11057
at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:222) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:162) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:220) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:173) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_141]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_141]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:172) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
... 2 more
End of LogType:jobmanager.log
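Problem description:
The log reports that the port is already in use. A standalone cluster had already been started, so the rest.port port (11057) was already taken when the job was submitted in YARN mode. The configuration file states this explicitly: "The port to which the REST client connects to. If rest.bind-port has not been specified, then the server will bind to this port as well."
Because I had not set rest.bind-port (the second red box in the first screenshot), the YARN job cluster could not be created on the port that was already in use.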
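Before changing the configuration, it can help to confirm that something is already listening on the port named in the BindException. The following is a minimal sketch in Python and is not part of Flink. `port_in_use` is a hypothetical helper name, and the host and port in the usage comment are simply the values from the masters entry above.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    A successful connect means some process is already listening there,
    so another server cannot bind to the same address and port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


# Hypothetical usage on the node from the masters file:
#   port_in_use("bj-pan.com-04", 11057)
```

A result of True for that host and port would be consistent with the standalone cluster's REST endpoint holding the port, which is what the jobmanager.log above suggests.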
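III. Solution
Set rest.bind-port in flink-conf.yaml to a port range, preferably a fairly wide one (for example, rest.bind-port: 50100-50200), so that each new REST endpoint can pick a free port from the range.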
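To illustrate why a range avoids the conflict, here is a simplified Python sketch of the general idea: try each port in a specification in turn and keep the first one that binds. This is not Flink's implementation. `parse_port_spec` and `bind_first_free` are hypothetical names, and the accepted syntax (comma-separated ports and lo-hi ranges) is modeled on the format documented for rest.bind-port.

```python
import socket
from typing import List


def parse_port_spec(spec: str) -> List[int]:
    """Expand a spec such as "50100-50102,50110" into a list of ports."""
    ports: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            ports.extend(range(int(low), int(high) + 1))
        else:
            ports.append(int(part))
    return ports


def bind_first_free(host: str, spec: str) -> socket.socket:
    """Bind a listening socket to the first free port in the spec."""
    for port in parse_port_spec(spec):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock
        except OSError:
            # Port is taken (or not bindable): release and try the next one
            sock.close()
    raise OSError(f"Could not bind to any port in port spec {spec}")
```

With a single-port spec, one occupied port is enough to make the call fail, which mirrors the situation in the log above. With a range, the second and later endpoints on the same node simply move on to the next free port.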
Submit the job again. The YarnJobClusterEntrypoint processes are now visible on the current node (jps output):
11952 YarnJobClusterEntrypoint
8449 StandaloneSessionClusterEntrypoint
12889 YarnJobClusterEntrypoint
8873 TaskManagerRunner
18126 NodeManager
29151 Jps