Problem description:
org.apache.flink.client.program.rest.RestClusterClient:Could not retrieve the web interface URL for the cluster.
The detailed log is as follows:
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at com.dtstack.flinkx.launcher.Launcher.main(Launcher.java:131)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:400)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1255)
at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$orTimeout$15(FutureUtils.java:582)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.co
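To debug this, download the Flink 1.13 source and build it. Be sure to build the whole project: building a single module on its own can cause all sorts of problems.
Add logging to getWebMonitorBaseUrl() in RestClusterClient: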
CompletableFuture<URL> getWebMonitorBaseUrl() {
    // Added: log the configured leader-retrieval timeout (in milliseconds).
    LOG.info(
            "------------------getWebMonitorBaseUrl timeout is {} {}",
            restClusterClientConfiguration.getAwaitLeaderTimeout(),
            TimeUnit.MILLISECONDS);
    return FutureUtils.orTimeout(
                    webMonitorLeaderRetriever.getLeaderFuture(),
                    restClusterClientConfiguration.getAwaitLeaderTimeout(),
                    TimeUnit.MILLISECONDS)
            .thenApplyAsync(
                    leaderAddressSessionId -> {
                        final String url = leaderAddressSessionId.f0;
                        // Added: log the leader URL once it has been retrieved.
                        LOG.info("------------------getWebMonitorBaseUrl url is {}", url);
                        try {
                            return new URL(url);
                        } catch (MalformedURLException e) {
                            throw new IllegalArgumentException(
                                    "Could not parse URL from " + url, e);
                        }
                    },
                    executorService);
}
Note: the build may fail with the following error:
Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check (spotless-check) on project flink-clients_2.11: The following files had format violations:
src\main\java\org\apache\flink\client\program\rest\RestClusterClient.java
@@ -1,895 +1,895 @@
-/*\n
If it does, run mvn spotless:apply first to reformat the code, then build again.
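After the build, run the job again with the patched code so that the added log lines are printed.
The log shows a timeout of 30 seconds, which is the normal value.
Continuing to trace further:
In the failing environment the value is false, possibly because of the scheduling mode.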
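The preliminary suspects are these two places:
The orTimeout method in org.apache.flink.runtime.concurrent.FutureUtils.
The state of the future as printed by CompletableFuture.toString(). According to its Javadoc, it returns a string identifying this CompletableFuture, as well as its completion state. The state, in brackets, contains the string "Completed Normally" or the string "Completed Exceptionally", or the string "Not completed" followed by the number of CompletableFutures dependent upon its completion, if any.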
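To make the two points above concrete, here is a minimal sketch in plain JDK 8 Java. It is not Flink's implementation: the orTimeout helper and the class name OrTimeoutSketch are hypothetical stand-ins that only mimic the idea of failing a pending future with a TimeoutException, as seen in the stack trace. It also prints the future before and after the timeout so the states reported by toString() can be observed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OrTimeoutSketch {

    // Hypothetical helper (not Flink's FutureUtils): if the future is still
    // pending when the delay elapses, complete it with a TimeoutException.
    public static <T> CompletableFuture<T> orTimeout(
            CompletableFuture<T> future,
            long timeout,
            TimeUnit unit,
            ScheduledExecutorService timer) {
        timer.schedule(
                () -> {
                    // No effect if the future has already completed.
                    future.completeExceptionally(new TimeoutException());
                },
                timeout,
                unit);
        return future;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // Stands in for a leader future that nobody ever completes.
            CompletableFuture<String> leaderFuture = new CompletableFuture<>();
            System.out.println("before timeout: " + leaderFuture);

            orTimeout(leaderFuture, 200, TimeUnit.MILLISECONDS, timer);
            try {
                leaderFuture.get();
            } catch (ExecutionException e) {
                // The TimeoutException arrives wrapped in an ExecutionException.
                System.out.println("cause: " + e.getCause().getClass().getName());
            }
            System.out.println("after timeout: " + leaderFuture);

            // For comparison: a future that completed with a value.
            System.out.println(
                    "completed: " + CompletableFuture.completedFuture("http://localhost:8081"));
        } finally {
            timer.shutdownNow();
        }
    }
}
```

If the leader future is never completed, for whatever reason, the caller only ever sees the TimeoutException. That matches the symptom in the log but says nothing about why the leader address was not retrieved.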
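The deeper cause behind this will be covered in a follow-up article once the investigation is finished.
Update: the root cause has been identified.
Flink was started with yarn-session.sh, but ZooKeeper high availability was not configured in flink-conf.yaml.
After adding the configuration and restarting Flink, the problem was resolved.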
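For reference, a ZooKeeper high-availability section in flink-conf.yaml might look like the sketch below. The option keys are the standard Flink 1.13 ones. The ZooKeeper hosts, the storage directory and the root path are placeholders (assumptions, not values from the environment described here) and must be adapted to your cluster.

```yaml
# Enable ZooKeeper-based high availability.
high-availability: zookeeper
# Placeholder quorum: replace with your own ZooKeeper hosts.
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# Placeholder path: durable storage for JobManager metadata.
high-availability.storageDir: hdfs:///flink/ha/
# Optional placeholder: root znode under which Flink keeps its data.
high-availability.zookeeper.path.root: /flink
```

After changing the file, stop the running YARN session and start it again with yarn-session.sh so that the new settings take effect.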