When I first started learning Flink, I wrote a simple WordCount and ran it, only to hit an error. The exception was:
The program finished with the following exception:
java.lang.RuntimeException: Error deploying the YARN cluster
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createCluster(FlinkYarnSessionCli.java:556)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createCluster(FlinkYarnSessionCli.java:72)
at org.apache.flink.client.CliFrontend.createClient(CliFrontend.java:962)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:243)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1086)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)
at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1130)
Caused by: java.lang.RuntimeException: Couldn't deploy Yarn cluster
at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:443)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createCluster(FlinkYarnSessionCli.java:554)
... 12 more
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1505452408837_0056 failed 1 times due to AM Container for appattempt_1505452408837_0056_000001 exited with exitCode: 31
For more detailed output, check application tracking page:http://node:8099/cluster/app/application_1505452408837_0056Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1505452408837_0056_01_000001
Exit code: 31
Stack trace: ExitCodeException exitCode=31:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Container exited with a non-zero exit code 31
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
Since Flink material was scarce at the time, a long stretch of googling turned up nothing. For a while I assumed it was a Flink bug and didn't take it seriously, but the same error persisted after a version upgrade, and I started to doubt myself. So I began investigating properly; frankly, this error message is genuinely misleading. After some digging, the yarn logs command surfaced far more useful information:
Command:
yarn logs -applicationId application_1505452408837_0056
The exception in the aggregated logs was:
2017-11-22 22:07:35,539 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to daxin (auth:SIMPLE)
2017-11-22 22:07:35,551 ERROR org.apache.flink.yarn.YarnApplicationMasterRunner - YARN Application Master initialization failed
java.lang.NoSuchMethodError: org.apache.flink.runtime.util.ExecutorThreadFactory.<init>(Ljava/lang/String;)V
at org.apache.flink.yarn.YarnApplicationMasterRunner.runApplicationMaster(YarnApplicationMasterRunner.java:223)
at org.apache.flink.yarn.YarnApplicationMasterRunner$1.call(YarnApplicationMasterRunner.java:195)
at org.apache.flink.yarn.YarnApplicationMasterRunner$1.call(YarnApplicationMasterRunner.java:192)
at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.yarn.YarnApplicationMasterRunner.run(YarnApplicationMasterRunner.java:192)
at org.apache.flink.yarn.YarnApplicationMasterRunner.main(YarnApplicationMasterRunner.java:116)
End of LogType:jobmanager.log
LogType:jobmanager.out
Log Upload Time:Wed Nov 22 22:07:37 +0800 2017
LogLength:0
Log Contents:
End of LogType:jobmanager.out
At last, an exception with a meaningful hint! It says ExecutorThreadFactory has no constructor taking a String. Remote debugging showed the class was indeed present, and checking it via reflection confirmed the declared constructor count was 0. I repeatedly unpacked the jar and decompiled the class files without finding anything wrong, and running with -verbose:class showed nothing unusual in class loading either. If anyone figures out the cause, please let me know.
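For reference, the reflection check used above can be sketched as follows. This is a minimal example: a JDK class stands in for the suspect Flink class, since ExecutorThreadFactory's availability depends on having Flink on the classpath.

```java
import java.lang.reflect.Constructor;

public class ConstructorCheck {

    // Counts the declared constructors of the named class -- the same check
    // that revealed ExecutorThreadFactory reporting 0 constructors above.
    public static int constructorCount(String className) throws ClassNotFoundException {
        return Class.forName(className).getDeclaredConstructors().length;
    }

    public static void main(String[] args) throws Exception {
        // A JDK class is used here purely for illustration
        String name = "java.util.concurrent.ThreadPoolExecutor";
        for (Constructor<?> c : Class.forName(name).getDeclaredConstructors()) {
            System.out.println(c);
        }
        System.out.println("count = " + constructorCount(name));
    }
}
```

A healthy class prints a non-zero count; a count of 0 for a class that should have constructors points at a classpath conflict like the one described here.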
The program jar had been exported directly from IDEA rather than built with Maven. Leaving the code untouched and repackaging with Maven instead, it surprisingly ran fine.
The problem was solved, though it is admittedly awkward not to have found the root cause. Still, there were some takeaways:
1: You can programmatically show which jar each class file comes from. Besides Maven's dependency-conflict analysis, this approach also works for jar conflicts in non-Maven projects. Source code:
import java.io.File;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public static List<String> getJarName(String className) throws Exception {
    // Jars loaded by the bootstrap class loader
    String bootPath = System.getProperty("sun.boot.class.path");
    // Directories scanned by the extension class loader (directories are
    // skipped by the isFile() check below)
    String extPath = System.getProperty("java.ext.dirs");
    // Scan the bootstrap class path plus the application classpath
    String allPath = bootPath + File.pathSeparator + System.getProperty("java.class.path");
    // The separator is ";" on Windows and ":" on Linux; File.pathSeparator is cross-platform
    String[] jarnames = allPath.split(File.pathSeparator);
    List<String> jars = new ArrayList<>();
    for (String jarname : jarnames) {
        File file = new File(jarname);
        if (file.isFile() && file.exists()) {
            try (JarFile jar = new JarFile(file)) {
                Enumeration<JarEntry> enums = jar.entries();
                while (enums.hasMoreElements()) {
                    JarEntry entry = enums.nextElement();
                    String qualifiedName = entry.getName().replace("/", ".");
                    if (qualifiedName.equals(className)) {
                        jars.add("class name = " + qualifiedName + " , jar filename = " + file.getName());
                    }
                }
            }
        }
    }
    return jars;
}
Example call:
getJarName("java.util.concurrent.ThreadPoolExecutor.class")
Output:
class name = java.util.concurrent.ThreadPoolExecutor.class , jar filename = rt.jar
2: Documentation on Flink's JVM options for debugging:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#common-options
The classes that define the configuration parameters:
org.apache.flink.configuration.ConfigConstants
org.apache.flink.configuration.CoreOptions
All configuration-related utility classes live in the org.apache.flink.configuration package.
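As a sketch of those options in practice, a remote-debug agent can be attached to Flink's JVMs via env.java.opts in flink-conf.yaml (the kind of remote debugging used above; the key is from the linked "Common Options" docs, while the port 5005 is an illustrative choice):

```yaml
# flink-conf.yaml -- sketch, assuming Flink 1.3-era configuration keys.
# env.java.opts is passed to the started JVMs; the JDWP address is illustrative.
env.java.opts: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
```

With suspend=n the JVM starts normally and a debugger can attach at any time; suspend=y would instead block startup until a debugger connects.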