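Troubleshooting a Flink on YARN exception, and finding the JAR file name for a given class file name

When I first started learning Flink, I wrote a simple WordCount job and ran it, only to get an error. The exception was: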

 The program finished with the following exception:

java.lang.RuntimeException: Error deploying the YARN cluster
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createCluster(FlinkYarnSessionCli.java:556)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createCluster(FlinkYarnSessionCli.java:72)
	at org.apache.flink.client.CliFrontend.createClient(CliFrontend.java:962)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:243)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1086)
	at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)
	at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)
	at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1130)
Caused by: java.lang.RuntimeException: Couldn't deploy Yarn cluster
	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:443)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createCluster(FlinkYarnSessionCli.java:554)
	... 12 more
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment. 
Diagnostics from YARN: Application application_1505452408837_0056 failed 1 times due to AM Container for appattempt_1505452408837_0056_000001 exited with  exitCode: 31
For more detailed output, check application tracking page:http://node:8099/cluster/app/application_1505452408837_0056Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1505452408837_0056_01_000001
Exit code: 31
Stack trace: ExitCodeException exitCode=31: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
	at org.apache.hadoop.util.Shell.run(Shell.java:456)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)


Container exited with a non-zero exit code 31
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:

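There was little material on Flink at the time, and a long search on Google turned up nothing. For a while I assumed it was a Flink bug and did not take it seriously, but the same error kept appearing after several version upgrades, so I started to suspect my own setup and began troubleshooting. The error message above is truly misleading. After some digging, the yarn logs command turned up more meaningful information:

Command:

yarn logs -applicationId application_1505452408837_0056

The relevant part of the log: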

2017-11-22 22:07:35,539 INFO  org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user set to daxin (auth:SIMPLE)
2017-11-22 22:07:35,551 ERROR org.apache.flink.yarn.YarnApplicationMasterRunner             - YARN Application Master initialization failed
java.lang.NoSuchMethodError: org.apache.flink.runtime.util.ExecutorThreadFactory.<init>(Ljava/lang/String;)V
	at org.apache.flink.yarn.YarnApplicationMasterRunner.runApplicationMaster(YarnApplicationMasterRunner.java:223)
	at org.apache.flink.yarn.YarnApplicationMasterRunner$1.call(YarnApplicationMasterRunner.java:195)
	at org.apache.flink.yarn.YarnApplicationMasterRunner$1.call(YarnApplicationMasterRunner.java:192)
	at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
	at org.apache.flink.yarn.YarnApplicationMasterRunner.run(YarnApplicationMasterRunner.java:192)
	at org.apache.flink.yarn.YarnApplicationMasterRunner.main(YarnApplicationMasterRunner.java:116)
End of LogType:jobmanager.log

LogType:jobmanager.out
Log Upload Time:Wed Nov 22 22:07:37 +0800 2017
LogLength:0
Log Contents:
End of LogType:jobmanager.out


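At last, an exception that says something meaningful! It reports that ExecutorThreadFactory has no constructor taking a String argument. Remote debugging showed that the class was present, and checking it with reflection confirmed that it had 0 constructors. I repeatedly unpacked the JAR and decompiled the class file without finding anything wrong, and the class loading information printed by -verbose:class showed nothing wrong either. If anyone spots the problem, please let me know.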
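The reflection check mentioned above can be sketched roughly as follows. This is not the code used during the original investigation; `ConstructorInspector` and `describe` are hypothetical names chosen for illustration. The sketch lists the declared constructors of a class and where the class was loaded from, which is the information needed to see why a `NoSuchMethodError` on `<init>` is raised.

```java
import java.lang.reflect.Constructor;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

public class ConstructorInspector {

    /** Returns the code source of the class followed by one line per declared constructor. */
    public static List<String> describe(Class<?> clazz) {
        List<String> lines = new ArrayList<>();
        // The CodeSource is null for classes loaded by the bootstrap class loader
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        lines.add("loaded from: "
                + (source == null ? "<bootstrap>" : String.valueOf(source.getLocation())));
        // getDeclaredConstructors() also returns private and package-private constructors
        for (Constructor<?> c : clazz.getDeclaredConstructors()) {
            lines.add("constructor: " + c.toGenericString());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Pass the suspicious class name as the first argument
        String name = args.length > 0 ? args[0] : "java.lang.Thread";
        for (String line : describe(Class.forName(name))) {
            System.out.println(line);
        }
    }
}
```

To be useful, the check has to run in the same JVM and with the same class path as the failing process, since a different class path may resolve the class from a different JAR.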



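The program JAR had been exported directly from IDEA rather than built with Maven. When I later packaged the unchanged program with Maven, it ran, to my surprise.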



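The problem was solved, but not finding the root cause is rather embarrassing. Still, I learned a few things:

1: You can show programmatically which JAR each class file comes from. Besides using Maven to analyze conflicts, this approach also works for JAR conflicts in non-Maven projects. Source code: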

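import java.io.File;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public static List<String> getJarName(String className) throws Exception {
        // JAR paths used by the boot class loader (for reference; not scanned below)
        String bootPath = System.getProperty("sun.boot.class.path");
        // directories used by the ext class loader (for reference; not scanned below)
        String extPath = System.getProperty("java.ext.dirs");
        // every entry on the application class path
        String allPath = System.getProperty("java.class.path");
        // the separator is ';' on Windows and ':' on Linux; File.pathSeparator is portable
        String[] jarNames = allPath.split(File.pathSeparator);

        List<String> jars = new ArrayList<>();

        for (String jarName : jarNames) {
            File file = new File(jarName);
            // skip directories and missing entries; only regular files are opened as JARs
            if (file.isFile()) {
                // try-with-resources closes the JAR once it has been scanned
                try (JarFile jar = new JarFile(file)) {
                    Enumeration<JarEntry> entries = jar.entries();
                    while (entries.hasMoreElements()) {
                        JarEntry entry = entries.nextElement();
                        // "java/util/List.class" -> "java.util.List.class"
                        String qualifiedName = entry.getName().replace("/", ".");
                        if (qualifiedName.equals(className)) {
                            jars.add("class name =  " + qualifiedName + " , jar filename = " + file.getName());
                        }
                    }
                }
            }
        }

        return jars;
    }

Example call: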

例如调用:

getJarName("java.util.concurrent.ThreadPoolExecutor.class")

Output:

class name =  java.util.concurrent.ThreadPoolExecutor.class , jar filename = rt.jar
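As a complement to scanning the JARs by hand, a class loader can be asked directly for every location that provides a given class file, using the standard `ClassLoader.getResources` method. The following is a minimal sketch, not part of the original troubleshooting; `ClassLocator`, `toResourcePath`, and `findAll` are hypothetical names. Unlike `getJarName` above, it takes the class name without the `.class` suffix.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClassLocator {

    /** Converts a class name such as "java.util.List" to the resource path "java/util/List.class". */
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every location visible to the given class loader that provides the class file. */
    public static List<URL> findAll(String className, ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(toResourcePath(className)));
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "java.util.concurrent.ThreadPoolExecutor";
        List<URL> hits = findAll(name, ClassLoader.getSystemClassLoader());
        for (URL url : hits) {
            System.out.println(url);
        }
        // More than one hit means several class path entries ship the same class
        if (hits.size() > 1) {
            System.out.println("WARNING: " + name + " is provided by " + hits.size() + " locations");
        }
    }
}
```

When more than one URL is printed, the same class exists in several class path entries, and which copy is loaded depends on the class path order.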

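2: Starting the JVM with -verbose:class prints detailed information about every class the program loads, including where each class was loaded from.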


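The -verbose:class output is long, so it helps to filter it for the one class under suspicion. The sketch below assumes the Java 8 HotSpot line format `[Loaded <class name> from <location>]`; newer JVMs print class loading through unified logging in a different format, which this pattern does not match. `VerboseClassFilter` and `sourceOf` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerboseClassFilter {

    // Assumed Java 8 format: [Loaded java.lang.Object from /path/to/rt.jar]
    private static final Pattern LOADED = Pattern.compile("^\\[Loaded (\\S+) from (.+)\\]$");

    /** Returns the "from" part if the line reports loading className, otherwise null. */
    public static String sourceOf(String line, String className) {
        Matcher m = LOADED.matcher(line.trim());
        if (m.matches() && m.group(1).equals(className)) {
            return m.group(2);
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: VerboseClassFilter <fully qualified class name>");
            return;
        }
        // Reads the -verbose:class output from standard input, line by line
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = reader.readLine()) != null) {
            String source = sourceOf(line, args[0]);
            if (source != null) {
                System.out.println(args[0] + " <- " + source);
            }
        }
    }
}
```

The program reads from standard input, so the output of the JVM started with -verbose:class can be piped into it, or saved to a file first and redirected.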

3: Documentation for the JVM options used when debugging Flink:

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#common-options


Classes that define the configuration parameters:

org.apache.flink.configuration.ConfigConstants
org.apache.flink.configuration.CoreOptions

All configuration-related utility classes are in the org.apache.flink.configuration package.




