本文记录了hadoop程序开发完成打jar包到服务器上运行过程中的常见报错信息并提供解决方案:
1、运行jar文件的命令:
hadoop jar name.jar mainClass args[0] args[1]
name.jar -- jar文件名
mainClass -- 要执行的程序main方法所在类,以项目src/为根目录的对应路径,中间以"."连接或直接取main方法所在类的第一行(有Java代码编写经验的都懂啦)
比如上图,假如要执行的话,mainClass则应为 practise.day01.ioPractise
args[0]和args[1]为运行参数,我接触到的较多案例为两个hdfs路径。
2、执行后可能报错:
a) 签名文件错误:
Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:314)
at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:268)
at java.util.jar.JarVerifier.processEntry(JarVerifier.java:316)
at java.util.jar.JarVerifier.update(JarVerifier.java:228)
at java.util.jar.JarFile.initializeVerifier(JarFile.java:383)
at java.util.jar.JarFile.getInputStream(JarFile.java:450)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:101)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:81)
at org.apache.hadoop.util.RunJar.run(RunJar.java:209)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
解决:
删除jar文件中的部分文件:
zip -d name.jar META-INF/*.DSA META-INF/*.RSA META-INF/*.SF
再执行,即可(具体原理暂时未研究,网上相关内容较多,感兴趣的同学可自行百度)
b) 10020拒绝连接:
Exception in thread "main" java.io.IOException: java.net.ConnectException: Call From myspark/192.168.13.150 to myspark:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:344)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:429)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:573)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:616)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1323)
at actual.jd_order_analysis.pre.WeblogPreProcess.main(WeblogPreProcess.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.ConnectException: Call From myspark/192.168.13.150 to myspark:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.getJobReport(Unknown Source)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:325)
... 17 more
Caused by: java.net.ConnectException: 拒绝连接
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 25 more
这是因为hadoop的historyserver未开启,启动脚本在$HADOOP-HOME/sbin目录下,如果没有配置环境变量,需切换到$HADOOP-HOME/sbin目录,执行:
./mr-jobhistory-daemon.sh start historyserver
以上,b)对程序运行无影响,Hadoop会将执行完成的作业重定向到10020,做历史记录。如果未开启historyserver,则无法保存已完成的作业记录。