Why running a MapReduce program throws java.lang.RuntimeException: java.lang.ClassNotFoundException
After finishing the distributed Hadoop setup, I imported hadoop-0.20.1 from the master node directly into Eclipse, intending to write programs there and run them straight on the Hadoop cluster. Today I discovered that this cannot work, because I had overlooked how MapReduce programs actually run in Hadoop: when the framework runs a job, it has to let every slave node execute its tasks (the map and reduce functions), so at submission time it copies the resources the job needs (the job jar file, the configuration files, and the computed input splits) into an HDFS directory named after the job ID. The job jar is stored with a high replication factor so that every tasktracker can reach a copy when it runs a task.
Symptom:
10/08/16 15:25:48 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Output directory wcout already exists,firstly delete it
10/08/16 15:25:49 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
10/08/16 15:25:49 INFO input.FileInputFormat: Total input paths to process : 4
10/08/16 15:25:50 INFO mapred.JobClient: Running job: job_201008161439_0004
10/08/16 15:25:51 INFO mapred.JobClient: map 0% reduce 0%
10/08/16 15:26:00 INFO mapred.JobClient: Task Id : attempt_201008161439_0004_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.examples.WordCount2$WordCountMapper
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:808)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:532)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.examples.WordCount2$WordCountMapper
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:806)
... 4 more
Cause analysis:
The program was not run as a jar, so no jar was uploaded to HDFS. As a result, nodes other than the submitting node could not find the map and reduce classes when executing their tasks, which is exactly where the error occurred. In fact, the framework warns about this way of running a job without an uploaded jar: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
The original WordCount example worked only because its class already exists on every node, whereas my new class existed only on the master node.
Solutions:
(1) add the class to the examples package on every node, or
(2) package the program into a jar and submit that jar with the hadoop jar command.
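Solution (2) can be sketched as a driver that registers the job jar explicitly. Below is a minimal WordCount2 sketch against the Hadoop 0.20 "new" mapreduce API; the class names, input paths, and the wordcount.jar name are illustrative, not taken from my actual code:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount2 {

    public static class WordCountMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        // The critical line: locate the jar that contains WordCount2 and
        // register it, so the framework uploads it to HDFS for the
        // tasktrackers. Without this (or JobConf#setJar), the
        // "No job jar file set" warning appears and map tasks fail with
        // ClassNotFoundException on the inner Mapper class.
        job.setJarByClass(WordCount2.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that setJarByClass can only find a jar if the class was actually loaded from one; when Eclipse runs the class from its build output directory there is no jar to find, which is why packaging (for example jar cvf wordcount.jar, then hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount2 input wcout) is still required.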
Note:
Even in pseudo-distributed mode (where one node is both master and slave), the program cannot be run successfully straight from Eclipse; it still has to be packaged into a jar.