Assuming that:
- /user/joe/wordcount/input - input directory in HDFS
- /user/joe/wordcount/output - output directory in HDFS
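Note that this does not require the output directory to be created in HDFS beforehand; if it already exists, the job fails with an exception like the one below.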
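Only the input directory should exist before the job runs, so setup can stop at creating it and uploading the text files. The sketch below is a minimal example that assumes it is run from the Hadoop installation directory, so that `bin/hdfs` resolves. The helper name `prepare_input` and the local files `file01`/`file02` are hypothetical. The `hdfs dfs -mkdir -p` and `hdfs dfs -put` subcommands are standard HDFS shell commands.

```shell
# Hypothetical helper: create ONLY the HDFS input directory and upload local
# files into it. The output directory is deliberately not created, because
# FileOutputFormat refuses to write into a directory that already exists.
# Assumes the current directory is the Hadoop installation directory.
prepare_input() {
  local input="$1"
  shift                                   # remaining arguments: local files
  bin/hdfs dfs -mkdir -p "$input" || return 1
  bin/hdfs dfs -put "$@" "$input"
}
```

Usage: `prepare_input /user/joe/wordcount/input file01 file02`.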
Run the application:
$ bin/hadoop jar wc.jar WordCount /user/joe/wordcount/input /user/joe/wordcount/output
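At first I kept WordCount.java and the compiled class files in my own directory (wc.jar stayed in the Hadoop directory the whole time). I also gave WordCount as an absolute path (the Java file declares no package; it was copied from the Hadoop website). Even so, the job still failed with a ClassNotFoundException. After I compiled the Java file inside the Hadoop directory it worked, and I don't fully understand why.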
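The exact commands aren't recorded, so the following is only a likely explanation for the ClassNotFoundException. `hadoop jar wc.jar WordCount` looks the class up by name inside wc.jar. A filesystem path passed in place of the class name does not work, because it is treated as a class name rather than as a file. In addition, the tutorial's `jar cf wc.jar WordCount*.class` only packs class files from the current directory. Classes compiled in one directory therefore never reach a wc.jar that is built or kept in another. Compiling and packaging in the same directory avoids both problems.

The steps below follow the Hadoop MapReduce tutorial. Wrapping them in a function named `build_wc_jar` is only for illustration. A JDK 8 is assumed because later JDKs no longer ship `lib/tools.jar`.

```shell
# Sketch of the tutorial build steps. Assumes: run from the Hadoop
# installation directory, WordCount.java copied there, and JAVA_HOME
# pointing at a JDK 8 (which still ships lib/tools.jar).
build_wc_jar() {
  export HADOOP_CLASSPATH="${JAVA_HOME}/lib/tools.jar"
  # Compile; the .class files are written to the CURRENT directory.
  bin/hadoop com.sun.tools.javac.Main WordCount.java || return 1
  # WordCount has no package, so its classes belong at the root of the jar.
  # Only .class files in the current directory are picked up here.
  jar cf wc.jar WordCount*.class || return 1
  # Verify: the listing should include WordCount.class.
  jar tf wc.jar
}
```

Afterwards, run the job from the same directory. Pass the class name exactly as it appears in the jar: plain `WordCount`, since the file declares no package.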
17/02/25 16:36:05 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.28.131:8032
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://master:9000/user/daiqing/wordcount/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at WordCount.main(WordCount.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
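To fix this exception, remove the old output directory (or pick a new one) before resubmitting. The sketch below is minimal and assumes it runs from the Hadoop directory with wc.jar present. `rerun_wordcount` is a hypothetical name, while `hdfs dfs -test -d` and `hdfs dfs -rm -r` are standard HDFS shell commands. If HDFS trash is enabled, `-rm` moves the directory to the user's .Trash instead of deleting it outright.

```shell
# Hypothetical helper: delete an existing output directory, then resubmit.
# Assumes the current directory is the Hadoop installation and holds wc.jar.
rerun_wordcount() {
  local input="$1" output="$2"
  # "-test -d" exits 0 only if the path exists and is a directory.
  if bin/hdfs dfs -test -d "$output"; then
    # Recursive delete (moved to .Trash instead if HDFS trash is enabled).
    bin/hdfs dfs -rm -r "$output" || return 1
  fi
  bin/hadoop jar wc.jar WordCount "$input" "$output"
}
```

Usage: `rerun_wordcount /user/daiqing/wordcount/input /user/daiqing/wordcount/output`.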
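Running the WordCount from the bundled examples jar with -files. Note that -files does not supply input. It copies the listed file (here WordCount.java) into each task's working directory through the distributed cache, and the words are still read from the input directory. A fresh output directory, output1, is used because output already exists.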
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount -files WordCount.java /user/daiqing/wordcount/input /user/daiqing/wordcount/output1
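Either run leaves its result in files named part-r-NNNNN under the output directory. Each line has the form `word<TAB>count`, which is the default TextOutputFormat. The sketch below reads the result back and assumes the Hadoop directory is the working directory. `top_words` is a hypothetical helper name. The glob is quoted so that HDFS, not the local shell, expands it.

```shell
# Hypothetical helper: print the N most frequent words of a finished
# WordCount job (default N = 10). Assumes "word<TAB>count" output lines
# and that the current directory is the Hadoop installation directory.
top_words() {
  local output="$1" n="${2:-10}"
  # Quoted glob: "hdfs dfs -cat" expands part-r-* on HDFS itself.
  bin/hdfs dfs -cat "$output/part-r-*" \
    | sort -t "$(printf '\t')" -k2,2nr \
    | head -n "$n"
}
```

Usage: `top_words /user/daiqing/wordcount/output1 20`.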