1. jar cf WordCount.jar WordCount*.class
Usage:
Compile the WordCount.java file; the command for compiling Java source files is javac:
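For example, a minimal sketch of the compile step, assuming Hadoop 2.x installed under /usr/local/hadoop and the source sitting in the workspace folder (the hadoop classpath subcommand prints the jars the compiler needs):
cd /usr/local/hadoop/workspace
javac -classpath "$(/usr/local/hadoop/bin/hadoop classpath)" WordCount.java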
At this point, three class files are generated in the workspace folder: WordCount.class plus one file for each nested class, WordCount$TokenizerMapper.class and WordCount$IntSumReducer.class.
Once compilation succeeds, the three class files can be packaged into a jar file with the command above;
after that command finishes, WordCount.jar appears in the workspace folder.
-c: create a new jar archive;
-f: specify the name of the jar file;
WordCount.jar: the JAR archive to be created, viewed, updated, or extracted; it is the argument attached to the -f option;
WordCount*.class: a wildcard matching every .class file whose name starts with WordCount;
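To confirm the packaging worked, the jar tool's t flag lists the entries in the archive (run from the workspace folder):
jar tf WordCount.jar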
2. bin/hadoop jar workspace/WordCount.jar WordCount input output
Usage:
Create an input folder under /usr/local/hadoop to hold the input data,
then cd into the input folder and run commands that write 'Hello World Bye World' into file01 and 'Hello Hadoop Goodbye Hadoop' into file02.
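For example, with echo (any text editor works just as well):
echo 'Hello World Bye World' > file01
echo 'Hello Hadoop Goodbye Hadoop' > file02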
Finally, run the program with the command shown in the heading above.
Equivalent invocations include:
hadoop jar WordCount.jar WordCount input output
hadoop jar WordCount.jar WordCount /tmp/input /tmp/output
/usr/local/hadoop/bin/hadoop jar WordCount.jar WordCount input output
/usr/local/hadoop/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
(Because the program declares a package, the fully qualified name org.apache.hadoop.examples.WordCount must be written out in the command; the very first line of the source code is: package org.apache.hadoop.examples)
bin/hadoop: i.e. /usr/local/hadoop/bin/hadoop; this is the location of the hadoop executable file, not a folder. It is another layer of wrapping around the java command and can be thought of as Hadoop's shell-side launcher script;
jar: run a job whose classes are packaged in the given jar file;
workspace/WordCount.jar: the location of WordCount.jar; combined with the /usr/local/hadoop prefix above, the full path is /usr/local/hadoop/workspace/WordCount.jar;
WordCount: the name of the main class to run; as noted above, a class that declares a package must be given by its fully qualified name;
input: the data input directory in HDFS;
output: the data output directory in HDFS; it must not already exist when the job is submitted, or the job aborts (see the sketch below);
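In pseudo-distributed mode the input files have to be uploaded to HDFS before the job runs. A minimal sketch, assuming Hadoop 2.x with HDFS running, the HDFS home directory /user/<username> already created, and all commands issued from /usr/local/hadoop (in standalone mode, input and output are plain local folders instead):
bin/hdfs dfs -mkdir -p input
bin/hdfs dfs -put input/file01 input/file02 input
bin/hadoop jar workspace/WordCount.jar WordCount input output
bin/hdfs dfs -cat 'output/*'
With the two sample files above, the last command prints the sorted word counts: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2.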
Code example:
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.examples;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {

  // Mapper: splits each input line into tokens and emits a (word, 1) pair per token.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts emitted for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    // Configure the job: jar, mapper, combiner, reducer, and output key/value types.
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // The first remaining argument is the HDFS input path, the second the output path.
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}