1 Create a Java project and add the JARs that Hadoop depends on to the project.
2 Create the Mapper class
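import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapperClass extends Mapper<Object, Text, Text, IntWritable> {
    // Reused for every emitted pair instead of allocating one object per token
    private final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line on whitespace and emit (word, 1) for each token
        StringTokenizer stringTokenizer = new StringTokenizer(value.toString());
        while (stringTokenizer.hasMoreTokens()) {
            word.set(stringTokenizer.nextToken());
            context.write(word, one);
        }
    }
}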
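To make the mapper's output concrete, here is a minimal plain-Java sketch with no Hadoop dependency. `MapSketch` and `emit` are hypothetical names introduced only for illustration; the sketch assumes each emitted pair can be represented as a tab-separated string, which is not how Hadoop stores pairs internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of Hadoop: mimics the (word, 1) pairs
// that map() would write to the context for a single input line.
class MapSketch {
    static List<String> emit(String line) {
        List<String> pairs = new ArrayList<>();
        // Same tokenization as the mapper: split on whitespace
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Each token becomes one pair; duplicates are NOT merged here
            pairs.add(tokenizer.nextToken() + "\t1");
        }
        return pairs;
    }
}
```

Calling `MapSketch.emit("hello world hello")` yields three pairs, with `hello` appearing twice. The mapper does no counting itself; merging the duplicates is left to the reducer.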
3 Create the Reducer class
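import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ReducerClass extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts emitted for this word
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}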
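The reducer receives every count emitted for one word and adds them up. The following plain-Java sketch imitates that grouping and summing in memory, without Hadoop. `ReduceSketch`, `sum`, and `wordCount` are hypothetical names used only for illustration, and the `TreeMap` is an assumed stand-in for the framework's shuffle-and-sort step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: imitates shuffle + reduce in memory.
class ReduceSketch {
    // Same loop as reduce(), but over plain Integers instead of IntWritable
    static int sum(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> words) {
        // "Shuffle": collect a 1 for every occurrence, grouped by word
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : words) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        // "Reduce": sum the grouped values for each word
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            counts.put(entry.getKey(), sum(entry.getValue()));
        }
        return counts;
    }
}
```

For the words `hello`, `world`, `hello`, `wordCount` returns a map in which `hello` maps to 2 and `world` maps to 1, the same totals the real job would write for that input.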
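Note: do not let Eclipse auto-complete this method. By default it generates the form shown below, which causes assertions to fail in unit tests; the cause is unknown.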