Getting Started with MapReduce: the WordCount Example
0. This example runs MapReduce locally.
1. Prerequisites: IntelliJ IDEA as the development tool, plus the jar packages needed to run Hadoop.
2. Open IDEA, create a plain Java project, and import the jars. To make the log output easier to read, also add a log4j.properties configuration file (a sketch of this file follows the list below).
3. Three classes need to be written by hand: a Mapper, a Reducer, and a Driver (named WCMapper, WCReduce, and WCDriver below).
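A minimal log4j.properties sketch, assuming console output at INFO level is enough; adjust the level and pattern to taste:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p [%c] - %m%n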
The code is as follows.
WCMapper:
package com.liu;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // 1. The Mapper stage (each input split is processed by one mapper)
    //    1) The class extends Hadoop's Mapper and declares the input key/value types
    //       (LongWritable, Text) and the output key/value types (Text, IntWritable).
    //    2) Override the map method.
    //    3) Inside map, the key is the byte offset of the current line (not a line number),
    //       the value is the content of one line of text, and the context is used to write output.

    // Reuse the output key/value objects instead of creating new ones for every record
    Text k = new Text();
    IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line into words on spaces
        String line = value.toString();
        String[] words = line.split(" ");
        // Write (word, 1) for every word
        for (String word : words) {
            k.set(word);
            context.write(k, v);
        }
    }
}
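For example, assuming a hypothetical input line "hello world hello", the map method is called once for that line and writes (hello, 1), (world, 1), (hello, 1); the framework then groups the pairs by key before they reach the reducer.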
WCReduce:
package com.liu;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class WCReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Counter for how many times this word occurred
        int sum = 0;
        // Iterate over the values and accumulate the counts for this word
        for (IntWritable num : values) {
            sum += num.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
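Continuing the hypothetical example above, the reduce method is called once per distinct word, e.g. with key "hello" and values [1, 1], and writes (hello, 2).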
WCDriver:
package com.liu;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class WCDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Create the Job
        Job job = Job.getInstance(new Configuration());
        // Set the driver class
        job.setJarByClass(WCDriver.class);
        // Set the Mapper and Reducer classes
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReduce.class);
        // Set the key/value types output by the map stage
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the key/value types output by the reduce stage
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set the input path and the output path
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
4. To run MapReduce locally, just run the main method of the Driver class. Pay attention to the program arguments: the input path and the output path are passed as args[0] and args[1], and the output path must not already exist, otherwise the job fails with an error.
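In IDEA the two program arguments could look something like D:\wc\input\words.txt D:\wc\output (hypothetical paths). If the job is rerun often, deleting the stale output directory by hand gets tedious; below is a minimal sketch of how it could be removed with Hadoop's FileSystem API before the job is submitted. OutputCleaner and deleteIfExists are hypothetical names used only for this sketch, and the recursive delete is destructive, so use it with care.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
public class OutputCleaner {
    // Hypothetical helper: delete the output directory if it already exists,
    // so the job can be rerun without failing on an existing path
    public static void deleteIfExists(Configuration conf, String dir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path output = new Path(dir);
        if (fs.exists(output)) {
            fs.delete(output, true); // true = delete recursively
        }
    }
}
Calling OutputCleaner.deleteIfExists(new Configuration(), args[1]) at the top of WCDriver's main would clear the old results before the input and output paths are set on the job.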
5. Execution result
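As a purely illustrative example, assuming the input file contained the two lines "hello world" and "hello hadoop", the part-r-00000 file in the output directory would hold the tab-separated word counts:
hadoop	1
hello	2
world	1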