1. MapReduce Example
This article implements the classic word-count example with the MapReduce Java API, covers a small code optimization, and walks through the standard structure of a MapReduce program; it is also a classic interview coding question. The code is essentially MapReduce "boilerplate": only a few parts ever need to change.
See also in this series: Hadoop之旅(4)— MapReduce 与 YARN 原理讲解, and Hadoop之旅(1)—单机与伪集群安装、简单经典案例.
Prepare the environment: IDEA, a new project, and the dependency below.
A: Dependency
```xml
<!-- Hadoop Client dependency -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
```
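The `${hadoop.version}` placeholder assumes a matching Maven property in the pom. A sketch, pinning it to the cluster version used later in this article (2.5.0); adjust it to whatever your cluster actually runs:

```xml
<properties>
    <!-- assumption: keep this in sync with the cluster's Hadoop version -->
    <hadoop.version>2.5.0</hadoop.version>
</properties>
```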
B: Start MapReduce (make sure the Hadoop services are running)
2. Code: WordCountMapReduce.class
2.1 Map phase: the code follows a fixed template, only a few parts need changing
```java
/**
 * Step 1: Map class.
 *
 * map input --> map output
 * public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
 */
public static class WordCountMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    private Text mapOutputKey = new Text();
    private final static IntWritable mapOutputValue = new IntWritable(1);

    /**
     * Called once for each key/value pair in the input split.
     */
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // get the line content
        String lineValue = value.toString();

        // StringTokenizer is cheaper than split(" "), which allocates an array
        // String[] strs = lineValue.split(" ");
        StringTokenizer stringTokenizer = new StringTokenizer(lineValue);

        // iterate over the tokens
        while (stringTokenizer.hasMoreTokens()) {
            // get the word
            String wordValue = stringTokenizer.nextToken();
            // set the output key
            mapOutputKey.set(wordValue);
            // write (word, 1) to the context
            context.write(mapOutputKey, mapOutputValue);
        }
    }
}
```
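The map logic above can be checked outside Hadoop. The sketch below (plain Java, hypothetical class name `MapSketch`) reproduces the tokenize-and-emit loop, representing each emitted (word, 1) pair as a tab-separated string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapSketch {
    // Simulates the map() body: emit one (word, 1) pair per token.
    static List<String> mapLine(String lineValue) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer stringTokenizer = new StringTokenizer(lineValue);
        while (stringTokenizer.hasMoreTokens()) {
            // each pair is rendered as "word\t1"
            pairs.add(stringTokenizer.nextToken() + "\t1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // one pair per token, duplicates included
        System.out.println(mapLine("hadoop spark hadoop"));
    }
}
```

Duplicated words each produce their own (word, 1) pair; summing them is the reducer's job.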
2.2 Reduce phase
```java
/**
 * Step 2: Reduce class.
 *
 * reduce input --> reduce output
 * public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
 */
public static class WordCountReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable outputValue = new IntWritable();

    /**
     * Called once for each key; sums the counts for that word.
     */
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // running total for this key
        int sum = 0;
        // iterate over all counts for this word
        for (IntWritable value : values) {
            sum += value.get();
        }
        // set and write the result
        outputValue.set(sum);
        context.write(key, outputValue);
    }
}
```
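The reduce step is just a sum over the values grouped under one key. A minimal standalone sketch (hypothetical class name `ReduceSketch`, plain `Integer`s standing in for `IntWritable`):

```java
import java.util.Arrays;

public class ReduceSketch {
    // Simulates reduce(): sum the counts grouped under one key.
    static int reduceCounts(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // for key "hadoop" with grouped values [1, 1, 1] the reducer emits (hadoop, 3)
        System.out.println(reduceCounts(Arrays.asList(1, 1, 1)));
    }
}
```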
2.3 Assemble the map and reduce (the driver)
```java
// Step 3: Driver, assembles the job from the map and reduce classes
public int run(String[] args) throws Exception {
    // 1: get configuration --> extends Configured --> new Configuration()
    Configuration configuration = getConf();

    // 2: create the Job
    Job job = Job.getInstance(configuration, this.getClass().getSimpleName());
    // the jar to run
    job.setJarByClass(this.getClass());

    // 3: set up the job
    // input -> map -> reduce -> output
    // 3.1: input --> args[0] (data source)
    Path inPath = new Path(args[0]);
    FileInputFormat.addInputPath(job, inPath);

    // 3.2: map
    job.setMapperClass(WordCountMapper.class);
    // map output types
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    // 3.3: reduce
    job.setReducerClass(WordCountReducer.class);
    // job output types
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // 3.4: output --> args[1] (output path)
    Path outPath = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, outPath);

    // 4: submit the job and wait for completion
    boolean isSuccess = job.waitForCompletion(true);
    return isSuccess ? 0 : 1;
}
```
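On the optimization side, the classic WordCount improvement is a map-side Combiner: because WordCountReducer consumes and produces the same (Text, IntWritable) pair, it can double as the combiner, pre-summing counts on each map task before the shuffle and reducing network traffic. A sketch of the one extra driver line (a job-configuration fragment, added in run() before submitting):

```java
// Optional optimization: pre-sum counts on the map side before the shuffle.
// Valid here only because WordCountReducer's input and output types match.
job.setCombinerClass(WordCountReducer.class);
```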
2.4 Run
```java
// Step 4: run the program
public static void main(String[] args) throws Exception {
    // 1: get configuration
    Configuration configuration = new Configuration();

    // int status = new WordCountMapReduce().run(args);
    // --> extends Configured --> implements Tool
    int status = ToolRunner.run(configuration, new WordCountMapReduce(), args);
    System.exit(status);
}
```
Note that because the ToolRunner class is used, the WordCountMapReduce class must be declared like this:

```java
public class WordCountMapReduce extends Configured implements Tool {
```
2.5 Package the program (build the jar)
2.6 Run the jar
Input path: /chenzhengyou/mapreduce/wordcount/input/idea.input
Output path: /chenzhengyou/mapreduce/wordcount/output/test01
Run the MapReduce job:

```shell
[root@czy-1 hadoop-2.5.0]# bin/hadoop jar /usr/local/chenzhengyou/hadoop/standalone/hadoop-2.5.0/jars/hadoop-mapreduce.jar /chenzhengyou/mapreduce/wordcount/input/idea.input /chenzhengyou/mapreduce/wordcount/output/test01
```
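One pitfall when re-running the job: FileOutputFormat refuses to write into an existing output directory, so a second run with the same args fails. Delete the previous output first (the path below is the output path from this example):

```shell
# Remove the old output directory before re-running the job;
# MapReduce fails if the output path already exists.
bin/hdfs dfs -rm -r /chenzhengyou/mapreduce/wordcount/output/test01
```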
View the results:

```shell
bin/hdfs dfs -text /chenzhengyou/mapreduce/wordcount/output/test01/par*
```