2020-12-04

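Write the WordCountMapper class, which splits each input line into words and emits the result to the Reduce phase as a (k,v) pair.
Make [WordCountMapper] extend the Mapper class and specify the required type parameters, then modify the map method according to the business logic as follows: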
package com.simple;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value,
            Mapper<LongWritable, Text, NullWritable, LongWritable>.Context context)
            throws IOException, InterruptedException {
        // Get the line as a string
        String valueString = value.toString();
        // Split the line on single spaces
        String[] wArr = valueString.split(" ");
        // Emit (k,v): a shared null key and the number of words in this line
        context.write(NullWritable.get(), new LongWritable(wArr.length));
    }
}
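To make the splitting rule concrete, here is a minimal plain-Java sketch of what the mapper computes for one line, with no Hadoop dependency. The class name LineWordCount and the method countByWhitespace are hypothetical and not part of the original project; countByWhitespace is only a possible alternative, not what the mapper above does.

```java
public class LineWordCount {
    // Mirrors the mapper's rule: split on a single space and count the pieces.
    public static long countBySingleSpace(String line) {
        return line.split(" ").length;
    }

    // Hypothetical alternative: trim, then split on runs of whitespace.
    public static long countByWhitespace(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String line = "hello  hadoop world"; // note the two spaces after "hello"
        System.out.println(countBySingleSpace(line)); // prints 4 (one empty token is counted)
        System.out.println(countByWhitespace(line));  // prints 3
        System.out.println(countBySingleSpace(""));   // prints 1 (an empty line yields one empty token)
    }
}
```

With split(" "), consecutive spaces and empty lines add to the count, so the total matches the true word count only if the input file separates words with exactly one space and has no blank lines.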


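Write the WordCountReducer class, which totals the number of words.
In the project's [src] directory, right-click the package [com.simple], create a new class named [WordCountReducer] that extends the Reducer class, and add the following code to it: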

package com.simple;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {
    @Override
    protected void reduce(NullWritable key, Iterable<LongWritable> v2s,
            Reducer<NullWritable, LongWritable, NullWritable, LongWritable>.Context context)
            throws IOException, InterruptedException {
        Iterator<LongWritable> it = v2s.iterator();
        // sum accumulates the total number of words
        long sum = 0;
        // Iterate over the per-line counts and add them up
        while (it.hasNext()) {
            sum += it.next().get();
        }
        context.write(NullWritable.get(), new LongWritable(sum));
    }
}
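Because every map output uses the same NullWritable key, all per-line counts end up in a single group and reduce runs once. The following plain-Java sketch imitates that grouping and summation without Hadoop. The class SingleKeyReduceSketch, the SHARED_KEY constant, and the shuffle method are hypothetical stand-ins for illustration; the real shuffle is done by the framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SingleKeyReduceSketch {
    // Stand-in for NullWritable: every map output shares this one key.
    static final String SHARED_KEY = "NULL";

    // Groups values by key, as the shuffle would; here all counts share SHARED_KEY.
    public static Map<String, List<Long>> shuffle(List<Long> perLineCounts) {
        Map<String, List<Long>> groups = new LinkedHashMap<>();
        for (Long count : perLineCounts) {
            groups.computeIfAbsent(SHARED_KEY, k -> new ArrayList<>()).add(count);
        }
        return groups;
    }

    // Same summation as the reducer: add up every value in the group.
    public static long reduce(Iterable<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed per-line word counts emitted by three map calls.
        Map<String, List<Long>> groups = shuffle(Arrays.asList(2L, 3L, 4L));
        for (List<Long> values : groups.values()) {
            System.out.println(reduce(values)); // prints 9, once, since there is one group
        }
    }
}
```

A single shared key means the job writes one number as its result, at the cost of funnelling all values through one reduce call.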


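Create the TestMapReducer class, which configures and runs the Map and Reduce code written above.
In the project's [src] directory, right-click the package [com.simple] and create a new main test class named [TestMapReducer]. The test code is as follows: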

package com.simple;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TestMapReducer {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        // Get a Job instance
        Job job = Job.getInstance(conf);
        // Set the main class
        job.setJarByClass(TestMapReducer.class);
        // Set the Mapper and Reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Set the output types of the map and reduce phases
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(LongWritable.class);
        // Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("/data/dataset/EnglishWordsCount.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/data/dataset/output/"));
        // Submit the job and wait for it to finish
        job.waitForCompletion(true);
    }
}

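After writing the mapper, the reducer, and the test code as described above, select the test class [TestMapReducer], right-click it, and choose [Run as] -> [Java Application]. Once the red button in the console is greyed out and no errors are reported, with the console output as shown in the figure below, the program has run successfully.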
