1. Data Deduplication -- Mapper class:
public class DataMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split each input line on spaces; the first field becomes the key,
        // the remaining two fields are re-joined as the value
        String[] split = value.toString().split(" ");
        context.write(new Text(split[0]), new Text(split[1] + " " + split[2]));
    }
}
2. Reducer class:
public class DataReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // All records sharing a key arrive in one reduce call;
        // concatenate their values into a single output record
        StringBuilder d = new StringBuilder();
        for (Text value : values) {
            d.append(value);
        }
        context.write(key, new Text(d.toString()));
    }
}
3. Test class:
public class Main {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(Main.class);
        job.setMapperClass(DataMapper.class);
        job.setReducerClass(DataReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Delete the output directory first: the job fails if it already exists
        File file = new File("E:\\fengzeze\\输出");
        if (file.exists()) {
            FileUtils.deleteDirectory(file);
        }
        FileInputFormat.setInputPaths(job, new Path("E:\\fengzeze\\数据"));
        FileOutputFormat.setOutputPath(job, new Path("E:\\fengzeze\\输出"));
        job.setNumReduceTasks(1);
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
Design approach:
The final goal of data deduplication is that any record appearing more than once in the raw input appears exactly once in the output. The natural idea is to route every copy of the same record to the same reduce task: no matter how many times the record occurs, it only needs to be written once in the final result. Concretely, the reduce input should use the record itself as the key, with no requirement on the value list. When reduce receives a <key, value-list> pair, it simply copies the key to the output key and sets the value to empty.
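The grouping described above can be sketched without a cluster. This is a minimal plain-Java simulation (no Hadoop dependency; the sample lines are made up): identical lines collapse into one entry, mirroring how identical keys land in a single reduce call.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    // Simulate the shuffle + reduce step: each distinct record is
    // emitted exactly once, however many times it appeared.
    static Set<String> dedup(List<String> lines) {
        Set<String> out = new LinkedHashSet<>();
        for (String line : lines) {
            out.add(line); // duplicates collapse, like repeated keys in one reduce call
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("2025-01-01 a", "2025-01-02 b", "2025-01-01 a");
        System.out.println(dedup(input)); // each record survives once
    }
}
```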
==========================================================================
1. Data Sorting with Sequence Numbers -- Mapper class:
public class DataSorting extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Emit the whole line as the key; the value carries no information
        context.write(new Text(value.toString()), NullWritable.get());
    }
}
2. Reducer class:
A for loop increments i by 1 and writes i+1 in the key position as the sequence number.
public class DataReducer extends Reducer<Text, NullWritable, Text, Text> {
    // Collect every number so they can be sorted numerically in cleanup()
    // (Text keys arrive in dictionary order, e.g. "10" before "2")
    List<Integer> lien = new ArrayList<>();

    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        lien.add(Integer.parseInt(key.toString()));
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        Collections.sort(lien);
        // Write the 1-based sequence number as the key, the sorted value beside it
        for (int i = 0; i < lien.size(); i++) {
            context.write(new Text(String.valueOf(i + 1)), new Text(String.valueOf(lien.get(i))));
        }
    }
}
3. Test class:
public class Test {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(Test.class);
        job.setMapperClass(DataSorting.class);
        job.setReducerClass(DataReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Delete the output directory first: the job fails if it already exists
        File file = new File("E:\\fengzeze\\数据排序\\输出");
        if (file.exists()) {
            FileUtils.deleteDirectory(file);
        }
        FileInputFormat.setInputPaths(job, new Path("E:\\fengzeze\\数据排序\\数据"));
        FileOutputFormat.setOutputPath(job, new Path("E:\\fengzeze\\数据排序\\输出"));
        job.setNumReduceTasks(1);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Design approach:
Before relying on MapReduce's shuffle sort, you first need to understand its default ordering rule: records are sorted by key. If the key is an IntWritable (a wrapped int), MapReduce sorts keys numerically; if the key is a Text (a wrapped String), MapReduce sorts them in dictionary order. No Combiner is used here.
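The difference between the two orderings is easy to see in plain Java (a standalone sketch, no Hadoop needed): sorting the same numbers as strings versus as integers gives different results, which is why the reducer above parses the Text keys into ints before sorting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortOrderDemo {
    public static void main(String[] args) {
        // Dictionary order, the way Text keys are compared
        List<String> asText = new ArrayList<>(List.of("2", "10", "1"));
        Collections.sort(asText);
        System.out.println(asText); // [1, 10, 2] -- "10" sorts before "2"

        // Numeric order, the way IntWritable keys are compared
        List<Integer> asInt = new ArrayList<>(List.of(2, 10, 1));
        Collections.sort(asInt);
        System.out.println(asInt); // [1, 2, 10]
    }
}
```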
==========================================================================
1. Average Score -- Mapper class:
public class AMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each line is "name score": emit the name as key, the score as value
        String[] split = value.toString().split(" ");
        context.write(new Text(split[0]), new IntWritable(Integer.parseInt(split[1])));
    }
}
2. Reducer class:
public class AReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int courseCount = 0; // number of courses
        int sum = 0;         // total score
        int average = 0;     // average score
        for (IntWritable value : values) {
            sum += value.get();
            courseCount++;
        }
        // Note: integer division truncates any fractional part of the average
        average = sum / courseCount;
        context.write(new Text(key), new IntWritable(average));
    }
}
3. Test class:
public class AvTest {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(AvTest.class);
        job.setMapperClass(AMapper.class);
        job.setReducerClass(AReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Delete the output directory first: the job fails if it already exists
        File file = new File("E:\\fengzeze\\平均成绩\\输出");
        if (file.exists()) {
            FileUtils.deleteDirectory(file);
        }
        FileInputFormat.setInputPaths(job, new Path("E:\\fengzeze\\平均成绩\\数据"));
        FileOutputFormat.setOutputPath(job, new Path("E:\\fengzeze\\平均成绩\\输出"));
        job.setNumReduceTasks(1);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
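Because sum and courseCount are both ints, the reducer's average is truncated toward zero. This standalone sketch (plain Java, made-up scores) repeats the same computation, with a floating-point version for comparison; switching the job to DoubleWritable outputs would keep the fraction.

```java
import java.util.List;

public class AverageSketch {
    public static void main(String[] args) {
        List<Integer> scores = List.of(85, 90, 78); // hypothetical scores for one student
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        // Integer division, as in AReducer: 253 / 3
        int intAverage = sum / scores.size();
        System.out.println(intAverage); // 84 -- the fraction is dropped

        // A floating-point average keeps the fraction
        double exactAverage = (double) sum / scores.size();
        System.out.println(exactAverage); // roughly 84.33
    }
}
```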