Two MapReduce examples, covering partitioning, secondary sort, Combiner, and custom data types

If this article helps you, feel free to give it a like 👍

This is just a study note; corrections are welcome. Please credit the source when reposting.

Example 1

Suppose a grade has two classes, whose data are stored in class1.csv and class2.csv. Compute the average math score for the whole grade. The first column of each file is the student ID and the second column is the math score.

Requirement: a Combiner class must be used, and the final output must be a single line containing only the average.

Partial screenshot of class1.csv and class2.csv
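The original screenshots are not reproduced here. Each file holds comma-separated "studentId,score" lines, roughly like the following (the values are illustrative, not the original data):

20180001,78
20180002,85
20180003,91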

Screenshot of the result

Project directory

Exper2Mapper.java

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Exper2Mapper extends Mapper<LongWritable, Text, NullWritable, IntWritable> {

	private IntWritable result = new IntWritable();

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {

		// Each input line looks like "studentId,score"; only the score is needed.
		String[] str = value.toString().split(",");

		// Emit every score under the same (null) key so that all scores reach one reducer.
		result.set(Integer.parseInt(str[1]));
		context.write(NullWritable.get(), result);
	}

}

 

Exper2Reducer.java

 

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class Exper2Reducer extends Reducer<NullWritable, IntWritable, NullWritable, IntWritable> {

	private IntWritable result = new IntWritable();

	@Override
	protected void reduce(NullWritable key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {

		int sum = 0;
		int count = 0;

		// All scores share the single null key; accumulate their sum and how many there are.
		for (IntWritable item : values) {
			count += 1;
			sum += item.get();
		}

		// Integer division: the average is truncated to a whole number.
		int avg = sum / count;

		result.set(avg);
		context.write(NullWritable.get(), result);
	}

}
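One caveat worth noting: because the same class serves as both Combiner and Reducer, each map-side combine already collapses its scores into a truncated local average, and the Reducer then averages those averages. That is only exact when every input split contributes the same number of records. A common combiner-safe alternative is to have the Mapper emit "score,1" pairs as Text, let the Combiner merge them into "partialSum,partialCount", and divide only once in the final Reducer. A minimal sketch of such a Combiner (the class name and value encoding are hypothetical, not part of the original project):

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical combiner: merges "sum,count" pairs so the final Reducer can compute
// an exact average with a single division (sum / count).
public class AvgCombiner extends Reducer<NullWritable, Text, NullWritable, Text> {

	@Override
	protected void reduce(NullWritable key, Iterable<Text> values, Context context)
			throws IOException, InterruptedException {
		long sum = 0;
		long count = 0;
		for (Text item : values) {
			// Each value is "partialSum,partialCount"; the Mapper would emit "score,1".
			String[] parts = item.toString().split(",");
			sum += Long.parseLong(parts[0]);
			count += Long.parseLong(parts[1]);
		}
		// The output keeps the same "sum,count" form, so combining stays associative.
		context.write(NullWritable.get(), new Text(sum + "," + count));
	}
}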

ExperA2Driver.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExperA2Driver {
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		Configuration conf=new Configuration();
		Job job=Job.getInstance(conf);
		
		job.setJarByClass(ExperA2Driver.class);
		
		job.setMapperClass(Exper2Mapper.class);

		// The Combiner reuses the Reducer class: both just average the values they receive.
		job.setCombinerClass(Exper2Reducer.class);

		job.setReducerClass(Exper2Reducer.class);

		job.setOutputKeyClass(NullWritable.class);

		job.setOutputValueClass(IntWritable.class);

		// A single reducer guarantees exactly one output line with one average value.
		job.setNumReduceTasks(1);
		
		FileInputFormat.addInputPath(job,  new Path("hdfs://localhost:9000/user/hadoop/data"));
		
		FileOutputFormat.setOutputPath(job,  new Path("hdfs://localhost:9000/output/exper2"));
		
		System.exit(job.waitForCompletion(true) ? 0 : 1);
		
	}

}
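Assuming the classes are packaged into a jar (the jar name below is made up for illustration), the job can be submitted with the standard hadoop jar command:

hadoop jar exper2.jar ExperA2Driver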

 

Example 2

 

Suppose a server records traffic data for the same website every day, specifically the maximum and minimum page views across all pages of the site. The data are stored in three files.

The data format is as follows (records are not broken down to the day):

Note: the first column is the year-month, the second column is the maximum page views observed on some day of that month, and the third column is the minimum page views observed on the same day.
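The original screenshot is not reproduced here. Each record is a space-separated "yyyy-MM max min" line, roughly like the following (the values are illustrative, chosen to be consistent with the example output below):

2017-07 900 300
2017-07 650 100
2017-08 560 200
2017-08 400 230
2018-01 720 150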

The program requirements are as follows:

Requirement 1: output the site's maximum and minimum values for each month, one line per month.

  For example, the maximum for 2017-07 is 900 and the minimum is 100; for 2017-08 the maximum is 560 and the minimum is 200.

The output format is as follows:

2017-08 560 200

2017-07 900 100

Requirement 2: a custom data type must be defined that holds the maximum and minimum page views observed on a day (custom type).

Requirement 3: a custom partition function must be defined so that all 2017 data is sent to one reducer and all 2018 data is sent to another reducer (partitioning).

Requirement 4: data within the same year must be sorted by month in descending order (secondary sort).

For example:

2017-08 560 200

2017-07 900 100

Screenshot of the result

 

Project directory

 

Exper3Container.java

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Custom value type (Requirement 2): holds the max/min page views observed on one day.
public class Exper3Container implements Writable {

	private int max;
	private int min;

	public int getMax() {
		return max;
	}
	public void setMax(int max) {
		this.max = max;
	}
	public int getMin() {
		return min;
	}
	public void setMin(int min) {
		this.min = min;
	}

	// Deserialization: fields must be read in the same order they were written.
	@Override
	public void readFields(DataInput in) throws IOException {
		max = in.readInt();
		min = in.readInt();
	}

	// Serialization used during the shuffle.
	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(max);
		out.writeInt(min);
	}

	// Printed by TextOutputFormat as "max min".
	@Override
	public String toString() {
		return max + " " + min;
	}

}

 

Exper3Driver.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;



public class Exper3Driver {
	
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		
		Configuration conf=new Configuration();
		Job job=Job.getInstance(conf);
		
		job.setJarByClass(Exper3Driver.class);
		
		job.setMapperClass(Exper3Mapper.class);
		
		// A combiner would be valid here (max/min is associative), but it is not required:
		//job.setCombinerClass(Exper3Reducer.class);

		// Two reducers: one for each year (Requirement 3).
		job.setNumReduceTasks(2);

		// Sort keys by month in descending order before they reach the reducers (Requirement 4).
		job.setSortComparatorClass(Exper3Sort.class);

		job.setReducerClass(Exper3Reducer.class);

		job.setOutputKeyClass(Text.class);

		job.setOutputValueClass(Exper3Container.class);

		// Route records to reducers by year (Requirement 3).
		job.setPartitionerClass(Exper3Partitioner.class);
			
		
		FileInputFormat.addInputPath(job,  new Path("hdfs://localhost:9000/user/hadoop/data/Exper3"));
		
		FileOutputFormat.setOutputPath(job,  new Path("hdfs://localhost:9000/output/exper3"));
		
		System.exit(job.waitForCompletion(true) ? 0 : 1);
		
	}

}

Exper3Mapper.java

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Exper3Mapper extends Mapper<LongWritable, Text, Text, Exper3Container> {

	private Exper3Container container = new Exper3Container();

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {

		// Each input line looks like "yyyy-MM max min", separated by spaces.
		String[] str = value.toString().split(" ");

		container.setMax(Integer.parseInt(str[1]));
		container.setMin(Integer.parseInt(str[2]));

		// The key is the year-month string; the value is the custom max/min container.
		Text text = new Text();
		text.set(str[0]);

		context.write(text, container);
	}

}

Exper3Partitioner.java

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class Exper3Partitioner extends Partitioner<Text, Exper3Container> {

	@Override
	public int getPartition(Text key, Exper3Container value, int numReduceTasks) {

		// The key looks like "2017-08"; partition by the year so each year goes to
		// its own reducer (with 2 reducers: 2018 -> partition 0, 2017 -> partition 1).
		String[] str = key.toString().split("-");
		int num = Integer.parseInt(str[0]);

		// Mask the sign bit to keep the value non-negative, then mod by the reducer count.
		return (num & Integer.MAX_VALUE) % numReduceTasks;
	}

}

 

Exper3Reducer.java

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Exper3Reducer extends Reducer<Text, Exper3Container, Text, Exper3Container> {

	private Exper3Container container = new Exper3Container();

	@Override
	protected void reduce(Text key, Iterable<Exper3Container> values, Context context)
			throws IOException, InterruptedException {

		// For each year-month key, keep the largest max and the smallest min of that month.
		int max = 0;
		int min = Integer.MAX_VALUE;

		for (Exper3Container item : values) {
			if (item.getMax() > max) max = item.getMax();
			if (item.getMin() < min) min = item.getMin();
		}

		container.setMax(max);
		container.setMin(min);

		// Output line: "yyyy-MM max min" (Exper3Container.toString() prints "max min").
		context.write(key, container);
	}

}

 

Exper3Sort.java


import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Sort comparator (Requirement 4): orders keys by month in descending order.
// It only compares the month part, which is safe here because the partitioner
// guarantees each reducer only ever sees keys from a single year.
public class Exper3Sort extends WritableComparator {

	public Exper3Sort() {
		// Register Text as the key type and let WritableComparator create instances.
		super(Text.class, true);
	}

	@Override
	public int compare(WritableComparable a, WritableComparable b) {

		Text t1 = (Text) a;
		Text t2 = (Text) b;

		// Keys look like "yyyy-MM"; extract the month from each key.
		String[] str1 = t1.toString().split("-");
		int num1 = Integer.parseInt(str1[1]);

		String[] str2 = t2.toString().split("-");
		int num2 = Integer.parseInt(str2[1]);

		// Larger month sorts first, i.e. descending order.
		if (num1 > num2) return -1;
		else if (num2 > num1) return 1;
		else return 0;
	}

}
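Note that this comparator looks only at the month, which works because the partitioner guarantees each reducer sees a single year. If keys from different years could ever reach the same reducer (for example with a different partitioner or a single reduce task), a comparator that sorts by year first and then by month, both descending, would be safer. A minimal sketch (the class name is hypothetical, not part of the original project):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Hypothetical variant of Exper3Sort: compares the year first, then the month,
// both in descending order, so the ordering no longer depends on the partitioner.
public class YearMonthDescSort extends WritableComparator {

	public YearMonthDescSort() {
		super(Text.class, true);
	}

	@Override
	public int compare(WritableComparable a, WritableComparable b) {
		String[] k1 = ((Text) a).toString().split("-");
		String[] k2 = ((Text) b).toString().split("-");

		// Later year first.
		int byYear = Integer.compare(Integer.parseInt(k2[0]), Integer.parseInt(k1[0]));
		if (byYear != 0) return byYear;

		// Same year: later month first.
		return Integer.compare(Integer.parseInt(k2[1]), Integer.parseInt(k1[1]));
	}
}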

 
