I. Requirements
The input data has the following format: date (year-month-day), time, temperature.
1980-12-12 14:30 25
1980-12-10 15:30 26
1981-12-11 12:30 36
1982-01-01 14:30 22
1980-05-05 15:30 26
1980-05-26 15:30 37
1980-05-06 15:30 36
1980-07-05 15:30 36
1980-07-07 12:30 40
1981-12-15 12:30 16
1982-01-11 14:30 25
1982-01-22 14:30 20
1982-01-21 14:30 19
1982-02-11 14:30 32
1982-02-21 14:30 22
1982-02-18 14:30 19
1982-02-17 14:30 17
1981-12-13 12:30 26
1981-12-14 12:30 19
1981-12-15 12:30 27
1980-07-06 12:30 30
1980-07-15 12:30 33
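The task is to find the three highest temperatures for each month of each year, writing one output file per year.
Part of the expected result is shown below: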
1980-05-26 15:30 37
1980-05-06 15:30 36
1980-05-05 15:30 26
1980-07-07 12:30 40
1980-07-05 15:30 36
1980-07-15 12:30 33
1980-12-10 15:30 26
1980-12-12 14:30 25
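II. Approach
The core of MapReduce design is choosing the <K,V> pairs so that they flow correctly from Map to Reduce.
This starts with an analysis of the requirements and the data format.
Data format: fields are separated by spaces; the date and time use a standard format; the temperature is an integer.
Requirements: statistics are grouped by year and month, and the three highest temperatures are needed for each group.
Two properties of MapReduce matter here:
1. MapReduce always sorts the map output by key before it reaches the reducer.
2. The default order is ascending: dictionary order for Text, numeric order for numeric types.
This leads to the following concrete requirements for this case:
1. A custom data type is needed.
2. Keys are sorted by year and month; when those are equal, by temperature in descending order (a secondary sort).
3. Records with the same year and month must fall into one group, which requires a custom grouping comparator.
4. One file per year requires a custom partitioner.
5. Groups keep the default ascending order (year, then month), while the order within a group is set by a custom sort comparator.
6. The output must reproduce the date, time, and temperature as they appear in the input.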
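The six points above can be previewed without a cluster. The following is a minimal plain-Java sketch, assuming the records fit in memory and have no trailing spaces: it sorts by year-month ascending and temperature descending, then keeps the first three lines of each year-month group. The class `TopThreeSketch` and its methods are hypothetical names used only for illustration; this is not the Hadoop implementation given later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only; not part of the Hadoop job.
final class TopThreeSketch {
    // "1980-05-26 15:30 37" -> "1980-05"
    static String yearMonth(String line) {
        return line.substring(0, 7);
    }

    // "1980-05-26 15:30 37" -> 37 (assumes no trailing whitespace)
    static int temperature(String line) {
        return Integer.parseInt(line.substring(line.lastIndexOf(' ') + 1));
    }

    // Sorts by year-month ascending, then temperature descending,
    // and keeps at most `limit` lines per year-month group.
    static List<String> topPerMonth(List<String> lines, int limit) {
        List<String> sorted = new ArrayList<>(lines);
        Comparator<String> byMonthAsc = Comparator.comparing(TopThreeSketch::yearMonth);
        Comparator<String> byTempDesc =
                (a, b) -> Integer.compare(temperature(b), temperature(a));
        sorted.sort(byMonthAsc.thenComparing(byTempDesc));

        List<String> result = new ArrayList<>();
        String currentGroup = null;
        int emitted = 0;
        for (String line : sorted) {
            String group = yearMonth(line);
            if (!group.equals(currentGroup)) {
                // A new year-month group starts: reset the counter.
                currentGroup = group;
                emitted = 0;
            }
            if (emitted < limit) {
                result.add(line);
                emitted++;
            }
        }
        return result;
    }
}
```

In the real job the sort is done by the shuffle, the group boundaries by the grouping comparator, and the "first three" cut by the reducer; the sketch only collapses those steps into one method.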
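Custom data type design:
1. Because the output must reproduce the input, the type has to encapsulate year, month, day, hour, minute, and temperature.
2. Because it is sorted, it must implement the WritableComparable interface and override compareTo.
3. Because of sorting, the fields are serialized and deserialized as numeric values, so that they compare by value rather than by dictionary order.
4. Override toString().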
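Point 3 is easy to overlook, so here is a small sketch of why numeric fields matter for sorting. It compares the same two temperatures once as strings (dictionary order) and once as integers. `NumericVsTextOrder` is a hypothetical class written for this illustration only.

```java
// Hypothetical illustration of dictionary order versus numeric order.
final class NumericVsTextOrder {
    // Compares two values as text, character by character.
    static int compareAsText(String a, String b) {
        return a.compareTo(b);
    }

    // Compares the same two values after parsing them as integers.
    static int compareAsNumber(String a, String b) {
        return Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
    }
}
```

With the inputs "9" and "10", the text comparison places "9" after "10" because '9' is greater than '1', while the numeric comparison places 9 before 10. Storing the temperature as an int avoids that surprise.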
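<K,V> design based on the requirements:
1. The map input <k0,v0> uses the default input format, <LongWritable,Text>, that is, <line offset, line content>.
2. The map output <k1,v1> is <custom type,IntWritable>; here I chose the temperature as v1, emitted as an IntWritable.
3. The Reduce output <k2,v2> is <custom type,NullWritable>.
The program is written in pipeline order: custom type, Mapper, partitioner, grouping comparator, sort comparator, Reducer, driver class.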
III. Implementation
1. Custom type:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Composite key that carries every field of one input record.
public class KeyPair implements WritableComparable<KeyPair> {
    private int year;
    private int month;
    private int day;
    private int hour;
    private int min;
    private int temp;

    public int getYear() {
        return year;
    }

    public void setYear(int year) {
        this.year = year;
    }

    public int getMonth() {
        return month;
    }

    public void setMonth(int month) {
        this.month = month;
    }

    public int getDay() {
        return day;
    }

    public void setDay(int day) {
        this.day = day;
    }

    public int getHour() {
        return hour;
    }

    public void setHour(int hour) {
        this.hour = hour;
    }

    public int getMin() {
        return min;
    }

    public void setMin(int min) {
        this.min = min;
    }

    public int getTemp() {
        return temp;
    }

    public void setTemp(int temp) {
        this.temp = temp;
    }

    // Natural order: year, month, then temperature, all ascending.
    // The job replaces this order with TempSort (temperature descending).
    @Override
    public int compareTo(KeyPair o) {
        int y1 = Integer.compare(this.year, o.getYear());
        if (y1 == 0) {
            int m2 = Integer.compare(this.month, o.getMonth());
            if (m2 == 0) {
                return Integer.compare(this.temp, o.getTemp());
            }
            return m2;
        }
        return y1;
    }

    // Serialization: fields are written in a fixed order as ints.
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(year);
        dataOutput.writeInt(month);
        dataOutput.writeInt(day);
        dataOutput.writeInt(hour);
        dataOutput.writeInt(min);
        dataOutput.writeInt(temp);
    }

    // Deserialization: fields must be read in the same order they were written.
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.year = dataInput.readInt();
        this.month = dataInput.readInt();
        this.day = dataInput.readInt();
        this.hour = dataInput.readInt();
        this.min = dataInput.readInt();
        this.temp = dataInput.readInt();
    }

    // Zero-pad the fields so the output matches the input, e.g. "1980-05-26 15:30 37".
    @Override
    public String toString() {
        return String.format("%04d-%02d-%02d %02d:%02d %d", year, month, day, hour, min, temp);
    }
}
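The write and readFields methods above rely only on java.io's DataOutput and DataInput, so the same round trip can be tried without Hadoop. The sketch below is a plain-Java approximation under that assumption; `RoundTripSketch` is a hypothetical class, and it works on a bare int array rather than on KeyPair itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for KeyPair.write / KeyPair.readFields.
final class RoundTripSketch {
    // Writes each field as a 4-byte int, in order.
    static byte[] write(int[] fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int field : fields) {
                out.writeInt(field);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads `count` ints back in the same order they were written.
    static int[] read(byte[] data, int count) {
        int[] fields = new int[count];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                fields[i] = in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }
}
```

Six int fields serialize to 24 bytes. If the read order ever differs from the write order, the values come back silently swapped, which is why both methods list the fields in the same sequence.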
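2. Mapper class; note the KEY and VALUE types
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TempMapper extends Mapper<LongWritable, Text, KeyPair, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString().trim();
        // Split on any run of whitespace: date, time, temperature.
        String[] ss = line.split("\\s+");
        if (ss.length < 3) {
            return; // skip blank or incomplete lines
        }
        KeyPair myKey = new KeyPair();
        // Date is yyyy-MM-dd, time is HH:mm.
        myKey.setYear(Integer.parseInt(ss[0].substring(0, 4)));
        myKey.setMonth(Integer.parseInt(ss[0].substring(5, 7)));
        myKey.setDay(Integer.parseInt(ss[0].substring(8, 10)));
        myKey.setHour(Integer.parseInt(ss[1].substring(0, 2)));
        myKey.setMin(Integer.parseInt(ss[1].substring(3, 5)));
        int temp = Integer.parseInt(ss[2]);
        myKey.setTemp(temp);
        // The temperature is carried both in the key (for sorting) and as the value.
        context.write(myKey, new IntWritable(temp));
    }
}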
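The parsing step can be checked in isolation. Below is a minimal sketch, assuming the same "yyyy-MM-dd HH:mm temperature" layout, that splits on the separators instead of using fixed substring offsets and rejects lines that do not have the expected shape. `LineParserSketch` is a hypothetical helper, not something the Mapper above uses.

```java
// Hypothetical parser for one input line; for illustration only.
final class LineParserSketch {
    // Returns {year, month, day, hour, minute, temperature}.
    static int[] parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        String[] date = parts[0].split("-");
        String[] time = parts[1].split(":");
        if (date.length != 3 || time.length != 2) {
            throw new IllegalArgumentException("bad date or time: " + line);
        }
        return new int[] {
            Integer.parseInt(date[0]),
            Integer.parseInt(date[1]),
            Integer.parseInt(date[2]),
            Integer.parseInt(time[0]),
            Integer.parseInt(time[1]),
            Integer.parseInt(parts[2])
        };
    }
}
```

Splitting on "-" and ":" tolerates fields that are not zero-padded, at the cost of a few extra array allocations per line.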
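3. Custom partitioner
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class TempPartition extends Partitioner<KeyPair, IntWritable> {
    @Override
    public int getPartition(KeyPair keyPair, IntWritable intWritable, int numPartitions) {
        // One partition per year, counted from 1980 (the first year in the data).
        // floorMod keeps the result non-negative even for years before 1980.
        return Math.floorMod(keyPair.getYear() - 1980, numPartitions);
    }
}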
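To see how years map to partitions, here is a small sketch of the same arithmetic as a plain function. `YearPartitionSketch` and its `baseYear` parameter are hypothetical names introduced for the example; the partitioner above hard-codes 1980.

```java
// Hypothetical stand-alone version of the partition arithmetic.
final class YearPartitionSketch {
    // Maps a year to a partition index in [0, numPartitions).
    static int partition(int year, int baseYear, int numPartitions) {
        // Math.floorMod never returns a negative value for a positive divisor,
        // unlike the % operator, which would yield -1 for year 1979.
        return Math.floorMod(year - baseYear, numPartitions);
    }
}
```

With three partitions, 1980, 1981, and 1982 land in partitions 0, 1, and 2. A fourth year such as 1983 wraps around to partition 0 and would share a file with 1980, so the number of reducers needs to match the number of distinct years for "one file per year" to hold.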
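4. Custom grouping comparator
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class TempGruop extends WritableComparator {
    public TempGruop() {
        // true: let the comparator create KeyPair instances for deserialization.
        super(KeyPair.class, true);
    }

    // Two keys belong to the same group when year and month are equal;
    // day, time, and temperature are ignored.
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        KeyPair key = (KeyPair) a;
        KeyPair key1 = (KeyPair) b;
        int rYear = Integer.compare(key.getYear(), key1.getYear());
        if (rYear == 0) {
            return Integer.compare(key.getMonth(), key1.getMonth());
        }
        return rYear;
    }
}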
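The effect of a grouping comparator can be sketched in plain Java: walk an already sorted list of keys and start a new group whenever the comparator reports a difference between neighbours. This is an approximation of the idea, not Hadoop's internal code; `GroupingSketch` is a hypothetical class, and each key is reduced to an int array {year, month, temp}.

```java
import java.util.List;

// Hypothetical illustration of grouping by year and month.
final class GroupingSketch {
    // Returns 0 when both keys share year and month, ignoring temp.
    static int groupCompare(int[] a, int[] b) {
        int byYear = Integer.compare(a[0], b[0]);
        return byYear != 0 ? byYear : Integer.compare(a[1], b[1]);
    }

    // Counts groups in a list that is already sorted by year and month.
    static int countGroups(List<int[]> sortedKeys) {
        int groups = 0;
        int[] previous = null;
        for (int[] key : sortedKeys) {
            if (previous == null || groupCompare(previous, key) != 0) {
                groups++; // a new (year, month) starts here
            }
            previous = key;
        }
        return groups;
    }
}
```

Each group corresponds to one call of reduce, which is why the reducer later sees all temperatures of one month in a single iteration.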
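5. Sort comparator: year and month ascending, temperature descending
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class TempSort extends WritableComparator {
    public TempSort() {
        super(KeyPair.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        KeyPair key = (KeyPair) a;
        KeyPair key1 = (KeyPair) b;
        int rYear = Integer.compare(key.getYear(), key1.getYear());
        if (rYear == 0) {
            int rMonth = Integer.compare(key.getMonth(), key1.getMonth());
            if (rMonth == 0) {
                // Swapped arguments give descending order; the temperatures are ints,
                // so Integer.compare is used instead of Double.compare.
                return Integer.compare(key1.getTemp(), key.getTemp());
            }
            return rMonth;
        }
        return rYear;
    }
}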
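The ordering rule itself can be exercised on plain arrays. The sketch below assumes each key is an int array {year, month, temp} and applies the same three-level comparison; `SortOrderSketch` is a hypothetical class for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the secondary-sort order.
final class SortOrderSketch {
    // Year ascending, month ascending, temperature descending.
    static int compare(int[] a, int[] b) {
        int byYear = Integer.compare(a[0], b[0]);
        if (byYear != 0) {
            return byYear;
        }
        int byMonth = Integer.compare(a[1], b[1]);
        if (byMonth != 0) {
            return byMonth;
        }
        // Arguments swapped on purpose: larger temperatures sort first.
        return Integer.compare(b[2], a[2]);
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<int[]> sorted(List<int[]> keys) {
        List<int[]> copy = new ArrayList<>(keys);
        copy.sort(SortOrderSketch::compare);
        return copy;
    }
}
```

Swapping the arguments is a common way to express descending order; it avoids subtracting the two values, which can overflow for extreme ints.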
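6. Reducer class (the commented-out line is an alternative to consider if you have not overridden toString in the custom type)
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class TempReducer extends Reducer<KeyPair, IntWritable, KeyPair, NullWritable> {
    // Reused output key, to avoid allocating one object per record.
    private final KeyPair myKey = new KeyPair();

    @Override
    protected void reduce(KeyPair key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable value : values) {
            // Values arrive sorted by temperature, descending; keep only the first three.
            if (++count > 3) {
                break;
            }
            // Alternative without toString(): build the output line by hand.
            //String kk = key.getYear() + "-" + key.getMonth() + "-" + key.getDay() + " " + key.getHour() + ":" + key.getMin() + " " + value.get();
            // Hadoop updates `key` as the values iterator advances,
            // so day, hour, and minute belong to the current record.
            myKey.setYear(key.getYear());
            myKey.setMonth(key.getMonth());
            myKey.setDay(key.getDay());
            myKey.setHour(key.getHour());
            myKey.setMin(key.getMin());
            myKey.setTemp(value.get());
            context.write(myKey, NullWritable.get());
        }
    }
}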
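The "first three" cut in the reducer depends only on the values arriving in descending order. A minimal sketch of that step, assuming the iterator is already sorted, looks like this; `TopNSketch` is a hypothetical name and the method works on plain integers rather than IntWritable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of taking the first n values of a sorted stream.
final class TopNSketch {
    // Consumes at most n elements and returns them in arrival order.
    static List<Integer> firstN(Iterator<Integer> sortedValues, int n) {
        List<Integer> kept = new ArrayList<>();
        while (kept.size() < n && sortedValues.hasNext()) {
            kept.add(sortedValues.next());
        }
        return kept;
    }
}
```

A month with fewer than three readings simply yields fewer rows, which matches the 1980-12 group in the expected result (two rows).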
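7. Driver class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TempDr {
    // Let exceptions propagate: continuing with a null job would only hide the real error.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "tempcount");
        job.setJarByClass(TempDr.class);
        job.setMapperClass(TempMapper.class);
        job.setMapOutputKeyClass(KeyPair.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(TempReducer.class);
        job.setOutputKeyClass(KeyPair.class);
        job.setOutputValueClass(NullWritable.class);
        job.setPartitionerClass(TempPartition.class);
        job.setNumReduceTasks(3); // number of reducers: one per year in the sample data (1980-1982)
        job.setSortComparatorClass(TempSort.class); // sort order applied before records reach the reducer
        job.setGroupingComparatorClass(TempGruop.class); // groups records by year and month
        FileInputFormat.addInputPath(job, new Path("/air/air.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/airresult"));
        boolean ok = job.waitForCompletion(true);
        // Report success or failure through the process exit code.
        System.exit(ok ? 0 : 1);
    }
}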
IV. Results