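Requirement analysis: for each year-month, find the two days with the highest temperature.
- Group the records by year and month
- Within each group, sort by temperature and take the top two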
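Before moving to Hadoop, the two steps above can be sketched in plain Java on an in-memory list. This is only an illustration under assumptions: the class `TopTwoSketch` and the method `topTwoPerMonth` are hypothetical names that do not appear in the job below, and the sketch assumes each line has a tab between the timestamp and the temperature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the MapReduce job: the same logic without Hadoop.
class TopTwoSketch {
    // Each line is assumed to look like "2020-01-04 10:22:22<TAB>100c".
    static Map<String, List<String>> topTwoPerMonth(List<String> lines) {
        // Step 1: group by year-month, keeping the highest reading of each day.
        Map<String, Map<String, Integer>> byMonth = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            String day = parts[0].substring(0, 10);  // e.g. "2020-01-04"
            String month = day.substring(0, 7);      // e.g. "2020-01"
            int temp = Integer.parseInt(parts[1].substring(0, parts[1].length() - 1));
            byMonth.computeIfAbsent(month, m -> new TreeMap<>()).merge(day, temp, Math::max);
        }
        // Step 2: sort each group by temperature, descending, and keep at most two days.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> group : byMonth.entrySet()) {
            List<Map.Entry<String, Integer>> days = new ArrayList<>(group.getValue().entrySet());
            days.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(2, days.size()); i++) {
                top.add(days.get(i).getKey() + " " + days.get(i).getValue() + "c");
            }
            result.put(group.getKey(), top);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2020-01-02 10:22:22\t1c", "2020-01-03 10:22:22\t2c",
            "2020-01-04 10:22:22\t100c", "2020-01-04 10:22:22\t4c");
        // prints {2020-01=[2020-01-04 100c, 2020-01-03 2c]}
        System.out.println(topTwoPerMonth(lines));
    }
}
```

In the MapReduce version, step 1 is what the shuffle does once the mapper emits the year-month as the key, and step 2 is the reducer's job.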
Test data (the mapper expects a tab between the timestamp and the temperature)
2020-01-02 10:22:22 1c
2020-01-03 10:22:22 2c
2020-01-04 10:22:22 100c
2020-01-04 10:22:22 4c
2020-02-01 10:22:22 7c
2020-02-02 10:22:22 9c
2020-02-03 10:22:22 11c
2020-02-04 10:22:22 1c
2019-01-02 10:22:22 1c
2019-01-03 10:22:22 2c
2019-01-04 10:22:22 4c
2019-02-01 10:22:22 7c
2019-02-04 10:22:22 72c
2019-02-02 10:22:22 9c
2018-02-03 10:22:22 111c
2018-02-04 10:22:22 1c
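Mapper class
// Imports needed by this class (they go at the top of the file)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] split = value.toString().split("\t"); // two parts: split[0] is the timestamp, split[1] is the temperature, e.g. "1c"
        if (split.length < 2) return; // skip blank or malformed lines
        String s = split[0];
        int i = s.lastIndexOf("-");
        String substring = s.substring(0, i); // year-month, e.g. "2020-01"; records with the same key are grouped automatically
        context.write(new Text(substring), new Text(split[0] + " " + split[1]));
    }
}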
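To make the mapper's output concrete, here is a small sketch of the same string handling without the Hadoop types. The class `MapperSketch` and the method `emit` are hypothetical names used only for illustration; the sketch assumes the input line contains a tab, as the mapper does.

```java
// Hypothetical helper, not part of the job: shows the key/value pair the mapper would write.
class MapperSketch {
    // Returns {key, value} for a line such as "2020-01-04 10:22:22<TAB>100c".
    static String[] emit(String line) {
        String[] split = line.split("\t");
        String timestamp = split[0];
        // Everything before the last "-" is the year-month.
        String key = timestamp.substring(0, timestamp.lastIndexOf("-"));
        // The value is re-joined with a space, not a tab.
        String value = split[0] + " " + split[1];
        return new String[] {key, value};
    }

    public static void main(String[] args) {
        String[] kv = emit("2020-01-04 10:22:22\t100c");
        System.out.println(kv[0]); // prints 2020-01
        System.out.println(kv[1]); // prints 2020-01-04 10:22:22 100c
    }
}
```

Note that the value no longer contains a tab, so anything that reads it downstream has to split on whitespace.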
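Reducer class
// Imports needed by this class (they go at the top of the file)
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public static class MyReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        Map<String, Integer> map = new HashMap<>();
        for (Text value : values) {
            String[] split = value.toString().split("\\s+"); // three parts: date, time, temperature
            int index = split[2].lastIndexOf("c"); // position of the trailing "c"
            int temp = Integer.parseInt(split[2].substring(0, index)); // numeric part as an int
            // Date and time are concatenated without a separator, so the output keys look like "2018-02-0310:22:22".
            // merge keeps the higher reading when two records share a timestamp; a plain put would keep whichever arrived last.
            map.merge(split[0] + split[1], temp, Math::max);
        }
        // Copy the entries (all from the same year-month) into a list so they can be sorted
        List<Map.Entry<String, Integer>> list = new ArrayList<>(map.entrySet());
        // Sort by temperature, descending; Integer.compare avoids the overflow risk of subtracting two ints
        list.sort((o1, o2) -> Integer.compare(o2.getValue(), o1.getValue()));
        // Emit the two highest readings, or fewer if the group has fewer than two entries
        for (int i = 0; i < Math.min(2, list.size()); i++) {
            context.write(new Text(list.get(i).getKey()), new Text(list.get(i).getValue() + "c"));
        }
    }
}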
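Problem encountered: in the reducer, split("\t") did not split the value, because the mapper joins the timestamp and the temperature with a space rather than a tab. Switching to the regex split("\\s+") fixed it.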
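The difference is easy to reproduce outside Hadoop. The following is a minimal sketch; `SplitDemo` and `countParts` are hypothetical names introduced only for this illustration.

```java
// Hypothetical demo, not part of the job: compares the two split patterns on a mapper-style value.
class SplitDemo {
    static int countParts(String value, String regex) {
        return value.split(regex).length;
    }

    public static void main(String[] args) {
        // The mapper writes the value joined with spaces, so it contains no tab.
        String value = "2020-01-04 10:22:22 100c";
        System.out.println(countParts(value, "\t"));    // prints 1: nothing to split on
        System.out.println(countParts(value, "\\s+"));  // prints 3: date, time, temperature
    }
}
```

Since "\\s+" matches any run of whitespace, it would also work if the value contained tabs.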
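Run
// Imports needed by the driver (they go at the top of the file)
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
    Configuration conf = new Configuration(); // job configuration
    Job job = Job.getInstance(conf, "number wendu");
    job.setJarByClass(MapReduce_wendu.class); // jar that contains the job classes
    job.setMapperClass(MyMapper.class); // map step
    job.setReducerClass(MyReducer.class); // reduce step; a single reduce pass is enough, so no combiner is set
    job.setOutputKeyClass(Text.class); // output key type, must match the reducer's output key
    job.setOutputValueClass(Text.class); // output value type, must match the reducer's output value
    FileInputFormat.addInputPath(job, new Path("C:\\Users\\q'q'w\\Desktop\\MapReduce\\input")); // input directory, must already exist
    Util_lgy.deleteFile(Util_lgy.file); // delete the previous output directory
    FileOutputFormat.setOutputPath(job, new Path("C:\\Users\\q'q'w\\Desktop\\MapReduce\\output")); // must not exist when the job starts; delete it before every rerun
    // Alternative: take both paths from the command line (e.g. when running against HDFS)
    // FileInputFormat.addInputPath(job, new Path(args[0]));
    // Path path = new Path(args[1]);
    // Util_lgy.delete(path); // delete the output directory in HDFS
    // FileOutputFormat.setOutputPath(job, path);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}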
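The source of `Util_lgy.deleteFile` is not shown in this post. As a rough idea of what such a helper might do for a local output directory, here is a sketch that uses only the JDK; `OutputCleaner` and `deleteRecursively` are hypothetical names, and this is an assumption about the helper's behavior, not its actual implementation. Note that `Path` here is `java.nio.file.Path`, not Hadoop's `org.apache.hadoop.fs.Path`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a local "delete the old output directory" helper.
class OutputCleaner {
    // Deletes dir and everything under it; does nothing if dir does not exist.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

For an output directory on HDFS, the usual counterpart is Hadoop's `FileSystem.delete(path, true)`, where the second argument requests a recursive delete.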
Output:
2018-02-0310:22:22 111c
2018-02-0410:22:22 1c
2019-01-0410:22:22 4c
2019-01-0310:22:22 2c
2019-02-0410:22:22 72c
2019-02-0210:22:22 9c
2020-01-0410:22:22 100c
2020-01-0310:22:22 2c
2020-02-0310:22:22 11c
2020-02-0210:22:22 9c