Combiner

1. Purpose: runs on the Mapper side

  • [It must not change the final result: max and sum qualify, but avg does not — see the sketch after this list]

  • a. Reduces the map output written to the mapper's local disk

  • b. Reduces the network traffic shuffled to the Reducer side

  • [In effect, one Reduce pass performed on the Map side]
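
A minimal sketch of why max tolerates partial aggregation but avg does not (plain Java with made-up numbers, independent of the Hadoop example below):

    import java.util.Arrays;
    import java.util.stream.IntStream;

    public class CombinerSafety {
        public static void main(String[] args) {
            int[] split1 = {10, 20}; // values seen by one map task
            int[] split2 = {30};     // values seen by another map task

            // max is associative and commutative: the max of partial maxima
            // equals the max over all values, so combining early is safe
            int combinedMax = Math.max(
                    Arrays.stream(split1).max().getAsInt(),
                    Arrays.stream(split2).max().getAsInt());
            System.out.println(combinedMax); // 30, same as max over all values

            // avg is not: averaging partial averages weights each split
            // equally instead of weighting each value equally
            double trueAvg = IntStream.concat(Arrays.stream(split1), Arrays.stream(split2))
                    .average().getAsDouble();                             // (10+20+30)/3 = 20.0
            double combinedAvg = (Arrays.stream(split1).average().getAsDouble()
                    + Arrays.stream(split2).average().getAsDouble()) / 2; // (15+30)/2 = 22.5
            System.out.println(trueAvg + " != " + combinedAvg);
        }
    }

(Emitting a sum-and-count pair instead of the average itself is the usual trick that makes avg combinable.)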

2. Temperature example

[Enabling a Combiner after the Mapper means one Reduce pass runs before the Reducer, which lowers the mapper's local disk output and reduces network traffic on the Reduce side]

  • TempMapper

    package combiner;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    public class TempMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
        // Fields reused across map() calls to avoid per-record allocation
        String line;
        String year;
        String temp;
        String quality;
        IntWritable _year = new IntWritable();
        IntWritable _temp = new IntWritable();
        int iy;
        int it;

        @Override
        protected void map(LongWritable key, Text value,
                Mapper<LongWritable, Text, IntWritable, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // Fixed-width weather record: year, temperature, and quality
            // flag live at fixed column offsets
            line = value.toString();
            year = line.substring(15, 19);
            temp = line.substring(87, 92);
            quality = line.substring(92, 93);
            iy = Integer.valueOf(year);
            it = Integer.valueOf(temp);
            // Skip the missing-value sentinel (9999) and bad quality codes
            if (Math.abs(it) != 9999 && quality.matches("[01459]")) {
                _year.set(iy);
                _temp.set(it);
                context.write(_year, _temp); // emit (year, temperature)
            }
        }
    }
    
  • TempCombiner

    package combiner;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Reducer;
    
    public class TempCombiner extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        IntWritable max_temp = new IntWritable();

        @Override
        protected void reduce(IntWritable key, Iterable<IntWritable> values,
                Reducer<IntWritable, IntWritable, IntWritable, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // Runs on the map side: collapse each year's values to a partial maximum
            int max = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                max = Math.max(max, value.get());
            }
            max_temp.set(max);
            context.write(key, max_temp);
        }
    }
    
  • TempReducer

    package combiner;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Reducer;
    
    public class TempReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        IntWritable max_temp = new IntWritable();

        @Override
        protected void reduce(IntWritable key, Iterable<IntWritable> values,
                Reducer<IntWritable, IntWritable, IntWritable, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // Same logic as the combiner: take the maximum over the
            // (already partially combined) values for each year
            int max = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                max = Math.max(max, value.get());
            }
            max_temp.set(max);
            context.write(key, max_temp);
        }
    }
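
Because taking a maximum can be re-applied to partial maxima without changing the answer, TempCombiner and TempReducer end up byte-for-byte identical; the separate class exists only to make the two roles explicit. The driver below could equally reuse the reducer:

    // Equivalent wiring: reuse the reducer class as the combiner
    job.setCombinerClass(TempReducer.class);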
    
  • TempDriver

    package combiner;
    
    import java.io.IOException;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class TempDriver {
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            // Run in local mode for this demo (no cluster needed)
            conf.set("mapreduce.framework.name", "local");
            // Delete the output directory if it already exists
            Path outPut = new Path("file:///D:/out");
            FileSystem fs = outPut.getFileSystem(conf);
            if (fs.exists(outPut)) {
                fs.delete(outPut, true);
            }
            Job job = Job.getInstance(conf);
            job.setJobName("temp");
            job.setJarByClass(TempDriver.class);
            job.setMapperClass(TempMapper.class);
            // One line is all it takes to enable the combiner
            job.setCombinerClass(TempCombiner.class);
            job.setReducerClass(TempReducer.class);
            job.setMapOutputKeyClass(IntWritable.class);
            job.setMapOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path("file:///D:/temp"));
            FileOutputFormat.setOutputPath(job, outPut);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    
  • Before enabling the Combiner:

    File System Counters
    FILE: Number of bytes read=4707063
    FILE: Number of bytes written=1333907
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=0
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=0
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=0
    Map-Reduce Framework
    Map input records=13130
    Map output records=13129
    Map output bytes=105032
    Map output materialized bytes=131302
    Input split bytes=166
    Combine input records=0
    Combine output records=0
    Reduce input groups=2
    Reduce shuffle bytes=131302
    Reduce input records=13129
    Reduce output records=2
    Spilled Records=26258
    Shuffled Maps =2
    Failed Shuffles=0
    Merged Map outputs=2
    GC time elapsed (ms)=4
    Total committed heap usage (bytes)=879230976
    Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
    File Input Format Counters
    Bytes Read=1777168
    File Output Format Counters
    Bytes Written=30

  • After enabling the Combiner:

    File System Counters
    FILE: Number of bytes read=4444523
    FILE: Number of bytes written=871031
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=0
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=0
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=0
    Map-Reduce Framework
    Map input records=13130
    Map output records=13129
    Map output bytes=105032
    Map output materialized bytes=32
    Input split bytes=166
    Combine input records=13129
    Combine output records=2
    Reduce input groups=2
    Reduce shuffle bytes=32
    Reduce input records=2
    Reduce output records=2
    Spilled Records=4
    Shuffled Maps =2
    Failed Shuffles=0
    Merged Map outputs=2
    GC time elapsed (ms)=0
    Total committed heap usage (bytes)=868220928
    Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
    File Input Format Counters
    Bytes Read=1777168
    File Output Format Counters
    Bytes Written=30
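
The key counters from the two runs above, side by side:

    Counter                          Before Combiner    After Combiner
    Map output records               13129              13129
    Map output materialized bytes    131302             32
    Combine input records            0                  13129
    Combine output records           0                  2
    Reduce shuffle bytes             131302             32
    Reduce input records             13129              2
    Spilled Records                  26258              4

The map side still produces 13,129 records either way, but the combiner collapses them to 2 before they are materialized on disk and shuffled to the reducer.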

3. Result

1901 317
1902 244

[When the natural key is turned into a composite key, grouping, partitioning, and sorting are all affected, so those operations must be overridden accordingly]

4. MapReduce flow (without Combiner)

  1. InputFormat
  2. InputSplit (input splitting)
  3. map() function
  4. Buffer (circular in-memory buffer)
  5. Partition (partitioning)
  6. Sort (quicksort)
  7. Spill to disk
  8. Merge on disk
  9. Sort (merge sort of the sorted spill files)
  10. Fetch (pull map output over HTTP) [5 concurrent fetch threads by default — see the tuning sketch after this list]
  11. Merge
  12. Sort (merge sort)
  13. reduce()
  14. OutputFormat
  15. close()
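
The fetch parallelism in step 10 is tunable. A sketch, assuming the standard mapreduce.reduce.shuffle.parallelcopies property (default 5) in the driver's Configuration:

    // Raise the number of parallel reduce-side fetch threads (default 5);
    // set on conf before Job.getInstance(conf)
    conf.set("mapreduce.reduce.shuffle.parallelcopies", "10");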

5. MapReduce flow (with Combiner)

  1. InputFormat
  2. InputSplit (input splitting)
  3. map() function
  4. Buffer (circular in-memory buffer)
  5. Partition (partitioning)
  6. Sort (quicksort)
  7. 》》combiner()
  8. Spill to disk
  9. 》》[The combiner runs once more during the merge only when there are at least 3 spill files — see the sketch after this list]
  10. Merge on disk
  11. Sort (merge sort of the sorted spill files)
  12. Fetch (pull map output over HTTP)
  13. 》》[Only when a combiner is set does it run again here, during the merge]
  14. Merge
  15. Sort (merge sort)
  16. reduce()
  17. OutputFormat
  18. close()
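
The spill-count threshold in step 9 is also configurable. A sketch, assuming it is governed by the mapreduce.map.combine.minspills property (default 3):

    // Re-run the combiner during the on-disk merge only when at least
    // this many spill files have to be merged (assumed default: 3)
    conf.set("mapreduce.map.combine.minspills", "3");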