Combiner
1. Purpose: runs on the Mapper side
【It must not change the final result: max and sum qualify, avg does not】
a. Reduces the Mapper's local disk output
b. Reduces network traffic to the Reducer
【Effectively performs one Reduce pass on the Map side】
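Why max and sum are combiner-safe while avg is not can be checked with a small standalone sketch (hypothetical, not part of the job below): combining per-partition partial results must give the same answer as processing all values at once.

```java
import java.util.Arrays;

public class CombinerSafety {
    static double avg(int[] a) {
        return Arrays.stream(a).average().getAsDouble();
    }

    public static void main(String[] args) {
        int[] part1 = {10, 20};   // values seen by mapper 1
        int[] part2 = {30};       // values seen by mapper 2

        // max: max(max(part1), max(part2)) == max(all values) -> safe
        int maxCombined = Math.max(Arrays.stream(part1).max().getAsInt(),
                                   Arrays.stream(part2).max().getAsInt());
        System.out.println(maxCombined);              // 30, same as the global max

        // avg: avg(avg(part1), avg(part2)) != avg(all values) -> NOT safe
        double avgOfAvgs = (avg(part1) + avg(part2)) / 2.0;  // (15 + 30) / 2 = 22.5
        double globalAvg = avg(new int[]{10, 20, 30});       // 60 / 3 = 20.0
        System.out.println(avgOfAvgs + " vs " + globalAvg);
    }
}
```

An avg job can still use a combiner, but only by emitting (sum, count) pairs and dividing once in the Reducer.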
2. Temperature example
【Enabling a Combiner after the Mapper means one Reduce pass runs before the Reducer, which lowers the Mapper's local disk output and cuts network traffic to the Reducer】
-
TempMapper
package combiner;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TempMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {

    private final IntWritable _year = new IntWritable();
    private final IntWritable _temp = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Fixed-width NCDC record: year at [15,19), temperature at [87,92),
        // quality code at [92,93)
        String year = line.substring(15, 19);
        String temp = line.substring(87, 92);
        String quality = line.substring(92, 93);
        int iy = Integer.parseInt(year);
        int it = Integer.parseInt(temp);
        // 9999 marks a missing reading; keep only valid quality codes
        if (Math.abs(it) != 9999 && quality.matches("[01459]")) {
            _year.set(iy);
            _temp.set(it);
            context.write(_year, _temp);
        }
    }
}
-
TempCombiner
package combiner;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class TempCombiner extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    private final IntWritable max_temp = new IntWritable();

    @Override
    protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            max = Math.max(max, value.get());
        }
        max_temp.set(max);
        context.write(key, max_temp);
    }
}
-
TempReducer
package combiner;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class TempReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    private final IntWritable max_temp = new IntWritable();

    @Override
    protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            max = Math.max(max, value.get());
        }
        max_temp.set(max);
        context.write(key, max_temp);
    }
}
-
TempDriver
package combiner;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TempDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", "local");

        // Delete the output directory if it already exists
        Path outPut = new Path("file:///D:/out");
        FileSystem fs = outPut.getFileSystem(conf);
        if (fs.exists(outPut)) {
            fs.delete(outPut, true);
        }

        Job job = Job.getInstance(conf);
        job.setJobName("temp");
        job.setJarByClass(TempDriver.class);

        job.setMapperClass(TempMapper.class);
        job.setCombinerClass(TempCombiner.class);  // enable the map-side combine
        job.setReducerClass(TempReducer.class);

        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);    // reducer output types must also
        job.setOutputValueClass(IntWritable.class);  // be declared, or writes will fail

        FileInputFormat.addInputPath(job, new Path("file:///D:/temp"));
        FileOutputFormat.setOutputPath(job, outPut);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
-
Before enabling the Combiner:
File System Counters
FILE: Number of bytes read=4707063
FILE: Number of bytes written=1333907
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=0
HDFS: Number of read operations=0
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Map-Reduce Framework
Map input records=13130
Map output records=13129
Map output bytes=105032
Map output materialized bytes=131302
Input split bytes=166
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=131302
Reduce input records=13129
Reduce output records=2
Spilled Records=26258
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=4
Total committed heap usage (bytes)=879230976
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1777168
File Output Format Counters
Bytes Written=30
-
After enabling the Combiner:
File System Counters
FILE: Number of bytes read=4444523
FILE: Number of bytes written=871031
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=0
HDFS: Number of read operations=0
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Map-Reduce Framework
Map input records=13130
Map output records=13129
Map output bytes=105032
Map output materialized bytes=32
Input split bytes=166
Combine input records=13129
Combine output records=2
Reduce input groups=2
Reduce shuffle bytes=32
Reduce input records=2
Reduce output records=2
Spilled Records=4
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=0
Total committed heap usage (bytes)=868220928
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1777168
File Output Format Counters
Bytes Written=30
3. Result
1901	317
1902	244
【When a natural key is turned into a composite key, grouping, partitioning, and sorting are all affected, so those operations must be adjusted accordingly】
4. MapReduce data flow (without Combiner)
- InputFormat
- InputSplit (input splitting)
- map() function
- Buffer (circular in-memory buffer)
- Partition (partitioning)
- Sort (quicksort)
- Spill to disk
- Merge on disk
- Sort (Collections.sort())
- fetch (pull map output over HTTP)【5 parallel copier threads by default (mapreduce.reduce.shuffle.parallelcopies)】
- Merge
- Sort (Collections.sort())
- reduce()
- OutputFormat
- close()
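The Partition step in the flow above is HashPartitioner by default; its logic can be sketched in a few lines (a standalone illustration, not Hadoop's actual class): the target reducer is derived from the key's hash, masked to stay non-negative, modulo the number of reduce tasks.

```java
public class HashPartitionSketch {
    // Same arithmetic as Hadoop's default HashPartitioner.getPartition()
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // With 2 reducers, the year keys from the example land in
        // different partitions (Integer.hashCode() is the value itself)
        System.out.println(getPartition(1901, 2));  // 1
        System.out.println(getPartition(1902, 2));  // 0
    }
}
```

Because partitioning happens before the combiner, the combiner only ever merges values that were already destined for the same reducer.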
5. MapReduce data flow (with Combiner)
- InputFormat
- InputSplit (input splitting)
- map() function
- Buffer (circular in-memory buffer)
- Partition (partitioning)
- Sort (quicksort)
- >> combiner()
- Spill to disk
- >>【During the map-side merge, the combiner runs again only when there are at least 3 spill files】
- Merge on disk
- Sort (Collections.sort())
- fetch (pull map output over HTTP)
- >>【Runs again here, during the reduce-side merge, only if a combiner is set】
- Merge
- Sort (Collections.sort())
- reduce()
- OutputFormat
- close()
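The spill-file threshold noted above is configurable. A hedged sketch for the driver (the property name `mapreduce.map.combine.minspills` and its default of 3 are from Hadoop 2.x/3.x; verify against your version's mapred-default.xml):

```java
// In TempDriver, before Job.getInstance(conf):
// run the combiner during the map-side merge only when at least
// this many spill files exist (assumed default: 3)
conf.setInt("mapreduce.map.combine.minspills", 3);
```

Raising it skips the extra combine pass for maps that spill only a few times, where the savings would not repay the cost.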