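The combine function merges the <key, value> pairs produced by a single map function (several values per key) into new <key2, value2> pairs.
These new <key2, value2> pairs are then passed to the reduce function as input. A combine function has the same signature as a reduce function.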
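Example: sum the numbers stored in three files. Each file holds one number per line, because the mapper emits every input line as a single value; the contents are shown on one line here for brevity.
file1: 1 2 3
file2: 4 5 6
file3: 7 8 9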
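import java.io.IOException;
import java.math.BigInteger;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class MyMapre06 {

    // Mapper: emits every input line under the single key "1",
    // so that all numbers end up in the same reduce group.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        private Text word = new Text();
        private Text val = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output,
                        Reporter reporter) throws IOException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return; // skip blank lines, which BigInteger could not parse later
            }
            word.set("1");
            val.set(line);
            output.collect(word, val);
        }
    }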

    // Reducer: adds up the values (here, the combiners' partial sums) of one key.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output,
                           Reporter reporter) throws IOException {
            BigInteger num = BigInteger.ZERO;
            Text v = new Text();
            while (values.hasNext()) { // sum all values that share the same key
                num = num.add(new BigInteger(values.next().toString()));
            }
            v.set(num.toString());
            output.collect(key, v); // collect the reduce output
        }
    }

    // Combiner: same logic as Reduce, but applied to the output of a single map task.
    public static class Combiner extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output,
                           Reporter reporter) throws IOException {
            BigInteger num = BigInteger.ZERO;
            Text v = new Text();
            while (values.hasNext()) { // sum all values that share the same key
                num = num.add(new BigInteger(values.next().toString()));
            }
            v.set(num.toString());
            output.collect(key, v); // collect the combiner output
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyMapre06.class);
        conf.setJobName("Sum");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Combiner.class); // use the combiner function
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));  // input directory
        FileOutputFormat.setOutputPath(conf, new Path(args[1])); // output directory
        JobClient.runJob(conf);
    }
}
After the Combiner function, file1 yields 6, file2 yields 15, and file3 yields 24.
After the Reduce function, the output key is 1 and the value is 45 (6 + 15 + 24).
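The flow above can be checked without a cluster. The following is only a plain-Java sketch, not Hadoop code: the names CombineSketch, sum, and combineThenReduce are hypothetical and appear nowhere in the job itself. It assumes each file is handled by exactly one map task and that the combiner runs exactly once per task. Hadoop does not guarantee how many times a combiner runs, so a real job may group the values differently, although a sum comes out the same either way.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Hadoop dependency) of the combine-then-reduce flow.
// All names here are hypothetical and used only for illustration.
class CombineSketch {

    // Same logic as the Combiner and Reduce classes: add up all values of one key.
    static BigInteger sum(List<String> values) {
        BigInteger total = BigInteger.ZERO;
        for (String v : values) {
            total = total.add(new BigInteger(v.trim()));
        }
        return total;
    }

    // Applies sum once per map task (the combine step), then once more
    // over the resulting partial sums (the reduce step).
    static BigInteger combineThenReduce(List<List<String>> mapOutputs) {
        List<String> partialSums = new ArrayList<>();
        for (List<String> oneMapOutput : mapOutputs) {
            partialSums.add(sum(oneMapOutput).toString());
        }
        return sum(partialSums);
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("1", "2", "3"),   // file1
                Arrays.asList("4", "5", "6"),   // file2
                Arrays.asList("7", "8", "9"));  // file3
        for (List<String> file : files) {
            System.out.println("combiner output: " + sum(file)); // 6, 15, 24
        }
        System.out.println("reduce output: " + combineThenReduce(files)); // 45
    }
}
```

Because addition is associative and commutative, summing the partial sums gives the same total as summing all nine numbers directly, which is why the same logic can serve as both combiner and reducer in this example.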
Using the combine function in Hadoop