Problems encountered:
1. A problem with Text (I still have not figured this one out)
@Override
protected void reduce(TextPair key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    Text t1 = new Text();
    Text k1 = key.getKey1();
    System.out.println("k1:" + k1);
    Iterator<Text> it = values.iterator();
    t1 = it.next();
    while (it.hasNext()) {
        Text t2 = it.next();
        // Strange problem here: new Text(t2.toString() + "\t" + t1.toString())
        // contains the value of t2 twice, not the values of t1 and t2
        context.write(k1, new Text(t2.toString() + "\t" + t1.toString()));
    }
}
Changed to the following, which works:
@Override
protected void reduce(TextPair key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    Text k1 = key.getKey1();
    Iterator<Text> it = values.iterator();
    // Copy the first value into a String before advancing the iterator
    String t1 = it.next().toString();
    while (it.hasNext()) {
        String t2 = it.next().toString();
        context.write(k1, new Text(t2 + "\t" + t1));
    }
}
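A likely explanation for the behavior in item 1: Hadoop's reduce-side value iterator reuses a single Text instance and refills it on every call to next(), so a saved reference such as t1 ends up showing whatever was read last. The sketch below reproduces the effect in plain Java, without Hadoop. MutableText and reusingIterator are hypothetical stand-ins made up for illustration; they are not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReuseDemo {
    /** Hypothetical stand-in for a mutable Writable such as Text. */
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    /** Returns the SAME instance on every next(), refilled with the next value. */
    static Iterator<MutableText> reusingIterator(List<String> data) {
        final MutableText shared = new MutableText();
        final Iterator<String> src = data.iterator();
        return new Iterator<MutableText>() {
            @Override public boolean hasNext() { return src.hasNext(); }
            @Override public MutableText next() { shared.set(src.next()); return shared; }
        };
    }

    /** Mirrors the first reducer: keeps a reference to the first value. */
    static List<String> joinKeepingReference(List<String> data) {
        Iterator<MutableText> it = reusingIterator(data);
        MutableText t1 = it.next();
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            MutableText t2 = it.next();   // t1 and t2 are the same object here
            out.add(t2 + "\t" + t1);
        }
        return out;
    }

    /** Mirrors the fixed reducer: copies the first value into a String. */
    static List<String> joinCopyingValue(List<String> data) {
        Iterator<MutableText> it = reusingIterator(data);
        String t1 = it.next().toString();
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            String t2 = it.next().toString();
            out.add(t2 + "\t" + t1);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("kaka", "201001\tabc", "201004\tjkl");
        System.out.println(joinKeepingReference(data)); // each line repeats t2
        System.out.println(joinCopyingValue(data));     // each line ends with kaka
    }
}
```

In the real reducer, copying with it.next().toString() or with new Text(it.next()) should avoid the aliasing in the same way.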
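2. job.setGroupingComparatorClass(MyGroupingComparator.class);
This call decides which keys are placed into the same iterator, that is, the same reduce() call. The key that reduce() receives is the first key of the group, so the other keys in the group do not get a reduce() call of their own. By default, only keys that are completely equal are grouped together; grouping does not follow the partitioner's result. The partitioner only sends map output to the required reducer and does not create any iterator. So, in this example, if no GroupingComparator is set, the output is: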
1003 201004 jkl 201001 abc
1005 201006 pqr 201002 def
If grouping is done on key1, records with the same key1 go into the same iterator, and the correct result comes out:
1003 201001 abc kaka
1003 201004 jkl kaka
1004 201005 mno da
1005 201002 def jue
1005 201006 pqr jue
1006 201003 ghi zhao
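The two outputs above can be reproduced with a small plain-Java simulation of how sorted keys are cut into reduce() groups. This is only a sketch and does not use Hadoop: Rec, FULL_KEY, KEY1_ONLY, and run are hypothetical names, and it assumes that the name records (kaka, da, ...) carry key2 = "0" so that they sort ahead of the data.info records with key2 = "1".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GroupingDemo {
    /** One sorted map-output record: composite key (key1, key2) plus a value. */
    static final class Rec {
        final String key1, key2, value;
        Rec(String key1, String key2, String value) {
            this.key1 = key1; this.key2 = key2; this.value = value;
        }
    }

    // Default grouping: two keys share a group only if both parts are equal.
    static final Comparator<Rec> FULL_KEY =
            Comparator.comparing((Rec r) -> r.key1).thenComparing(r -> r.key2);
    // Grouping on key1 only, as a custom grouping comparator would do.
    static final Comparator<Rec> KEY1_ONLY = Comparator.comparing((Rec r) -> r.key1);

    /** Records already sorted by (key1, key2), as the reducer would see them. */
    static List<Rec> sampleInput() {
        return Arrays.asList(
                new Rec("1003", "0", "kaka"), new Rec("1003", "1", "201001\tabc"),
                new Rec("1003", "1", "201004\tjkl"), new Rec("1004", "0", "da"),
                new Rec("1004", "1", "201005\tmno"), new Rec("1005", "0", "jue"),
                new Rec("1005", "1", "201002\tdef"), new Rec("1005", "1", "201006\tpqr"),
                new Rec("1006", "0", "zhao"), new Rec("1006", "1", "201003\tghi"));
    }

    /** Cuts the sorted records into groups and applies the reduce logic to each. */
    static List<String> run(List<Rec> sorted, Comparator<Rec> grouping) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            int j = i + 1;
            // Extend the group while adjacent keys compare as equal.
            while (j < sorted.size()
                    && grouping.compare(sorted.get(j - 1), sorted.get(j)) == 0) {
                j++;
            }
            // Same logic as reduce(): first value is t1, the rest are t2.
            Rec first = sorted.get(i);
            for (int k = i + 1; k < j; k++) {
                out.add(first.key1 + "\t" + sorted.get(k).value + "\t" + first.value);
            }
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        run(sampleInput(), FULL_KEY).forEach(System.out::println);
        System.out.println("----");
        run(sampleInput(), KEY1_ONLY).forEach(System.out::println);
    }
}
```

With FULL_KEY, the name record sits alone in its own group, so the data.info records end up joined with each other; with KEY1_ONLY, the name record leads each group and the join comes out as intended.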
3. A testing problem
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class MyMapperTest {
    @Test
    public void processesValidRecord() throws Exception {
        // Input line in the data.info format
        Text value = new Text();
        value.set("201001 1003 abc");
        // Expected composite key and value emitted by the mapper
        TextPair textPair = new TextPair();
        textPair.setKey1(new Text("1003"));
        textPair.setKey2(new Text("1"));
        Text outValue = new Text("201001" + "\t" + "abc");
        new MapDriver<LongWritable, Text, TextPair, Text>()
                .withMapper(new MyMapper())
                .withInput(new LongWritable(0), value)
                .withOutput(textPair, outValue).runTest();
    }
}
This tests the part of MyMapper that handles data.info. When I first wrote the TextPair class, I had not overridden the following:
@Override
public int hashCode() {
    // Assumes both keys hold integer strings; parseInt throws NumberFormatException otherwise
    return Integer.parseInt(this.getKey1().toString()) * 157
            + Integer.parseInt(this.getKey2().toString());
}

@Override
public boolean equals(Object obj) {
    // Guard against null and foreign types before casting
    if (!(obj instanceof TextPair)) {
        return false;
    }
    TextPair o = (TextPair) obj;
    return this.getKey1().equals(o.key1) && this.getKey2().equals(o.key2);
}

@Override
public String toString() {
    return this.getKey1().toString() + "\t" + this.getKey2().toString();
}
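Because these three methods were missing, the test could never pass, even when the actual result looked exactly the same as the expected one. After I overrode them, the test passed.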
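A plausible reason is that the test driver compares the expected and actual keys with equals(), and without an override Java falls back to Object.equals(), which only checks whether both references point to the same object. The plain-Java sketch below shows the difference; PairWithoutEquals and PairWithEquals are hypothetical stand-ins for TextPair, not the real class.

```java
import java.util.Objects;

class EqualsDemo {
    /** No overrides: equality falls back to reference identity. */
    static final class PairWithoutEquals {
        final String key1, key2;
        PairWithoutEquals(String key1, String key2) { this.key1 = key1; this.key2 = key2; }
    }

    /** Overrides equals/hashCode/toString so equal contents compare as equal. */
    static final class PairWithEquals {
        final String key1, key2;
        PairWithEquals(String key1, String key2) { this.key1 = key1; this.key2 = key2; }

        @Override public boolean equals(Object obj) {
            if (!(obj instanceof PairWithEquals)) {
                return false;
            }
            PairWithEquals o = (PairWithEquals) obj;
            return key1.equals(o.key1) && key2.equals(o.key2);
        }

        // Must agree with equals: equal pairs produce equal hash codes.
        @Override public int hashCode() { return Objects.hash(key1, key2); }

        // Readable form, handy in test failure messages.
        @Override public String toString() { return key1 + "\t" + key2; }
    }

    public static void main(String[] args) {
        System.out.println(new PairWithoutEquals("1003", "1")
                .equals(new PairWithoutEquals("1003", "1")));
        System.out.println(new PairWithEquals("1003", "1")
                .equals(new PairWithEquals("1003", "1")));
    }
}
```

The first comparison is between two distinct objects with identical contents, which is the situation an expected-versus-actual check is in.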