Today I wrote a Writable; the code is as follows:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class CFWritable implements Writable {
    private IntWritable mark; // flag
    private List<ItemWritable> items;

    public CFWritable() {
        mark = new IntWritable(0);
        items = new ArrayList<ItemWritable>(2);
    }

    public CFWritable(int mark, List<ItemWritable> items) {
        this.mark = new IntWritable(mark);
        this.items = items;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // The item count is written first so readFields knows how many items to read.
        out.writeInt(items.size());
        mark.write(out);
        for (ItemWritable item : items) {
            item.write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        int itemsSize = in.readInt();
        mark.readFields(in);
        for (int i = 0; i < itemsSize; i++) {
            ItemWritable item = new ItemWritable();
            item.readFields(in);
            items.add(item);
        }
    }

    public int getMark() {
        return mark.get();
    }

    public void setMark(int mark) {
        this.mark = new IntWritable(mark);
    }

    public List<ItemWritable> getItems() {
        return items;
    }

    public void setItems(List<ItemWritable> items) {
        this.items = items;
    }
}
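When I ran this code as a job on the cluster, the reduce phase reached 66% and then barely moved. My reasoning was that the items list in each object should never hold more than 100 entries, so the computation should not slow down. To check this, I logged some information from the program, including the size of items. The output puzzled me: the reported size of items was the running total of all the previous records.
After going through the code carefully, it suddenly clicked: during reduce, MapReduce reuses Writable instances to save time (rather than allocating a new one for each record), but my class had no mechanism for clearing items, so every call to readFields kept appending to the end of the existing list.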
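The effect can be reproduced without a cluster. The following is a minimal sketch, not Hadoop code: `ReuseDemo`, `Bag`, and `sizeAfterTwoReads` are hypothetical names invented for illustration, and plain `java.io` streams stand in for the framework. It deserializes two records into the same instance, once without clearing the list and once with clearing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {

    // Hypothetical stand-in for a Writable that holds a list (not a Hadoop class).
    static class Bag {
        final List<Integer> items = new ArrayList<>();
        final boolean clearFirst;

        Bag(boolean clearFirst) {
            this.clearFirst = clearFirst;
        }

        void readFields(DataInput in) throws IOException {
            if (clearFirst) {
                items.clear(); // drop whatever the previous record left behind
            }
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                items.add(in.readInt());
            }
        }
    }

    // Serializes one record: the element count followed by the elements.
    static byte[] serialize(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (int v : values) {
            out.writeInt(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads a 3-element record and then a 2-element record into ONE reused instance.
    static int sizeAfterTwoReads(boolean clearFirst) {
        try {
            Bag reused = new Bag(clearFirst);
            reused.readFields(new DataInputStream(new ByteArrayInputStream(serialize(1, 2, 3))));
            reused.readFields(new DataInputStream(new ByteArrayInputStream(serialize(4, 5))));
            return reused.items.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterTwoReads(false)); // prints 5: the second record was appended
        System.out.println(sizeAfterTwoReads(true));  // prints 2: only the second record remains
    }
}
```

Without the clear, the second read leaves five elements in the list instead of two, which matches the accumulated sizes seen in the logs.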
Once I had found the problem, I changed the code as follows:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class CFWritable implements Writable {
    private IntWritable mark; // flag
    private List<ItemWritable> items;

    public CFWritable() {
        mark = new IntWritable(0);
        items = new ArrayList<ItemWritable>(2);
    }

    public CFWritable(int mark, List<ItemWritable> items) {
        this.mark = new IntWritable(mark);
        this.items = items;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // The item count is written first so readFields knows how many items to read.
        out.writeInt(items.size());
        mark.write(out);
        for (ItemWritable item : items) {
            item.write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        clear(); // first clear the values left over from the previous call
        int itemsSize = in.readInt();
        mark.readFields(in);
        for (int i = 0; i < itemsSize; i++) {
            ItemWritable item = new ItemWritable();
            item.readFields(in);
            items.add(item);
        }
    }

    public void clear() {
        items.clear();
    }

    public int getMark() {
        return mark.get();
    }

    public void setMark(int mark) {
        this.mark = new IntWritable(mark);
    }

    public List<ItemWritable> getItems() {
        return items;
    }

    public void setItems(List<ItemWritable> items) {
        this.items = items;
    }
}