Hadoop: the problem with iterating over values in the reduce method

qq_42506914

Article link: https://blog.csdn.net/qq_42506914/article/details/86168718

Reduce join example

In this example, the reduce method has to iterate over the values and save them in a collection.

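import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;

// Assumed to be Apache Commons BeanUtils, whose copyProperties(dest, orig)
// throws the two checked exceptions handled below.
import org.apache.commons.beanutils.BeanUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class ReduceJoinReducer extends Reducer<IntWritable, OrderDetailBean, OrderDetailBean, NullWritable> {
    @Override
    protected void reduce(IntWritable key, Iterable<OrderDetailBean> values, Context context) throws IOException, InterruptedException {

        List<OrderDetailBean> orders = new ArrayList<>();
        String pname = null;
        for (OrderDetailBean value : values) {

            if ("order".equals(value.getTable())) {
                // Why not add value to the orders list directly?
                // Hadoop reuses the same value object, so every element of the list
                // would end up holding the data of the last record read.
                // The copy target must be a new bean per record for the same reason.
                OrderDetailBean detailBean = new OrderDetailBean();
                try {
                    BeanUtils.copyProperties(detailBean, value);
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IOException("Failed to copy order record", e);
                }
                orders.add(detailBean);
            }
            if ("pd".equals(value.getTable())) {
                pname = value.getPname();
            }
        }

        // Fill in the product name on every buffered order and emit it.
        for (OrderDetailBean order : orders) {
            order.setPname(pname);
            context.write(order, NullWritable.get());
        }
    }
}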

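The code uses BeanUtils to copy the properties into a separate object, which is then added to the collection.

Why not simply call orders.add(value)?

If you do that, every element in the collection ends up the same. value is only a reference: its address never changes, but the contents behind it change on every iteration. For the same reason, the copy has to go into a new object for each record; copying into one shared object and adding it repeatedly would reproduce the problem.

As a result, every element of the list ends up holding the data of the last value.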
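The effect can be reproduced without Hadoop. The following is a minimal sketch, not Hadoop code: `Order`, `ReusedValueDemo`, and the copy constructor are hypothetical names made up for illustration, and the iterator simply imitates the reuse behaviour described above by refilling one shared instance on every call to next().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for OrderDetailBean, reduced to a single field.
class Order {
    String id;

    Order() {}

    // Copy constructor: creates an independent object with the same data.
    Order(Order other) {
        this.id = other.id;
    }
}

class ReusedValueDemo {
    // Imitates the described behaviour: one instance, refilled on every next().
    static Iterable<Order> reusingIterable(String... ids) {
        return () -> new Iterator<Order>() {
            private final Order shared = new Order();
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < ids.length;
            }

            @Override
            public Order next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.id = ids[index++];
                return shared; // always the same reference
            }
        };
    }

    // Adds the reference itself: the list holds the same object N times.
    static List<String> collectByReference(String... ids) {
        List<Order> orders = new ArrayList<>();
        for (Order value : reusingIterable(ids)) {
            orders.add(value);
        }
        return idsOf(orders);
    }

    // Adds a fresh copy per record: each element keeps its own data.
    static List<String> collectByCopy(String... ids) {
        List<Order> orders = new ArrayList<>();
        for (Order value : reusingIterable(ids)) {
            orders.add(new Order(value));
        }
        return idsOf(orders);
    }

    private static List<String> idsOf(List<Order> orders) {
        List<String> result = new ArrayList<>();
        for (Order order : orders) {
            result.add(order.id);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference("o1", "o2", "o3")); // [o3, o3, o3]
        System.out.println(collectByCopy("o1", "o2", "o3"));      // [o1, o2, o3]
    }
}
```

In a real reducer, the copy could be made with BeanUtils as in the example above, or with a copy constructor like the one sketched here; what matters is that each element in the list is a distinct object.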

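Under the hood, this for-each loop calls next() on the iterator obtained from values, which in turn relies on nextKeyValue(). The related nextKey() method, which also advances the input, looks like this: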

public boolean nextKey() throws IOException,InterruptedException {
    while (hasMore && nextKeyIsSame) {
      nextKeyValue();
    }
    if (hasMore) {
      if (inputKeyCounter != null) {
        inputKeyCounter.increment(1);
      }
      return nextKeyValue();
    } else {
      return false;
    }
}

It calls the nextKeyValue() method:

private RawKeyValueIterator input;
private KEYIN key;
private VALUEIN value;

public boolean nextKeyValue() throws IOException, InterruptedException {
    if (!hasMore) {
      key = null;
      value = null;
      return false;
    }
    firstValue = !nextKeyIsSame;
    DataInputBuffer nextKey = input.getKey();
    currentRawKey.set(nextKey.getData(), nextKey.getPosition(), 
                      nextKey.getLength() - nextKey.getPosition());
    buffer.reset(currentRawKey.getBytes(), 0, currentRawKey.getLength());
    key = keyDeserializer.deserialize(key);
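    // input is the RawKeyValueIterator field declared above; it iterates over the raw key/value data
    DataInputBuffer nextVal = input.getValue();
    // point the buffer at the raw bytes of the value that was just fetched
    buffer.reset(nextVal.getData(), nextVal.getPosition(), nextVal.getLength()
        - nextVal.getPosition());
    // the existing value object is passed in, so it can be refilled instead of replaced
    value = valueDeserializer.deserialize(value);

    // ...
    inputValueCounter.increment(1);
    return true;
}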

The reset method:

public void reset(byte[] input, int start, int length) {
    this.buf = input;
    this.count = start + length;
    this.mark = start;
    this.pos = start;
}
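To see why `value = valueDeserializer.deserialize(value)` keeps the same reference, here is a simplified model of that step. It is a sketch under assumptions, not Hadoop's actual code: `NameValue` and `DeserializeReuseDemo` are hypothetical names, and the deserializer is assumed to behave in the Writable style, refilling the object it is given and creating a new one only when it receives null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type with a Writable-style readFields method.
class NameValue {
    String name;

    void readFields(DataInputStream in) throws IOException {
        name = in.readUTF(); // overwrites the fields of this instance
    }
}

class DeserializeReuseDemo {
    // Simplified model of deserialize(value): reuse the instance if one is passed in.
    static NameValue deserialize(NameValue reuse, byte[] bytes) {
        NameValue target = (reuse == null) ? new NameValue() : reuse;
        try {
            target.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    // Produces the raw bytes that a record with the given name would have.
    static byte[] serialize(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            new DataOutputStream(out).writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        NameValue value = null;
        value = deserialize(value, serialize("first"));  // first call creates the object
        NameValue firstRef = value;
        value = deserialize(value, serialize("second")); // second call refills it

        System.out.println(firstRef == value); // true: same object both times
        System.out.println(firstRef.name);     // second: the old data is gone
    }
}
```

Under this model, only the bytes behind the buffer change between records, while the object handed to the reduce loop stays the same, which matches the behaviour observed with orders.add(value).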