Hadoop Serialization

weixin_30888027

Tags: big data, java

Original link: http://www.cnblogs.com/tyler-jin/p/10424621.html

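Hadoop provides the Writable interface for serialization: the write method writes an object's data to a stream, and the readFields method reads it back from a stream.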

public interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}
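As a minimal sketch of this contract, the class below round-trips a record through a byte array. To keep it runnable without Hadoop on the classpath, it declares a local interface with the same two methods instead of importing org.apache.hadoop.io.Writable; PointWritable is a hypothetical type used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTrip {
    // Local stand-in with the same shape as org.apache.hadoop.io.Writable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record type with two int fields.
    static class PointWritable implements Writable {
        int x;
        int y;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(x); // fields are written in a fixed order...
            out.writeInt(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readInt(); // ...and must be read back in the same order
            y = in.readInt();
        }
    }

    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    static void deserialize(byte[] data, Writable w) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        PointWritable p = new PointWritable();
        p.x = 3;
        p.y = -7;
        byte[] data = serialize(p);
        PointWritable q = new PointWritable();
        deserialize(data, q);
        System.out.println(data.length + " bytes, x=" + q.x + ", y=" + q.y); // 8 bytes, x=3, y=-7
    }
}
```

Only the field values are written (two 4-byte ints, 8 bytes in total), with no type information, so readFields has to consume the fields in exactly the order write produced them.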

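Hadoop provides Writable implementations for the commonly used Java types. Taking BooleanWritable as an example, a primitive-backed type simply writes and reads its value: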

  @Override
  public void readFields(DataInput in) throws IOException {
    value = in.readBoolean();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeBoolean(value);
  }
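For a primitive-backed type the serialized form is just the value itself. The sketch below makes the same DataOutput.writeBoolean call as the write method above and shows that a boolean occupies a single byte, 1 for true and 0 for false, which is the encoding the JDK documents for writeBoolean. This one-byte layout is what makes a direct byte comparison possible for BooleanWritable.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class BooleanBytes {
    // Encodes a boolean with the single writeBoolean call BooleanWritable.write makes.
    static byte[] encode(boolean value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBoolean(value); // writes one byte: 1 for true, 0 for false
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(encode(true)));  // [1]
        System.out.println(Arrays.toString(encode(false))); // [0]
    }
}
```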

BooleanWritable implements the WritableComparable interface (which extends both Writable and Comparable), giving it the ability to be compared.
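A sketch of what implementing WritableComparable involves, again using local stand-ins for Hadoop's interfaces so that it compiles on its own. BoolKey is a hypothetical, simplified BooleanWritable; in this sketch false sorts before true, which matches the byte encoding (0 before 1).

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ComparableDemo {
    // Local stand-ins shaped like org.apache.hadoop.io.Writable / WritableComparable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    interface WritableComparable<T> extends Writable, Comparable<T> {
    }

    // Hypothetical simplified BooleanWritable.
    static class BoolKey implements WritableComparable<BoolKey> {
        boolean value;

        BoolKey(boolean value) {
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeBoolean(value);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readBoolean();
        }

        @Override
        public int compareTo(BoolKey other) {
            return Boolean.compare(value, other.value); // false < true
        }

        @Override
        public String toString() {
            return Boolean.toString(value);
        }
    }

    public static void main(String[] args) {
        List<BoolKey> keys = new ArrayList<>(Arrays.asList(
                new BoolKey(true), new BoolKey(false), new BoolKey(true)));
        Collections.sort(keys); // sorts via compareTo
        System.out.println(keys); // [false, true, true]
    }
}
```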

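Hadoop also defines the RawComparator interface, which compares records while they are still in serialized (byte) form in the stream, without deserializing them into objects first; this makes comparison more efficient.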

public interface RawComparator<T> extends Comparator<T> {
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
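As an illustration of a raw comparator, the sketch below compares 4-byte big-endian ints (the layout DataOutput.writeInt produces) by decoding them straight from the byte arrays, so no key objects are created. The RawComparator interface here is a local stand-in with the same shape as Hadoop's, and IntRawComparator is hypothetical, although Hadoop's IntWritable ships a comparator along similar lines.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

public class RawIntComparatorDemo {
    // Local stand-in with the same shape as org.apache.hadoop.io.RawComparator.
    interface RawComparator<T> extends Comparator<T> {
        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    // Hypothetical comparator for ints serialized with DataOutput.writeInt.
    static class IntRawComparator implements RawComparator<Integer> {
        // Decodes a big-endian int directly from the buffer at offset s.
        static int readInt(byte[] b, int s) {
            return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
                    | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return Integer.compare(readInt(b1, s1), readInt(b2, s2));
        }

        @Override
        public int compare(Integer a, Integer b) {
            return Integer.compare(a, b);
        }
    }

    public static void main(String[] args) {
        byte[] minusOne = ByteBuffer.allocate(4).putInt(-1).array(); // big-endian by default
        byte[] two = ByteBuffer.allocate(4).putInt(2).array();
        IntRawComparator cmp = new IntRawComparator();
        System.out.println(cmp.compare(minusOne, 0, 4, two, 0, 4)); // -1
    }
}
```

Decoding is needed here because a plain unsigned byte-by-byte comparison would put -1 (0xFFFFFFFF) after 2 (0x00000002).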

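Hadoop provides WritableComparator as a basic implementation of RawComparator. Its default byte-level compare deserializes both keys and then compares the resulting objects: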

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    try {
      buffer.reset(b1, s1, l1);                   // parse key1
      key1.readFields(buffer);
      
      buffer.reset(b2, s2, l2);                   // parse key2
      key2.readFields(buffer);
      
      buffer.reset(null, 0, 0);                   // clean up reference
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    
    return compare(key1, key2);                   // compare them
  }
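The default path can be sketched as follows: deserialize both byte ranges into two reusable key objects, then fall back to compareTo. This is a simplification that wraps each range in a new DataInputStream instead of reusing a buffer through reset as the code above does; the stand-in interfaces and the TextKey type are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.function.Supplier;

public class DeserializingComparatorDemo {
    // Local stand-ins shaped like Hadoop's Writable / WritableComparable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    interface WritableComparable<T> extends Writable, Comparable<T> {
    }

    // Generic fallback: deserialize both keys, then compare the objects.
    static class DeserializingComparator<T extends WritableComparable<T>> {
        private final T key1;
        private final T key2;

        DeserializingComparator(Supplier<T> factory) {
            key1 = factory.get(); // created once, reused on every call
            key2 = factory.get();
        }

        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            try {
                key1.readFields(new DataInputStream(new ByteArrayInputStream(b1, s1, l1)));
                key2.readFields(new DataInputStream(new ByteArrayInputStream(b2, s2, l2)));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            return key1.compareTo(key2);
        }
    }

    // Hypothetical key type holding a single string.
    static class TextKey implements WritableComparable<TextKey> {
        String value = "";

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(value);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readUTF();
        }

        @Override
        public int compareTo(TextKey other) {
            return value.compareTo(other.value);
        }
    }

    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        TextKey a = new TextKey();
        a.value = "apple";
        TextKey b = new TextKey();
        b.value = "banana";
        byte[] ba = serialize(a);
        byte[] bb = serialize(b);
        DeserializingComparator<TextKey> cmp = new DeserializingComparator<>(TextKey::new);
        System.out.println(cmp.compare(ba, 0, ba.length, bb, 0, bb.length) < 0); // true
    }
}
```

This works for any key type but pays for a full deserialization on every comparison, which is why types such as BooleanWritable override the byte-level method.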

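BooleanWritable, in turn, defines a nested Comparator class that extends WritableComparator and overrides the byte-level compare to compare the serialized bytes directly; a static initializer registers it with WritableComparator: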

  public static class Comparator extends WritableComparator {
    public Comparator() {
      super(BooleanWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
      return compareBytes(b1, s1, l1, b2, s2, l2);
    }
  }
  // 注册
  static {
    WritableComparator.define(BooleanWritable.class, new Comparator());
  }
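compareBytes orders the two byte ranges lexicographically, treating each byte as unsigned. The loop below is a simple model of that ordering, not Hadoop's own code (which is optimized). Because false is written as 0 and true as 1, comparing the raw bytes yields the same order as comparing the deserialized values.

```java
public class CompareBytesDemo {
    // Lexicographic comparison of unsigned bytes; a proper prefix sorts first.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff; // mask to compare as unsigned
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] falseBytes = {0}; // writeBoolean(false)
        byte[] trueBytes = {1};  // writeBoolean(true)
        System.out.println(compareBytes(falseBytes, 0, 1, trueBytes, 0, 1)); // -1
    }
}
```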

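The comparator registered for a specific class can be obtained with WritableComparator.get. If none has been registered yet, get forces the class to initialize (so that its static registration block runs) and, if there is still no entry, falls back to a default WritableComparator: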

  public static WritableComparator get(
      Class<? extends WritableComparable> c, Configuration conf) {
    WritableComparator comparator = comparators.get(c);
    if (comparator == null) {
      forceInit(c);
      comparator = comparators.get(c);
      if (comparator == null) {
        comparator = new WritableComparator(c, conf, true);
      }
    }
    ReflectionUtils.setConf(comparator, conf);
    return comparator;
  }
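The define/get pattern amounts to a class-keyed registry with lazy class initialization. The sketch below is hypothetical and uses plain java.util.Comparator in place of WritableComparator; forceInit calls Class.forName with initialize set to true, which runs the class's static blocks, so a comparator registered there becomes visible on the second lookup.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ComparatorRegistryDemo {
    // Hypothetical registry keyed by class, mirroring define/get.
    private static final Map<Class<?>, Comparator<?>> COMPARATORS = new ConcurrentHashMap<>();

    static <T> void define(Class<T> c, Comparator<T> comparator) {
        COMPARATORS.put(c, comparator);
    }

    @SuppressWarnings("unchecked")
    static <T extends Comparable<T>> Comparator<T> get(Class<T> c) {
        Comparator<?> found = COMPARATORS.get(c);
        if (found == null) {
            forceInit(c); // runs c's static blocks, which may call define(...)
            found = COMPARATORS.get(c);
        }
        if (found == null) {
            return Comparator.naturalOrder(); // fallback when nothing was registered
        }
        return (Comparator<T>) found;
    }

    // Loads and initializes the class so its static initializers run.
    static void forceInit(Class<?> c) {
        try {
            Class.forName(c.getName(), true, c.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Can't initialize class " + c, e);
        }
    }

    // Registers its own comparator in a static block, like BooleanWritable.
    static class Key implements Comparable<Key> {
        static {
            define(Key.class, Comparator.comparingInt((Key k) -> k.v).reversed());
        }

        final int v;

        Key(int v) {
            this.v = v;
        }

        @Override
        public int compareTo(Key other) {
            return Integer.compare(v, other.v);
        }
    }

    public static void main(String[] args) {
        Comparator<Key> cmp = get(Key.class);
        System.out.println(cmp.compare(new Key(1), new Key(2))); // 1: the registered comparator is used
    }
}
```

Key registers a reversed comparator only so the output reveals which comparator was returned: natural order would give a negative value instead. A class literal such as Key.class does not by itself initialize the class, which is why the explicit forceInit step is needed.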

Hadoop also provides the general-purpose ObjectWritable class, with the following fields:

private Class declaredClass;
private Object instance;
private Configuration conf;

declaredClass: the declared class of the object
instance: the object instance
conf: the Configuration used at runtime
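ObjectWritable can be used for Hadoop's remote procedure calls (RPC) and for serializing objects of different types into the same field. However, as a general-purpose implementation, it writes the class name as a string into every serialized key-value pair, which carries a significant performance cost. Hadoop therefore also provides GenericWritable.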
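To make the overhead concrete, the sketch below writes one record as a class name followed by an int payload. It uses writeUTF for the name purely for illustration (ObjectWritable's actual wire format may differ in detail), but the point carries over: the tag costs far more than the value it describes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ClassNameTagDemo {
    // Writes one record as [class name][int payload]; writeUTF stores a
    // 2-byte length followed by the encoded characters.
    static byte[] writeTagged(String className, int payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(className);
            out.writeInt(payload);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] record = writeTagged(Integer.class.getName(), 42);
        System.out.println(record.length); // 23
    }
}
```

For java.lang.Integer (17 ASCII characters) the record is 2 + 17 + 4 = 23 bytes, of which only 4 are data.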
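When only a small number of types are involved, GenericWritable can be used instead: all possible types are declared up front, and each record stores only a type index that is used to look up the class, reducing the amount of data sent over the network. GenericWritable has an obvious limitation, however: it is not suitable when the types are not known in advance or when there are very many of them.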
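A sketch of the type-index idea: writer and reader share a fixed, ordered type table, and each record starts with a single byte giving the class's position in that table. The table and the write/read helpers are hypothetical; in Hadoop, a GenericWritable subclass supplies its table by overriding getTypes().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class TypeIndexDemo {
    // Hypothetical type table shared by writer and reader; a class's position
    // in this list is the one-byte tag written before its value.
    static final List<Class<?>> TYPES = Arrays.asList(Integer.class, Long.class, String.class);

    static byte[] write(Object value) throws IOException {
        int index = TYPES.indexOf(value.getClass());
        if (index < 0) {
            throw new IllegalArgumentException("Type not in table: " + value.getClass().getName());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(index); // one byte instead of the full class name
            if (value instanceof Integer) {
                out.writeInt((Integer) value);
            } else if (value instanceof Long) {
                out.writeLong((Long) value);
            } else {
                out.writeUTF((String) value);
            }
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Class<?> type = TYPES.get(in.readByte()); // look the class up by its index
            if (type == Integer.class) {
                return in.readInt();
            } else if (type == Long.class) {
                return in.readLong();
            }
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] record = write(42);
        System.out.println(record.length + " bytes -> " + read(record)); // 5 bytes -> 42
    }
}
```

The same int record now takes 5 bytes (a 1-byte index plus a 4-byte int), and the limitation described above shows up directly: a value whose class is not in the table, such as a Double, cannot be written at all.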
