Hadoop Serialization Framework

chuntongshen5917

Tags: big data, Java

Original link: https://my.oschina.net/zwf007/blog/891064

Serialization and deserialization

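Serialization encodes an object into a byte stream according to a defined format, so that the object can be stored on disk, sent over a network, or copied and cloned. Deserialization is the reverse: it parses the stored byte stream back into an object.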

Java serialization

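Serialization interface: Serializable

Input and output: ObjectInputStream.readObject() and ObjectOutputStream.writeObject()

Serialized content: the object's class, the class signature, the values of all non-static, non-transient fields, the state of all serializable superclasses, and any other objects it references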
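
The round trip described above can be sketched with the standard JDK classes only. The Point class and the helper names toBytes and fromBytes are made up for this illustration and do not come from the original article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JavaSerializationDemo {

    // Hypothetical example type; any class implementing Serializable works.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Encode an object into a byte array with ObjectOutputStream.writeObject().
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Rebuild an object from bytes with ObjectInputStream.readObject().
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toBytes(new Point(3, 4));
        Point copy = (Point) fromBytes(bytes);
        System.out.println(copy.x + "," + copy.y + " from " + bytes.length + " bytes");
    }
}
```

Running it prints the byte count as well, which makes the overhead visible: the stream carries the class description in addition to the two int fields.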

Hadoop serialization

The Writable interface

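import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Writable {
  /**
   * Serialize the fields of this object to <code>out</code>.
   */
  void write(DataOutput out) throws IOException;

  /**
   * Deserialize the fields of this object from <code>in</code>.
   * For efficiency, implementations should reuse the existing
   * object where possible.
   */
  void readFields(DataInput in) throws IOException;
}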
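
As a minimal sketch of how a type might implement this contract, the example below declares a local Writable interface with the same two methods, so that it compiles without Hadoop on the classpath. The PageView class and the serialize and deserialize helpers are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableDemo {

    // Local stand-in mirroring the two methods of Hadoop's Writable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record type with two fields.
    static class PageView implements Writable {
        String url = "";
        long count;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(url);
            out.writeLong(count);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Overwrite the existing fields so that one instance can be reused.
            url = in.readUTF();
            count = in.readLong();
        }
    }

    // Write the object's fields into a fresh byte array.
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        w.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    // Fill an existing object from previously serialized bytes.
    static void deserialize(Writable target, byte[] bytes) throws IOException {
        target.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
    }

    public static void main(String[] args) throws IOException {
        PageView original = new PageView();
        original.url = "/index.html";
        original.count = 42L;
        byte[] bytes = serialize(original);

        PageView reused = new PageView();
        deserialize(reused, bytes);
        System.out.println(reused.url + " " + reused.count + " (" + bytes.length + " bytes)");
    }
}
```

Unlike Java serialization, only the field values reach the stream here: no class description is written, and the reader must already know which type to populate.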

Several important related interfaces and classes:

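WritableComparable: a serialization interface that also supports comparison. It extends both Writable and Comparable. ByteWritable, IntWritable, DoubleWritable and the other Writable types that correspond to Java primitive types all implement it.

RawComparator: allows objects to be compared directly in their serialized form, as read from the stream, without deserializing them first.

WritableComparator: a general-purpose implementation of RawComparator.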
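
The idea behind raw comparison can be illustrated without Hadoop: two ints are serialized with DataOutput.writeInt and then ordered by decoding them straight from the byte arrays, with no intermediate objects. The class and helper names are assumptions for this sketch. Only the (buffer, start, length) parameter layout follows RawComparator.compare.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RawCompareDemo {

    // Serialize an int the way DataOutput.writeInt does: 4 bytes, big-endian.
    static byte[] serialize(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new DataOutputStream(buffer).writeInt(value);
        return buffer.toByteArray();
    }

    // Decode a big-endian int directly from the buffer at the given offset.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             | (bytes[start + 3] & 0xff);
    }

    // Compare two serialized ints in place. The lengths are unused here
    // because an int always occupies exactly 4 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) throws IOException {
        byte[] a = serialize(-5);
        byte[] b = serialize(7);
        // A negative result means the first value sorts before the second.
        System.out.println(compare(a, 0, a.length, b, 0, b.length));
    }
}
```

Skipping object creation matters most during sorting, where the same keys are compared many times.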

Example: the ObjectWritable class

Main fields

  // Class of the object to be serialized and deserialized
  private Class declaredClass;
  // The wrapped object instance
  private Object instance;
  private Configuration conf;

Serialization methods

    @Override
    public void write(DataOutput out) throws IOException {
        writeObject(out, instance, declaredClass, conf);
    }
    public static void writeObject(DataOutput out, Object instance,
                                 Class declaredClass, 
                                 Configuration conf) throws IOException {
        writeObject(out, instance, declaredClass, conf, false);
    }

    public static void writeObject(DataOutput out, Object instance,
        Class declaredClass, Configuration conf, boolean allowCompactArrays) 
    throws IOException {
    // If the instance is null, substitute a NullInstance placeholder
    if (instance == null) {                       // null
      instance = new NullInstance(declaredClass, conf);
      declaredClass = Writable.class;
    }
    // Check whether this is an array of primitives
    // Special case: must come before writing out the declaredClass.
    // If this is an eligible array of primitives,
    // wrap it in an ArrayPrimitiveWritable$Internal wrapper class.
    if (allowCompactArrays && declaredClass.isArray()
        && instance.getClass().getName().equals(declaredClass.getName())
        && instance.getClass().getComponentType().isPrimitive()) {
      instance = new ArrayPrimitiveWritable.Internal(instance);
      declaredClass = ArrayPrimitiveWritable.Internal.class;
    }

    UTF8.writeString(out, declaredClass.getName()); // always write declared
    
    if (declaredClass.isArray()) {     // non-primitive or non-compact array
      int length = Array.getLength(instance);
      out.writeInt(length);
      for (int i = 0; i < length; i++) {
        writeObject(out, Array.get(instance, i),
            declaredClass.getComponentType(), conf, allowCompactArrays);
      }
      
    } else if (declaredClass == ArrayPrimitiveWritable.Internal.class) {
      ((ArrayPrimitiveWritable.Internal) instance).write(out);
      
    } else if (declaredClass == String.class) {   // String
      UTF8.writeString(out, (String)instance);
      // Next: check whether the declared type is a primitive
    } else if (declaredClass.isPrimitive()) {     // primitive type

      if (declaredClass == Boolean.TYPE) {        // boolean
        out.writeBoolean(((Boolean)instance).booleanValue());
      } else if (declaredClass == Character.TYPE) { // char
        out.writeChar(((Character)instance).charValue());
      } else if (declaredClass == Byte.TYPE) {    // byte
        out.writeByte(((Byte)instance).byteValue());
      } else if (declaredClass == Short.TYPE) {   // short
        out.writeShort(((Short)instance).shortValue());
      } else if (declaredClass == Integer.TYPE) { // int
        out.writeInt(((Integer)instance).intValue());
      } else if (declaredClass == Long.TYPE) {    // long
        out.writeLong(((Long)instance).longValue());
      } else if (declaredClass == Float.TYPE) {   // float
        out.writeFloat(((Float)instance).floatValue());
      } else if (declaredClass == Double.TYPE) {  // double
        out.writeDouble(((Double)instance).doubleValue());
      } else if (declaredClass == Void.TYPE) {    // void
      } else {
        throw new IllegalArgumentException("Not a primitive: "+declaredClass);
      }
    } else if (declaredClass.isEnum()) {         // enum
      UTF8.writeString(out, ((Enum)instance).name());
    } else if (Writable.class.isAssignableFrom(declaredClass)) { // any other type that implements Writable
      UTF8.writeString(out, instance.getClass().getName());
      ((Writable)instance).write(out);

    } else if (Message.class.isAssignableFrom(declaredClass)) {
      ((Message)instance).writeDelimitedTo(
          DataOutputOutputStream.constructOutputStream(out));
    } else {
      throw new IOException("Can't write: "+instance+" as "+declaredClass);
    }
  }
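
The core pattern in writeObject, writing the declared class name first and then a payload chosen by that class, can be reduced to a small self-contained sketch. This is a simplification and not the Hadoop implementation: it handles only String, int and boolean, it uses DataOutput.writeUTF in place of UTF8.writeString, and the readObject and roundTrip helpers are added here for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class TaggedWriteDemo {

    // Write the declared class name first, then a payload chosen by that class.
    static void writeObject(DataOutput out, Object instance, Class<?> declaredClass)
            throws IOException {
        out.writeUTF(declaredClass.getName());
        if (declaredClass == String.class) {
            out.writeUTF((String) instance);
        } else if (declaredClass == Integer.TYPE) {
            out.writeInt((Integer) instance);
        } else if (declaredClass == Boolean.TYPE) {
            out.writeBoolean((Boolean) instance);
        } else {
            throw new IOException("Can't write: " + instance + " as " + declaredClass);
        }
    }

    // Read the class name back and dispatch on it to decode the payload.
    static Object readObject(DataInput in) throws IOException {
        String className = in.readUTF();
        switch (className) {
            case "java.lang.String":
                return in.readUTF();
            case "int":
                return in.readInt();
            case "boolean":
                return in.readBoolean();
            default:
                throw new IOException("Can't read: " + className);
        }
    }

    // Serialize to a byte array and immediately read the value back.
    static Object roundTrip(Object instance, Class<?> declaredClass) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeObject(new DataOutputStream(buffer), instance, declaredClass);
        return readObject(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hadoop", String.class));
        System.out.println(roundTrip(7, Integer.TYPE));
    }
}
```

Because the class name travels with every value, the reader needs no prior knowledge of the type, at the cost of extra bytes per value.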

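The Writable-based serialization described above is used mainly for input and output in the MapReduce process. Hadoop also supports other serialization mechanisms, including Apache Avro, Apache Thrift and Google Protocol Buffers, but these are used mainly for remote RPC communication. For stored data, such as map output and reduce output, classes that implement the Writable interface are the main choice.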

Hadoop's simple serialization framework

[Figure: serialization class diagram]

Interface Serialization

Methods:

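    // Check whether this serialization implementation supports objects of the given class
    boolean accept(Class<?> c);
    // Get a Serializer implementation for the given class
    Serializer<T> getSerializer(Class<T> c);
    // Get a Deserializer implementation for the given class
    Deserializer<T> getDeserializer(Class<T> c);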

Interface Serializer

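    // Open the stream in preparation for serialization
    void open(OutputStream out) throws IOException;
    // Serialize the object to the stream
    void serialize(T t) throws IOException;
    // Close the stream, end serialization and clean up
    void close() throws IOException;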

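Interface Deserializer

This mirrors Serializer: open(InputStream in) prepares the stream, deserialize(T t) reads an object from it, and close() ends deserialization and cleans up.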

Java serialization support

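JavaSerialization implements the Serialization interface and has two static nested classes, JavaSerializationDeserializer and JavaSerializationSerializer, which implement Deserializer and Serializer respectively. For the code, see the org.apache.hadoop.io.serializer.JavaSerialization class in the hadoop-common module of the Hadoop project. WritableSerialization and AvroReflectSerialization are implemented in a similar way.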
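
To show how the three interfaces fit together, here is a rough sketch of a Java-serialization-backed implementation. The interfaces are redeclared locally so that the example runs without Hadoop. SimpleJavaSerialization and roundTrip are hypothetical names, and the code is an approximation of the idea and not a copy of Hadoop's JavaSerialization class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

public class SerializationFrameworkDemo {

    // Local stand-ins mirroring the interfaces described above.
    interface Serializer<T> {
        void open(OutputStream out) throws IOException;
        void serialize(T t) throws IOException;
        void close() throws IOException;
    }

    interface Deserializer<T> {
        void open(InputStream in) throws IOException;
        T deserialize(T t) throws IOException;
        void close() throws IOException;
    }

    interface Serialization<T> {
        boolean accept(Class<?> c);
        Serializer<T> getSerializer(Class<T> c);
        Deserializer<T> getDeserializer(Class<T> c);
    }

    // Hypothetical implementation backed by standard Java serialization.
    static class SimpleJavaSerialization<T extends Serializable> implements Serialization<T> {
        @Override
        public boolean accept(Class<?> c) {
            return Serializable.class.isAssignableFrom(c);
        }

        @Override
        public Serializer<T> getSerializer(Class<T> c) {
            return new Serializer<T>() {
                private ObjectOutputStream oos;
                @Override public void open(OutputStream out) throws IOException {
                    oos = new ObjectOutputStream(out);
                }
                @Override public void serialize(T t) throws IOException {
                    oos.writeObject(t);
                }
                @Override public void close() throws IOException {
                    oos.close();
                }
            };
        }

        @Override
        public Deserializer<T> getDeserializer(Class<T> c) {
            return new Deserializer<T>() {
                private ObjectInputStream ois;
                @Override public void open(InputStream in) throws IOException {
                    ois = new ObjectInputStream(in);
                }
                @Override public T deserialize(T reuse) throws IOException {
                    try {
                        // This sketch ignores the reuse argument and returns a new object.
                        return c.cast(ois.readObject());
                    } catch (ClassNotFoundException e) {
                        throw new IOException(e);
                    }
                }
                @Override public void close() throws IOException {
                    ois.close();
                }
            };
        }
    }

    // Check accept(), then run the open / serialize / close lifecycle in both directions.
    static String roundTrip(String value) throws IOException {
        Serialization<String> serialization = new SimpleJavaSerialization<>();
        if (!serialization.accept(String.class)) {
            throw new IOException("Unsupported class");
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Serializer<String> serializer = serialization.getSerializer(String.class);
        serializer.open(buffer);
        serializer.serialize(value);
        serializer.close();

        Deserializer<String> deserializer = serialization.getDeserializer(String.class);
        deserializer.open(new ByteArrayInputStream(buffer.toByteArray()));
        String result = deserializer.deserialize(null);
        deserializer.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hadoop"));
    }
}
```

The accept() check is what would let a framework hold several Serialization implementations and pick the first one that supports a given class.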
