Any discussion of Hadoop's serialization and deserialization inevitably starts from Java serialization.
Block block = new Block();
block.setId(1L);
block.setName("block one");
block.setPosition("offset 1");
block.setSize(10000L);
ObjectOutputStream outputStream = new ObjectOutputStream(System.out);
outputStream.writeObject(block);
The Block class is declared as follows:
public class Block implements Serializable, Comparable<Block> {
Java object serialization converts an object into a contiguous byte stream, suitable for storage or for transmission over a network. As the code above shows, you create an ObjectOutputStream and call its writeObject() method to accomplish this. Readers who have worked with RMI may have noticed that when an RMI call fails, the exception stack trace usually contains ObjectOutputStream frames.
It is worth quoting the Javadoc of writeObject() here.
/**
* Write the specified object to the ObjectOutputStream. The class of the
* object, the signature of the class, and the values of the non-transient
* and non-static fields of the class and all of its supertypes are
* written. Default serialization for a class can be overridden using the
* writeObject and the readObject methods. Objects referenced by this
* object are written transitively so that a complete equivalent graph of
* objects can be reconstructed by an ObjectInputStream.
*
* <p>Exceptions are thrown for problems with the OutputStream and for
* classes that should not be serialized. All exceptions are fatal to the
* OutputStream, which is left in an indeterminate state, and it is up to
* the caller to ignore or recover the stream state.
*
* @throws InvalidClassException Something is wrong with a class used by
* serialization.
* @throws NotSerializableException Some object to be serialized does not
* implement the java.io.Serializable interface.
* @throws IOException Any exception thrown by the underlying
* OutputStream.
*/
public final void writeObject(Object obj) throws IOException {
writeObject() writes the values of the class's non-transient, non-static fields, together with those of all its supertypes. The serialization mechanism automatically visits the object's superclasses to keep the object's contents complete and consistent. Moreover, it stores not only the object's own in-memory data: it also traces the internal data of every other object reachable from this one, and records how all of those objects are linked together, so that an ObjectInputStream can reconstruct a complete, equivalent object graph.
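The transitive behavior described above can be seen in a minimal round-trip sketch. The Node class below is a hypothetical example (not the article's Block): one writeObject() call serializes a node and the node it references, while a transient field is skipped and comes back as its default value.

```java
import java.io.*;

// Hypothetical class for illustration: a Node referencing another Node.
class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    Node next;               // referenced object, written transitively
    transient int cache;     // transient field, NOT written to the stream
    Node(String name, Node next) { this.name = name; this.next = next; }
}

public class GraphDemo {
    public static void main(String[] args) throws Exception {
        Node tail = new Node("tail", null);
        Node head = new Node("head", tail);
        head.cache = 42;     // will be lost across serialization

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(head);   // writes head AND tail in one call
        }

        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Node copy = (Node) in.readObject();
            System.out.println(copy.name);       // head
            System.out.println(copy.next.name);  // tail: graph reconstructed
            System.out.println(copy.cache);      // 0: transient reset to default
        }
    }
}
```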
In Java's serialization mechanism, deserialization keeps creating new objects, which adds object allocation and garbage-collection work and makes it somewhat inefficient. In Hadoop, deserialization can reuse objects: multiple deserialization results can be read into the same object instance.
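The object-reuse pattern Hadoop relies on can be sketched with plain java.io. The real interface is org.apache.hadoop.io.Writable; the minimal interface and BlockWritable class below are stand-ins written here for illustration. The key point is that readFields() fills an existing instance, so one object can absorb many deserialized records without per-record allocation.

```java
import java.io.*;

// Stand-in for org.apache.hadoop.io.Writable, sketched with only java.io.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type with two fields, modeled on the article's Block.
class BlockWritable implements Writable {
    long id;
    long size;

    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeLong(size);
    }

    public void readFields(DataInput in) throws IOException {
        id = in.readLong();      // overwrite fields in place, no new object
        size = in.readLong();
    }
}

public class ReuseDemo {
    public static void main(String[] args) throws IOException {
        // Serialize two records back to back.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        BlockWritable b = new BlockWritable();
        b.id = 1; b.size = 10000; b.write(out);
        b.id = 2; b.size = 20000; b.write(out);

        // Deserialize both records into the SAME instance.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        BlockWritable reused = new BlockWritable();
        reused.readFields(in);
        System.out.println(reused.id + " " + reused.size);  // 1 10000
        reused.readFields(in);
        System.out.println(reused.id + " " + reused.size);  // 2 20000
    }
}
```

This is why a Hadoop MapReduce task can stream millions of records through a single key object and a single value object, which Java's ObjectInputStream cannot do.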