Serialization is the process of turning structured objects into a byte stream for transmission over a network or for writing to persistent storage. Deserialization is the reverse process of turning a byte stream back into a series of structured objects. Serialization appears in two quite distinct areas of distributed data processing: interprocess communication and persistent storage.
A serialization mechanism should consider the following:
1. Primitive type serialization
a. big-endian or little-endian
b. one-byte character or two-byte character
c. binary or textual
d. etc
2. Constructed type serialization
a. the order of the included primitive values and objects
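The choices listed above can be made concrete with a small sketch using only the JDK. It is an illustration rather than the wire format of any real framework; the class name PrimitiveEncodingDemo and its helper methods are hypothetical names introduced here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; all names are illustrative only.
class PrimitiveEncodingDemo {
    // Encode a 32-bit int as 4 bytes in the requested byte order.
    static byte[] encodeInt(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decode 4 bytes back into an int; the reader must assume the same byte order.
    static int decodeInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    // Constructed type: the fields are written in a fixed order (x, then y).
    // ByteBuffer defaults to big-endian when no order is set.
    static byte[] encodePoint(int x, int y) {
        return ByteBuffer.allocate(2 * Integer.BYTES).putInt(x).putInt(y).array();
    }

    // The reader must consume the fields in the same order they were written.
    static int[] decodePoint(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int x = buf.getInt();
        int y = buf.getInt();
        return new int[] {x, y};
    }

    public static void main(String[] args) {
        byte[] big = encodeInt(0x01020304, ByteOrder.BIG_ENDIAN);
        byte[] little = encodeInt(0x01020304, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.toString(big));    // [1, 2, 3, 4]
        System.out.println(Arrays.toString(little)); // [4, 3, 2, 1]

        // Reading with the wrong byte order silently yields a different value.
        System.out.println(Integer.toHexString(decodeInt(big, ByteOrder.LITTLE_ENDIAN))); // 4030201

        // One-byte vs two-byte characters: same text, different encoded size.
        System.out.println("A".getBytes(StandardCharsets.ISO_8859_1).length); // 1
        System.out.println("A".getBytes(StandardCharsets.UTF_16BE).length);   // 2

        // Binary vs textual: the int above takes 4 bytes in binary, 8 as decimal text.
        System.out.println(Integer.toString(0x01020304).getBytes(StandardCharsets.US_ASCII).length); // 8

        System.out.println(Arrays.toString(decodePoint(encodePoint(3, 7)))); // [3, 7]
    }
}
```

None of these choices is recorded in the bytes themselves, so the writer and the reader have to agree on them in advance.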
Typical object serialization mechanisms include Java Serialization, Hadoop Writable, Thrift, Avro, Protocol Buffers, etc.
Serialized Object Container
A single primitive value or constructed-type value can be viewed as an object. For interprocess communication, the unit of a byte stream is an object (passed as an argument or a return value). For persistent storage, however, storing every object in its own file is inefficient; normally a sequence of objects is stored in one file. An object container file format is therefore needed.
Typical object container file formats include Avro datafile, SequenceFile, etc.
An object container file format can use different serializers for the objects it holds. For example, SequenceFile normally uses Hadoop Writable types for its keys and values, but it can also use another serializer, such as Avro, for them.
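The separation between container and serializer can be sketched with a toy format. This is deliberately NOT the on-disk layout of Avro datafile or SequenceFile. The class ToyContainer, its magic number, and its length-prefixed record layout are all assumptions made for illustration. The container only frames records, and the caller supplies the serializer as a function.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical toy container; not a real Hadoop or Avro format.
class ToyContainer {
    static final int MAGIC = 0x544F5931; // arbitrary marker, "TOY1" in ASCII

    // Write a header (magic + record count), then each object as a length-prefixed record.
    static <T> byte[] write(List<T> objects, Function<T, byte[]> serializer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(MAGIC);
            out.writeInt(objects.size());
            for (T object : objects) {
                byte[] record = serializer.apply(object); // pluggable serializer
                out.writeInt(record.length);
                out.write(record);
            }
        }
        return buffer.toByteArray();
    }

    // Read the header, then hand each record's bytes to the matching deserializer.
    static <T> List<T> read(byte[] container, Function<byte[], T> deserializer) throws IOException {
        List<T> objects = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(container))) {
            if (in.readInt() != MAGIC) {
                throw new IOException("not a toy container");
            }
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byte[] record = new byte[in.readInt()];
                in.readFully(record);
                objects.add(deserializer.apply(record));
            }
        }
        return objects;
    }

    public static void main(String[] args) throws IOException {
        List<String> names = Arrays.asList("alpha", "beta", "gamma");
        // Here the serializer is plain UTF-8 text; any other byte[] encoder would fit.
        byte[] file = write(names, s -> s.getBytes(StandardCharsets.UTF_8));
        List<String> back = read(file, b -> new String(b, StandardCharsets.UTF_8));
        System.out.println(back); // [alpha, beta, gamma]
    }
}
```

Swapping the two lambdas for a different encoder and decoder changes how each object is serialized without touching the framing.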
RMI vs RPC
RMI differs from RPC in that it is object-oriented: its core concept is distributed object invocation. In Java RMI, an object that implements the Remote interface and has been exported is regarded as a remote object, which means that it is passed by reference across JVMs: the receiver gets a serializable stub (a remote object reference) rather than a copy. An object that implements only Serializable, and not Remote, is passed by value across JVMs. The benefit of remote object invocation is that the client and the server always see the same object state, because every call is executed on the single object that lives on the server.
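The pass-by-value half of this distinction can be imitated inside a single JVM, without an RMI registry, by round-tripping an object through Java serialization. This is only an approximation of what RMI does when it marshals a non-remote Serializable argument. The class PassByValueDemo, its Counter type, and the copyBySerialization helper are hypothetical names introduced for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; it does not use java.rmi at all.
class PassByValueDemo {
    // A value type: Serializable but not Remote, so RMI would pass it by copy.
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int count;
    }

    // Serialize and deserialize, producing an independent copy of the object.
    static <T extends Serializable> T copyBySerialization(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        Counter local = new Counter();
        local.count = 1;

        Counter received = copyBySerialization(local); // what the receiving side would hold
        received.count = 42;                           // mutation on the receiving side

        System.out.println(local.count);    // 1, the sender's object is unchanged
        System.out.println(received.count); // 42
    }
}
```

With a remote object the two sides would not diverge like this, since the receiver would hold only a stub and every call would reach the one object on the server.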