The object serialization API provides a framework for encoding objects as byte streams (serializing) and reconstructing objects from their byte-stream encodings (deserializing). Once an object has been serialized, its encoding can be transmitted from one running virtual machine to another or stored on disk for later deserialization. Serialization provides the standard wire-level object representation for remote communication, and the standard persistent data format for the JavaBeans component architecture. Note the serialization proxy pattern (Item 78), which can help you avoid many of the pitfalls of object serialization.
Item 74: implement Serializable judiciously
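A major cost of implementing Serializable is that it decreases the flexibility to change a class's implementation once it has been released.
If you accept the default serialized form, the class's private and package-private instance fields become part of its exported API, which harms information hiding.
You can carefully design a high-quality custom serialized form, but even that places constraints on the evolution of the class.
For example, every serializable class has a stream unique identifier, or serial version UID. If you don't declare one explicitly, the runtime derives it from the class's structure (its name, the interfaces it implements, and most of its members), so almost any change to the class changes the UID. Instances serialized under the old UID can then no longer be deserialized, and the attempt fails with an InvalidClassException at runtime.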
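A minimal sketch of the difference between a computed and a declared UID. It uses the real `ObjectStreamClass.lookup(...).getSerialVersionUID()` call to read the UID the runtime would use; the class names `SerialUidDemo`, `Implicit`, and `Explicit` and the value `1L` are hypothetical.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialUidDemo {
    // No explicit UID: the runtime computes one from the class's structure,
    // so adding a field or method later would change this value.
    static class Implicit implements Serializable {
        private int x;
    }

    // Explicit UID (hypothetical value): stays the same until you deliberately change it.
    static class Explicit implements Serializable {
        private static final long serialVersionUID = 1L;
        private int x;
    }

    // Returns the serial version UID the serialization runtime uses for cls.
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("implicit UID: " + uidOf(Implicit.class)); // some computed value
        System.out.println("explicit UID: " + uidOf(Explicit.class)); // prints "explicit UID: 1"
    }
}
```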
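A second cost is that it increases the likelihood of bugs and security holes.
Deserialization is a hidden constructor: it creates objects without running the serializable class's own constructors, so relying on the default mechanism can easily leave objects open to invariant corruption and illegal access.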
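The "hidden constructor" point can be observed directly. In this sketch (hypothetical names `HiddenConstructorDemo` and `Point`), a constructor counter shows that deserializing a copy never runs `Point(int)`, so any checks placed in that constructor are skipped.

```java
import java.io.*;

public class HiddenConstructorDemo {
    // Counts how many times Point's constructor has run.
    static int constructorCalls = 0;

    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;

        Point(int x) {
            constructorCalls++;
            this.x = x;
        }
    }

    // Serializes obj to an in-memory byte array and deserializes a copy.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = (Point) roundTrip(new Point(42));
        System.out.println(copy.x);           // 42
        System.out.println(constructorCalls); // 1: deserialization never called Point(int)
    }
}
```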
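A third cost is that it increases the testing burden associated with releasing a new version of a class.
The amount of testing grows with the number of serializable classes times the number of releases, and the tests can't be constructed automatically. You must test both binary compatibility and semantic compatibility: ensure both that the serialization-deserialization process succeeds across versions and that it results in a faithful replica of the original object.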
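One building block for such tests is a same-version "faithful replica" check, sketched below with hypothetical names (`RoundTripCheck`, `isFaithfulReplica`). It assumes the class under test has a meaningful `equals`. It does not cover binary compatibility across releases, which would additionally require keeping byte streams written by earlier versions and deserializing them with the new one.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RoundTripCheck {
    // Serializes obj to bytes and reads it back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // Semantic check: the copy is a distinct object that equals the original.
    static boolean isFaithfulReplica(Object original) throws IOException, ClassNotFoundException {
        Object copy = roundTrip(original);
        return copy != original && copy.equals(original);
    }

    public static void main(String[] args) throws Exception {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println(isFaithfulReplica(list)); // true
    }
}
```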
Implementing the Serializable interface is not a decision to be undertaken lightly.
It is essential if a class must participate in a framework that relies on serialization for object transmission or persistence. As a rule of thumb, value classes such as Date and BigInteger should implement Serializable, as should most collection classes. Classes representing active entities, such as thread pools, should rarely implement it.
Classes designed for inheritance should rarely implement Serializable, and interfaces should rarely extend it.
There are some exceptions:
Throwable implements it so that exceptions from RMI (Remote Method Invocation) can be passed from server to client.
Component implements it so that GUIs can be sent, saved, and restored.
HttpServlet implements it so that session state can be cached.
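If you implement a class with instance fields that is both serializable and extendable, beware of default values: if the class has invariants that would be violated if its instance fields were initialized to their default values (zero for numeric types, false for boolean, and null for object references), add this readObjectNoData method: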
// readObjectNoData for stateful extendable serializable classes
private void readObjectNoData() throws InvalidObjectException {
throw new InvalidObjectException("Stream data required!");
}
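You should consider providing a parameterless constructor on nonserializable classes designed for inheritance, so that subclasses may be serializable. When a serializable subclass is deserialized, the parameterless constructor of its nearest nonserializable superclass is invoked; if that superclass has no accessible parameterless constructor, deserialization fails.
For example: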
// File: AbstractFoo.java
import java.util.concurrent.atomic.AtomicReference;
// Nonserializable stateful class allowing serializable subclass
public abstract class AbstractFoo{
private int x, y; // Our state
//This enum and field are used to track initialization
private enum State{
NEW, INITIALIZING, INITIALIZED
}
private final AtomicReference<State> init = new AtomicReference<State>(State.NEW);
public AbstractFoo(int x, int y) {
initialize(x, y);
}
//This constructor and the following method allow
// subclass's readObject method to initialize our state
protected AbstractFoo() {
}
protected final void initialize(int x, int y) {
if(!init.compareAndSet(State.NEW, State.INITIALIZING))
throw new IllegalStateException("Already initialized");
this.x = x;
this.y = y;
init.set(State.INITIALIZED);
}
//This method provides access to internal state so it can be manually serialized by subclass's writeObject method
protected final int getX() {
checkInit();
return x;
}
protected final int getY() {
checkInit();
return y;
}
//Must call from all public and protected instance methods
private void checkInit(){
if(init.get() != State.INITIALIZED)
throw new IllegalStateException("Uninitialized");
}
}
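// File: Foo.java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
//Serializable subclass of nonserializable stateful class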
public class Foo extends AbstractFoo implements Serializable{
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException{
s.defaultReadObject();
//Manually deserialize and initialize superclass state
int x = s.readInt();
int y = s.readInt();
initialize(x, y);
}
private void writeObject(ObjectOutputStream s) throws IOException{
s.defaultWriteObject();
//Manually serialize superclass state
s.writeInt(getX());
s.writeInt(getY());
}
//Constructor does not use the fancy mechanism
public Foo(int x, int y){
super(x,y);
}
private static final long serialVersionUID = 1856835860954L;
}
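Inner classes should not implement Serializable.
They use compiler-generated synthetic fields to store references to enclosing instances and values of local variables from enclosing scopes, and how these fields correspond to the class definition is unspecified. Therefore the default serialized form of an inner class is ill-defined. A static member class can, however, implement Serializable.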
To summarize, the ease of implementing Serializable is specious. Unless a class is to be thrown away after a short period of use, implementing Serializable is a serious commitment that should be made with care. For superclasses designed for inheritance, an intermediate design point between implementing Serializable and prohibiting it in subclasses is to provide an accessible parameterless constructor. This design point permits, but does not require, subclasses to implement Serializable.
Item 75: consider using a custom serialized form
Do not accept the default serialized form without first considering whether it is appropriate.
The default serialized form is likely to be appropriate if an object's physical representation is identical to its logical content.
/**
* Good candidate for default serialized form
*/
public class Name implements Serializable{
/**
* Last name. Must be non-null
* @serial
*/
private final String lastName;
/**
* First name. Must be non-null
* @serial
*/
private final String firstName;
/**
* Middle name, or null if there is none.
* @serial
*/
private final String middleName;
// Constructor and remainder omitted
}
Even if you decide that the default serialized form is appropriate, you often must provide a readObject method to ensure invariants and security. In the Name class, for example, readObject should check that lastName and firstName are non-null.
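A sketch of what that readObject could look like for a Name-like class. The constructor, accessors, and serialVersionUID value are assumptions added so the example runs; the serialized form itself is still the default one.

```java
import java.io.*;

public class Name implements Serializable {
    private static final long serialVersionUID = 1L; // hypothetical value
    private final String lastName;   // must be non-null
    private final String firstName;  // must be non-null
    private final String middleName; // may be null

    public Name(String lastName, String firstName, String middleName) {
        if (lastName == null || firstName == null)
            throw new NullPointerException("lastName and firstName must be non-null");
        this.lastName = lastName;
        this.firstName = firstName;
        this.middleName = middleName;
    }

    public String lastName() { return lastName; }
    public String firstName() { return firstName; }
    public String middleName() { return middleName; }

    // Keep the default serialized form, but re-check the invariants
    // that a forged byte stream could violate.
    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        if (lastName == null || firstName == null)
            throw new InvalidObjectException("lastName and firstName must be non-null");
    }

    // Helper for trying it out: serialize and deserialize in memory.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```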
Using the default serialized form when an object's physical representation differs substantially from its logical data content has four disadvantages:
1. It permanently ties the exported API to the current internal representation.
2. It can consume excessive space.
3. It can consume excessive time.
4. It can cause stack overflows.
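For example, consider StringList, a doubly linked list of strings. Its default serialized form would mirror every entry and both of its links; a reasonable custom serialized form records just the number of strings followed by the strings themselves: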
//StringList with a reasonable custom serialized form
public final class StringList implements Serializable{
private transient int size = 0;
private transient Entry head = null;
//No longer serializable
private static class Entry{
String data;
Entry next;
Entry previous;
}
//Appends the specified string to the list
public final void add(String s) { ... }
/**
* Serialize this {@code StringList} instance
*
* @serialData The size of the list (the number of strings it contains) is emitted ({@code int}),
* followed by all of its elements (each a {@code String}), in the proper sequence.
*
* @param s
* @throws IOException
*/
private void writeObject(ObjectOutputStream s) throws IOException{
s.defaultWriteObject();
s.writeInt(size);
//Write out all elements in the proper order
for(Entry e = head; e != null; e = e.next){
s.writeObject(e.data);
}
}
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException{
s.defaultReadObject();
int numElements = s.readInt();
//Read in all elements and insert them in list
for(int i = 0; i < numElements; i ++){
add((String)s.readObject());
}
}
}
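For experimenting with the custom form, here is a runnable variant of the StringList above. The body of add is elided in the original, so this version assumes a hypothetical tail pointer for appends; it also adds size(), toList(), and a roundTrip helper that are not part of the original class. The writeObject and readObject methods follow the original.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public final class StringList implements Serializable {
    private static final long serialVersionUID = 1L; // hypothetical value
    private transient int size = 0;
    private transient Entry head = null;
    private transient Entry tail = null; // assumption: tail pointer for O(1) appends

    private static class Entry {
        String data;
        Entry next;
        Entry previous;
    }

    // Appends the specified string to the list (illustrative implementation).
    public final void add(String s) {
        Entry e = new Entry();
        e.data = s;
        if (head == null) {
            head = e;
        } else {
            tail.next = e;
            e.previous = tail;
        }
        tail = e;
        size++;
    }

    public int size() { return size; }

    public List<String> toList() {
        List<String> result = new ArrayList<>(size);
        for (Entry e = head; e != null; e = e.next)
            result.add(e.data);
        return result;
    }

    // Custom form: the element count, then each string in order.
    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);
        for (Entry e = head; e != null; e = e.next)
            s.writeObject(e.data);
    }

    // Transient fields start as 0/null here, so add() rebuilds the links.
    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        int numElements = s.readInt();
        for (int i = 0; i < numElements; i++)
            add((String) s.readObject());
    }

    static StringList roundTrip(StringList list) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(list);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (StringList) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        StringList list = new StringList();
        list.add("alpha");
        list.add("beta");
        list.add("gamma");
        System.out.println(roundTrip(list).toList()); // [alpha, beta, gamma]
    }
}
```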
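If all instance fields are transient, it is technically permissible to dispense with invoking defaultWriteObject and defaultReadObject, but it is not recommended: these calls make it possible to add nontransient instance fields in a later release while preserving backward and forward compatibility.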
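Objects whose invariants are tied to implementation-specific details should not use the default serialized form. For example, the bucket that a hash table entry resides in depends on its key's hash code, which may differ from one JVM implementation to another, or even from run to run.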
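When defaultWriteObject is invoked, every field not labeled transient is serialized. Therefore, redundant fields (whose values can be computed from other fields) and fields whose values are tied to one particular run of the JVM, such as a long field representing a pointer to a native data structure, should be transient.
Before deciding to make a field nontransient, convince yourself that its value is part of the logical state of the object. Keep in mind that transient fields are initialized to their default values on deserialization; if those values are unacceptable, restore the fields in readObject or initialize them lazily.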
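A small sketch of a redundant field marked transient and restored after deserialization. `Temperature` and its Fahrenheit cache are hypothetical; the point is that only the logical state (celsius) goes into the stream, and readObject recomputes the derived value instead of leaving it at 0.0.

```java
import java.io.*;

public class Temperature implements Serializable {
    private static final long serialVersionUID = 1L; // hypothetical value
    private final double celsius;        // logical state: serialized
    private transient double fahrenheit; // redundant: derived from celsius

    public Temperature(double celsius) {
        this.celsius = celsius;
        this.fahrenheit = toFahrenheit(celsius);
    }

    private static double toFahrenheit(double c) {
        return c * 9 / 5 + 32;
    }

    public double fahrenheit() { return fahrenheit; }

    // The transient field comes back as 0.0, so recompute it from the logical state.
    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        fahrenheit = toFahrenheit(celsius);
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```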
You must impose any synchronization on object serialization that you would impose on any other method that reads the entire state of the object.
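For a class whose methods synchronize on `this`, the simplest way to follow this rule is a synchronized writeObject that just calls defaultWriteObject. The `RunningTotal` class below is a hypothetical illustration: because writeObject takes the same lock as add, the stream can never capture count and sum halfway through an update.

```java
import java.io.*;

public class RunningTotal implements Serializable {
    private static final long serialVersionUID = 1L; // hypothetical value
    private long count;
    private long sum;

    public synchronized void add(long value) {
        count++;
        sum += value;
    }

    public synchronized long count() { return count; }
    public synchronized long sum() { return sum; }

    // Same lock as the other methods, so count and sum are written as a consistent pair.
    private synchronized void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```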
Regardless of what serialized form you choose, declare an explicit serial version UID in every serializable class you write.
To summarize, when you decide that a class should be serializable, think hard about its serialized form. Use the default form only if the object's logical state and physical representation are the same. Allocate as much time to designing the serialized form of a class as you allocate to designing its exported API. Just as you can't eliminate exported methods from future versions, you can't eliminate fields from the serialized form; they must be preserved to maintain serialization compatibility.
Item 76: write readObject methods defensively
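readObject is effectively another public constructor, one that takes a byte stream as its sole parameter. Because an attacker can artificially construct a byte stream, readObject must not trust its input, just as a constructor must check its arguments.
We can provide a readObject method that checks the validity of the deserialized object and throws an InvalidObjectException if a check fails.
That is not enough, however: an attacker can fabricate a byte stream that begins with a valid instance and then appends extra references to its private mutable fields, and use those references to mutate the supposedly immutable instance.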
When an object is deserialized, it is critical to defensively copy any field containing an object reference that a client must not possess. Every serializable immutable class containing private mutable components must defensively copy these components in its readObject method.
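A sketch of a defensive readObject for an immutable class with mutable Date components. `Period` and its accessors are hypothetical here. The fields are nonfinal so readObject can replace them with copies, and the invariant check runs on the copies, after copying.

```java
import java.io.*;
import java.util.Date;

public final class Period implements Serializable {
    private static final long serialVersionUID = 1L; // hypothetical value
    private Date start; // nonfinal so readObject can replace it with a copy
    private Date end;

    public Period(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.compareTo(this.end) > 0)
            throw new IllegalArgumentException(start + " after " + end);
    }

    public Date start() { return new Date(start.getTime()); }
    public Date end() { return new Date(end.getTime()); }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // Defensively copy the mutable components first...
        start = new Date(start.getTime());
        end = new Date(end.getTime());
        // ...then check the invariant on the copies.
        if (start.compareTo(end) > 0)
            throw new InvalidObjectException(start + " after " + end);
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```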
Defensive copying in readObject is not possible for final fields, so to use this technique those fields must be made nonfinal.
Would you feel comfortable adding a public constructor that took as parameters the values for each nontransient field in the object and stored the values in the fields with no validation whatsoever? If not, you must provide a readObject method.
A readObject method must not invoke an overridable method, directly or indirectly. If it does, the overriding method will run before the subclass's state has been deserialized.
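The failure mode can be reproduced with a hypothetical `Base`/`Sub` pair. Base's readObject calls the overridable describe(); when a Sub is deserialized, Base's part of the stream is read first, so Sub.describe() runs while Sub's own field is still null.

```java
import java.io.*;

public class OverridableCallDemo {
    // Records what the overriding method saw while Base.readObject was running.
    static String seenDuringReadObject = "not called";

    static class Base implements Serializable {
        private static final long serialVersionUID = 1L;

        // BAD: calls an overridable method from readObject.
        private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
            s.defaultReadObject();
            describe();
        }

        protected void describe() { }
    }

    static class Sub extends Base {
        private static final long serialVersionUID = 1L;
        private final String label;

        Sub(String label) { this.label = label; }

        String label() { return label; }

        @Override
        protected void describe() {
            // Runs before Sub's fields are restored, so label is still null here.
            seenDuringReadObject = label;
        }
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Sub copy = (Sub) roundTrip(new Sub("hello"));
        System.out.println(seenDuringReadObject); // null
        System.out.println(copy.label());         // hello
    }
}
```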
To summarize, write a readObject method as carefully as you would a public constructor that must produce a valid instance no matter what byte stream it is given:
For classes with object reference fields that must remain private, defensively copy each object in such a field. Mutable components of immutable classes fall into this category.
Check any invariants and throw an InvalidObjectException if a check fails. The checks should follow defensive copying.
If an entire object graph must be validated after it is deserialized, use the ObjectInputValidation interface.
Don't invoke any overridable methods in the class, directly or indirectly.
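A sketch of graph-wide validation with the real `ObjectInputValidation` interface and `ObjectInputStream.registerValidation`. The `Team`/`Member` model is hypothetical: every member must point back to its team, a check that only makes sense once the whole graph has been read, so readObject registers a callback instead of checking immediately.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class Team implements Serializable, ObjectInputValidation {
    private static final long serialVersionUID = 1L; // hypothetical value
    private final List<Member> members = new ArrayList<>();

    static class Member implements Serializable {
        private static final long serialVersionUID = 1L;
        Team team;
    }

    public void add(Member m) {
        m.team = this;
        members.add(m);
    }

    public List<Member> members() { return members; }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // Defer the cross-object check until the entire graph has been deserialized.
        s.registerValidation(this, 0);
    }

    // Called by ObjectInputStream after the whole graph has been read.
    @Override
    public void validateObject() throws InvalidObjectException {
        for (Member m : members)
            if (m.team != this)
                throw new InvalidObjectException("Member does not belong to this team");
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```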
Item 77: for instance control, prefer enum types to readResolve
Any readObject method, whether explicit or default, returns a newly created instance, which will not be the same instance that was created at class initialization time. A singleton that implements Serializable would therefore no longer be a singleton.
Use readResolve to guarantee the singleton property. It is invoked on the newly created object after it is deserialized. The object reference it returns is then returned in place of the newly created object, which is left unreferenced and becomes eligible for garbage collection.
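A minimal readResolve singleton, using a hypothetical `Elvis` class. It has no instance fields, so the transient-field concern does not arise, and it is final, so a private readResolve is appropriate.

```java
import java.io.*;

public final class Elvis implements Serializable {
    private static final long serialVersionUID = 1L; // hypothetical value
    public static final Elvis INSTANCE = new Elvis();

    private Elvis() { }

    // Discard the instance deserialization just created and return the real one.
    private Object readResolve() {
        return INSTANCE;
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(INSTANCE) == INSTANCE); // true
    }
}
```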
If you depend on readResolve for instance control, all instance fields with object reference types must be declared transient. Otherwise, an attacker can steal a reference to the originally deserialized singleton before the singleton's readResolve method is run.
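The accessibility of readResolve is significant. On a final class it should be private. On a nonfinal class, a private readResolve does not apply to subclasses, a package-private one applies only to subclasses in the same package, and a protected or public one applies to every subclass that does not override it.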
To summarize, use enum types to enforce instance control invariants wherever possible. If you must write a class that is both serializable and instance-controlled, you must provide a readResolve method and ensure that all of the class's instance fields are either primitive or transient.
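The enum alternative needs no readResolve at all, because enum constants are serialized by name and resolved back to the existing constant. A sketch with a hypothetical enum `Elvis` nested in a demo class:

```java
import java.io.*;

public class EnumSingletonDemo {
    // Hypothetical enum singleton: instance control comes for free with serialization.
    enum Elvis {
        INSTANCE
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(Elvis.INSTANCE) == Elvis.INSTANCE); // true
    }
}
```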
Item 78: consider serialization proxies instead of serialized instances
1. Design a private static nested class of the serializable class (the serialization proxy) that concisely represents the logical state of an instance of the enclosing class. It should have a single constructor, whose parameter type is the enclosing class, that merely copies data from its argument.
2. The default serialized form of the proxy is the perfect serialized form of the enclosing class. Both the enclosing class and the proxy must implement Serializable.
3. Add a writeReplace method to the enclosing class that translates an enclosing instance into a proxy before serialization, so that the serialization system never generates a serialized instance of the enclosing class.
4. Add a readObject method to the enclosing class that throws InvalidObjectException, to prevent attackers from fabricating serialized instances of the enclosing class.
5. Provide a readResolve method on the proxy class that translates the proxy back into a logically equivalent instance of the enclosing class.
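The five steps above can be sketched on a hypothetical immutable `Period` class. Note that its fields stay final, and the proxy's readResolve goes through the public constructor, so the usual defensive copies and validity check apply to deserialized instances too.

```java
import java.io.*;
import java.util.Date;

public final class Period implements Serializable {
    private static final long serialVersionUID = 1L; // hypothetical value
    private final Date start;
    private final Date end;

    public Period(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.compareTo(this.end) > 0)
            throw new IllegalArgumentException(start + " after " + end);
    }

    public Date start() { return new Date(start.getTime()); }
    public Date end() { return new Date(end.getTime()); }

    // Steps 1 and 2: the proxy captures the logical state; its default form is the serialized form.
    private static class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L; // hypothetical value
        private final Date start;
        private final Date end;

        SerializationProxy(Period p) {
            this.start = p.start;
            this.end = p.end;
        }

        // Step 5: rebuild a Period through its public constructor.
        private Object readResolve() {
            return new Period(start, end);
        }
    }

    // Step 3: serialize a proxy instead of this object.
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    // Step 4: reject streams that claim to contain a Period directly.
    private void readObject(ObjectInputStream s) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```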
Advantages:
It stops the bogus byte-stream attack and the internal field theft attack.
It allows the fields of the enclosing class to be final, so the class can be truly immutable.
You don't need to work out which fields might be compromised by attacks, nor explicitly perform validity checking as part of deserialization.
It allows the deserialized instance to have a different class from the originally serialized instance.
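For example, EnumSet has no public constructors, only static factories. In OpenJDK these return a RegularEnumSet if the underlying enum type has 64 or fewer elements and a JumboEnumSet otherwise, so if elements are added to the enum type between serialization and deserialization, the proxy lets the deserialized set take whichever class suits the enum type at that point: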
//EnumSet's serialization proxy (nested inside EnumSet, whose elementType field it reads)
private static class SerializationProxy <E extends Enum<E>> implements Serializable{
//The element type of this enum set.
private final Class<E> elementType;
//The elements contained in this enum set.
private final Enum<?>[] elements;
SerializationProxy(EnumSet<E> set) {
elementType = set.elementType;
elements = set.toArray(new Enum<?>[0]);
}
@SuppressWarnings("unchecked")
private Object readResolve(){
EnumSet<E> result = EnumSet.noneOf(elementType);
for(Enum<?> e: elements)
result.add((E)e);
return result;
}
private static final long serialVersionUID = 325295792374L;
}
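Limitations:
It is less efficient than defensive copying.
It is not compatible with classes that are extendable by their clients.
It is not compatible with some classes whose object graphs contain circularities: if you invoke a method on such an object from within its proxy's readResolve method, you get a ClassCastException, because you only have the proxy, not the object itself.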
In summary, consider the serialization proxy pattern whenever you find yourself having to write a readObject or writeObject method on a class that is not extendable by its clients. This is perhaps the easiest way to robustly serialize objects with nontrivial invariants.