Java supports a very general mechanism called object serialization, which can write any object to a stream and read it back later. Serialization and deserialization are performed with ObjectOutputStream and ObjectInputStream.
Every class that supports serialization must implement the Serializable interface, which is a marker interface.
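Basic usage
When an object is serialized, it may contain a complex network of other objects. We cannot save and restore the memory addresses held in its internal variables, because when the object is reloaded it will probably occupy a completely different address from the original one. Instead, each object is saved with a serial number, which is why this mechanism is called object serialization.
Serialization lets us save a collection of objects to a disk file and retrieve them exactly as they were stored. Another very important application is sending a collection of objects over the network to another computer; remote method invocation relies on this.
During serialization and deserialization, every object reference is associated with a serial number, and that number stands for the object. A simple example: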
public static void main(String[] args) throws IOException, ClassNotFoundException {
    Employee harry = new Employee("Harry Hacker", 50000, 1989, 10, 1);
    Manager carl = new Manager("Carl Cracker", 80000, 1987, 12, 15);
    carl.setSecretary(harry);
    Manager tony = new Manager("Tony Tester", 40000, 1990, 3, 15);
    tony.setSecretary(harry);
    Employee[] staff = new Employee[3];
    staff[0] = carl;
    staff[1] = harry;
    staff[2] = tony;
    // save all employee records to the file employee.dat
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"))) {
        out.writeObject(staff);
    }
    try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"))) {
        // retrieve all records into a new array
        Employee[] newStaff = (Employee[]) in.readObject();
        // raise secretary's salary
        newStaff[1].raiseSalary(10);
        // print the newly read employee records
        for (Employee e : newStaff)
            System.out.println(e);
    }
}
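The example above depends on Employee and Manager classes that are not shown. The following is a minimal sketch, assuming simplified stand-ins for those classes (their fields and constructors here are hypothetical), that checks in memory that the shared secretary is still a single object after the round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SharedSecretaryDemo {

    // Hypothetical, simplified stand-in for the Employee class used above.
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        double salary;

        Employee(String name, double salary) {
            this.name = name;
            this.salary = salary;
        }
    }

    // Hypothetical, simplified stand-in for the Manager class used above.
    static class Manager extends Employee {
        private static final long serialVersionUID = 1L;
        Employee secretary;

        Manager(String name, double salary) {
            super(name, salary);
        }
    }

    // Builds the same three-person staff as the example: both managers share one secretary.
    static Employee[] sampleStaff() {
        Employee harry = new Employee("Harry Hacker", 50000);
        Manager carl = new Manager("Carl Cracker", 80000);
        carl.secretary = harry;
        Manager tony = new Manager("Tony Tester", 40000);
        tony.secretary = harry;
        return new Employee[] {carl, harry, tony};
    }

    // Serializes the array to an in-memory buffer and reads it back as a new object graph.
    static Employee[] roundTrip(Employee[] staff) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(staff);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Employee[]) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Employee[] copy = roundTrip(sampleStaff());
        Manager carl = (Manager) copy[0];
        Manager tony = (Manager) copy[2];
        // Both managers should point at the one restored secretary object.
        System.out.println(carl.secretary == copy[1]);
        System.out.println(tony.secretary == copy[1]);
    }
}
```

A byte array is used instead of a file only to keep the sketch self-contained; the object streams behave the same way over a FileOutputStream.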
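The serialization algorithm
Serial numbers are at the core of Java serialization. The algorithm works roughly as follows:
- Every object reference encountered is associated with a serial number.
- The first time an object is encountered, its data is saved to the stream.
- If an object has already been saved, only "same as the previously saved object with serial number x" is written.
The following code checks this behavior: an object written more than once through the same reference is serialized only once, so it is read back as a single instance.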
@Test
public void test() {
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("a.txt"));
         ObjectInputStream in = new ObjectInputStream(new FileInputStream("a.txt"))) {
        Person p1 = new Person("1", 1);
        Person p2 = new Person("2", 2);
        Person p3 = new Person("1", 1);
        out.writeObject(p1);
        out.writeObject(p2);
        out.writeObject(p3);
        out.writeObject(p3);
        Person pp1 = (Person) in.readObject();
        Person pp2 = (Person) in.readObject();
        Person pp3 = (Person) in.readObject();
        Person pp4 = (Person) in.readObject();
        System.out.println(pp1); // Person(name=1, age=1)
        System.out.println(pp2); // Person(name=2, age=2)
        System.out.println(pp3); // Person(name=1, age=1)
        System.out.println(pp4); // Person(name=1, age=1)
        System.out.println(pp1 == pp3); // false
        System.out.println(pp3 == pp4); // true
    } catch (IOException | ClassNotFoundException e) {
        // report the failure instead of silently discarding it
        e.printStackTrace();
    }
}
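This algorithm has a drawback, however. Java serialization never serializes the same object twice on one stream; it only records the serial number of an object that has already been written. So if you serialize a mutable object, change its contents, and then serialize it again through the same stream, the object is not converted to bytes a second time. Only its serial number is written, and the change is lost.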
@Test
public void test2() {
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("a.txt"));
         ObjectInputStream in = new ObjectInputStream(new FileInputStream("a.txt"))) {
        Person p1 = new Person("1", 1);
        out.writeObject(p1);
        p1.setAge(2);
        p1.setName("2");
        out.writeObject(p1);
        Person pp1 = (Person) in.readObject();
        Person pp2 = (Person) in.readObject();
        System.out.println(pp1); // Person(name=1, age=1)
        System.out.println(pp2); // Person(name=1, age=1)
    } catch (IOException | ClassNotFoundException e) {
        // report the failure instead of silently discarding it
        e.printStackTrace();
    }
}
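One possible way around this, not covered in the text above, is to call ObjectOutputStream.reset() between the two writes so that the stream forgets which objects it has already written. The sketch below assumes a simplified, hypothetical Person class and compares both behaviors in memory:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ResetDemo {

    // Hypothetical, simplified stand-in for the Person class used above.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Writes one Person twice, mutating it in between, and returns the two objects read back.
    static Person[] writeTwice(boolean resetBetweenWrites) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                Person p = new Person("1", 1);
                out.writeObject(p);
                p.name = "2";
                p.age = 2;
                if (resetBetweenWrites) {
                    // Discard the record of already written objects,
                    // so the next write stores the full object again.
                    out.reset();
                }
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                Person first = (Person) in.readObject();
                Person second = (Person) in.readObject();
                return new Person[] {first, second};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person[] shared = writeTwice(false);
        System.out.println(shared[1].name + " " + shared[1].age); // stale state from the first write
        Person[] fresh = writeTwice(true);
        System.out.println(fresh[1].name + " " + fresh[1].age);   // state at the second write
    }
}
```

The trade-off is that after reset() the two reads return two distinct objects, so reference sharing across the reset point is no longer preserved.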
Custom serialization
transient
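When some fields should not be serialized, mark them with the transient keyword. Java serialization skips fields marked transient, so in the deserialized object they hold default values: null for reference types, 0 for numeric primitive types, and false for boolean.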
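A minimal sketch of this behavior, assuming a hypothetical Login class with one regular field and three transient fields:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class TransientDemo {

    // Hypothetical class, used only for illustration.
    static class Login implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;                // serialized normally
        transient String password;  // skipped, restored as null
        transient int attempts;     // skipped, restored as 0
        transient boolean verified; // skipped, restored as false

        Login(String user, String password, int attempts, boolean verified) {
            this.user = user;
            this.password = password;
            this.attempts = attempts;
            this.verified = verified;
        }
    }

    // Serializes the object to an in-memory buffer and reads it back.
    static Login roundTrip(Login login) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(login);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Login) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Login copy = roundTrip(new Login("harry", "secret", 3, true));
        // prints: harry null 0 false
        System.out.println(copy.user + " " + copy.password + " " + copy.attempts + " " + copy.verified);
    }
}
```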
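Defining writeObject and readObject
By defining writeObject and readObject methods, a class can choose which fields are serialized and which are not. Whatever rule writeObject uses to write the data, readObject must apply the inverse rule so that the object can be restored correctly.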
Classes that require special handling during the serialization and deserialization process must implement special methods with these exact signatures:
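// Writes the state of this class to the ObjectOutputStream. Calling out.defaultWriteObject()
// stores the values of this class's non-static, non-transient fields.
private void writeObject(java.io.ObjectOutputStream out)
    throws IOException;
// Reads the state of this class back from the ObjectInputStream. Calling in.defaultReadObject()
// restores the values of this class's non-static, non-transient fields.
private void readObject(java.io.ObjectInputStream in)
    throws IOException, ClassNotFoundException;
private void readObjectNoData()
    throws ObjectStreamException;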
The writeObject method is responsible for writing the state of the object for its particular class so that the corresponding readObject method can restore it. The default mechanism for saving the Object’s fields can be invoked by calling out.defaultWriteObject. The method does not need to concern itself with the state belonging to its superclasses or subclasses. State is saved by writing the individual fields to the ObjectOutputStream using the writeObject method or by using the methods for primitive data types supported by DataOutput.
The readObject method is responsible for reading from the stream and restoring the classes fields. It may call in.defaultReadObject to invoke the default mechanism for restoring the object’s non-static and non-transient fields. The defaultReadObject method uses information in the stream to assign the fields of the object saved in the stream with the correspondingly named fields in the current object. This handles the case when the class has evolved to add new fields. The method does not need to concern itself with the state belonging to its superclasses or subclasses. State is restored by reading data from the ObjectInputStream for the individual fields and making assignments to the appropriate fields of the object. Reading primitive data types is supported by DataInput.
The readObjectNoData method is responsible for initializing the state of the object for its particular class in the event that the serialization stream does not list the given class as a superclass of the object being deserialized. This may occur in cases where the receiving party uses a different version of the deserialized instance’s class than the sending party, and the receiver’s version extends classes that are not extended by the sender’s version. This may also occur if the serialization stream has been tampered; hence, readObjectNoData is useful for initializing deserialized objects properly despite a “hostile” or incomplete source stream.
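As an illustration of the writeObject/readObject pair described above, here is a minimal sketch. The LabeledPoint class is hypothetical: its coordinates are marked transient and written by hand, while the label goes through the default mechanism.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class CustomFormDemo {

    // Hypothetical class, used only for illustration.
    static class LabeledPoint implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String label;  // handled by the default mechanism
        private transient double x;  // written and read by hand below
        private transient double y;

        LabeledPoint(String label, double x, double y) {
            this.label = label;
            this.x = x;
            this.y = y;
        }

        String label() { return label; }
        double x() { return x; }
        double y() { return y; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes label
            out.writeDouble(x);
            out.writeDouble(y);
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();   // restores label
            x = in.readDouble();      // same order as in writeObject
            y = in.readDouble();
        }
    }

    // Serializes the object to an in-memory buffer and reads it back.
    static LabeledPoint roundTrip(LabeledPoint point) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(point);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LabeledPoint) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LabeledPoint copy = roundTrip(new LabeledPoint("corner", 1.5, -2.0));
        System.out.println(copy.label() + " " + copy.x() + " " + copy.y());
    }
}
```

The transient coordinates survive the round trip only because writeObject and readObject handle them explicitly and in the same order.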
Externalizable
public interface Externalizable extends java.io.Serializable {
    void writeExternal(ObjectOutput out) throws IOException;
    void readExternal(ObjectInput in) throws IOException, ClassNotFoundException;
}
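A class that implements Externalizable must implement the writeExternal and readExternal methods. These methods are fully responsible for saving and restoring the entire object, including superclass data; the serialization mechanism merely records the class of the object in the stream. When reading an externalizable class, the object stream creates an object with the no-argument constructor and then calls readExternal.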
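Note that readObject and writeObject are private and can only be called by the serialization mechanism. In contrast, readExternal and writeExternal are public, and readExternal in particular potentially permits modification of the state of an existing object.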
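A minimal sketch of an externalizable class, assuming a hypothetical Point type. Note the public no-argument constructor, which the object stream needs in order to create the instance before it calls readExternal:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class ExternalizableDemo {

    // Hypothetical class, used only for illustration.
    public static class Point implements Externalizable {
        private static final long serialVersionUID = 1L;
        private int x;
        private int y;

        // Required: used by the object stream when reading the object back.
        public Point() {
        }

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public int x() { return x; }
        public int y() { return y; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            // The class is fully responsible for its own data.
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Read the fields in the order they were written.
            x = in.readInt();
            y = in.readInt();
        }
    }

    // Serializes the object to an in-memory buffer and reads it back.
    static Point roundTrip(Point point) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(point);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x() + ", " + copy.y());
    }
}
```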
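Serializing singletons and type-safe enumerations
Take particular care when serializing and deserializing objects that are supposed to be unique. If you use Java's enum construct, this works correctly, but suppose you maintain code like the following: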
@Test
public void test() {
    Orientation a = Orientation.A;
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("b.dat"));
         ObjectInputStream in = new ObjectInputStream(new FileInputStream("b.dat"))) {
        System.out.println(a); // Orientation(value=1)
        out.writeObject(a);
        Orientation o = (Orientation) in.readObject();
        System.out.println(a == o); // false
        System.out.println(o); // Orientation(value=1)
    } catch (IOException | ClassNotFoundException e) {
        e.printStackTrace();
    }
}

@ToString
class Orientation implements Serializable {
    public static final Orientation A = new Orientation(1);
    private int value;

    private Orientation(int value) {
        this.value = value;
    }
}
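Note that deserialization yields a brand-new object, not the one that was serialized. Even though the constructor is private, the serialization mechanism can still create new objects.
To get the original object back, define the special serialization method readResolve. It is called after the object is deserialized and must return an object, which then becomes the return value of readObject.
For the code above, adding the following method to the Orientation class makes deserialization return the same object that was serialized.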
private Object readResolve() {
    if (value == 1) return Orientation.A;
    return null;
}
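The effect of readResolve can be checked with a self-contained sketch. It assumes a variant of the Orientation class without Lombok, adds a second constant B purely for illustration, and throws InvalidObjectException for an unknown value instead of returning null; all three are choices made for this sketch, not part of the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ReadResolveDemo {

    // Variant of the Orientation class above; constant B is hypothetical.
    static class Orientation implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Orientation A = new Orientation(1);
        static final Orientation B = new Orientation(2);
        private final int value;

        private Orientation(int value) {
            this.value = value;
        }

        // Called after deserialization; its result replaces the freshly created object.
        private Object readResolve() throws ObjectStreamException {
            switch (value) {
                case 1:
                    return A;
                case 2:
                    return B;
                default:
                    throw new InvalidObjectException("unknown orientation value: " + value);
            }
        }
    }

    // Serializes the object to an in-memory buffer and reads it back.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // With readResolve in place, the canonical instances come back.
        System.out.println(roundTrip(Orientation.A) == Orientation.A);
        System.out.println(roundTrip(Orientation.B) == Orientation.B);
    }
}
```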
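Similarly, there is a writeReplace method: during serialization it is called first, before writeObject. The object it returns is serialized in place of the original, so it can substitute any other object for the one being serialized.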
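One way to combine writeReplace with readResolve is sketched below. The Period and Proxy classes are hypothetical: Period hands a small stand-in object to the stream, and the stand-in rebuilds the Period through its normal constructor when it is read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class WriteReplaceDemo {

    // Hypothetical immutable class whose constructor enforces start <= end.
    static final class Period implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int start;
        private final int end;

        Period(int start, int end) {
            if (start > end) {
                throw new IllegalArgumentException("start must not be after end");
            }
            this.start = start;
            this.end = end;
        }

        int start() { return start; }
        int end() { return end; }

        // Called before the object is written: the proxy is serialized instead of this Period.
        private Object writeReplace() {
            return new Proxy(start, end);
        }
    }

    // Hypothetical stand-in that is actually written to the stream.
    static final class Proxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int start;
        private final int end;

        Proxy(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // Called after the proxy is read: rebuilds the Period through its constructor.
        private Object readResolve() {
            return new Period(start, end);
        }
    }

    // Serializes the object to an in-memory buffer and reads it back.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(new Period(1, 5));
        System.out.println(copy instanceof Period);
    }
}
```

Because the Period is rebuilt through its constructor, the start <= end check also applies to data read from the stream.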
Serialization version number: serialVersionUID
The serialization runtime associates with each serializable class a version number, called a serialVersionUID, which is used during deserialization to verify that the sender and receiver of a serialized object have loaded classes for that object that are compatible with respect to serialization. If the receiver has loaded a class for the object that has a different serialVersionUID than that of the corresponding sender’s class, then deserialization will result in an InvalidClassException. A serializable class can declare its own serialVersionUID explicitly by declaring a field named "serialVersionUID" that must be static, final, and of type long:
ANY-ACCESS-MODIFIER static final long serialVersionUID = 42L;
If a serializable class does not explicitly declare a serialVersionUID, then the serialization runtime will calculate a default serialVersionUID value for that class based on various aspects of the class, as described in the Java™ Object Serialization Specification. However, it is strongly recommended that all serializable classes explicitly declare serialVersionUID values, since the default serialVersionUID computation is highly sensitive to class details that may vary depending on compiler implementations, and can thus result in unexpected InvalidClassExceptions during deserialization. Therefore, to guarantee a consistent serialVersionUID value across different java compiler implementations, a serializable class must declare an explicit serialVersionUID value. It is also strongly advised that explicit serialVersionUID declarations use the private modifier where possible, since such declarations apply only to the immediately declaring class; serialVersionUID fields are not useful as inherited members. Array classes cannot declare an explicit serialVersionUID, so they always have the default computed value, but the requirement for matching serialVersionUID values is waived for array classes.
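According to the JDK documentation, the main purpose of serialVersionUID is verification during deserialization: if the version of the class used for deserialization does not match the one used for serialization, an InvalidClassException is thrown.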
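If you do not declare serialVersionUID yourself, one is generated automatically at run time. The official recommendation, however, is to declare it explicitly and make it private, since the field is of no use to subclasses as an inherited member. An explicit value also avoids the risk that the generated serialVersionUID differs between compilers or JVMs.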
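To see which serialVersionUID the runtime associates with a class, you can query its ObjectStreamClass descriptor. A minimal sketch, assuming a hypothetical Versioned class that declares the value 42L:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionUidDemo {

    // Hypothetical class, used only for illustration.
    static class Versioned implements Serializable {
        private static final long serialVersionUID = 42L;
        String name;
    }

    // Returns the serialVersionUID the serialization runtime uses for the given class.
    static long serialVersionUidOf(Class<?> type) {
        // lookup returns null when the class is not serializable
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            throw new IllegalArgumentException(type.getName() + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        // prints: 42
        System.out.println(serialVersionUidOf(Versioned.class));
    }
}
```

For a class without an explicit declaration, the same call returns the default value computed by the runtime.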