Serialization - shared delegates

from:

http://www.jroller.com/scolebourne/entry/serialization_shared_delegates

 

I've been working on Joda-Money as a side project and have been investigating serialization, with a hope of improving JSR-310.

Small serialization

Joda-Money has two key classes: BigMoney, capable of storing an amount to any scale, and Money, limited to the correct number of decimal places for the currency.

 

 

A default application of serialization to these classes will generate 525 bytes for BigMoney and 599 bytes for Money. This is a lot of data to be sending for objects that seem quite simple.

Where does the size go?

Well, each serialized class has to write a header to state what the class is. For something like Money, it has to write a header for itself, BigMoney, CurrencyUnit, BigDecimal and BigInteger. The header also includes the serialization version number and the names of each field.

Of course, serialization is designed to handle complex cases where the versions of the class file differ on two JVMs. Data is populated into the right fields using the field name. But for simple classes like money, the data isn't going to change over time.

One interesting fact is that the class header is only sent once per stream for a class. As a result, the size is reduced for each subsequent object after the first. For default serialization of a subsequent BigMoney the size is 59 bytes and for Money it is 65 bytes. Clearly, the header is a major overhead.
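One way to observe the header effect is to record the stream size after each write. The sketch below is only an illustration: SimpleAmount is a hypothetical stand-in (not the Joda-Money source) and sizesAfterEachWrite is a helper invented here. Exact byte counts depend on class and field names, so none are quoted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.math.BigDecimal;

// Hypothetical stand-in for a simple money class, using default serialization.
final class SimpleAmount implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String currencyCode;
    private final BigDecimal amount;

    SimpleAmount(String currencyCode, BigDecimal amount) {
        this.currencyCode = currencyCode;
        this.amount = amount;
    }
}

final class StreamSizeDemo {
    // Writes all objects to ONE stream and returns the cumulative size after each write.
    static int[] sizesAfterEachWrite(Object... objects) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            int[] sizes = new int[objects.length];
            for (int i = 0; i < objects.length; i++) {
                out.writeObject(objects[i]);
                out.flush(); // push buffered data through so the count is accurate
                sizes[i] = bytes.size();
            }
            return sizes;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterEachWrite(
                new SimpleAmount("GBP", new BigDecimal("12.34")),
                new SimpleAmount("GBP", new BigDecimal("56.78")));
        System.out.println("first object:  " + sizes[0] + " bytes");
        System.out.println("second object: " + (sizes[1] - sizes[0]) + " bytes");
    }
}
```

The second figure is smaller because the class descriptors written for the first object are only referenced, not repeated, for the second.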

Making the data smaller

The key to this is using a serialization delegate class. The delegate is a class that is written into the output stream in place of the original class. This approach is required because the fields are final, which prevents a sensible data format from being written/read by the class itself.

 

 

So, there is a new class Ser which will appear in the stream wherever the Money class would have been. The name Ser is deliberately short, as each letter takes up space in the stream.

The delegate class is usually written as a static inner class.

 

 

The delegate class uses the low-level writeObject and readObject methods to control the data in the stream. The readResolve method then returns the correct object back for the serialization mechanism to put in the object structure. The class is static to ensure a stable serialized form.
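A minimal sketch of this pattern follows, assuming a hypothetical Price class with two final fields. The names Price, minorUnits and DelegateDemo are invented for illustration and this is not the Joda-Money code. The outer class hands over to the delegate through writeReplace, and the delegate rebuilds the object and returns it from readResolve.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical value class with final fields.
final class Price implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String currencyCode;
    private final long minorUnits;

    Price(String currencyCode, long minorUnits) {
        this.currencyCode = currencyCode;
        this.minorUnits = minorUnits;
    }

    String currencyCode() { return currencyCode; }
    long minorUnits() { return minorUnits; }

    // Serialization calls this, so the delegate goes into the stream instead of Price.
    private Object writeReplace() {
        return new Ser(this);
    }

    // The delegate: a static nested class with a deliberately short name.
    private static final class Ser implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient Price price; // transient: the stream format is hand-written below

        Ser(Price price) {
            this.price = price;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.writeUTF(price.currencyCode);
            out.writeLong(price.minorUnits);
        }

        private void readObject(ObjectInputStream in) throws IOException {
            // Final fields are set through the constructor, not by reflection.
            price = new Price(in.readUTF(), in.readLong());
        }

        // The stream hands back the real object in place of the delegate.
        private Object readResolve() {
            return price;
        }
    }
}

final class DelegateDemo {
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Price copy = (Price) roundTrip(new Price("GBP", 1234L));
        System.out.println(copy.currencyCode() + " " + copy.minorUnits());
    }
}
```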

Simply taking control of the stream in this way will greatly reduce the overall size. The biggest gain is in writing out the BigDecimal in an efficient manner.
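One compact encoding for a BigDecimal is the byte array of its unscaled value plus its scale. The helper below is a sketch under that assumption. BigDecimalCodec and its method names are hypothetical, and the layout (length, bytes, scale) is just one reasonable choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper: writes a BigDecimal as [int length][unscaled bytes][int scale].
final class BigDecimalCodec {
    static void write(DataOutput out, BigDecimal value) throws IOException {
        byte[] unscaled = value.unscaledValue().toByteArray();
        out.writeInt(unscaled.length);
        out.write(unscaled);
        out.writeInt(value.scale());
    }

    static BigDecimal read(DataInput in) throws IOException {
        byte[] unscaled = new byte[in.readInt()];
        in.readFully(unscaled);
        int scale = in.readInt();
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    // Convenience wrappers for trying the codec without an object stream.
    static byte[] encode(BigDecimal value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            write(out, value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static BigDecimal decode(byte[] data) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode(new BigDecimal("12.34"));
        // 10 bytes: 4 (length) + 2 (unscaled value 1234) + 4 (scale)
        System.out.println(data.length + " bytes, decoded as " + decode(data));
    }
}
```

Because ObjectOutput extends DataOutput and ObjectInput extends DataInput, the same write and read methods can be called from inside a delegate's writeObject/readObject or writeExternal/readExternal.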

Even better?

My investigation has shown a technique to make the stream even smaller.

Firstly, rather than using a static inner class, use a top-level package scoped class. This will have a shorter fully qualified class name, thus a shorter header.

Secondly, look at the other classes in the package. If there are more classes that need the same treatment, why not use a single delegate class for all of them?

 

 

So, both classes are sharing the same serialization delegate, using a single byte type to distinguish them. Since the header is written once per class per stream, there is now only one header written whether your stream contains BigMoney, Money or both.

I've also switched to using Externalizable rather than Serializable. Despite the public methods, these cannot be called on the general API because this is a package scoped class. This change doesn't affect the stream size, but should perform faster (untested!) as there is less reflection involved.
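As a rough sketch of the shared delegate, assume two hypothetical value classes, Distance and Weight, standing in for BigMoney and Money. SharedSer, its type constants and SharedDelegateDemo are also invented for illustration. Both classes return the same top-level, package-scoped Externalizable delegate from writeReplace, and a leading type byte tells the reader which class to rebuild.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StreamCorruptedException;

// Hypothetical value classes that share one delegate.
final class Distance implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long millimetres;
    Distance(long millimetres) { this.millimetres = millimetres; }
    long millimetres() { return millimetres; }
    private Object writeReplace() { return new SharedSer(SharedSer.DISTANCE, this); }
}

final class Weight implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long grams;
    Weight(long grams) { this.grams = grams; }
    long grams() { return grams; }
    private Object writeReplace() { return new SharedSer(SharedSer.WEIGHT, this); }
}

// Top-level, package-scoped delegate: one class header serves both value classes.
final class SharedSer implements Externalizable {
    private static final long serialVersionUID = 1L;
    static final byte DISTANCE = 'D';
    static final byte WEIGHT = 'W';

    private byte type;
    private Object object;

    public SharedSer() { } // Externalizable requires a public no-arg constructor

    SharedSer(byte type, Object object) {
        this.type = type;
        this.object = object;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(type); // the type byte comes first so the reader can dispatch
        switch (type) {
            case DISTANCE: out.writeLong(((Distance) object).millimetres()); break;
            case WEIGHT: out.writeLong(((Weight) object).grams()); break;
            default: throw new InvalidClassException("Unknown type: " + type);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        type = in.readByte();
        switch (type) {
            case DISTANCE: object = new Distance(in.readLong()); break;
            case WEIGHT: object = new Weight(in.readLong()); break;
            default: throw new StreamCorruptedException("Unknown type: " + type);
        }
    }

    private Object readResolve() { return object; }
}

final class SharedDelegateDemo {
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(((Distance) roundTrip(new Distance(1500L))).millimetres());
        System.out.println(((Weight) roundTrip(new Weight(250L))).grams());
    }
}
```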

With these changes, the stream size for sending one BigMoney or Money drops to 58 bytes from 525/599 bytes. Sending a subsequent object of the same type drops to 24 bytes, whereas the default would be 59/65 bytes.

The single shared delegate approach also results in a smaller jar file, as there is a large jar file size overhead for each separate class. (We've replaced two delegates by one, so the jar is smaller).

One downside with this approach is that serialization is no longer encapsulated within the class being serialized. This may result in a constructor becoming package scoped rather than private.

The approach is also only recommended where the class and serialized format is stable, as you are fully responsible for evolution over time of the data format.

A final downside is that the object identity of objects might not be preserved. For example, if the data of the BigDecimal is written out rather than a reference to the object, then a new BigDecimal object will be created for each BigMoney deserialized. The extent to which this is a problem is dependent on the memory structure being serialized.

The same problem applies to multiple Money objects backed by the same BigMoney. The default serialized size for the second would be just 10 bytes, whereas the basic shared delegate approach would be 24 bytes.

As a result, I recommend only writing the base class, BigMoney in this case, directly using its contents. Other classes that contain the base class, Money in this case, should write out a reference to the BigMoney from the shared delegate. This approach means that the second Money takes 14 bytes when the BigMoney is shared and 34 bytes when it isn't.
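The recommendation can be sketched as follows, assuming hypothetical Base and Wrapper classes in the roles of BigMoney and Money. RefSer and SharedReferenceDemo are likewise invented for illustration. The delegate writes the base object's contents directly, but writes a wrapper's base through writeObject, so the stream can emit a back-reference when that base has already been sent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StreamCorruptedException;

// Hypothetical base value (the role BigMoney plays in the text).
final class Base implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long value;
    Base(long value) { this.value = value; }
    long value() { return value; }
    private Object writeReplace() { return new RefSer(RefSer.BASE, this); }
}

// Hypothetical wrapper around a Base (the role Money plays in the text).
final class Wrapper implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Base base;
    Wrapper(Base base) { this.base = base; }
    Base base() { return base; }
    private Object writeReplace() { return new RefSer(RefSer.WRAPPER, this); }
}

final class RefSer implements Externalizable {
    private static final long serialVersionUID = 1L;
    static final byte BASE = 'B';
    static final byte WRAPPER = 'W';

    private byte type;
    private Object object;

    public RefSer() { } // Externalizable requires a public no-arg constructor

    RefSer(byte type, Object object) {
        this.type = type;
        this.object = object;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(type);
        if (type == BASE) {
            out.writeLong(((Base) object).value()); // contents written directly
        } else if (type == WRAPPER) {
            out.writeObject(((Wrapper) object).base()); // reference, so sharing survives
        } else {
            throw new InvalidClassException("Unknown type: " + type);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        type = in.readByte();
        if (type == BASE) {
            object = new Base(in.readLong());
        } else if (type == WRAPPER) {
            object = new Wrapper((Base) in.readObject());
        } else {
            throw new StreamCorruptedException("Unknown type: " + type);
        }
    }

    private Object readResolve() { return object; }
}

final class SharedReferenceDemo {
    // Writes all values to one stream, then reads them back in order.
    static Object[] roundTripAll(Object... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (Object value : values) {
                    out.writeObject(value);
                }
            }
            Object[] result = new Object[values.length];
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                for (int i = 0; i < result.length; i++) {
                    result[i] = in.readObject();
                }
            }
            return result;
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Base shared = new Base(42L);
        Object[] copies = roundTripAll(new Wrapper(shared), new Wrapper(shared));
        boolean sameBase = ((Wrapper) copies[0]).base() == ((Wrapper) copies[1]).base();
        System.out.println("base instance shared after round trip: " + sameBase);
    }
}
```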

Using this final approach, the figures (in bytes) are as follows:

Object                       Default serialization      Shared delegate
                             First sent   Subsequent    First sent   Subsequent
BigMoney                     525          59            58           24
Money                        599          65            68           34
Money with shared BigMoney   599          10            68           14

Summary

The shared delegate technique offers one route to the smallest stream size for serialization. The data size for the first object was a tenth of the original, and halved for subsequent objects. However, I would recommend this as a specialist technique for low level value objects rather than general beans.

So is this worth applying to JSR-310? Feedback welcome!

 

 

http://sourceforge.net/apps/trac/lilith/wiki/SerializationPerformance

 
