Java Serialization and BlockData

1. Summary

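A few days ago our system hit a bug: after an object was serialized and then deserialized, the text it contained came back garbled. After a whole night of digging, I found the cause. When a Java object is serialized and then deserialized, a single call to read(byte[]) did not read the complete data.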

2. Symptom

The object being serialized implements the Externalizable interface:

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class Test implements Externalizable {

    private String s;

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        if (s == null) {
            out.writeShort(-1);          // -1 marks a null string
        } else {
            byte[] bb = s.getBytes("utf-8");
            out.writeShort(bb.length);   // 2-byte length prefix
            out.write(bb);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        int len = in.readShort();
        if (len < 0) {
            s = null;
            return;
        }
        byte[] bb = new byte[len];
        in.read(bb);                     // the buggy call: the return value is ignored
        s = new String(bb, "utf-8");
    }
}

The serialization logic is:

ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(value);
oos.flush();                // make sure all buffered bytes have reached bos
val = bos.toByteArray();

The deserialization logic is roughly:

// ContextObjectInputStream is a project-specific ObjectInputStream subclass, not a JDK class
ContextObjectInputStream ois = new ContextObjectInputStream(new ByteArrayInputStream(buf), classLoader);
Object obj = ois.readObject();

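When the byte[] is longer than roughly 1 KB (because the 2-byte length prefix shares the first 1024-byte block, the cutoff works out to 1022 bytes of string data), the field s in obj is truncated. Everything past the cutoff comes back as 0x00 bytes, which show up as blanks.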
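The symptom can be reproduced outside the project with a small sketch. It assumes a plain ObjectInputStream in place of the project-specific ContextObjectInputStream, Java 7 or later, and StandardCharsets.UTF_8 instead of the "utf-8" string; the names TruncationRepro, roundTrip and ascii are made up for illustration. Going by the JDK code quoted in section 3, a 2000-byte ASCII string should keep its first 1022 bytes and lose the rest. With multi-byte UTF-8 text the cut can also land in the middle of a character, which is one way to end up with garbled output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TruncationRepro {

    // Same wire format as Test above: 2-byte length, then the UTF-8 bytes.
    public static class Test implements Externalizable {
        private String s;

        public Test() {}                        // Externalizable needs a public no-arg constructor
        public Test(String s) { this.s = s; }
        public String get() { return s; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            if (s == null) {
                out.writeShort(-1);
                return;
            }
            byte[] bb = s.getBytes(StandardCharsets.UTF_8);
            out.writeShort(bb.length);
            out.write(bb);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int len = in.readShort();
            if (len < 0) {
                s = null;
                return;
            }
            byte[] bb = new byte[len];
            in.read(bb);                        // the bug: return value ignored, bb may be only partly filled
            s = new String(bb, StandardCharsets.UTF_8);
        }
    }

    // Serialize, then deserialize with a plain ObjectInputStream.
    public static String roundTrip(String value) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(new Test(value));
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return ((Test) ois.readObject()).get();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    // n copies of 'a' (1 byte each in UTF-8)
    public static String ascii(int n) {
        char[] cs = new char[n];
        Arrays.fill(cs, 'a');
        return new String(cs);
    }

    public static void main(String[] args) {
        String back = roundTrip(ascii(2000));
        System.out.println(back.equals(ascii(2000)));  // false
        System.out.println(back.indexOf('\u0000'));     // 1022
    }
}
```

Strings that fit in the first block (for example 500 bytes) survive the round trip, which is why the bug only shows up for longer text.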

3. Cause

Tracing through the JDK source, I found that Java serialization and deserialization have a concept called block data (BlockData).

3.1. Serialization and writeObject

Call path of writeObject():

ObjectOutputStream.writeObject(obj) -> ObjectOutputStream.writeOrdinaryObject(obj,class) -> ObjectOutputStream.writeExternalData(obj)

The key parts of ObjectOutputStream:

public class ObjectOutputStream
    extends OutputStream implements ObjectOutput, ObjectStreamConstants
{
    private final BlockDataOutputStream bout;

    public void write(byte[] buf) throws IOException {
        bout.write(buf, 0, buf.length, false);
    }

    private void writeExternalData(Externalizable obj) throws IOException {
        // ...
        bout.setBlockDataMode(true);
        obj.writeExternal(this);   // calls writeExternal of the object being serialized (GroupDm in our system)
        bout.setBlockDataMode(false);
        bout.writeByte(TC_ENDBLOCKDATA);
        // ...
    }
}

When obj.writeExternal(this) runs, it calls back into ObjectOutputStream.write(byte[] buf), which in turn calls BlockDataOutputStream.write(buf, 0, buf.length, false).

The key parts of BlockDataOutputStream:

private static class BlockDataOutputStream
    extends OutputStream implements DataOutput
{
    /** maximum data block length */
    private static final int MAX_BLOCK_SIZE = 1024;

    /** buffer for writing general/block data */
    private final byte[] buf = new byte[MAX_BLOCK_SIZE];

    /** underlying output stream */
    private final OutputStream out;

    void write(byte[] b, int off, int len, boolean copy)
        throws IOException
    {
        // ...
        while (len > 0) {
            if (pos >= MAX_BLOCK_SIZE) {
                drain();
            }
            if (len >= MAX_BLOCK_SIZE && !copy && pos == 0) {
                // avoid unnecessary copy
                writeBlockHeader(MAX_BLOCK_SIZE);
                out.write(b, off, MAX_BLOCK_SIZE);
                off += MAX_BLOCK_SIZE;
                len -= MAX_BLOCK_SIZE;
            } else {
                int wlen = Math.min(len, MAX_BLOCK_SIZE - pos);
                System.arraycopy(b, off, buf, pos, wlen);
                pos += wlen;
                off += wlen;
                len -= wlen;
            }
        }
    }

    /**
     * Writes all buffered data from this stream to the underlying stream,
     * but does not flush underlying stream.
     */
    void drain() throws IOException {
        if (pos == 0) {
            return;
        }
        if (blkmode) {
            writeBlockHeader(pos);
        }
        out.write(buf, 0, pos);
        pos = 0;
    }

    /**
     * Writes block data header. Data blocks shorter than 256 bytes are
     * prefixed with a 2-byte header; all others start with a 5-byte
     * header.
     */
    private void writeBlockHeader(int len) throws IOException {
        if (len <= 0xFF) {
            hbuf[0] = TC_BLOCKDATA;   // TC_BLOCKDATA = (byte)0x77;
            hbuf[1] = (byte) len;
            out.write(hbuf, 0, 2);
        } else {
            hbuf[0] = TC_BLOCKDATALONG;
            Bits.putInt(hbuf, 1, len);
            out.write(hbuf, 0, 5);
        }
    }

    /**
     * Sets block data mode to the given mode (true == on, false == off)
     * and returns the previous mode value. If the new mode is the same as
     * the old mode, no action is taken. If the new mode differs from the
     * old mode, any buffered data is flushed before switching to the new
     * mode.
     */
    boolean setBlockDataMode(boolean mode) throws IOException {
        if (blkmode == mode) {
            return blkmode;
        }
        drain();
        blkmode = mode;
        return !blkmode;
    }
}

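In block-data mode, writes first go into the BlockDataOutputStream buffer. When the buffer reaches 1024 bytes, or when setBlockDataMode switches modes, a block header is written to the underlying output, followed by the buffered data. (A write of 1024 bytes or more that starts with an empty buffer skips the copy and goes out directly as a full 1024-byte block.) The header comes in two forms: if the data after it is shorter than 256 bytes, it takes 2 bytes (TC_BLOCKDATA + a 1-byte length); otherwise it takes 5 bytes (TC_BLOCKDATALONG + a 4-byte length). The serialized object therefore looks roughly like this: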

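meta .. block-header(5 bytes) .. data(1024 bytes) .. block-header(5 bytes) .. data(1024 bytes) .. block-header(2 bytes) .. data(<256 bytes) .. block-end

The last block gets the 2-byte header only if it holds fewer than 256 bytes; a last block of 256 to 1023 bytes gets a 5-byte header like the full ones. The tag bytes are:

TC_BLOCKDATA = (byte)0x77;

TC_BLOCKDATALONG = (byte)0x7A;

TC_ENDBLOCKDATA = (byte)0x78;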
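The layout above can be checked against a real stream. The sketch below serializes a hypothetical Externalizable (Payload) whose writeExternal writes n raw bytes, skips the stream header and class descriptor, and lists the block headers it finds ("S" for a 2-byte TC_BLOCKDATA header, "L" for a 5-byte TC_BLOCKDATALONG header). The fixed sequence of skips assumes the class-descriptor layout from the Java Object Serialization Specification for an Externalizable class with no serializable superclass; it is not a general stream parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.StreamCorruptedException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class BlockDump {

    // Hypothetical payload: writeExternal writes n zero bytes and nothing else.
    public static class Payload implements Externalizable {
        private int n;

        public Payload() {}
        public Payload(int n) { this.n = n; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.write(new byte[n]);
        }

        @Override
        public void readExternal(ObjectInput in) {
            // not used: this demo never deserializes
        }
    }

    // Serializes Payload(n) and lists its data blocks as "S<len>" or "L<len>".
    public static List<String> blocks(int n) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(new Payload(n));
            }
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
            in.readShort();                                   // STREAM_MAGIC (0xACED)
            in.readShort();                                   // STREAM_VERSION (5)
            expect(in.readByte(), ObjectStreamConstants.TC_OBJECT);
            expect(in.readByte(), ObjectStreamConstants.TC_CLASSDESC);
            in.readUTF();                                     // class name
            in.readLong();                                    // serialVersionUID
            in.readByte();                                    // flags
            in.readShort();                                   // field count (0 for Externalizable)
            expect(in.readByte(), ObjectStreamConstants.TC_ENDBLOCKDATA); // end of class annotation
            expect(in.readByte(), ObjectStreamConstants.TC_NULL);         // no superclass descriptor

            List<String> result = new ArrayList<>();
            while (true) {
                byte tag = in.readByte();
                int len;
                if (tag == ObjectStreamConstants.TC_BLOCKDATA) {
                    len = in.readUnsignedByte();
                    result.add("S" + len);
                } else if (tag == ObjectStreamConstants.TC_BLOCKDATALONG) {
                    len = in.readInt();
                    result.add("L" + len);
                } else if (tag == ObjectStreamConstants.TC_ENDBLOCKDATA) {
                    return result;
                } else {
                    throw new StreamCorruptedException("unexpected tag " + tag);
                }
                in.skipBytes(len);                            // skip the block's payload
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void expect(byte actual, byte wanted) throws StreamCorruptedException {
        if (actual != wanted) {
            throw new StreamCorruptedException("expected " + wanted + ", got " + actual);
        }
    }

    public static void main(String[] args) {
        System.out.println(blocks(100));   // [S100]
        System.out.println(blocks(300));   // [L300]
        System.out.println(blocks(2500));  // [L1024, L1024, L452]
    }
}
```

The 300-byte case shows the 5-byte header on a block that is not full, and the 2500-byte case shows the two direct 1024-byte writes followed by a drained remainder.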

3.2. Deserialization and readObject

ObjectInputStream.readObject() -> ObjectInputStream.readObject0() -> ObjectInputStream.readOrdinaryObject() -> ObjectInputStream.readExternalData(obj) -> GroupDm.readExternal(objectInputStream) (GroupDm is our Externalizable class, Test in the example above) -> calls back into InputStream.read(byte[]) -> ObjectInputStream.read(byte[] buffer, 0, buffer.length)

This in turn calls BlockDataInputStream.read(buf, off, len, true) (copy is true on this path). The key code:

private class BlockDataInputStream
    extends InputStream implements DataInput
{
    /**
     * Attempts to read len bytes into byte array b at offset off. Returns
     * the number of bytes read, or -1 if the end of stream/block data has
     * been reached. If copy is true, reads values into an intermediate
     * buffer before copying them to b (to avoid exposing a reference to
     * b).
     */
    int read(byte[] b, int off, int len, boolean copy) throws IOException {
        if (len == 0) {
            return 0;
        } else if (blkmode) {
            if (pos == end) {
                refill();
            }
            if (end < 0) {
                return -1;
            }
            int nread = Math.min(len, end - pos);
            System.arraycopy(buf, pos, b, off, nread);
            pos += nread;
            return nread;
        } else if (copy) {
            int nread = in.read(buf, 0, Math.min(len, MAX_BLOCK_SIZE));
            if (nread > 0) {
                System.arraycopy(buf, 0, b, off, nread);
            }
            return nread;
        } else {
            return in.read(b, off, len);
        }
    }

    /**
     * Refills internal buffer buf with block data. Any data in buf at the
     * time of the call is considered consumed. Sets the pos, end, and
     * unread fields to reflect the new amount of available block data; if
     * the next element in the stream is not a data block, sets pos and
     * unread to 0 and end to -1.
     */
    private void refill() throws IOException {
        try {
            do {
                pos = 0;
                if (unread > 0) {
                    int n =
                        in.read(buf, 0, Math.min(unread, MAX_BLOCK_SIZE));
                    if (n >= 0) {
                        end = n;
                        unread -= n;
                    } else {
                        throw new StreamCorruptedException(
                            "unexpected EOF in middle of data block");
                    }
                } else {
                    int n = readBlockHeader(true);
                    if (n >= 0) {
                        end = 0;
                        unread = n;
                    } else {
                        end = -1;
                        unread = 0;
                    }
                }
            } while (pos == end);
        } catch (IOException ex) {
            pos = 0;
            end = -1;
            unread = 0;
            throw ex;
        }
    }
}

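On the read side, the stream reads a block header and then loads that block's data into its buffer. Each read request is served straight from the buffer: if a request asks for more than what remains in the buffer, it gets only the remaining bytes, and the buffer is refilled with the next block only on the following call. So a single read(byte[]) call can return less data than requested. This is consistent with the general contract of InputStream.read(byte[]), which only promises to read up to b.length bytes and returns the number actually read. The rest of the destination array is left untouched, which for a freshly allocated array means 0x00 bytes that show up as blanks.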
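The partial reads can be observed directly. In this sketch the hypothetical Probe class writes a 2-byte length followed by n bytes, mirroring the layout of Test, and on the read side records what each in.read(bb, off, len) call returns. Following the refill logic above, a 2000-byte payload should arrive in two reads of 1022 and 978 bytes, because the first 1024-byte block also carries the 2-byte length. The names ReadChunks, Probe and readCounts are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class ReadChunks {

    // Hypothetical probe: same layout as Test (2-byte length + n bytes),
    // but readExternal records the return value of every read call.
    public static class Probe implements Externalizable {
        private int n;
        private final List<Integer> counts = new ArrayList<>();

        public Probe() {}
        public Probe(int n) { this.n = n; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeShort(n);
            out.write(new byte[n]);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int len = in.readShort();
            byte[] bb = new byte[len];
            int off = 0;
            while (off < len) {
                int r = in.read(bb, off, len - off);
                if (r < 0) {
                    break;                      // no more block data
                }
                counts.add(r);
                off += r;
            }
        }
    }

    // Round-trips Probe(n) and returns how many bytes each read call delivered.
    public static List<Integer> readCounts(int n) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(new Probe(n));
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return ((Probe) ois.readObject()).counts;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readCounts(500));   // [500]
        System.out.println(readCounts(2000));  // [1022, 978]
        System.out.println(readCounts(3000));  // [1022, 1024, 954]
    }
}
```

Each read stops exactly at a block boundary, which matches the `Math.min(len, end - pos)` line in the quoted read method.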

4. Fix

Replace the single read(byte[]) call with a loop:

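byte[] data = new byte[len];
int readLen = 0;
int n;
// read() may return fewer bytes than asked for, so keep reading until data is full
while (readLen < len && (n = in.read(data, readLen, len - readLen)) > 0) {
    readLen += n;
}
if (readLen < len) throw new java.io.EOFException("expected " + len + " bytes, got " + readLen);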
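An alternative to the hand-written loop needs nothing beyond the standard interfaces: ObjectInput extends DataInput, so readFully(byte[]) is available and keeps reading until the array is full, throwing EOFException if the data runs out. The sketch below rewrites Test that way. The FixedRoundTrip wrapper and the length check in writeExternal are additions for illustration; the check exists because writeShort keeps only the low 16 bits, so a string longer than 32767 UTF-8 bytes would otherwise be read back with a negative length and become null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

public class FixedRoundTrip {

    public static class Test implements Externalizable {
        private String s;

        public Test() {}
        public Test(String s) { this.s = s; }
        public String get() { return s; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            if (s == null) {
                out.writeShort(-1);
                return;
            }
            byte[] bb = s.getBytes(StandardCharsets.UTF_8);
            // refuse lengths that a signed 2-byte prefix cannot carry
            if (bb.length > Short.MAX_VALUE) {
                throw new IOException("too long for a 2-byte length prefix: " + bb.length);
            }
            out.writeShort(bb.length);
            out.write(bb);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int len = in.readShort();
            if (len < 0) {
                s = null;
                return;
            }
            byte[] bb = new byte[len];
            in.readFully(bb);   // keeps reading across block boundaries; EOFException if data runs out
            s = new String(bb, StandardCharsets.UTF_8);
        }
    }

    public static String roundTrip(String value) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(new Test(value));
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return ((Test) ois.readObject()).get();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("\u4e2d\u6587");          // two CJK characters, 3 UTF-8 bytes each
        }
        String text = sb.toString();            // 6000 UTF-8 bytes, spread over several blocks
        System.out.println(roundTrip(text).equals(text));  // true
    }
}
```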

5. Why block data exists

The Java Object Serialization Specification describes block data as follows (http://docs.oracle.com/javase/7/docs/platform/serialization/spec/protocol.html):

6.3 Stream Protocol Versions

It was necessary to make a change to the serialization stream format in JDK 1.2 that is not backwards compatible to all minor releases of JDK 1.1. To provide for cases where backwards compatibility is required, a capability has been added to indicate what PROTOCOL_VERSION to use when writing a serialization stream. The method ObjectOutputStream.useProtocolVersion takes as a parameter the protocol version to use to write the serialization stream. The Stream Protocol Versions are as follows:

ObjectStreamConstants.PROTOCOL_VERSION_1: Indicates the initial stream format.

ObjectStreamConstants.PROTOCOL_VERSION_2: Indicates the new external data format. Primitive data is written in block data mode and is terminated with TC_ENDBLOCKDATA. Block data boundaries have been standardized. Primitive data written in block data mode is normalized to not exceed 1024 byte chunks. The benefit of this change was to tighten the specification of serialized data format within the stream. This change is fully backward and forward compatible.
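To see the difference the spec describes, one can write the same Externalizable under both protocol versions with ObjectOutputStream.useProtocolVersion and compare stream sizes. Payload and ProtocolCompare are hypothetical names. Assuming the class descriptor is identical apart from its flags byte, the only extra bytes under PROTOCOL_VERSION_2 should be the block headers plus the closing TC_ENDBLOCKDATA.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

public class ProtocolCompare {

    // Hypothetical payload: writeExternal writes n zero bytes.
    public static class Payload implements Externalizable {
        private int n;

        public Payload() {}
        public Payload(int n) { this.n = n; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.write(new byte[n]);
        }

        @Override
        public void readExternal(ObjectInput in) {
            // not used: only the written size matters here
        }
    }

    // Size of the stream produced for Payload(n) under the given protocol version.
    public static int size(int protocol, int n) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.useProtocolVersion(protocol);   // must happen before the first writeObject
                oos.writeObject(new Payload(n));
            }
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int v1 = size(ObjectStreamConstants.PROTOCOL_VERSION_1, 2500);
        int v2 = size(ObjectStreamConstants.PROTOCOL_VERSION_2, 2500);
        // version 2 adds three 5-byte block headers (1024 + 1024 + 452) and one TC_ENDBLOCKDATA
        System.out.println(v2 - v1);   // 16
    }
}
```

Switching to PROTOCOL_VERSION_1 does not look like a substitute for the read loop: judging by the copy branch of BlockDataInputStream.read quoted earlier, a non-block read through ObjectInputStream still returns at most MAX_BLOCK_SIZE (1024) bytes per call.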
