ZooKeeper Source Code Analysis Series, Part 1: Serialization and Deserialization

Author: 我是小水杯

Category: zookeeper. Tags: zookeeper, serialization, jute, source code

Article link: https://blog.csdn.net/missv5/article/details/86654131

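1. Overview of serialization and deserialization

Serialization: the process of converting an object's state into a form that can be stored or transmitted.
Deserialization: the process of restoring a byte sequence, an XML encoding, or a similar format into an object equivalent to the original.

The main use cases for serialization:
(1) Storing the serialized object on some storage medium
(2) Sending it over the network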

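2. Java serialization and deserialization

2.1 Basic concepts

To use Java's built-in serialization, a class must implement either the Serializable or the Externalizable interface. Objects are then written and read with ObjectOutputStream and ObjectInputStream.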
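
The Serializable route is shown in the next section. As a complement, here is a minimal sketch of the Externalizable route, where the class writes and reads its own fields. The Ballot and ExternalizableDemo names are made up for illustration, and the round trip goes through an in-memory byte array instead of a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class for illustration; not part of ZooKeeper.
class Ballot implements Externalizable {
    long id;
    int version;

    // Externalizable requires a public no-arg constructor for deserialization.
    public Ballot() {}

    Ballot(long id, int version) {
        this.id = id;
        this.version = version;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // The class decides exactly which fields are written, and in what order.
        out.writeLong(id);
        out.writeInt(version);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Fields must be read back in the same order they were written.
        id = in.readLong();
        version = in.readInt();
    }
}

class ExternalizableDemo {
    // Serialize to a byte array, then deserialize a fresh copy from it.
    static Ballot roundTrip(Ballot b) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(b);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Ballot) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Ballot copy = roundTrip(new Ballot(1L, 10));
        System.out.println(copy.id + " " + copy.version); // prints "1 10"
    }
}
```

Writing fields by hand in a fixed order is also, roughly, how jute's Record interface works, as later sections show.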

2.2 Example

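// The object to serialize (Vote.java)
import java.io.Serializable;
import lombok.Data;

@Data // Lombok annotation: generates getters, setters, and toString
public class Vote implements Serializable {
    private static final long serialVersionUID = 1L;
    private Long id;
    private Integer version;
}

// Serialization test (TestSer.java)
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class TestSer {
    public static final String FILE_PATH = "/tmp/vote";

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        serialize();
        Vote vote = deserialize();
        System.out.println(vote);
    }

    public static void serialize() throws IOException {
        // try-with-resources closes the stream even if writeObject throws
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(FILE_PATH))) {
            Vote vote = new Vote();
            vote.setId(1L);
            vote.setVersion(10);
            oos.writeObject(vote);
        }
    }

    public static Vote deserialize() throws IOException, ClassNotFoundException {
        // The original version never closed this stream
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(FILE_PATH))) {
            return (Vote) ois.readObject();
        }
    }
}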

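2.3 Drawbacks

As we can see, a few short lines of code are enough to implement serialization. Even so, ZooKeeper does not use Java's built-in serialization, mainly because it has the following drawbacks:

Not cross-language: only Java can read and write the format. Data sent over the network may well be processed by programs written in other languages, so this restriction makes multi-language cooperation much harder.
Large serialized size: disk I/O and network I/O easily become performance bottlenecks, and an oversized byte stream degrades performance.
Poor performance: Java's built-in serialization is comparatively slow to encode and decode, in part because it relies on reflection and writes class metadata into the stream.

Given these limitations, ZooKeeper uses jute for serialization and deserialization. Better serialization components than jute now exist, such as Avro and Protobuf, but switching would raise difficult compatibility problems between old and new versions, and jute has not yet become a performance bottleneck for ZooKeeper.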
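
The size drawback is easy to check with a small experiment. The sketch below serializes the same two fields twice: once with ObjectOutputStream, and once by writing the raw field values with DataOutputStream, which is roughly what a binary archive does. PlainVote and SizeComparison are hypothetical names, and PlainVote uses primitive fields rather than the boxed fields of Vote to keep the comparison simple.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SizeComparison {
    // Hypothetical stand-in for the Vote class above, without Lombok.
    static class PlainVote implements Serializable {
        private static final long serialVersionUID = 1L;
        long id = 1L;
        int version = 10;
    }

    // Size of the stream produced by Java's built-in serialization.
    static int javaSerializedSize(PlainVote v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(v);
        }
        return bytes.size();
    }

    // Size when only the field values are written: 8 bytes (long) + 4 bytes (int).
    static int rawFieldSize(PlainVote v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(v.id);
            out.writeInt(v.version);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        PlainVote v = new PlainVote();
        System.out.println("ObjectOutputStream bytes: " + javaSerializedSize(v));
        System.out.println("Raw field bytes:          " + rawFieldSize(v));
    }
}
```

The raw encoding is always 12 bytes here. The ObjectOutputStream result is several times larger because it also carries a stream header, the class name, and field descriptors.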

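3. Introduction to jute

3.1 Core classes

InputArchive: the interface that all deserializers implement
OutputArchive: the interface that all serializers implement
Index: the iterator a deserializer uses to walk through the elements of a vector or map
Record: the interface implemented by every type that is sent over the network or stored locally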

3.2 InputArchive

3.2.1 Interface overview

// Interface for reading values of each type
public interface InputArchive {
    public byte readByte(String tag) throws IOException;
    public boolean readBool(String tag) throws IOException;
    public int readInt(String tag) throws IOException;
    // ......
}

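3.2.2 Subclass structure

Core subclasses of InputArchive:

BinaryInputArchive: wraps a DataInput and reads values from a binary stream
CsvInputArchive: reads data from a CSV-format stream
XmlInputArchive: reads data from an XML-format stream

Because ZooKeeper mainly uses BinaryInputArchive, only that class's source code is covered here.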

public class BinaryInputArchive implements InputArchive {
    static public final String UNREASONBLE_LENGTH= "Unreasonable length = ";
    // DataInput reads bytes from a binary stream and reconstructs them as Java primitive values;
    // it also provides a method for reading a string in modified UTF-8 format
    private DataInput in;
    
    // Wrap an InputStream in an instance of this class
    static public BinaryInputArchive getArchive(InputStream strm) {
        return new BinaryInputArchive(new DataInputStream(strm));
    }
    // Iterator class
    static private class BinaryIndex implements Index {
        // Number of elements remaining
        private int nelems;
        BinaryIndex(int nelems) {
            this.nelems = nelems;
        }
        public boolean done() {
            return (nelems <= 0);
        }
        public void incr() {
            nelems--;
        }
    }
    /** Creates a new instance of BinaryInputArchive */
    public BinaryInputArchive(DataInput in) {
        this.in = in;
    }
    
    /*
     Read the primitive data types
    */
    public byte readByte(String tag) throws IOException {
        return in.readByte();
    }
    
    public boolean readBool(String tag) throws IOException {
        return in.readBoolean();
    }
    
    public int readInt(String tag) throws IOException {
        return in.readInt();
    }
    
    public long readLong(String tag) throws IOException {
        return in.readLong();
    }
    
    public float readFloat(String tag) throws IOException {
        return in.readFloat();
    }
    
    public double readDouble(String tag) throws IOException {
        return in.readDouble();
    }
    
    /*
     Read a string
    */
    public String readString(String tag) throws IOException {
        // Read the length prefix
        int len = in.readInt();
        if (len == -1) return null;
        // Sanity-check the length
        checkLength(len);
        byte b[] = new byte[len];
        // Read exactly len bytes from the input into buffer b
        in.readFully(b);
        return new String(b, "UTF8");
    }
    
    static public final int maxBuffer = Integer.getInteger("jute.maxbuffer", 0xfffff);

    /*
     Reading a buffer works the same way as reading a string
    */
    public byte[] readBuffer(String tag) throws IOException {
        int len = readInt(tag);
        if (len == -1) return null;
        checkLength(len);
        byte[] arr = new byte[len];
        in.readFully(arr);
        return arr;
    }
    /**
     * Read a record
     */
    public void readRecord(Record r, String tag) throws IOException {
        r.deserialize(this, tag);
    }
    
    public void startRecord(String tag) throws IOException {}
    
    public void endRecord(String tag) throws IOException {}
    
    /**
     * Create the iterator
     */
    public Index startVector(String tag) throws IOException {
        int len = readInt(tag);
        if (len == -1) {
            return null;
        }
        return new BinaryIndex(len);
    }
    
    public void endVector(String tag) throws IOException {}
    
    public Index startMap(String tag) throws IOException {
        return new BinaryIndex(readInt(tag));
    }
    
    public void endMap(String tag) throws IOException {}

    /**
     * Only a rough sanity check
     */
    private void checkLength(int len) throws IOException {
        if (len < 0 || len > maxBuffer + 1024) {
            throw new IOException(UNREASONBLE_LENGTH + len);
        }
    }
}
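
The string format read by readString is a 4-byte length followed by that many UTF-8 bytes, with a length of -1 standing for null. The sketch below reproduces that format using only java.io, so it runs without the ZooKeeper jar. LengthPrefixedString and MAX_LEN are hypothetical names, and the length check is simplified compared with checkLength above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixedString {
    // Assumed limit for this sketch; the listing above derives its limit from jute.maxbuffer.
    static final int MAX_LEN = 0xfffff;

    // Write the length first, then the bytes; -1 marks a null string.
    static void write(DataOutput out, String s) throws IOException {
        if (s == null) {
            out.writeInt(-1);
            return;
        }
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(b.length);
        out.write(b);
    }

    // Mirror of write(): read the length, validate it, then read exactly that many bytes.
    static String read(DataInput in) throws IOException {
        int len = in.readInt();
        if (len == -1) {
            return null;
        }
        if (len < 0 || len > MAX_LEN) {
            throw new IOException("Unreasonable length = " + len);
        }
        byte[] b = new byte[len];
        in.readFully(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    static String roundTrip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        write(new DataOutputStream(bytes), s);
        return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("zookeeper")); // prints "zookeeper"
        System.out.println(roundTrip(null));        // prints "null"
    }
}
```

Validating the length before allocating the array is what protects the reader from a corrupt or hostile stream that claims a huge string.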

3.3 OutputArchive

3.3.1 Interface overview

// Interface for writing values of each type
public interface OutputArchive {
    public void writeByte(byte b, String tag) throws IOException;
    public void writeBool(boolean b, String tag) throws IOException;
    public void writeInt(int i, String tag) throws IOException;
    // ......
}

3.3.2 Subclass structure

[Figure: OutputArchive subclass structure]
As above, only BinaryOutputArchive is covered here:

public class BinaryOutputArchive implements OutputArchive {
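    // Working buffer
    private ByteBuffer bb = ByteBuffer.allocate(1024);
    // Wrapped DataOutput. The DataOutput interface converts any Java primitive value to a series of bytes
    // and writes them to a binary stream; it also provides a facility for converting a String to
    // modified UTF-8 and writing the resulting bytes.
    private DataOutput out;
    
    // Wrap an OutputStream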
    public static BinaryOutputArchive getArchive(OutputStream strm) {
        return new BinaryOutputArchive(new DataOutputStream(strm));
    }
    
    /** Creates a new instance of BinaryOutputArchive */
    public BinaryOutputArchive(DataOutput out) {
        this.out = out;
    }
    
    /*
      Write the primitive data types
    */
    public void writeByte(byte b, String tag) throws IOException {
        out.writeByte(b);
    }
    
    public void writeBool(boolean b, String tag) throws IOException {
        out.writeBoolean(b);
    }
    
    public void writeInt(int i, String tag) throws IOException {
        out.writeInt(i);
    }
    
    public void writeLong(long l, String tag) throws IOException {
        out.writeLong(l);
    }
    
    public void writeFloat(float f, String tag) throws IOException {
        out.writeFloat(f);
    }
    
    public void writeDouble(double d, String tag) throws IOException {
        out.writeDouble(d);
    }
    
    /**
     * create our own char encoder to utf8. This is faster 
     * than string.getbytes(UTF8).
     * 
     * Converts a string to a ByteBuffer
     * @param s the string to encode into utf8
     * @return utf8 byte sequence.
     */
    final private ByteBuffer stringToByteBuffer(CharSequence s) {
        bb.clear();
        final int len = s.length();
        for (int i = 0; i < len; i++) {
            if (bb.remaining() < 3) { // fewer than 3 bytes of space left
                // double the capacity
                ByteBuffer n = ByteBuffer.allocate(bb.capacity() << 1);
                bb.flip();
                n.put(bb);
                // make the new buffer the working buffer
                bb = n;
            }
            char c = s.charAt(i);
            if (c < 0x80) { // 0x00-0x7F
                // 1 byte
                bb.put((byte) c);
            } else if (c < 0x800) { // 0x80-0x7FF
                // 2 bytes
                bb.put((byte) (0xc0 | (c >> 6)));
                bb.put((byte) (0x80 | (c & 0x3f)));
            } else { // 0x800 and above
                // 3 bytes
                bb.put((byte) (0xe0 | (c >> 12)));
                bb.put((byte) (0x80 | ((c >> 6) & 0x3f)));
                bb.put((byte) (0x80 | (c & 0x3f)));
            }
        }
        bb.flip();
        return bb;
    }

    /**
     * Write a string
     */
    public void writeString(String s, String tag) throws IOException {
        // A length of -1 denotes null
        if (s == null) {
            writeInt(-1, "len");
            return;
        }
        ByteBuffer bb = stringToByteBuffer(s);
        // Write the length
        writeInt(bb.remaining(), "len");
        // Write the data
        out.write(bb.array(), bb.position(), bb.limit());
    }

    /**
     * Write a byte buffer
     */
    public void writeBuffer(byte barr[], String tag)
    throws IOException {
        if (barr == null) {
            out.writeInt(-1);
            return;
        }
        out.writeInt(barr.length);
        out.write(barr);
    }
    
    public void writeRecord(Record r, String tag) throws IOException {
        r.serialize(this, tag);
    }
    
    public void startRecord(Record r, String tag) throws IOException {}
    
    public void endRecord(Record r, String tag) throws IOException {}
    
    public void startVector(List v, String tag) throws IOException {
        if (v == null) {
            writeInt(-1, tag);
            return;
        }
        writeInt(v.size(), tag);
    }
    
    public void endVector(List v, String tag) throws IOException {}
    
    public void startMap(TreeMap v, String tag) throws IOException {
        writeInt(v.size(), tag);
    }
    
    public void endMap(TreeMap v, String tag) throws IOException {}
    
}
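
To see what the hand-written encoder in stringToByteBuffer produces, the sketch below copies its three branches into a stand-alone method and compares the result with String.getBytes. Utf8Sketch is a hypothetical name, and the buffer is sized up front (3 bytes per char) instead of being grown on demand as in the listing above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Sketch {
    // Same 1-, 2-, and 3-byte branches as stringToByteBuffer, applied char by char.
    static byte[] encode(CharSequence s) {
        ByteBuffer bb = ByteBuffer.allocate(s.length() * 3); // worst case: 3 bytes per char
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bb.put((byte) c);
            } else if (c < 0x800) {
                bb.put((byte) (0xc0 | (c >> 6)));
                bb.put((byte) (0x80 | (c & 0x3f)));
            } else {
                bb.put((byte) (0xe0 | (c >> 12)));
                bb.put((byte) (0x80 | ((c >> 6) & 0x3f)));
                bb.put((byte) (0x80 | (c & 0x3f)));
            }
        }
        bb.flip();
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Contains 1-byte ('z'), 2-byte (U+00E9), and 3-byte (U+4E2D) characters.
        String bmp = "zk-\u00e9-\u4e2d";
        System.out.println(Arrays.equals(encode(bmp), bmp.getBytes(StandardCharsets.UTF_8))); // true

        // One supplementary code point is stored as two chars (a surrogate pair).
        String supplementary = "\uD83D\uDE00";
        System.out.println(encode(supplementary).length
                + " vs " + supplementary.getBytes(StandardCharsets.UTF_8).length); // 6 vs 4
    }
}
```

For characters in the Basic Multilingual Plane the output matches standard UTF-8. Because the loop encodes each char on its own, a surrogate pair comes out as two 3-byte sequences rather than the single 4-byte sequence standard UTF-8 uses.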

3.4 Index

3.4.1 Interface overview

public interface Index {
    // Whether iteration has finished
    public boolean done();
    // Advance to the next item
    public void incr();
}

BinaryIndex was covered above, so it is not repeated here.
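
How the countdown index drives a read loop can be sketched without the jute classes. In the code below, Index is a local stand-in with the same two methods, and VectorSketch, CountdownIndex, writeInts, and readInts are hypothetical names. The writer stores the element count first, as startVector does, and the reader counts down from it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VectorSketch {
    // Local stand-in with the same two methods as jute's Index; for illustration only.
    interface Index {
        boolean done();
        void incr();
    }

    // Counts down from the number of elements, like BinaryIndex.
    static class CountdownIndex implements Index {
        private int remaining;
        CountdownIndex(int n) { remaining = n; }
        public boolean done() { return remaining <= 0; }
        public void incr() { remaining--; }
    }

    // Write the element count, then each element.
    static byte[] writeInts(List<Integer> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.size());
        for (int v : values) {
            out.writeInt(v);
        }
        return bytes.toByteArray();
    }

    // Read the count, then loop until the index reports that it is done.
    static List<Integer> readInts(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<Integer> result = new ArrayList<>();
        for (Index idx = new CountdownIndex(in.readInt()); !idx.done(); idx.incr()) {
            result.add(in.readInt());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readInts(writeInts(Arrays.asList(3, 1, 2)))); // prints [3, 1, 2]
    }
}
```

The stream itself has no end-of-vector marker, so the count written up front is the only thing telling the reader when to stop.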

3.5 Record

public interface Record {
    // Serialize
    public void serialize(OutputArchive archive, String tag)
        throws IOException;
    // Deserialize
    public void deserialize(InputArchive archive, String tag)
        throws IOException;
}

4. Example

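import java.io.IOException;
import lombok.Data;
import org.apache.jute.InputArchive;
import org.apache.jute.OutputArchive;
import org.apache.jute.Record;

@Data
public class Vote2 implements Record {
    private Long id;
    private Integer version;

    @Override
    public void serialize(OutputArchive archive, String tag) throws IOException {
        archive.startRecord(this, tag);
        archive.writeLong(id, "id");
        archive.writeInt(version, "version");
        archive.endRecord(this, tag);
    }

    @Override
    public void deserialize(InputArchive archive, String tag) throws IOException {
        archive.startRecord(tag);
        id = archive.readLong("id");
        version = archive.readInt("version");
        // Close the record to mirror serialize(); a no-op for the binary archive
        archive.endRecord(tag);
    }
}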


// Test class
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.TreeMap;
import org.apache.jute.BinaryInputArchive;
import org.apache.jute.BinaryOutputArchive;
import org.apache.jute.Index;

public class TestSer2 {

    public static final String FILE_PATH = "/tmp/vote2";

    public static void main(String[] args) throws IOException {
        // Serialization; try-with-resources closes the file before it is read back
        try (OutputStream outputStream = new FileOutputStream(FILE_PATH)) {
            BinaryOutputArchive binaryOutputArchive = BinaryOutputArchive.getArchive(outputStream);
            // Serialize the record object
            Vote2 vote2 = new Vote2();
            vote2.setId(30L);
            vote2.setVersion(12);
            binaryOutputArchive.writeRecord(vote2, "vote2");
            // Serialize a map
            TreeMap<String, Integer> map = new TreeMap<String, Integer>();
            map.put("age1", 25);
            map.put("age2", 25);
            // startMap writes the number of entries
            binaryOutputArchive.startMap(map, "map");
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                // Write each key and value in turn
                binaryOutputArchive.writeString(entry.getKey(), "map");
                binaryOutputArchive.writeInt(entry.getValue(), "map");
            }
            binaryOutputArchive.endMap(map, "map");
        }

        // Deserialization
        try (InputStream inputStream = new FileInputStream(FILE_PATH)) {
            BinaryInputArchive binaryInputArchive = BinaryInputArchive.getArchive(inputStream);

            Vote2 v2 = new Vote2();
            binaryInputArchive.readRecord(v2, "vote2");
            System.out.println(v2);

            /**
             * The tags below deliberately differ from the ones used when writing.
             * Reading does not depend on the tag, only on the order in which values
             * were written. The tag is just a label, for example for identifying errors.
             */
            Index index = binaryInputArchive.startMap("map2");
            while (!index.done()) {
                System.out.println("key = " + binaryInputArchive.readString("map1")
                        + ", value = " + binaryInputArchive.readInt("map1"));
                index.incr();
            }
            binaryInputArchive.endMap("map2");
        }
    }
}

// Output
Vote2(id=30, version=12)
key = age1, value = 25
key = age2, value = 25
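
The order dependence noted in the test class comment can be demonstrated with plain java.io streams, which is what the binary archives delegate to. In this sketch, OrderMatters and its methods are hypothetical names. The same bytes are read once in the order they were written and once with the two reads swapped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class OrderMatters {
    // Write a long followed by an int, the same order Vote2.serialize uses.
    static byte[] write(long id, int version) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(id);
        out.writeInt(version);
        return bytes.toByteArray();
    }

    // Read in the same order as written: the original values come back.
    static long[] readSameOrder(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        long id = in.readLong();
        int version = in.readInt();
        return new long[] {id, version};
    }

    // Read int first, then long: no error is raised, but the values are wrong.
    static long[] readSwapped(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int first = in.readInt();
        long second = in.readLong();
        return new long[] {first, second};
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(30L, 12);
        System.out.println(Arrays.toString(readSameOrder(data))); // [30, 12]
        System.out.println(Arrays.toString(readSwapped(data)));   // [0, 128849018892]
    }
}
```

The binary format carries no field names or type markers, so a reader that gets the order wrong silently reinterprets the bytes. That is why serialize and deserialize in a Record must list the fields in the same order.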

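5. Data description language

As the Vote2 class above shows, once the fields of a Record implementation are defined, the serialization and deserialization code follows mechanically. With only a handful of classes to serialize, writing it by hand is not much trouble. With dozens or hundreds of classes it becomes a real burden. A data description language solves this: a small amount of declaration achieves the same result and also guards against careless mistakes.

The zookeeper.jute file can be found under zookeeper/src. Its contents look roughly like this: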

// module specifies the package name
module org.apache.zookeeper.data { 
// class specifies the class name
    class Id {
    // field definitions
        ustring scheme;
        ustring id;
    }
    class ACL {
        int perms;
        Id id;
    }

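To turn a file written in this data description language into source files for a particular language, one more step is needed.
[Figure: core classes of the jute data description language]
Run the Rcc class from that figure to generate the class files you need:

// Java class files are generated by default; to generate other languages, see the Rcc source
Rcc.main(new String[]{"zookeeper.jute"});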

References:
http://www.cnblogs.com/leesf456/p/6278853.html