ZooKeeper Storage: File Format Analysis

 
 

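ZooKeeper persists two kinds of files: snapshots and logs. A snapshot is a dump of the in-memory tree. The log is similar to MySQL's binlog: it records every operation that modifies data.

The directories for both kinds of files can be set in the configuration file (dataDir and dataLogDir).

The scenarios below walk through the storage format of each file type.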

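Snapshot file format

See ZooKeeperServer.takeSnapshot for details.

One simple scenario is enough to illustrate the format.

Scenario: ZooKeeper has just been installed, and starting the service produces a snapshot file.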

00000000 5a 4b 53 4e 00 00 00 02 ff ff ff ff ff ff ff ff |ZKSN............|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000060 00 00 00 00 00 00 00 0a 2f 7a 6f 6f 6b 65 65 70 |......../zookeep|
00000070 65 72 00 00 00 00 ff ff ff ff ff ff ff ff 00 00 |er..............|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 10 2f 7a |............../z|
000000c0 6f 6f 6b 65 65 70 65 72 2f 71 75 6f 74 61 00 00 |ookeeper/quota..|
000000d0 00 00 ff ff ff ff ff ff ff ff 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000110 00 00 00 00 00 00 00 00 00 01 2f 00 00 00 00 ab |........../.....|
00000120 10 2b d2 00 00 00 01 2f |.+...../|
00000128


The file has four parts.

a) header

  • magic: 4 bytes, the int value of "ZKSN", 0x5a4b534e [offsets 0x00000000-0x00000003]

  • version: 4 bytes, 2 by default, 0x00000002 [offsets 0x00000004-0x00000007]

  • dbid: 8 bytes, -1 by default, 0xffffffffffffffff [offsets 0x00000008-0x0000000f]
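The header layout above can be decoded with a plain ByteBuffer. This is a minimal sketch, not ZooKeeper's own reader: the class name SnapshotHeaderSketch and its methods are made up for illustration, and the input bytes are copied from offsets 0x00-0x0f of the dump.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative helper (not part of ZooKeeper): decodes the 16-byte file header. */
final class SnapshotHeaderSketch {
    final int magic;
    final int version;
    final long dbid;

    private SnapshotHeaderSketch(int magic, int version, long dbid) {
        this.magic = magic;
        this.version = version;
        this.dbid = dbid;
    }

    /** Reads magic, version and dbid; ByteBuffer is big-endian by default, matching the dump. */
    static SnapshotHeaderSketch parse(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file, 0, 16);
        return new SnapshotHeaderSketch(buf.getInt(), buf.getInt(), buf.getLong());
    }

    /** Turns the magic int back into its four ASCII characters. */
    String magicText() {
        byte[] raw = ByteBuffer.allocate(4).putInt(magic).array();
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] header = {
            0x5a, 0x4b, 0x53, 0x4e,            // "ZKSN"
            0, 0, 0, 2,                        // version
            -1, -1, -1, -1, -1, -1, -1, -1     // dbid = -1
        };
        SnapshotHeaderSketch h = parse(header);
        // prints: ZKSN version=2 dbid=-1
        System.out.println(h.magicText() + " version=" + h.version + " dbid=" + h.dbid);
    }
}
```

The same three reads apply to the log file header, where only the magic ("ZKLG") and the dbid value differ.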

b) data

  • count: number of sessions, 4 bytes, 0 here, so 0x00000000 [offsets 0x00000010-0x00000013]. When it is nonzero, each session's id and timeout follow.

  • In-memory tree:

    • map: number of ACL map entries, 4 bytes, 0 here, so 0x00000000 [offsets 0x00000014-0x00000017]

    • Nodes are then written recursively

      • The first node has the path "", i.e. the root node

        • path

          • len: 4 bytes, 0 here, so 0x00000000 [offsets 0x00000018-0x0000001b]

        • node

          • data

            • len: 4 bytes, 0 here, so 0x00000000 [offsets 0x0000001c-0x0000001f]

          • acl: 8 bytes, -1 here, so 0xffffffffffffffff [offsets 0x00000020-0x00000027]

          • statpersisted: the persisted stat

            • czxid: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x00000028-0x0000002f]

            • mzxid: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x00000030-0x00000037]

            • ctime: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x00000038-0x0000003f]

            • mtime: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x00000040-0x00000047]

            • version: 4 bytes, 0 here, so 0x00000000 [offsets 0x00000048-0x0000004b]

            • cversion: 4 bytes, 0 here, so 0x00000000 [offsets 0x0000004c-0x0000004f]

            • aversion: 4 bytes, 0 here, so 0x00000000 [offsets 0x00000050-0x00000053]

            • ephemeralOwner: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x00000054-0x0000005b]

            • pzxid: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x0000005c-0x00000063]

      • The second node is serialized next: the root's child, /zookeeper

        • path

          • len: 4 bytes, the length of "/zookeeper", 10, so 0x0000000a [offsets 0x00000064-0x00000067]

          • content: 10 bytes, the ASCII encoding of "/zookeeper", 0x2f7a6f6f6b6565706572 [offsets 0x00000068-0x00000071]

        • node: same field values as the root node

          • data

            • len: 4 bytes, 0 here, so 0x00000000 [offsets 0x00000072-0x00000075]

          • acl: 8 bytes, -1 here, so 0xffffffffffffffff [offsets 0x00000076-0x0000007d]

          • statpersisted: the persisted stat

            • czxid: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x0000007e-0x00000085]

            • mzxid: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x00000086-0x0000008d]

            • ctime: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x0000008e-0x00000095]

            • mtime: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x00000096-0x0000009d]

            • version: 4 bytes, 0 here, so 0x00000000 [offsets 0x0000009e-0x000000a1]

            • cversion: 4 bytes, 0 here, so 0x00000000 [offsets 0x000000a2-0x000000a5]

            • aversion: 4 bytes, 0 here, so 0x00000000 [offsets 0x000000a6-0x000000a9]

            • ephemeralOwner: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x000000aa-0x000000b1]

            • pzxid: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x000000b2-0x000000b9]

      • The third node is serialized next: /zookeeper/quota, the child of /zookeeper

        • path

          • len: 4 bytes, the length of "/zookeeper/quota", 16, so 0x00000010 [offsets 0x000000ba-0x000000bd]

          • content: 16 bytes, the ASCII encoding of "/zookeeper/quota", 0x2f7a6f6f6b65657065722f71756f7461 [offsets 0x000000be-0x000000cd]

        • node: same field values as the root node

          • data

            • len: 4 bytes, 0 here, so 0x00000000 [offsets 0x000000ce-0x000000d1]

          • acl: 8 bytes, -1 here, so 0xffffffffffffffff [offsets 0x000000d2-0x000000d9]

          • statpersisted: the persisted stat

            • czxid: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x000000da-0x000000e1]

            • mzxid: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x000000e2-0x000000e9]

            • ctime: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x000000ea-0x000000f1]

            • mtime: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x000000f2-0x000000f9]

            • version: 4 bytes, 0 here, so 0x00000000 [offsets 0x000000fa-0x000000fd]

            • cversion: 4 bytes, 0 here, so 0x00000000 [offsets 0x000000fe-0x00000101]

            • aversion: 4 bytes, 0 here, so 0x00000000 [offsets 0x00000102-0x00000105]

            • ephemeralOwner: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x00000106-0x0000010d]

            • pzxid: 8 bytes, 0 here, so 0x0000000000000000 [offsets 0x0000010e-0x00000115]

    • The tree ends with the marker "/"

      • 5 bytes in total: the first 4 hold the length 1, followed by the ASCII code of "/", 0x2f, giving 0x000000012f [offsets 0x00000116-0x0000011a]
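The record layout above (length-prefixed path, length-prefixed data, an 8-byte acl value, a 60-byte stat block, and a closing "/" marker) can be walked with a few ByteBuffer reads. The sketch below is illustrative only: SnapshotTreeSketch and its methods are hypothetical names, it rebuilds offsets 0x10-0x11a from the field values listed above, and it assumes the session and ACL tables are empty, as they are in this scenario.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative walker for the data section; not ZooKeeper's own deserializer. */
final class SnapshotTreeSketch {
    /** StatPersisted: four longs, three ints, two longs = 60 bytes. */
    static final int STAT_BYTES = 8 * 4 + 4 * 3 + 8 * 2;

    /** Writes one node as in the dump: path, empty data, acl = -1, zeroed stat. */
    static void putNode(ByteBuffer out, String path) {
        byte[] p = path.getBytes(StandardCharsets.UTF_8);
        out.putInt(p.length).put(p);
        out.putInt(0).putLong(-1L);
        out.put(new byte[STAT_BYTES]);
    }

    /** Rebuilds offsets 0x10-0x11a of the fresh snapshot. */
    static byte[] sampleBody() {
        ByteBuffer out = ByteBuffer.allocate(512);
        out.putInt(0).putInt(0);             // session count, ACL map count
        putNode(out, "");
        putNode(out, "/zookeeper");
        putNode(out, "/zookeeper/quota");
        out.putInt(1).put((byte) '/');       // end-of-tree marker
        return Arrays.copyOf(out.array(), out.position());
    }

    /** Lists node paths; this sketch only handles empty session and ACL tables. */
    static List<String> readPaths(byte[] body) {
        ByteBuffer in = ByteBuffer.wrap(body);
        if (in.getInt() != 0 || in.getInt() != 0) {
            throw new IllegalArgumentException("sessions or ACL entries present");
        }
        List<String> paths = new ArrayList<>();
        while (true) {
            byte[] p = new byte[in.getInt()];
            in.get(p);
            String path = new String(p, StandardCharsets.UTF_8);
            if (path.equals("/")) {
                return paths;                // marker reached
            }
            int dataLen = in.getInt();
            // skip data, acl (8 bytes) and the stat block
            in.position(in.position() + Math.max(dataLen, 0) + 8 + STAT_BYTES);
            paths.add(path);
        }
    }

    public static void main(String[] args) {
        System.out.println(readPaths(sampleBody())); // [, /zookeeper, /zookeeper/quota]
    }
}
```

The "/" marker is unambiguous here because the root itself is stored under the empty path, so no real node record starts with the string "/".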

c) checksum

An Adler32 checksum computed over all the preceding bytes, stored as an 8-byte long:

0x00000000ab102bd2 [offsets 0x0000011b-0x00000122]
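The stored value can be reproduced with java.util.zip.Adler32. The sketch below assumes the checksum covers every byte before it (offsets 0x00-0x11a); SnapshotChecksumSketch is a hypothetical name, and the input is rebuilt from the field values in the walkthrough rather than read from a real file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

/** Illustrative check of the snapshot checksum; not ZooKeeper code. */
final class SnapshotChecksumSketch {
    /** Rebuilds bytes 0x00-0x11a of the fresh snapshot from the values in the walkthrough. */
    static byte[] checkedBytes() {
        ByteBuffer b = ByteBuffer.allocate(0x11b);
        b.putInt(0x5a4b534e).putInt(2).putLong(-1L);   // header: magic, version, dbid
        b.putInt(0).putInt(0);                          // session count, ACL map count
        for (String path : new String[] {"", "/zookeeper", "/zookeeper/quota"}) {
            byte[] p = path.getBytes(StandardCharsets.US_ASCII);
            b.putInt(p.length).put(p);                  // path
            b.putInt(0).putLong(-1L);                   // empty data, acl = -1
            b.put(new byte[60]);                        // zeroed StatPersisted
        }
        b.putInt(1).put((byte) '/');                    // end-of-tree marker
        return b.array();
    }

    /** Adler32 over the whole array, returned as an unsigned 32-bit value in a long. */
    static long adler32(byte[] bytes) {
        Adler32 adler = new Adler32();
        adler.update(bytes, 0, bytes.length);
        return adler.getValue();
    }

    public static void main(String[] args) {
        // prints: 0xab102bd2
        System.out.println("0x" + Long.toHexString(adler32(checkedBytes())));
    }
}
```

Because Adler32 yields a 32-bit value and the file stores it as a long, the upper four bytes of the stored field are always zero.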


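d) terminator

Like the in-memory tree, the file ends with "/" as its terminator.

5 bytes in total: the first 4 hold the length 1, followed by the ASCII code of "/", 0x2f, giving 0x000000012f [offsets 0x00000123-0x00000127]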


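Log file format

See FileTxnLog.append for details.

Scenario 1) Start a client

The log file is named after the current transaction id, which is 1 here, so a file named log.1 is created.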
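As a small illustration of the naming rule, the sketch below assumes the suffix is the transaction id written in hexadecimal, which matches log.1 for zxid 1. LogNameSketch and logName are hypothetical names, not ZooKeeper's API.

```java
/** Illustrative only: builds a log file name from a zxid, assuming a hexadecimal suffix. */
final class LogNameSketch {
    static String logName(long zxid) {
        return "log." + Long.toHexString(zxid);
    }

    public static void main(String[] args) {
        System.out.println(logName(1L));   // log.1
        System.out.println(logName(26L));  // log.1a
    }
}
```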

1) fileheader

  • magic: 4 bytes, the int value of "ZKLG", 0x5a4b4c47 [offsets 0x00000000-0x00000003]

  • version: 4 bytes, 2 by default, 0x00000002 [offsets 0x00000004-0x00000007]

  • dbid: 8 bytes, 0 by default, 0x0000000000000000 [offsets 0x00000008-0x0000000f]

2) Request content

  • txnEntryCRC (checksum of the txnEntry below)

    • An 8-byte long computed with Adler32, the same algorithm used for the snapshot: 0x0000000059270806 [offsets 0x00000010-0x00000017]

  • txnEntry

    • content length: 4 bytes, 0x00000024 [offsets 0x00000018-0x0000001b]

    • hdr

      • clientId: long, 8 bytes, 0x013a694e191a0000 [offsets 0x0000001c-0x00000023]

      • cxid: int, 0 here, 4 bytes, 0x00000000 [offsets 0x00000024-0x00000027]

      • zxid: long, 1 here, 8 bytes, 0x0000000000000001 [offsets 0x00000028-0x0000002f]

      • time: long, 8 bytes, 0x0000013a694eabaf [offsets 0x00000030-0x00000037]

      • type: operation code (see org.apache.zookeeper.ZooDefs.OpCode for the table), int -10 here (createSession), 4 bytes, 0xfffffff6 [offsets 0x00000038-0x0000003b]

    • txn

      • timeOut: int 400000 here, 4 bytes, 0x00061a80 [offsets 0x0000003c-0x0000003f]

  • EOR

    • A fixed byte is written as the end-of-record marker: 0x42 [offset 0x00000040]
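The entry framing above (CRC, length, payload, EOR) can be checked mechanically. The following sketch rebuilds bytes 0x00-0x40 of log.1 from the listed values, then reads the entry back, verifying the Adler32 value against the payload and checking the 0x42 marker. TxnLogEntrySketch and its methods are hypothetical names; it assumes the CRC covers exactly the payload bytes (hdr plus txn) and not the length field.

```java
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

/** Illustrative reader for one transaction log entry; not ZooKeeper code. */
final class TxnLogEntrySketch {
    /** Bytes 0x00-0x40 of log.1: file header, then one createSession entry. */
    static byte[] sampleLog() {
        ByteBuffer b = ByteBuffer.allocate(65);
        b.putInt(0x5a4b4c47).putInt(2).putLong(0L);          // magic "ZKLG", version, dbid
        b.putLong(0x59270806L).putInt(0x24);                 // txnEntryCRC, content length
        b.putLong(0x013a694e191a0000L).putInt(0).putLong(1L) // hdr: clientId, cxid, zxid
         .putLong(0x0000013a694eabafL).putInt(-10);          // hdr: time, type
        b.putInt(400000);                                    // txn: timeOut
        b.put((byte) 0x42);                                  // EOR
        return b.array();
    }

    /** Returns the payload of the entry at offset after checking its checksum and EOR byte. */
    static byte[] readEntry(byte[] log, int offset) {
        ByteBuffer in = ByteBuffer.wrap(log);
        in.position(offset);
        long storedCrc = in.getLong();
        byte[] entry = new byte[in.getInt()];
        in.get(entry);
        Adler32 adler = new Adler32();
        adler.update(entry, 0, entry.length);
        if (adler.getValue() != storedCrc) {
            throw new IllegalStateException("checksum mismatch");
        }
        if (in.get() != 0x42) {
            throw new IllegalStateException("missing EOR byte");
        }
        return entry;
    }

    // Field positions inside the payload: clientId at 0, cxid at 8, zxid at 12, time at 20, type at 28.
    static long zxid(byte[] entry) { return ByteBuffer.wrap(entry).getLong(12); }
    static int type(byte[] entry) { return ByteBuffer.wrap(entry).getInt(28); }
    static int timeOut(byte[] entry) { return ByteBuffer.wrap(entry).getInt(32); }

    public static void main(String[] args) {
        byte[] entry = readEntry(sampleLog(), 0x10);
        // prints: len=36 zxid=1 type=-10 timeOut=400000
        System.out.println("len=" + entry.length + " zxid=" + zxid(entry)
                + " type=" + type(entry) + " timeOut=" + timeOut(entry));
    }
}
```

The 36-byte length agrees with the field sizes: 8 + 4 + 8 + 8 + 4 for hdr plus 4 for timeOut.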


The file now looks like this (everything after the EOR byte, up to offset 0x04000010, is zero padding):
00000000 5a 4b 4c 47 00 00 00 02 00 00 00 00 00 00 00 00 |ZKLG............|
00000010 00 00 00 00 59 27 08 06 00 00 00 24 01 3a 69 4e |....Y'.....$.:iN|
00000020 19 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |................|
00000030 00 00 01 3a 69 4e ab af ff ff ff f6 00 06 1a 80 |...:iN..........|
00000040 42 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |B...............|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
04000010




Scenario 2) Add a node

zk.create("/root", "mydata".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

A new txnEntry is appended to the log:

  • txnEntryCRC (checksum)

    • An 8-byte long computed with Adler32, the same algorithm used for the snapshot: 0x00000000ba260c5e [offsets 0x00000041-0x00000048]

  • txnEntry

    • content length: 4 bytes, 0x00000053 [offsets 0x00000049-0x0000004c]

    • hdr: entry header

      • clientId: long, 8 bytes, 0x013a694e191a0000 [offsets 0x0000004d-0x00000054]

      • cxid: int 2 here, 4 bytes, 0x00000002 [offsets 0x00000055-0x00000058]

      • zxid: long 2 here, 8 bytes, 0x0000000000000002 [offsets 0x00000059-0x00000060]

      • time: long, 8 bytes, 0x0000013a694f3fa5 [offsets 0x00000061-0x00000068]

      • type: int 1 here (create), 4 bytes, 0x00000001 [offsets 0x00000069-0x0000006c]

    • txn: node content

      • path: "/root" here, 9 bytes: the first 4 hold the length 5, the next 5 are the ASCII codes of "/root", 0x000000052f726f6f74 [offsets 0x0000006d-0x00000075]

      • data: the byte array of "mydata" here, 10 bytes: the first 4 hold the length 6, the next 6 are the bytes of "mydata", 0x000000066d7964617461 [offsets 0x00000076-0x0000007f]

      • acl: number of ACL entries, 4 bytes, 1 here, 0x00000001 [offsets 0x00000080-0x00000083]

        • e1: a single ACL entry

          • perms: 4 bytes, int 31 here, 0x0000001f [offsets 0x00000084-0x00000087]

          • id

            • scheme: the string "world" here, 9 bytes: the first 4 hold the length 5, the next 5 are the ASCII codes of "world", 0x00000005776f726c64 [offsets 0x00000088-0x00000090]

            • id: the string "anyone" here, 10 bytes: the first 4 hold the length 6, the next 6 are the ASCII codes of "anyone", 0x00000006616e796f6e65 [offsets 0x00000091-0x0000009a]

      • ephemeral: false here, 1 byte (1 is written for true, 0 for false), 0x00 [offset 0x0000009b]

      • parentCVersion: int 1 here, 4 bytes, 0x00000001 [offsets 0x0000009c-0x0000009f]

  • EOR

    • A fixed byte is written as the end-of-record marker: 0x42 [offset 0x000000a0]
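To make the create record concrete, the sketch below rebuilds the 83-byte payload at offsets 0x4d-0x9f from the values listed above and decodes the txn part field by field. CreateTxnSketch and its helpers are hypothetical names chosen for illustration; strings and byte arrays are both treated as a 4-byte length followed by the bytes, as the walkthrough describes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative decoder for the create entry of scenario 2; not ZooKeeper code. */
final class CreateTxnSketch {
    /** Writes a 4-byte length followed by the string's bytes. */
    static void putString(ByteBuffer out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.putInt(b.length).put(b);
    }

    /** Reads a 4-byte length followed by that many bytes, as a string. */
    static String readString(ByteBuffer in) {
        byte[] b = new byte[in.getInt()];
        in.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    /** The 83-byte payload (hdr plus txn) found at offsets 0x4d-0x9f. */
    static byte[] sampleEntry() {
        ByteBuffer b = ByteBuffer.allocate(0x53);
        b.putLong(0x013a694e191a0000L).putInt(2).putLong(2L) // hdr: clientId, cxid, zxid
         .putLong(0x0000013a694f3fa5L).putInt(1);            // hdr: time, type = create
        putString(b, "/root");                               // path
        putString(b, "mydata");                              // data
        b.putInt(1).putInt(31);                              // one ACL entry, perms = 31
        putString(b, "world");                               // scheme
        putString(b, "anyone");                              // id
        b.put((byte) 0).putInt(1);                           // ephemeral = false, parentCVersion
        return b.array();
    }

    /** Decodes the txn part that follows the 32-byte hdr. */
    static String describe(byte[] entry) {
        ByteBuffer in = ByteBuffer.wrap(entry);
        in.position(32);                                     // skip hdr: 8 + 4 + 8 + 8 + 4
        StringBuilder sb = new StringBuilder();
        sb.append("path=").append(readString(in));
        sb.append(" data=").append(readString(in));
        int aclCount = in.getInt();
        for (int i = 0; i < aclCount; i++) {
            int perms = in.getInt();
            String scheme = readString(in);
            String id = readString(in);
            sb.append(" acl=").append(scheme).append(':').append(id).append(':').append(perms);
        }
        sb.append(" ephemeral=").append(in.get() != 0);
        sb.append(" parentCVersion=").append(in.getInt());
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: path=/root data=mydata acl=world:anyone:31 ephemeral=false parentCVersion=1
        System.out.println(describe(sampleEntry()));
    }
}
```

The content length 0x53 matches the sum of the parts: 32 (hdr) + 9 (path) + 10 (data) + 4 + 4 + 9 + 10 (ACL list) + 1 (ephemeral) + 4 (parentCVersion) = 83.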



00000000 5a 4b 4c 47 00 00 00 02 00 00 00 00 00 00 00 00 |ZKLG............|
00000010 00 00 00 00 59 27 08 06 00 00 00 24 01 3a 69 4e |....Y'.....$.:iN|
00000020 19 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |................|
00000030 00 00 01 3a 69 4e ab af ff ff ff f6 00 06 1a 80 |...:iN..........|
00000040 42 00 00 00 00 ba 26 0c 5e 00 00 00 53 01 3a 69 |B.....&.^...S.:i|
00000050 4e 19 1a 00 00 00 00 00 02 00 00 00 00 00 00 00 |N...............|
00000060 02 00 00 01 3a 69 4f 3f a5 00 00 00 01 00 00 00 |....:iO?........|
00000070 05 2f 72 6f 6f 74 00 00 00 06 6d 79 64 61 74 61 |./root....mydata|
00000080 00 00 00 01 00 00 00 1f 00 00 00 05 77 6f 72 6c |............worl|
00000090 64 00 00 00 06 61 6e 79 6f 6e 65 00 00 00 00 01 |d....anyone.....|
000000a0 42 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |B...............|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
04000010


Scenario 3) Add another node

String realPath = zk.create("/root/childone", "childone".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

00000000 5a 4b 4c 47 00 00 00 02 00 00 00 00 00 00 00 00 |ZKLG............|
00000010 00 00 00 00 59 27 08 06 00 00 00 24 01 3a 69 4e |....Y'.....$.:iN|
00000020 19 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |................|
00000030 00 00 01 3a 69 4e ab af ff ff ff f6 00 06 1a 80 |...:iN..........|
00000040 42 00 00 00 00 ba 26 0c 5e 00 00 00 53 01 3a 69 |B.....&.^...S.:i|
00000050 4e 19 1a 00 00 00 00 00 02 00 00 00 00 00 00 00 |N...............|
00000060 02 00 00 01 3a 69 4f 3f a5 00 00 00 01 00 00 00 |....:iO?........|
00000070 05 2f 72 6f 6f 74 00 00 00 06 6d 79 64 61 74 61 |./root....mydata|
00000080 00 00 00 01 00 00 00 1f 00 00 00 05 77 6f 72 6c |............worl|
00000090 64 00 00 00 06 61 6e 79 6f 6e 65 00 00 00 00 01 |d....anyone.....|
000000a0 42 00 00 00 00 bc 21 10 aa 00 00 00 5e 01 3a 69 |B.....!.....^.:i|
000000b0 4e 19 1a 00 00 00 00 00 04 00 00 00 00 00 00 00 |N...............|
000000c0 03 00 00 01 3a 69 6a 30 9c 00 00 00 01 00 00 00 |....:ij0........|
000000d0 0e 2f 72 6f 6f 74 2f 63 68 69 6c 64 6f 6e 65 00 |./root/childone.|
000000e0 00 00 08 63 68 69 6c 64 6f 6e 65 00 00 00 01 00 |...childone.....|
000000f0 00 00 1f 00 00 00 05 77 6f 72 6c 64 00 00 00 06 |.......world....|
00000100 61 6e 79 6f 6e 65 00 00 00 00 01 42 00 00 00 00 |anyone.....B....|
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
04000010

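This entry also creates a node, so its layout matches the previous one and is not repeated here; it can be decoded the same way.

It occupies bytes 0x000000a1-0x0000010b.

This time cxid is 4, zxid is 3, and type is still 1.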
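Since every entry is framed the same way (8-byte CRC, 4-byte length, payload, 1-byte EOR), the start of each entry follows from the previous entry's length. The sketch below scans a log and lists each entry's offset, zxid and type, stopping at the zero padding. TxnLogScanSketch is a hypothetical name, and the sample log is a simplification: it keeps the real lengths, cxid, zxid and type values from the dumps, but leaves the CRC, time and txn bytes zero-filled, and the scanner does not verify checksums.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative scanner over the entries of a transaction log; not ZooKeeper code. */
final class TxnLogScanSketch {
    /** Appends one framed entry; only the hdr fields used below are filled in. */
    static void putEntry(ByteBuffer b, int cxid, long zxid, int type, int len) {
        b.putLong(0L).putInt(len);                    // CRC placeholder (not checked), length
        int start = b.position();
        b.putLong(0x013a694e191a0000L).putInt(cxid).putLong(zxid).putLong(0L).putInt(type);
        b.position(start + len);                      // rest of the payload stays zero
        b.put((byte) 0x42);                           // EOR
    }

    /** File header plus three entries with the lengths seen after scenario 3, then padding. */
    static byte[] sampleLog() {
        ByteBuffer b = ByteBuffer.allocate(0x120);
        b.putInt(0x5a4b4c47).putInt(2).putLong(0L);   // magic "ZKLG", version, dbid
        putEntry(b, 0, 1L, -10, 0x24);                // createSession
        putEntry(b, 2, 2L, 1, 0x53);                  // create /root
        putEntry(b, 4, 3L, 1, 0x5e);                  // create /root/childone
        return b.array();
    }

    /** Lists "offset zxid type" for each entry until a zero length is found. */
    static List<String> scan(byte[] log) {
        ByteBuffer in = ByteBuffer.wrap(log);
        List<String> found = new ArrayList<>();
        int start = 16;                               // first entry follows the file header
        while (start + 12 <= log.length) {
            int len = in.getInt(start + 8);           // length sits after the 8-byte CRC
            if (len <= 0) {
                break;                                // zero padding: no more entries
            }
            long zxid = in.getLong(start + 12 + 12);  // hdr: clientId (8) + cxid (4), then zxid
            int type = in.getInt(start + 12 + 28);
            found.add("0x" + Integer.toHexString(start) + " zxid=" + zxid + " type=" + type);
            start += 12 + len + 1;                    // CRC + length + payload + EOR
        }
        return found;
    }

    public static void main(String[] args) {
        // prints: [0x10 zxid=1 type=-10, 0x41 zxid=2 type=1, 0xa1 zxid=3 type=1]
        System.out.println(scan(sampleLog()));
    }
}
```

The computed start offsets 0x10, 0x41 and 0xa1 agree with the byte ranges given in the walkthrough, and the next entry would begin at 0x10c.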


Scenario 4) Modify node data

zk.setData("/root/childone", "childonemodify".getBytes(), -1);


00000000 5a 4b 4c 47 00 00 00 02 00 00 00 00 00 00 00 00 |ZKLG............|
00000010 00 00 00 00 59 27 08 06 00 00 00 24 01 3a 69 4e |....Y'.....$.:iN|
00000020 19 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |................|
00000030 00 00 01 3a 69 4e ab af ff ff ff f6 00 06 1a 80 |...:iN..........|
00000040 42 00 00 00 00 ba 26 0c 5e 00 00 00 53 01 3a 69 |B.....&.^...S.:i|
00000050 4e 19 1a 00 00 00 00 00 02 00 00 00 00 00 00 00 |N...............|
00000060 02 00 00 01 3a 69 4f 3f a5 00 00 00 01 00 00 00 |....:iO?........|
00000070 05 2f 72 6f 6f 74 00 00 00 06 6d 79 64 61 74 61 |./root....mydata|
00000080 00 00 00 01 00 00 00 1f 00 00 00 05 77 6f 72 6c |............worl|
00000090 64 00 00 00 06 61 6e 79 6f 6e 65 00 00 00 00 01 |d....anyone.....|
000000a0 42 00 00 00 00 bc 21 10 aa 00 00 00 5e 01 3a 69 |B.....!.....^.:i|
000000b0 4e 19 1a 00 00 00 00 00 04 00 00 00 00 00 00 00 |N...............|
000000c0 03 00 00 01 3a 69 6a 30 9c 00 00 00 01 00 00 00 |....:ij0........|
000000d0 0e 2f 72 6f 6f 74 2f 63 68 69 6c 64 6f 6e 65 00 |./root/childone.|
000000e0 00 00 08 63 68 69 6c 64 6f 6e 65 00 00 00 01 00 |...childone.....|
000000f0 00 00 1f 00 00 00 05 77 6f 72 6c 64 00 00 00 06 |.......world....|
00000100 61 6e 79 6f 6e 65 00 00 00 00 01 42 00 00 00 00 |anyone.....B....|
00000110 af 4a 0f 23 00 00 00 48 01 3a 69 4e 19 1a 00 00 |.J.#...H.:iN....|
00000120 00 00 00 07 00 00 00 00 00 00 00 04 00 00 01 3a |...............:|
00000130 69 74 8f f3 00 00 00 05 00 00 00 0e 2f 72 6f 6f |it........../roo|
00000140 74 2f 63 68 69 6c 64 6f 6e 65 00 00 00 0e 63 68 |t/childone....ch|
00000150 69 6c 64 6f 6e 65 6d 6f 64 69 66 79 00 00 00 01 |ildonemodify....|
00000160 42 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |B...............|
00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
04000010

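This entry modifies a node. Its framing is the same as in the previous scenarios, so it is not broken down field by field here; it can be decoded the same way.

It occupies bytes 0x0000010c-0x00000160.

This time cxid is 7, zxid is 4, and type is 5 (in org.apache.zookeeper.ZooDefs.OpCode, 5 is setData).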
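Reading the dump at 0x118-0x15f, the setData payload appears to hold the path, the new data, and a trailing 4-byte int (1 here) after the usual 32-byte hdr. The sketch below decodes it under that assumption; SetDataTxnSketch is a hypothetical name, and interpreting the last int as the node's data version is an assumption, not something the dump states.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative decoder for the setData entry of scenario 4; not ZooKeeper code. */
final class SetDataTxnSketch {
    /** Writes a 4-byte length followed by the string's bytes. */
    static void putString(ByteBuffer out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.putInt(b.length).put(b);
    }

    /** Reads a 4-byte length followed by that many bytes, as a string. */
    static String readString(ByteBuffer in) {
        byte[] b = new byte[in.getInt()];
        in.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    /** The 72-byte payload (hdr plus txn) found at offsets 0x118-0x15f. */
    static byte[] sampleEntry() {
        ByteBuffer b = ByteBuffer.allocate(0x48);
        b.putLong(0x013a694e191a0000L).putInt(7).putLong(4L) // hdr: clientId, cxid, zxid
         .putLong(0x0000013a69748ff3L).putInt(5);            // hdr: time, type = setData
        putString(b, "/root/childone");                      // path
        putString(b, "childonemodify");                      // new data
        b.putInt(1);                                         // trailing int, assumed to be the version
        return b.array();
    }

    /** Decodes the txn part that follows the 32-byte hdr. */
    static String describe(byte[] entry) {
        ByteBuffer in = ByteBuffer.wrap(entry);
        int type = in.getInt(28);                            // absolute read, position unchanged
        in.position(32);                                     // skip hdr
        String path = readString(in);
        String data = readString(in);
        int version = in.getInt();
        return "type=" + type + " path=" + path + " data=" + data + " version=" + version;
    }

    public static void main(String[] args) {
        // prints: type=5 path=/root/childone data=childonemodify version=1
        System.out.println(describe(sampleEntry()));
    }
}
```

The content length 0x48 is consistent with this reading: 32 (hdr) + 18 (path) + 18 (data) + 4 = 72.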



