Testing ZFS dedup vs. compression: which suits your environment? - CSDN Blog

Is your data better suited to compression or to deduplication?

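You can answer this by evaluating your own data. On an existing pool, run zdb -S zpname (where zpname is the pool name) to simulate deduplication, then run zdb -DD to produce a report of the dedup table.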

For example:

Simulate deduplication over the data blocks of pool zp1:

[root@db- ~]# zdb -S zp1

Report the dedup table (DDT):

[root@db- ~]# zdb -DD zp1

DDT-sha256-zap-duplicate: 272229 entries, size 291 on disk, 141 in core

DDT-sha256-zap-unique: 71346264 entries, size 314 on disk, 167 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    68.0M   8.50T   1.18T   1.18T    68.0M   8.50T   1.18T   1.18T
     2     264K   32.8G   19.3G   19.3G     529K   65.7G   38.5G   38.5G
     4    1.64K    210M    886K    886K    7.93K   1015M   4.16M   4.16M
     8      330   41.0M    167K    167K    3.18K    404M   1.61M   1.61M
    16       59   7.25M   29.5K   29.5K    1.33K    168M    684K    684K
    32       69   8.62M   34.5K   34.5K    3.18K    406M   1.59M   1.59M
    64       85   10.6M   42.5K   42.5K    8.07K   1.01G   4.04M   4.04M
   128       72      9M     36K     36K    11.4K   1.43G   5.72M   5.72M
   256       59   7.38M   29.5K   29.5K    17.5K   2.19G   8.76M   8.76M
   512        5    640K   2.50K   2.50K    4.26K    546M   2.13M   2.13M
 Total    68.3M   8.54T   1.19T   1.19T    68.6M   8.58T   1.21T   1.21T

dedup = 1.02, compress = 7.06, copies = 1.00, dedup * compress / copies = 7.18

The results show a compression ratio of 7.06 but a dedup ratio of only 1.02, so this data is clearly better suited to compression.
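As far as I can tell, zdb derives its summary line from the Total row: dedup is referenced DSIZE divided by allocated DSIZE, compress is referenced LSIZE divided by referenced PSIZE, and copies is referenced DSIZE divided by referenced PSIZE. The Python sketch below treats those formulas as an assumption and recomputes the ratios from the Total row printed above. Because the table values are rounded to three significant digits, the results only approximately match zdb's own figures.

```python
# Recompute zdb's summary ratios from the "Total" row of the DDT histogram.
# Assumption: these formulas mirror how zdb derives its summary line.

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

def parse_size(text):
    """Convert a zdb-style size such as '8.54T' or '330' to a number (binary units)."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)

def summary_ratios(total_row):
    # Fields: "Total", allocated blocks/LSIZE/PSIZE/DSIZE, referenced blocks/LSIZE/PSIZE/DSIZE
    _, _, _, _, alloc_dsize, _, ref_lsize, ref_psize, ref_dsize = total_row.split()
    d = parse_size(alloc_dsize)
    rl, rp, rd = (parse_size(x) for x in (ref_lsize, ref_psize, ref_dsize))
    dedup = rd / d        # referenced bytes covered by each allocated byte
    compress = rl / rp    # logical size vs. compressed (physical) size
    copies = rd / rp      # allocated (DSIZE) vs. physical (PSIZE) size
    return {"dedup": dedup, "compress": compress, "copies": copies,
            "combined": dedup * compress / copies}

ratios = summary_ratios("Total 68.3M 8.54T 1.19T 1.19T 68.6M 8.58T 1.21T 1.21T")
for name, value in ratios.items():
    print(f"{name} = {value:.2f}")
print("better fit:", "compression" if ratios["compress"] > ratios["dedup"] else "dedup")
```

With the rounded Total row this gives a dedup ratio of about 1.02 and a compression ratio of about 7.09 (zdb prints 7.06 from its unrounded totals), so compression wins by a wide margin either way.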

In addition, dedup has to maintain an in-memory table of data blocks (the DDT, or dedup table) to track duplicates.

How much memory does that take? A common rule of thumb is about 320 bytes of memory per block.

According to the zdb -DD zp1 output, there are 68.3M blocks in total, so the DDT needs roughly 68.3M × 320 bytes, or about 21 GB of memory.
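The arithmetic can be checked directly. The two DDT-sha256-zap lines list 272229 + 71346264 entries, which is the 68.3M total (zdb uses binary suffixes, so M means 2^20). The sketch below applies the 320-byte rule of thumb to that count and, for comparison, also multiplies each entry count by the "in core" figure on the same line, assuming (as I read zdb's output) that this figure is the average in-memory size of one entry in bytes.

```python
# Estimate DDT memory for pool zp1 from the zdb -DD output above.
RULE_OF_THUMB_BYTES = 320  # per-block figure from the text (a rule of thumb)
GIB = 2**30

# (entry count, assumed average "in core" bytes per entry) per DDT object
ddt_objects = {
    "DDT-sha256-zap-duplicate": (272_229, 141),
    "DDT-sha256-zap-unique": (71_346_264, 167),
}

entries = sum(count for count, _ in ddt_objects.values())
rule_of_thumb_gib = entries * RULE_OF_THUMB_BYTES / GIB
in_core_gib = sum(count * per_entry for count, per_entry in ddt_objects.values()) / GIB

print(f"entries: {entries:,}")
print(f"rule of thumb (320 B/entry): {rule_of_thumb_gib:.1f} GiB")
print(f"zdb 'in core' estimate:      {in_core_gib:.1f} GiB")
```

With these numbers the rule of thumb gives about 21.3 GiB, while the "in core" figures suggest about 11.1 GiB, so for this pool the 320-byte figure appears to be a conservative estimate rather than an exact requirement.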

If you do not have enough memory but still need dedup, consider adding an SSD as an L2ARC device to hold the DDT.

Without any data there is no way to evaluate this, so you need at least some test data.

My test data is mostly PostgreSQL csvlog text files and tar.bz2 archives, which evidently do not benefit much from dedup.
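If you have test data but no pool yet, a rough estimate can also be made outside ZFS. The Python sketch below is a hypothetical helper (estimate is not a ZFS tool): it splits each file into fixed 128 KiB records (the default recordsize), detects duplicate records with SHA-256, and uses zlib as a stand-in for the pool's actual compression algorithm. Real ZFS results will differ (block sizing for small files, the real compressor, metadata overhead), so treat the numbers as a first approximation only.

```python
import hashlib
import os
import zlib

RECORD_SIZE = 128 * 1024  # assumed dataset recordsize (the ZFS default)

def estimate(root, record_size=RECORD_SIZE):
    """Return rough dedup and compression ratios for all regular files under root."""
    logical = 0          # bytes as written by applications
    physical = 0         # bytes after compression, before dedup
    physical_unique = 0  # bytes after compression, counting each unique record once
    seen = set()         # SHA-256 digests of records already "stored"
    records = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device files, etc.
            with open(path, "rb") as f:
                while True:
                    record = f.read(record_size)
                    if not record:
                        break
                    records += 1
                    # Keep the record uncompressed if compression does not shrink it.
                    size = min(len(zlib.compress(record)), len(record))
                    logical += len(record)
                    physical += size
                    digest = hashlib.sha256(record).digest()
                    if digest not in seen:
                        seen.add(digest)
                        physical_unique += size
    return {
        "records": records,
        "unique_records": len(seen),
        "dedup": physical / physical_unique if physical_unique else 1.0,
        "compress": logical / physical if physical else 1.0,
    }
```

For example, estimate("/path/to/testdata") (a placeholder path) returns a dict whose dedup and compress entries can be compared the same way as zdb's summary line.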

[Notes]

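1. Compression, like dedup, only applies to data written after the property is enabled. Existing data is neither compressed nor deduplicated; only data added or modified afterwards is.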

2. Flushing deduplicated data is not an atomic operation, so a power failure may corrupt data.

Further, deduplicated data is not flushed to disk as an atomic transaction. Instead, the blocks are written to disk serially, one block at a time. Thus, this does open you up for corruption in the event of a power failure before the blocks have been written.

3. The dedup property is set per dataset, but deduplication happens across the entire zpool.

Accordingly, the dedup ratio is also reported at the pool level, in the output of zpool get all:

[root@db- ~]# zpool get all

NAME  PROPERTY    VALUE                SOURCE
zp1   size        9.75T                -
zp1   capacity    12%                  -
zp1   altroot     -                    default
zp1   health      ONLINE               -
zp1   guid        5877722976139588848  default
zp1   version     -                    default
zp1   bootfs      -                    default
zp1   delegation  on                   default
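To read the dedup ratio from a script, the zpool get output can be parsed into a dictionary. This is a minimal sketch assuming the whitespace-separated NAME, PROPERTY, VALUE, SOURCE layout shown above and that dedupratio values carry a trailing "x" (for example 1.02x); parse_zpool_get and dedup_ratio are hypothetical helper names. The listing above stops at the delegation property, so the dedupratio row itself is not shown here.

```python
def parse_zpool_get(output):
    """Map pool -> {property: (value, source)} from `zpool get` text output."""
    pools = {}
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[:2] == ["NAME", "PROPERTY"]:
            continue  # skip blank lines and the header row
        pool, prop, rest = parts
        value_source = rest.rsplit(None, 1)  # SOURCE is the last column
        if len(value_source) != 2:
            continue  # malformed row
        value, source = value_source
        pools.setdefault(pool, {})[prop] = (value, source)
    return pools

def dedup_ratio(pools, pool):
    """Return the pool's dedupratio as a float (e.g. '1.02x' -> 1.02), or None if absent."""
    entry = pools.get(pool, {}).get("dedupratio")
    if entry is None:
        return None
    return float(entry[0].rstrip("x"))
```

For example, pass the text of zpool get all to parse_zpool_get and then call dedup_ratio(pools, "zp1"); it returns None when the property is missing from the input.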