WHEN TO (AND NOT TO) USE RAID-Z

RAID-Z is the technology used by ZFS to implement a data-protection scheme that is less costly than mirroring in terms of block overhead. Here, I'd like to go over the performance implications of using RAID-Z from a theoretical standpoint.

The goal of this technology is to allow a storage subsystem to deliver the stored data in the face of one or more disk failures. This is accomplished by joining multiple disks into an N-way RAID-Z group. Multiple RAID-Z groups can be dynamically striped to form a larger storage pool.

To store file data onto a RAID-Z group, ZFS will spread a filesystem (FS) block onto the N devices that make up the group. So for each FS block, (N - 1) devices will hold file data and 1 device will hold parity information. This information would eventually be used to reconstruct (or resilver) data in the face of any device failure. We thus have 1 / N of the available disk blocks used to store parity information: a 10-disk RAID-Z group has 9/10 of its blocks effectively available to applications.

A common alternative for data protection is mirroring. In this technology, a filesystem block is stored onto 2 (or more) mirror copies. Here again, the system will survive a single disk failure (or more with N-way mirroring). So a 2-way mirror actually delivers similar data protection, at the expense of giving applications access to only half of the disk blocks.

Now let's look at this from the performance angle, in particular that of delivered filesystem blocks per second (FSBPS). An N-way RAID-Z group achieves its protection by spreading a ZFS block onto the N underlying devices. That means that a single ZFS block I/O must be converted into N device I/Os.
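The capacity side of this comparison is simple arithmetic, and a few lines of Python make it concrete. This is a minimal sketch assuming single-parity RAID-Z and ignoring metadata, padding, and allocation overhead; the function names are hypothetical and not part of any ZFS tool.

```python
def raidz_usable_fraction(n_disks: int) -> float:
    """Fraction of blocks left for data in a single-parity N-way RAID-Z group."""
    if n_disks < 2:
        raise ValueError("a RAID-Z group needs at least 2 disks")
    # One block in N holds parity; the remaining N - 1 hold file data.
    return (n_disks - 1) / n_disks


def mirror_usable_fraction(copies: int = 2) -> float:
    """Fraction of blocks left for data when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1 / copies


print(f"10-disk RAID-Z: {raidz_usable_fraction(10):.0%} usable")  # 90% usable
print(f"2-way mirror:   {mirror_usable_fraction(2):.0%} usable")  # 50% usable
```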
To be more precise, in order to access a ZFS block, we need N device I/Os for output and (N - 1) device I/Os for input, as the parity data need not generally be read in. After a request for a ZFS block has been spread this way, the I/O scheduling code takes control of all the device I/Os that need to be issued. At this stage, the ZFS code is capable of aggregating adjacent physical I/Os into fewer ones. Because of the ZFS Copy-On-Write (COW) design, we actually expect this reduction in the number of device-level I/Os to work extremely well for just about any write-intensive workload. We also expect it to help streaming input loads significantly.

The situation of random inputs is one that needs special attention when considering RAID-Z. Effectively, as a first approximation, an N-disk RAID-Z group will behave as a single device in terms of delivered random input IOPS, since every random FS block read keeps (N - 1) devices busy. Thus a 10-disk group of devices, each capable of 200 IOPS, will globally act as a 200-IOPS-capable RAID-Z group. This is the price to pay to achieve proper data protection without the 2X block overhead associated with mirroring.

With 2-way mirroring, each FS block output must be sent to 2 devices, so half of the available IOPS are lost to mirroring on output. However, for inputs, each side of a mirror can service read calls independently of the other, since each side holds the full information. Given a proper software implementation that balances the inputs between the sides of a mirror, the number of FS blocks delivered by a mirrored group is actually no less than what a simple non-protected RAID-0 stripe would give.

So, looking at random-access input load and the number of FS blocks per second (FSBPS), given N devices to be grouped either in RAID-Z, 2-way mirrored, or simply striped (a.k.a. RAID-0, no data protection!), the equations would be (where dev represents the capacity of a single device in terms of blocks or of IOPS):

               Blocks Available    Random FS Blocks / sec
               ----------------    ----------------------
    RAID-Z     (N - 1) * dev       1 * dev
    Mirror     (N / 2) * dev       N * dev
    Stripe     N * dev             N * dev
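The first-order model in this table can be written as a small function. This is a sketch under the same assumptions as the text: random reads only, one parity device per RAID-Z group, 2-way mirrors, and `dev` standing for a single device's block capacity (`dev_blocks`) or its IOPS (`dev_iops`). The names `Layout` and `pool_model` are hypothetical.

```python
from enum import Enum


class Layout(Enum):
    RAIDZ = "raidz"
    MIRROR = "mirror"
    STRIPE = "stripe"


def pool_model(layout: Layout, n: int, dev_blocks: float, dev_iops: float) -> tuple[float, float]:
    """Return (blocks available, random FS blocks/sec) for n devices.

    First-order model only: an N-disk RAID-Z group reads like a single
    device, each side of a 2-way mirror serves reads independently, and a
    stripe has no redundancy at all.
    """
    if layout is Layout.RAIDZ:
        return (n - 1) * dev_blocks, 1 * dev_iops
    if layout is Layout.MIRROR:
        if n % 2:
            raise ValueError("a 2-way mirror needs an even number of devices")
        return (n / 2) * dev_blocks, n * dev_iops
    if layout is Layout.STRIPE:
        return n * dev_blocks, n * dev_iops
    raise ValueError(f"unknown layout: {layout}")


# 10 devices of 100 GB and 200 IOPS each, arranged three different ways.
for layout in Layout:
    blocks, fsbps = pool_model(layout, 10, 100, 200)
    print(f"{layout.value:7} {blocks:6.0f} GB {fsbps:6.0f} FS blocks/s")
```

The same 10 disks give RAID-Z the most space after a plain stripe, but only one device's worth of random reads.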

 

Now let's take 100 disks of 100 GB each, each capable of 200 IOPS, and look at different possible configurations. In the table below, the configuration labeled "Z 5 x (19+1)" refers to a dynamic striping of 5 RAID-Z groups, each group made of 20 disks (19 data disks + 1 parity). M refers to a 2-way mirror and S to a simple dynamic stripe.

    Config          Blocks Available    Random FS Blocks / sec
    ------------    ----------------    ----------------------
    Z 1 x (99+1)    9900 GB             200
    Z 2 x (49+1)    9800 GB             400
    Z 5 x (19+1)    9500 GB             1000
    Z 10 x (9+1)    9000 GB             2000
    Z 20 x (4+1)    8000 GB             4000
    Z 33 x (2+1)    6600 GB             6600
    M 2 x (50)      5000 GB             20000
    S 1 x (100)     10000 GB            20000
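Each row of this table follows from multiplying one group's figures by the number of groups. The sketch below recomputes the rows for the 100-disk example, assuming (as the table does) that random FS blocks per second simply add up across dynamically striped groups. The mirror row "M 2 x (50)" is modeled as 50 two-way mirror pairs, which is the same 100 disks; `CONFIGS`, `group_figures`, and `pool_figures` are hypothetical names.

```python
DISK_GB = 100    # capacity of one disk
DISK_IOPS = 200  # random IOPS of one disk

# (label, number of groups, disks per group, layout)
CONFIGS = [
    ("Z 1 x (99+1)", 1, 100, "raidz"),
    ("Z 2 x (49+1)", 2, 50, "raidz"),
    ("Z 5 x (19+1)", 5, 20, "raidz"),
    ("Z 10 x (9+1)", 10, 10, "raidz"),
    ("Z 20 x (4+1)", 20, 5, "raidz"),
    ("Z 33 x (2+1)", 33, 3, "raidz"),
    ("M 2 x (50)", 50, 2, "mirror"),  # 50 two-way mirror pairs
    ("S 1 x (100)", 1, 100, "stripe"),
]


def group_figures(layout: str, width: int) -> tuple[int, int]:
    """Usable GB and random FS blocks/sec of one group of `width` disks."""
    if layout == "raidz":
        # One parity disk per group; the whole group reads like one disk.
        return (width - 1) * DISK_GB, DISK_IOPS
    if layout == "mirror":
        # Half the disks hold copies, but every disk can serve reads.
        return (width // 2) * DISK_GB, width * DISK_IOPS
    if layout == "stripe":
        return width * DISK_GB, width * DISK_IOPS
    raise ValueError(f"unknown layout: {layout}")


def pool_figures(groups: int, width: int, layout: str) -> tuple[int, int]:
    """First-order assumption: dynamic striping adds the groups' figures up."""
    gb, fsbps = group_figures(layout, width)
    return groups * gb, groups * fsbps


for label, groups, width, layout in CONFIGS:
    gb, fsbps = pool_figures(groups, width, layout)
    print(f"{label:14} {gb:6d} GB {fsbps:6d}")
```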

So RAID-Z gives you at most 2X the number of blocks that mirroring provides, but hits you with far fewer delivered IOPS. That means that, as the number of devices N in a group increases, the expected gain over mirroring (disk blocks) is bounded (to at most 2X), but the expected cost in IOPS is not (a factor in the range [N/2, N] fewer IOPS).

Note that for wide RAID-Z configurations, ZFS takes into account the sector size of devices (typically 512 bytes) and dynamically adjusts the effective number of columns in a stripe. So even if you request a 99+1 configuration, the actual data will probably be stored on far fewer data columns than that. Hopefully this article will contribute to steering deployments away from those types of configurations.

In conclusion, when preserving IOPS capacity is important, RAID-Z groups should be kept small, and one must accept some level of disk block overhead. When performance matters most, mirroring should be highly favored. If mirroring is considered too costly but performance is nevertheless required, one could proceed like this:

 

    Given N devices, each capable of X IOPS.
    Given a target of Y delivered FS blocks per second for the storage pool.
    Since each RAID-Z group delivers about X FS blocks per second,
    build your storage using (Y / X) dynamically striped RAID-Z groups
    (rounded up), that is, groups of at most (N * X / Y) devices each.

For instance:

    Given 50 devices, each capable of 200 IOPS.
    Given a target of 1000 delivered FS blocks per second for the storage pool.
    Build your storage using (1000 / 200) = 5 dynamically striped RAID-Z
    groups of (50 / 5) = 10 devices each.

In that system, we would then have 10% block overhead lost to maintaining RAID-Z parity.

RAID-Z is a great technology not only when disk blocks are your most precious resource, but also when your available IOPS far exceed your expected needs. But beware that if you get your hands on fewer, very large disks, the IOPS capacity can easily become your most precious resource. Under those conditions, mirroring should be strongly favored, or alternatively a dynamic stripe of RAID-Z groups, each made up of a small number of devices.
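The sizing procedure can be sketched in Python. It assumes, as the text does, that each single-parity RAID-Z group delivers roughly one device's worth (X) of random FS blocks per second, so the pool needs at least Y / X groups; using the fewest groups keeps each one as wide as possible and the parity overhead as low as possible. Disks that do not divide evenly are simply left unassigned. `size_raidz_pool` is a hypothetical helper, not a ZFS command.

```python
def size_raidz_pool(n_devices: int, dev_iops: int, target_fsbps: int) -> tuple[int, int, float]:
    """Return (groups, disks per group, parity overhead) for striped RAID-Z groups.

    First-order model: each single-parity RAID-Z group delivers about
    dev_iops random FS blocks/sec, so ceil(target / dev_iops) groups are
    needed. Fewer groups means wider groups and a lower 1 / width overhead.
    """
    if dev_iops <= 0 or target_fsbps <= 0:
        raise ValueError("dev_iops and target_fsbps must be positive")
    groups = -(-target_fsbps // dev_iops)  # integer ceiling of target / dev_iops
    width = n_devices // groups            # leftover disks are not assigned here
    if width < 2:
        raise ValueError("too few devices to reach this target with RAID-Z groups")
    return groups, width, 1 / width


groups, width, overhead = size_raidz_pool(50, 200, 1000)
print(f"{groups} groups of {width} disks, {overhead:.0%} parity overhead")
# 5 groups of 10 disks, 10% parity overhead
```

Raising the target to 4000 FS blocks per second on 100 such disks gives 20 groups of 5 disks and 20% overhead, which matches the "Z 20 x (4+1)" row of the earlier table.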

Reposted from: https://my.oschina.net/CasparLi/blog/1542343
