how to align lun to RAID strip size on linux

http://www.pythian.com/news/411/aligning-asm-disks-on-linux/

Aligning ASM Disks on Linux

Posted by Christo Kutrovsky on Mar 30, 2007

Linux is a wonderful operating system. However there are a number of things that one needs to do to make sure it runs as efficiently as possible. Today, I would like to share one of them. It has to do with using ASM (Automatic Storage Manager) disks.

In Linux, there are 2 major ways to create ASM disks.

  1. you can use ASMlib kernel driver
  2. you can use devmapper devices

You could also use /dev/raw devices, but I don’t recommend this at all. I will write another blog explaining why.

Regardless of which approach you take, you have to create partitions on your LUNs. Starting with version 2, ASMlib won’t let you use the entire disk. You have to create a partition.

The reason to force the creation of this partition is to make explicit that something exists on that device, and that it’s not empty. Otherwise, some OS tools assume the disk is unused and could mark it, or just begin using it, and override your precious Oracle data.

(Read more after the jump.)

Most people would use the “fdisk” command provided by Linux distributions. This command is quite old, and so has some old-fashioned DOS-style. behaviours built in.

When you create your partition, by default, the unit of measure is based on cylinders. Here’s a typical print command from fdisk on a 35 GB disk:

 
  

Notice where it says Units. 2048 cylinders, 512 bytes each = 1 MB. So your units are 1 MB. only when the disk is relatively small.

When your disk is larger — which is far more typical in the database world, especially with raid arrays — the Units change:

 
  

Take a look at the Units number. It’s 8 MB minus 159.5 kB. This is a very weird number, totally misaligned with any possible stripe size or stripe with (stride).

This, by itself, is not a big deal, since the best practice is to have 1 partition per LUN, which represents the entire device. However, this is not the end of it. If you switch to sector mode, you will see what the true start offset is:

 
  

Notice the start sector, 63th sector. That’s the 31.5 kB boundary. This value does not align with any stripe size. Stripe sizes are usually multiples of 2 and above 64 kB.

The result is, every so often, a block will be split between 2 separate hard disks, and the data will be returned at the speed of the slower (busier) device.

Assuming the typical 64 kB stripe (way too low, as I will discuss in another blog), and 8 kB database block size, every 8th block will be split between 2 devices. If you do the math, that’s about 12% of all your I/O. Not a significant number, but when you consider how discs are arranged in RAID 5, instead of a logical write being 2 reads and 2 writes (data+checksum, update, then write them back), each logical write could be 4 reads and 4 writes, significantly increasing your disk activity.

The solution?

Before you create your partitions, switch to “sector” mode, and create your partitions at an offset that is a power of 2.

I typically create my partitions at the 16th megabyte (32768th sector). Essentially, I “waste” 16 MBs, but gain aligned I/O for stripe width of up to 16 MB.

The procedure to create an aligned disk is:

 
  

This way, the disk is aligned. When it is aligned at least on 1MB, then the ASM files will be also aligned at the 1MB boundary.

This alignment also applies to ext3 file systems. This file system takes it a step further, allowing you to provide the array stride as a parameter during creation, optimizing writing performance (I have not tested this). Look in the man pages for more information.


来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/228190/viewspace-695737/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/228190/viewspace-695737/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值