Write-back cache: Battery vs Disk

Posted in Storage Interconnects & RAID, Advisor - Tom by Tom Treadway

Question to the Storage Advisors, from anonymous: Which is better: (a) backup battery for cache as found on OEM RAID controllers or (b) writing cache content to one or more disk drives?

Good question. For those not versed in the dark arts of cache write-back strategies, we’re talking about methods for protecting user data that has been written by the OS but hasn’t actually made it to the media yet. It’s common for a disk controller to improve write performance by accepting data from the OS and reporting that it has been written to disk, when in reality it’s still in controller memory (or on disk, as suggested by the poster’s question). This technique is referred to as “write-back” because the data is written in the background. The opposite of write-back is “write-through,” where the controller really does write the data to disk before telling the OS that it’s finished.
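
To make the distinction concrete, here’s a minimal sketch of the two acknowledgement policies. It’s purely illustrative (the Controller class and its flush method are invented for this post, not any vendor’s firmware), but it shows where the risk comes from: with write-back, the OS gets its acknowledgement while the data still lives only in volatile memory.

```python
import collections


class Controller:
    """Toy disk controller with a volatile write cache (illustrative only)."""

    def __init__(self, write_back=True):
        self.write_back = write_back
        self.dirty = collections.OrderedDict()  # unwritten ("dirty") data, keyed by LBA

    def write(self, lba, data, disk):
        if self.write_back:
            # Write-back: park the data in cache and acknowledge immediately;
            # the media is updated later, in the background.
            self.dirty[lba] = data
            return "ACK"                 # the OS now believes the data is on disk
        # Write-through: the data really is on the media before the ACK.
        disk[lba] = data
        return "ACK"

    def flush(self, disk):
        # Background destage of dirty data. If power fails before this runs,
        # a write-back cache loses data unless it is battery- or log-protected.
        while self.dirty:
            lba, data = self.dirty.popitem(last=False)
            disk[lba] = data
```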

[Note that controllers aren’t the only things with a write-back cache - the OS and the drives have caches too. But to avoid complicating a fairly simple question, let’s just ignore those other caches for now. The OS has ways of protecting itself, and drive caches should be disabled if a write-back controller is being used.]

It’s very important that this unwritten controller data is protected because it’s the only copy of that data. The OS thinks that the data is written to disk and therefore purges it from memory or wherever it came from. If a power failure occurs before this write-back data is written to disk, then it’s permanently lost. With large caches we’re talking about hundreds of MBs of data. And it can even be worse than that because the missing writes could be to a file structure or database, resulting in massive corruption and loss of files that aren’t even being accessed. It can be a real mess. And the user won’t know about it until they read back corrupted data – which isn’t always obvious.

So, on to the question: What’s the best way to protect this unwritten data? The most common approach is to simply put a battery on the disk controller. If power is lost to the system, including to the drives, the controller memory will transition to battery-backed mode and preserve any write-back data that hadn’t made it to disk yet. The battery is typically selected to provide at least 72 hours of backup time – protecting data across a weekend.

An alternative method, as suggested by the poster, is to save this write-back data to disk. There are different ways to implement this, but the most common is to “simply” store the data in a transaction log on the disk. Note that the user data is ultimately stored on disk (in either method – battery or log) using some form of RAID, which protects against data loss due to a drive failure. RAID-5 is a pretty commonly selected RAID level, but it has very poor random write performance – a problem which just happens to be greatly alleviated by some form of write-back cache. So for this example, let’s assume that RAID is being used. This means that the write-back data being logged to disk should also be protected from disk failure. The easiest way to do this is to simply write the log to two disks. (Some users prefer RAID-6, which protects against two drive failures, in which case the transaction log should be written to three disks!)
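
As a rough illustration of the log approach, here’s a hypothetical sketch of appending dirty data to two log copies before acknowledging, then destaging and “un-logging” it later. The LogDisk class and the function names are invented for this example and don’t correspond to any real controller API.

```python
class LogDisk:
    """Stand-in for one drive holding a copy of the transaction log."""

    def __init__(self):
        self.records = {}

    def append(self, lba, data):     # one extra disk write per log copy
        self.records[lba] = data

    def retire(self, lba):           # one extra IO later, when the data is destaged
        self.records.pop(lba, None)


def log_protected_write(lba, data, cache, log_disks):
    # Log the dirty data on every log copy *before* acknowledging, so a single
    # drive failure (or a power loss) never takes out the only copy.
    for log in log_disks:            # two copies here (three if you run RAID-6)
        log.append(lba, data)
    cache[lba] = data                # now safe to sit in volatile controller cache
    return "ACK"


def destage(cache, array, log_disks):
    # Background flush: write the data to its permanent RAID location (a plain
    # dict stands in for the array here), then remove ("un-log") the matching
    # records from every log copy.
    for lba, data in list(cache.items()):
        array[lba] = data
        for log in log_disks:
            log.retire(lba)
        del cache[lba]


# Example: cache = {}; logs = [LogDisk(), LogDisk()]
# log_protected_write(7, b"x", cache, logs) returns "ACK" with the data logged twice.
```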

OK, now let’s look at the pros and cons of the controller-based battery and disk-based log approaches.

Backup Protection Time: A battery has a limited storage time – around 72 hours, as previously pointed out. However, a transaction log on disk can last almost indefinitely, i.e., the lifetime of the drive. So here the advantage clearly goes to disk-based logs. (BTW, some folks are looking at ways to automatically move the controller cache data to a more permanent storage device, like CompactFlash, allowing controller backup times similar to transaction log backup times. So this will eventually become moot.)

Life Expectancy: Another issue with batteries is that they don’t last forever. They eventually degrade and fail, lasting maybe a few years before they need to be replaced. Drives obviously don’t have this issue.

Capacity: This one is really a nit, but I figured I’d list it to be complete. If a controller has 256MB of memory, for example, then the transaction log will require 2×256MB of disk space, or 512MB. With 1TB drives, this one is a big fat “don’t care”.

Cost: Batteries and the associated circuitry probably add about $100 to the user-cost of a controller, while 512MB of disk space for the log is practically free. $100 might be a big deal for a home user (who probably doesn’t need RAID or write-back cache anyway), but it’s just another nit for serious IT folks. Once you add up the price of the motherboard, OS, drives, etc., $100 is in the noise.

Performance: So far the advantage has clearly gone to the disk log, but performance is probably the most important factor when choosing a cache backup protection method. With a battery-backed controller there are no additional steps to protect the data in cache. “It just works.” Of course there is a lot of magic in the hardware design to make it “just work”, but that has no effect on the performance. On the other hand, with disk-based logs the data has to be written to two different disks. This will probably entail two seeks, assuming that the drives had been servicing requests in some other section of the media. And eventually, that logged data will have to be read back from disk and moved to the permanent location – causing two more seeks and reads. So now a single OS write will cause four additional IOs to the disks.

So how the heck do we figure out the performance hit due to these four IOs? Let’s try this crude method:

Assume that RAID-5 is being used. Therefore each random OS write will cause four disk IOs - two reads and two writes. With disk-based logging there are four additional IOs to log and “un-log” the data, for a total of eight IOs. Using this approach we can see that disk logging needs twice as many IOs as battery-protected controller cache, so you get about a 2X difference in performance. Of course real performance modeling is more complex than this, so just squint at the numbers and figure that the difference is anywhere from 50% to 150%. That’s a big dang difference.
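
If you want to sanity-check that arithmetic, here’s the same back-of-envelope count in a few lines of Python, under the assumptions stated above (small random writes, RAID-5 read-modify-write, log mirrored to two disks).

```python
# Assumptions from the text: a small random RAID-5 write costs 4 disk IOs
# (read old data, read old parity, write new data, write new parity), and
# disk logging adds 2 log writes plus roughly 2 more IOs to read back and
# retire ("un-log") the data later.
RAID5_WRITE_IOS = 4
LOG_IOS = 2 + 2                          # 2 log writes + 2 un-log IOs

battery_ios = RAID5_WRITE_IOS            # battery-backed cache adds no disk IOs
disk_log_ios = RAID5_WRITE_IOS + LOG_IOS

print(battery_ios, disk_log_ios)         # 4 vs 8 IOs per OS write
print(disk_log_ios / battery_ios)        # 2.0, i.e. roughly 2X the disk work
```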

The bottom line is that most users who are concerned with performance aren’t concerned with saving $100, so battery-backed cache is clearly the winner.

Enjoy,
TT

8 Responses to “Write-back cache: Battery vs Disk”

  1. maobo Says:

    What I want to ask is: when there are two controllers, will the battery-backed cache need to be twice the capacity, as you said? What I mean is the active-to-active configuration.

  2. Tom Says:

    That’s correct, Mao. If a pair of controllers mirror their cache to each other in an active/active configuration, then each controller will need twice as much cache.

    TT

  3. maobo Says:

    So there should be a high-speed channel between the two controllers to carry the cache mirroring – for example Gigabit Ethernet, FC, IB, or PCIe. But another thing I want to know is which technology they use to do the cache mirroring: RDMA, or something else? Thank you very much.

  4. Tom Says:

    Yes, Mao. There should be a high-speed channel between the controllers for active/active. It’s common to use the same back-end channel as the drive interconnect, such as SAS. But some designs use the front-end channel, such as iSCSI or FC. GbE is probably too slow. PCIe would be nice, and it’s starting to show up in some dual-controller boxes, such as those defined by the Storage Bridge Bay spec.

    As far as the technology goes, it’s all very vendor-specific. RDMA would certainly be a nice choice, but a more common choice is block storage interface commands.

    TT

  5. maobo Says:

    Now one more question: the mirrored cache only holds the write data and commands, right? The others (for example, read commands) are not synchronized, are they? Thank you very much.

  6. Tom Says:

    Mao, if I understand your question, yes, only the write data is synchronized to the other controller.

    However, most of what I’ve been talking about is synchronization only for the sake of redundancy, i.e., in case a controller fails with dirty data in its cache. If you take dual controllers to the next level then you can start talking about redundant and concurrent paths to the same data. In other words, a host can issue a read command to either of the controllers – perhaps choosing whichever controller it thinks is least busy.

    If the controller pair supports this mode of operation, then every read will have to be “checked” against the other controller to see if a more recent version is in its cache and perhaps hasn’t made it to the other controller yet. As you might imagine, supporting this mode can be extremely difficult, and I don’t believe it’s very common in “normal” storage. I say “normal” because super-high-end, crazy-ass storage on mainframes may support this read-snooping, but that’s out of my league.

    TT

  7. Neelesh Says:

    Do IBM and Dell support 512MB battery-backed write cache for RAID 5 or higher? I need to find out whether they support it, and if possible some references to websites or URLs where I can locate the info.

  8. SJS Says:

    Why not implement this storage on something like Compact Flash that will survive a power outage?
