Is it worth Redis paying 2x the memory for persistence?

Below is an article written by the Redis author explaining why Redis does not use a compaction approach to merge AOF files.

 

Source: A few key problems in Redis persistence, Saturday, 02 October 10

 

Recommended reading first: http://redis.io/topics/persistence

 

Redis: the strength is the data model, and the deficiency is the persistence.

 

We want two things that are hard to play well together:

  • Programmers want complex data types, the ability to use 60 years of computer science not just in the language, but in the database too. I think that even if Redis becomes obsolete in six months and I quit programming to do molecular cuisine, one point will have been made: plain key-value is not a cool world to program inside. Having real data structures and atomic operations is a completely different level of abstraction.
  • We want good persistence.

Why does snapshotting use the COW (copy-on-write, via fork) approach, instead of the approach below (sketched in code after the list)?

 

  • Iterate the keys, and write every key to disk.
  • But since you are doing this in a non-blocking way, that is, the server is also accepting queries, you also need...
  • To track every key change. That is, if we are snapshotting and there is a write against a given key that has not yet been transferred to disk, we need to make a copy of the old value (or remember that the key did not exist at all), and use this old copy when we write this key to disk.
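
To make that bookkeeping concrete, here is a minimal sketch in Python (hypothetical names, not Redis code) of snapshotting without fork: the write path preserves a pre-snapshot copy of any key that changes before the snapshotter has written it out.

```python
# Hypothetical sketch of snapshotting without fork/COW: iterate keys
# while writes keep arriving, copying the old value of any key that is
# overwritten before it has been transferred to disk.
class TrackingSnapshotter:
    def __init__(self, db):
        self.db = db                # the live key space (a dict)
        self.pending = set(db)      # keys not yet written to disk
        self.old_values = {}        # pre-snapshot copies of changed keys

    def on_write(self, key, new_value):
        # Must be hooked into every write while the snapshot runs.
        if key in self.pending and key not in self.old_values:
            self.old_values[key] = self.db.get(key)  # value as of snapshot start
        self.db[key] = new_value

    def snapshot(self, out):
        for key in list(self.pending):
            # Prefer the preserved old value over the current one.
            value = self.old_values.pop(key, self.db.get(key))
            if value is not None:
                out.write(f"SET {key} {value}\n")
            self.pending.discard(key)
```

This per-write bookkeeping is exactly what Redis avoids by calling fork() and letting the kernel's copy-on-write pages preserve the old values implicitly.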

 

To address the data-loss window of snapshotting, the other option is the AOF (append-only file), but using it has a cost:

 

  • The AOF file size is proportional to the number of writes. It will get bigger and bigger.
  • The bigger the AOF file is, the longer the server will take to restart.
  • So we need a way to compact the AOF from time to time, again in a non-blocking way, while the server is running and receiving read and write queries.

So how does Redis solve the problem of the AOF getting bigger and bigger? Redis again uses COW.

 

What Redis does in order to compact the AOF is rewrite it from scratch. This means doing something very similar to writing the point-in-time snapshot, but in a format that happens to be a valid sequence of Redis commands. So Redis does not read the old AOF to rebuild it. Instead it reads what we have in memory in order to write a perfect (as small as possible) AOF from scratch. When the new AOF is in place, we perform an atomic rename syscall, swapping the old file with the new one.

This is done in the child process; again, it is basically exactly the same problem as the point-in-time snapshot, but with the additional problem (easy to fix) of accumulating the new queries while the AOF rewrite is in progress, so that before swapping the old file with the new one, we also append all the new operations accumulated in the meantime.
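
As a rough illustration, here is a minimal sketch of this fork-based rewrite flow, assuming a toy string-only data set held in a Python dict (names like rewrite_aof and rewrite_buffer are invented for the sketch, not Redis internals):

```python
import os

def rewrite_aof(db, aof_path, rewrite_buffer):
    """Rewrite the AOF from the in-memory dict `db` (a toy model)."""
    tmp_path = aof_path + ".rewrite"      # hypothetical temp file name
    pid = os.fork()
    if pid == 0:
        # Child: copy-on-write gives a frozen view of the parent's memory,
        # so we can serialize it as a minimal sequence of commands.
        with open(tmp_path, "w") as f:
            for key, value in db.items():
                f.write(f"SET {key} {value}\n")
        os._exit(0)
    # Parent: keeps serving clients; every write executed meanwhile must
    # also be appended (as a command line) to `rewrite_buffer`.
    os.waitpid(pid, 0)                    # real Redis polls instead of blocking
    with open(tmp_path, "a") as f:
        f.writelines(rewrite_buffer)      # replay writes made during the rewrite
    os.rename(tmp_path, aof_path)         # atomic swap of old and new AOF
```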

 

The worst case of this approach is 2x memory usage. So why not compact the AOF files instead?

 

Segment it into small pieces, in different physical files: AOF.1, AOF.2, AOF.3 ... Every time the AOF gets big enough, we open a new file and continue, letting a background process merge the old ones.

 

For plain key/value data where the only operations are SET and DEL, the following compaction (deduplication) would work (a sketch follows the list):

  • Read all the AOF file entries in sequence. Every time I encounter a SET, I write this entry into an on-disk index, like a B-tree or similar. What I need to do is map every key to the file and offset of the last SET I saw for this key.
  • If I encounter a DEL, I'll just remove the key from the temporary index I'm building.
  • At the end of this process I take my temporary index and write a new AOF key by key, using the offsets stored in the index.
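
Here is a minimal sketch of that compaction pass, assuming a toy line-based AOF of `SET key value` / `DEL key` entries, and using an in-memory dict where the article suggests an on-disk B-tree:

```python
def compact(segment_paths, out_path):
    index = {}  # key -> (path, offset) of the last SET seen for that key
    for path in segment_paths:
        with open(path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                op, _, rest = line.partition(b" ")
                key = rest.split(b" ", 1)[0].rstrip(b"\n")
                if op == b"SET":
                    index[key] = (path, offset)   # later SETs overwrite earlier
                elif op == b"DEL":
                    index.pop(key, None)          # drop the key entirely
    # Write a new AOF containing only the surviving SET entries.
    with open(out_path, "wb") as out:
        for path, offset in index.values():
            with open(path, "rb") as f:
                f.seek(offset)
                out.write(f.readline())

# Example: compact(["AOF.1", "AOF.2", "AOF.3"], "AOF.compacted")
```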

Cool, but it does not work for Redis

This does not work when you have complex operations against aggregate data types.

To start, in order to even parse the AOF with a command line tool, this tool needs to be Redis-complete. All the operations would have to be implemented, for instance intersections between sets; otherwise what do I do when I encounter a SUNIONSTORE operation?

Also, in Redis most operations are not idempotent. Think about LPUSH, for instance, the simplest of our list write operations. The only way to turn list operations into idempotent operations is to turn all of them into an MKLIST <key> ... all the elements ... operation. Not viable at all.
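
A tiny illustration of the non-idempotence problem: replaying the same LPUSH twice produces a different list than replaying it once, so a compactor cannot simply keep the latest entry per key.

```python
# Applying the same LPUSH twice yields a different list than applying it
# once, so deduplicating log entries per key would change the result.
log = [("LPUSH", "mylist", "a"), ("LPUSH", "mylist", "a")]
state = {}
for _op, key, val in log:
    state.setdefault(key, []).insert(0, val)   # LPUSH pushes at the head
print(state["mylist"])  # ['a', 'a'] -- a deduplicated replay would give ['a']
```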

At best, our command line tool will be able to exploit SET and DEL operations in order to reduce the size of the file, but it will simply lose against a constantly updated sorted set.

 

So the author believes the 2x memory cost is worth paying; at least for now, this is the best solution available.

 

Conclusions

To pay 2x of the memory in the worst case may not be so bad at this stage, because it is not trivial to find a really better solution with the Redis data model. Does this mean we'll never improve this part of Redis? Absolutely not! We'll try hard to make it better, for instance possibly using a binary AOF format that is faster to write and to load, and more compact. We may write a command line tool that processes the AOF in multiple passes, dealing with only a subset of the keys on each pass in order to use less memory (but we have inter-key operations on aggregate data types, so this may only work well if such operations are not used). For sure we'll keep investigating.

Possibly new ideas will emerge. For instance, I'm currently working on Redis Cluster. With the cluster it is possible to run many instances of Redis on the same computer in a transparent way. Every instance will save just its subset of the data, which can be much smaller than a single giant instance. Both snapshotting and AOF log rewriting will be simpler to perform.

Redis is young and there are open problems, but this is an interesting challenge I want to take ;)

 

