Is it worth Redis paying 2x the memory for persistence?

Below is an article written by the Redis author explaining why Redis does not use a compaction approach to merge AOF files.

 

Source: A few key problems in Redis persistence, Saturday, 02 October 10

 

Recommended reading first: http://redis.io/topics/persistence

 

Redis: the strength is the data model, and the deficiency is the persistence.

 

We want two things that are hard to play well together:

  • Programmers want complex data types, the ability to use 60 years of computer science not just in the language, but in the database too. I think that even if Redis becomes obsolete in six months and I quit programming to do molecular cuisine, one point will have been made: plain key-value is not a cool world to program inside. Having real data structures and atomic operations is a completely different level of abstraction.
  • We want good persistence.

Why does snapshotting use the COW (copy-on-write, via fork) approach, instead of the approach below (sketched in code after the list)?

 

  • Iterate the keys, and write every key to disk.
  • But since you are doing this in a non-blocking way, that is, the server is also accepting queries, you also need...
  • To track every key change. That is, if we are snapshotting and there is a write against a given key that has not yet been transferred to disk, we need to make a copy of the old value (or remember that the key did not exist at all), and use this old copy when we write this key to disk.
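
To make that bookkeeping concrete, here is a minimal sketch in Python (hypothetical names, not Redis code) of snapshotting without fork: the write path preserves a pre-snapshot copy of any key that changes before the snapshotter has written it out.

```python
# Hypothetical sketch of snapshotting without fork/COW: iterate keys
# while writes keep arriving, copying the old value of any key that is
# overwritten before it has been transferred to disk.
class TrackingSnapshotter:
    def __init__(self, db):
        self.db = db                # the live key space (a dict)
        self.pending = set(db)      # keys not yet written to disk
        self.old_values = {}        # pre-snapshot copies of changed keys

    def on_write(self, key, new_value):
        # Must be hooked into every write while the snapshot runs.
        if key in self.pending and key not in self.old_values:
            self.old_values[key] = self.db.get(key)  # value as of snapshot start
        self.db[key] = new_value

    def snapshot(self, out):
        for key in list(self.pending):
            # Prefer the preserved old value over the current one.
            value = self.old_values.pop(key, self.db.get(key))
            if value is not None:
                out.write(f"SET {key} {value}\n")
            self.pending.discard(key)
```

This per-write bookkeeping is exactly what Redis avoids by calling fork() and letting the kernel's copy-on-write pages preserve the old values implicitly.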

 

To address the data-loss window of snapshotting, the other option is the AOF (append-only file), but using it has a cost:

 

  • The AOF file size is proportional to the number of writes. It will get bigger and bigger.
  • The bigger the AOF file is, the longer the server will take to restart.
  • So we need a way to compact the AOF from time to time, again in a non-blocking way, while the server is running and receiving read and write queries.

So how does Redis solve the problem of the AOF getting bigger and bigger? Redis again uses COW.

 

What Redis does in order to compact the AOF is rewrite it from scratch. This means doing something very similar to writing the point-in-time snapshot, but in a format that happens to be a valid sequence of Redis commands. So Redis does not read the old AOF to rebuild it. Instead it reads what we have in memory in order to write a perfect (as small as possible) AOF from scratch. When the new AOF is in place, we perform an atomic rename syscall, swapping the old file with the new one.

This is done in the child process; again, it is basically exactly the same problem as the point-in-time snapshot, but with the additional problem (easy to fix) of accumulating the new queries while the AOF rewrite is in progress, so that before swapping the old file with the new one, we also append all the new operations accumulated in the meantime.
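
As a rough illustration, here is a minimal sketch of this fork-based rewrite flow, assuming a toy string-only data set held in a Python dict (names like rewrite_aof and rewrite_buffer are invented for the sketch, not Redis internals):

```python
import os

def rewrite_aof(db, aof_path, rewrite_buffer):
    """Rewrite the AOF from the in-memory dict `db` (a toy model)."""
    tmp_path = aof_path + ".rewrite"      # hypothetical temp file name
    pid = os.fork()
    if pid == 0:
        # Child: copy-on-write gives a frozen view of the parent's memory,
        # so we can serialize it as a minimal sequence of commands.
        with open(tmp_path, "w") as f:
            for key, value in db.items():
                f.write(f"SET {key} {value}\n")
        os._exit(0)
    # Parent: keeps serving clients; every write executed meanwhile must
    # also be appended (as a command line) to `rewrite_buffer`.
    os.waitpid(pid, 0)                    # real Redis polls instead of blocking
    with open(tmp_path, "a") as f:
        f.writelines(rewrite_buffer)      # replay writes made during the rewrite
    os.rename(tmp_path, aof_path)         # atomic swap of old and new AOF
```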

 

The worst case of this approach is 2x memory usage. So why not compact the AOF files instead?

 

Segment it into small pieces, in different physical files: AOF.1, AOF.2, AOF.3 ... Every time the AOF gets big enough, we open a new file and continue, letting a background process merge the old ones.

 

For plain key/value data where the only operations are SET and DEL, the following compaction (deduplication) would work (a sketch follows the list):

  • Read all the AOF file entries in sequence. Every time I encounter a SET, I write this entry into an on-disk index, like a B-tree or similar. What I need to do is map every key to the file and offset of the last SET I saw for this key.
  • If I encounter a DEL, I'll just remove the key from the temporary index I'm building.
  • At the end of this process I take my temporary index and write a new AOF key by key, using the offsets stored in the index.
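
Here is a minimal sketch of that compaction pass, assuming a toy line-based AOF of `SET key value` / `DEL key` entries, and using an in-memory dict where the article suggests an on-disk B-tree:

```python
def compact(segment_paths, out_path):
    index = {}  # key -> (path, offset) of the last SET seen for that key
    for path in segment_paths:
        with open(path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                op, _, rest = line.partition(b" ")
                key = rest.split(b" ", 1)[0].rstrip(b"\n")
                if op == b"SET":
                    index[key] = (path, offset)   # later SETs overwrite earlier
                elif op == b"DEL":
                    index.pop(key, None)          # drop the key entirely
    # Write a new AOF containing only the surviving SET entries.
    with open(out_path, "wb") as out:
        for path, offset in index.values():
            with open(path, "rb") as f:
                f.seek(offset)
                out.write(f.readline())

# Example: compact(["AOF.1", "AOF.2", "AOF.3"], "AOF.compacted")
```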

Cool, but it does not work for Redis

This does not work when you have complex operations against aggregate data types.

To start, in order to even parse the AOF with a command line tool, this tool needs to be Redis-complete. All the operations would have to be implemented, for instance intersections between sets; otherwise what do I do when I encounter a SUNIONSTORE operation?

Also, in Redis most operations are not idempotent. Think about LPUSH, for instance, the simplest of our list write operations. The only way to turn list operations into idempotent operations is to turn all of them into an MKLIST <key> ... all the elements ... operation. Not viable at all.
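
A tiny illustration of the non-idempotence problem: replaying the same LPUSH twice produces a different list than replaying it once, so a compactor cannot simply keep the latest entry per key.

```python
# Applying the same LPUSH twice yields a different list than applying it
# once, so deduplicating log entries per key would change the result.
log = [("LPUSH", "mylist", "a"), ("LPUSH", "mylist", "a")]
state = {}
for _op, key, val in log:
    state.setdefault(key, []).insert(0, val)   # LPUSH pushes at the head
print(state["mylist"])  # ['a', 'a'] -- a deduplicated replay would give ['a']
```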

At best, our command line tool will be able to exploit SET and DEL operations in order to reduce the size of the file, but it will simply lose against a constantly updated sorted set.

 

So the author believes the 2x memory cost is worth paying; at least for now, this is the best solution available.

 

Conclusions

To pay 2x of the memory in the worst case may not be so bad at this stage, because it is not trivial to find a really better solution with the Redis data model. Does this mean we'll never improve this part of Redis? Absolutely not! We'll try hard to make it better, for instance possibly using a binary AOF format that is faster to write and to load, and more compact. We may write a command line tool that processes the AOF in multiple passes, dealing with only a subset of the keys on each pass in order to use less memory (but we have inter-key operations on aggregate data types, so this may only work well if such operations are not used). For sure we'll keep investigating.

Possibly new ideas will emerge. For instance, I'm currently working on Redis Cluster. With the cluster it is possible to run many instances of Redis on the same computer in a transparent way. Every instance will save just its subset of the data, which can be much smaller than a single giant instance. Both snapshotting and AOF log rewriting will be simpler to perform.

Redis is young and there are open problems, but this is an interesting challenge I want to take ;)

 

