What are the underlying data structures used for Redis?

Translated from: What are the underlying data structures used for Redis?

I'm trying to answer two questions in a definitive list:

  1. What are the underlying data structures used for Redis?
  2. And what are the main advantages/disadvantages/use cases for each type?

So, I've read that Redis lists are actually implemented with linked lists. But for other types, I'm not able to dig up any information. Also, if someone were to stumble upon this question without a high-level summary of the pros and cons of modifying or accessing different data structures, they'd have a complete reference list of when best to use specific types.

Specifically, I'm looking to outline all types: string, list, set, zset and hash.

Oh, I've looked at these articles, among others, so far:


Post #1

Reference: https://stackoom.com/question/eNxu/Redis使用的基础数据结构是什么


Post #2

Most of the time, you don't need to understand the underlying data structures used by Redis. But a bit of knowledge helps you make CPU vs. memory trade-offs. It also helps you model your data efficiently.

Internally, Redis uses the following data structures:

  1. String
  2. Dictionary
  3. Doubly Linked List
  4. Skip List
  5. Zip List
  6. Int Sets
  7. Zip Maps (deprecated in favour of Zip Lists since Redis 2.6)

To find the encoding used by a particular key, use the command OBJECT ENCODING <key>.

1. Strings

In Redis, Strings are called Simple Dynamic Strings, or SDS. It's a smallish wrapper over a char * that stores the length of the string and the number of free bytes as a prefix.

Because the length of the string is stored, STRLEN is an O(1) operation. Also, because the length is known, Redis strings are binary safe: it is perfectly legal for a string to contain the null character.

Strings are the most versatile data structure available in Redis. A String is all of the following:

  1. A string of characters that can store text. See the SET and GET commands.
  2. A byte array that can store binary data.
  3. A long that can store numbers. See the INCR, DECR, INCRBY and DECRBY commands.
  4. An array (of chars, ints, longs or any other data type) that allows efficient random access. See the SETRANGE and GETRANGE commands.
  5. A bit array that allows you to set or get individual bits. See the SETBIT and GETBIT commands.
  6. A block of memory that you can use to build other data structures. This is used internally to build ziplists and intsets, which are compact, memory-efficient data structures for a small number of elements. More on this below.
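To make the "length plus free bytes as a prefix" idea concrete, here is a minimal Python sketch of an SDS-like buffer. The class name `SimpleDynamicString` and its methods are hypothetical; the real implementation is C code in sds.c. The sketch simply doubles the needed capacity when it grows (sds.c additionally caps preallocation at 1 MB). Because `len` is stored, length lookups never scan for a terminator, and embedded NUL bytes are fine.

```python
class SimpleDynamicString:
    """Toy model of an SDS header (len, free) followed by a byte buffer."""

    def __init__(self, data: bytes = b""):
        self.buf = bytearray(data)  # allocated storage
        self.len = len(data)        # bytes in use
        self.free = 0               # spare bytes after self.len

    def strlen(self) -> int:
        # O(1): read the stored length instead of scanning for a terminator
        return self.len

    def append(self, data: bytes) -> None:
        needed = self.len + len(data)
        if needed > self.len + self.free:
            # Grow with slack so later appends can reuse the spare space
            capacity = needed * 2
            self.buf.extend(b"\x00" * (capacity - len(self.buf)))
            self.free = capacity - self.len
        self.buf[self.len:needed] = data  # same-length slice: buffer size unchanged
        self.free -= len(data)
        self.len = needed

    def value(self) -> bytes:
        # Binary safe: NUL bytes inside the payload are kept as-is
        return bytes(self.buf[: self.len])
```

For example, appending `b"\x00bar"` to `b"foo"` yields a 7-byte value with the NUL byte intact, and a following short append fits in the spare space without reallocating.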

2. Dictionary

Redis uses a Dictionary for the following:

  1. To map a key to its associated value, where the value can be a string, hash, set, sorted set or list.
  2. To map a key to its expiry timestamp.
  3. To implement the Hash, Set and Sorted Set data types.
  4. To map Redis commands to the functions that handle those commands.
  5. To map a Redis key to a list of clients that are blocked on that key. See BLPOP.

Redis Dictionaries are implemented using Hash Tables. Instead of explaining the implementation, I will just explain the Redis-specific things:

  1. Dictionaries use a structure called dictType to extend the behaviour of a hash table. This structure has function pointers, so the following operations are extendable: a) hash function, b) key comparison, c) key destructor, and d) value destructor.
  2. Dictionaries use murmurhash2. (Previously they used the djb2 hash function, with seed=5381, but then the hash function was switched to murmur2. See this question for an explanation of the djb2 hash algorithm.) Later versions, from Redis 4.0 onwards, switched again to SipHash.
  3. Redis uses Incremental Hashing, also known as Incremental Resizing. The dictionary has two hash tables. Every time the dictionary is touched, one bucket is migrated from the first (smaller) hash table to the second. This way, Redis avoids a single expensive resize operation that would block the server.
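Incremental rehashing can be sketched in Python as follows. This is a minimal model, not dict.c: the class `IncrementalDict`, the growth trigger (more keys than buckets) and the use of Python's built-in `hash()` are assumptions for illustration. It keeps two bucket arrays and moves at most one non-empty bucket per `get`/`set`, so no single call pays for the whole resize.

```python
class IncrementalDict:
    def __init__(self, size=4):
        self.tables = [[[] for _ in range(size)], None]  # [current, rehash target]
        self.rehash_idx = -1  # next bucket of tables[0] to move; -1 = not rehashing
        self.count = 0

    def _rehash_step(self):
        """Move at most one non-empty bucket from tables[0] to tables[1]."""
        if self.rehash_idx == -1:
            return
        old, new = self.tables
        while self.rehash_idx < len(old) and not old[self.rehash_idx]:
            self.rehash_idx += 1  # skip empty buckets
        if self.rehash_idx < len(old):
            for key, value in old[self.rehash_idx]:
                new[hash(key) % len(new)].append((key, value))
            old[self.rehash_idx] = []
            self.rehash_idx += 1
        if self.rehash_idx >= len(old):
            self.tables = [new, None]  # finished: the bigger table takes over
            self.rehash_idx = -1

    def _find(self, key):
        # While rehashing, a key may live in either table
        for table in self.tables:
            if table is None:
                continue
            bucket = table[hash(key) % len(table)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    return bucket, i
        return None, -1

    def get(self, key, default=None):
        self._rehash_step()  # every access does a little migration work
        bucket, i = self._find(key)
        return default if bucket is None else bucket[i][1]

    def set(self, key, value):
        self._rehash_step()
        bucket, i = self._find(key)
        if bucket is not None:
            bucket[i] = (key, value)
            return
        rehashing = self.rehash_idx != -1
        target = self.tables[1] if rehashing else self.tables[0]  # new keys go to the new table
        target[hash(key) % len(target)].append((key, value))
        self.count += 1
        if not rehashing and self.count > len(self.tables[0]):
            # Allocate a table twice as big, but move nothing yet
            self.tables[1] = [[] for _ in range(2 * len(self.tables[0]))]
            self.rehash_idx = 0
```

Lookups check both tables while a rehash is in progress, which is the price paid for spreading the migration over many operations.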

The Set data structure uses a Dictionary to guarantee there are no duplicates. The Sorted Set uses a dictionary to map an element to its score, which is why ZSCORE is an O(1) operation.

3. Doubly Linked Lists

The List data type is implemented using Doubly Linked Lists. Redis' implementation is straight from the algorithm textbook. The only change is that Redis stores the length in the list data structure. This ensures that LLEN has O(1) complexity.
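A textbook doubly linked list with a stored length, as a small Python sketch. The class and method names are hypothetical (loosely echoing the list commands); keeping `length` up to date on every push and pop is what makes an LLEN-style query O(1) instead of a full traversal.

```python
class Node:
    __slots__ = ("value", "prev", "next")

    def __init__(self, value):
        self.value, self.prev, self.next = value, None, None


class LinkedList:
    def __init__(self):
        self.head = self.tail = None
        self.length = 0  # maintained on every change, so llen() is O(1)

    def lpush(self, value):
        node = Node(value)
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        else:
            self.tail = node
        self.head = node
        self.length += 1

    def rpush(self, value):
        node = Node(value)
        node.prev = self.tail
        if self.tail is not None:
            self.tail.next = node
        else:
            self.head = node
        self.tail = node
        self.length += 1

    def rpop(self):
        if self.tail is None:
            return None
        node = self.tail
        self.tail = node.prev
        if self.tail is not None:
            self.tail.next = None
        else:
            self.head = None
        self.length -= 1
        return node.value

    def llen(self):
        return self.length

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```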

4. Skip Lists

Redis uses Skip Lists as the underlying data structure for Sorted Sets. Wikipedia has a good introduction. William Pugh's paper Skip Lists: A Probabilistic Alternative to Balanced Trees has more details.

Sorted Sets use both a Skip List and a Dictionary. The dictionary stores the score of each element.

Redis' Skip List implementation differs from the standard implementation in the following ways:

  1. Redis allows duplicate scores. If two nodes have the same score, they are sorted by the lexicographical order of their elements.
  2. Each node has a back pointer at level 0. This allows you to traverse elements in reverse order of score.
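The two Redis-specific tweaks can be sketched in Python: nodes are ordered by `(score, element)`, so equal scores fall back to lexicographic order, and each node keeps a `backward` pointer at level 0 for reverse traversal. This is a minimal sketch assuming `MAX_LEVEL = 16` and a promotion probability of 0.25; it omits deletion and the span counters Redis uses for rank queries, and it assumes the caller (the dictionary side) has already rejected duplicate elements.

```python
import random

MAX_LEVEL = 16  # assumed cap on node height for this sketch
P = 0.25        # chance of promoting a node to the next level


class SkipNode:
    def __init__(self, element, score, level):
        self.element = element
        self.score = score
        self.forward = [None] * level  # one "next" pointer per level
        self.backward = None           # level-0 back pointer (reverse traversal)


class SkipList:
    def __init__(self):
        self.header = SkipNode(None, None, MAX_LEVEL)
        self.tail = None
        self.level = 1

    @staticmethod
    def _random_level():
        level = 1
        while random.random() < P and level < MAX_LEVEL:
            level += 1
        return level

    @staticmethod
    def _before(node, score, element):
        # Order by score first; equal scores fall back to the element itself
        return (node.score, node.element) < (score, element)

    def insert(self, element, score):
        update = [self.header] * MAX_LEVEL
        x = self.header
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and self._before(x.forward[i], score, element):
                x = x.forward[i]
            update[i] = x  # last node before the insertion point on level i
        level = self._random_level()
        self.level = max(self.level, level)  # new levels start from the header
        node = SkipNode(element, score, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        node.backward = None if update[0] is self.header else update[0]
        if node.forward[0] is not None:
            node.forward[0].backward = node
        else:
            self.tail = node

    def items(self):
        x = self.header.forward[0]
        while x is not None:
            yield x.element, x.score
            x = x.forward[0]

    def reversed_items(self):
        x = self.tail
        while x is not None:
            yield x.element, x.score
            x = x.backward
```

Inserting `("c", 2.0)`, `("a", 2.0)`, `("b", 1.0)` and `("d", 3.0)` iterates as b, a, c, d, and walking the back pointers from the tail gives d, c, a, b.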

5. Zip List

A Zip List is like a doubly linked list, except it does not use pointers and stores the data inline.

Each node in a doubly linked list has 3 pointers: one forward pointer, one backward pointer and one pointer to the data stored at that node. Pointers require memory (8 bytes each on a 64-bit system), so for small lists a doubly linked list is very inefficient.

A Zip List stores elements sequentially in a Redis String. Each element has a small header that stores the length of the previous element, plus the length and data type of the element itself. These lengths replace the backward and forward pointers: stepping back by the previous element's length reaches the previous element, and skipping over the current element reaches the next one. Since the data is stored inline, we don't need a data pointer.

The Zip List is used to store small lists, sorted sets and hashes. Sorted sets are flattened into a list like [element1, score1, element2, score2, element3, score3] and stored in the Zip List. Hashes are flattened into a list like [key1, value1, key2, value2] etc.
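A minimal Python sketch of the idea: entries are packed back to back in one byte buffer, each preceded by the previous entry's length (for walking backwards) and its own payload length (for walking forwards). A hash is stored flattened as [field1, value1, field2, value2, ...], and lookup is a linear scan. The fixed 4-byte header fields and the function names are assumptions of this sketch; the real ziplist.c uses variable-length header fields, can encode small integers specially, and keeps the tail offset in its own header.

```python
import struct

# Simplified entry header: (length of previous entry, length of this payload)
HEADER = struct.Struct("<II")


def ziplist_encode(items):
    """Pack byte strings back to back; return (buffer, offset of last entry)."""
    buf = bytearray()
    prev_len = 0  # 0 marks the first entry
    tail = 0
    for item in items:
        tail = len(buf)
        buf += HEADER.pack(prev_len, len(item)) + item
        prev_len = HEADER.size + len(item)
    return bytes(buf), tail


def ziplist_entries(buf):
    """Walk forwards: skip this entry's header and payload to reach the next one."""
    offset = 0
    while offset < len(buf):
        _, size = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        yield buf[start:start + size]
        offset = start + size


def ziplist_reverse(buf, tail):
    """Walk backwards from the last entry using each stored previous length."""
    if not buf:
        return
    offset = tail
    while True:
        prev_len, size = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        yield buf[start:start + size]
        if prev_len == 0:
            return
        offset -= prev_len


def hash_get(buf, field):
    """O(n) lookup in a hash flattened as [field1, value1, field2, value2, ...]."""
    entries = ziplist_entries(buf)
    for name in entries:
        value = next(entries)
        if name == field:
            return value
    return None
```

The whole structure is one contiguous allocation, which is why inserting into the middle means reallocating and shifting bytes.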

With Zip Lists you can make a trade-off between CPU and memory. Zip Lists are memory-efficient, but they use more CPU than a linked list (or Hash Table/Skip List). Finding an element in a Zip List is O(n). Inserting a new element requires reallocating memory. Because of this, Redis uses this encoding only for small lists, hashes and sorted sets. You can tweak this behaviour by altering the values of <datatype>-max-ziplist-entries and <datatype>-max-ziplist-value in redis.conf. See Redis Memory Optimization, section "Special encoding of small aggregate data types", for more information.

The comments in ziplist.c are excellent, and you can understand this data structure completely without having to read the code.

6. Int Sets

Int Sets are a fancy name for "Sorted Integer Arrays".

In Redis, sets are usually implemented using hash tables. For small sets, a hash table is memory-inefficient. When the set is composed of integers only, an array is often more efficient.

An Int Set is a sorted array of integers. To find an element, a binary search algorithm is used. This has a complexity of O(log N). Adding new integers to this array may require a memory reallocation, which can become expensive for large integer arrays.

As a further memory optimization, Int Sets come in 3 variants with different integer sizes: 16 bits, 32 bits and 64 bits. Redis is smart enough to use the right variant depending on the size of the elements. When a new element that does not fit the current size is added, Redis automatically migrates the whole array to the next size. If a string is added, Redis automatically converts the Int Set to a regular hash-table-based set.
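A minimal Python sketch of an intset: a sorted list searched with binary search (`bisect`), plus an encoding width of 2, 4 or 8 bytes that is upgraded for the whole array when a new value no longer fits. Returning `False` for a non-integer member, as a hypothetical signal that the caller should convert to a hash-table set, is an assumption of this sketch; Redis performs that conversion itself.

```python
from bisect import bisect_left

WIDTHS = (2, 4, 8)  # bytes per element: int16, int32, int64


def width_for(value):
    """Smallest encoding width that can hold value as a signed integer."""
    for width in WIDTHS:
        bits = width * 8
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return width
    raise OverflowError("value does not fit in 64 bits")


class IntSet:
    def __init__(self):
        self.encoding = 2  # every element uses the same width
        self.values = []   # kept sorted

    def add(self, member):
        if not isinstance(member, int):
            return False  # hypothetical signal: convert to a hash-table set
        # Upgrade (never downgrade) the width of the whole array if needed
        self.encoding = max(self.encoding, width_for(member))
        i = bisect_left(self.values, member)
        if i == len(self.values) or self.values[i] != member:
            self.values.insert(i, member)  # shifting elements is O(N)
        return True

    def contains(self, member):
        i = bisect_left(self.values, member)  # binary search: O(log N)
        return i < len(self.values) and self.values[i] == member

    def nbytes(self):
        return self.encoding * len(self.values)
```

Adding 70000 to a set of small integers upgrades every element to 4 bytes, which is the "migrate to the next size" step described above.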

Int Sets are a trade-off between CPU and memory. Int Sets are extremely memory-efficient, and for small sets they are faster than a hash table. But beyond a certain number of elements, the O(log N) retrieval time and the cost of reallocating memory become too high. Based on experiments, the optimal threshold to switch over to a regular hash table was found to be 512. However, you can increase this threshold (decreasing it doesn't make sense) based on your application's needs. See set-max-intset-entries in redis.conf.

7. Zip Maps

Zip Maps are dictionaries flattened and stored in a list. They are very similar to Zip Lists.

Zip Maps have been deprecated since Redis 2.6, and small hashes are stored in Zip Lists instead. To learn more about this encoding, refer to the comments in zipmap.c.


Post #3

Redis stores keys pointing to values. Keys can be any binary value up to a reasonable size (short ASCII strings are recommended for readability and debugging). Values are one of five native Redis data types.

1. strings: a sequence of binary-safe bytes up to 512 MB

2. hashes: a collection of key-value pairs

3. lists: an in-insertion-order collection of strings

4. sets: a collection of unique strings with no ordering

5. sorted sets: a collection of unique strings ordered by user-defined scoring

Strings

A Redis string is a sequence of bytes.

Strings in Redis are binary safe (meaning they have a known length not determined by any special terminating characters), so you can store anything up to 512 megabytes in one string.

Strings are the canonical "key-value store" concept. You have a key pointing to a value, where both key and value are text or binary strings.

For all possible operations on strings, see http://redis.io/commands/#string

Hashes

A Redis hash is a collection of key-value pairs.

A Redis hash holds many key-value pairs, where each key and value is a string. Redis hashes do not support complex values directly (meaning you can't have a hash field whose value is a list, a set or another hash), but you can use hash fields to point to other top-level complex values. The only special operation you can perform on hash field values is atomic increment/decrement of numeric contents.

You can think of Redis hashes in two ways: as a direct object representation and as a way to store many small values compactly.

Direct object representations are simple to understand. Objects have a name (the key of the hash) and a collection of internal keys with values. See the example below for, well, an example.

Storing many small values using a hash is a clever Redis massive-data storage technique. When a hash has a small number of fields (~100), Redis optimizes the storage and access efficiency of the entire hash. Redis's small hash storage optimization leads to an interesting behavior: it's more efficient to have 100 hashes each with 100 internal keys and values than to have 10,000 top-level keys pointing to string values. Using Redis hashes to optimize your data storage this way does require additional programming overhead for tracking where data ends up, but if your data storage is primarily string based, you can save a lot of memory overhead using this one weird trick.
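The bookkeeping this trick needs can be as simple as splitting a numeric ID into a hash key and a field. The sketch below is one possible scheme, assuming numeric IDs and buckets of 100 fields; the `object:` prefix and the function name `bucket_for` are hypothetical. The resulting pair would then be used with HSET/HGET instead of SET/GET on a top-level key.

```python
def bucket_for(object_id: int, bucket_size: int = 100) -> tuple:
    """Map one logical key to (hash key, field) so ~bucket_size values share a hash."""
    bucket, field = divmod(object_id, bucket_size)
    return f"object:{bucket}", str(field)
```

For example, `bucket_for(1234567)` gives `("object:12345", "67")`, i.e. roughly `HSET object:12345 67 <value>`, and IDs 0 to 9999 land in exactly 100 hashes.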

For all possible operations on hashes, see the hash docs.

Lists

Redis lists act like linked lists.

You can insert to, delete from, and traverse lists from either the head or tail of a list.

Use lists when you need to maintain values in the order they were inserted. (Redis does give you the option to insert into any arbitrary list position if you need to, but your insertion performance will degrade if you insert far from your start position.)

Redis lists are often used as producer/consumer queues. Insert items into a list, then pop items from the list. What happens if your consumers try to pop from a list with no elements? You can ask Redis to wait for an element to appear (BLPOP/BRPOP) and return it to you immediately when it gets added. This turns Redis into a real-time message queue/event/job/task/notification system.

You can atomically remove elements off either end of a list, enabling any list to be treated as a stack or a queue.

You can also maintain fixed-length lists (capped collections) by trimming your list to a specific size after every insertion.
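Trimming after every insert is usually an LPUSH followed by LTRIM key 0 N-1. The Python model below only mimics those two commands' documented semantics (LPUSH pushing each value to the head in turn; LTRIM's inclusive start/stop and negative indices) on an in-memory list. It is not a Redis client, and the names `ListModel` and `push_capped` are hypothetical.

```python
class ListModel:
    def __init__(self):
        self.items = []

    def lpush(self, *values):
        for value in values:
            self.items.insert(0, value)  # each value goes to the head in turn
        return len(self.items)

    def ltrim(self, start, stop):
        n = len(self.items)
        # Negative indices count from the tail; both ends are inclusive
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        if start > stop or start >= n:
            self.items = []
        else:
            self.items = self.items[start:min(stop, n - 1) + 1]


def push_capped(lst, value, cap):
    lst.lpush(value)
    lst.ltrim(0, cap - 1)  # keep only the newest `cap` items
```

Pushing 0 through 4 with `cap=3` leaves `[4, 3, 2]`: the list never holds more than three items, and the oldest ones fall off the tail.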

For all possible operations on lists, see the lists docs.

Sets

Redis sets are, well, sets.

A Redis set contains unique unordered Redis strings where each string only exists once per set. If you add the same element ten times to a set, it will only show up once. Sets are great for lazily ensuring something exists at least once without worrying about duplicate elements accumulating and wasting space. You can add the same string as many times as you like without needing to check if it already exists.

Sets are fast for membership checking, insertion, and deletion of members.

Sets have efficient set operations, as you would expect. You can take the union, intersection, and difference of multiple sets at once. Results can either be returned to the caller or stored in a new set for later usage.

Sets have constant-time access for membership checks (unlike lists), and Redis even has convenient random member removal and returning ("pop a random element from the set"), random member returning without replacement ("give me 30 random-ish unique users"), and random member returning with replacement ("give me 7 cards, but after each selection, put the card back so it can potentially be sampled again").
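These three behaviours correspond to SPOP, SRANDMEMBER with a positive count (distinct members, at most the size of the set) and SRANDMEMBER with a negative count (exactly |count| members, repeats allowed). Below is a Python model of those documented semantics built on the standard `random` module; the function names mirror the commands, but this is not how Redis samples internally.

```python
import random


def spop(members):
    """Remove and return one random member (None if the set is empty)."""
    if not members:
        return None
    member = random.choice(tuple(members))
    members.remove(member)
    return member


def srandmember(members, count):
    pool = list(members)
    if count >= 0:
        # Without replacement: distinct members, capped at the set size
        return random.sample(pool, min(count, len(pool)))
    # With replacement: exactly -count picks, duplicates possible
    return random.choices(pool, k=-count) if pool else []
```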

For all possible operations on sets, see the sets docs.

Sorted Sets

Redis sorted sets are sets with a user-defined ordering.

For simplicity, you can think of a sorted set as a binary tree with unique elements. (Redis sorted sets are actually skip lists.) The sort order of elements is defined by each element's score.

Sorted sets are still sets. Elements may only appear once in a set. An element, for uniqueness purposes, is defined by its string contents. Inserting element "apple" with sorting score 3, then inserting element "apple" with sorting score 500 results in one element "apple" with sorting score 500 in your sorted set. Sets are unique based only on data, not on (score, data) pairs.

Make sure your data model relies on the string contents, not the element's score, for uniqueness. Scores are allowed to be repeated (or even zero), but, one last time, set elements can only exist once per sorted set. For example, if you try to store the history of every user login as a sorted set by making the score the epoch of the login and the value the user id, you will end up storing only the last login epoch for each of your users. Your set would grow to the size of your user base, not your desired size of user base * logins.
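The pitfall is easy to reproduce with a dictionary-based model of ZADD, where the member is the key and the score is the value (a model, not a Redis client). Making each member unique, here by appending the login time to the user ID, is one possible workaround shown as an assumption for illustration; the member format is hypothetical.

```python
def zadd(zset, score, member):
    """Model of ZADD: the member is the identity, the score is just an attribute."""
    zset[member] = score  # re-adding an existing member only updates its score


fruit = {}
zadd(fruit, 3, "apple")
zadd(fruit, 500, "apple")  # still one "apple", now with score 500

logins = [(1700000000, "user:1"), (1700000500, "user:2"), (1700009000, "user:1")]

by_user = {}
for ts, user in logins:
    zadd(by_user, ts, user)  # member = user id: earlier logins get overwritten

per_login = {}
for ts, user in logins:
    zadd(per_login, ts, f"{user}:{ts}")  # hypothetical unique member per login
```

`by_user` ends up with two entries (one per user, holding only the last login), while `per_login` keeps all three logins.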

Elements are added to your set with scores. You can update the score of any element at any time: just add the element again with a new score. Scores are represented as floating-point doubles, so you can specify the granularity of high-precision timestamps if needed. Multiple elements may have the same score.

You can retrieve elements in a few different ways. Since everything is sorted, you can ask for elements starting at the lowest scores. You can ask for elements starting at the highest scores ("in reverse"). You can also ask for elements by a range of their sort scores, in either natural or reverse order.

For all possible operations on sorted sets, see the sorted sets docs.

Post #4

I'll try to answer your question, but I'll start with something that may look strange at first: if you are not interested in Redis internals, you should not care about how data types are implemented internally. This is for a simple reason: for every Redis operation you'll find the time complexity in the documentation, and if you have the set of operations and the time complexity, the only other thing you need is some clue about memory usage (and because we do many optimizations that may vary depending on the data, the best way to get these latter figures is doing a few trivial real-world tests).

But since you asked, here is the underlying implementation of every Redis data type.

  • Strings are implemented using a C dynamic string library so that we don't pay (asymptotically speaking) for allocations in append operations. This way, for instance, N appends cost O(N) in total instead of having quadratic behavior.
  • Lists are implemented with linked lists.
  • Sets and Hashes are implemented with hash tables.
  • Sorted sets are implemented with skip lists (a peculiar type of balanced trees).
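Why preallocation turns repeated appends from quadratic into linear total work can be checked by counting the bytes copied on each reallocation. The sketch assumes the growth rule in sds.c's sdsMakeRoomFor (double the needed size below 1 MB, otherwise add 1 MB) and compares it with growing to the exact size on every append. Both policies are simplified models, and `bytes_copied` is a hypothetical helper.

```python
MAX_PREALLOC = 1024 * 1024  # 1 MB


def bytes_copied(appends, chunk, preallocate):
    """Total bytes moved by reallocations over `appends` appends of `chunk` bytes."""
    length = capacity = copied = 0
    for _ in range(appends):
        needed = length + chunk
        if needed > capacity:
            if preallocate:
                # Leave slack so the next appends fit without reallocating
                capacity = needed * 2 if needed < MAX_PREALLOC else needed + MAX_PREALLOC
            else:
                capacity = needed  # exact fit: every append reallocates
            copied += length  # existing bytes move to the new buffer
        length = needed
    return copied
```

For 10,000 appends of 10 bytes, exact-fit growth copies 499,950,000 bytes in total (quadratic), while the doubling policy copies less than twice the final 100,000-byte length.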

But when lists, sets, and sorted sets are small in number of items and size of the largest values, a different, much more compact encoding is used. This encoding differs for different types, but has the feature that it is a compact blob of data that often forces an O(N) scan for every operation. Since we use this format only for small objects, this is not an issue; scanning a small O(N) blob is cache-friendly, so practically speaking it is very fast, and when there are too many elements the encoding is automatically switched to the native encoding (linked list, hash, and so forth).

But your question was not really just about internals; your point was: what type to use to accomplish what?

Strings

This is the base type of all the types. It's one of the five types but is also the base type of the complex types, because a List is a list of strings, a Set is a set of strings, and so forth.

A Redis string is a good idea in all the obvious scenarios where you want to store an HTML page, but also when you want to avoid converting your already encoded data. So, for instance, if you have JSON or MessagePack you may just store objects as strings. In Redis 2.6 you can even manipulate this kind of object server side using Lua scripts.

Another interesting usage of strings is bitmaps, and in general random-access arrays of bytes, since Redis exports commands to access random ranges of bytes, or even single bits. For instance, check this good blog post: Fast Easy real time metrics using Redis.
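The bitmap idea can be modeled in Python on a plain `bytearray`, for example with one bit per user ID to track daily active users. As with SETBIT, bit offset 0 is the most significant bit of the first byte, and the string grows with zero bytes as needed. The function names mirror SETBIT, GETBIT and BITCOUNT, but this is a local model, not a client library.

```python
def setbit(buf, offset, value):
    """Set bit `offset` to value (0/1), growing the buffer; return the old bit."""
    byte, bit = divmod(offset, 8)
    if byte >= len(buf):
        buf.extend(b"\x00" * (byte + 1 - len(buf)))
    mask = 0x80 >> bit  # offset 0 is the most significant bit
    old = 1 if buf[byte] & mask else 0
    if value:
        buf[byte] |= mask
    else:
        buf[byte] &= ~mask & 0xFF
    return old


def getbit(buf, offset):
    byte, bit = divmod(offset, 8)
    if byte >= len(buf):
        return 0  # bits past the end read as 0
    return 1 if buf[byte] & (0x80 >> bit) else 0


def bitcount(buf):
    return sum(bin(b).count("1") for b in buf)
```

Marking users 0, 7 and 9 as active produces the two bytes `\x81\x40`, and counting set bits gives the number of active users.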

Lists

Lists are good when you are likely to touch only the extremes of the list: near the tail, or near the head. Lists are not very good for paginating stuff, because random access is slow, O(N). So good uses of lists are plain queues and stacks, or processing items in a loop using RPOPLPUSH with the same source and destination to "rotate" a ring of items.
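With the same key as source and destination, RPOPLPUSH moves the tail element to the head and returns it, so repeated calls cycle through every item. Below is a Python model using `collections.deque`; `rpoplpush` here is a hypothetical stand-in for the command, not a client call.

```python
from collections import deque


def rpoplpush(source, destination):
    """Pop from the tail of source and push onto the head of destination."""
    if not source:
        return None
    item = source.pop()
    destination.appendleft(item)
    return item


ring = deque(["job-a", "job-b", "job-c"])
processed = [rpoplpush(ring, ring) for _ in range(4)]  # visits c, b, a, then c again
```

Because the item stays in the list, the ring keeps the same contents and only its rotation changes.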

Lists are also good when we just want to create a capped collection of N items where we usually access just the top or bottom items, or when N is small.

Sets

Sets are an unordered data collection, so they are good every time you have a collection of items and it is very important to check for existence or size of the collection in a very fast way. Another cool thing about sets is support for peeking or popping random elements (the SRANDMEMBER and SPOP commands).

Sets are also good to represent relations, e.g., "What are the friends of user X?" and so forth. But other good data structures for this kind of stuff are sorted sets, as we'll see.

Sets support complex operations like intersections, unions, and so forth, so this is a good data structure for using Redis in a "computational" manner, when you have data and you want to perform transformations on that data to obtain some output.

Small sets are encoded in a very efficient way.

Hashes

Hashes are the perfect data structure to represent objects, composed of fields and values. Fields of hashes can also be atomically incremented using HINCRBY. When you have objects such as users, blog posts, or some other kind of item, hashes are likely the way to go if you don't want to use your own encoding like JSON or similar.

However, keep in mind that small hashes are encoded very efficiently by Redis, and you can ask Redis to atomically GET, SET or increment individual fields in a very fast fashion.

Hashes can also be used to represent linked data structures, using references. For instance, check the lamernews.com implementation of comments.

Sorted Sets

Sorted sets are the only other data structures, besides lists, to maintain ordered elements. You can do a number of cool things with sorted sets. For instance, you can have all kinds of Top Something lists in your web application: top users by score, top posts by pageviews, top whatever, and a single Redis instance will support tons of insertion and get-top-elements operations per second.

Sorted sets, like regular sets, can be used to describe relations, but they also allow you to paginate the list of items and to remember the ordering. For instance, if I remember friends of user X with a sorted set, I can easily remember them in order of accepted friendship.

Sorted sets are good for priority queues.

Sorted sets are like more powerful lists where inserting, removing, or getting ranges from the middle of the list is always fast. But they use more memory, and are O(log(N)) data structures.

Conclusion

I hope that I provided some info in this post, but it is far better to download the source code of lamernews from http://github.com/antirez/lamernews and understand how it works. Many data structures from Redis are used inside Lamer News, and there are many clues about what to use to solve a given task.

Sorry for grammar typos, it's midnight here and I'm too tired to review the post ;)
