Studying the Redis source alongside "Redis Design and Implementation", part 10: HyperLogLog (cardinality estimation)

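HyperLogLog is the algorithm Redis uses for cardinality estimation. Its advantage is that even when the number or size of the input elements is very large, the space needed to compute the cardinality is fixed and small. In Redis, each HyperLogLog key costs only 12 KB of memory and can estimate the cardinality of close to 2^64 distinct elements. However, because it only computes the cardinality from the input elements and never stores the elements themselves, it cannot return the individual elements the way a set can.
I found a blog post online that explains this algorithm in an approachable way; here is the link: http://blog.csdn.net/firenet1/article/details/77247649
It is defined in hyperloglog.c. Because I had no prior feel for this algorithm at all, I worked through every one of the author's comments, reproduced here: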

/* The Redis HyperLogLog implementation is based on the following ideas:
 *
 * * The use of a 64 bit hash function as proposed in [1], in order not to be limited to cardinalities up to 10^9, at the cost of just 1 additional bit per register.
 * * The use of 16384 6-bit registers for a great level of accuracy, using a total of 12k per key.
 * * The use of the Redis string data type. No new type is introduced.
 * * No attempt is made to compress the data structure as in [1]. Also the algorithm used is the original HyperLogLog Algorithm as in [2], with the only difference that a 64 bit hash function is used, so no correction is performed for values near 2^32 as in [1].
 *
 * [1] Heule, Nunkesser, Hall: HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.
 * [2] P. Flajolet, Éric Fusy, O. Gandouet, and F. Meunier. Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm.
 *
 * Redis uses two representations:
 *
 * 1) A "dense" representation where every entry is represented by
 *    a 6-bit integer.
 * 2) A "sparse" representation using run length compression suitable for representing HyperLogLogs with many registers set to 0 in a memory efficient way.
 *
 * HLL header
 * ===
 * Both the dense and sparse representation have a 16 byte header as follows:
 *
 * +------+---+-----+----------+
 * | HYLL | E | N/U | Cardin.  |
 * +------+---+-----+----------+
 * The first 4 bytes are a magic string set to the bytes "HYLL".
 * "E" is one byte encoding, currently set to HLL_DENSE or HLL_SPARSE. N/U are three not used bytes.
 * ### Blogger's guess: the 3 spare bytes look like padding for memory alignment, so that the CPU can read the 8-byte cardinality field efficiently. The struct definition itself only says they are reserved for future use.
 * The "Cardin." field is a 64 bit integer stored in little endian format with the latest cardinality computed that can be reused if the data structure was not modified since the last computation (this is useful because there are high probabilities that HLLADD operations don't modify the actual data structure and hence the approximated cardinality).
 * When the most significant bit in the most significant byte of the cached cardinality is set, it means that the data structure was modified and we can't reuse the cached value, which must be recomputed.
 *
 * Dense representation
 * ===
 * The dense representation used by Redis is the following:
 * +--------+--------+--------+------//      //--+
 * |11000000|22221111|33333322|55444444 ....     |
 * +--------+--------+--------+------//      //--+
 * The 6 bits counters are encoded one after the other starting from the LSB to the MSB, and using the next bytes as needed.
 *
 * Sparse representation
 * ===
 * The sparse representation encodes registers using a run length encoding composed of three opcodes, two using one byte, and one using two bytes. The opcodes are called ZERO, XZERO and VAL.
 *
 * ZERO opcode is represented as 00xxxxxx. The 6-bit integer represented by the six bits 'xxxxxx', plus 1, means that there are N registers set to 0. This opcode can represent from 1 to 64 contiguous registers set to the value of 0.
 *
 * XZERO opcode is represented by two bytes 01xxxxxx yyyyyyyy. The 14-bit integer represented by the bits 'xxxxxx' as most significant bits and 'yyyyyyyy' as least significant bits, plus 1, means that there are N registers set to 0. Because of the plus 1, this opcode can represent from 1 to 16384 contiguous registers set to the value of 0.
 *
 * VAL opcode is represented as 1vvvvvxx. It contains a 5-bit integer representing the value of a register, and a 2-bit integer representing the number of contiguous registers set to that value 'vvvvv'. To obtain the value and run length, the integers vvvvv and xx must be incremented by one. This opcode can represent values from 1 to 32, repeated from 1 to 4 times.
 *
 * The sparse representation can't represent registers with a value greater than 32, however it is very unlikely that we find such a register in an HLL with a cardinality where the sparse representation is still more memory efficient than the dense representation. When this happens the HLL is converted to the dense representation.
 *
 * The sparse representation is purely positional. For example a sparse representation of an empty HLL is just: XZERO:16384.
 *
 * An HLL having only 3 non-zero registers at position 1000, 1020, 1021 respectively set to 2, 3, 3, is represented by the following opcodes:
 *
 * XZERO:1000 (Registers 0-999 are set to 0)
 * VAL:2,1    (1 register set to value 2, that is register 1000)
 * ZERO:19    (Registers 1001-1019 set to 0)
 * VAL:3,2    (2 registers set to value 3, that is registers 1020,1021)
 * XZERO:15362 (Registers 1022-16383 set to 0)
 *
 * In the example the sparse representation used just 7 bytes instead of 12k in order to represent the HLL registers. In general for low cardinality there is a big win in terms of space efficiency, traded with CPU time since the sparse representation is slower to access.
 *
 * The following table shows average cardinality vs bytes used, 100 samples per cardinality (when the set was not representable because of registers with too big value, the dense representation size was used as a sample).
 *
 * 100 267
 * 200 485
 * 300 678
 * 400 859
 * 500 1033
 * 600 1205
 * 700 1375
 * 800 1544
 * 900 1713
 * 1000 1882
 * 2000 3480
 * 3000 4879
 * 4000 6089
 * 5000 7138
 * 6000 8042
 * 7000 8823
 * 8000 9500
 * 9000 10088
 * 10000 10591
 *
 * The dense representation uses 12288 bytes, so there is a big win up to a cardinality of ~2000-3000. For bigger cardinalities the constant times involved in updating the sparse representation are not justified by the memory savings. The exact maximum length of the sparse representation when this implementation switches to the dense representation is configured via the define server.hll_sparse_max_bytes.
 */
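The worked example in the comment above can be checked by hand. The following is a minimal sketch, assuming only the three bit layouts described in the comment (00xxxxxx, 01xxxxxx yyyyyyyy, 1vvvvvxx). The names `sparse_decode`, `NREGS` and `example` are made up for illustration and do not appear in hyperloglog.c; the real conversion is done by hllSparseToDense() further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NREGS 16384 /* register count stated in the comment */

/* Expand a sparse opcode stream into one byte per register.
 * Returns the number of registers covered, or -1 if the stream is
 * truncated or covers more than NREGS registers.
 * Illustrative helper, not part of hyperloglog.c. */
static long sparse_decode(const uint8_t *p, size_t len, uint8_t *regs) {
    const uint8_t *end = p + len;
    long idx = 0;

    memset(regs, 0, NREGS);
    while (p < end) {
        long run;
        uint8_t val = 0;

        if ((*p & 0xc0) == 0x00) {          /* ZERO: 00xxxxxx */
            run = (*p & 0x3f) + 1;
            p += 1;
        } else if ((*p & 0xc0) == 0x40) {   /* XZERO: 01xxxxxx yyyyyyyy */
            if (p + 1 >= end) return -1;
            run = (((long)(*p & 0x3f) << 8) | p[1]) + 1;
            p += 2;
        } else {                            /* VAL: 1vvvvvxx */
            val = (uint8_t)(((*p >> 2) & 0x1f) + 1);
            run = (*p & 0x03) + 1;
            p += 1;
        }
        if (idx + run > NREGS) return -1;
        if (val) memset(regs + idx, val, (size_t)run);
        idx += run;
    }
    return idx;
}

/* The 7-byte example from the comment, encoded by hand:
 * XZERO:1000, VAL:2,1, ZERO:19, VAL:3,2, XZERO:15362. */
static const uint8_t example[7] = {
    0x43, 0xe7, /* XZERO:1000  -> stored length 999   = 0x03e7 */
    0x84,       /* VAL:2,1     -> 1 00001 00 */
    0x12,       /* ZERO:19     -> stored length 18 */
    0x89,       /* VAL:3,2     -> 1 00010 01 */
    0x7c, 0x01  /* XZERO:15362 -> stored length 15361 = 0x3c01 */
};
```

Calling `sparse_decode(example, sizeof(example), regs)` with a 16384-byte `regs` buffer should cover exactly 16384 registers and leave registers 1000, 1020 and 1021 set to 2, 3 and 3.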
#include "server.h"
#include <stdint.h>
#include <math.h>
// The main struct
struct hllhdr {
    char magic[4];      /* "HYLL" */
    uint8_t encoding;   /* HLL_DENSE or HLL_SPARSE. */
    uint8_t notused[3]; /* Reserved for future use, must be zero. */
    uint8_t card[8];    /* Cached cardinality, little endian. */
    uint8_t registers[]; /* Data bytes (flexible array member). */
};

/* The cached cardinality MSB is used to signal validity of the cached value. */
#define HLL_INVALIDATE_CACHE(hdr) (hdr)->card[7] |= (1<<7)
#define HLL_VALID_CACHE(hdr) (((hdr)->card[7] & (1<<7)) == 0)

#define HLL_P 14 /* The greater is P, the smaller the error. */
#define HLL_REGISTERS (1<<HLL_P) /* With P=14, 16384 registers. */
#define HLL_P_MASK (HLL_REGISTERS-1) /* Mask to index register. */
#define HLL_BITS 6 /* Enough to count up to 63 leading zeroes. */
#define HLL_REGISTER_MAX ((1<<HLL_BITS)-1)
#define HLL_HDR_SIZE sizeof(struct hllhdr)
#define HLL_DENSE_SIZE (HLL_HDR_SIZE+((HLL_REGISTERS*HLL_BITS+7)/8))
#define HLL_DENSE 0 /* Dense encoding. */
#define HLL_SPARSE 1 /* Sparse encoding. */
#define HLL_RAW 255 /* Only used internally, never exposed. */
#define HLL_MAX_ENCODING 1
static char *invalid_hll_err = "-INVALIDOBJ Corrupted HLL object detected\r\n";
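To make the header layout concrete, here is a small sketch that mirrors `struct hllhdr` and the two cache macros as plain functions. It assumes a compiler that gives the all-byte struct no padding (so the header is 16 bytes, and the dense size is 16 + 12288 = 12304). The names `hllhdr_sketch`, `card_store`, `card_load`, `card_valid`, `card_invalidate` and `dense_size` are hypothetical, chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct hllhdr above; illustrative copy only. */
struct hllhdr_sketch {
    char magic[4];
    uint8_t encoding;
    uint8_t notused[3];
    uint8_t card[8];
    uint8_t registers[];
};

/* Store a cardinality in little endian order. For values below 2^63
 * the top bit of card[7] ends up clear, which marks the cache valid. */
static void card_store(struct hllhdr_sketch *h, uint64_t card) {
    for (int i = 0; i < 8; i++) h->card[i] = (uint8_t)(card >> (8 * i));
}

/* Read the little endian cardinality back. */
static uint64_t card_load(const struct hllhdr_sketch *h) {
    uint64_t card = 0;
    for (int i = 0; i < 8; i++) card |= (uint64_t)h->card[i] << (8 * i);
    return card;
}

/* Same test as HLL_VALID_CACHE: MSB of the most significant byte. */
static int card_valid(const struct hllhdr_sketch *h) {
    return (h->card[7] & (1 << 7)) == 0;
}

/* Same effect as HLL_INVALIDATE_CACHE. */
static void card_invalidate(struct hllhdr_sketch *h) {
    h->card[7] |= (1 << 7);
}

/* Header plus 16384 registers of 6 bits, rounded up to whole bytes. */
static size_t dense_size(void) {
    return sizeof(struct hllhdr_sketch) + ((16384 * 6 + 7) / 8);
}
```

The design point this illustrates is that validity costs no extra field: the flag lives in the one bit of the 64-bit cardinality that a realistic estimate never needs.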

/* =========================== Low level bit macros ========================= */
/* Macros to access the dense representation.
 *
 * We need to get and set 6 bit counters in an array of 8 bit bytes. We use macros to make sure the code is inlined since speed is critical especially in order to compute the approximated cardinality in HLLCOUNT where we need to access all the registers at once.
 * For the same reason we also want to avoid conditionals in this code path.
 *
 * +--------+--------+--------+------//
 * |11000000|22221111|33333322|55444444
 * +--------+--------+--------+------//
 * Note: in the above representation the most significant bit (MSB) of every byte is on the left. We start using bits from the LSB to MSB, and so forth passing to the next byte.
 *
 * Example, we want to access the counter at pos = 1 ("111111" in the illustration above).
 * The index of the first byte b0 containing our data is:
 *  b0 = 6 * pos / 8 = 0
 *   +--------+
 *   |11000000|  <- Our byte at b0
 *   +--------+
 * The position of the first bit (counting from the LSB = 0) in the byte is given by:
 *  fb = 6 * pos % 8 -> 6
 * Right shift b0 by 'fb' bits.
 *   +--------+
 *   |11000000|  <- Initial value of b0
 *   |00000011|  <- After right shift of 6 pos.
 *   +--------+
 * Left shift b1 by 8-fb bits (2 bits)
 *   +--------+
 *   |22221111|  <- Initial value of b1
 *   |22111100|  <- After left shift of 2 bits.
 *   +--------+
 * OR the two values, and finally AND with 111111 (63 in decimal) to clean the higher order bits we are not interested in:
 *   +--------+
 *   |00000011|  <- b0 right shifted
 *   |22111100|  <- b1 left shifted
 *   |22111111|  <- b0 OR b1
 *   |  111111|  <- (b0 OR b1) AND 63, our value.
 *   +--------+
 * We can try with a different example, like pos = 0. In this case the 6-bit counter is actually contained in a single byte.
 *  b0 = 6 * pos / 8 = 0
 *   +--------+
 *   |11000000|  <- Our byte at b0
 *   +--------+
 *  fb = 6 * pos % 8 = 0
 *  So we right shift by 0 bits (no shift in practice) and left shift the next byte by 8 bits, even if we don't use it, but this has the effect of clearing the bits so the result will not be affected after the OR.
 * ------------------------------------------------------------------------
 * Setting the register is a bit more complex, let's assume that 'val' is the value we want to set, already in the right range. We need two steps, in one we need to clear the bits, and in the other we need to bitwise-OR the new bits.
 * Let's try with 'pos' = 1, so our first byte at 'b' is 0, "fb" is 6 in this case.
 *   +--------+
 *   |11000000|  <- Our byte at b0
 *   +--------+
 * To create an AND-mask to clear the bits about this position, we just initialize the mask with the value 63, left shift it by "fb" bits, and finally invert the result.
 *   +--------+
 *   |00111111|  <- "mask" starts at 63
 *   |11000000|  <- "mask" after left shift by "fb" bits.
 *   |00111111|  <- "mask" after invert.
 *   +--------+
 * Now we can bitwise-AND the byte at "b" with the mask, and bitwise-OR it with "val" left-shifted by "fb" bits to set the new bits.
 * Now let's focus on the next byte b1:
 *   +--------+
 *   |22221111|  <- Initial value of b1
 *   +--------+
 * To build the AND mask we start again with the 63 value, right shift it by 8-fb bits, and invert it.
 *   +--------+
 *   |00111111|  <- "mask" set at 2^6-1
 *   |00001111|  <- "mask" after the right shift by 8-fb = 2 bits
 *   |11110000|  <- "mask" after bitwise not.
 *   +--------+
 * Now we can mask it with b+1 to clear the old bits, and bitwise-OR with "val" right-shifted by 8-fb bits to set the new value.
 */
/* Note: if we access the last counter, we will also access the b+1 byte that is out of the array, but sds strings always have an implicit null term, so the byte exists, and we can skip the conditional (or the need to allocate 1 byte more explicitly). */
/* Below are the macros that are used.
 * Store the value of the register at position 'regnum' into variable 'target'.
 * 'p' is an array of unsigned bytes. */
#define HLL_DENSE_GET_REGISTER(target,p,regnum) do { \
    uint8_t *_p = (uint8_t*) p; \
    unsigned long _byte = regnum*HLL_BITS/8; \
    unsigned long _fb = regnum*HLL_BITS&7; \
    unsigned long _fb8 = 8 - _fb; \
    unsigned long b0 = _p[_byte]; \
    unsigned long b1 = _p[_byte+1]; \
    target = ((b0 >> _fb) | (b1 << _fb8)) & HLL_REGISTER_MAX; \
} while(0)

/* Set the value of the register at position 'regnum' to 'val'.
 * 'p' is an array of unsigned bytes. */
#define HLL_DENSE_SET_REGISTER(p,regnum,val) do { \
    uint8_t *_p = (uint8_t*) p; \
    unsigned long _byte = regnum*HLL_BITS/8; \
    unsigned long _fb = regnum*HLL_BITS&7; \
    unsigned long _fb8 = 8 - _fb; \
    unsigned long _v = val; \
    _p[_byte] &= ~(HLL_REGISTER_MAX << _fb); \
    _p[_byte] |= _v << _fb; \
    _p[_byte+1] &= ~(HLL_REGISTER_MAX >> _fb8); \
    _p[_byte+1] |= _v >> _fb8; \
} while(0)
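The shift-and-mask walkthrough above can be exercised in isolation. This is a sketch that restates the two macros as ordinary functions so they can be called on a small buffer; `dense_get` and `dense_set` are illustrative names, and, as the comment above warns, the buffer must carry one spare byte after the last register because both functions always touch `byte + 1`.

```c
#include <assert.h>
#include <stdint.h>

#define BITS 6
#define REG_MAX 63 /* (1 << BITS) - 1 */

/* Read the 6-bit register 'regnum' from the packed array 'p'. */
static unsigned dense_get(const uint8_t *p, unsigned long regnum) {
    unsigned long byte = regnum * BITS / 8;        /* first byte holding it */
    unsigned fb = (unsigned)(regnum * BITS & 7);   /* first bit inside it */
    unsigned b0 = p[byte];
    unsigned b1 = p[byte + 1];                     /* may be the spare byte */
    return ((b0 >> fb) | (b1 << (8 - fb))) & REG_MAX;
}

/* Write 'val' (0..63) into the 6-bit register 'regnum'. */
static void dense_set(uint8_t *p, unsigned long regnum, unsigned val) {
    unsigned long byte = regnum * BITS / 8;
    unsigned fb = (unsigned)(regnum * BITS & 7);
    unsigned fb8 = 8 - fb;

    assert(val <= REG_MAX);
    p[byte] &= (uint8_t)~(REG_MAX << fb);          /* clear low part */
    p[byte] |= (uint8_t)(val << fb);               /* set low part */
    p[byte + 1] &= (uint8_t)~(REG_MAX >> fb8);     /* clear high part */
    p[byte + 1] |= (uint8_t)(val >> fb8);          /* set high part */
}
```

For instance, on a zeroed 5-byte buffer, setting register 0 to 63 and register 1 to 21 leaves the first two bytes as 0x7f and 0x05: register 1 straddles the byte boundary with its two low bits in byte 0 and its four high bits in byte 1.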

/* Macros to access the sparse representation.
 * The macros parameter is expected to be an uint8_t pointer. */
#define HLL_SPARSE_XZERO_BIT 0x40 /* 01xxxxxx */
#define HLL_SPARSE_VAL_BIT 0x80 /* 1vvvvvxx */
#define HLL_SPARSE_IS_ZERO(p) (((*(p)) & 0xc0) == 0) /* 00xxxxxx */
#define HLL_SPARSE_IS_XZERO(p) (((*(p)) & 0xc0) == HLL_SPARSE_XZERO_BIT)
#define HLL_SPARSE_IS_VAL(p) ((*(p)) & HLL_SPARSE_VAL_BIT)
#define HLL_SPARSE_ZERO_LEN(p) (((*(p)) & 0x3f)+1)
#define HLL_SPARSE_XZERO_LEN(p) (((((*(p)) & 0x3f) << 8) | (*((p)+1)))+1)
#define HLL_SPARSE_VAL_VALUE(p) ((((*(p)) >> 2) & 0x1f)+1)
#define HLL_SPARSE_VAL_LEN(p) (((*(p)) & 0x3)+1)
#define HLL_SPARSE_VAL_MAX_VALUE 32
#define HLL_SPARSE_VAL_MAX_LEN 4
#define HLL_SPARSE_ZERO_MAX_LEN 64
#define HLL_SPARSE_XZERO_MAX_LEN 16384
#define HLL_SPARSE_VAL_SET(p,val,len) do { \
    *(p) = (((val)-1)<<2|((len)-1))|HLL_SPARSE_VAL_BIT; \
} while(0)
#define HLL_SPARSE_ZERO_SET(p,len) do { \
    *(p) = (len)-1; \
} while(0)
#define HLL_SPARSE_XZERO_SET(p,len) do { \
    int _l = (len)-1; \
    *(p) = (_l>>8) | HLL_SPARSE_XZERO_BIT; \
    *((p)+1) = (_l&0xff); \
} while(0)
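As a complement to the setter macros, the sketch below emits opcodes into a byte buffer and picks ZERO or XZERO by run length, using the same threshold (more than 64 registers) that hllSparseAdd() applies later in the file. The function names `emit_zero_run` and `emit_val` are assumptions for this example and are not Redis functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one opcode covering 'len' zero registers (1..16384) at 'out'.
 * Returns the number of bytes written: 1 for ZERO, 2 for XZERO. */
static size_t emit_zero_run(uint8_t *out, int len) {
    assert(len >= 1 && len <= 16384);
    if (len > 64) {                       /* too long for ZERO: use XZERO */
        int l = len - 1;                  /* stored value is length minus 1 */
        out[0] = (uint8_t)((l >> 8) | 0x40);
        out[1] = (uint8_t)(l & 0xff);
        return 2;
    }
    out[0] = (uint8_t)(len - 1);          /* 00xxxxxx */
    return 1;
}

/* Write a VAL opcode: 'len' registers (1..4) set to 'val' (1..32). */
static size_t emit_val(uint8_t *out, int val, int len) {
    assert(val >= 1 && val <= 32);
    assert(len >= 1 && len <= 4);
    out[0] = (uint8_t)(((val - 1) << 2) | (len - 1) | 0x80); /* 1vvvvvxx */
    return 1;
}
```

A run of 64 zero registers still fits in one byte (0x3f), while a run of 65 needs the two-byte form (0x40 0x40), which is why splitting an opcode can grow the string by a few bytes.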
/* ========================= HyperLogLog algorithm  ========================= */
// The core HyperLogLog algorithm follows
/* Our hash function is MurmurHash2, 64 bit version. It was modified for Redis in order to provide the same result in big and little endian archs (endian neutral). */
uint64_t MurmurHash64A (const void * key, int len, unsigned int seed) {
    const uint64_t m = 0xc6a4a7935bd1e995; // multiplication constant used in every mixing step
    const int r = 47;
    uint64_t h = seed ^ (len * m);
    const uint8_t *data = (const uint8_t *)key;
    const uint8_t *end = data + (len-(len&7));

    while(data != end) {
        uint64_t k;

#if (BYTE_ORDER == LITTLE_ENDIAN)
        k = *((uint64_t*)data);
#else
        k = (uint64_t) data[0];
        k |= (uint64_t) data[1] << 8;
        k |= (uint64_t) data[2] << 16;
        k |= (uint64_t) data[3] << 24;
        k |= (uint64_t) data[4] << 32;
        k |= (uint64_t) data[5] << 40;
        k |= (uint64_t) data[6] << 48;
        k |= (uint64_t) data[7] << 56;
#endif

        k *= m;
        k ^= k >> r;
        k *= m;
        h ^= k;
        h *= m;
        data += 8;
    }

    switch(len & 7) {
    case 7: h ^= (uint64_t)data[6] << 48;
    case 6: h ^= (uint64_t)data[5] << 40;
    case 5: h ^= (uint64_t)data[4] << 32;
    case 4: h ^= (uint64_t)data[3] << 24;
    case 3: h ^= (uint64_t)data[2] << 16;
    case 2: h ^= (uint64_t)data[1] << 8;
    case 1: h ^= (uint64_t)data[0];
            h *= m;
    };

    h ^= h >> r;
    h *= m;
    h ^= h >> r;
    return h;
}
/* Given a string element to add to the HyperLogLog, returns the length of the pattern 000..1 of the element hash. As a side effect 'regp' is set to the register index this element hashes to. */
int hllPatLen(unsigned char *ele, size_t elesize, long *regp) {
    uint64_t hash, bit, index;
    int count;
    /* Count the number of zeroes starting from bit HLL_REGISTERS
     * (that is a power of two corresponding to the first bit we don't use
     * as index). The max run can be 64-P+1 bits.
     *
     * Note that the final "1" ending the sequence of zeroes must be
     * included in the count, so if we find "001" the count is 3, and
     * the smallest count possible is no zeroes at all, just a 1 bit
     * at the first position, that is a count of 1.
     *
     * This may sound inefficient, but actually in the average case
     * there are high probabilities to find a 1 after a few iterations. */
    hash = MurmurHash64A(ele,elesize,0xadc83b19ULL);
    index = hash & HLL_P_MASK; /* Register index. */
    hash |= ((uint64_t)1<<63); /* Make sure the loop terminates. */
    bit = HLL_REGISTERS; /* First bit not used to address the register. */
    count = 1; /* Initialized to 1 since we count the "00000...1" pattern. */
    while((hash & bit) == 0) {
        count++;
        bit <<= 1;
    }
    *regp = (int) index;
    return count;
}
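To see the counting rule without involving MurmurHash64A, the sketch below applies the same loop to a hash value supplied directly by the caller. It assumes P = 14 as defined above; `pat_len_from_hash` is a hypothetical name and the sample hash values used with it are hand-picked, not real hash outputs.

```c
#include <assert.h>
#include <stdint.h>

#define P 14
#define REGISTERS (1 << P)

/* Same logic as hllPatLen(), but taking a ready-made 64-bit hash.
 * The low P bits select the register; the count is the position of the
 * first 1 bit found when scanning upward from bit P, counting from 1. */
static int pat_len_from_hash(uint64_t hash, long *regp) {
    uint64_t bit = REGISTERS;             /* first bit above the index */
    int count = 1;                        /* the terminating 1 is counted */

    *regp = (long)(hash & (REGISTERS - 1));
    hash |= (uint64_t)1 << 63;            /* sentinel: the loop must stop */
    while ((hash & bit) == 0) {
        count++;
        bit <<= 1;
    }
    return count;
}
```

With bit 14 set the count is 1, with bits 14 and 15 clear and bit 16 set (the "001" case from the comment) the count is 3, and an all-zero hash yields 50 in this sketch because the sentinel at bit 63 ends the scan. All of these fit comfortably in a 6-bit register.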

/* ================== Dense representation implementation  ================== */
// Implementation of the dense representation
/* "Add" the element in the dense hyperloglog data structure. Actually nothing is added, but the max 0 pattern counter of the subset the element belongs to is incremented if needed. 'registers' is expected to have room for HLL_REGISTERS plus an additional byte on the right. This requirement is met by sds strings automatically since they are implicitly null terminated.
 * The function always succeeds, however if as a result of the operation the approximated cardinality changed, 1 is returned. Otherwise 0 is returned. */
int hllDenseAdd(uint8_t *registers, unsigned char *ele, size_t elesize) {
    uint8_t oldcount, count;
    long index;

    /* Update the register if this element produced a longer run of zeroes. */
    count = hllPatLen(ele,elesize,&index);
    HLL_DENSE_GET_REGISTER(oldcount,registers,index);
    if (count > oldcount) {
        HLL_DENSE_SET_REGISTER(registers,index,count);
        return 1;
    } else {
        return 0;
    }
}

/* Compute SUM(2^-reg) in the dense representation. PE is an array with a pre-computed table of values 2^-reg indexed by reg. As a side effect the integer pointed by 'ezp' is set to the number of zero registers. */
double hllDenseSum(uint8_t *registers, double *PE, int *ezp) {
    double E = 0;
    int j, ez = 0;

    /* Redis default is to use 16384 registers 6 bits each. The code works
     * with other values by modifying the defines, but for our target value
     * we take a faster path with unrolled loops. */
    if (HLL_REGISTERS == 16384 && HLL_BITS == 6) {
        uint8_t *r = registers;
        unsigned long r0, r1, r2, r3, r4, r5, r6, r7, r8, r9,
                      r10, r11, r12, r13, r14, r15;
        for (j = 0; j < 1024; j++) {
            /* Handle 16 registers per iteration. */
            r0 = r[0] & 63; if (r0 == 0) ez++;
            r1 = (r[0] >> 6 | r[1] << 2) & 63; if (r1 == 0) ez++;
            r2 = (r[1] >> 4 | r[2] << 4) & 63; if (r2 == 0) ez++;
            r3 = (r[2] >> 2) & 63; if (r3 == 0) ez++;
            r4 = r[3] & 63; if (r4 == 0) ez++;
            r5 = (r[3] >> 6 | r[4] << 2) & 63; if (r5 == 0) ez++;
            r6 = (r[4] >> 4 | r[5] << 4) & 63; if (r6 == 0) ez++;
            r7 = (r[5] >> 2) & 63; if (r7 == 0) ez++;
            r8 = r[6] & 63; if (r8 == 0) ez++;
            r9 = (r[6] >> 6 | r[7] << 2) & 63; if (r9 == 0) ez++;
            r10 = (r[7] >> 4 | r[8] << 4) & 63; if (r10 == 0) ez++;
            r11 = (r[8] >> 2) & 63; if (r11 == 0) ez++;
            r12 = r[9] & 63; if (r12 == 0) ez++;
            r13 = (r[9] >> 6 | r[10] << 2) & 63; if (r13 == 0) ez++;
            r14 = (r[10] >> 4 | r[11] << 4) & 63; if (r14 == 0) ez++;
            r15 = (r[11] >> 2) & 63; if (r15 == 0) ez++;

            /* Additional parens will allow the compiler to optimize the
             * code more with a loss of precision that is not very relevant
             * here (floating point math is not associative!). */
            E += (PE[r0] + PE[r1]) + (PE[r2] + PE[r3]) + (PE[r4] + PE[r5]) +
                 (PE[r6] + PE[r7]) + (PE[r8] + PE[r9]) + (PE[r10] + PE[r11]) +
                 (PE[r12] + PE[r13]) + (PE[r14] + PE[r15]);
            r += 12;
        }
    } else {
        for (j = 0; j < HLL_REGISTERS; j++) {
            unsigned long reg;

            HLL_DENSE_GET_REGISTER(reg,registers,j);
            if (reg == 0) {
                ez++;
                /* Increment E at the end of the loop. */
            } else {
                E += PE[reg]; /* Precomputed 2^(-reg[j]). */
            }
        }
        E += ez; /* Add 2^0 'ez' times. */
    }
    *ezp = ez;
    return E;
}
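The role of the PE table is easier to see on unpacked data. The sketch below builds a table of 2^-j values and sums it over an array that stores one register per byte, which is a simplification of the packed 6-bit layout handled by hllDenseSum(). How Redis itself fills PE is not shown in this section, so `pe_init` is an assumption, and `plain_sum` is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill pe[j] with 2^-j for j = 0..63. Powers of two are exact in
 * binary floating point, so no rounding happens here. */
static void pe_init(double pe[64]) {
    for (int j = 0; j < 64; j++) pe[j] = 1.0 / (double)(1ULL << j);
}

/* Sum 2^-reg over 'n' unpacked registers (one per byte).
 * As in hllDenseSum(), *ezp receives the number of zero registers. */
static double plain_sum(const uint8_t *regs, size_t n,
                        const double *pe, int *ezp) {
    double e = 0;
    int ez = 0;

    for (size_t i = 0; i < n; i++) {
        assert(regs[i] < 64);
        if (regs[i] == 0) ez++;
        e += pe[regs[i]];     /* pe[0] == 1, so zero registers add 2^0 */
    }
    *ezp = ez;
    return e;
}
```

An empty HLL therefore sums to exactly 16384 with 16384 zero registers, and every register that grows replaces a 1 in the sum with a smaller power of two.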

/* ================== Sparse representation implementation  ================= */
// Implementation of the sparse representation
/* Convert the HLL with sparse representation given as input in its dense representation. Both representations are represented by SDS strings, and the input representation is freed as a side effect.
 * The function returns C_OK if the sparse representation was valid, otherwise C_ERR is returned if the representation was corrupted. */
int hllSparseToDense(robj *o) {
    sds sparse = o->ptr, dense;
    struct hllhdr *hdr, *oldhdr = (struct hllhdr*)sparse;
    int idx = 0, runlen, regval;
    uint8_t *p = (uint8_t*)sparse, *end = p+sdslen(sparse);

    /* If the representation is already the right one return ASAP. */
    hdr = (struct hllhdr*) sparse;
    if (hdr->encoding == HLL_DENSE) return C_OK;

    /* Create a string of the right size filled with zero bytes.
     * Note that the cached cardinality is set to 0 as a side effect
     * that is exactly the cardinality of an empty HLL. */
    dense = sdsnewlen(NULL,HLL_DENSE_SIZE);
    hdr = (struct hllhdr*) dense;
    *hdr = *oldhdr; /* This will copy the magic and cached cardinality. */
    hdr->encoding = HLL_DENSE;

    /* Now read the sparse representation and set non-zero registers
     * accordingly. */
    p += HLL_HDR_SIZE;
    while(p < end) {
        if (HLL_SPARSE_IS_ZERO(p)) {
            runlen = HLL_SPARSE_ZERO_LEN(p);
            idx += runlen;
            p++;
        } else if (HLL_SPARSE_IS_XZERO(p)) {
            runlen = HLL_SPARSE_XZERO_LEN(p);
            idx += runlen;
            p += 2;
        } else {
            runlen = HLL_SPARSE_VAL_LEN(p);
            regval = HLL_SPARSE_VAL_VALUE(p);
            while(runlen--) {
                HLL_DENSE_SET_REGISTER(hdr->registers,idx,regval);
                idx++;
            }
            p++;
        }
    }

    /* If the sparse representation was valid, we expect to find idx
     * set to HLL_REGISTERS. */
    if (idx != HLL_REGISTERS) {
        sdsfree(dense);
        return C_ERR;
    }

    /* Free the old representation and set the new one. */
    sdsfree(o->ptr);
    o->ptr = dense;
    return C_OK;
}
/* "Add" the element in the sparse hyperloglog data structure.
 * Actually nothing is added, but the max 0 pattern counter of the subset
 * the element belongs to is incremented if needed.
 *
 * The object 'o' is the String object holding the HLL. The function requires
 * a reference to the object in order to be able to enlarge the string if
 * needed.
 *
 * On success, the function returns 1 if the cardinality changed, or 0
 * if the register for this element was not updated.
 * On error (if the representation is invalid) -1 is returned.
 *
 * As a side effect the function may promote the HLL representation from
 * sparse to dense: this happens when a register requires to be set to a value
 * not representable with the sparse representation, or when the resulting
 * size would be greater than server.hll_sparse_max_bytes. */
int hllSparseAdd(robj *o, unsigned char *ele, size_t elesize) {
    struct hllhdr *hdr;
    uint8_t oldcount, count, *sparse, *end, *p, *prev, *next;
    long index, first, span;
    long is_zero = 0, is_xzero = 0, is_val = 0, runlen = 0;

    /* Update the register if this element produced a longer run of zeroes. */
    count = hllPatLen(ele,elesize,&index);

    /* If the count is too big to be representable by the sparse representation
     * switch to dense representation. */
    if (count > HLL_SPARSE_VAL_MAX_VALUE) goto promote;

    /* When updating a sparse representation, sometimes we may need to
     * enlarge the buffer for up to 3 bytes in the worst case (XZERO split
     * into XZERO-VAL-XZERO). Make sure there is enough space right now
     * so that the pointers we take during the execution of the function
     * will be valid all the time. */
    o->ptr = sdsMakeRoomFor(o->ptr,3);

    /* Step 1: we need to locate the opcode we need to modify to check
     * if a value update is actually needed. */
    sparse = p = ((uint8_t*)o->ptr) + HLL_HDR_SIZE;
    end = p + sdslen(o->ptr) - HLL_HDR_SIZE;

    first = 0;
    prev = NULL; /* Points to previous opcode at the end of the loop. */
    next = NULL; /* Points to the next opcode at the end of the loop. */
    span = 0;
    while(p < end) {
        long oplen;

        /* Set span to the number of registers covered by this opcode.
         *
         * This is the most performance critical loop of the sparse
         * representation. Sorting the conditionals from the most to the
         * least frequent opcode in many-bytes sparse HLLs is faster. */
        oplen = 1;
        if (HLL_SPARSE_IS_ZERO(p)) {
            span = HLL_SPARSE_ZERO_LEN(p);
        } else if (HLL_SPARSE_IS_VAL(p)) {
            span = HLL_SPARSE_VAL_LEN(p);
        } else { /* XZERO. */
            span = HLL_SPARSE_XZERO_LEN(p);
            oplen = 2;
        }
        /* Break if this opcode covers the register as 'index'. */
        if (index <= first+span-1) break;
        prev = p;
        p += oplen;
        first += span;
    }
    if (span == 0) return -1; /* Invalid format. */

    next = HLL_SPARSE_IS_XZERO(p) ? p+2 : p+1;
    if (next >= end) next = NULL;

    /* Cache current opcode type to avoid using the macro again and
     * again for something that will not change.
     * Also cache the run-length of the opcode. */
    if (HLL_SPARSE_IS_ZERO(p)) {
        is_zero = 1;
        runlen = HLL_SPARSE_ZERO_LEN(p);
    } else if (HLL_SPARSE_IS_XZERO(p)) {
        is_xzero = 1;
        runlen = HLL_SPARSE_XZERO_LEN(p);
    } else {
        is_val = 1;
        runlen = HLL_SPARSE_VAL_LEN(p);
    }

    /* Step 2: After the loop:
     *
     * 'first' stores to the index of the first register covered
     *  by the current opcode, which is pointed by 'p'.
     *
     * 'next' and 'prev' store respectively the next and previous opcode,
     *  or NULL if the opcode at 'p' is respectively the last or first.
     *
     * 'span' is set to the number of registers covered by the current
     *  opcode.
     *
     * There are different cases in order to update the data structure
     * in place without generating it from scratch:
     *
     * A) If it is a VAL opcode already set to a value >= our 'count'
     *    no update is needed, regardless of the VAL run-length field.
     *    In this case PFADD returns 0 since no changes are performed.
     *
     * B) If it is a VAL opcode with len = 1 (representing only our
     *    register) and the value is less than 'count', we just update it
     *    since this is a trivial case. */
    if (is_val) {
        oldcount = HLL_SPARSE_VAL_VALUE(p);
        /* Case A. */
        if (oldcount >= count) return 0;

        /* Case B. */
        if (runlen == 1) {
            HLL_SPARSE_VAL_SET(p,count,1);
            goto updated;
        }
    }

    /* C) Another trivial to handle case is a ZERO opcode with a len of 1.
     * We can just replace it with a VAL opcode with our value and len of 1. */
    if (is_zero && runlen == 1) {
        HLL_SPARSE_VAL_SET(p,count,1);
        goto updated;
    }

    /* D) General case.
     *
     * The other cases are more complex: our register requires to be updated
     * and is either currently represented by a VAL opcode with len > 1,
     * by a ZERO opcode with len > 1, or by an XZERO opcode.
     *
     * In those cases the original opcode must be split into multiple
     * opcodes. The worst case is an XZERO split in the middle resulting into
     * XZERO - VAL - XZERO, so the resulting sequence max length is
     * 5 bytes.
     *
     * We perform the split writing the new sequence into the 'seq' buffer
     * with 'seqlen' as length. Later the new sequence is inserted in place
     * of the old one, possibly moving what is on the right a few bytes
     * if the new sequence is longer than the older one. */
    uint8_t seq[5], *n = seq;
    int last = first+span-1; /* Last register covered by the sequence. */
    int len;

    if (is_zero || is_xzero) {
        /* Handle splitting of ZERO / XZERO. */
        if (index != first) {
            len = index-first;
            if (len > HLL_SPARSE_ZERO_MAX_LEN) {
                HLL_SPARSE_XZERO_SET(n,len);
                n += 2;
            } else {
                HLL_SPARSE_ZERO_SET(n,len);
                n++;
            }
        }
        HLL_SPARSE_VAL_SET(n,count,1);
        n++;
        if (index != last) {
            len = last-index;
            if (len > HLL_SPARSE_ZERO_MAX_LEN) {
                HLL_SPARSE_XZERO_SET(n,len);
                n += 2;
            } else {
                HLL_SPARSE_ZERO_SET(n,len);
                n++;
            }
        }
    } else {
        /* Handle splitting of VAL. */
        int curval = HLL_SPARSE_VAL_VALUE(p);

        if (index != first) {
            len = index-first;
            HLL_SPARSE_VAL_SET(n,curval,len);
            n++;
        }
        HLL_SPARSE_VAL_SET(n,count,1);
        n++;
        if (index != last) {
            len = last-index;
            HLL_SPARSE_VAL_SET(n,curval,len);
            n++;
        }
    }

    /* Step 3: substitute the old sequence with the new one.
     *
     * Note that we already allocated space on the sds string
     * calling sdsMakeRoomFor(). */
     int seqlen = n-seq;
     int oldlen = is_xzero ? 2 : 1;
     int deltalen = seqlen-oldlen;

     if (deltalen > 0 &&
         sdslen(o->ptr)+deltalen > server.hll_sparse_max_bytes) goto promote;
     if (deltalen && next) memmove(next+deltalen,next,end-next);
     sdsIncrLen(o->ptr,deltalen);
     memcpy(p,seq,seqlen);
     end += deltalen;

updated:
    /* Step 4: Merge adjacent values if possible.
     *
     * The representation was updated, however the resulting representation
     * may not be optimal: adjacent VAL opcodes can sometimes be merged into
     * a single one. */
    p = prev ? prev : sparse;
    int scanlen = 5; /* Scan up to 5 opcodes starting from prev. */
    while (p < end && scanlen--) {
        if (HLL_SPARSE_IS_XZERO(p)) {
            p += 2;
            continue;
        } else if (HLL_SPARSE_IS_ZERO(p)) {
            p++;
            continue;
        }
        /* We need two adjacent VAL opcodes to try a merge, having
         * the same value, and a len that fits the VAL opcode max len. */
        if (p+1 < end && HLL_SPARSE_IS_VAL(p+1)) {
            int v1 = HLL_SPARSE_VAL_VALUE(p);
            int v2 = HLL_SPARSE_VAL_VALUE(p+1);
            if (v1 == v2) {
                int len = HLL_SPARSE_VAL_LEN(p)+HLL_SPARSE_VAL_LEN(p+1);
                if (len <= HLL_SPARSE_VAL_MAX_LEN) {
                    HLL_SPARSE_VAL_SET(p+1,v1,len);
                    memmove(p,p+1,end-p);
                    sdsIncrLen(o->ptr,-1);
                    end--;
                    /* After a merge we reiterate without incrementing 'p'
                     * in order to try to merge the just merged value with
                     * a value on its right. */
                    continue;
                }
            }
        }
        p++;
    }

    /* Invalidate the cached cardinality. */
    hdr = o->ptr;
    HLL_INVALIDATE_CACHE(hdr);
    return 1;

promote: /* Promote to dense representation. */
    if (hllSparseToDense(o) == C_ERR) return -1; /* Corrupted HLL. */
    hdr = o->ptr;

    /* We need to call hllDenseAdd() to perform the operation after the
     * conversion. However the result must be 1, since we only convert
     * from sparse to dense when a register needs to be updated.
     *
     * Note that this in turn means that PFADD will make sure the command
     * is propagated to slaves / AOF, so if there is a sparse -> dense
     * conversion, it will be performed in all the slaves as well. */
    int dense_retval = hllDenseAdd(hdr->registers, ele, elesize);
    serverAssert(dense_retval == 1);
    return dense_retval;
}
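
To make the ZERO / XZERO split in Step 2 concrete, here is a minimal standalone sketch. The opcode layouts (ZERO as `00xxxxxx`, XZERO as `01xxxxxx yyyyyyyy`, VAL as `1vvvvvxx`) are written from my reading of the macros in hyperloglog.c, so treat the constants as assumptions. `split_zero_run` and the `set_*` helpers are hypothetical names, not Redis functions.

```c
#include <stdint.h>

#define ZERO_MAX_LEN 64 /* assumed value of HLL_SPARSE_ZERO_MAX_LEN */

/* ZERO opcode: 00xxxxxx, run length = xxxxxx + 1 (1..64). */
static uint8_t *set_zero(uint8_t *p, int len) {
    p[0] = (uint8_t)(len - 1);
    return p + 1;
}

/* XZERO opcode: 01xxxxxx yyyyyyyy, run length = 14-bit value + 1. */
static uint8_t *set_xzero(uint8_t *p, int len) {
    int l = len - 1;
    p[0] = (uint8_t)((l >> 8) | 0x40);
    p[1] = (uint8_t)(l & 0xff);
    return p + 2;
}

/* VAL opcode: 1vvvvvxx, value = vvvvv + 1 (1..32), run length = xx + 1. */
static uint8_t *set_val(uint8_t *p, int val, int len) {
    p[0] = (uint8_t)((((val - 1) << 2) | (len - 1)) | 0x80);
    return p + 1;
}

/* Pick the 1-byte or 2-byte zero opcode depending on the run length. */
static uint8_t *set_zero_run(uint8_t *p, int len) {
    return len > ZERO_MAX_LEN ? set_xzero(p, len) : set_zero(p, len);
}

/* Split a zero run covering registers [first, first+span-1] so that
 * register 'index' holds 'count'. Writes at most 5 bytes into 'seq'
 * and returns the number of bytes written. */
int split_zero_run(uint8_t seq[5], int first, int span, int index, int count) {
    uint8_t *n = seq;
    int last = first + span - 1;
    if (index != first) n = set_zero_run(n, index - first);
    n = set_val(n, count, 1);
    if (index != last) n = set_zero_run(n, last - index);
    return (int)(n - seq);
}
```

For a fresh HLL (one XZERO covering all 16384 registers), setting register 5 to the value 3 yields ZERO:5, VAL:3,1, XZERO:16378. That takes 4 bytes instead of 2, so `deltalen` in Step 3 would be 2.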
/* Compute SUM(2^-reg) in the sparse representation.
 * PE is an array with a precomputed table of values 2^-reg indexed by reg.
 * As a side effect the integer pointed to by 'ezp' is set to the number of
 * zero registers. If the opcodes do not cover exactly HLL_REGISTERS registers
 * and 'invalid' is not NULL, the integer it points to is set to 1. */
double hllSparseSum(uint8_t *sparse, int sparselen, double *PE, int *ezp, int *invalid) {
    double E = 0;
    int ez = 0, idx = 0, runlen, regval;
    uint8_t *end = sparse+sparselen, *p = sparse;

    while(p < end) {
        if (HLL_SPARSE_IS_ZERO(p)) {
            runlen = HLL_SPARSE_ZERO_LEN(p);
            idx += runlen;
            ez += runlen;
            /* Increment E at the end of the loop. */
            p++;
        } else if (HLL_SPARSE_IS_XZERO(p)) {
            runlen = HLL_SPARSE_XZERO_LEN(p);
            idx += runlen;
            ez += runlen;
            /* Increment E at the end of the loop. */
            p += 2;
        } else {
            runlen = HLL_SPARSE_VAL_LEN(p);
            regval = HLL_SPARSE_VAL_VALUE(p);
            idx += runlen;
            E += PE[regval]*runlen;
            p++;
        }
    }
    if (idx != HLL_REGISTERS && invalid) *invalid = 1;
    E += ez; /* Add 2^0 'ez' times. */
    *ezp = ez;
    return E;
}
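
The walk over the sparse opcodes can be tried in isolation. The sketch below assumes the same opcode layouts as the Redis macros (ZERO `00xxxxxx`, XZERO `01xxxxxx yyyyyyyy`, VAL `1vvvvvxx`). `sparse_sum` is a hypothetical name, and the truncation guard for XZERO is an extra check of mine that is not in the quoted code.

```c
#include <stddef.h>
#include <stdint.h>

#define N_REGISTERS 16384 /* HLL_REGISTERS */

/* Walk the sparse opcodes in [p, p+len) and return SUM(2^-reg).
 * '*ezp' receives the number of zero registers and '*validp' is 1 only
 * if the opcodes cover exactly N_REGISTERS registers. */
double sparse_sum(const uint8_t *p, size_t len, int *ezp, int *validp) {
    const uint8_t *end = p + len;
    double E = 0;
    int ez = 0, idx = 0;

    while (p < end) {
        int runlen;
        if ((p[0] & 0xc0) == 0x00) {          /* ZERO */
            runlen = (p[0] & 0x3f) + 1;
            ez += runlen;
            p += 1;
        } else if ((p[0] & 0xc0) == 0x40) {   /* XZERO */
            if (p + 1 >= end) break;          /* truncated opcode (extra guard) */
            runlen = (((p[0] & 0x3f) << 8) | p[1]) + 1;
            ez += runlen;
            p += 2;
        } else {                              /* VAL */
            int regval = ((p[0] >> 2) & 0x1f) + 1;
            runlen = (p[0] & 0x3) + 1;
            E += runlen * (1.0 / (double)(1ULL << regval));
            p += 1;
        }
        idx += runlen;
    }
    E += ez; /* every zero register contributes 2^0 = 1 */
    *ezp = ez;
    *validp = (idx == N_REGISTERS);
    return E;
}
```

A fresh HLL encoded as the two bytes `0x7f 0xff` (one XZERO of length 16384) gives a sum of 16384 with 16384 zero registers. Once a single register holds the value 3, the sum becomes 16383 + 2^-3.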
/* ========================= HyperLogLog Count ==============================
 * This is the core of the algorithm where the approximated count is computed.
 * The function uses the lower level hllDenseSum() and hllSparseSum() functions
 * as helpers to compute the SUM(2^-reg) part of the computation, which is
 * representation-specific, while all the rest is common. */
/* Implements the SUM operation for the uint8_t data type, which is only used
 * internally as a speedup for PFCOUNT with multiple keys. */
double hllRawSum(uint8_t *registers, double *PE, int *ezp) {
    double E = 0;
    int j, ez = 0;
    uint64_t *word = (uint64_t*) registers;
    uint8_t *bytes;

    for (j = 0; j < HLL_REGISTERS/8; j++) {
        if (*word == 0) {
            ez += 8;
        } else {
            bytes = (uint8_t*) word;
            if (bytes[0]) E += PE[bytes[0]]; else ez++;
            if (bytes[1]) E += PE[bytes[1]]; else ez++;
            if (bytes[2]) E += PE[bytes[2]]; else ez++;
            if (bytes[3]) E += PE[bytes[3]]; else ez++;
            if (bytes[4]) E += PE[bytes[4]]; else ez++;
            if (bytes[5]) E += PE[bytes[5]]; else ez++;
            if (bytes[6]) E += PE[bytes[6]]; else ez++;
            if (bytes[7]) E += PE[bytes[7]]; else ez++;
        }
        word++;
    }
    E += ez; /* 2^(-reg[j]) is 1 when reg[j] is 0, add it 'ez' times for every
                zero register in the HLL. */
    *ezp = ez;
    return E;
}
/* Return the approximated cardinality of the set based on the harmonic mean
 * of the register values. 'hdr' points to the start of the SDS representing
 * the String object holding the HLL representation.
 * If the sparse representation of the HLL object is not valid, the integer
 * pointed to by 'invalid' is set to non-zero, otherwise it is left untouched.
 * hllCount() supports a special internal-only encoding, HLL_RAW, in which
 * hdr->registers points to a uint8_t array of HLL_REGISTERS elements.
 * This is useful to speed up PFCOUNT when called against multiple keys
 * (no need to work with the 6-bit integer encoding). */
uint64_t hllCount(struct hllhdr *hdr, int *invalid) {
    double m = HLL_REGISTERS;
    double E, alpha = 0.7213/(1+1.079/m);
    int j, ez; /* Number of registers equal to 0. */

    /* We precompute 2^(-reg[j]) in a small table in order to
     * speedup the computation of SUM(2^-register[0..i]). */
    static int initialized = 0;
    static double PE[64];
    if (!initialized) {
        PE[0] = 1; /* 2^(-reg[j]) is 1 when reg[j] is 0. */
        for (j = 1; j < 64; j++) {
            /* 2^(-reg[j]) is the same as 1/2^reg[j]. */
            PE[j] = 1.0/(1ULL << j);
        }
        initialized = 1;
    }

    /* Compute SUM(2^-register[0..i]). */
    if (hdr->encoding == HLL_DENSE) {
        E = hllDenseSum(hdr->registers,PE,&ez);
    } else if (hdr->encoding == HLL_SPARSE) {
        E = hllSparseSum(hdr->registers,
                         sdslen((sds)hdr)-HLL_HDR_SIZE,PE,&ez,invalid);
    } else if (hdr->encoding == HLL_RAW) {
        E = hllRawSum(hdr->registers,PE,&ez);
    } else {
        serverPanic("Unknown HyperLogLog encoding in hllCount()");
    }

    /* Multiply the inverse of E by alpha_m * m^2 to get the raw estimate. */
    E = (1/E)*alpha*m*m;

    /* Use the LINEARCOUNTING algorithm for small cardinalities.
     * For larger values but up to 72000 HyperLogLog raw approximation is
     * used since linear counting error starts to increase. However HyperLogLog
     * shows a strong bias in the range 2.5*16384 - 72000, so we try to
     * compensate for it. */
    if (E < m*2.5 && ez != 0) {
        E = m*log(m/ez); /* LINEARCOUNTING() */
    } else if (m == 16384 && E < 72000) {
        /* We did polynomial regression of the bias for this range, this
         * way we can compute the bias for a given cardinality and correct
         * according to it. Only apply the correction for P=14 that's what
         * we use and the value the correction was verified with. */
        double bias = 5.9119*1.0e-18*(E*E*E*E)
                      -1.4253*1.0e-12*(E*E*E)+
                      1.2940*1.0e-7*(E*E)
                      -5.2921*1.0e-3*E+
                      83.3216;
        E -= E*(bias/100);
    }
    /* We don't apply the correction for E > 1/30 of 2^32 since we use
     * a 64 bit function and 6 bit counters. To apply the correction for
     * 1/30 of 2^64 is not needed since it would require a huge set
     * to approach such a value. */
    return (uint64_t) E;
}
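
The arithmetic in hllCount() is easier to follow with the pieces pulled apart. Below is a minimal sketch of the raw estimate and of the polynomial bias correction, using the constants from the code above. The function names are hypothetical, and the LINEARCOUNTING branch is left out on purpose so that the example does not need the math library.

```c
#define M 16384.0 /* number of registers, HLL_REGISTERS */

/* Raw HyperLogLog estimate: alpha_m * m^2 / SUM(2^-reg[j]). */
double raw_estimate(double sum) {
    double alpha = 0.7213 / (1 + 1.079 / M);
    return alpha * M * M / sum;
}

/* Bias in percent for a raw estimate E, using the polynomial from
 * hllCount(). It is only meant for the range 2.5*16384 .. 72000. */
double bias_percent(double E) {
    return 5.9119e-18 * (E * E * E * E)
         - 1.4253e-12 * (E * E * E)
         + 1.2940e-7  * (E * E)
         - 5.2921e-3  * E
         + 83.3216;
}

/* Apply the correction the same way the 'else if' branch does. */
double corrected_estimate(double E) {
    return E - E * (bias_percent(E) / 100);
}
```

Two observations from this sketch: an empty HLL (sum equal to 16384) has a raw estimate of about 11817 rather than 0, which shows why small cardinalities need the LINEARCOUNTING branch. For a raw estimate of 50000 the polynomial gives a bias of roughly 1 percent, so the result is lowered by about 500.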
/* Call hllDenseAdd() or hllSparseAdd() according to the HLL encoding. */
int hllAdd(robj *o, unsigned char *ele, size_t elesize) {
    struct hllhdr *hdr = o->ptr;
    switch(hdr->encoding) {
    case HLL_DENSE: return hllDenseAdd(hdr->registers,ele,elesize);
    case HLL_SPARSE: return hllSparseAdd(o,ele,elesize);
    default: return -1; /* Invalid representation. */
    }
}
/* Merge the HyperLogLog 'hll' into the array of HLL_REGISTERS uint8_t
 * registers pointed to by 'max', by computing MAX(registers[i],hll[i]).
 * The hll object must already be validated via isHLLObjectOrReply() or in
 * some other way.
 * If the HyperLogLog is sparse and is found to be invalid, C_ERR is returned,
 * otherwise the function always succeeds. */
int hllMerge(uint8_t *max, robj *hll) {
    struct hllhdr *hdr = hll->ptr;
    int i;

    if (hdr->encoding == HLL_DENSE) {
        uint8_t val;

        for (i = 0; i < HLL_REGISTERS; i++) {
            HLL_DENSE_GET_REGISTER(val,hdr->registers,i);
            if (val > max[i]) max[i] = val;
        }
    } else {
        uint8_t *p = hll->ptr, *end = p + sdslen(hll->ptr);
        long runlen, regval;

        p += HLL_HDR_SIZE;
        i = 0;
        while(p < end) {
            if (HLL_SPARSE_IS_ZERO(p)) {
                runlen = HLL_SPARSE_ZERO_LEN(p);
                i += runlen;
                p++;
            } else if (HLL_SPARSE_IS_XZERO(p)) {
                runlen = HLL_SPARSE_XZERO_LEN(p);
                i += runlen;
                p += 2;
            } else {
                runlen = HLL_SPARSE_VAL_LEN(p);
                regval = HLL_SPARSE_VAL_VALUE(p);
                while(runlen--) {
                    if (regval > max[i]) max[i] = regval;
                    i++;
                }
                p++;
            }
        }
        if (i != HLL_REGISTERS) return C_ERR;
    }
    return C_OK;
}
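
The merge itself is nothing more than a register-wise maximum. Each register keeps the largest value observed for its bucket, so the maximum of two registers is what a single HLL would hold if it had seen both inputs. A minimal sketch on plain arrays, where `merge_max` is a hypothetical helper and not a Redis function:

```c
#include <stddef.h>
#include <stdint.h>

/* Register-wise maximum: after the call max[i] = MAX(max[i], src[i]). */
void merge_max(uint8_t *max, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (src[i] > max[i]) max[i] = src[i];
    }
}
```

With `max = {0, 3, 1, 5}` and `src = {2, 1, 1, 7}` the call leaves `max` as `{2, 3, 1, 7}`. Redis does the same over all HLL_REGISTERS registers, after decoding the dense or sparse representation.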
/* ========================== HyperLogLog commands ========================== */
/* Create an HLL object. We always create the HLL using sparse encoding.
 * This will be upgraded to the dense representation as needed. */
robj *createHLLObject(void) {
    robj *o;
    struct hllhdr *hdr;
    sds s;
    uint8_t *p;
    int sparselen = HLL_HDR_SIZE +
                    (((HLL_REGISTERS+(HLL_SPARSE_XZERO_MAX_LEN-1)) /
                     HLL_SPARSE_XZERO_MAX_LEN)*2);
    int aux;

    /* Populate the sparse representation with as many XZERO opcodes as
     * needed to represent all the registers. */
    aux = HLL_REGISTERS;
    s = sdsnewlen(NULL,sparselen);
    p = (uint8_t*)s + HLL_HDR_SIZE;
    while(aux) {
        int xzero = HLL_SPARSE_XZERO_MAX_LEN;
        if (xzero > aux) xzero = aux;
        HLL_SPARSE_XZERO_SET(p,xzero);
        p += 2;
        aux -= xzero;
    }
    serverAssert((p-(uint8_t*)s) == sparselen);

    /* Create the actual object. */
    o = createObject(OBJ_STRING,s);
    hdr = o->ptr;
    memcpy(hdr->magic,"HYLL",4);
    hdr->encoding = HLL_SPARSE;
    return o;
}
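
The size computation in createHLLObject() is a rounded-up division followed by a multiplication by 2, since every XZERO opcode takes two bytes. The sketch below reproduces it with the 16-byte header and 16384 registers described earlier. The XZERO maximum run length of 16384 is my assumption about HLL_SPARSE_XZERO_MAX_LEN, and both function names are hypothetical.

```c
#include <stdint.h>

#define HDR_SIZE      16    /* HLL_HDR_SIZE, per the header description */
#define N_REGISTERS   16384 /* HLL_REGISTERS */
#define XZERO_MAX_LEN 16384 /* assumed HLL_SPARSE_XZERO_MAX_LEN */

/* Bytes needed for the header plus enough XZERO opcodes (2 bytes each)
 * to cover all registers. The division rounds up. */
int initial_sparse_len(void) {
    int opcodes = (N_REGISTERS + (XZERO_MAX_LEN - 1)) / XZERO_MAX_LEN;
    return HDR_SIZE + opcodes * 2;
}

/* Fill 'p' with XZERO opcodes covering all registers and return the
 * number of bytes written. */
int fill_xzero(uint8_t *p) {
    int aux = N_REGISTERS, written = 0;
    while (aux) {
        int run = aux < XZERO_MAX_LEN ? aux : XZERO_MAX_LEN;
        int l = run - 1;
        p[written++] = (uint8_t)((l >> 8) | 0x40);
        p[written++] = (uint8_t)(l & 0xff);
        aux -= run;
    }
    return written;
}
```

Under these assumptions a freshly created HLL occupies 18 bytes: 16 for the header and 2 for a single XZERO opcode, compared with about 12k for the dense representation.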
/* Check if the object is a String with a valid HLL representation.
 * Return C_OK if this is true, otherwise reply to the client with an error
 * and return C_ERR. */
int isHLLObjectOrReply(client *c, robj *o) {
    struct hllhdr *hdr;

    /* Key exists, check type */
    if (checkType(c,o,OBJ_STRING))
        return C_ERR; /* Error already sent. */

    if (!sdsEncodedObject(o)) goto invalid;
    if (stringObjectLen(o) < sizeof(*hdr)) goto invalid;
    hdr = o->ptr;

    /* Magic should be "HYLL". */
    if (hdr->magic[0] != 'H' || hdr->magic[1] != 'Y' ||
        hdr->magic[2] != 'L' || hdr->magic[3] != 'L') goto invalid;

    if (hdr->encoding > HLL_MAX_ENCODING) goto invalid;

    /* Dense representation string length should match exactly. */
    if (hdr->encoding == HLL_DENSE &&
        stringObjectLen(o) != HLL_DENSE_SIZE) goto invalid;

    /* All tests passed. */
    return C_OK;

invalid:
    addReplySds(c,
        sdsnew("-WRONGTYPE Key is not a valid "
               "HyperLogLog string value.\r\n"));
    return C_ERR;
}
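
The same checks can be expressed on a raw byte buffer, which makes the header layout easier to see. This is a sketch only: the encoding byte at offset 4 follows the header diagram, while the numeric values of HLL_DENSE, HLL_SPARSE and HLL_MAX_ENCODING and the dense size formula are my assumptions about the definitions in hyperloglog.c. `looks_like_hll` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE     16
#define ENC_DENSE    0 /* assumed value of HLL_DENSE */
#define ENC_SPARSE   1 /* assumed value of HLL_SPARSE */
#define MAX_ENCODING 1 /* assumed value of HLL_MAX_ENCODING */
#define DENSE_SIZE   (HDR_SIZE + (16384 * 6 + 7) / 8) /* 16 + 12288 bytes */

/* Return 1 if 'buf' looks like a valid HLL string, 0 otherwise. */
int looks_like_hll(const uint8_t *buf, size_t len) {
    if (len < HDR_SIZE) return 0;               /* too short for a header */
    if (memcmp(buf, "HYLL", 4) != 0) return 0;  /* wrong magic */
    uint8_t encoding = buf[4];
    if (encoding > MAX_ENCODING) return 0;      /* unknown encoding */
    /* The dense representation has one fixed, exact length. */
    if (encoding == ENC_DENSE && len != DENSE_SIZE) return 0;
    return 1;
}
```

Note that a sparse HLL only gets the cheap header checks here. Whether its opcodes really cover all registers is discovered later, by hllSparseSum() or hllMerge().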

/* PFADD var ele ele ele ... ele => :0 or :1. Adds elements to the HLL. */
void pfaddCommand(client *c) {
    robj *o = lookupKeyWrite(c->db,c->argv[1]);
    struct hllhdr *hdr;
    int updated = 0, j;

    if (o == NULL) {
        /* Create the key with a string value of the exact length to
         * hold our HLL data structure. sdsnewlen() when NULL is passed
         * is guaranteed to return bytes initialized to zero. */
        o = createHLLObject();
        dbAdd(c->db,c->argv[1],o);
        updated++;
    } else {
        if (isHLLObjectOrReply(c,o) != C_OK) return;
        o = dbUnshareStringValue(c->db,c->argv[1],o);
    }
    /* Perform the low level ADD operation for every element. */
    for (j = 2; j < c->argc; j++) {
        int retval = hllAdd(o, (unsigned char*)c->argv[j]->ptr,
                               sdslen(c->argv[j]->ptr));
        switch(retval) {
        case 1:
            updated++;
            break;
        case -1:
            addReplySds(c,sdsnew(invalid_hll_err));
            return;
        }
    }
    hdr = o->ptr;
    if (updated) {
        signalModifiedKey(c->db,c->argv[1]);
        notifyKeyspaceEvent(NOTIFY_STRING,"pfadd",c->argv[1],c->db->id);
        server.dirty++;
        HLL_INVALIDATE_CACHE(hdr);
    }
    addReply(c, updated ? shared.cone : shared.czero);
}
/* PFCOUNT var -> approximated cardinality of set. */
void pfcountCommand(client *c) {
    robj *o;
    struct hllhdr *hdr;
    uint64_t card;

    /* Case 1: multi-key keys, cardinality of the union.
     *
     * When multiple keys are specified, PFCOUNT actually computes
     * the cardinality of the merge of the N HLLs specified. */
    if (c->argc > 2) {
        uint8_t max[HLL_HDR_SIZE+HLL_REGISTERS], *registers;
        int j;

        /* Compute an HLL with M[i] = MAX(M[i]_j). */
        memset(max,0,sizeof(max));
        hdr = (struct hllhdr*) max;
        hdr->encoding = HLL_RAW; /* Special internal-only encoding. */
        registers = max + HLL_HDR_SIZE;
        for (j = 1; j < c->argc; j++) {
            /* Check type and size. */
            robj *o = lookupKeyRead(c->db,c->argv[j]);
            if (o == NULL) continue; /* Assume empty HLL for non existing var.*/
            if (isHLLObjectOrReply(c,o) != C_OK) return;

            /* Merge this HLL with our 'max' HLL by setting max[i]
             * to MAX(max[i],hll[i]). */
            if (hllMerge(registers,o) == C_ERR) {
                addReplySds(c,sdsnew(invalid_hll_err));
                return;
            }
        }

        /* Compute cardinality of the resulting set. */
        addReplyLongLong(c,hllCount(hdr,NULL));
        return;
    }

    /* Case 2: cardinality of the single HLL.
     *
     * The user specified a single key. Either return the cached value
     * or compute one and update the cache. */
    o = lookupKeyWrite(c->db,c->argv[1]);
    if (o == NULL) {
        /* No key? Cardinality is zero since no element was added, otherwise
         * we would have a key as PFADD creates it as a side effect. */
        addReply(c,shared.czero);
    } else {
        if (isHLLObjectOrReply(c,o) != C_OK) return;
        o = dbUnshareStringValue(c->db,c->argv[1],o);

        /* Check if the cached cardinality is valid. */
        hdr = o->ptr;
        if (HLL_VALID_CACHE(hdr)) {
            /* Just return the cached value. */
            card = (uint64_t)hdr->card[0];
            card |= (uint64_t)hdr->card[1] << 8;
            card |= (uint64_t)hdr->card[2] << 16;
            card |= (uint64_t)hdr->card[3] << 24;
            card |= (uint64_t)hdr->card[4] << 32;
            card |= (uint64_t)hdr->card[5] << 40;
            card |= (uint64_t)hdr->card[6] << 48;
            card |= (uint64_t)hdr->card[7] << 56;
        } else {
            int invalid = 0;
            /* Recompute it and update the cached value. */
            card = hllCount(hdr,&invalid);
            if (invalid) {
                addReplySds(c,sdsnew(invalid_hll_err));
                return;
            }
            hdr->card[0] = card & 0xff;
            hdr->card[1] = (card >> 8) & 0xff;
            hdr->card[2] = (card >> 16) & 0xff;
            hdr->card[3] = (card >> 24) & 0xff;
            hdr->card[4] = (card >> 32) & 0xff;
            hdr->card[5] = (card >> 40) & 0xff;
            hdr->card[6] = (card >> 48) & 0xff;
            hdr->card[7] = (card >> 56) & 0xff;
            /* This is not considered a read-only command even if the
             * data structure is not modified, since the cached value
             * may be modified and given that the HLL is a Redis string
             * we need to propagate the change. */
            signalModifiedKey(c->db,c->argv[1]);
            server.dirty++;
        }
        addReplyLongLong(c,card);
    }
}
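
The cached cardinality is written byte by byte, least significant byte first, so the stored string does not depend on the endianness of the host. A minimal sketch of that encoding follows. The `card_*` names are hypothetical, and the cache flag (most significant bit of the last byte set means stale) is my assumption about what HLL_INVALIDATE_CACHE and HLL_VALID_CACHE do.

```c
#include <stdint.h>

/* Store 'card' in 8 bytes, least significant byte first. */
void card_store(uint8_t card_bytes[8], uint64_t card) {
    for (int i = 0; i < 8; i++)
        card_bytes[i] = (uint8_t)((card >> (8 * i)) & 0xff);
}

/* Rebuild the 64-bit value from the 8 little-endian bytes. */
uint64_t card_load(const uint8_t card_bytes[8]) {
    uint64_t card = 0;
    for (int i = 0; i < 8; i++)
        card |= (uint64_t)card_bytes[i] << (8 * i);
    return card;
}

/* Assumed cache flag: top bit of the last byte set means "stale". */
void card_invalidate(uint8_t card_bytes[8]) {
    card_bytes[7] |= 0x80;
}

int card_is_valid(const uint8_t card_bytes[8]) {
    return (card_bytes[7] & 0x80) == 0;
}
```

Storing a fresh count clears the flag implicitly, because a realistic cardinality never reaches the top bit of the 64-bit value. That would explain why pfcountCommand() does not need an explicit "mark valid" step after writing the eight bytes.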
/* PFMERGE dest src1 src2 src3 ... srcN => OK. Merges the source HLLs into dest. */
void pfmergeCommand(client *c) {
    uint8_t max[HLL_REGISTERS];
    struct hllhdr *hdr;
    int j;

    /* Compute an HLL with M[i] = MAX(M[i]_j).
     * We store the maximum into the max array of registers. We'll write
     * it to the target variable later. */
    memset(max,0,sizeof(max));
    for (j = 1; j < c->argc; j++) {
        /* Check type and size. */
        robj *o = lookupKeyRead(c->db,c->argv[j]);
        if (o == NULL) continue; /* Assume empty HLL for non existing var. */
        if (isHLLObjectOrReply(c,o) != C_OK) return;

        /* Merge this HLL with our 'max' HLL by setting max[i]
         * to MAX(max[i],hll[i]). */
        if (hllMerge(max,o) == C_ERR) {
            addReplySds(c,sdsnew(invalid_hll_err));
            return;
        }
    }

    /* Create / unshare the destination key's value if needed. */
    robj *o = lookupKeyWrite(c->db,c->argv[1]);
    if (o == NULL) {
        /* Create the key with a string value of the exact length to
         * hold our HLL data structure. sdsnewlen() when NULL is passed
         * is guaranteed to return bytes initialized to zero. */
        o = createHLLObject();
        dbAdd(c->db,c->argv[1],o);
    } else {
        /* If key exists we are sure it's of the right type/size
         * since we checked when merging the different HLLs, so we
         * don't check again. */
        o = dbUnshareStringValue(c->db,c->argv[1],o);
    }

    /* Only support dense objects as destination. */
    if (hllSparseToDense(o) == C_ERR) {
        addReplySds(c,sdsnew(invalid_hll_err));
        return;
    }

    /* Write the resulting HLL to the destination HLL registers and
     * invalidate the cached value. */
    hdr = o->ptr;
    for (j = 0; j < HLL_REGISTERS; j++) {
        HLL_DENSE_SET_REGISTER(hdr->registers,j,max[j]);
    }
    HLL_INVALIDATE_CACHE(hdr);

    signalModifiedKey(c->db,c->argv[1]);
    /* We generate a PFADD event for PFMERGE for semantic simplicity
     * since in theory this is a mass-add of elements. */
    notifyKeyspaceEvent(NOTIFY_STRING,"pfadd",c->argv[1],c->db->id);
    server.dirty++;
    addReply(c,shared.ok);
}
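// The remaining functions are test helpers and are not covered here. After a rough read-through of the execution flow and translating the function comments, I still do not fully understand the algorithm and need to study it further.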