ClickHouse Source Code Reading (0000 1111): The Bitmap Implementation in ClickHouse

B_e_a_u_tiful1205

Last modified: 2022-12-29 10:15:41

Column: Dive into ClickHouse. Tags: clickhouse, java, hash algorithms

First published: 2020-06-06 17:27:33

Article link: https://blog.csdn.net/B_e_a_u_tiful1205/article/details/106589586

Our project uses ClickHouse's bitmap type, so let's analyze how bitmaps are actually implemented in ClickHouse.

In ClickHouse, the bitmap type is AggregateFunction(groupBitmap, UInt32), which corresponds to the following source code:

    template<typename T>
    struct AggregateFunctionGroupBitmapData
    {
        RoaringBitmapWithSmallSet<T, 32> rbs;

        static const char *name()
        { return "groupBitmap"; }
    };

Let's focus on the implementation of RoaringBitmapWithSmallSet:

    /**
      * For a small number of values - an array of fixed size "on the stack" (here small_set_size = 32).
      * For large, roaring_bitmap_t is allocated.
      * For a description of the roaring_bitmap_t, see: https://github.com/RoaringBitmap/CRoaring
      */
    template<typename T, UInt8 small_set_size>
    class RoaringBitmapWithSmallSet : private boost::noncopyable
    {
    private:
        using Small = SmallSet<T, small_set_size>;
        using ValueBuffer = std::vector<T>;
        Small small;
        roaring_bitmap_t *rb = nullptr;

        void toLarge()
        {
            rb = roaring_bitmap_create();

            for (const auto &x : small)
                roaring_bitmap_add(rb, x.getValue());
        }

    public:
        bool isLarge() const
        { return rb != nullptr; }

        bool isSmall() const
        { return rb == nullptr; }
        
        ......
    private:
        /// To read and write the DB Buffer directly, migrate code from CRoaring
        void db_roaring_bitmap_add_many(DB::ReadBuffer &dbBuf, roaring_bitmap_t *r, size_t n_args)

        ......

    };

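RoaringBitmapWithSmallSet has two main members: small, of type Small (a SmallSet), and rb, a pointer to roaring_bitmap_t. roaring_bitmap_t is the well-known Roaring Bitmap (RBM), and ClickHouse stores larger sets of values (more than 32) in an RBM.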

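For a small number of values, ClickHouse uses its own SmallSet<T, small_set_size> structure. In the methods that implement bitmap operations such as AND and OR, the code calls isSmall() or isLarge() to check how each of the two bitmaps is represented, converts a bitmap to a Roaring Bitmap with toLarge() when necessary, and then calls the corresponding RBM function.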

Now let's look at how SmallSet<T, small_set_size> is implemented:

template
<
    typename Key,
    size_t capacity
>
using SmallSet = SmallTable<Key, HashTableCell<Key, HashUnused>, capacity>;

/** Replacement of the hash table for a small number (<10) of keys.
  * Implemented as an array with linear search.
  * The array is located inside the object.
  * The interface is a subset of the HashTable interface.
  *
  * Insert is possible only if the `full` method returns false.
  * With an unknown number of different keys, you should check if the table is not full,
  *  and do a `fallback` in this case (for example, use a real hash table).
  */
template
<
    typename Key,
    typename Cell,
    size_t capacity
>
class SmallTable :
    private boost::noncopyable,
    protected Cell::State
{
protected:
    friend class const_iterator;
    friend class iterator;
    friend class Reader;

    using Self = SmallTable;
    using cell_type = Cell;

    size_t m_size = 0;        /// Amount of elements.
    Cell buf[capacity];       /// A piece of memory for all elements: an array of capacity Cells (capacity = 32 here).


    /// Find a cell with the same key or an empty cell, starting from the specified position and then by the collision resolution chain.
    /// In SmallTable this is simply a linear scan over the first m_size cells: it returns the cell whose key equals x, or buf + m_size (the first unused cell) if x is absent.
    const Cell * ALWAYS_INLINE findCell(const Key & x) const
    {
        const Cell * it = buf;
        while (it < buf + m_size)
        {
            if (it->keyEquals(x))
                break;
            ++it;
        }
        return it;
    }

    Cell * ALWAYS_INLINE findCell(const Key & x)
    {
        Cell * it = buf;
        while (it < buf + m_size)
        {
            if (it->keyEquals(x))
                break;
            ++it;
        }
        return it;
    }


public:
    using key_type = Key;
    using value_type = typename Cell::value_type;

......

};

SmallTable has two main members:

    size_t m_size = 0;        /// Amount of elements.
    Cell buf[capacity];       /// A piece of memory for all elements.

buf is an array of Cell with capacity elements (capacity = 32 here), and m_size records how many of them are currently in use.
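A minimal, hypothetical C++ sketch of the same idea follows. SmallSetSketch and its nested Cell are illustrative names, not ClickHouse's classes: a fixed-capacity array lives inside the object and is searched linearly, find() returns end() (that is, buf + m_size) when the key is absent, and a new key may only be inserted while the table is not full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for SmallTable + HashTableCell
// (not the real ClickHouse classes): a fixed-size array inside the object,
// searched linearly.
template <typename Key, std::size_t Capacity>
class SmallSetSketch
{
public:
    struct Cell
    {
        Key key;
        bool keyEquals(const Key & x) const { return key == x; }
        const Key & getValue() const { return key; }
    };

    const Cell * begin() const { return buf; }
    const Cell * end() const { return buf + m_size; }  // first unused cell
    std::size_t size() const { return m_size; }
    bool full() const { return m_size == Capacity; }

    // Linear scan over the occupied prefix; returns end() if x is absent.
    const Cell * find(const Key & x) const
    {
        const Cell * it = buf;
        while (it < buf + m_size && !it->keyEquals(x))
            ++it;
        return it;
    }

    // Returns true if x was inserted, false if it was already present.
    // Like SmallTable, a new key may only be inserted while !full().
    bool insert(const Key & x)
    {
        if (find(x) != end())
            return false;
        assert(!full() && "caller must fall back to a larger structure");
        buf[m_size++].key = x;
        return true;
    }

    void clear() { m_size = 0; }

private:
    std::size_t m_size = 0;  // number of occupied cells
    Cell buf[Capacity];      // storage for all elements, inside the object
};
```

A linear scan is O(n), but over at most 32 contiguous elements it is cheap and needs no heap allocation, which is presumably why this kind of table is only used for small capacities.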

Here Cell is HashTableCell<Key, HashUnused>, which is implemented as follows:

/** Compile-time interface for cell of the hash table.
  * Different cell types are used to implement different hash tables.
  * The cell must contain a key.
  * It can also contain a value and arbitrary additional data
  *  (example: the stored hash value; version number for ClearableHashMap).
  */
template <typename Key, typename Hash, typename TState = HashTableNoState>
struct HashTableCell
{
    using State = TState;

    using value_type = Key;
    Key key;

    HashTableCell() {}

    ......

};

HashTableCell likewise has only one member variable: key, of type Key.

(Many of the classes above are built on C++ class templates; that's all I'll cover about them for now.)

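As mentioned earlier, methods for bitmap operations such as AND and OR check each bitmap's representation with isSmall() or isLarge(), convert a bitmap to a Roaring Bitmap with toLarge() when necessary, and then call the corresponding RBM function.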

Let's look at a few specific methods.

First, add(): when an element is added, what happens depends on which representation the bitmap currently uses.

        void add(T value) // The behavior depends on the bitmap's current representation.
        {
            if (isSmall())
            {
                if (small.find(value) == small.end())
                {
                    if (!small.full())
                        small.insert(value);
                    else
                    {
                        toLarge();
                        roaring_bitmap_add(rb, value);
                    }
                }
            } else
                roaring_bitmap_add(rb, value);
        }

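This is fairly simple: a value already in the small set is ignored; a new value is inserted if the small set is not yet full; otherwise the bitmap is first converted with toLarge() and the value is added to the Roaring Bitmap.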
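The add() logic can be sketched in standalone C++ as below. This is an illustration under stated assumptions, not ClickHouse code: HybridBitmapSketch is a hypothetical name, std::set<T> stands in for roaring_bitmap_t, and a capped std::vector<T> stands in for SmallSet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>
#include <vector>

// Hypothetical sketch of add(): std::set<T> stands in for roaring_bitmap_t
// and a capped std::vector<T> stands in for SmallSet. Not ClickHouse code.
template <typename T, std::size_t SmallSetSize = 32>
class HybridBitmapSketch
{
public:
    bool isSmall() const { return large == nullptr; }
    bool isLarge() const { return large != nullptr; }

    void add(T value)
    {
        if (isSmall())
        {
            if (std::find(small.begin(), small.end(), value) != small.end())
                return;                    // already present: nothing to do
            if (small.size() < SmallSetSize)
                small.push_back(value);    // small set still has room
            else
            {
                toLarge();                 // small set is full: convert first
                large->insert(value);
            }
        }
        else
            large->insert(value);
    }

    std::size_t size() const { return isSmall() ? small.size() : large->size(); }

private:
    // Like toLarge(): copy every value of the small set into the large set.
    void toLarge() { large = std::make_unique<std::set<T>>(small.begin(), small.end()); }

    std::vector<T> small;
    std::unique_ptr<std::set<T>> large;  // nullptr while the bitmap is small
};
```

With the default threshold of 32, the first 32 distinct values stay in the small set, re-adding an existing value changes nothing, and the 33rd distinct value triggers the conversion.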

Second, the implementation of bitmapAnd:

        /**
         * Computes the intersection between two bitmaps
         * (the implementation of bitmapAnd)
         */
        void rb_and(const RoaringBitmapWithSmallSet &r1)
        {
            ValueBuffer buffer;
            if (isSmall() && r1.isSmall())
            {
                // intersect
                for (const auto &x : small)
                    if (r1.small.find(x.getValue()) != r1.small.end())
                        buffer.push_back(x.getValue());

                // Clear out the original values
                small.clear();

                for (const auto &value : buffer)
                    small.insert(value);

                buffer.clear();
            } else if (isSmall() && r1.isLarge())
            {
                for (const auto &x : small)
                    if (roaring_bitmap_contains(r1.rb, x.getValue()))
                        buffer.push_back(x.getValue());

                // Clear out the original values
                small.clear();

                for (const auto &value : buffer)
                    small.insert(value);

                buffer.clear();
            } else
            {
                roaring_bitmap_t *rb1 = r1.isSmall() ? r1.getNewRbFromSmall() : r1.getRb();
                roaring_bitmap_and_inplace(rb, rb1);
                if (r1.isSmall())
                    roaring_bitmap_free(rb1);
            }
        }

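r0 AND r1 (where r0 is the current bitmap) works as follows, depending on r0's representation. If r0.isSmall() is true, iterate over r0 and push every value that also exists in r1 (checked with r1.small.find() or roaring_bitmap_contains(), depending on r1's representation) into the intermediate buffer; then clear r0 and insert the buffered values back into it. If r0.isSmall() is false, call RBM's roaring_bitmap_and_inplace(rb, rb1); when r1 is small, rb1 is a temporary RBM built by getNewRbFromSmall() and released with roaring_bitmap_free() afterward.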
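A simplified sketch of the rb_and branches, again with std::set<uint32_t> standing in for roaring_bitmap_t and a plain std::vector for the small set. BitmapSketch and rbAndSketch are hypothetical names; the temporary std::set plays the role of getNewRbFromSmall().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in: a bitmap is either small (a plain vector, like
// SmallSet) or large (std::set standing in for roaring_bitmap_t).
struct BitmapSketch
{
    bool large = false;
    std::vector<uint32_t> small;
    std::set<uint32_t> rb;

    bool contains(uint32_t v) const
    {
        return large ? rb.count(v) > 0
                     : std::find(small.begin(), small.end(), v) != small.end();
    }
};

// Mirrors the branches of rb_and: r0 is intersected with r1 in place.
void rbAndSketch(BitmapSketch & r0, const BitmapSketch & r1)
{
    if (!r0.large)
    {
        // r0 small, r1 small or large: keep the values of r0 that r1 contains.
        std::vector<uint32_t> buffer;
        for (uint32_t x : r0.small)
            if (r1.contains(x))
                buffer.push_back(x);
        r0.small = std::move(buffer);
        return;
    }

    // r0 large: if r1 is small, build a temporary large set from it
    // (the role getNewRbFromSmall() plays), then intersect in place.
    std::set<uint32_t> tmp;
    if (!r1.large)
        tmp.insert(r1.small.begin(), r1.small.end());
    const std::set<uint32_t> & other = r1.large ? r1.rb : tmp;
    for (auto it = r0.rb.begin(); it != r0.rb.end();)
        it = other.count(*it) ? std::next(it) : r0.rb.erase(it);
}
```

When r0 is small, the intersection can only shrink it, so that branch never needs to convert r0 to the large form.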

Third, the implementation of bitmapAndCardinality:

        /**
         * Computes the cardinality of the intersection between two bitmaps.
         * (the implementation of bitmapAndCardinality)
         */
        UInt64 rb_and_cardinality(const RoaringBitmapWithSmallSet &r1) const
        {
            UInt64 retSize = 0;
            if (isSmall() && r1.isSmall())
            {
                for (const auto &x : small)
                    if (r1.small.find(x.getValue()) != r1.small.end())
                        retSize++;
            } else if (isSmall() && r1.isLarge())
            {
                for (const auto &x : small)
                    if (roaring_bitmap_contains(r1.rb, x.getValue()))
                        retSize++;
            } else
            {
                roaring_bitmap_t *rb1 = r1.isSmall() ? r1.getNewRbFromSmall() : r1.getRb();
                retSize = roaring_bitmap_and_cardinality(rb, rb1);
                if (r1.isSmall())
                    roaring_bitmap_free(rb1);
            }
            return retSize;
        }

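This method returns the cardinality of the AND of two bitmaps without building the result bitmap. Let's focus on the call retSize = roaring_bitmap_and_cardinality(rb, rb1); its implementation in CRoaring is: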

uint64_t roaring_bitmap_and_cardinality(const roaring_bitmap_t *x1,
                                        const roaring_bitmap_t *x2) {
    const int length1 = x1->high_low_container.size,
              length2 = x2->high_low_container.size;
    uint64_t answer = 0;
    int pos1 = 0, pos2 = 0;

    while (pos1 < length1 && pos2 < length2) {
        const uint16_t s1 = ra_get_key_at_index(&x1->high_low_container, pos1);
        const uint16_t s2 = ra_get_key_at_index(&x2->high_low_container, pos2);

        if (s1 == s2) {
            uint8_t container_type_1, container_type_2;
            void *c1 = ra_get_container_at_index(&x1->high_low_container, pos1,
                                                 &container_type_1);
            void *c2 = ra_get_container_at_index(&x2->high_low_container, pos2,
                                                 &container_type_2);
            answer += container_and_cardinality(c1, container_type_1, c2,
                                                container_type_2);
            ++pos1;
            ++pos2;
        } else if (s1 < s2) {  // s1 < s2
            pos1 = ra_advance_until(&x1->high_low_container, s2, pos1);
        } else {  // s1 > s2
            pos2 = ra_advance_until(&x2->high_low_container, s1, pos2);
        }
    }
    return answer;
}
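The loop above walks the two sorted key arrays in step. The toy version below reproduces that control flow under simplifying assumptions: ToyRoaringArray, advanceUntil, and arrayAndCardinality are hypothetical; every container is a sorted std::vector<uint16_t> (so only the array-array case of container_and_cardinality is modeled); and advanceUntil uses std::lower_bound, which may differ from the search strategy CRoaring's ra_advance_until actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model of high_low_container: keys sorted ascending,
// containers[i] holds the sorted low 16 bits that belong to keys[i].
struct ToyRoaringArray
{
    std::vector<uint16_t> keys;
    std::vector<std::vector<uint16_t>> containers;
};

// Stand-in for ra_advance_until: first index >= pos whose key is >= target.
int advanceUntil(const ToyRoaringArray & ra, uint16_t target, int pos)
{
    auto it = std::lower_bound(ra.keys.begin() + pos, ra.keys.end(), target);
    return static_cast<int>(it - ra.keys.begin());
}

// Stand-in for container_and_cardinality, array-array case only.
uint64_t arrayAndCardinality(const std::vector<uint16_t> & a, const std::vector<uint16_t> & b)
{
    uint64_t count = 0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i] == b[j]) { ++count; ++i; ++j; }
        else if (a[i] < b[j]) ++i;
        else ++j;
    }
    return count;
}

// Same control flow as roaring_bitmap_and_cardinality.
uint64_t toyAndCardinality(const ToyRoaringArray & x1, const ToyRoaringArray & x2)
{
    const int length1 = static_cast<int>(x1.keys.size());
    const int length2 = static_cast<int>(x2.keys.size());
    uint64_t answer = 0;
    int pos1 = 0, pos2 = 0;
    while (pos1 < length1 && pos2 < length2)
    {
        const uint16_t s1 = x1.keys[pos1], s2 = x2.keys[pos2];
        if (s1 == s2)
        {
            // Same high 16 bits: only these two containers can share values.
            answer += arrayAndCardinality(x1.containers[pos1], x2.containers[pos2]);
            ++pos1;
            ++pos2;
        }
        else if (s1 < s2)
            pos1 = advanceUntil(x1, s2, pos1);  // skip keys of x1 below s2
        else
            pos2 = advanceUntil(x2, s1, pos2);  // skip keys of x2 below s1
    }
    return answer;
}
```

Because the keys are sorted, containers whose key appears in only one bitmap are skipped without ever being examined.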

To follow this code, we need to understand how RBM's high_low_container is implemented:

typedef struct roaring_array_s {
    int32_t size;
    int32_t allocation_size;
    void **containers;
    uint16_t *keys;
    uint8_t *typecodes;
} roaring_array_t;

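(Quoting a couple of sentences from elsewhere.) The main idea of RBM is to bucket 32-bit unsigned integers by their high 16 bits, so there can be at most 2^16 = 65536 buckets, which the paper calls containers. To store a value, RBM finds the container for its high 16 bits (creating one if none exists) and then puts the low 16 bits into that container. In other words, an RBM is a collection of containers.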

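For each 32-bit integer, the high 16 bits are stored as a key in the keys array, and the low 16 bits are treated as the value and stored in one of the containers; keys and containers correspond one-to-one by index. (The quoted description uses the Java declarations short[] keys and Container[] values; in the CRoaring struct above these are uint16_t *keys and void **containers, with typecodes recording each container's type.)
size is the number of key-container pairs currently held, i.e. the number of valid entries in keys and containers.
The keys array is always kept sorted, which makes binary search possible.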
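A worked example of the split: 65537 = 0x00010001, so its high 16 bits give key 1 and its low 16 bits give value 1. The toy structure below (hypothetical ToyRoaring, array containers only, whereas a real RBM also uses bitset and run containers) keeps keys sorted with std::lower_bound and pairs keys[i] with containers[i] by index, mirroring the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy Roaring layout with array containers only.
// keys stay sorted (binary search), and keys[i] pairs with containers[i].
struct ToyRoaring
{
    std::vector<uint16_t> keys;
    std::vector<std::vector<uint16_t>> containers;  // each kept sorted

    void add(uint32_t x)
    {
        const auto high = static_cast<uint16_t>(x >> 16);     // bucket key
        const auto low = static_cast<uint16_t>(x & 0xFFFF);   // stored in the bucket
        auto kit = std::lower_bound(keys.begin(), keys.end(), high);
        const auto i = kit - keys.begin();
        if (kit == keys.end() || *kit != high)
        {
            // No container for this high part yet: create one in sorted position.
            keys.insert(kit, high);
            containers.insert(containers.begin() + i, std::vector<uint16_t>{});
        }
        std::vector<uint16_t> & c = containers[i];
        auto vit = std::lower_bound(c.begin(), c.end(), low);
        if (vit == c.end() || *vit != low)
            c.insert(vit, low);  // keep the container sorted, no duplicates
    }

    bool contains(uint32_t x) const
    {
        const auto high = static_cast<uint16_t>(x >> 16);
        const auto low = static_cast<uint16_t>(x & 0xFFFF);
        auto kit = std::lower_bound(keys.begin(), keys.end(), high);
        if (kit == keys.end() || *kit != high)
            return false;
        const std::vector<uint16_t> & c = containers[kit - keys.begin()];
        return std::binary_search(c.begin(), c.end(), low);
    }
};
```

Adding 65537 and then 3 yields keys {0, 1}: the later, smaller key is inserted in front so that binary search stays valid.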

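container_and_cardinality() handles each of the 9 possible combinations of the 3 container types (array, bitset, and run) separately. (Below this point the code gets even lower-level, so I won't dig further for now.)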
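To give a feel for what the per-combination cases look like, here is a hedged sketch of 3 of the 9 combinations with hypothetical types: a sparse container as a sorted std::vector<uint16_t> and a dense one as std::bitset<65536>. Run containers are left out, and CRoaring's actual per-pair functions are not reproduced here.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified containers for one 16-bit bucket:
//   ArrayContainer  - sorted low-16-bit values (suits sparse buckets)
//   BitsetContainer - one bit per possible low value (suits dense buckets)
// Run containers, the third CRoaring type, are left out.
using ArrayContainer = std::vector<uint16_t>;
using BitsetContainer = std::bitset<65536>;

// array AND array: merge the two sorted arrays and count matches.
uint64_t andCardinality(const ArrayContainer & a, const ArrayContainer & b)
{
    uint64_t n = 0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i] == b[j]) { ++n; ++i; ++j; }
        else if (a[i] < b[j]) ++i;
        else ++j;
    }
    return n;
}

// array AND bitset: probe each array value in the bitset.
uint64_t andCardinality(const ArrayContainer & a, const BitsetContainer & b)
{
    uint64_t n = 0;
    for (uint16_t v : a)
        n += b.test(v) ? 1 : 0;
    return n;
}

// bitset AND bitset: bitwise AND, then count the set bits.
uint64_t andCardinality(const BitsetContainer & a, const BitsetContainer & b)
{
    return (a & b).count();
}
```

Each combination picks the cheapest strategy for its pair of layouts; the remaining combinations, including those involving run containers, would need their own loops.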

If you're interested, see also the article "RoaringBitmap Data Structure and Principles".
