ClickHouse Notes: Data Dictionaries

qq_23016999

Article link: https://blog.csdn.net/qq_23016999/article/details/107933885

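Data dictionaries

1. Built-in dictionaries

2. External dictionaries

2.1. Preparation

The examples below use CSV data. In ClickHouse's configuration file config.xml, the setting <dictionaries_config>*_dictionary.xml</dictionaries_config> causes every configuration file matching that pattern to be loaded by default, and dictionaries can be updated without restarting the server. The complete configuration structure is as follows: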

<?xml version="1.0"?>
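<dictionaries>
    <dictionary>
        <name>geo</name>

        <structure>
            <!-- Structure of the dictionary data -->
            <!-- <id> or <key> defines the dictionary key, comparable to a primary key
                 in a database; use one of the two, not both.
                 <id> is a single numeric UInt64 key, used by flat, hashed,
                 range_hashed and cache. -->
            <id>
                <!-- Key attributes -->
            </id>

            <!-- Composite key -->
            <key>
                <attribute>
                    <name>field1</name>
                    <type>String</type>
                </attribute>
                <attribute>
                    <name>field2</name>
                    <type>UInt64</type>
                </attribute>
            </key>

            <attribute>
                <!-- Attribute (field) settings -->
            </attribute>
        </structure>

        <layout>
            <!-- In-memory storage layout; 7 types -->
        </layout>

        <source>
            <!-- Data source: files, databases, or other sources -->
        </source>

        <lifetime>
            <!-- Automatic update interval of the dictionary -->
        </lifetime>
    </dictionary>
</dictionaries>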
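
For orientation, the dictionaries_config setting mentioned earlier is a child of the root element of config.xml. The fragment below is a minimal sketch, assuming a recent release where the root element is <clickhouse>; it shows only this one setting and omits everything else a real config.xml contains.

```xml
<!-- config.xml (fragment). Assumption: recent releases use <clickhouse> as the
     root element; older releases use <yandex> instead. -->
<clickhouse>
    <!-- Every file whose name matches this pattern is loaded as a dictionary definition -->
    <dictionaries_config>*_dictionary.xml</dictionaries_config>
</clickhouse>
```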

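The settings available inside the attribute tag are:

Setting	Required	Default	Description
name	Yes	–	Attribute name
type	Yes	–	Attribute data type
null_value	Yes	–	Default value returned when the queried key has no matching element
expression	No	no expression	An expression; it may call functions or use operators
hierarchical	No	false	Whether hierarchy is supported
injective	No	false	Whether the key-to-value mapping is injective (one-to-one). When enabled, a query that groups by a value fetched with dictGet lets ClickHouse apply dictGet after the GROUP BY instead of once per row
is_object_id	No	false	Whether to enable the MongoDB optimization, which queries MongoDB documents by ObjectID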
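
As an illustration of how these settings combine, the sketch below declares two attributes for a dictionary keyed by <id>. The column name parent_id and the default values 0 and unknown are hypothetical and are not part of this article's examples; the source data is assumed to contain matching columns.

```xml
<structure>
    <id>
        <name>id</name>
    </id>

    <!-- Hypothetical column holding the key of the parent row -->
    <attribute>
        <name>parent_id</name>
        <type>UInt64</type>
        <null_value>0</null_value>
        <hierarchical>true</hierarchical>
    </attribute>

    <!-- Assumes every id maps to a distinct code, so the mapping is injective -->
    <attribute>
        <name>code</name>
        <type>String</type>
        <null_value>unknown</null_value>
        <injective>true</injective>
    </attribute>
</structure>
```

With this structure, a lookup for a key that is missing from the source would return 0 for parent_id and unknown for code.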

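2.2. External dictionary types

A dictionary's type is defined by the layout element; seven types are supported at the time of writing. The type determines how the data is stored in memory and which key types the dictionary supports. One group uses a single numeric key: flat, hashed, range_hashed and cache. The other uses a composite key: complex_key_hashed, complex_key_cache and ip_trie.

2.2.1. flat

flat is the highest-performing dictionary type. It accepts only a numeric UInt64 key and stores data in an array whose initial size is 1024 and whose upper limit is 500000. If a key exceeds that limit while the dictionary is being created, creation fails. An example of a flat dictionary follows.

Test data: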

1,"a0001","R&D Department"
2,"a0002","Product Department"
3,"a0003","Data Department"
4,"a0004","Testing Department"
5,"a0005","Operations Department"
6,"a0006","Planning Department"

Configuration file:

<?xml version="1.0"?>
<dictionaries>
    <dictionary>
        <name>test_flat_dict</name>
        <!-- Data source -->
        <source>
            <file>
                <path>/home/data/ch/organization.csv</path>
                <format>CSV</format>
            </file>
        </source>

        <!-- Dictionary type -->
        <layout>
            <flat/>
        </layout>

        <!-- Must match the structure of the data -->
        <structure>
            <id>
                <name>id</name>
            </id>
            
            <attribute>
                <name>code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>

            <attribute>
                <name>name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>

        <lifetime>
            <min>300</min>
            <max>360</max>
        </lifetime>
    </dictionary>
</dictionaries>
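
The array bounds of a flat dictionary can be set explicitly in newer ClickHouse releases. The fragment below is a sketch that assumes a server version supporting the initial_array_size and max_array_size settings of the flat layout; the numbers are arbitrary illustrative values, and on older versions only the bare <flat/> form is available.

```xml
<layout>
    <flat>
        <!-- Assumed values, chosen only for illustration -->
        <initial_array_size>50000</initial_array_size>
        <max_array_size>5000000</max_array_size>
    </flat>
</layout>
```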

Dictionary data is loaded lazily, so the dictionary's status is initially NOT_LOADED.

After the dictionary has been queried, its status changes to LOADED.

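2.2.2. hashed

A hashed dictionary differs from flat in its storage: flat uses an array, whereas hashed uses a hash table and has no upper size limit.
A configuration example for a hashed dictionary: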

<?xml version="1.0"?>
<dictionaries>
    <dictionary>
        <name>test_hashed_dict</name>
        <!-- Data source -->
        <source>
            <file>
                <path>/home/data/ch/organization.csv</path>
                <format>CSV</format>
            </file>
        </source>

        <!-- Dictionary type: the only part that differs from the flat example -->
        <layout>
            <hashed/>
        </layout>

        <!-- Must match the structure of the data -->
        <structure>
            <id>
                <name>id</name>
            </id>

            <attribute>
                <name>code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>

            <attribute>
                <name>name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>

        <lifetime>
            <min>300</min>
            <max>360</max>
        </lifetime>
    </dictionary>
</dictionaries>

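2.2.3. range_hashed

range_hashed extends the hashed dictionary with time ranges: data is stored in a hash table and sorted by time. The range is specified with the range_min and range_max elements, and the fields they name must be of type Date or DateTime.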

<?xml version="1.0"?>
<dictionaries>
    <dictionary>
        <name>test_range_hashed_dict</name>
        <!-- Data source -->
        <source>
            <file>
                <path>/home/data/ch/sales.csv</path>
                <format>CSV</format>
            </file>
        </source>

        <!-- Dictionary type -->
        <layout>
            <range_hashed/>
        </layout>

        <!-- Must match the structure of the data -->
        <structure>
            <id>
                <name>id</name>
            </id>

            <range_min>
                <name>start</name>
                <!-- If type is not specified, Date is used by default -->
            </range_min>

            <range_max>
                <name>end</name>
            </range_max>

            <attribute>
                <name>price</name>
                <type>Float32</type>
                <null_value></null_value>
            </attribute>
        </structure>

        <lifetime>
            <min>300</min>
            <max>360</max>
        </lifetime>
    </dictionary>
</dictionaries>

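2.2.4. cache

A cache dictionary is kept in memory in a fixed-length array of cells; the number of cells is automatically rounded up to a power of two. Unlike the other dictionary types, it does not load all of its data into memory after the first query. Instead, it loads an entry only when that key is requested. Its performance is therefore the least stable and depends entirely on the cache hit rate (hit rate = number of hits / number of lookups).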
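
A configuration for this layout might look like the sketch below. It assumes a ClickHouse table as the source, since cache dictionaries do not read local files directly; the table name t_organization, the connection details, and the size_in_cells value of 10000 are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<dictionaries>
    <dictionary>
        <name>test_cache_dict</name>

        <!-- Assumed source: a local ClickHouse table with columns id, code, name -->
        <source>
            <clickhouse>
                <host>localhost</host>
                <port>9000</port>
                <user>default</user>
                <password></password>
                <db>default</db>
                <table>t_organization</table>
            </clickhouse>
        </source>

        <!-- Number of cache cells; illustrative value -->
        <layout>
            <cache>
                <size_in_cells>10000</size_in_cells>
            </cache>
        </layout>

        <structure>
            <id>
                <name>id</name>
            </id>

            <attribute>
                <name>code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>

            <attribute>
                <name>name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>

        <lifetime>
            <min>300</min>
            <max>360</max>
        </lifetime>
    </dictionary>
</dictionaries>
```

A larger size_in_cells raises the hit rate at the cost of memory, so the value is normally tuned against the number of distinct keys actually queried.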

2.2.5. complex_key_hashed

This dictionary type is functionally identical to hashed, except that the single numeric key is replaced by a composite key.

<?xml version="1.0"?>
<dictionaries>
    <dictionary>
        <name>test_complex_hashed_dict</name>
        <!-- Data source -->
        <source>
            <file>
                <path>/home/data/ch/organization.csv</path>
                <format>CSV</format>
            </file>
        </source>

        <!-- Dictionary type -->
        <layout>
            <complex_key_hashed/>
        </layout>

        <!-- Must match the structure of the data -->
        <structure>
            <key>
                <attribute>
                    <name>id</name>
                    <type>UInt64</type>
                </attribute>
                <attribute>
                    <name>code</name>
                    <type>String</type>
                </attribute>
            </key>

            <attribute>
                <name>name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>

        <lifetime>
            <min>300</min>
            <max>360</max>
        </lifetime>
    </dictionary>
</dictionaries>

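To query it, pass the composite key to dictGet as a tuple, for example dictGet('test_complex_hashed_dict', 'name', (toUInt64(1), 'a0001')).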

2.2.6. complex_key_cache

This type builds on the cache dictionary, replacing the single numeric key with a composite key.
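
The fragment below sketches how the layout and structure sections could look for this type, assuming the same id, code and name columns as the organization examples. The size_in_cells value is an arbitrary illustrative number, and the source and lifetime sections are omitted; they would be configured as for a cache dictionary.

```xml
<layout>
    <complex_key_cache>
        <!-- Number of cache cells; illustrative value -->
        <size_in_cells>10000</size_in_cells>
    </complex_key_cache>
</layout>

<structure>
    <!-- Composite key made of id and code -->
    <key>
        <attribute>
            <name>id</name>
            <type>UInt64</type>
        </attribute>
        <attribute>
            <name>code</name>
            <type>String</type>
        </attribute>
    </key>

    <attribute>
        <name>name</name>
        <type>String</type>
        <null_value></null_value>
    </attribute>
</structure>
```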

2.2.7. ip_trie

This type is designed specifically for IP prefix lookups. The configuration is as follows:

<?xml version="1.0"?>
<dictionaries>
    <dictionary>
        <name>test_ip_trie_dict</name>
        <!-- Data source -->
        <source>
            <file>
                <path>/home/data/ch/asn.csv</path>
                <format>CSV</format>
            </file>
        </source>

        <!-- Dictionary type -->
        <layout>
            <ip_trie/>
        </layout>

        <!-- Must match the structure of the data -->
        <structure>
            <key>
                <attribute>
                    <name>prefix</name>
                    <type>String</type>
                </attribute>
            </key>

            <attribute>
                <name>asn</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>

            <attribute>
                <name>country</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>

        <lifetime>
            <min>300</min>
            <max>360</max>
        </lifetime>
    </dictionary>
</dictionaries>

2.2.8. Summary

Name	Storage structure	Key type	Supported sources
flat	Array	UInt64	Local file, Executable file, HTTP, DBMS
hashed	Hash table	UInt64	Local file, Executable file, HTTP, DBMS
range_hashed	Hash table, sorted by time	UInt64 and time range	Local file, Executable file, HTTP, DBMS
complex_key_hashed	Hash table	Composite key	Local file, Executable file, HTTP, DBMS
ip_trie	Hierarchical structure (trie)	Composite key (a single String)	Local file, Executable file, HTTP, DBMS
cache	Fixed-size array	UInt64	Executable file, HTTP, ClickHouse, MySQL
complex_key_cache	Fixed-size array	Composite key	Executable file, HTTP, ClickHouse, MySQL

2.3. Data sources
