ClickHouse on 1 Billion Rows of Medical Data

Distributed cluster

Shards

shard_count = node_count - replica_count + 1

A replica_count below 2 provides no redundancy, so it must be at least 2. If node_count = 3 and replica_count = 2, then shard_count = 2; but 2 shards × 2 replicas means 4 replica placements on only 3 nodes, so one node would have to carry two of them — a problematic configuration. So node_count should be at least 4.
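The sizing rule above can be written down as a tiny helper (a sketch of the arithmetic only; the function name is ours, not a ClickHouse API):

```python
def shard_count(node_count: int, replica_count: int) -> int:
    """Shard count from the rule of thumb above: shards = nodes - replicas + 1."""
    if replica_count < 2:
        raise ValueError("replica_count below 2 gives no redundancy")
    return node_count - replica_count + 1

# The case from the text: 3 nodes and 2 replicas leave room for only 2 shards.
print(shard_count(3, 2))  # 2
```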

Configuration

config.xml (the full file is long; only the key settings are shown):

<listen_host>::</listen_host>
<path>/data/clickhouse/</path>
<max_server_memory_usage_to_ram_ratio>10000000000</max_server_memory_usage_to_ram_ratio>

users.xml

<yandex>
    <!-- Profiles of settings. -->
    <profiles>
        <!-- Default settings. -->
        <default>
            <!-- Maximum memory usage for processing single query, in bytes. -->
            <max_memory_usage>10000000000</max_memory_usage>

            <!-- Use cache of uncompressed blocks of data. Meaningful only for processing many of very short queries. -->
            <use_uncompressed_cache>0</use_uncompressed_cache>

            <!-- How to choose between replicas during distributed query processing.
                 random - choose random replica from set of replicas with minimum number of errors
                 nearest_hostname - from set of replicas with minimum number of errors, choose replica
                  with minimum number of different symbols between replica's hostname and local hostname
                  (Hamming distance).
                 in_order - first live replica is chosen in specified order.
                 first_or_random - if first replica one has higher number of errors, pick a random one from replicas with minimum number of errors.
            -->
            <load_balancing>random</load_balancing>
            <max_partitions_per_insert_block>1000</max_partitions_per_insert_block>
            <prefer_localhost_replica>1</prefer_localhost_replica>
            <max_parallel_replicas>2</max_parallel_replicas>
        </default>

        <!-- Profile that allows only read queries. -->
        <readonly>
            <readonly>1</readonly>
        </readonly>
    </profiles>

    <!-- Users and ACL. -->
    <users>
        <!-- If user name was not specified, 'default' user is used. -->
        <default>
            <!-- Password could be specified in plaintext or in SHA256 (in hex format).

                 If you want to specify password in plaintext (not recommended), place it in 'password' element.
                 Example: <password>qwerty</password>.
                 Password could be empty.

                 If you want to specify SHA256, place it in 'password_sha256_hex' element.
                 Example: <password_sha256_hex>65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex>
                 Restrictions of SHA256: impossibility to connect to ClickHouse using MySQL JS client (as of July 2019).

                 If you want to specify double SHA1, place it in 'password_double_sha1_hex' element.
                 Example: <password_double_sha1_hex>e395796d6546b1b65db9d665cd43f0e858dd4303</password_double_sha1_hex>

                 If you want to specify a previously defined LDAP server (see 'ldap_servers' in main config) for authentication, place its name in 'server' element inside 'ldap' element.
                 Example: <ldap><server>my_ldap_server</server></ldap>

                 How to generate decent password:
                 Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
                 In first line will be password and in second - corresponding SHA256.

                 How to generate double SHA1:
                 Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
                 In first line will be password and in second - corresponding double SHA1.
            -->
            <password>xxxx</password>

            <!-- List of networks with open access.

                 To open access from everywhere, specify:
                    <ip>::/0</ip>

                 To open access only from localhost, specify:
                    <ip>::1</ip>
                    <ip>127.0.0.1</ip>

                 Each element of list has one of the following forms:
                 <ip> IP-address or network mask. Examples: 213.180.204.3 or 10.0.0.1/8 or 10.0.0.1/255.255.255.0
                     2a02:6b8::3 or 2a02:6b8::3/64 or 2a02:6b8::3/ffff:ffff:ffff:ffff::.
                 <host> Hostname. Example: server01.yandex.ru.
                     To check access, DNS query is performed, and all received addresses compared to peer address.
                 <host_regexp> Regular expression for host names. Example, ^server\d\d-\d\d-\d\.yandex\.ru$
                     To check access, DNS PTR query is performed for peer address and then regexp is applied.
                     Then, for result of PTR query, another DNS query is performed and all received addresses compared to peer address.
                     Strongly recommended that regexp is ends with $
                 All results of DNS requests are cached till server restart.
            -->
            <networks incl="networks" replace="replace">
                <ip>::/0</ip>
            </networks>

            <!-- Settings profile for user. -->
            <profile>default</profile>

            <!-- Quota for user. -->
            <quota>default</quota>

            <!-- User can create other users and grant rights to them. -->
            <!-- <access_management>1</access_management> -->
        </default>
    </users>

    <!-- Quotas. -->
    <quotas>
        <!-- Name of quota. -->
        <default>
            <!-- Limits for time interval. You could specify many intervals with different limits. -->
            <interval>
                <!-- Length of interval. -->
                <duration>3600</duration>

                <!-- No limits. Just calculate resource usage for time interval. -->
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
</yandex>

metrika.xml (distributed cluster configuration)

<yandex>
        <clickhouse_remote_servers>
                <hqms_clusters>
                        <shard>
                                <internal_replication>true</internal_replication>
                                        <replica>
                                                <host>10.19.54.102</host>
                                                <port>9000</port>
                                                <user>default</user>
                                                <password>hqms</password>
                                        </replica>
                                        <replica>
                                                <host>10.19.54.103</host>
                                                <port>9000</port>
                                                <user>default</user>
                                                <password>hqms</password>
                                        </replica>
                        </shard>
                        <shard>
                                <internal_replication>true</internal_replication>
                                        <replica>
                                                <host>10.19.54.104</host>
                                                <port>9000</port>
                                                <user>default</user>
                                                <password>hqms</password>
                                        </replica>
                                        <replica>
                                                <host>10.19.54.105</host>
                                                <port>9000</port>
                                                <user>default</user>
                                                <password>hqms</password>
                                        </replica>
                        </shard>
                        <!--<shard>
                                <internal_replication>true</internal_replication>
                                        <replica>
                                                <host>10.19.54.104</host>
                                                <port>9000</port>
                                                <user>default</user>
                                                <password>hqms</password>
                                        </replica>
                                        <replica>
                                                <host>10.19.54.102</host>
                                                <port>9000</port>
                                                <user>default</user>
                                                <password>hqms</password>
                                        </replica>
                        </shard>-->
                </hqms_clusters>
        </clickhouse_remote_servers>

        <zookeeper-servers>
                <node index="1">
                        <host>10.19.54.102</host>
                        <port>2181</port>
                </node>
                <node index="2">
                        <host>10.19.54.103</host>
                        <port>2181</port>
                </node>
                <node index="3">
                        <host>10.19.54.104</host>
                        <port>2181</port>
                </node>
        </zookeeper-servers>

        <macros>
                <shard>01</shard>
                <replica>10.19.54.102</replica>
        </macros>

        <networks>
                <ip>::/0</ip>
        </networks>

</yandex>

Creating the distributed table

See the official documentation.

Local table (created on each node):

CREATE TABLE test1.jtest1 (id Int32, age Int32) ENGINE = ReplicatedMergeTree('/clickhouse/tables/02/test1.jtest1', '10.19.54.105') PARTITION BY id ORDER BY id
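Each node runs this DDL with its own ZooKeeper shard path and replica name (the values the <macros> block in metrika.xml is meant to supply). A sketch of generating the per-node statements, with the shard layout read off the metrika.xml above:

```python
# Shard layout from metrika.xml: shard 01 = 102/103, shard 02 = 104/105.
LAYOUT = {
    "01": ["10.19.54.102", "10.19.54.103"],
    "02": ["10.19.54.104", "10.19.54.105"],
}

DDL = (
    "CREATE TABLE test1.jtest1 (id Int32, age Int32) "
    "ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test1.jtest1', '{replica}') "
    "PARTITION BY id ORDER BY id"
)

def ddl_per_host(layout):
    """Map each host to the CREATE TABLE it should run, shard and replica filled in."""
    return {
        host: DDL.format(shard=shard, replica=host)
        for shard, hosts in layout.items()
        for host in hosts
    }

# The statement for 10.19.54.105 matches the hand-written DDL in the text.
print(ddl_per_host(LAYOUT)["10.19.54.105"])
```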

Distributed table:

CREATE TABLE IF NOT EXISTS test1.jtest_all AS test1.jtest1 ENGINE = Distributed(hqms_clusters, test1, jtest1, rand())

Problems

Problem 1

code: 241, host: 192.168.29.124, port: 18123; Code: 241, e.displayText() = DB::Exception: Memory limit (total) exceeded: would use 15.44 GiB (attempt to allocate chunk of 4294967296 bytes), maximum: 13.95 GiB (version 20.6.3.28 (official build))
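The figures in the message are consistent with the defaults. A sketch of the arithmetic, assuming (not stated in the log) that the host has about 15.5 GiB of physical RAM:

```python
GIB = 2 ** 30

chunk = 4294967296            # the allocation that failed, from the error message
print(chunk / GIB)            # 4.0 -- a single 4 GiB chunk

ram = 15.5 * GIB              # assumed physical RAM of the host
ratio = 0.9                   # default max_server_memory_usage_to_ram_ratio
limit = ram * ratio / GIB
print(round(limit, 2))        # 13.95 -- matches the reported maximum
```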

Reference

Parameter descriptions from the official documentation:

max_server_memory_usage
Limits total RAM usage of the ClickHouse server.
Possible values:
- Positive integer.
- 0 (auto).
Default value: 0.
Additional details: the default max_server_memory_usage is computed as memory_amount * max_server_memory_usage_to_ram_ratio.
See also: max_memory_usage, max_server_memory_usage_to_ram_ratio.

max_server_memory_usage_to_ram_ratio
Defines the fraction of total physical RAM available to the ClickHouse server. If the server tries to use more, its memory usage is cut down to the appropriate amount.
Possible values:
- Positive double.
- 0 (the ClickHouse server can use all available RAM).
Default value: 0.9.

Usage
On hosts with low RAM and swap, you may need to set max_server_memory_usage_to_ram_ratio larger than 1.

Example:

<max_server_memory_usage_to_ram_ratio>0.9</max_server_memory_usage_to_ram_ratio>
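Putting the two settings together, a sketch of how the effective server-wide limit is derived (the 16 GiB RAM figure is an arbitrary example, and effective_limit is our name for the rule the docs describe):

```python
GIB = 2 ** 30

def effective_limit(ram_bytes, max_server_memory_usage=0, ratio=0.9):
    """max_server_memory_usage = 0 means auto: fall back to ram * ratio."""
    if max_server_memory_usage == 0:
        return ram_bytes * ratio
    return max_server_memory_usage

ram = 16 * GIB
print(effective_limit(ram) / GIB)              # 14.4 -- default 0.9 ratio
print(effective_limit(ram, ratio=2.0) / GIB)   # 32.0 -- ratio above 1 lifts the cap
```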

Solution

1. Increase max_server_memory_usage (10 GB here). Raising it to 20 GB made the error go away, but the data keeps growing, so this limit cannot be raised forever.
2. Change max_server_memory_usage_to_ram_ratio instead. Its default is 0.9, which looks like a percentage, but it is in fact a ratio that may be set above 1; setting it to 2 resolved the problem. As defined above, it is the fraction of total physical RAM the ClickHouse server may use; if the server tries to use more, memory usage is cut down to the appropriate amount.

Problem 2

The data volume is large and ingestion very fast; with limited resources, writes saturated the 1 Gbps network bandwidth of a single node, causing ClickHouse connection failures (bandwidth maxed out).

Solution

Write through the distributed setup instead, round-robining inserts across the machines of the different shards to spread the bandwidth load.
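The round-robin scheme can be sketched as a cycle over one entry-point host per shard (host list from metrika.xml above; send_batch is a hypothetical stand-in for whatever client call actually performs the INSERT):

```python
from itertools import cycle

# One entry-point host per shard, from metrika.xml.
SHARD_HOSTS = ["10.19.54.102", "10.19.54.104"]

def send_batch(host, batch):
    """Hypothetical stand-in: in reality this would INSERT via a ClickHouse client."""
    return (host, batch)

writer = cycle(SHARD_HOSTS)
plan = [send_batch(next(writer), f"batch-{i}") for i in range(4)]
print(plan)  # batches alternate between the two shard hosts
```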
