Centos 6.4 / ClickHouse高可用集群配置

ClickHouse高可用集群配置

参考如下文档:

 ClickHouse高可用集群的配置:    https://www.cnblogs.com/freeweb/p/9352947.html 

 副本表以及分布式表简单实践 :  https://www.cnblogs.com/leodaxin/p/10023641.html

 ClickHouse之集群搭建以及数据复制:https://blog.csdn.net/weixin_34269583/article/details/85880207

ClickHouse分布式高可用集群搭建: https://cloud.tencent.com/developer/article/1376125

 

一,clickhouse高可用集群的部署方案

       这里主要介绍clickhouse高可用集群的部署方案,因为对于默认的分布式表的配置,每个分片只有一份,这样如果挂掉一个节点,则查询分布式表的时候直接会报错,这个是基于clickhouse自己实现的多分片单副本集群,配置也比较简单,这里说的高可用是指,每个分片具有2个或以上副本,当某个节点挂掉时,该节点分片仍可以由其他机器上的副本替代工作,所以这样实现的分布式集群可以在挂掉至少1个节点时机器正常运行,随着集群节点数量的增加,则集群挂掉2个节点或以上可提供服务的概率也越大,至少能避免单点故障问题,集群的稳定性也更高。

       clickhouse集群的理想方案是如下所示:

   这里有3个集群,每个集群n个节点,每个节点的数据依靠zookeeper协调同步,比如cluster1提供服务,如果cluster1里面挂掉多台机器那么cluster2的副本可以切换过来提供服务,如果cluster2的分片再挂了,那么cluster3中的副本也可以提供服务,cluster1~3同时挂掉的概率就非常小了,所以集群的稳定性可以非常高,其中单个集群的节点个数n决定了clickhouse的性能,性能是可以线性扩展的,具体副本集群的个数根据机器资源配置.

    如果机器资源确实特别少,想每个节点都用上提供服务的话,那么可以每个节点存储两个以上的副本,即提供服务的分片和其他机器的副本,实现相互备份,但是clickhouse不支持单个节点多个分片的配置,我们可以人为设置在每个节点上启动两个实例来实现,设计图如下:

 可以看出来3个节点每个节点的tcp 9000对外提供服务,9001提供副本,其中2提供1的备份,3提供2的备份,1提供3的备份,这样假设挂掉1个节点,集群也可以正常使用,但是挂掉2个几点,就不正常了,这样的话是机器越多越稳定一些.

    上面两种方案,官网上还是推荐的第一种方案可用性最高,这里为了演示采用第二种方式配置,其实两种方式的配置是完全一样的,第二种配置反而更繁琐一些,下面详细说一下配置的流程,软件包结构就采用上一篇文章打包好的.

     zookeeper 需要单独部署在其他机器上,以免clickhouse 节点挂掉,引起zookeeper 挂掉。

    0. 高可用原理:zookeeper + ReplicatedMergeTree(复制表) + Distributed(分布式表)

    1. 前提准备:所有节点防火墙关闭或者开放端口;所有节点建立互信(免密码登录);hosts表和主机名一定要集群保持一致正确配置,因为zookeeper返回的是主机名,配置错误或不配置复制表时会失败.

     clickhouse测试节点3个:10.121.8.6  dc-server06(clickhouse1), 10.121.8.7  dc-server07 (clickhouse2),10.121.8.8  dc-server08 (clickhouse3)

     zookeeper测试节点1个:10.121.8.8  dc-server08  (zookeeper)

     配置方案:3个节点点各配置两个clickhouse实例,相互备份.

    dc-server06: 实例1, 端口: tcp 9000, http 8123, 同步端口9009, 类型: 分片1, 副本1

    dc-server06: 实例2, 端口: tcp 9001, http 8124, 同步端口9010, 类型: 分片3, 副本3 (clickhouse3的副本)

    dc-server07: 实例1, 端口: tcp 9000, http 8123, 同步端口9009, 类型: 分片2, 副本1

    dc-server07: 实例2, 端口: tcp 9001, http 8124, 同步端口9010, 类型: 分片1, 副本2 (clickhouse1的副本)

    dc-server08: 实例1, 端口: tcp 9000, http 8123, 同步端口9009, 类型: 分片3, 副本1

    dc-server08: 实例2, 端口: tcp 9001, http 8124, 同步端口9010, 类型: 分片2, 副本2 (clickhouse2的副本)

 

二,环境准备

1)主机配置(根据自身情况)

10.121.8.6  dc-server06  centos 6.4/6.6    32G   200G

10.121.8.7  dc-server07  centos 6.4/6.6    32G   200G

10.121.8.8  dc-server08  centos 6.4/6.6    32G   200G

2)hosts表和主机名

3)所有节点防火墙关闭或者开放端口;

# 1.关闭防火墙

  service iptables stop

  chkconfig iptables off

  chkconfig ip6tables off

# 2.关闭selinux
  vi /etc/sysconfig/selinux
    修改内容:【SELINUX=disabled】

4)所有节点建立互信(免密码登录);

1、在各节点通过ssh-keygen生成RSA密钥和公钥
ssh-keygen -q -t rsa  -N "" -f  ~/.ssh/id_rsa
2、将所有的公钥文件汇总到一个总的授权key文件中,在M01机器执行如下命令,必须逐行执行:
ssh deserver06 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh deserver07 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh deserver08 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3、设置授权key文件的权限,在M01机器执行如下命令:
chmod 600 ~/.ssh/authorized_keys
4、分发授权key文件到所有服务器,必须逐行执行:
scp ~/.ssh/authorized_keys  deserver06:~/.ssh/

scp ~/.ssh/authorized_keys  deserver07:~/.ssh/

scp ~/.ssh/authorized_keys  deserver08:~/.ssh/

三,安装包准备

说明: 

1)当系统为centos  6.4  时 ,需要安装libicu-4.2.1-14.el6.x86_64.rpm 包,clickhouse-common-static-18.14.12-1.el6.x86_64.rpm依赖此包。

2)当系统为centos  6.6  时 ,不需要安装libicu-4.2.1-14.el6.x86_64.rpm包,直接安装其他包即可。

3) 安装资源获取:   https://packagecloud.io/Altinity/clickhouse。

4)也可以直接在csdn 下载:https://download.csdn.net/download/u013205003/10968848

四,安装clickhouse(centos 6.4 为例)

1) 分别在个服务器上进行如下安装和配置

2)安装clickhouse

   安装 libicu

    mkdir -p /usr/local/icu/
    cd  /usr/local/icu/libicu-4.2.1-14.el6.x86_64.rpm
    rpm -ivh  libicu-4.2.1-14.el6.x86_64.rpm

安装clickhouse

rpm -ivh  clickhouse-server-common-18.14.12-1.el6.x86_64.rpm
rpm -ivh  clickhouse-compressor-1.1.54336-3.el6.x86_64.rpm
rpm -ivh  clickhouse-common-static-18.14.12-1.el6.x86_64.rpm   
rpm -ivh  clickhouse-server-18.14.12-1.el6.x86_64.rpm
rpm -ivh  clickhouse-client-18.14.12-1.el6.x86_64.rpm
rpm -ivh  clickhouse-test-18.14.12-1.el6.x86_64.rpm
rpm -ivh  clickhouse-debuginfo-18.14.12-1.el6.x86_64.rpm

 3) 修改启动脚本和配置文件

    首先将启动脚本复制一个出来,除了上一篇文章说的外,主要修改配置文件位置和pid文件位置,如下:

    启动脚本路径:/etc/init.d/clickhouse-server

   复制

修改

这里配置文件比如使用config1.xml,pid使用clickhouse-server-1.pid

    然后进入到配置文件目录,将原有配置文件拷贝一份,这里是config1.xml,然后修改配置:

    主要修改内容是:日志文件(和之前不要冲突)、http端口、tcp端口、副本同步端口(这个改完之后clickhouse按照当前实例的端口自动和其他实例同步)、数据文件和tmp目录、users.xml(这个如果都一样可以用同一个)、最后就是集群配置了,下面重点叙述:

    集群配置默认为:<remote_servers incl="clickhouse_remote_servers" />

    zookeeper默认为:<zookeeper incl="zookeeper-servers" optional="true" />

    macros默认为:<macros incl="macros" optional="true" />

首先是集群分片的配置,这个配置所有节点的所有实例完全保持一致:

<remote_servers>
     <distable>
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>true</internal_replication>
            <replica>
                <host>10.121.8.6</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.121.8.7</host>
                <port>9001</port>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>10.121.8.7</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.121.8.8</host>
                <port>9001</port>
            </replica>
        </shard>
		<shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>10.121.8.8</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.121.8.6</host>
                <port>9001</port>
            </replica>
        </shard>
    </distable>
</remote_servers>

 配置里面的<distable>是分布式标识标签,可以自定义,到最后创建分布式表的时候会用到;然后weight是分片权重,即写数据时有多大的概率落到此分片,因为这里所有分片权重相同所有都设置为1,然后是internal_replication,表示是否只将数据写入其中一个副本,默认为false,表示写入所有副本,在复制表的情况下可能会导致重复和不一致,所以这里一定要改为true,clickhouse分布式表只管写入一个副本,其余同步表的事情交给复制表和zookeeper来进行,然后是replica配置这个好理解,就是一个分片下的所有副本,这里副本的分布一定要手动设计好,保证相互备份,然后再次说明是所有的节点配置一致. 此部分配置严格按照官网配置,参考链接:https://clickhouse.yandex/docs/en/operations/table_engines/distributed/

    然后是zookeeper配置,这个也是所有示例配置都一样:

<zookeeper>
    <node index="1">
        <host>10.121.8.8</host>
        <port>2181</port>
    </node>
</zookeeper>

 这里zookeeper只有一个,如果多个的话继续往下写,就像官网上给出的一样,参考下图:

 然后是复制标识的配置,也称为宏配置,这里唯一标识一个副本名称,每个实例都要配置并且都是唯一的,这里配置如下:

clickhouse1 9000 分片1, 副本1:

<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>cluster01-01-1</replica>
</macros>

  clickhouse1 9001 分片3, 副本2:

<macros>
    <layer>01</layer>
    <shard>03</shard>
    <replica>cluster01-03-2</replica>
</macros>

  clickhouse2 9000 分片2, 副本1:

<macros>
    <layer>01</layer>
    <shard>02</shard>
    <replica>cluster01-02-1</replica>
</macros>

  clickhouse2 9001 分片1, 副本2:

<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>cluster01-01-2</replica>
</macros>

clickhouse3 9000 分片3, 副本1:

<macros>
    <layer>01</layer>
    <shard>03</shard>
    <replica>cluster01-03-1</replica>
</macros>

    clickhouse3 9001 分片2, 副本2:

<macros>
    <layer>01</layer>
    <shard>02</shard>
    <replica>cluster01-02-2</replica>
</macros>

 由上面配置可以看到replica的分布规律,其中layer是双级分片设置,在Yandex公司的集群中用到,因为我们这里是单集群所以这个值对我们没有影响全部一样即可,这里是01;然后是shard表示分片编号;最后是replica是副本标识,这里使用了cluster{layer}-{shard}-{replica}的表示方式,比如cluster01-02-1表示cluster01集群的02分片下的1号副本,这样既非常直观的表示又唯一确定副本. 副本的文档链接下面会给出.

2,依次启动3个节点上的 各2 个实例,总共 6个实例   

# 启动实例一
[root@dc-server06 clickhouse-server]# service clickhouse-server start
# 启动实例二
[root@dc-server06 clickhouse-server]# service clickhouse-server2 start

  依次检查 3个节点上的 各2 个实例,总共 6个实例

请参考配置文件(详细):/etc/clickhouse-server/config.xml  

 dc-server06  / clickhouse1  实例一

<?xml version="1.0"?>
<yandex>
    <logger>
        <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
    </logger>
    <!--display_name>production</display_name--> <!-- It is the name that will be shown in the client -->
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>

    <!-- For HTTPS and SSL over native protocol. -->
    <!--
    <https_port>8443</https_port>
    <tcp_port_secure>9440</tcp_port_secure>
    -->

    <!-- Used with https_port and tcp_port_secure. Full ssl options list: https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h#L71 -->
    <openSSL>
        <server> <!-- Used for https server AND secure tcp port -->
            <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
            <!-- openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096 -->
            <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>

        <client> <!-- Used for connecting to https dictionary source -->
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
            <invalidCertificateHandler>
                <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
                <name>RejectCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>

    <!-- Default root page on http[s] server. For example load UI from https://tabix.io/ when opening http://localhost:8123 -->
    <!--
    <http_server_default_response><![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]></http_server_default_response>
    -->

    <!-- Port for communication between replicas. Used for data exchange. -->
    <interserver_http_port>9009</interserver_http_port>

    <!-- Hostname that is used by other replicas to request this server.
         If not specified, than it is determined analoguous to 'hostname -f' command.
         This setting could be used to switch replication to another network interface.
      -->

    <interserver_http_host>10.121.8.6</interserver_http_host>
 

    <!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
     <listen_host>::</listen_host> 
    <!-- Same for hosts with disabled ipv6: -->
    <!-- <listen_host>0.0.0.0</listen_host> -->
    <!--  <listen_host>0.0.0.0</listen_host> -->
    <!-- Default values - try listen localhost on ipv4 and ipv6: -->
    <!--
    <listen_host>::1</listen_host>
    <listen_host>127.0.0.1</listen_host>
    -->
    <!-- Don't exit if ipv6 or ipv4 unavailable, but listen_host with this protocol specified -->
    <!-- <listen_try>0</listen_try> -->

    <!-- Allow listen on same address:port -->
    <!-- <listen_reuse_port>0</listen_reuse_port> -->

    <!-- <listen_backlog>64</listen_backlog> -->

    <max_connections>4096</max_connections>
    <keep_alive_timeout>3</keep_alive_timeout>

    <!-- Maximum number of concurrent queries. -->
    <max_concurrent_queries>100</max_concurrent_queries>

    <!-- Set limit on number of open files (default: maximum). This setting makes sense on Mac OS X because getrlimit() fails to retrieve
         correct maximum value. -->
    <!-- <max_open_files>262144</max_open_files> -->

    <!-- Size of cache of uncompressed blocks of data, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         Cache is used when 'use_uncompressed_cache' user setting turned on (off by default).
         Uncompressed cache is advantageous only for very short queries and in rare cases.
      -->
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>

    <!-- Approximate size of mark cache, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         You should not lower this value.
      -->
    <mark_cache_size>5368709120</mark_cache_size>


    <!-- Path to data directory, with trailing slash. -->
    <path>/data/clickhouse/</path>

    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/data/clickhouse/tmp/</tmp_path>

    <!-- Directory with user provided files that are accessible by 'file' table function. -->
    <user_files_path>/var/lib/clickhouse/user_files/</user_files_path>

    <!-- Path to configuration file with users, access rights, profiles of settings, quotas. -->
    <users_config>users.xml</users_config>

    <!-- Default profile of settings. -->
    <default_profile>default</default_profile>

    <!-- System profile of settings. This settings are used by internal processes (Buffer storage, Distibuted DDL worker and so on). -->
    <!-- <system_profile>default</system_profile> -->

    <!-- Default database. -->
    <default_database>default</default_database>

    <!-- Server time zone could be set here.

         Time zone is used when converting between String and DateTime types,
          when printing DateTime in text formats and parsing DateTime from text,
          it is used in date and time related functions, if specific time zone was not passed as an argument.

         Time zone is specified as identifier from IANA time zone database, like UTC or Africa/Abidjan.
         If not specified, system time zone at server startup is used.

         Please note, that server could display time zone alias instead of specified name.
         Example: W-SU is an alias for Europe/Moscow and Zulu is an alias for UTC.
    -->
    <!-- <timezone>Europe/Moscow</timezone> -->

    <!-- You can specify umask here (see "man umask"). Server will apply it on startup.
         Number is always parsed as octal. Default umask is 027 (other users cannot read logs, data files, etc; group can only read).
    -->
    <!-- <umask>022</umask> -->

    <!-- Configuration of clusters that could be used in Distributed tables.
         https://clickhouse.yandex/docs/en/table_engines/distributed/
      -->
    <remote_servers incl="clickhouse_remote_servers" >
       <distable>
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>true</internal_replication>
            <replica>
                <host>10.121.8.6</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.121.8.7</host>
                <port>9001</port>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>10.121.8.7</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.121.8.8</host>
                <port>9001</port>
            </replica>
        </shard>
		<shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>10.121.8.8</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.121.8.6</host>
                <port>9001</port>
            </replica>
        </shard>
    </distable>
    </remote_servers>


    <!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
         By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
         Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
      -->

    <!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
         Optional. If you don't use replicated tables, you could omit that.

         See https://clickhouse.yandex/docs/en/table_engines/replication/
      -->
    <zookeeper incl="zookeeper-servers" optional="true" />
	<zookeeper>
    <node index="1">
        <host>10.121.8.8</host>
        <port>2181</port>
    </node>
</zookeeper>

    <!-- Substitutions for parameters of replicated tables.
          Optional. If you don't use replicated tables, you could omit that.

         See https://clickhouse.yandex/docs/en/table_engines/replication/#creating-replicated-tables
      -->
    <macros incl="macros" optional="true" />
   <macros>
     <layer>01</layer>
     <shard>01</shard>
     <replica>cluster01-01-1</replica>
    </macros>
	
	
	<networks>
       <ip>::/0</ip>
    </networks>
    <!-- Reloading interval for embedded dictionaries, in seconds. Default: 3600. -->
    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>


    <!-- Maximum session timeout, in seconds. Default: 3600. -->
    <max_session_timeout>3600</max_session_timeout>

    <!-- Default session timeout, in seconds. Default: 60. -->
    <default_session_timeout>60</default_session_timeout>

    <!-- Sending data to Graphite for monitoring. Several sections can be defined. -->
    <!--
        interval - send every X second
        root_path - prefix for keys
        hostname_in_path - append hostname to root_path (default = true)
        metrics - send data from table system.metrics
        events - send data from table system.events
        asynchronous_metrics - send data from table system.asynchronous_metrics
    -->
    <!--
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>60</interval>
        <root_path>one_min</root_path>
        <hostname_in_path>true</hostname_in_path>

        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </graphite>
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>1</interval>
        <root_path>one_sec</root_path>

        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>false</asynchronous_metrics>
    </graphite>
    -->


    <!-- Query log. Used only for queries with setting log_queries = 1. -->
    <query_log>
        <!-- What table to insert data. If table is not exist, it will be created.
             When query log structure is changed after system update,
              then old table will be renamed and new table will be created automatically.
        -->
        <database>system</database>
        <table>query_log</table>
        <!--
            PARTITION BY expr https://clickhouse.yandex/docs/en/table_engines/custom_partitioning_key/
            Example:
                event_date
                toMonday(event_date)
                toYYYYMM(event_date)
                toStartOfHour(event_time)
        -->
        <partition_by>toYYYYMM(event_date)</partition_by>
        <!-- Interval of flushing data. -->
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>


    <!-- Uncomment if use part_log
    <part_log>
        <database>system</database>
        <table>part_log</table>

        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </part_log>
    -->


    <!-- Parameters for embedded dictionaries, used in Yandex.Metrica.
         See https://clickhouse.yandex/docs/en/dicts/internal_dicts/
    -->

    <!-- Path to file with region hierarchy. -->
    <!-- <path_to_regions_hierarchy_file>/opt/geo/regions_hierarchy.txt</path_to_regions_hierarchy_file> -->

    <!-- Path to directory with files containing names of regions -->
    <!-- <path_to_regions_names_files>/opt/geo/</path_to_regions_names_files> -->


    <!-- Configuration of external dictionaries. See:
         https://clickhouse.yandex/docs/en/dicts/external_dicts/
    -->
    <dictionaries_config>*_dictionary.xml</dictionaries_config>

    <!-- Uncomment if you want data to be compressed 30-100% better.
         Don't do that if you just started using ClickHouse.
      -->
    <compression incl="clickhouse_compression">
    <!--
        <!- - Set of variants. Checked in order. Last matching case wins. If nothing matches, lz4 will be used. - ->
        <case>

            <!- - Conditions. All must be satisfied. Some conditions may be omitted. - ->
            <min_part_size>10000000000</min_part_size>        <!- - Min part size in bytes. - ->
            <min_part_size_ratio>0.01</min_part_size_ratio>   <!- - Min size of part relative to whole table size. - ->

            <!- - What compression method to use. - ->
            <method>zstd</method>
        </case>
    -->
    </compression>

    <!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster.
         Works only if ZooKeeper is enabled. Comment it if such functionality isn't required. -->
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue/ddl</path>

        <!-- Settings from this profile will be used to execute DDL queries -->
        <!-- <profile>default</profile> -->
    </distributed_ddl>

    <!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->
    <!--
    <merge_tree>
        <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
    </merge_tree>
    -->

    <!-- Protection from accidental DROP.
         If size of a MergeTree table is greater than max_table_size_to_drop (in bytes) than table could not be dropped with any DROP query.
         If you want do delete one table and don't want to restart clickhouse-server, you could create special file <clickhouse-path>/flags/force_drop_table and make DROP once.
         By default max_table_size_to_drop is 50GB; max_table_size_to_drop=0 allows to DROP any tables.
         The same for max_partition_size_to_drop.
         Uncomment to disable protection.
    -->
    <!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
    <!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->

    <!-- Example of parameters for GraphiteMergeTree table engine -->
    <graphite_rollup_example>
        <pattern>
            <regexp>click_cost</regexp>
            <function>any</function>
            <retention>
                <age>0</age>
                <precision>3600</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>60</precision>
            </retention>
        </pattern>
        <default>
            <function>max</function>
            <retention>
                <age>0</age>
                <precision>60</precision>
            </retention>
            <retention>
                <age>3600</age>
                <precision>300</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>3600</precision>
            </retention>
        </default>
    </graphite_rollup_example>

    <!-- Directory in <clickhouse-path> containing schema files for various input formats.
         The directory will be created if it doesn't exist.
      -->
    <format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>

    <!-- Uncomment to disable ClickHouse internal DNS caching. -->
    <!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
</yandex>

 dc-server06  / clickhouse1 实例二    /etc/clickhouse-server/config.xml 

<?xml version="1.0"?>
<yandex>
    <logger>
        <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server2.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server2.err.log</errorlog>
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
    </logger>
    <!--display_name>production</display_name--> <!-- It is the name that will be shown in the client -->
    <http_port>8124</http_port>
    <tcp_port>9001</tcp_port>

    <!-- For HTTPS and SSL over native protocol. -->
    <!--
    <https_port>8443</https_port>
    <tcp_port_secure>9440</tcp_port_secure>
    -->

    <!-- Used with https_port and tcp_port_secure. Full ssl options list: https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h#L71 -->
    <openSSL>
        <server> <!-- Used for https server AND secure tcp port -->
            <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
            <!-- openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096 -->
            <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>

        <client> <!-- Used for connecting to https dictionary source -->
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
            <invalidCertificateHandler>
                <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
                <name>RejectCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>

    <!-- Default root page on http[s] server. For example load UI from https://tabix.io/ when opening http://localhost:8123 -->
    <!--
    <http_server_default_response><![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]></http_server_default_response>
    -->

    <!-- Port for communication between replicas. Used for data exchange. -->
    <interserver_http_port>9010</interserver_http_port>

    <!-- Hostname that is used by other replicas to request this server.
         If not specified, than it is determined analoguous to 'hostname -f' command.
         This setting could be used to switch replication to another network interface.
      -->

    <interserver_http_host>10.121.8.8</interserver_http_host>
 

    <!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
     <listen_host>::</listen_host>
    <!-- Same for hosts with disabled ipv6: -->
    <!-- <listen_host>0.0.0.0</listen_host> -->
    <!--<listen_host>0.0.0.0</listen_host> -->
    <!-- Default values - try listen localhost on ipv4 and ipv6: -->
    <!--
    <listen_host>::1</listen_host>
    <listen_host>127.0.0.1</listen_host>
    -->
    <!-- Don't exit if ipv6 or ipv4 unavailable, but listen_host with this protocol specified -->
    <!-- <listen_try>0</listen_try> -->

    <!-- Allow listen on same address:port -->
    <!-- <listen_reuse_port>0</listen_reuse_port> -->

    <!-- <listen_backlog>64</listen_backlog> -->

    <max_connections>4096</max_connections>
    <keep_alive_timeout>3</keep_alive_timeout>

    <!-- Maximum number of concurrent queries. -->
    <max_concurrent_queries>100</max_concurrent_queries>

    <!-- Set limit on number of open files (default: maximum). This setting makes sense on Mac OS X because getrlimit() fails to retrieve
         correct maximum value. -->
    <!-- <max_open_files>262144</max_open_files> -->

    <!-- Size of cache of uncompressed blocks of data, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         Cache is used when 'use_uncompressed_cache' user setting turned on (off by default).
         Uncompressed cache is advantageous only for very short queries and in rare cases.
      -->
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>

    <!-- Approximate size of mark cache, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         You should not lower this value.
      -->
    <mark_cache_size>5368709120</mark_cache_size>


    <!-- Path to data directory, with trailing slash. -->
    <path>/data/clickhouse2/</path>

    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/data/clickhouse2/tmp/</tmp_path>

    <!-- Directory with user provided files that are accessible by 'file' table function. -->
    <user_files_path>/var/lib/clickhouse/user_files2/</user_files_path>

    <!-- Path to configuration file with users, access rights, profiles of settings, quotas. -->
    <users_config>users.xml</users_config>

    <!-- Default profile of settings. -->
    <default_profile>default</default_profile>

    <!-- System profile of settings. This settings are used by internal processes (Buffer storage, Distibuted DDL worker and so on). -->
    <!-- <system_profile>default</system_profile> -->

    <!-- Default database. -->
    <default_database>default</default_database>

    <!-- Server time zone could be set here.

         Time zone is used when converting between String and DateTime types,
          when printing DateTime in text formats and parsing DateTime from text,
          it is used in date and time related functions, if specific time zone was not passed as an argument.

         Time zone is specified as identifier from IANA time zone database, like UTC or Africa/Abidjan.
         If not specified, system time zone at server startup is used.

         Please note, that server could display time zone alias instead of specified name.
         Example: W-SU is an alias for Europe/Moscow and Zulu is an alias for UTC.
    -->
    <!-- <timezone>Europe/Moscow</timezone> -->

    <!-- You can specify umask here (see "man umask"). Server will apply it on startup.
         Number is always parsed as octal. Default umask is 027 (other users cannot read logs, data files, etc; group can only read).
    -->
    <!-- <umask>022</umask> -->

    <!-- Configuration of clusters that could be used in Distributed tables.
         https://clickhouse.yandex/docs/en/table_engines/distributed/
      -->
    <remote_servers incl="clickhouse_remote_servers" >
       <distable>
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>true</internal_replication>
            <replica>
                <host>10.121.8.6</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.121.8.7</host>
                <port>9001</port>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>10.121.8.7</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.121.8.8</host>
                <port>9001</port>
            </replica>
        </shard>
		<shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>10.121.8.8</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.121.8.6</host>
                <port>9001</port>
            </replica>
        </shard>
    </distable>
    </remote_servers>


    <!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
         By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
         Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
      -->

    <!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
         Optional. If you don't use replicated tables, you could omit that.

         See https://clickhouse.yandex/docs/en/table_engines/replication/
      -->
    <zookeeper incl="zookeeper-servers" optional="true" />
	<zookeeper>
    <node index="1">
        <host>10.121.8.8</host>
        <port>2181</port>
    </node>
</zookeeper>

    <!-- Substitutions for parameters of replicated tables.
          Optional. If you don't use replicated tables, you could omit that.

         See https://clickhouse.yandex/docs/en/table_engines/replication/#creating-replicated-tables
      -->
    <macros incl="macros" optional="true" />
   

	<macros>
     <layer>01</layer>
     <shard>02</shard>
     <replica>cluster01-02-2</replica>
    </macros>
	
	
	<networks>
       <ip>::/0</ip>
    </networks>
    <!-- Reloading interval for embedded dictionaries, in seconds. Default: 3600. -->
    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>


    <!-- Maximum session timeout, in seconds. Default: 3600. -->
    <max_session_timeout>3600</max_session_timeout>

    <!-- Default session timeout, in seconds. Default: 60. -->
    <default_session_timeout>60</default_session_timeout>

    <!-- Sending data to Graphite for monitoring. Several sections can be defined. -->
    <!--
        interval - send every X second
        root_path - prefix for keys
        hostname_in_path - append hostname to root_path (default = true)
        metrics - send data from table system.metrics
        events - send data from table system.events
        asynchronous_metrics - send data from table system.asynchronous_metrics
    -->
    <!--
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>60</interval>
        <root_path>one_min</root_path>
        <hostname_in_path>true</hostname_in_path>

        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </graphite>
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>1</interval>
        <root_path>one_sec</root_path>

        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>false</asynchronous_metrics>
    </graphite>
    -->


    <!-- Query log. Used only for queries with setting log_queries = 1. -->
    <query_log>
        <!-- What table to insert data. If table is not exist, it will be created.
             When query log structure is changed after system update,
              then old table will be renamed and new table will be created automatically.
        -->
        <database>system</database>
        <table>query_log</table>
        <!--
            PARTITION BY expr https://clickhouse.yandex/docs/en/table_engines/custom_partitioning_key/
            Example:
                event_date
                toMonday(event_date)
                toYYYYMM(event_date)
                toStartOfHour(event_time)
        -->
        <partition_by>toYYYYMM(event_date)</partition_by>
        <!-- Interval of flushing data. -->
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>


    <!-- Uncomment if use part_log
    <part_log>
        <database>system</database>
        <table>part_log</table>

        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </part_log>
    -->


    <!-- Parameters for embedded dictionaries, used in Yandex.Metrica.
         See https://clickhouse.yandex/docs/en/dicts/internal_dicts/
    -->

    <!-- Path to file with region hierarchy. -->
    <!-- <path_to_regions_hierarchy_file>/opt/geo/regions_hierarchy.txt</path_to_regions_hierarchy_file> -->

    <!-- Path to directory with files containing names of regions -->
    <!-- <path_to_regions_names_files>/opt/geo/</path_to_regions_names_files> -->


    <!-- Configuration of external dictionaries. See:
         https://clickhouse.yandex/docs/en/dicts/external_dicts/
    -->
    <dictionaries_config>*_dictionary.xml</dictionaries_config>

    <!-- Uncomment if you want data to be compressed 30-100% better.
         Don't do that if you just started using ClickHouse.
      -->
    <compression incl="clickhouse_compression">
    <!--
        <!- - Set of variants. Checked in order. Last matching case wins. If nothing matches, lz4 will be used. - ->
        <case>

            <!- - Conditions. All must be satisfied. Some conditions may be omitted. - ->
            <min_part_size>10000000000</min_part_size>        <!- - Min part size in bytes. - ->
            <min_part_size_ratio>0.01</min_part_size_ratio>   <!- - Min size of part relative to whole table size. - ->

            <!- - What compression method to use. - ->
            <method>zstd</method>
        </case>
    -->
    </compression>

    <!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster.
         Works only if ZooKeeper is enabled. Comment it if such functionality isn't required. -->
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue2/ddl</path>

        <!-- Settings from this profile will be used to execute DDL queries -->
        <!-- <profile>default</profile> -->
    </distributed_ddl>

    <!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->
    <!--
    <merge_tree>
        <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
    </merge_tree>
    -->

    <!-- Protection from accidental DROP.
         If size of a MergeTree table is greater than max_table_size_to_drop (in bytes) than table could not be dropped with any DROP query.
         If you want do delete one table and don't want to restart clickhouse-server, you could create special file <clickhouse-path>/flags/force_drop_table and make DROP once.
         By default max_table_size_to_drop is 50GB; max_table_size_to_drop=0 allows to DROP any tables.
         The same for max_partition_size_to_drop.
         Uncomment to disable protection.
    -->
    <!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
    <!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->

    <!-- Example of parameters for GraphiteMergeTree table engine -->
    <graphite_rollup_example>
        <pattern>
            <regexp>click_cost</regexp>
            <function>any</function>
            <retention>
                <age>0</age>
                <precision>3600</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>60</precision>
            </retention>
        </pattern>
        <default>
            <function>max</function>
            <retention>
                <age>0</age>
                <precision>60</precision>
            </retention>
            <retention>
                <age>3600</age>
                <precision>300</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>3600</precision>
            </retention>
        </default>
    </graphite_rollup_example>

    <!-- Directory in <clickhouse-path> containing schema files for various input formats.
         The directory will be created if it doesn't exist.
      -->
    <format_schema_path>/var/lib/clickhouse/format_schemas2/</format_schema_path>

    <!-- Uncomment to disable ClickHouse internal DNS caching. -->
    <!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
</yandex>

配置文件:/etc/clickhouse-server/users.xml 

   内存限制设置   vi /etc/clickhouse-server/users.xml 

 本次设置为 32 G 转化为B :  34359738368 B

 五  测试. 创建本地复制表和分布式表

 1.所有实例配置完上面这些之后,分别执行启动命令启动,然后所有实例都执行下面语句创建数据库:

CREATE DATABASE monchickey;

 2. 然后对于所有实例分别创建对应的复制表,这里测试创建一个简单的表

drop table monchickey.t_co_dwfxdata_0809_30000000_new;
CREATE
	TABLE
		monchickey.t_co_dwfxdata_0809_30000000_new(
		    CRETIME date,
			YEAR  String,
			MONTH String,
			COMPNAME String,
			CBYS String,
			YWHD String,
			DYDJ String,
			ZCLX String,
			AMOUNT Float64,
			YHLB String,
			YSLX String
		) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/t_co_dwfxdata_0809_30000000_new', '{replica}',
			CRETIME,
			(COMPNAME),
			8192
		);

drop table monchickey.t_co_dwfxdata_0809_30000000_all;
CREATE TABLE monchickey.t_co_dwfxdata_0809_30000000_all AS monchickey.t_co_dwfxdata_0809_30000000_new ENGINE = Distributed(distable, monchickey, t_co_dwfxdata_0809_30000000_new, rand());

select  count (*) from  monchickey.t_co_dwfxdata_0809_30000000_new;
select count(*) from  monchickey.t_co_dwfxdata_0809_30000000_all;

3.简单数据样例

分别在三台机器上插入如下数据

dc-server06 /clickhouse1

insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','其他长期职工福利','检修-输电运检','500kV及以上','架空线路',9.4441398224E8,'','工');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','其他短期薪酬','检修-输电运检','500kV及以上','架空线路',2.0894114651E8,'','工');

dc-server07 /clickhouse2

insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','其他长期职工福利','检修-输电运检','500kV及以上','架空线路',9.4441398224E8,'','工')
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES (('2016-01-','2016年','1月','单位8','临时用工薪酬','检修-输电运检','500kV及以上','架空线路',8.5247987777E8,'','工');

dc-server08 /clickhouse3

insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','劳务派遣费','检修-输电运检','500kV及以上','架空线路',3.59378772E8,'','工');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','委托服务费','检修-输电运检','500kV及以上','架空线路',4.7638581405E8,'','费');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','财产保险费','检修-输电运检','500kV及以上','架空线路',9.527716281E8,'','费');

在任意一台机器上查询数据

  a.能查询到完成 7条数据

  b.查询总数为7条

select * from monchickey.t_co_dwfxdata_0809_30000000_all;--能查询到完整的7 条数据
select count(*) from monchickey.t_co_dwfxdata_0809_30000000_all; -- 总数为7

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值