ClickHouse Cluster Installation

   There are many ways to install a ClickHouse cluster; here we take the simple and convenient route of installing with yum.

I. Preparing the Tools

1. clustershell

First we need a cluster-management tool, so that the installation can be done in one pass instead of being repeated on every machine. That tool is ClusterShell:

yum install -y clustershell

Once it is installed, edit the configuration file:

vi /etc/clustershell/groups.d/local.cfg

Note that since no version was specified during installation, we get the latest one. For other versions the configuration file is not necessarily at this exact path, but it will be somewhere under /etc/clustershell/; look around and you will find it.

Edit the file as follows:

all: node0[1-3]
clickhouse: node0[1-3]

Note: the format of this file is

group name: the hostnames of the machines to include in that group

  Many groups can be defined here, which is why the file is called groups; since this is only a test, we define just a couple.

A supplementary note: on a machine I swapped in later, the yum install failed with the error "No package clustershell available".

This is a yum repository issue; the fix is to enable EPEL first:

sudo yum install epel-release
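
EPEL is the repository that provides the clustershell package; once it is enabled, the original install command works again:

sudo yum install -y epel-release
sudo yum install -y clustershell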

So how is the tool used once it is installed?

The clush command has the form: clush <options> '<command>'

Option    Description
-g        run on the hosts of the specified group
-a        run on all hosts in all groups
-w        run on the listed hosts; separate multiple hosts with commas
-x        exclude the listed hosts from the run; separate multiple hosts with commas
-X        exclude the listed groups from the run; separate multiple groups with commas
-b        merge identical output from different nodes

Example: clush -g all -b 'yum install -y curl' runs yum install -y curl on every server in the all group.

It can equivalently be written as:

clush -a 'yum install -y curl'

-g takes a group name defined in the configuration file; as you can see, this makes cluster-wide operations very convenient.

Note: before using this tool, set up passwordless SSH login (see the sketch after these commands):

cd .ssh/
ssh-keygen
ssh-copy-id -i id_rsa.pub root@node01
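
clush needs passwordless SSH to every node, not just node01, so ssh-copy-id has to be repeated per host; a sketch assuming the three hostnames from the groups file:

for h in node01 node02 node03; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h
done
clush -a -b 'hostname'    # quick check: should report every node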

2. curl

Use ClusterShell to install curl on every server in the cluster:

clush -a 'yum install -y curl'

II. Installing ClickHouse

1. Obtain the ClickHouse packages (set up the repository)

clush -a 'curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash'

Note: if running this command produces the error:

sudo: sorry, you must have a tty to run sudo

Fix:

visudo -f /etc/sudoers

and comment out the line: Defaults    requiretty
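
If you would rather make this change on every node at once, a possible sketch with sed (editing /etc/sudoers directly is risky; visudo is the safer route):

clush -a "sed -i 's/^Defaults[[:space:]]*requiretty/#&/' /etc/sudoers"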

 

2. List the available ClickHouse packages

clush -a 'sudo yum list "clickhouse*"'

3. Install ClickHouse

clush -a 'sudo yum install -y clickhouse-server clickhouse-client clickhouse-compressor'

Wait a moment and every machine in the cluster will have ClickHouse installed. Simple enough, even though some tooling had to be prepared up front. A quick verification is sketched below.
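
To confirm the packages really landed on every node, one possible check:

clush -a -b 'rpm -qa "clickhouse*"'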

III. Modifying the Configuration

Note: all of the configuration files here are edited on node01; afterwards they are pushed out to the rest of the cluster with ClusterShell.

1. Modify the ulimit configuration

The default values are too small to be sufficient:

[root@node01 .ssh]# cat /etc/security/limits.d/clickhouse.conf
clickhouse       soft    core    1073741824
clickhouse       hard    core    1073741824
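
These two lines raise the core-dump size limit for the clickhouse user to 1073741824 bytes (1 GiB). If you also run into "too many open files" errors, nofile entries in the same format can be added; the 262144 value below is an assumption, not something this file ships with:

clickhouse       soft    nofile  262144
clickhouse       hard    nofile  262144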

2. Modify the config.xml configuration file

[root@node01 .ssh]# vi /etc/clickhouse-server/config.xml

<?xml version="1.0"?>
<!--
  NOTE: User and query level settings are set up in "users.xml" file.
-->
<yandex>
    <logger>
        <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
        <level>trace</level>
        <log>/data/clickhouse/logs/server.log</log>
        <errorlog>/data/clickhouse/logs/error.log</errorlog>
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
    </logger>

    <!--display_name>production</display_name--> <!-- It is the name that will be shown in the client -->

    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>

    <!-- For HTTPS and SSL over native protocol. -->
    <!--
    <https_port>8443</https_port>
    <tcp_port_secure>9440</tcp_port_secure>
    -->

    <!-- Used with https_port and tcp_port_secure. Full ssl options list: https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h#L71 -->
    <openSSL>
        <server> <!-- Used for https server AND secure tcp port -->
            <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
            <!-- openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096 -->
            <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>

        <client> <!-- Used for connecting to https dictionary source -->
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
            <invalidCertificateHandler>
                <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
                <name>RejectCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>

    <!-- Default root page on http[s] server. For example load UI from https://tabix.io/ when opening http://localhost:8123 -->
    <!--
    <http_server_default_response><![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]></http_server_default_response>
    -->

    <!-- Port for communication between replicas. Used for data exchange. -->
    <interserver_http_port>9009</interserver_http_port>

    <!-- Hostname that is used by other replicas to request this server.
         If not specified, than it is determined analoguous to 'hostname -f' command.
         This setting could be used to switch replication to another network interface.
      -->
    <!--
    <interserver_http_host>example.yandex.ru</interserver_http_host>
    -->

    <!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
    <listen_host>::</listen_host>
    <!-- Same for hosts with disabled ipv6: -->
    <!--    <listen_host>0.0.0.0</listen_host> -->

    <!-- Default values - try listen localhost on ipv4 and ipv6: -->
    <!--
    <listen_host>::1</listen_host>
    <listen_host>127.0.0.1</listen_host>
    -->
    <!-- Don't exit if ipv6 or ipv4 unavailable, but listen_host with this protocol specified -->
    <!-- <listen_try>0</listen_try> -->

    <!-- Allow listen on same address:port -->
    <!-- <listen_reuse_port>0</listen_reuse_port> -->

    <!-- <listen_backlog>64</listen_backlog> -->

    <max_connections>4096</max_connections>
    <keep_alive_timeout>3</keep_alive_timeout>

    <!-- Maximum number of concurrent queries. -->
    <max_concurrent_queries>100</max_concurrent_queries>

    <!-- Set limit on number of open files (default: maximum). This setting makes sense on Mac OS X because getrlimit() fails to retrieve
         correct maximum value. -->
    <!-- <max_open_files>262144</max_open_files> -->

    <!-- Size of cache of uncompressed blocks of data, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         Cache is used when 'use_uncompressed_cache' user setting turned on (off by default).
         Uncompressed cache is advantageous only for very short queries and in rare cases.
      -->
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>

    <!-- Approximate size of mark cache, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         You should not lower this value.
      -->
    <mark_cache_size>5368709120</mark_cache_size>

    <!-- Path to data directory, with trailing slash. -->
    <path>/data/clickhouse/</path>

    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/data/clickhouse/tmp/</tmp_path>

    <!-- Directory with user provided files that are accessible by 'file' table function. -->
    <user_files_path>/var/lib/clickhouse/user_files/</user_files_path>

    <!-- Path to configuration file with users, access rights, profiles of settings, quotas. -->
    <users_config>users.xml</users_config>

    <!-- Default profile of settings. -->
    <default_profile>default</default_profile>

    <!-- System profile of settings. This settings are used by internal processes (Buffer storage, Distibuted DDL worker and so on). -->
    <!-- <system_profile>default</system_profile> -->

    <!-- Default database. -->
    <default_database>default</default_database>

    <!-- Server time zone could be set here.
         Time zone is used when converting between String and DateTime types,
          when printing DateTime in text formats and parsing DateTime from text,
          it is used in date and time related functions, if specific time zone was not passed as an argument.
         Time zone is specified as identifier from IANA time zone database, like UTC or Africa/Abidjan.
         If not specified, system time zone at server startup is used.
         Please note, that server could display time zone alias instead of specified name.
         Example: W-SU is an alias for Europe/Moscow and Zulu is an alias for UTC.
    -->
    <!-- <timezone>Europe/Moscow</timezone> -->

    <!-- You can specify umask here (see "man umask"). Server will apply it on startup.
         Number is always parsed as octal. Default umask is 027 (other users cannot read logs, data files, etc; group can only read).
    -->
    <!-- <umask>022</umask> -->

    <!-- Perform mlockall after startup to lower first queries latency
          and to prevent clickhouse executable from being paged out under high IO load.
         Enabling this option is recommended but will lead to increased startup time for up to a few seconds.
    -->
    <mlock_executable>false</mlock_executable>

    <!-- Configuration of clusters that could be used in Distributed tables.
         https://clickhouse.yandex/docs/en/table_engines/distributed/
      -->
    <remote_servers incl="clickhouse_remote_servers" >
    </remote_servers>

    <!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
         By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
         Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
      -->

    <!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
         Optional. If you don't use replicated tables, you could omit that.
         See https://clickhouse.yandex/docs/en/table_engines/replication/
      -->
    <zookeeper incl="zookeeper-servers" optional="true" />

    <!-- Substitutions for parameters of replicated tables.
          Optional. If you don't use replicated tables, you could omit that.
         See https://clickhouse.yandex/docs/en/table_engines/replication/#creating-replicated-tables
      -->
    <macros incl="macros" optional="true" />

    <!-- Add this element; it points at the substitutions file created in the next step: -->

    <include_from>/etc/clickhouse-server/metrika.xml</include_from>

    <!-- Reloading interval for embedded dictionaries, in seconds. Default: 3600. -->
    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>

    <!-- Maximum session timeout, in seconds. Default: 3600. -->
    <max_session_timeout>3600</max_session_timeout>

    <!-- Default session timeout, in seconds. Default: 60. -->
    <default_session_timeout>60</default_session_timeout>

    <!-- Sending data to Graphite for monitoring. Several sections can be defined. -->
    <!--
        interval - send every X second
        root_path - prefix for keys
        hostname_in_path - append hostname to root_path (default = true)
        metrics - send data from table system.metrics
        events - send data from table system.events
        asynchronous_metrics - send data from table system.asynchronous_metrics
    -->
    <!--
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>60</interval>
        <root_path>one_min</root_path>
        <hostname_in_path>true</hostname_in_path>
        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </graphite>
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>1</interval>
        <root_path>one_sec</root_path>
        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>false</asynchronous_metrics>
    </graphite>
    -->

    <!-- Query log. Used only for queries with setting log_queries = 1. -->
    <query_log>
        <!-- What table to insert data. If table is not exist, it will be created.
             When query log structure is changed after system update,
              then old table will be renamed and new table will be created automatically.
        -->
        <database>system</database>
        <table>query_log</table>
        <!--
            PARTITION BY expr https://clickhouse.yandex/docs/en/table_engines/custom_partitioning_key/
            Example:
                event_date
                toMonday(event_date)
                toYYYYMM(event_date)
                toStartOfHour(event_time)
        -->
        <partition_by>toYYYYMM(event_date)</partition_by>
        <!-- Interval of flushing data. -->
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>

    <!-- Query thread log. Has information about all threads participated in query execution.
         Used only for queries with setting log_query_threads = 1. -->
    <query_thread_log>
        <database>system</database>
        <table>query_thread_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_thread_log>

    <!-- Uncomment if use part log.
         Part log contains information about all actions with parts in MergeTree tables (creation, deletion, merges, downloads).
    <part_log>
        <database>system</database>
        <table>part_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </part_log>
    -->

    <!-- Parameters for embedded dictionaries, used in Yandex.Metrica.
         See https://clickhouse.yandex/docs/en/dicts/internal_dicts/
    -->

    <!-- Path to file with region hierarchy. -->
    <!-- <path_to_regions_hierarchy_file>/opt/geo/regions_hierarchy.txt</path_to_regions_hierarchy_file> -->

    <!-- Path to directory with files containing names of regions -->
    <!-- <path_to_regions_names_files>/opt/geo/</path_to_regions_names_files> -->

    <!-- Configuration of external dictionaries. See:
         https://clickhouse.yandex/docs/en/dicts/external_dicts/
    -->
    <dictionaries_config>*_dictionary.xml</dictionaries_config>

    <!-- Uncomment if you want data to be compressed 30-100% better.
         Don't do that if you just started using ClickHouse.
      -->
    <compression incl="clickhouse_compression">
    <!--
        <!- - Set of variants. Checked in order. Last matching case wins. If nothing matches, lz4 will be used. - ->
        <case>
            <!- - Conditions. All must be satisfied. Some conditions may be omitted. - ->
            <min_part_size>10000000000</min_part_size>        <!- - Min part size in bytes. - ->
            <min_part_size_ratio>0.01</min_part_size_ratio>   <!- - Min size of part relative to whole table size. - ->
            <!- - What compression method to use. - ->
            <method>zstd</method>
        </case>
    -->
    </compression>

    <!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster.
         Works only if ZooKeeper is enabled. Comment it if such functionality isn't required. -->
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue/ddl</path>
        <!-- Settings from this profile will be used to execute DDL queries -->
        <!-- <profile>default</profile> -->
    </distributed_ddl>

    <!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->
    <!--
    <merge_tree>
        <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
    </merge_tree>
    -->

    <!-- Protection from accidental DROP.
         If size of a MergeTree table is greater than max_table_size_to_drop (in bytes) than table could not be dropped with any DROP query.
         If you want do delete one table and don't want to restart clickhouse-server, you could create special file <clickhouse-path>/flags/force_drop_table and make DROP once.
         By default max_table_size_to_drop is 50GB; max_table_size_to_drop=0 allows to DROP any tables.
         The same for max_partition_size_to_drop.
         Uncomment to disable protection.
    -->
    <!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
    <!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->

    <!-- Example of parameters for GraphiteMergeTree table engine -->
    <graphite_rollup_example>
        <pattern>
            <regexp>click_cost</regexp>
            <function>any</function>
            <retention>
                <age>0</age>
                <precision>3600</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>60</precision>
            </retention>
        </pattern>
        <default>
            <function>max</function>
            <retention>
                <age>0</age>
                <precision>60</precision>
            </retention>
            <retention>
                <age>3600</age>
                <precision>300</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>3600</precision>
            </retention>
        </default>
    </graphite_rollup_example>

    <!-- Directory in <clickhouse-path> containing schema files for various input formats.
         The directory will be created if it doesn't exist.
      -->
    <format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>

    <!-- Uncomment to disable ClickHouse internal DNS caching. -->
    <!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
</yandex>

Only a handful of lines differ from the stock file: the logger paths under /data/clickhouse/logs/, the listen_host setting, the data path and tmp_path under /data/clickhouse/, and the added include_from element. That completes this configuration file; a quick way to double-check the edits is sketched below.
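
A minimal sanity check that just lists the lines we touched:

grep -nE '<log>|<errorlog>|<listen_host>|<path>|<tmp_path>|<include_from>' /etc/clickhouse-server/config.xml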

3. Create the metrika.xml configuration file

As noted in config.xml above, we can create a metrika.xml file (referenced by include_from), because config.xml itself does not define the cluster shards, ZooKeeper, or the replicas. One caveat: the <macros> section below names the local replica (node01); on a cluster with real replicas each host would need its own value, so the file could not simply be copied verbatim as we do in step 5.

vi /etc/clickhouse-server/metrika.xml

 

<yandex>
<clickhouse_remote_servers>
    <cluster-1>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>node01</host>
                <port>9000</port>
                <user>default</user>
                <password>6lYaUiFi</password>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>node02</host>
                <port>9000</port>
                <user>default</user>
                <password>6lYaUiFi</password>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>node03</host>
                <port>9000</port>
                <user>default</user>
                <password>6lYaUiFi</password>
            </replica>
        </shard>
    </cluster-1>
</clickhouse_remote_servers>

<zookeeper-servers>
    <node index="1">
        <host>zk</host>
        <port>2181</port>
    </node>
</zookeeper-servers>

<macros>
    <replica>node01</replica>
</macros>

<clickhouse_compression>
    <case>
        <min_part_size>10000000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>lz4</method>
    </case>
</clickhouse_compression>
</yandex>
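
A remark on this file: the zookeeper-servers block points at a single host named zk. With a ZooKeeper ensemble (recommended for production, as noted at the end of this post), each member gets its own node entry; the zk01..zk03 hostnames below are hypothetical:

<zookeeper-servers>
    <node index="1">
        <host>zk01</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>zk02</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>zk03</host>
        <port>2181</port>
    </node>
</zookeeper-servers>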

4. Modify the users.xml configuration file

vi /etc/clickhouse-server/users.xml


<?xml version="1.0"?>
<yandex>
    <profiles>
        <!-- Profile for the read/write user -->
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
        </default>
        <!-- Profile for the read-only user -->
        <readonly>
            <max_memory_usage>10000000000</max_memory_usage>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
            <readonly>1</readonly>
        </readonly>
    </profiles>
    <!-- Quotas -->
    <quotas>
        <!-- Name of quota. -->
        <default>
            <interval>
                <duration>3600</duration>
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
    <users>
        <!-- Read/write user -->
        <default>
            <password_sha256_hex>967f3bf355dddfabfca1c9f5cab39352b2ec1cd0b05f9e1e6b8f629705fe7d6e</password_sha256_hex>
            <networks incl="networks" replace="replace">
                <ip>::/0</ip>
            </networks>
            <profile>default</profile>
            <quota>default</quota>
        </default>
        <!-- Read-only user -->
        <readonly>
            <password_sha256_hex>967f3bf355dddfabfca1c9f5cab39352b2ec1cd0b05f9e1e6b8f629705fe7d6e</password_sha256_hex>
            <networks incl="networks" replace="replace">
                <ip>::/0</ip>
            </networks>
            <profile>readonly</profile>
            <quota>default</quota>
        </readonly>
    </users>
</yandex>
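
The password_sha256_hex values above are SHA-256 digests of the plaintext password. To produce a digest for a password of your own, one possible sketch with standard shell tools (replace the placeholder plaintext):

echo -n 'your-plaintext-password' | sha256sum | tr -d ' -'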

5. Push the files to the other machines in the cluster

clush -a --copy /etc/security/limits.d/clickhouse.conf --dest /etc/security/limits.d/

clush -a --copy /etc/init.d/clickhouse-server --dest /etc/init.d

clush -a --copy /etc/clickhouse-server/config.xml --dest /etc/clickhouse-server/

clush -a --copy /etc/clickhouse-server/users.xml --dest /etc/clickhouse-server/

clush -a --copy /etc/clickhouse-server/metrika.xml --dest /etc/clickhouse-server/
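
One step worth calling out before restarting: config.xml points the data, log, and tmp paths at /data/clickhouse/, which will not exist on a fresh machine. A sketch to create the directories cluster-wide (assuming the rpm created the clickhouse user and group, as it normally does):

clush -a 'mkdir -p /data/clickhouse/logs /data/clickhouse/tmp'
clush -a 'chown -R clickhouse:clickhouse /data/clickhouse'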

IV. Startup

1. Restart the service

ClickHouse is already running right after installation (and is automatically set to start on boot), so once the configuration files have been changed it of course needs a restart:

clush -a 'service clickhouse-server restart'

2. Connect a client on each machine

I just pick one machine at random to check:

clickhouse-client -u default --password 6lYaUiFi

3. Inspect the cluster

show databases;
use system;
show tables;
select * from clusters;

  The output shows the cluster we just built: 3 shards, each with a single replica. The cluster is up; whether the nodes can actually talk to each other still needs a manual test, such as creating a table by hand (a sketch follows). As long as ZooKeeper is healthy, there is generally no problem.
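
A possible end-to-end test, run from any node's clickhouse-client (a sketch: the test_local/test_all table names are hypothetical, and ON CLUSTER relies on the distributed_ddl section in config.xml plus a healthy ZooKeeper):

CREATE TABLE default.test_local ON CLUSTER 'cluster-1'
(id UInt32, name String)
ENGINE = MergeTree() ORDER BY id;

CREATE TABLE default.test_all ON CLUSTER 'cluster-1' AS default.test_local
ENGINE = Distributed('cluster-1', 'default', 'test_local', rand());

INSERT INTO default.test_all (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
SELECT count() FROM default.test_all;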

We can also look inside ZooKeeper for a /clickhouse node, which confirms that the ZooKeeper settings configured in ClickHouse have taken effect.
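
From the ZooKeeper side, a quick check (zkCli.sh ships with ZooKeeper; the host zk matches metrika.xml):

zkCli.sh -server zk:2181
ls /
ls /clickhouse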

 

  Note that a ZooKeeper service must be running before ClickHouse starts, wherever that service lives, since metrika.xml points at it. I used a single-node ZooKeeper here; for reliability, a ZooKeeper ensemble (cluster mode) is normally used.
