ClickHouse Cluster Installation

   There are many ways to install a ClickHouse cluster; here we take the simple and convenient route of installing with yum.

I. Preparing the Tools

1. clustershell

First we need a cluster-management tool, so that the installation can be done in one pass instead of being repeated on every machine. That tool is ClusterShell:

yum install -y clustershell

Once it is installed, edit the configuration file:

vi /etc/clustershell/groups.d/local.cfg

Note that since no version was specified during installation, we get the latest one. For other versions the configuration file is not necessarily at this exact path, but it will be somewhere under /etc/clustershell/; look around and you will find it.

Edit the file as follows:

all: node0[1-3]
clickhouse: node0[1-3]

Note: the format of this file is

group name: the hostnames of the machines to include in that group

  Many groups can be defined here, which is why the file is called groups; since this is only a test, we define just a couple.

A supplementary note: on a machine I swapped in later, the yum install failed with the error "No package clustershell available".

This is a yum repository issue; the fix is to enable EPEL first:

sudo yum install epel-release
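
EPEL is the repository that provides the clustershell package; once it is enabled, the original install command works again:

sudo yum install -y epel-release
sudo yum install -y clustershell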

So how is the tool used once it is installed?

The clush command has the form: clush <options> '<command>'

Option    Description
-g        run on the hosts of the specified group
-a        run on all hosts in all groups
-w        run on the listed hosts; separate multiple hosts with commas
-x        exclude the listed hosts from the run; separate multiple hosts with commas
-X        exclude the listed groups from the run; separate multiple groups with commas
-b        merge identical output from different nodes

Example: clush -g all -b 'yum install -y curl' runs yum install -y curl on every server in the all group.

It can equivalently be written as:

clush -a 'yum install -y curl'

-g takes a group name defined in the configuration file; as you can see, this makes cluster-wide operations very convenient.

Note: before using this tool, set up passwordless SSH login (see the sketch after these commands):

cd .ssh/
ssh-keygen
ssh-copy-id -i id_rsa.pub root@node01
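
clush needs passwordless SSH to every node, not just node01, so ssh-copy-id has to be repeated per host; a sketch assuming the three hostnames from the groups file:

for h in node01 node02 node03; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h
done
clush -a -b 'hostname'    # quick check: should report every node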

2. curl

Use ClusterShell to install curl on every server in the cluster:

clush -a 'yum install -y curl'

II. Installing ClickHouse

1. Obtain the ClickHouse packages (set up the repository)

clush -a 'curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash'

Note: if running this command produces the error:

sudo: sorry, you must have a tty to run sudo

Fix:

visudo -f /etc/sudoers

and comment out the line: Defaults    requiretty
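
If you would rather make this change on every node at once, a possible sketch with sed (editing /etc/sudoers directly is risky; visudo is the safer route):

clush -a "sed -i 's/^Defaults[[:space:]]*requiretty/#&/' /etc/sudoers"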

 

2. List the available ClickHouse packages

clush -a 'sudo yum list "clickhouse*"'

3. Install ClickHouse

clush -a 'sudo yum install -y clickhouse-server clickhouse-client clickhouse-compressor'

Wait a moment and every machine in the cluster will have ClickHouse installed. Simple enough, even though some tooling had to be prepared up front. A quick verification is sketched below.
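
To confirm the packages really landed on every node, one possible check:

clush -a -b 'rpm -qa "clickhouse*"'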

III. Modifying the Configuration

Note: all of the configuration files here are edited on node01; afterwards they are pushed out to the rest of the cluster with ClusterShell.

1. Modify the ulimit configuration

The default values are too small to be sufficient:

[root@node01 .ssh]# cat /etc/security/limits.d/clickhouse.conf
clickhouse       soft    core    1073741824
clickhouse       hard    core    1073741824
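
These two lines raise the core-dump size limit for the clickhouse user to 1073741824 bytes (1 GiB). If you also run into "too many open files" errors, nofile entries in the same format can be added; the 262144 value below is an assumption, not something this file ships with:

clickhouse       soft    nofile  262144
clickhouse       hard    nofile  262144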

2. Modify the config.xml configuration file

[root@node01 .ssh]# vi /etc/clickhouse-server/config.xml

<?xml version="1.0"?>
<!--
  NOTE: User and query level settings are set up in "users.xml" file.
-->
<yandex>
    <logger>
        <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
        <level>trace</level>
        <log>/data/clickhouse/logs/server.log</log>
        <errorlog>/data/clickhouse/logs/error.log</errorlog>
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
    </logger>

    <!--display_name>production</display_name--> <!-- It is the name that will be shown in the client -->

    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>

    <!-- For HTTPS and SSL over native protocol. -->
    <!--
    <https_port>8443</https_port>
    <tcp_port_secure>9440</tcp_port_secure>
    -->

    <!-- Used with https_port and tcp_port_secure. Full ssl options list: https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h#L71 -->
    <openSSL>
        <server> <!-- Used for https server AND secure tcp port -->
            <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
            <!-- openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096 -->
            <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>

        <client> <!-- Used for connecting to https dictionary source -->
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
            <invalidCertificateHandler>
                <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
                <name>RejectCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>

    <!-- Default root page on http[s] server. For example load UI from https://tabix.io/ when opening http://localhost:8123 -->
    <!--
    <http_server_default_response><![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]></http_server_default_response>
    -->

    <!-- Port for communication between replicas. Used for data exchange. -->
    <interserver_http_port>9009</interserver_http_port>

    <!-- Hostname that is used by other replicas to request this server.
         If not specified, than it is determined analoguous to 'hostname -f' command.
         This setting could be used to switch replication to another network interface.
      -->
    <!--
    <interserver_http_host>example.yandex.ru</interserver_http_host>
    -->

    <!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
    <listen_host>::</listen_host>
    <!-- Same for hosts with disabled ipv6: -->
    <!--    <listen_host>0.0.0.0</listen_host> -->

    <!-- Default values - try listen localhost on ipv4 and ipv6: -->
    <!--
    <listen_host>::1</listen_host>
    <listen_host>127.0.0.1</listen_host>
    -->
    <!-- Don't exit if ipv6 or ipv4 unavailable, but listen_host with this protocol specified -->
    <!-- <listen_try>0</listen_try> -->

    <!-- Allow listen on same address:port -->
    <!-- <listen_reuse_port>0</listen_reuse_port> -->

    <!-- <listen_backlog>64</listen_backlog> -->

    <max_connections>4096</max_connections>
    <keep_alive_timeout>3</keep_alive_timeout>

    <!-- Maximum number of concurrent queries. -->
    <max_concurrent_queries>100</max_concurrent_queries>

    <!-- Set limit on number of open files (default: maximum). This setting makes sense on Mac OS X because getrlimit() fails to retrieve
         correct maximum value. -->
    <!-- <max_open_files>262144</max_open_files> -->

    <!-- Size of cache of uncompressed blocks of data, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         Cache is used when 'use_uncompressed_cache' user setting turned on (off by default).
         Uncompressed cache is advantageous only for very short queries and in rare cases.
      -->
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>

    <!-- Approximate size of mark cache, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         You should not lower this value.
      -->
    <mark_cache_size>5368709120</mark_cache_size>

    <!-- Path to data directory, with trailing slash. -->
    <path>/data/clickhouse/</path>

    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/data/clickhouse/tmp/</tmp_path>

    <!-- Directory with user provided files that are accessible by 'file' table function. -->
    <user_files_path>/var/lib/clickhouse/user_files/</user_files_path>

    <!-- Path to configuration file with users, access rights, profiles of settings, quotas. -->
    <users_config>users.xml</users_config>

    <!-- Default profile of settings. -->
    <default_profile>default</default_profile>

    <!-- System profile of settings. This settings are used by internal processes (Buffer storage, Distibuted DDL worker and so on). -->
    <!-- <system_profile>default</system_profile> -->

    <!-- Default database. -->
    <default_database>default</default_database>

    <!-- Server time zone could be set here.
         Time zone is used when converting between String and DateTime types,
          when printing DateTime in text formats and parsing DateTime from text,
          it is used in date and time related functions, if specific time zone was not passed as an argument.
         Time zone is specified as identifier from IANA time zone database, like UTC or Africa/Abidjan.
         If not specified, system time zone at server startup is used.
         Please note, that server could display time zone alias instead of specified name.
         Example: W-SU is an alias for Europe/Moscow and Zulu is an alias for UTC.
    -->
    <!-- <timezone>Europe/Moscow</timezone> -->

    <!-- You can specify umask here (see "man umask"). Server will apply it on startup.
         Number is always parsed as octal. Default umask is 027 (other users cannot read logs, data files, etc; group can only read).
    -->
    <!-- <umask>022</umask> -->

    <!-- Perform mlockall after startup to lower first queries latency
          and to prevent clickhouse executable from being paged out under high IO load.
         Enabling this option is recommended but will lead to increased startup time for up to a few seconds.
    -->
    <mlock_executable>false</mlock_executable>

    <!-- Configuration of clusters that could be used in Distributed tables.
         https://clickhouse.yandex/docs/en/table_engines/distributed/
      -->
    <remote_servers incl="clickhouse_remote_servers" >
    </remote_servers>

    <!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
         By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
         Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
      -->

    <!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
         Optional. If you don't use replicated tables, you could omit that.
         See https://clickhouse.yandex/docs/en/table_engines/replication/
      -->
    <zookeeper incl="zookeeper-servers" optional="true" />

    <!-- Substitutions for parameters of replicated tables.
          Optional. If you don't use replicated tables, you could omit that.
         See https://clickhouse.yandex/docs/en/table_engines/replication/#creating-replicated-tables
      -->
    <macros incl="macros" optional="true" />

    <!-- Add this element; it points at the substitutions file created in the next step: -->

    <include_from>/etc/clickhouse-server/metrika.xml</include_from>

    <!-- Reloading interval for embedded dictionaries, in seconds. Default: 3600. -->
    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>

    <!-- Maximum session timeout, in seconds. Default: 3600. -->
    <max_session_timeout>3600</max_session_timeout>

    <!-- Default session timeout, in seconds. Default: 60. -->
    <default_session_timeout>60</default_session_timeout>

    <!-- Sending data to Graphite for monitoring. Several sections can be defined. -->
    <!--
        interval - send every X second
        root_path - prefix for keys
        hostname_in_path - append hostname to root_path (default = true)
        metrics - send data from table system.metrics
        events - send data from table system.events
        asynchronous_metrics - send data from table system.asynchronous_metrics
    -->
    <!--
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>60</interval>
        <root_path>one_min</root_path>
        <hostname_in_path>true</hostname_in_path>
        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </graphite>
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>1</interval>
        <root_path>one_sec</root_path>
        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>false</asynchronous_metrics>
    </graphite>
    -->

    <!-- Query log. Used only for queries with setting log_queries = 1. -->
    <query_log>
        <!-- What table to insert data. If table is not exist, it will be created.
             When query log structure is changed after system update,
              then old table will be renamed and new table will be created automatically.
        -->
        <database>system</database>
        <table>query_log</table>
        <!--
            PARTITION BY expr https://clickhouse.yandex/docs/en/table_engines/custom_partitioning_key/
            Example:
                event_date
                toMonday(event_date)
                toYYYYMM(event_date)
                toStartOfHour(event_time)
        -->
        <partition_by>toYYYYMM(event_date)</partition_by>
        <!-- Interval of flushing data. -->
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>

    <!-- Query thread log. Has information about all threads participated in query execution.
         Used only for queries with setting log_query_threads = 1. -->
    <query_thread_log>
        <database>system</database>
        <table>query_thread_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_thread_log>

    <!-- Uncomment if use part log.
         Part log contains information about all actions with parts in MergeTree tables (creation, deletion, merges, downloads).
    <part_log>
        <database>system</database>
        <table>part_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </part_log>
    -->

    <!-- Parameters for embedded dictionaries, used in Yandex.Metrica.
         See https://clickhouse.yandex/docs/en/dicts/internal_dicts/
    -->

    <!-- Path to file with region hierarchy. -->
    <!-- <path_to_regions_hierarchy_file>/opt/geo/regions_hierarchy.txt</path_to_regions_hierarchy_file> -->

    <!-- Path to directory with files containing names of regions -->
    <!-- <path_to_regions_names_files>/opt/geo/</path_to_regions_names_files> -->

    <!-- Configuration of external dictionaries. See:
         https://clickhouse.yandex/docs/en/dicts/external_dicts/
    -->
    <dictionaries_config>*_dictionary.xml</dictionaries_config>

    <!-- Uncomment if you want data to be compressed 30-100% better.
         Don't do that if you just started using ClickHouse.
      -->
    <compression incl="clickhouse_compression">
    <!--
        <!- - Set of variants. Checked in order. Last matching case wins. If nothing matches, lz4 will be used. - ->
        <case>
            <!- - Conditions. All must be satisfied. Some conditions may be omitted. - ->
            <min_part_size>10000000000</min_part_size>        <!- - Min part size in bytes. - ->
            <min_part_size_ratio>0.01</min_part_size_ratio>   <!- - Min size of part relative to whole table size. - ->
            <!- - What compression method to use. - ->
            <method>zstd</method>
        </case>
    -->
    </compression>

    <!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster.
         Works only if ZooKeeper is enabled. Comment it if such functionality isn't required. -->
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue/ddl</path>
        <!-- Settings from this profile will be used to execute DDL queries -->
        <!-- <profile>default</profile> -->
    </distributed_ddl>

    <!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->
    <!--
    <merge_tree>
        <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
    </merge_tree>
    -->

    <!-- Protection from accidental DROP.
         If size of a MergeTree table is greater than max_table_size_to_drop (in bytes) than table could not be dropped with any DROP query.
         If you want do delete one table and don't want to restart clickhouse-server, you could create special file <clickhouse-path>/flags/force_drop_table and make DROP once.
         By default max_table_size_to_drop is 50GB; max_table_size_to_drop=0 allows to DROP any tables.
         The same for max_partition_size_to_drop.
         Uncomment to disable protection.
    -->
    <!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
    <!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->

    <!-- Example of parameters for GraphiteMergeTree table engine -->
    <graphite_rollup_example>
        <pattern>
            <regexp>click_cost</regexp>
            <function>any</function>
            <retention>
                <age>0</age>
                <precision>3600</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>60</precision>
            </retention>
        </pattern>
        <default>
            <function>max</function>
            <retention>
                <age>0</age>
                <precision>60</precision>
            </retention>
            <retention>
                <age>3600</age>
                <precision>300</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>3600</precision>
            </retention>
        </default>
    </graphite_rollup_example>

    <!-- Directory in <clickhouse-path> containing schema files for various input formats.
         The directory will be created if it doesn't exist.
      -->
    <format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>

    <!-- Uncomment to disable ClickHouse internal DNS caching. -->
    <!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
</yandex>

Only a handful of lines differ from the stock file: the logger paths under /data/clickhouse/logs/, the listen_host setting, the data path and tmp_path under /data/clickhouse/, and the added include_from element. That completes this configuration file; a quick way to double-check the edits is sketched below.
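
A minimal sanity check that just lists the lines we touched:

grep -nE '<log>|<errorlog>|<listen_host>|<path>|<tmp_path>|<include_from>' /etc/clickhouse-server/config.xml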

3. Create the metrika.xml configuration file

As noted in config.xml above, we can create a metrika.xml file (referenced by include_from), because config.xml itself does not define the cluster shards, ZooKeeper, or the replicas. One caveat: the <macros> section below names the local replica (node01); on a cluster with real replicas each host would need its own value, so the file could not simply be copied verbatim as we do in step 5.

vi /etc/clickhouse-server/metrika.xml

 

<yandex>
<clickhouse_remote_servers>
    <cluster-1>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>node01</host>
                <port>9000</port>
                <user>default</user>
                <password>6lYaUiFi</password>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>node02</host>
                <port>9000</port>
                <user>default</user>
                <password>6lYaUiFi</password>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>node03</host>
                <port>9000</port>
                <user>default</user>
                <password>6lYaUiFi</password>
            </replica>
        </shard>
    </cluster-1>
</clickhouse_remote_servers>

<zookeeper-servers>
    <node index="1">
        <host>zk</host>
        <port>2181</port>
    </node>
</zookeeper-servers>

<macros>
    <replica>node01</replica>
</macros>

<clickhouse_compression>
    <case>
        <min_part_size>10000000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>lz4</method>
    </case>
</clickhouse_compression>
</yandex>
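
A remark on this file: the zookeeper-servers block points at a single host named zk. With a ZooKeeper ensemble (recommended for production, as noted at the end of this post), each member gets its own node entry; the zk01..zk03 hostnames below are hypothetical:

<zookeeper-servers>
    <node index="1">
        <host>zk01</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>zk02</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>zk03</host>
        <port>2181</port>
    </node>
</zookeeper-servers>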

4. Modify the users.xml configuration file

vi /etc/clickhouse-server/users.xml


<?xml version="1.0"?>
<yandex>
    <profiles>
        <!-- Profile for the read/write user -->
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
        </default>
        <!-- Profile for the read-only user -->
        <readonly>
            <max_memory_usage>10000000000</max_memory_usage>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
            <readonly>1</readonly>
        </readonly>
    </profiles>
    <!-- Quotas -->
    <quotas>
        <!-- Name of quota. -->
        <default>
            <interval>
                <duration>3600</duration>
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
    <users>
        <!-- Read/write user -->
        <default>
            <password_sha256_hex>967f3bf355dddfabfca1c9f5cab39352b2ec1cd0b05f9e1e6b8f629705fe7d6e</password_sha256_hex>
            <networks incl="networks" replace="replace">
                <ip>::/0</ip>
            </networks>
            <profile>default</profile>
            <quota>default</quota>
        </default>
        <!-- Read-only user -->
        <readonly>
            <password_sha256_hex>967f3bf355dddfabfca1c9f5cab39352b2ec1cd0b05f9e1e6b8f629705fe7d6e</password_sha256_hex>
            <networks incl="networks" replace="replace">
                <ip>::/0</ip>
            </networks>
            <profile>readonly</profile>
            <quota>default</quota>
        </readonly>
    </users>
</yandex>
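
The password_sha256_hex values above are SHA-256 digests of the plaintext password. To produce a digest for a password of your own, one possible sketch with standard shell tools (replace the placeholder plaintext):

echo -n 'your-plaintext-password' | sha256sum | tr -d ' -'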

5. Push the files to the other machines in the cluster

clush -a --copy /etc/security/limits.d/clickhouse.conf --dest /etc/security/limits.d/

clush -a --copy /etc/init.d/clickhouse-server --dest /etc/init.d

clush -a --copy /etc/clickhouse-server/config.xml --dest /etc/clickhouse-server/

clush -a --copy /etc/clickhouse-server/users.xml --dest /etc/clickhouse-server/

clush -a --copy /etc/clickhouse-server/metrika.xml --dest /etc/clickhouse-server/
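
One step worth calling out before restarting: config.xml points the data, log, and tmp paths at /data/clickhouse/, which will not exist on a fresh machine. A sketch to create the directories cluster-wide (assuming the rpm created the clickhouse user and group, as it normally does):

clush -a 'mkdir -p /data/clickhouse/logs /data/clickhouse/tmp'
clush -a 'chown -R clickhouse:clickhouse /data/clickhouse'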

IV. Startup

1. Restart the service

ClickHouse is already running right after installation (and is automatically set to start on boot), so once the configuration files have been changed it of course needs a restart:

clush -a 'service clickhouse-server restart'

2. Connect a client on each machine

I just pick one machine at random to check:

clickhouse-client -u default --password 6lYaUiFi

3. Inspect the cluster

show databases;
use system;
show tables;
select * from clusters;

  The output shows the cluster we just built: 3 shards, each with a single replica. The cluster is up; whether the nodes can actually talk to each other still needs a manual test, such as creating a table by hand (a sketch follows). As long as ZooKeeper is healthy, there is generally no problem.
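
A possible end-to-end test, run from any node's clickhouse-client (a sketch: the test_local/test_all table names are hypothetical, and ON CLUSTER relies on the distributed_ddl section in config.xml plus a healthy ZooKeeper):

CREATE TABLE default.test_local ON CLUSTER 'cluster-1'
(id UInt32, name String)
ENGINE = MergeTree() ORDER BY id;

CREATE TABLE default.test_all ON CLUSTER 'cluster-1' AS default.test_local
ENGINE = Distributed('cluster-1', 'default', 'test_local', rand());

INSERT INTO default.test_all (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
SELECT count() FROM default.test_all;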

We can also look inside ZooKeeper for a /clickhouse node, which confirms that the ZooKeeper settings configured in ClickHouse have taken effect.
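
From the ZooKeeper side, a quick check (zkCli.sh ships with ZooKeeper; the host zk matches metrika.xml):

zkCli.sh -server zk:2181
ls /
ls /clickhouse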

 

  Note that a ZooKeeper service must be running before ClickHouse starts, wherever that service lives, since metrika.xml points at it. I used a single-node ZooKeeper here; for reliability, a ZooKeeper ensemble (cluster mode) is normally used.
