Setting Up a ClickHouse High-Availability Cluster
This is the author's step-by-step record of building a ClickHouse (CK) cluster; the setup was verified working. The cluster described here is a QA environment, so no password was configured at the end.
1. Download the packages
Download the following RPMs from the address provided on the official site:
clickhouse-client-20.3.15.133-1.el7.x86_64.rpm
clickhouse-common-static-20.3.15.133-1.el7.x86_64.rpm
clickhouse-server-20.3.15.133-1.el7.x86_64.rpm
clickhouse-server-common-20.3.15.133-1.el7.x86_64.rpm
clickhouse-test-20.3.15.133-1.el7.x86_64.rpm
2. Download and install the dependencies
wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/p/perl-JSON-XS-3.01-2.el7.x86_64.rpm
wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/p/perl-Types-Serialiser-1.0-1.el7.noarch.rpm
wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/p/perl-common-sense-3.6-4.el7.noarch.rpm
# the install order matters; getting it wrong produces errors like:
# perl(JSON::XS) is needed by clickhouse-test-20.3.15.133-1.el7.x86_64
rpm -ivh perl-common-sense-3.6-4.el7.noarch.rpm
rpm -ivh perl-Types-Serialiser-1.0-1.el7.noarch.rpm
rpm -ivh perl-JSON-XS-3.01-2.el7.x86_64.rpm
If any other dependency is missing, it can be searched for and downloaded from the site above; searching for just perl(JSON::XS) is enough.
3. Install ClickHouse
rpm -ivh clickhouse-server-common-20.3.15.133-1.el7.x86_64.rpm
rpm -ivh clickhouse-common-static-20.3.15.133-1.el7.x86_64.rpm
rpm -ivh clickhouse-server-20.3.15.133-1.el7.x86_64.rpm
rpm -ivh clickhouse-client-20.3.15.133-1.el7.x86_64.rpm
rpm -ivh clickhouse-test-20.3.15.133-1.el7.x86_64.rpm
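A quick sanity check that all five packages installed cleanly:

```bash
# all five clickhouse packages should be listed
rpm -qa | grep clickhouse
# the client binary should now be on the PATH
clickhouse-client --version
```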
4. Create the instance directories
cd /disk/data01
mkdir -p clickhouse/data
mkdir -p clickhouse/logs
mkdir -p clickhouse/tmp
mkdir -p clickhouse2/data
mkdir -p clickhouse2/logs
mkdir -p clickhouse2/tmp
chown -R clickhouse:clickhouse clickhouse*
5. Plan the cluster
| shard_num | replica_num | host_name | host_address | port |
| --- | --- | --- | --- | --- |
| 1 | 1 | business-data-0001 | 192.168.4.193 | 9000 |
| 1 | 2 | business-data-0002 | 192.168.4.137 | 9001 |
| 2 | 1 | business-data-0002 | 192.168.4.137 | 9000 |
| 2 | 2 | business-data-0001 | 192.168.4.193 | 9001 |
6. Configure the instances
vim /etc/clickhouse-server/config.xml
The relevant settings in config.xml:
<!-- log paths for the first instance -->
<logger>
<level>trace</level>
<log>/disk/data01/clickhouse/logs/clickhouse-server.log</log>
<errorlog>/disk/data01/clickhouse/logs/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>10</count>
</logger>
<!-- pull in the extra cluster config -->
<include_from>/etc/clickhouse-server/metrika1.xml</include_from>
<!-- listen on all addresses -->
<listen_host>::</listen_host>
<!-- data and tmp paths for the first instance -->
<path>/disk/data01/clickhouse/data</path>
<tmp_path>/disk/data01/clickhouse/tmp</tmp_path>
cd /etc/clickhouse-server/
mv config.xml config1.xml
cp config1.xml config2.xml
# config2.xml needs different ports and data directories
vim config2.xml
Changes in config2.xml:
<logger>
<level>trace</level>
<!-- log paths for the second instance -->
<log>/disk/data01/clickhouse2/logs/clickhouse-server.log</log>
<errorlog>/disk/data01/clickhouse2/logs/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>10</count>
</logger>
<!-- ports: each value from config1.xml plus 1 -->
<http_port>8124</http_port>
<tcp_port>9001</tcp_port>
<mysql_port>9005</mysql_port>
<!-- the second instance loads its own metrika file -->
<include_from>/etc/clickhouse-server/metrika2.xml</include_from>
<!-- interserver port, also plus 1 -->
<interserver_http_port>9010</interserver_http_port>
<!-- data and tmp paths for the second instance -->
<path>/disk/data01/clickhouse2/data</path>
<tmp_path>/disk/data01/clickhouse2/tmp</tmp_path>
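Before moving on, it is worth diffing the two files to confirm that only the intended settings differ:

```bash
cd /etc/clickhouse-server
# only the log paths, ports, include_from and data/tmp paths should differ
diff config1.xml config2.xml
```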
7. Add the included config file
The <include_from>/etc/clickhouse-server/metrika*.xml</include_from> entry in config*.xml loads an extra configuration file. Tags in the main config pull sections in from it via their incl attribute: for example, <remote_servers incl="clickhouse_remote_servers"> imports the contents of the <clickhouse_remote_servers> tag from the metrika*.xml file.
vim metrika1.xml
metrika1.xml:
<yandex>
<clickhouse_remote_servers>
<!-- 2 shards, 1 replica each -->
<cluster_2shards_1replicas>
<!-- shard 1 -->
<shard>
<replica>
<host>business-data-0001</host>
<port>9000</port>
</replica>
</shard>
<!-- shard 2 -->
<shard>
<replica>
<host>business-data-0002</host>
<port>9000</port>
</replica>
</shard>
</cluster_2shards_1replicas>
<!-- 1 shard, 2 replicas -->
<cluster_1shards_2replicas>
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>business-data-0001</host>
<port>9000</port>
</replica>
<replica>
<host>business-data-0002</host>
<port>9000</port>
</replica>
</shard>
</cluster_1shards_2replicas>
<!-- 2 shards, 2 replicas: replication handled by Replicated* tables -->
<cluster_2shards_2replicas>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>business-data-0001</host>
<port>9000</port>
</replica>
<replica>
<host>business-data-0002</host>
<port>9001</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>business-data-0002</host>
<port>9000</port>
</replica>
<replica>
<host>business-data-0001</host>
<port>9001</port>
</replica>
</shard>
</cluster_2shards_2replicas>
</clickhouse_remote_servers>
<!-- these macros identify this instance: the replica named business-data-0001 of shard 01 -->
<macros>
<layer>01</layer>
<shard>01</shard>
<replica>business-data-0001</replica>
</macros>
<!-- ZooKeeper ensemble used for replication and high availability -->
<zookeeper-servers>
<node index="1">
<host>192.168.4.193</host>
<port>2181</port>
</node>
<node index="2">
<host>192.168.4.137</host>
<port>2181</port>
</node>
</zookeeper-servers>
<networks>
<ip>::/0</ip>
</networks>
<clickhouse_compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method>
</case>
</clickhouse_compression>
</yandex>
cp metrika1.xml metrika2.xml
vim metrika2.xml
Only the <macros> section differs between the four instances; for the remaining three it should be:
<!-- second instance on the first machine -->
<macros>
<layer>01</layer>
<shard>02</shard>
<replica>business-data-0001</replica>
</macros>
<!-- first instance on the second machine -->
<macros>
<layer>01</layer>
<shard>02</shard>
<replica>business-data-0002</replica>
</macros>
<!-- second instance on the second machine -->
<macros>
<layer>01</layer>
<shard>01</shard>
<replica>business-data-0002</replica>
</macros>
With this configuration, each machine holds one shard plus a replica of the other machine's shard.
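To confirm that the macros and the cluster definition work end to end, you can create a small replicated table plus a Distributed table over it. This is a minimal sketch; the test database and the events_local/events_all table names are made up for illustration. Run it against every instance (ports 9000 and 9001 on both machines):

```bash
# {layer}, {shard} and {replica} expand from each instance's own metrika file,
# so both replicas of a shard end up sharing one ZooKeeper path
clickhouse-client --port 9000 --multiquery <<'SQL'
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE test.events_local
(
    event_date Date,
    event_id   UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/events_local', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, event_id);
-- queries on events_all fan out across both shards
CREATE TABLE test.events_all AS test.events_local
ENGINE = Distributed(cluster_2shards_2replicas, test, events_local, rand());
SQL
```

Once the tables exist on all four instances, an INSERT into test.events_all from any one of them should be readable everywhere: because internal_replication is true for this cluster, the Distributed engine writes each row to a single replica per shard and ReplicatedMergeTree copies it to the other replica through ZooKeeper.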
8. Modify the init scripts
vim /etc/init.d/clickhouse-server
CLICKHOUSE_BINDIR=/usr/bin
CLICKHOUSE_CRONFILE=/etc/cron.d/clickhouse-server
# point at the matching config file
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config1.xml
LOCKFILE=$LOCALSTATEDIR/$PROGRAM
RETVAL=0
CLICKHOUSE_PIDDIR=/var/run/$PROGRAM
# use a per-instance pid file
CLICKHOUSE_PIDFILE="$CLICKHOUSE_PIDDIR/$PROGRAM-1.pid"
mv /etc/init.d/clickhouse-server /etc/init.d/clickhouse-server1
cp /etc/init.d/clickhouse-server1 /etc/init.d/clickhouse-server2
# edit clickhouse-server2 the same way, but with config2.xml and $PROGRAM-2.pid
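A quick grep confirms that each script points at its own config and pid file:

```bash
grep -E 'CLICKHOUSE_CONFIG=|CLICKHOUSE_PIDFILE=' /etc/init.d/clickhouse-server1 /etc/init.d/clickhouse-server2
```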
9. Register the services and start the instances
cd /etc/init.d/
chkconfig --add clickhouse-server1
chkconfig --add clickhouse-server2
chkconfig --list
# start both instances
systemctl start clickhouse-server1
systemctl start clickhouse-server2
# check that each service started normally
systemctl status clickhouse-server1 -l
systemctl status clickhouse-server2 -l
# the log directory should also be receiving output
ll /disk/data01/clickhouse/logs
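Both instances should also be listening on their own ports, and the HTTP interface's /ping endpoint answers "Ok." once a server is up (this assumes config1.xml kept the default 8123/9000 ports):

```bash
# expect 8123/9000 for the first instance and 8124/9001 for the second
ss -lntp | grep clickhouse
curl http://127.0.0.1:8123/ping
curl http://127.0.0.1:8124/ping
```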
10. Log in with the client
# no password configured, so log in directly
# in practice, set a password in every environment: QA, staging, and production
clickhouse-client
# check that queries work
select * from system.clusters;
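Two more checks are useful here: that each instance picked up its own macro values, and that the servers can reach ZooKeeper (run against both ports):

```bash
# each instance should report its own layer/shard/replica values
clickhouse-client --port 9000 -q "SELECT * FROM system.macros"
clickhouse-client --port 9001 -q "SELECT * FROM system.macros"
# an error here usually means the zookeeper-servers section was not loaded
clickhouse-client -q "SELECT name FROM system.zookeeper WHERE path = '/'"
```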
11. Notes
The advantage of this layout is lower server cost. The drawback is that ClickHouse is already CPU-heavy for complex queries, so running multiple instances on one machine can hurt performance.