A ClickHouse cluster is defined at the table level. At the node level every ClickHouse server is independent, and it stays independent even after being joined into a cluster, so a client always connects to a single node. You can compare it to Redis: it is a bit like building blocks that are assembled freely through configuration files. Simply install standalone ClickHouse on each node, then configure config.xml and /etc/metrika.xml with the other nodes' connection information and the shard and replica layout.
For single-node ClickHouse deployment, see https://blog.csdn.net/shy_snow/article/details/123477519
Modify config.xml
2. ## Install ZooKeeper
For ZooKeeper cluster installation and deployment, see https://blog.csdn.net/shy_snow/article/details/124547559
The ZooKeeper connection information must be configured in ClickHouse's config.xml.
3. ## Cluster configuration in /etc/metrika.xml
I have four machines (106-109), each running one ClickHouse instance, so I chose a cluster of 2 shards with 2 replicas each.
Replica: a replicated table synchronizes its data to all replicas. With 2 replicas, if one node fails, the other node still holds a complete copy, so the data stays safe and nothing is lost.
Shard: sharding splits the rows of a table across the shards, so each shard stores only part of the data.
In the standalone config.xml, the incl attribute can be used to pull in configuration fragments from /etc/metrika.xml:
vi /etc/clickhouse-server/config.xml
Add the following inside the <clickhouse> tag:
<include_from>/etc/metrika.xml</include_from>
<!-- ZooKeeper is used to synchronize replicated tables -->
<zookeeper>
    <node>
        <host>zk1</host>
        <port>2181</port>
    </node>
    <node>
        <host>zk2</host>
        <port>2181</port>
    </node>
    <node>
        <host>zk3</host>
        <port>2181</port>
    </node>
</zookeeper>
<remote_servers incl="clickhouse_remote_servers" />
<macros incl="macros" optional="true" />
vi /etc/metrika.xml
<yandex>
    <clickhouse_remote_servers>
        <!-- The tag name cluster_2shards_2replicas is the cluster name and can be customized; list clusters with: select * from system.clusters;
             2 shards x 2 replicas: if one node of a shard fails, the other node still holds a complete copy of that shard's data -->
        <cluster_2shards_2replicas>
            <!-- shard1 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.129.106</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>192.168.129.108</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
            <!-- shard2 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.129.107</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>192.168.129.109</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
        </cluster_2shards_2replicas>
    </clickhouse_remote_servers>
    <!-- macros must be configured differently on every node; the values are substituted when creating replicated tables -->
    <!-- If SELECT * FROM system.macros; returns these values, the configuration took effect -->
    <macros>
        <shard>02</shard>
        <replica>host107</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <!-- compression settings -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
After copying metrika.xml to every node, edit the macros section on each node: the shard value must match the shard the node belongs to, and the replica value must be unique per node (the host name works well). For the layout above that means 01/host106 on 106, 01/host108 on 108, 02/host107 on 107 and 02/host109 on 109.
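A distribution-and-restart sketch, assuming root SSH access from the current machine to all four hosts; the per-node macros still have to be edited by hand in between:
# copy the shared metrika.xml to every node (host list taken from the cluster config above)
for h in 192.168.129.106 192.168.129.107 192.168.129.108 192.168.129.109; do
    scp /etc/metrika.xml root@$h:/etc/metrika.xml
done
# after adjusting <macros> on each node, restart ClickHouse so config.xml and metrika.xml take effect
for h in 192.168.129.106 192.168.129.107 192.168.129.108 192.168.129.109; do
    ssh root@$h "systemctl restart clickhouse-server"
done
# if this query lists the ZooKeeper root znodes, the <zookeeper> section in config.xml works
clickhouse-client --query "SELECT name FROM system.zookeeper WHERE path = '/'"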
-- Query the cluster name, e.g. cluster_2shards_2replicas
select * from system.clusters;
SELECT * FROM system.macros;
drop database if exists test on cluster cluster_2shards_2replicas;
create database if not exists test on cluster cluster_2shards_2replicas;
-- Create the local table (on cluster creates the table on every node of the cluster, but inserted data stays on the node that received the insert)
drop table if exists test.cmtest on cluster cluster_2shards_2replicas;
-- Replicated table
CREATE TABLE test.cmtest on cluster cluster_2shards_2replicas (
`id` String COMMENT 'id', `nginxTime` DateTime COMMENT 'nginxTime'
) ENGINE = ReplicatedMergeTree() partition by toYYYYMMDD(nginxTime) primary key (id) ORDER BY (id);
-- explicit form using the macros from /etc/metrika.xml:
-- ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/st_order_mt','{replica}')
--
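-- Optional sanity check (a sketch; columns taken from system.replicas): verify that the
-- replicated table registered its replicas in ZooKeeper
select database, table, replica_name, total_replicas, active_replicas from system.replicas;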
drop table if exists test.cmtest_dist on cluster cluster_2shards_2replicas;
-- Distributed table: similar to a view, it stores no data itself; queries merge the data of the local tables on all nodes, and inserts are balanced across the nodes according to the sharding key;
create TABLE test.cmtest_dist on cluster cluster_2shards_2replicas as test.cmtest
ENGINE = Distributed("cluster_2shards_2replicas", "test", "cmtest", rand());
-- An insert into the local table only writes to the current node
insert into test.cmtest values ('109',now());
-- An insert into the distributed table is routed to a node according to the sharding rule
insert into test.cmtest_dist values ('1004000',now()+3600*24);
-- The local table only returns the rows stored on the current node
select * from test.cmtest;
-- A query on the distributed table returns the data from all nodes
select * from test.cmtest_dist;
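-- A quick way to see how the rows are spread out (a sketch): hostName() runs on the remote nodes,
-- so grouping by it shows how many rows the answering replica of each shard holds
select hostName() as node, count() from test.cmtest_dist group by node;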
-- Dropping the distributed table does not delete any data; after recreating it, the full data set can be queried again
drop table if exists test.cmtest_dist on cluster cluster_2shards_2replicas;
-- Dropping the local table deletes its data
drop table if exists test.cmtest on cluster cluster_2shards_2replicas;
drop database if exists test on cluster cluster_2shards_2replicas;
Download chproxy
https://github.com/ContentSquare/chproxy/releases
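A download sketch; the tag and asset name below are assumed from the 1.15.1 tarball used later in this post, so verify the exact URL on the releases page:
# download the chproxy 1.15.1 linux/amd64 release (URL assumed; check the releases page)
wget https://github.com/ContentSquare/chproxy/releases/download/v1.15.1/chproxy_1.15.1_linux_amd64.tar.gz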
chproxy is a proxy tool recommended by ClickHouse; only config.yml needs to be configured. chproxy website: https://www.chproxy.org/cn
Example config.yml:
vi config.yml
server:
  http:
    listen_addr: ":9090"
    allowed_networks: ["192.168.0.0/16","2.0.0.0/16"]
users:
  - name: "clickhouse"
    password: "123456"
    to_cluster: "cluster_2shards_2replicas"
    to_user: "default"
clusters:
  - name: "cluster_2shards_2replicas"
    replicas:
      - name: "replica1"
        nodes: ["192.168.129.106:8123","192.168.129.107:8123"]
      - name: "replica2"
        nodes: ["192.168.129.108:8123","192.168.129.109:8123"]
    users:
      - name: "default"
        password: ""
        max_concurrent_queries: 8
        max_execution_time: 2m
If you have no replicas you can omit the replicas section; since this cluster has replicas I configured them here. The nodes grouped into each chproxy replica follow the shard/replica layout from metrika.xml above, so each replica holds one node of every shard.
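Once chproxy is running, clients send ordinary ClickHouse HTTP queries to its listen port. A quick smoke test, assuming chproxy runs on 192.168.129.106 (use whichever host actually runs it) and using the proxy user defined in config.yml:
# query the cluster through chproxy; user/password are the chproxy user, not the ClickHouse user
curl 'http://192.168.129.106:9090/?user=clickhouse&password=123456' --data-binary 'SELECT count() FROM test.cmtest_dist'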
Unpack chproxy, start/stop it, and hot-reload config.yml
tar -zxvf chproxy_1.15.1_linux_amd64.tar.gz
chmod +x chproxy
# start
./chproxy -config=/usr/local/chproxy/config.yml
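# optional (a sketch; install path assumed from the command above): run chproxy in the background and keep a log
nohup /usr/local/chproxy/chproxy -config=/usr/local/chproxy/config.yml > /usr/local/chproxy/chproxy.log 2>&1 &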
# Hot-reload config.yml without restarting chproxy: send the chproxy process a SIGHUP signal
kill -SIGHUP <pid>
ps -ef | grep chproxy | grep -v grep | head -1 | awk '{print $2}' | xargs kill -SIGHUP
# Stop chproxy by simply killing the process
ps -ef | grep chproxy | grep -v grep | head -1 | awk '{print $2}' | xargs kill
killChproxy.sh
#!/bin/bash
# find the chproxy process and kill it
pid=`ps -ef | grep chproxy | grep -v grep | head -1 | awk '{print $2}'`
if [ -z "$pid" ];then
    echo "No pid found."
    exit 1
fi
echo "kill ${pid}"
kill $pid
reloadConfig.sh
#!/bin/bash
# send SIGHUP to chproxy so it reloads config.yml without stopping
ps -ef | grep chproxy | grep -v grep | head -1 | awk '{print $2}' | xargs kill -SIGHUP
- References
https://blog.csdn.net/weixin_37692493/article/details/114452689
chproxy website: https://www.chproxy.org/cn
https://blog.csdn.net/Jason_light/article/details/120888679
https://blog.csdn.net/iceyung/article/details/107524922