1、Ck资源配置:
(16C 64G 1T)* 3
2、CK 3台机器的初始化配置文件
<yandex>
<clickhouse_remote_servers>
<!--集群配置,需要在Config.xml 中配置-->
<perfprod_3shards_1replicas>
<!--集群模式配置 3 分片 1副本 -->
<shard>
<internal_replication>true</internal_replication>
<!--数据主动复制 -->
<replica>
<host>clickhouse-devops-prod-MRIG0a40</host>
<port>9000</port>
<user>xxx</user>
<password>xxxxx</password>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>clickhouse-devops-prod-VNUMuh7e</host>
<port>9000</port>
<user>xxx</user>
<password>xxxxx</password>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>clickhouse-devops-prod-nS7hs4ot</host>
<port>9000</port>
<user>xxx</user>
<password>xxxxx</password>
</replica>
</shard>
</perfprod_3shards_1replicas>
</clickhouse_remote_servers>
<zookeeper-servers>
<!--ZK 配置 -->
<node index="1">
<host>127.0.0.1</host>
<port>3288</port>
</node>
<node index="2">
<host>127.0.0.2</host>
<port>3288</port>
</node>
<node index="3">
<host>127.0.0.3</host>
<port>3288</port>
</node>
</zookeeper-servers>
<macros>
<!--宏定义,各节点自定义的配置 -->
<cluster>perfprod_3shards_1replicas</cluster>
<shard>01</shard>
<replica>clickhouse-devops-prod-MRIG0a40</replica>
</macros>
<clickhouse_compression>
<!--压缩算法 -->
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method>
</case>
</clickhouse_compression>
</yandex>
PS :需要将 metrika 配置文件 incl 到 节点的config 文件 集群配置才能生效。
3、设计 CK 逻辑架构 3 shard +1 replica ; 因CK 应用场景是 实时统计计算,固使用分布式表发挥分布式计算能力
分片路由规则:根据 user_Id 做 Hash 路由数据隔离。
4、CK 基础名词解释
名词 | 含义 | 备注 |
---|---|---|
节点(node) | 部署CK服务的一台机器 | |
集群(cluster) | 多台CK 服务组合方式 | |
分片(shard) | 数据分布式存储 | |
本地表 | 存放在节点本地的实体表 | |
分布式表 | 分布式逻辑表 | 类似视图 |
5、数据库初始化SQL
创建数据库:
CREATE DATABASE IF NOT EXISTS test on CLUSTER perfprod_3shards_1replicas;
创建本地表:
CREATE TABLE test.user_info_real on CLUSTER perfprod_3shards_1replicas
(
`id` Int64,
`user_id` String,
`user_name` String
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/user_info_real', '{replica}')
PRIMARY KEY id
ORDER BY id
SETTINGS index_granularity = 8192;
创建分布式表:
create table test.user_info on CLUSTER perfprod_3shards_1replicas as test.user_info_real ENGINE = Distributed(perfprod_3shards_1replicas, test, user_info_real,user_id);// 根据租户ID hash分片.
删除表:
DROP TABLE test.user_info on CLUSTER perfprod_3shards_1replicas