How to scale out your ClickHouse cluster?

Everything should be made as simple as possible, but not simpler. (Albert Einstein)

Summary

Any application/website that sees significant growth will eventually need to scale to accommodate traffic increases and deal with performance bottlenecks.

It’s critical that scaling is done in a way that preserves the integrity of your data. Before starting, you need to understand the limitations of each approach to determine which solution provides the greatest benefit for you.

Scaling has a price, whether in cost, performance, or usability.

The problem / motivation

When you add one or more nodes to your database, ClickHouse has no built-in feature to automatically rebalance the data across the cluster.

A rebalance operation can take a long time, and it becomes very sensitive and complex when the process must run uninterrupted while guaranteeing the integrity of your data.
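To make the problem concrete, here is a minimal sketch of a sharded setup; the cluster name, table, and columns (my_cluster, events, and so on) are hypothetical and not taken from this article. Data parts stay on whichever shard originally received them, so adding shards moves nothing by itself:

-- on every node of the hypothetical cluster 'my_cluster'
CREATE TABLE default.events_local
(
    event_date Date,
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);

-- the Distributed table routes every new INSERT to a shard chosen by rand(),
-- but it never relocates parts that already sit on the old shards
CREATE TABLE default.events AS default.events_local
ENGINE = Distributed(my_cluster, default, events_local, rand());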

Quick recap: what does scaling look like?

[Figure: overview of scaling out a ClickHouse cluster]

Scaling main challenges

  • Resharding: adding new shards to the CH cluster

[Figure: adding new shards to the cluster; S denotes a shard, R a replica]

  • To rebalance or not to rebalance
    The scaling operation could be limited to adding new shards to the cluster, without any rebalancing.
    In that case, only new data is distributed across all the shards; queries on historical data still read from the old topology and do not benefit from the scale-out.
    That’s why we decided to rebalance all the historical data onto the new topology (across all the shards). The per-shard counts sketched after the figure below show exactly this situation.

[Figure: rebalancing historical data across the new topology]
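Assuming the hypothetical my_cluster and events tables sketched earlier, two quick queries show the situation right after the new shards are declared in the server configuration: the topology is enlarged, but historical rows still live only on the original n shards.

-- the enlarged topology as the servers see it
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'my_cluster'
ORDER BY shard_num, replica_num;

-- where the rows actually live: right after the scale-out, historical data
-- is still confined to the original shards
SELECT _shard_num, count()
FROM default.events
GROUP BY _shard_num
ORDER BY _shard_num;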

A few solutions explored

1 — clickhouse-copier

https://clickhouse.tech/docs/en/operations/utilities/clickhouse-copier/

  • part of the standard CH server distribution

  • copies data from tables in one cluster to tables in another (or the same) cluster

  • initially designed for cases where the source and target clusters are different

Solution #1: build a new cluster with the new topology and migrate data with clickhouse-copier (from source_cluster to target_cluster)

[Figure: solution #1, migrating data from source_cluster to target_cluster with clickhouse-copier]

Solution #2 (resharding in place): add shards to the existing cluster, create a new database (target_db), and migrate data with clickhouse-copier (from source_db to target_db); the target-side DDL this requires is sketched below.

[Figure: solution #2, resharding in place from source_db to target_db with clickhouse-copier]
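Before clickhouse-copier can run for solution #2, the target database and tables must exist on all m shards. A minimal sketch, assuming the existing tables live in source_db and have the shape of the hypothetical events tables used earlier (depending on how its task description is written, clickhouse-copier can also create the destination tables itself):

-- propagate the DDL to every shard of the hypothetical cluster 'my_cluster'
CREATE DATABASE target_db ON CLUSTER my_cluster;

-- local table: same schema and engine as the existing one
CREATE TABLE target_db.events_local ON CLUSTER my_cluster
AS source_db.events_local;

-- Distributed table over all m shards, so the resharded data is queryable
CREATE TABLE target_db.events ON CLUSTER my_cluster
AS target_db.events_local
ENGINE = Distributed(my_cluster, target_db, events_local, rand());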

clickhouse-copier is a powerful tool for moving data between ClickHouse clusters of any topology, but migrating/copying data directly from your production environment can be risky and can degrade its performance for the duration of the operation. Of course, you can also build a copy of your production cluster and migrate from that, but it is an expensive solution.

2 — native import/export

  • export data in the Native format from source_cluster

  • import data to target_cluster (m shards); a sketch of the round trip follows the notes on the Native format below

Native format:

  • is the most efficient format: data is written and read by blocks in binary format

  • this format is “columnar”: it doesn’t convert columns to rows.

  • this is the format used by the native interface for server-to-server communication and by the command-line client
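A minimal sketch of the round trip, run from clickhouse-client one shard at a time; the file path and table names are illustrative, and INSERT ... FROM INFILE needs a reasonably recent client (otherwise pipe the file into clickhouse-client's stdin):

-- on a source shard: dump the shard-local table in Native format
SELECT *
FROM default.events_local
INTO OUTFILE '/tmp/events_shard1.native'
FORMAT Native;

-- on the target cluster: load the dump through the Distributed table,
-- so the rows are spread across all shards of the new topology
INSERT INTO default.events
FROM INFILE '/tmp/events_shard1.native'
FORMAT Native;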

Another approach: ClickHouse and object storage — S3 table function

  • NEW: merged more than a year ago

  • For MergeTree table data

  • Basic import/export functionality

INSERT INTO FUNCTION s3(...) SELECT * FROM table
INSERT INTO table SELECT * FROM s3(bucket, format, structure, compression)
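Spelled out with a placeholder bucket, credentials, and the hypothetical events schema used earlier (none of these values come from the article), the round trip through S3 looks roughly like this:

-- export one shard's local table to object storage
INSERT INTO FUNCTION
    s3('https://my-bucket.s3.amazonaws.com/resharding/events_shard1.native',
       'AWS_ACCESS_KEY', 'AWS_SECRET_KEY', 'Native',
       'event_date Date, user_id UInt64, payload String')
SELECT * FROM default.events_local;

-- import it on the new topology through the Distributed table
INSERT INTO default.events
SELECT *
FROM s3('https://my-bucket.s3.amazonaws.com/resharding/events_shard1.native',
        'AWS_ACCESS_KEY', 'AWS_SECRET_KEY', 'Native',
        'event_date Date, user_id UInt64, payload String');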

Native import/export is a basic and solid solution, but the cost of the disk storage it requires cannot be ignored.

3 — Kafka table engines

  • the Kafka table engine encapsulates a Kafka topic inside ClickHouse

  • the topic contains the messages

  • a materialized view transfers the consumed messages into the final table in target_cluster (a minimal sketch follows the figure below)

[Figure: Kafka table engine pipeline feeding the final table in target_cluster]
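A minimal sketch of that pipeline, with a hypothetical broker, topic, and the events schema used earlier (none of these names come from the article):

-- the Kafka engine table consumes messages from the topic
CREATE TABLE default.events_queue
(
    event_date Date,
    user_id    UInt64,
    payload    String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'ch_resharding',
         kafka_format = 'JSONEachRow';

-- the materialized view moves every consumed block into the final table
-- (here the Distributed table on target_cluster)
CREATE MATERIALIZED VIEW default.events_queue_mv TO default.events AS
SELECT event_date, user_id, payload
FROM default.events_queue;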

The Kafka table engine is not mature enough, and its logging experience is too poor, to use in a production environment. It also requires installing and paying for a Kafka infrastructure.

Don’t make the process harder than it is. (Jack Welch)

None of the solutions described above is robust and simple enough to run in a production environment without any “unexpected” surprises, bugs, or crashes.

My desired solution is supposed to be:

  • easy to deploy, easy to use, and simple to automate

  • scalable: able to reduce the time the operation takes

  • robust: any progress should be saved to external storage

  • cost-effective: no need for an expensive architecture/infrastructure


Let’s cook it! There is no better way to describe the “recipe” for scaling out a CH cluster from n to m shards than the swim diagram below:

[Swim diagram: the scale-out “recipe” from n to m shards]

Ingredients / supplies

  1. A full backup copy of your original CH cluster in external storage

  2. The “resharding-cooker”:

  • create a CH cluster (m shards) with 2 databases: source_db installed on n shards and the default database on m shards

[Figure: the “resharding-cooker” cluster, with source_db on n shards and the default database on m shards]

  • using clickhouse-backup to restore data into source_db

  • using clickhouse-copier to migrate data from source_db to default (resharding in place; a sanity check you can run afterwards is sketched after this list)

  • using clickhouse-backup to backup data from default
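Before taking that final backup, a quick sanity check (using the hypothetical Distributed table names from the earlier sketches) is to compare row counts between the two databases; once the copier has finished, they should match:

SELECT count() FROM source_db.events;  -- old layout: data on n shards
SELECT count() FROM default.events;    -- new layout: data spread over m shards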

Now we have the foundations to rebalance the data. We just need to automate the operation so that it stays simple, robust, scalable, and cost-effective.

This solution is fully scalable: you can deploy a few clusters (with m instances each) and run the daily operation described above in a loop.

3. When you have finished the entire operation and backed up all your databases in the new topology, you just need to:

  • create a new CH cluster

  • and restore it from the backups in external storage

That’s it: you have scaled out your CH cluster!

https://medium.com/@tsitbon/how-to-scale-out-your-clickhouse-cluster-aaccfd111a37
