Prerequisites
Purpose
Run two ClickHouse containers on a physical host using Docker and configure them to connect to a ZooKeeper server; following the official documentation for the ReplicatedMergeTree engine, create a replicated table in each container; then use ClickHouse's ZooKeeper integration to synchronize data, so that the two ClickHouse instances end up as replicas of each other (Data Replication).
Preparation
- Physical host environment: IP address 192.168.182.10
- Docker containers and IP information
| Service | Network | IP address | Hostname | Docker command |
| --- | --- | --- | --- | --- |
| CK node 1 (first ClickHouse container) | bridge | 172.17.0.3 | 7409ace09488 | `docker run -d --name clickhouse-server-c1 --ulimit nofile=262144:262144 --volume=/software/docker/clickhouse/ck1/config:/etc/clickhouse-server/ -v /software/docker/clickhouse/ck1/data:/var/lib/clickhouse -p 8123:8123 -p 9000:9000 -p 9009:9009 yandex/clickhouse-server` |
| CK node 2 (second ClickHouse container) | bridge | 172.17.0.4 | b22679b9d346 | `docker run -d --name clickhouse-server-c2 --ulimit nofile=262144:262144 --volume=/software/docker/clickhouse/ck2/config:/etc/clickhouse-server/ -v /software/docker/clickhouse/ck2/data:/var/lib/clickhouse -p 8124:8123 -p 9001:9000 -p 9010:9009 yandex/clickhouse-server` |
| zookeeper | bridge | 172.17.0.5 | (not important) | `docker run --name zookeeper -d -p 2181:2181 zookeeper` |

- ClickHouse service configuration
3.1. On both CK node 1 and CK node 2, create a metrika-share.xml file in the config/config.d directory with the following contents:

- metrika-share.xml contents

```xml
<?xml version="1.0"?>
<clickhouse>
    <zookeeper>
        <node>
            <!-- ZooKeeper's port 2181 is published on the host, so the host
                 machine's address can be used directly here; this also guards
                 against the ZooKeeper container's IP changing. -->
            <host>192.168.182.10</host>
            <port>2181</port>
        </node>
    </zookeeper>
</clickhouse>
```
3.2. On both CK node 1 and CK node 2, locate the `<include_from>` tag in config.xml and set it to the following:

```xml
<!-- The Docker volume mounts above map this path into the container,
     so the file can be referenced directly. -->
<include_from>/etc/clickhouse-server/config.d/metrika-share.xml</include_from>
```

Note: restart the ClickHouse service after changing the configuration.
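After the restart, a quick way to confirm that the `<zookeeper>` section was picked up is to query ClickHouse's built-in `system.zookeeper` table from either container. This is a sketch; the container name comes from the preparation table above:

```shell
# List the znodes at the ZooKeeper root through ClickHouse.
# If the zookeeper config is active, this returns rows instead of an error.
docker exec -it clickhouse-server-c1 clickhouse-client \
  --query "SELECT name, path FROM system.zookeeper WHERE path = '/'"
```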
Reproducing the problem
1. The CK client can reach ZooKeeper and gets a normal response, as shown in Figure 1.
Figure 1: verifying ZooKeeper connectivity
2. The CREATE TABLE statements for the ReplicatedMergeTree engine run successfully in both containers' CK clients, as shown in Figure 2.
Figure 2: the CREATE TABLE statements on the two nodes.
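Figure 2 is an image, so as a hedged reconstruction, a statement consistent with the ZooKeeper path `/clickhouse/tables/01/t_replicated_merge` and the replica names `r_1`/`r_2` seen later in this article would look roughly like this (the column list is hypothetical):

```sql
-- On CK node 1 (replica name 'r_1'); on CK node 2 use 'r_2' instead.
CREATE TABLE t_replicated_merge
(
    id    UInt64,
    value String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/t_replicated_merge', 'r_1')
ORDER BY id;
```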
3. Executing two INSERT statements on CK node 1 leaves data on CK node 1 only, while CK node 2 stays empty, so replication has failed; likewise, two INSERTs on CK node 2 appear only on CK node 2. Inspecting the automatically created `/clickhouse/tables/01/t_replicated_merge` path with IDEA's ZooKeeper connection plugin shows the layout in Figure 3.
Figure 3: ZooKeeper znodes created for the ReplicatedMergeTree table
Main znodes explained
1) The `replicas` node holds information about each replica, including its host, port, and whether it is alive.
2) The `log` node drives data synchronization: replicas exchange data through the log-xxxxx entries under it. A log-xxxxx entry can be thought of as an "insert into xxx values(v1,v2)", plus extra attributes such as the client identifier and size.
Locating the problem
Opening CK node 1's server error log, `clickhouse-server.err.log` (on Linux, the error log usually lives under /var/log/clickhouse-server), immediately reveals the error: "DNSResolver: Cannot resolve host (b22679b9d346) ... Code: 198. DB::Exception: Not found address of host: b22679b9d346. (DNS_ERROR)". CK node 2's server log shows a similar error. A fuller excerpt of the log follows:
2022.08.26 11:03:33.409494 [ 90 ] {} <Error> DNSResolver: Cannot resolve host (b22679b9d346), error 0: b22679b9d346.
2022.08.26 11:03:33.409738 [ 90 ] {} <Error> default.t_replicated_merge (35f8baf4-d1e4-43d9-88fc-c45e1ff8adb0): auto DB::StorageReplicatedMergeTree::processQueueEntry(ReplicatedMergeTreeQueue::SelectedEntryPtr)::(anonymous class)::operator()(DB::StorageReplicatedMergeTree::LogEntryPtr &) const: Code: 198. DB::Exception: Not found address of host: b22679b9d346. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa82d07a in /usr/bin/clickhouse
1. ? @ 0xa8e7951 in /usr/bin/clickhouse
2. ? @ 0xa8e8122 in /usr/bin/clickhouse
3. DB::DNSResolver::resolveHost(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xa8e71fe in /usr/bin/clickhouse
4. ? @ 0xa9d9493 in /usr/bin/clickhouse
5. ? @ 0xa9dd0f9 in /usr/bin/clickhouse
6. PoolBase<Poco::Net::HTTPClientSession>::get(long) @ 0xa9dc837 in /usr/bin/clickhouse
7. DB::makePooledHTTPSession(Poco::URI const&, Poco::URI const&, DB::ConnectionTimeouts const&, unsigned long, bool) @ 0xa9dacb8 in /usr/bin/clickhouse
8. DB::makePooledHTTPSession(Poco::URI const&, DB::ConnectionTimeouts const&, unsigned long, bool) @ 0xa9d9819 in /usr/bin/clickhouse
9. DB::UpdatablePooledSession::UpdatablePooledSession(Poco::URI, DB::ConnectionTimeouts const&, unsigned long, unsigned long) @ 0x141c76e9 in /usr/bin/clickhouse
10. DB::PooledReadWriteBufferFromHTTP::PooledReadWriteBufferFromHTTP(Poco::URI, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<void (std::__1::basic_ostream<char, std::__1::char_traits<char> >&)>, DB::ConnectionTimeouts const&, Poco::Net::HTTPBasicCredentials const&, unsigned long, unsigned long, unsigned long) @ 0x141be51e in /usr/bin/clickhouse
11. DB::DataPartsExchange::Fetcher::fetchPart(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::Context const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, DB::ConnectionTimeouts const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Throttler>, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::optional<DB::CurrentlySubmergingEmergingTagger>*, bool, std::__1::shared_ptr<DB::IDisk>) @ 0x141bb8f2 in /usr/bin/clickhouse
12. ? @ 0x1405188d in /usr/bin/clickhouse
13. DB::StorageReplicatedMergeTree::fetchPart(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, unsigned long, std::__1::shared_ptr<zkutil::ZooKeeper>) @ 0x13fae906 in /usr/bin/clickhouse
14. DB::StorageReplicatedMergeTree::executeFetch(DB::ReplicatedMergeTreeLogEntry&) @ 0x13fa12b4 in /usr/bin/clickhouse
15. DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x13f9074d in /usr/bin/clickhouse
16. ? @ 0x1404f41f in /usr/bin/clickhouse
17. DB::ReplicatedMergeTreeQueue::processEntry(std::__1::function<std::__1::shared_ptr<zkutil::ZooKeeper> ()>, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&, std::__1::function<bool (std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&)>) @ 0x14468165 in /usr/bin/clickhouse
18. DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>) @ 0x13fd1073 in /usr/bin/clickhouse
19. DB::ExecutableLambdaAdapter::executeStep() @ 0x1404fd11 in /usr/bin/clickhouse
20. DB::MergeTreeBackgroundExecutor<DB::OrdinaryRuntimeQueue>::routine(std::__1::shared_ptr<DB::TaskRuntimeData>) @ 0xa806ffa in /usr/bin/clickhouse
21. DB::MergeTreeBackgroundExecutor<DB::OrdinaryRuntimeQueue>::threadFunction() @ 0xa806eb5 in /usr/bin/clickhouse
22. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0xa8720aa in /usr/bin/clickhouse
23. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0xa873ec4 in /usr/bin/clickhouse
24. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa86f4b7 in /usr/bin/clickhouse
25. ? @ 0xa872ebd in /usr/bin/clickhouse
26. ? @ 0x7fb368403609 in ?
27. __clone @ 0x7fb36832a293 in ?
(version 22.1.3.7 (official build))
Since the problem is an unresolvable hostname, the next step is to find where that name comes from. The `replicas` node mentioned under Figure 3 records each replica's host and IP; opening it shows that the `host` value recorded under `r_1` (CK node 1) and `r_2` (CK node 2) is exactly the host from the error log. Part of `r_1` is shown in Figure 4, and part of `r_2` in Figure 5.
Figure 4: node information for CK node 1
Figure 5: node information for CK node 2
After going through the official explanations of the ReplicatedMergeTree configuration parameters and the other settings defined in config.xml, the tag interserver_http_host turned up in the official documentation at https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/#default_replica_name. The docs describe it as: "The hostname that can be used by other servers to access this server. If omitted, it is defined in the same way as the hostname -f command. Useful for breaking away from a specific network interface."
Aside: since the default is said to match the output of hostname -f, let's check in the containers. It really does! See Figures 6 and 7.
Figure 6: CK node 1's hostname
Figure 7: CK node 2's hostname
Fixing the problem
Solution 1
Perform the following steps:
1. In CK node 1's config.xml, locate the `interserver_http_host` tag and change the default "example.clickhouse.com" to the IP address "172.17.0.3".
2. In CK node 2's config.xml, locate the `interserver_http_host` tag and change the default "example.clickhouse.com" to the IP address "172.17.0.4".
3. Restart the services.
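The resulting fragment in CK node 1's config.xml then reads as follows (CK node 2 the same, with 172.17.0.4):

```xml
<!-- Advertise the container's bridge IP to the other replicas
     instead of the unresolvable container-ID hostname. -->
<interserver_http_host>172.17.0.3</interserver_http_host>
```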
After these steps, querying either node shows that all the previously unsynchronized data has now replicated correctly. Problem solved.
Drawbacks
- A Docker container's internal IP can change after the container is stopped and recreated, so the configuration has to be corrected every time. (Important)
- Replica settings should really be derived dynamically from macros, which requires extra macro tags in metrika-share.xml. (Minor)
- The same configuration has to be edited on every node, which is laborious. (Minor)
Solution 2
Note: this solution was added on April 17, 2023. Its advantage over Solution 1 is that it eliminates the important drawback above (for the remaining drawbacks, contact the author for a simple helper script). It also requires no change to the <interserver_http_host> tag, and it even works in test/production environments where physical resources are scarce.
Approach
Docker's default bridge network has an inherent limitation: container IPs on it are not internally bound to container hostnames. Docker does, however, support user-defined networks, on which it automatically associates IPs with hostnames. Putting all the CK containers on the same user-defined network therefore lets them talk to each other by hostname.
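The symptom can be mirrored with a plain name-resolution check, which is essentially the lookup a replica performs before fetching a part from its peer. A sketch; the `resolves` helper and the `.invalid` suffix are illustrative:

```shell
# Succeeds when the name resolves to an address.
resolves() { getent hosts "$1" >/dev/null 2>&1; }

resolves localhost && echo "localhost: resolvable"
# A peer's bare container-ID hostname behaves like an unknown name on the
# default bridge network (the .invalid suffix guarantees non-resolution here):
resolves b22679b9d346.invalid || echo "container hostname: not resolvable"
```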
Steps
1. (If Solution 1 was applied) comment out the <interserver_http_host> tag in the CK containers' configuration.
2. Run the Docker command: docker network create clickhouse-network
3. Modify the original docker run commands for CK node 1 and CK node 2, adding the `--network clickhouse-network` option.
4. Restart the services, and you're done.
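Put together, the steps above amount to the following command sketch. Paths and names are taken from the preparation table; it assumes the old containers have been removed first:

```shell
# 1. Create the user-defined network (its embedded DNS resolves container names).
docker network create clickhouse-network

# 2. Recreate both ClickHouse containers on that network.
docker run -d --name clickhouse-server-c1 --network clickhouse-network \
  --ulimit nofile=262144:262144 \
  -v /software/docker/clickhouse/ck1/config:/etc/clickhouse-server/ \
  -v /software/docker/clickhouse/ck1/data:/var/lib/clickhouse \
  -p 8123:8123 -p 9000:9000 -p 9009:9009 yandex/clickhouse-server

docker run -d --name clickhouse-server-c2 --network clickhouse-network \
  --ulimit nofile=262144:262144 \
  -v /software/docker/clickhouse/ck2/config:/etc/clickhouse-server/ \
  -v /software/docker/clickhouse/ck2/data:/var/lib/clickhouse \
  -p 8124:8123 -p 9001:9000 -p 9010:9009 yandex/clickhouse-server
```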