Columnar storage database
Used for online analytical processing (OLAP) queries
Generates analytical data reports in real time from SQL queries
References: https://www.cnblogs.com/biehongli/p/14364802.html
https://blog.csdn.net/github_39577257/article/details/103066747
Cluster node information
Environment: CentOS 8.1, 64-bit
192.168.10.68 JAVATEST-MIDDLE1
192.168.10.69 JAVATEST-MIDDLE2
192.168.10.70 JAVATEST-MIDDLE3
Versions:
zookeeper-3.4.14
ClickHouse 21.3.5.42
1. Pre-installation preparation
1.1 Create the snt directory under /Data/StorageDisk1/ and symlink it to /snt:
cd /Data/StorageDisk1/
mkdir snt
ln -s /Data/StorageDisk1/snt/ /snt
cd /snt
mkdir -p soft/zookeeper
chown -R snt:snt /snt/*
1.2 Create the ZooKeeper and ClickHouse data and log directories
1.2.1 Create the ZooKeeper data and log directories
mkdir -p /snt/data/zookeeper
mkdir -p /snt/log/zookeeper
1.2.2 Create the ClickHouse data and log directories and change their owner and group
mkdir -p /snt/data/clickhouse/
mkdir -p /snt/data/clickhouse/tmp
mkdir -p /snt/log/clickhouse-server
Change the owner and group of the corresponding directories (this is mandatory; otherwise ClickHouse will fail to start later):
chown -R clickhouse:clickhouse /snt/log/clickhouse-server/
chown -R clickhouse:clickhouse /snt/data/clickhouse/
1.3 Raise the open-file limit on CentOS (the default limit is 1024; check it with: ulimit -n)
Append the following to the end of both /etc/security/limits.conf and /etc/security/limits.d/clickhouse.conf:
vim /etc/security/limits.conf
* soft nofile 65536    # "*" applies to all users; 65536 is the max number of open files
* hard nofile 65536
* soft nproc 131072    # max number of processes
* hard nproc 131072
vim /etc/security/limits.d/clickhouse.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
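The new limits apply only to new login sessions. After re-logging in, the effective values can be confirmed (a quick sanity check; expect 65536 and 131072 once the limits above are in place):

```shell
# Check the effective limits for the current session:
ulimit -n    # max open files
ulimit -u    # max user processes
```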
1.4 Disable SELinux and the firewall on CentOS
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0    # takes effect immediately; the config change above applies after reboot
systemctl status firewalld.service
systemctl stop firewalld.service && systemctl disable firewalld.service
2. Set up the ZooKeeper cluster
Version: zookeeper-3.4.14.tar.gz
2.1 Upload the tarball to /snt/soft/zookeeper and extract it:
cd /snt/soft/zookeeper
tar -zxvf zookeeper-3.4.14.tar.gz
2.2 Configuration file
In ZooKeeper's conf directory, copy zoo_sample.cfg to zoo.cfg:
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/snt/data/zookeeper
dataLogDir=/snt/log/zookeeper
autopurge.purgeInterval=0
globalOutstandingLimit=200
clientPort=2181
server.1=192.168.10.68:2888:3888
server.2=192.168.10.69:2888:3888
server.3=192.168.10.70:2888:3888
2.3 Set myid (set the corresponding myid on 68, 69 and 70)
In the directory specified by dataDir in zoo.cfg, create a myid file containing this node's id. The id is the "1" in the zoo.cfg line server.1=ip:2888:3888, so on that node simply write 1 into myid; set the ids on the other two machines to match their server.N lines likewise.
vim /snt/data/zookeeper/myid
1
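The server-to-myid mapping from zoo.cfg can be summarized with a quick loop (purely illustrative; it only prints the mapping for the three nodes above, it does not write the files):

```shell
# Print which myid value belongs on which node, derived from the
# server.N lines in zoo.cfg (.68 -> 1, .69 -> 2, .70 -> 3):
for ip in 192.168.10.68 192.168.10.69 192.168.10.70; do
  id=$(( ${ip##*.} - 67 ))
  echo "on $ip: echo $id > /snt/data/zookeeper/myid"
done
```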
2.4 Start ZooKeeper
cd /snt/soft/zookeeper/zookeeper-3.4.14/bin
Start: ./zkServer.sh start
Check status: ./zkServer.sh status
On success, one node is the leader and the other two are followers:
ZooKeeper JMX enabled by default
Using config: /snt/soft/zookeeper/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: leader
ZooKeeper JMX enabled by default
Using config: /snt/soft/zookeeper/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: follower
ZooKeeper JMX enabled by default
Using config: /snt/soft/zookeeper/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: follower
2.5 Test ZooKeeper
./zkCli.sh -server 192.168.10.68
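A quick way to check all three nodes at once is ZooKeeper's "stat" four-letter command over the client port (a sketch; requires nc/netcat, and the four-letter commands are enabled by default in 3.4.x):

```shell
# Query each node's role; prints "Mode: leader/follower" per node,
# or "no response" if the node is unreachable:
for h in 192.168.10.68 192.168.10.69 192.168.10.70; do
  printf '%s ' "$h"
  echo stat | nc -w 2 "$h" 2181 2>/dev/null | grep Mode || echo "no response"
done
```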
3. Install ClickHouse (on nodes 68, 69 and 70)
3.1 Add the official repository
yum install yum-utils
rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
or: curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
3.2 Three installation methods: yum; rpm packages (download the specific packages from the official site); binary tarballs
# yum install:
yum install clickhouse-server clickhouse-client
rpm install:
下载地址:https://github.com/ClickHouse/ClickHouse/releases/tag/v21.5.1.6463-testing
Required ClickHouse packages:
clickhouse-client-21.5.1.6463-2.noarch.rpm
clickhouse-server-21.5.1.6463-2.noarch.rpm
clickhouse-common-static-21.5.1.6463-2.x86_64.rpm
clickhouse-common-static-dbg-21.5.1.6463-2.x86_64.rpm
Upload the packages to /snt/soft/clickhouse/
Install:
rpm -ivh *.rpm
Binary (tarball) install:
export LATEST_VERSION=`curl https://api.github.com/repos/ClickHouse/ClickHouse/tags 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1`
curl -O https://repo.clickhouse.tech/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-client-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
Extract and install each package (the parenthetical notes describe what each one provides):
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz    (the compiled ClickHouse binaries)
clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz    (ClickHouse binaries with debug info)
clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz    (creates the clickhouse-server symlink and installs the default server configuration)
clickhouse-server-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz    (creates the clickhouse-client symlink and installs the client configuration)
clickhouse-client-$LATEST_VERSION/install/doinst.sh
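The extract-and-install sequence above can be expressed as a loop. Shown here in dry-run form (it only echoes the commands so the order can be reviewed first; drop the echo to actually run them, and note the fallback version is just a placeholder):

```shell
# Dry run of the tarball install order; substitute the real version:
LATEST_VERSION=${LATEST_VERSION:-21.3.5.42}
for pkg in clickhouse-common-static clickhouse-common-static-dbg \
           clickhouse-server clickhouse-client; do
  echo "tar -xzvf $pkg-$LATEST_VERSION.tgz"
  echo "$pkg-$LATEST_VERSION/install/doinst.sh"
done
```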
3.3 Start ClickHouse:
Start: systemctl start clickhouse-server
Check that it is running normally:
systemctl status clickhouse-server.service
Log in to ClickHouse:
clickhouse-client
clickhouse-client -m
Note: if you start it with /etc/init.d/clickhouse-server start instead, it reports:
Init script is already running
and clickhouse-client then cannot connect:
clickhouse-client
ClickHouse client version 21.3.4.25 (official build).
Connecting to localhost:9000 as user default.
Code: 210. DB::NetException: Connection refused (localhost:9000)
On CentOS 7 and later the service must be started with systemctl. To also start it on boot: systemctl enable clickhouse-server
4. ClickHouse cluster deployment (cluster error logs: you must be root to read the log files, otherwise they appear blank)
First configure /etc/clickhouse-server/metrika.xml on node 68 (you must create this file yourself; the default location is /etc/metrika.xml, and a custom location must be declared in config.xml via <include_from>). This file defines the host and port of each ClickHouse server and the nodes of the ZooKeeper cluster. Configure 69 and 70 the same way, except that the value of the <replica> tag under <macros> must be changed to that node's own hostname or IP.
Directory overview (for any log/data directories changed in config.xml, create the corresponding directories and change their owner and group to clickhouse):
/var/lib/clickhouse: the default data directory; in production, move it to a mount point with ample disk space by editing /etc/clickhouse-server/config.xml.
/var/log/clickhouse-server: the default log directory; set via the <log> and <errorlog> tag values in /etc/clickhouse-server/config.xml.
/etc/cron.d/clickhouse-server: a cron entry for the ClickHouse server, used to restart the service if its process dies unexpectedly.
~/.clickhouse-client-history: the SQL history of the client.
4.1 Configure one node first as an example; vim /etc/clickhouse-server/metrika.xml and add the following:
<yandex>
<clickhouse_remote_servers>
<!-- name of the cluster -->
<perftest_3shards_1replicas>
<!-- data shard 1 -->
<shard>
<internal_replication>true</internal_replication>
<!-- primary replica -->
<replica>
<host>192.168.10.68</host>
<port>9000</port>
</replica>
</shard>
<!-- data shard 2 -->
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.10.69</host>
<port>9000</port>
</replica>
</shard>
<!-- data shard 3 -->
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.10.70</host>
<port>9000</port>
</replica>
</shard>
</perftest_3shards_1replicas>
</clickhouse_remote_servers>
<!-- ZooKeeper settings -->
<zookeeper-servers>
<node index="1">
<host>192.168.10.68</host>
<port>2181</port>
</node>
<node index="2">
<host>192.168.10.69</host>
<port>2181</port>
</node>
<node index="3">
<host>192.168.10.70</host>
<port>2181</port>
</node>
</zookeeper-servers>
<macros>
<replica>192.168.10.68</replica>
</macros>
<networks>
<ip>::/0</ip>
</networks>
<clickhouse_compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method>
</case>
</clickhouse_compression>
</yandex>
4.2 Copy the file to servers 69 and 70
scp /etc/clickhouse-server/metrika.xml root@192.168.10.69:/etc/clickhouse-server/metrika.xml
scp /etc/clickhouse-server/metrika.xml root@192.168.10.70:/etc/clickhouse-server/metrika.xml
Note: change the IP inside the <macros> tag to the node's own IP (69 or 70):
<macros>
<replica>192.168.10.68</replica>
</macros>
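Instead of editing by hand on each node, the macro can be patched with sed scoped to the <macros> block, so the <replica> tags inside the shard definitions are left untouched. A sketch, demonstrated on a temporary sample file first (hypothetical; verify the result before pointing it at /etc/clickhouse-server/metrika.xml):

```shell
# Build a tiny sample containing both kinds of <replica> tags:
f=$(mktemp)
printf '%s\n' '<replica>' '<host>192.168.10.68</host>' '</replica>' \
              '<macros>' '<replica>192.168.10.68</replica>' '</macros>' > "$f"
# Rewrite only the <replica> inside <macros> to this node's IP:
ip=192.168.10.69
sed -i "/<macros>/,/<\/macros>/ s#<replica>.*</replica>#<replica>$ip</replica>#" "$f"
cat "$f"    # the shard <replica> block is unchanged; only the macro is updated
rm -f "$f"
```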
4.3 Configure config.xml (the cluster nodes need to access each other's services, so the ClickHouse service IP and port must be reachable; on 68, 69 and 70 edit /etc/clickhouse-server/config.xml and uncomment the <listen_host> tag, around lines 69-70). Configure as follows:
Back up the config.xml file:
cd /etc/clickhouse-server/ && cp config.xml config.xml.bak
vim config.xml
Make the following changes:
<yandex>
<!-- change the log directories (the default log directory can be changed via the <log> and <errorlog> tag values in /etc/clickhouse-server/config.xml) -->
<log>/snt/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/snt/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
<interserver_http_host>192.168.10.68</interserver_http_host>
<!-- enable remote access -->
<listen_host>::</listen_host>
<!-- set the time zone to UTC+8 (Asia/Shanghai), around line 144 -->
<timezone>Asia/Shanghai</timezone>
<!-- set the path of the extension config file, around line 229 -->
Add the following:
<remote_servers incl="clickhouse_remote_servers" />
<zookeeper incl="zookeeper-servers" optional="true" />
<macros incl="macros" optional="true" />
<builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>
<max_session_timeout>3600</max_session_timeout>
<default_session_timeout>300</default_session_timeout>
<max_table_size_to_drop>0</max_table_size_to_drop>
<merge_tree>
<parts_to_delay_insert>300</parts_to_delay_insert>
<parts_to_throw_insert>600</parts_to_throw_insert>
<max_delay_to_insert>2</max_delay_to_insert>
</merge_tree>
<max_partition_size_to_drop>0</max_partition_size_to_drop>
<distributed_ddl>
<!-- path of the DDL query queue in ZooKeeper -->
<path>/snt/data/clickhouse/ddl</path>
</distributed_ddl>
<include_from>/etc/clickhouse-server/metrika.xml</include_from>
<!-- around line 160: comment out the sample shard configs that are only for testing distributed storage -->
<!-- Test only shard config for testing distributed storage
<test_shard_localhost>
……
</test_unavailable_shard>
-->
<!-- the default data storage directory; in production, move it to a mount with ample disk space via the <path>, <tmp_path> and <user_files_path> tag values -->
<!-- Path to data directory, with trailing slash. -->
<path>/snt/data/clickhouse</path>
<!-- Path to temporary data for processing hard queries. -->
<tmp_path>/snt/data/clickhouse/tmp/</tmp_path>
</yandex>
4.4 Copy config.xml to 69 and 70 and apply the same changes
First back up config.xml on 69 and 70: cp config.xml config.xml.bak
```bash
scp config.xml root@192.168.10.69:/etc/clickhouse-server/
scp config.xml root@192.168.10.70:/etc/clickhouse-server/
```
Edit config.xml the same way as on 68, except:
change the <interserver_http_host> line to this node's own IP, e.g. on 69:
<interserver_http_host>192.168.10.69</interserver_http_host>
4.5 Set the default account password and add the sntck user (do this on 68, 69 and 70; use the same password on all three nodes for easier cluster management)
Passwords can be configured in two ways: plaintext, or a hash (the sha256sum of the password); the official docs recommend the hashed form. Users and passwords are configured under the <users> tag. The file below defines two users. The first is the default user default, used whenever no user is specified; its password is stored as a sha256 hash. The second is sntck, a read-only user that can query data but cannot create tables or modify data; its password is configured in plaintext.
echo "snt2020"; echo -n "snt2020" | sha256sum | tr -d '-'
snt2020
5ae5a720015cb578a25ebcbab67c5174a77892eb7d4b1dce5912df737b0c413b    # the resulting password hash
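A variant (sketched from the comment inside users.xml below) generates a random password and its sha256 hash in one go, rather than hashing a fixed string:

```shell
# Generate a random 8-character password and its sha256 hex digest:
PASSWORD=$(base64 < /dev/urandom | head -c8)
HASH=$(echo -n "$PASSWORD" | sha256sum | tr -d ' -')
echo "password: $PASSWORD"
echo "sha256:   $HASH"    # goes into <password_sha256_hex>
```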
vim /etc/clickhouse-server/users.xml
<?xml version="1.0"?>
<yandex>
<!-- Profiles of settings. -->
<profiles>
<!-- Default settings. -->
<default>
<max_memory_usage>10000000000</max_memory_usage>
<use_uncompressed_cache>0</use_uncompressed_cache>
<load_balancing>random</load_balancing>
</default>
<!-- Profile that allows only read queries. -->
<readonly>
<readonly>1</readonly>
</readonly>
</profiles>
<!-- Users and ACL. -->
<users>
<default>
<!--
<password>snt2020</password>
通过如下命令随机执行随机获取一个: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
-->
<password_sha256_hex>5ae5a720015cb578a25ebcbab67c5174a77892eb7d4b1dce5912df737b0c413b</password_sha256_hex>
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<!-- Settings profile for user. -->
<profile>default</profile>
<!-- Quota for user. -->
<quota>default</quota>
<!-- For testing the table filters -->
<databases>
<test>
<!-- Simple expression filter -->
<filtered_table1>
<filter>a = 1</filter>
</filtered_table1>
<!-- Complex expression filter -->
<filtered_table2>
<filter>a + b &lt; 1 or c - d &gt; 5</filter>
</filtered_table2>
<!-- Filter with ALIAS column -->
<filtered_table3>
<filter>c = 1</filter>
</filtered_table3>
</test>
</databases>
</default>
<sntck>
<password>123456</password>
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<profile>readonly</profile>
<quota>default</quota>
</sntck>
</users>
<!-- Quotas. -->
<quotas>
<!-- Name of quota. -->
<default>
<!-- Limits for time interval. You could specify many intervals with different limits. -->
<interval>
<!-- Length of interval. -->
<duration>3600</duration>
<!-- No limits. Just calculate resource usage for time interval. -->
<queries>0</queries>
<errors>0</errors>
<result_rows>0</result_rows>
<read_rows>0</read_rows>
<execution_time>0</execution_time>
</interval>
</default>
</quotas>
</yandex>
5. Start the ClickHouse service on every node, just as for a single node; once it starts without errors, check the ClickHouse log files.
5.1 Start the ClickHouse client on every node, just as for a single node, and query the cluster information.
5.2 Log in to ClickHouse
With a username and password:
clickhouse-client -h 192.168.10.68 -u default --password snt2020
Running a SQL statement directly:
clickhouse-client -h 192.168.10.68 --port 9000 -u default --password snt2020 --multiline -q "SELECT now()"
Connecting on a specific port:
clickhouse-client -h 192.168.10.66 --port 9080 -u default --password snt2020
View the cluster information:
select * from system.clusters;
6. Connecting with a remote tool: DBeaver
Create a new connection
Under All (or Analytical), choose ClickHouse, then Next
The default port is 8123; for the host, use any ClickHouse server node (with a cluster, any node will do). Fill in the username and password under user authentication.
Test the connection; when prompted to download the driver, confirm the download.
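If you prefer to fill in the connection URL directly, the standard ClickHouse JDBC URL format looks like the following (assuming the default HTTP port 8123 and the default database from this setup):

```
jdbc:clickhouse://192.168.10.68:8123/default
```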