1. What is ClickHouse
ClickHouse® is a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP). It is available both as open-source software and as a cloud service.
2. RPM installation: set up the official repository
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
3. Install the ClickHouse server and client
sudo yum install -y clickhouse-server clickhouse-client
4. Start the ClickHouse server
sudo systemctl enable clickhouse-server
sudo systemctl start clickhouse-server
sudo systemctl status clickhouse-server
clickhouse-client # or "clickhouse-client --password" if you set up a password.
Stop:
sudo systemctl stop clickhouse-server
Restart:
sudo systemctl restart clickhouse-server
5. Install standalone ClickHouse Keeper
In production we strongly recommend running ClickHouse Keeper on dedicated nodes. In a test environment, if you decide to run ClickHouse Server and ClickHouse Keeper on the same machine, you do not need to install ClickHouse Keeper separately, because it is bundled with ClickHouse Server. Run this command only on standalone ClickHouse Keeper servers:
sudo yum install -y clickhouse-keeper
Enable and start ClickHouse Keeper:
sudo systemctl enable clickhouse-keeper
sudo systemctl start clickhouse-keeper
sudo systemctl status clickhouse-keeper
6. Recommendations for administering ClickHouse
- We recommend at least 4 GB of RAM for non-trivial queries. The ClickHouse server can run with less, but queries will then frequently be aborted.
- The amount of RAM required generally depends on the complexity of the queries and the amount of data they process. To estimate it, estimate the size of the temporary data produced by GROUP BY, DISTINCT, JOIN, and other operations.
- To reduce memory consumption, ClickHouse can spill temporary data to external storage. See "GROUP BY in external memory" for details.
- We recommend disabling the operating system's swap file in production environments.
- The ClickHouse binaries require at least 2.5 GB of disk space for installation.
- Estimate the storage your data will need separately, from:
  an estimate of the data volume: take a sample of your data, measure the average size of a row, and multiply it by the number of rows you plan to store;
  the data compression ratio: load a data sample into ClickHouse and compare the actual size of the data with the size of the stored table (clickstream data, for example, typically compresses 6-10x).
  To get the final volume to be stored, apply the compression ratio to the estimated data volume; if you plan to keep the data in several replicas, multiply the estimate by the number of replicas.
- For distributed ClickHouse deployments (clusters), we recommend at least a 10 Gbit network connection. Network bandwidth is critical for distributed queries with large amounts of intermediate data, and network speed also affects the replication process.
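The capacity estimate described above is simple arithmetic; a minimal sketch in Python, where the sample figures (200-byte rows, one billion rows, 8x compression, two replicas) are illustrative assumptions, not measurements:

```python
def estimate_storage_bytes(avg_row_bytes, row_count, compression_factor, replicas=1):
    """Raw data volume divided by the compression ratio, times the replica count."""
    raw = avg_row_bytes * row_count
    return raw / compression_factor * replicas

# Illustrative figures: 200-byte rows, 1 billion rows, 8x compression, 2 replicas.
est = estimate_storage_bytes(200, 1_000_000_000, 8, replicas=2)
print(f"{est / 1e9:.1f} GB")  # 50.0 GB
```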
7. Testing
Connect to the ClickHouse service with clickhouse-client:
sudo clickhouse-client
my-host :)
Create a table:
CREATE TABLE trips( `trip_id` UInt32, `vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15), `pickup_date` Date, `pickup_datetime` DateTime, `dropoff_date` Date, `dropoff_datetime` DateTime, `store_and_fwd_flag` UInt8, `rate_code_id` UInt8, `pickup_longitude` Float64, `pickup_latitude` Float64, `dropoff_longitude` Float64, `dropoff_latitude` Float64, `passenger_count` UInt8, `trip_distance` Float64, `fare_amount` Float32, `extra` Float32, `mta_tax` Float32, `tip_amount` Float32, `tolls_amount` Float32, `ehail_fee` Float32, `improvement_surcharge` Float32, `total_amount` Float32, `payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4), `trip_type` UInt8, `pickup` FixedString(25), `dropoff` FixedString(25), `cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3), `pickup_nyct2010_gid` Int8, `pickup_ctlabel` Float32, `pickup_borocode` Int8, `pickup_ct2010` String, `pickup_boroct2010` String, `pickup_cdeligibil` String, `pickup_ntacode` FixedString(4), `pickup_ntaname` String, `pickup_puma` UInt16, `dropoff_nyct2010_gid` UInt8, `dropoff_ctlabel` Float32, `dropoff_borocode` UInt8, `dropoff_ct2010` String, `dropoff_boroct2010` String, `dropoff_cdeligibil` String, `dropoff_ntacode` FixedString(4), `dropoff_ntaname` String, `dropoff_puma` UInt16) ENGINE = MergeTree PARTITION BY toYYYYMM(pickup_date) ORDER BY pickup_datetime;
Insert test data:
INSERT INTO trips SELECT * FROM s3( 'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_{1..2}.gz', 'TabSeparatedWithNames', " `trip_id` UInt32, `vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15), `pickup_date` Date, `pickup_datetime` DateTime, `dropoff_date` Date, `dropoff_datetime` DateTime, `store_and_fwd_flag` UInt8, `rate_code_id` UInt8, `pickup_longitude` Float64, `pickup_latitude` Float64, `dropoff_longitude` Float64, `dropoff_latitude` Float64, `passenger_count` UInt8, `trip_distance` Float64, `fare_amount` Float32, `extra` Float32, `mta_tax` Float32, `tip_amount` Float32, `tolls_amount` Float32, `ehail_fee` Float32, `improvement_surcharge` Float32, `total_amount` Float32, `payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4), `trip_type` UInt8, `pickup` FixedString(25), `dropoff` FixedString(25), `cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3), `pickup_nyct2010_gid` Int8, `pickup_ctlabel` Float32, `pickup_borocode` Int8, `pickup_ct2010` String, `pickup_boroct2010` String, `pickup_cdeligibil` String, `pickup_ntacode` FixedString(4), `pickup_ntaname` String, `pickup_puma` UInt16, `dropoff_nyct2010_gid` UInt8, `dropoff_ctlabel` Float32, `dropoff_borocode` UInt8, `dropoff_ct2010` String, `dropoff_boroct2010` String, `dropoff_cdeligibil` String, `dropoff_ntacode` FixedString(4), `dropoff_ntaname` String, `dropoff_puma` UInt16 ") SETTINGS input_format_try_infer_datetimes = 0
On success, the client echoes the query and reports:
INSERT INTO trips
SETTINGS input_format_try_infer_datetimes = 0
SELECT *
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_{1..2}.gz', 'TabSeparatedWithNames', ` \`trip_id\` UInt32, \`vendor_id\` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15), \`pickup_date\` Date, \`pickup_datetime\` DateTime, \`dropoff_date\` Date, \`dropoff_datetime\` DateTime, \`store_and_fwd_flag\` UInt8, \`rate_code_id\` UInt8, \`pickup_longitude\` Float64, \`pickup_latitude\` Float64, \`dropoff_longitude\` Float64, \`dropoff_latitude\` Float64, \`passenger_count\` UInt8, \`trip_distance\` Float64, \`fare_amount\` Float32, \`extra\` Float32, \`mta_tax\` Float32, \`tip_amount\` Float32, \`tolls_amount\` Float32, \`ehail_fee\` Float32, \`improvement_surcharge\` Float32, \`total_amount\` Float32, \`payment_type\` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4), \`trip_type\` UInt8, \`pickup\` FixedString(25), \`dropoff\` FixedString(25), \`cab_type\` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3), \`pickup_nyct2010_gid\` Int8, \`pickup_ctlabel\` Float32, \`pickup_borocode\` Int8, \`pickup_ct2010\` String, \`pickup_boroct2010\` String, \`pickup_cdeligibil\` String, \`pickup_ntacode\` FixedString(4), \`pickup_ntaname\` String, \`pickup_puma\` UInt16, \`dropoff_nyct2010_gid\` UInt8, \`dropoff_ctlabel\` Float32, \`dropoff_borocode\` UInt8, \`dropoff_ct2010\` String, \`dropoff_boroct2010\` String, \`dropoff_cdeligibil\` String, \`dropoff_ntacode\` FixedString(4), \`dropoff_ntaname\` String, \`dropoff_puma\` UInt16 `)
SETTINGS input_format_try_infer_datetimes = 0
Query id: d70fd215-15f7-47e0-a05d-c1e469e09406
Ok.
0 rows in set. Elapsed: 63.457 sec. Processed 2.00 million rows, 163.07 MB (31.51 thousand rows/s., 2.57 MB/s.)
Peak memory usage: 1.21 GiB.
Count the total number of rows:
SELECT count() FROM trips
For more operations, see the official advanced tutorial: https://clickhouse.com/docs/en/tutorial
8. Create users and roles
To manage named collections with DDL, a user must have the named_collection_control privilege. This can be assigned by adding a file to /etc/clickhouse-server/users.d/. The following example gives the default user the access_management and named_collection_control privileges; create user_default.xml in /etc/clickhouse-server/users.d/:
<clickhouse>
<users>
<default>
<password_sha256_hex replace="true">65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex>
<access_management>1</access_management>
<named_collection_control>1</named_collection_control>
</default>
</users>
</clickhouse>
Generate a password:
PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
The first line of the result is the password; the second line is the corresponding SHA-256 hash:
0YRprnwF
25405afd91c280153e28a4ff88467d658b48084692241d0dc4b93419dfb63f4e
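The shell pipeline above can be cross-checked in Python: the value placed in password_sha256_hex is simply the hex-encoded SHA-256 of the plaintext password. The password below is an illustrative placeholder, not a value from this guide:

```python
import hashlib

password = "example-password"  # placeholder
digest = hashlib.sha256(password.encode()).hexdigest()
print(digest)  # a 64-character lowercase hex string for password_sha256_hex
```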
For compatibility with MySQL clients, a password can also be specified as a double SHA-1 hash. Put it in the password_double_sha1_hex element, for example:
<password_double_sha1_hex>08b4a0f1de6ad37da17359e592c8d74788a83eb0</password_double_sha1_hex>
An example of generating such a password from the shell:
PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
The first line of the result is the password; the second line is the corresponding double SHA-1 hash.
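The double-SHA1 pipeline can be mirrored in Python; note that the second SHA-1 is computed over the raw 20-byte digest (which is what `xxd -r -p` recovers from the hex), not over its hex string. The password here is an illustrative placeholder:

```python
import hashlib

def double_sha1_hex(password: str) -> str:
    # SHA-1 the password, then SHA-1 the raw digest bytes (as `xxd -r -p` does).
    first = hashlib.sha1(password.encode()).digest()
    return hashlib.sha1(first).hexdigest()

print(double_sha1_hex("example-password"))  # 40-character lowercase hex string
```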
Set the password_sha256_hex element in user_default.xml to this value, then fix the ownership of the file:
chown -R clickhouse:clickhouse users.d
Check that the configuration took effect:
systemctl status clickhouse-server.service
Output:
Oct 12 14:52:14 test clickhouse-server[29459]: Merging configuration file '/etc/clickhouse-server/users.d/user_default.xml'
Method 2. The approach above (changing the default user's password via users.d) did not work for the author; the service failed to restart. Instead, edit users.xml directly.
Change the permissions on users.xml:
chmod 777 users.xml
By default, SQL-driven access control and account management are disabled for all users. You need to configure at least one user in the users.xml configuration file and set access_management, named_collection_control, show_named_collections, and show_named_collections_secrets to 1.
Enable SQL user mode by adding these privileges to the default user:
<access_management>1</access_management>
<named_collection_control>1</named_collection_control>
<show_named_collections>1</show_named_collections>
<show_named_collections_secrets>1</show_named_collections_secrets>
The default user is the only user created on a fresh installation, and by default it is also the account used for internode communication.
In production, because the default account is used for internode communication, it is recommended to disable the default user once internode communication has been configured with a SQL admin user and secured with cluster credentials and/or internode HTTP and transport protocol credentials.
After the configuration changes are complete, restart the node.
Start the ClickHouse client:
clickhouse-client --user default --password <password>
Create a SQL administrator account:
CREATE USER clickhouse_admin IDENTIFIED BY 'password';
Grant the new user full administrative privileges:
GRANT ALL ON *.* TO clickhouse_admin WITH GRANT OPTION;
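The section title mentions roles, but only a user is created above. A minimal sketch of role-based access under the SQL-driven workflow; the names readonly_role and report_user and the database name default are illustrative placeholders, not values from the original text:

```sql
-- Placeholder names: readonly_role, report_user, database `default`.
CREATE ROLE readonly_role;
GRANT SELECT ON default.* TO readonly_role;
CREATE USER report_user IDENTIFIED BY 'another_password';
GRANT readonly_role TO report_user;
SET DEFAULT ROLE readonly_role TO report_user;
```

Granting privileges to a role rather than directly to each user keeps permissions manageable as the number of accounts grows.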
9. Use the DBeaver GUI tool to work with ClickHouse
- Download: https://dbeaver.io/download/
- Set permissions on /etc/clickhouse-server/config.xml:
chmod 777 config.xml
- Edit /etc/clickhouse-server/config.xml and set listen_host:
<listen_host>0.0.0.0</listen_host>
Save and restart the server.
- Create the connection.