ClickHouse Getting-Started Notes

Data小成

Article link: https://blog.csdn.net/cheng1210qq/article/details/110955725

1. Basic operations

Start the server

[root@hadoop02 ck]# service clickhouse-server start

Start the client

[root@hadoop02 ck]# clickhouse-client [-m]

Stop the server

[root@hadoop02 ck]# service clickhouse-server stop

List the configured clusters

:) select * from system.clusters;

┌─cluster────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┐
│ perftest_3shards_1replicas │         1 │            1 │           1 │ hadoop02  │ 192.168.92.142 │ 9000 │        1 │ default │                  │
│ perftest_3shards_1replicas │         2 │            1 │           1 │ hadoop03  │ 192.168.92.143 │ 9000 │        0 │ default │                  │
│ perftest_3shards_1replicas │         3 │            1 │           1 │ hadoop04  │ 192.168.92.144 │ 9000 │        0 │ default │                  │
└────────────────────────────┴───────────┴──────────────┴─────────────┴───────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┘

2. Common client options

[Figure: table of client options]
Examples:
(1) Run a query in non-interactive mode: -q

[root@hadoop02 ck]# clickhouse-client -q 'show databases'

(2) Set the default database for the session: -d

[root@hadoop02 ck]# clickhouse-client -d system

(3) Allow multi-line queries: -m (a statement is executed only once it ends with a semicolon)

[root@hadoop02 ck]# clickhouse-client -m

:) show
:-] databases;

SHOW DATABASES

┌─name────┐
│ default │
│ system  │
└─────────┘

(4) Print the query execution time in non-interactive mode: -t

[root@hadoop02 ck]# clickhouse-client -t -q 'show databases'
default
system
0.003

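3. Data types

(1) Comparison with other databases
[Figure: type comparison table]
(2) Enum types

There are two Enum types, Enum8 and Enum16.
An Enum can only store a mapping of the form 'string' = integer.
Example:
Create a table with an Enum column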

:) create table enum(boo Enum8('true'=0,'false'=1))engine=TinyLog;

Insert data

:) insert into enum values('true'),('false');

Query the data

:) select * from enum;
┌───boo─┐
│  true │
│ false │
└───────┘

Show the integer value behind each Enum entry

:) select cast(boo,'Int8') from enum;
┌─CAST(boo, \'Int8\')─┐
│                   0 │
│                   1 │
└─────────────────────┘
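
As a small follow-up sketch, the query below reads the same column from both sides of the mapping. It assumes the enum table created above still exists; the aliases type_name, label and code are illustrative, and the backticks around the table name are only a precaution.

```sql
-- Assumes the table `enum` from the example above.
SELECT
    boo,
    toTypeName(boo)     AS type_name,  -- the declared Enum8 type
    CAST(boo, 'String') AS label,      -- string side of the mapping
    CAST(boo, 'Int8')   AS code        -- integer side of the mapping
FROM `enum`
ORDER BY boo;  -- Enum columns sort by their integer value
```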

(3) Array type

:) select array(1,2,3) as arr,toTypeName(arr);

┌─arr─────┬─toTypeName(array(1, 2, 3))─┐
│ [1,2,3] │ Array(UInt8)               │
└─────────┴────────────────────────────┘
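
A short sketch of reading from an array, for illustration only. Array indexes in ClickHouse start at 1; the aliases first_item, len and contains_two are made-up names.

```sql
SELECT
    [1, 2, 3]   AS arr,           -- bracket literal, equivalent to array(1, 2, 3)
    arr[1]      AS first_item,    -- indexes start at 1, so this is 1
    length(arr) AS len,           -- number of elements
    has(arr, 2) AS contains_two;  -- 1 if the array contains the value 2
```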

(4) Tuple type

:) select tuple(1,'a') as tu,toTypeName(tu);

┌─tu──────┬─toTypeName(tuple(1, \'a\'))─┐
│ (1,'a') │ Tuple(UInt8, String)        │
└─────────┴─────────────────────────────┘
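
Tuple elements can be read by position. The sketch below shows the two forms I am aware of; the aliases first_item and second_item are illustrative.

```sql
SELECT
    tuple(1, 'a')       AS tu,
    tu.1                AS first_item,   -- dot access, positions start at 1
    tupleElement(tu, 2) AS second_item;  -- function form of the same access
```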

(5) Date
A date type stored in two bytes as an unsigned number of days since 1970-01-01.
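
To make the day-count storage concrete, here is a minimal sketch that converts a Date to its underlying number. The date literal is arbitrary and the aliases are illustrative.

```sql
SELECT
    toDate('1970-01-02') AS d,
    toTypeName(d)        AS type_name,         -- Date
    toUInt16(d)          AS days_since_epoch;  -- 1, one day after 1970-01-01
```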

There are many more data types; see the official documentation: https://clickhouse.yandex/docs/zh/data_types/

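4. Table engines

(1) TinyLog

Characteristics:

The simplest table engine
Stores data on disk
No concurrency control
No index support

Use case:
Write the data once (it is not changed afterwards), then read it as many times as needed

Storage location (columnar storage, one file per column):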

[root@hadoop02 stu1]# pwd
/var/lib/clickhouse/data/default/stu1
[root@hadoop02 stu1]# ll
total 12
-rw-rw-rw- 1 clickhouse clickhouse 29 Dec 10 18:47 id.bin
-rw-rw-rw- 1 clickhouse clickhouse 48 Dec 10 18:47 name.bin
-rw-rw-rw- 1 clickhouse clickhouse 64 Dec 10 18:47 sizes.json

Example:

:) create table stu1(id Int8,name String)engine=TinyLog;
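
The column files listed above (id.bin, name.bin) hold the data once rows have been written. A minimal sketch of filling and reading the table follows; the sample rows are made up for illustration and are not the author's data.

```sql
-- Hypothetical sample rows for the stu1 table created above.
INSERT INTO stu1 VALUES (1, 'tom'), (2, 'jerry');

-- TinyLog has no index, so every read scans the column files in full.
SELECT * FROM stu1;
```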

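(2) Merge

Characteristics:

Stores no data itself
Reads are parallelized automatically; writes are not supported
Used to read from several tables as if they were one

Parameters:
A database name and a regular expression that matches table names

Example: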

:) create table t1 (id UInt16, name String) ENGINE=TinyLog;
:) create table t2 (id UInt16, name String) ENGINE=TinyLog;
:) create table t3 (id UInt16, name String) ENGINE=TinyLog;

:) insert into t1(id, name) values (1, 'first');
:) insert into t2(id, name) values (2, 'second');
:) insert into t3(id, name) values (3, 'i am in t3');
:) create table t (id UInt16, name String) ENGINE=Merge(currentDatabase(), '^t');

:) select * from t;
┌─id─┬─name───┐
│  2 │ second │
└────┴────────┘
┌─id─┬─name──┐
│  1 │ first │
└────┴───────┘
┌─id─┬─name───────┐
│  3 │ i am in t3 │
└────┴────────────┘
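
The output above does not say which underlying table each row came from. As a sketch, the Merge engine exposes a virtual column named _table that can be selected explicitly. Keep in mind that the pattern '^t' matches every table in the database whose name starts with t, not only t1, t2 and t3.

```sql
-- _table is a virtual column provided by the Merge engine;
-- it is not returned by SELECT * and has to be named explicitly.
SELECT _table, id, name
FROM t
ORDER BY id;
```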

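(3) MergeTree (important)

Characteristics:

Data is stored sorted by primary key
Partitions can be used if a partitioning key is specified
Supports data replication
Supports data sampling

Use case:
When you have a very large volume of data to insert, you write it efficiently as data parts, batch by batch, and those parts are merged in the background according to set rules.

Parameters:
[Figure: MergeTree parameter description]
The following form can also be used (this is the legacy syntax, which newer ClickHouse releases deprecate):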

ENGINE [=] MergeTree(date-column [,sampling_expression],(primary, key), index_granularity)

Example:

:) create table mt_table (date Date, id UInt8, name String) ENGINE=MergeTree(date, (id, name), 8192);
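
For comparison, here is a sketch of what I believe is the equivalent definition in the clause-based syntax. It assumes the legacy form partitions by the month of the date column, which is how that form is documented; the table name mt_table_v2 is a made-up name so it does not clash with the table above.

```sql
-- Hypothetical table name; same columns as mt_table above.
CREATE TABLE mt_table_v2
(
    date Date,
    id   UInt8,
    name String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(date)            -- monthly partitions, as in the legacy form
ORDER BY (id, name)                    -- sorting key, also used as the primary key
SETTINGS index_granularity = 8192;     -- rows per index mark
```

Spelling out PARTITION BY and ORDER BY separately makes it clearer which column drives partitioning and which columns drive sorting.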

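(4) ReplacingMergeTree

Built on top of MergeTree, it adds handling of duplicate data.
It removes duplicate rows that have the same primary key.

Parameters:
[Figure: ReplacingMergeTree parameter description]
The following form can also be used (legacy syntax):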

ENGINE [=] ReplacingMergeTree(date-column [,sampling_expression], (primary, key), index_granularity, [ver])

Example:

create table rmt_table (date Date, id UInt8, name String,point UInt8) ENGINE= ReplacingMergeTree(date, (id, name), 8192,point);
Insert some data:
insert into rmt_table values ('2019-07-10', 1, 'a', 20);
insert into rmt_table values ('2019-07-10', 1, 'a', 30);
insert into rmt_table values ('2019-07-11', 1, 'a', 20);
insert into rmt_table values ('2019-07-11', 1, 'a', 30);
insert into rmt_table values ('2019-07-11', 1, 'a', 10);
Wait a while, then query (duplicates are removed only when the parts are merged in the background):
:) select * from rmt_table;
┌───────date─┬─id─┬─name─┬─point─┐
│ 2019-07-11 │  1 │ a    │    30 │
└────────────┴────┴──────┴───────┘

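The result shows that only the row with the largest version-column value is kept and, among rows tied on that value, the one inserted last.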
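
Instead of waiting for a background merge, deduplication can be requested explicitly. The sketch below shows two options against the rmt_table from the example; both can be expensive on large tables, so treat them as tools for testing rather than routine use.

```sql
-- Option 1: force an unscheduled merge of the table's parts.
OPTIMIZE TABLE rmt_table FINAL;

-- Option 2: leave the parts alone and deduplicate at query time.
SELECT * FROM rmt_table FINAL;
```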

(5) Distributed (important)

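Note: in this setup, the ZooKeeper cluster must be running before the Distributed engine is used (ZooKeeper is required for replicated tables and ON CLUSTER DDL rather than by the Distributed engine itself).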

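Characteristics:

Stores no data itself
Runs distributed queries across multiple servers
Reads are parallelized automatically
On reads, the indexes of the remote tables are used if they exist

Parameters: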

Distributed(cluster_name, database, table [, sharding_key])

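cluster_name - the cluster name from the server configuration, defined here in /etc/metrika.xml
database - the database name
table - the table name
sharding_key - the sharding key

Example:
1) Create a table t on each of hadoop02, hadoop03 and hadoop04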

:) create table t(id UInt16, name String) ENGINE=TinyLog;

2) Insert some rows into table t on each of the three servers

:) insert into t(id, name) values (1, 'zhangsan');
:) insert into t(id, name) values (2, 'lisi');

3) Create the distributed table on hadoop02

:) create table dis_table(id UInt16, name String) ENGINE=Distributed(perftest_3shards_1replicas, default, t, id);

4) Query the distributed table

:) select * from dis_table;
┌─id─┬─name─────┐
│  1 │ zhangsan │
│  2 │ lisi     │
└────┴──────────┘
┌─id─┬─name─────┐
│  1 │ zhangsan │
│  2 │ lisi     │
└────┴──────────┘
┌─id─┬─name─────┐
│  1 │ zhangsan │
│  2 │ lisi     │
└────┴──────────┘
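
The three identical result blocks come from the three shards, but the output does not label them. As a sketch, and assuming a ClickHouse version in which the Distributed engine exposes the _shard_num virtual column, the shard can be shown next to each row:

```sql
-- _shard_num is a virtual column of the Distributed engine (version dependent);
-- its values correspond to shard_num in system.clusters.
SELECT _shard_num, id, name
FROM dis_table
ORDER BY _shard_num, id;
```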

5) Insert rows through the distributed table

:) insert into dis_table values(3,'zhaoliu'),(4,'lining');

6) Query the distributed table again

:) select * from dis_table;
┌─id─┬─name─────┐
│  1 │ zhangsan │
│  2 │ lisi     │
│  3 │ zhaoliu  │
└────┴──────────┘
┌─id─┬─name─────┐
│  1 │ zhangsan │
│  2 │ lisi     │
└────┴──────────┘
┌─id─┬─name─────┐
│  1 │ zhangsan │
│  2 │ lisi     │
│  4 │ lining   │
└────┴──────────┘

The new rows have been written to the local tables of different shards.
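
Why the two new rows land on different shards can be sketched with a little arithmetic. As I understand the documented routing rule, the sharding key value is divided by the total shard weight (3 here, since each of the three shards has weight 1) and the remainder selects the shard. The query below only computes those remainders; it does not touch the cluster.

```sql
SELECT
    arrayJoin([3, 4]) AS id,         -- the two ids inserted in step 5
    id % 3            AS remainder;  -- 3 gives 0, 4 gives 1, so the rows go to different shards
```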
