clickhouse学习笔记

最新推荐文章于 2024-05-22 09:05:31 发布

学无止境的csdn

最新推荐文章于 2024-05-22 09:05:31 发布

阅读量863

点赞数

分类专栏： idea python Git 文章标签： big data 数据挖掘数据库

本文链接：https://blog.csdn.net/qq_36918815/article/details/121975591

版权

python 同时被 3 个专栏收录

2 篇文章 0 订阅

订阅专栏

Git

2 篇文章 0 订阅

订阅专栏

idea

1 篇文章 0 订阅

订阅专栏

官网地址点击进入

简介

Clickhouse是一款分布式实时分析型列式数据库服务。高性能，开箱即用，企业特性支持。应用于流量分析，广告营销分析，行为分析，人群划分，客户画像，敏捷 BI ，数据集市，网络监控，分布式服务和链路监控等业务场景。

优点：

1. 真正的面向列的 DBMS

2. 数据高效压缩

3. 磁盘存储的数据

4. 多核并行处理

5. 支持分布式处理

6. SQL 语法支持

7. 实时数据更新

8. 支持近似预估计算

缺点：

1. 不支持事务

2. 不支持 Update/Delete 操作

3. 不建议 join 操作，可以使用大宽表

1.1 列式存储

列式数据库更适合于OLAP 场景 ( OLAP(OnLine Analytical Processing) ，即联机分析处理。对于大多数查询而言，处理速度至少提高了 100 倍，下面详细解释了原因 ( 通过图片更有利于直观理解 ) ：

行式

列式

相比于行式存储，列式存储在分析场景下有着许多优良的特性。

1 ）分析场景中往往需要读大量行但是分析少数几个列。在行存模式下，数据按行连续存储，所有列的数据都存储在一个 block 中，不参与计算的列在

IO 时也要全部读出，读取操作被严重放大。而列存模式下，只需要读取参与计算的列即可，极大的减低了 IO cost ，加速了查询。

2 ）同一列中的数据属于同一类型，压缩效果显著。列存往往有着高达十倍甚至更高的压缩比，节省了大量的存储空间，降低了存储成本。

3 ）更高的压缩比意味着更小的 data size ，从磁盘中读取相应数据耗时更短。

4 ）自由的压缩算法选择。不同列的数据具有不同的数据类型，适用的压缩算法也就不尽相同。可以针对不同列类型，选择最合适的压缩算法。

5 ）高压缩比，意味着同等大小的内存能够存放更多数据，系统 cache 效果更好。

官方数据显示，通过使用列存，在某些分析场景下，能够获得 100 倍甚至更高的加速效应

1.2 数据有序存储

ClickHouse 支持在建表时，指定将数据按照某些列进行 sort by 排序后，保证了相同 sort key 的数据在磁盘上连续存储，且有序摆放。在进行等值、范围查询时， where 条件命中的数据都紧密存储在一个或若干个连续的 Block 中，而不是分散的存储在任意多个 Block ，大幅减少需要 IO 的 block 数量。另外，连续 IO 也能够充分利用操作系统 page cache 的预取能力，减少 page fault 。

1.3 主键索引

ClickHouse 支持主键索引，它将每列数据按照 index granularity （默认 8192 行）进行划分，每个 index granularity 的开头第一行被称为一个 mark 行。主键索引存储该 mark 行对应的 primary key 的值。对于 where 条件中含有 primary key 的查询，通过对主键索引进行二分查找，能够直接定位到对应的 index granularity ，避免了全表扫描从而加速查询。 ClickHouse 的主键索引与 MySQL 等数据库不同，它并不用于去重，即便 primary key 相同的行，也可以同时存在于数据库中。要想实现去

重效果，需要结合具体的表引擎 ReplacingMergeTree （去重）、 CollapsingMergeTree （删除（折叠））、 VersionedCollapsingMergeTree （版本删除（折叠））实现

1.4 稀疏索引

ClickHouse 支持对任意列创建任意数量的稀疏索引。其中被索引的 value 可以是任意的合法 SQL Expression ，并不仅仅局限于对column value 本身进行索引。之所以叫稀疏索引，是因为它本质上是对一个完整 index granularity （默认 8192 行）的统计信息，并不会具体记录每一行在文件中的位置。目前支持的稀疏索引类型包括：

minmax : 以 index granularity 为单位，存储指定表达式计算后的 min 、 max 值；在等值和范围查询中能够帮助快速跳过不满足要求的块，减少 IO 。

set(max_rows) ：以 index granularity 为单位，存储指定表达式的 distinct value 集合，用于快速判断等值查询是否命中该块，减少 IO 。

ngrambf_v1(n, size_of_bloom_fifilter_in_bytes, number_of_hash_functions, random_seed) ：将 string 进行 ngram 分词后，构建bloom fifilter ，能够优化等值、 like 、 in 等查询条件。

tokenbf_v1(size_of_bloom_fifilter_in_bytes, number_of_hash_functions, random_seed) ：与 ngrambf_v1 类似，区别是不使用 ngram进行分词，而是通过标点符号进行词语分割。

bloom_fifilter([false_positive]) ：对指定列构建 bloom fifilter ，用于加速等值、 like 、 in 等查询条件的执行。

1.5 数据 Sharding

ClickHouse 支持单机模式，也支持分布式集群模式。在分布式模式下， ClickHouse 会将数据分为多个分片，并且分布到不同节点上。

不同的分片策略在应对不同的 SQL Pattern 时，各有优势。 ClickHouse 提供了丰富的 sharding 策略，让业务可以根据实际需求选用。

1 ） random 随机分片：写入数据会被随机分发到分布式集群中的某个节点上。

2 ） constant 固定分片：写入数据会被分发到固定一个节点上。

3 ） column value 分片：按照某一列的值进行 hash 分片。

4 ）自定义表达式分片：指定任意合法表达式，根据表达式被计算后的值进行 hash 分片。

1.6 数据 Partitioning

ClickHouse 支持 PARTITION BY 子句，在建表时可以指定按照任意合法表达式进行数据分区操作，比如通过 toYYYYMM() 将数据按月进行分区、 toMonday() 将数据按照周几进行分区等。

数据 Partition 在 ClickHouse 中主要有两方面应用：

在 partition key 上进行分区裁剪，只查询必要的数据。灵活的 partition expression 设置，使得可以根据 SQL Pattern 进行分区设置，最大化的贴合业务特点。

对 partition 进行 TTL 管理，淘汰过期的分区数据。

1.7 数据 TTL

在分析场景中，数据的价值随着时间流逝而不断降低，多数业务出于成本考虑只会保留最近几个月的数据， ClickHouse 通过 TTL 提供了

数据生命周期管理的能力。

ClickHouse 支持几种不同粒度的 TTL ：

1 ）列级别 TTL ：当一列中的部分数据过期后，会被替换成默认值；当全列数据都过期后，会删除该列。

2 ）行级别 TTL ：当某一行过期后，会直接删除该行。

3 ）分区级别 TTL ：当分区过期后，会直接删除该分区。

1.8 高吞吐写入能力 官方公开 benchmark 测试显示能够达到 50MB-200MB/s 的写入吞吐能力，按照每行 100Byte 估算，大约相当于 50W-200W 条 /s 的写入

速度。

(2) 安装部署

默认目录

服务配置目录（ /etc/clickhouse-server ）

数据目录（ /var/lib/clickhouse/ ）

sudo yum install yum-utils

sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG

sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo

sudo yum install clickhouse-server clickhouse-client

sudo /etc/init.d/clickhouse-server start

clickhouse-client

启动服务 /etc/init.d/clickhouse-server start

链接到客户端 clickhouse-client

drop table if exists tb_orders;
create table tb_orders(
oid UUID ,
money Decimal(10,2) ,
ctime DateTime
)engine =MergeTree()
Order By oid
partition by ctime
SETTINGS index_granularity=3, index_granularity_bytes = 0;
insert into tb_orders values (generateUUIDv4() , 100.00 , now()) ;
insert into tb_orders values (generateUUIDv4() , 200.00 , now()) ;
insert into tb_orders values (generateUUIDv4() , 300.00 , now()) ;
insert into tb_orders values (generateUUIDv4() , 400.00 , now()) ;
optimize table tb_orders final ;
select * from tb_orders;
--设置index_granularity_bytes = 0取消自适应索引粒度，便于后续观察mrk文件结构。

table_name # 表名

├─ partition_{index} DIR # 分区目录

│ │ # 基础文件

│ ├─ checksums.txt BIN # 各类文件的尺寸以及尺寸的散列

│ ├─ columns.txt TXT # 列信息

│ ├─ count.txt TXT # 当前分区目录下数据总行数

│ ├─ primary.idx BIN # 稀疏索引文件

│ ├─ {column}.bin BIN # 经压缩的列数据文件，以字段名命名

│ ├─ {column}.mrk BIN # 列字段标记文件

│ ├─ {column}.mrk2 BIN # 使用自适应索引间隔的标记文件

│ │

│ │ # 分区键文件

│ ├─ partition.dat BIN # 当前分区表达式最终值

│ ├─ minmax_{column}.idx BIN # 当前分区字段对应原始数据的最值

│ │

│ │ # 跳数索引文件

│ ├─ skp_idx_{column}.idx BIN # 跳数索引文件

│ └─ skp_idx_{column}.mrk BIN # 跳数索引表及文件

│

└─ partition_{index} DIR # 分区目录

日志目录（ /var/log/clickhouse-server ）

(3) 数据类型

3.1 整数

Int8 — [-128 : 127]
Int16 — [-32768 : 32767]
Int32 — [-2147483648 : 2147483647]
Int64 — [-9223372036854775808 : 9223372036854775807]
Int128 — [-170141183460469231731687303715884105728 : 170141183460469231731687303715884105727]
Int256 — [-57896044618658097711785492504343953926634992332820282019728792003956564819968 :
57896044618658097711785492504343953926634992332820282019728792003956564819967]

3.2 无符号整数

UInt8 — [0 : 255]
UInt16 — [0 : 65535]
UInt32 — [0 : 4294967295]
UInt64 — [0 : 18446744073709551615]
UInt128 — [0 : 340282366920938463463374607431768211455]
UInt256 — [0 : 115792089237316195423570985008687907853269984665640564039457584007913129639935]

3.3 Float32, Float64 ,Decimal

Float32 — float .

Float64 — double .

create table tb_data_type_val
(
    id Int16 , 
    age UInt8 , 
    sal Float32 , 
    comm Float64 , 
    money Decimal(7,2) 
)engine = Memory ; 
insert into tb_data_type_val values(1,23,10000.23,200,99999.99) ,(2,22,20000,222,99999.99); insert into tb_data_type_val values(3,23,10000.23,200,999999.99);

3.4 Boolean Values

没有单独的类型来存储布尔值。可以使用 UInt8 类型，取值限制为 0 或 1

3.5 String

字符串可以任意长度的。它可以包含任意的字节集，包含空字节。

3.6 FixedString

固定长度 N 的字符串（ N 必须是严格的正自然数）

-- 字符串类型 
create table tb_data_type_str
( 
    id String , 
    gender FixedString(8) -- 有字符长度限制
 )engine = Memory ; 
insert into tb_data_type_str values('dghsahjdgsjkhgfjsdhgf','MMMMMMMMMMMMM') ;

3.7 Date,Date32,Datetime,Datetime64

Date 和 Date32 yyyy-MM-dd

Datetime yyyy-MM-dd HH:mm:ss

Datetime64 yyyy-MM-dd HH:mm:ss.sss

-- 日期类型 
create table tb_data_type_date
( 
cdate01 Date , 
cdate02 Date32 , 
cdate03 DateTime , 
cdate04 DateTime64 
)engine = Memory ; 
insert into tb_data_type_date('2021-01-01' , '2021-01-01' ,'2021-01-01 11:11:11' ,'2021-01-01 11:11:11') -- 数据都正确 秒 
insert into tb_data_type_date(1632628849 , 1632628849 ,1632628849 ,1632628849189) ; 
insert into tb_data_type_date('2021-01-01' , '2021-01-01' ,'2021-01-01 11:11:11' ,'2021-01-01 11:11:11')

3.8 UUID 类型

CREATE TABLE `t_uuid` 
(
`x` UUID, 
`y` String
) ENGINE= Memory ; 
INSERT INTO t_uuid SELECT generateUUIDv4(), 'Example 1' ; insert into t_uuid values(generateUUIDv4() , 'zss') ;

3.9 Enum 枚举类型

CREATE TABLE t_enum
( 
color Enum('RED' = 1, 
'BLUE' = 2 , 
'GREEN'=3) )ENGINE = Memory ; 
-- 插入数据 
insert into t_enum values('RED'),('RED'),('GREEN') ; -- 报错 枚举定义的时候没有这个值 限制值的所用 
insert into t_enum values('PINK'); -- 使用数字代替对应的数值 
insert into t_enum values(3),(2),(2) ; 
-- ┌─color─┐ │ GREEN │ │ BLUE │ │ BLUE │ └───────┘ -- 在底层存储的数据并不是 RED GREEN 底层存储的是 1 3

3.10 Array(T) 数组类型

数组数据类型 
定义方式 array(1,2,3,4,5) [1,2,3,45,56] 
SELECT['zss', 'lss', 'ww'] AS names,toTypeName(names); ┌─names──────────────┬─toTypeName(['zss', 'lss', 'ww'])─┐ │ ['zss','lss','ww'] │ Array(String) │ └────────────────────┴──────────────────────────────────┘ 
select array(1,2,3,4,5) as num , toTypeName(num); 
┌─num─────────┬─toTypeName(array(1, 2, 3, 4, 5))─┐ │ [1,2,3,4,5] │ Array(UInt8) │ └─────────────┴──────────────────────────────────┘ 
create table t_friend
(
 id UInt8 , 
name String , 
fs Array(String) comment '用户的朋友列表' 
)ENGINE = Memory ; 
insert into t_friend values(1,'lny',array('lyf','lzx','wyf','wdl')) ; ┌─id─┬─name─┬─fs────────────────────────┐ │ 1 │ lny │ ['lyf','lzx','wyf','wdl'] │ └────┴──────┴───────────────────────────┘ 
select id , fs.size0 from t_friend ; 
arr.size0 length(arr) 数组的长度 select id , empty(fs) , length(fs) , fs[1] ,fs[2] ,arrayElement(fs,3) from t_friend ; -- 数组中的高阶函数 
select id , name , arrayMap(e->upper(e) , fs) from t_friend ;

3.11 Nested 数据类型

CREATE TABLE t_enum(

color Enum ( 'RED' = 1 , 'BLUE' = 2 , 'GREEN' = 3 )

)ENGINE = Memory ;

插入数据

insert into t_enum values ( 'RED' ),( 'RED' ),( 'GREEN' ) ;

-- 报错枚举定义的时候没有这个值限制值的所用

insert into t_enum values ( 'PINK' );

-- 使用数字代替对应的数值

insert into t_enum values ( 3 ),( 2 ),( 2 ) ;

┌─color─┐

│ GREEN │

│ BLUE │

└───────┘

-- 在底层存储的数据并不是 RED GREEN 底层存储的是 1 3

数组数据类型

定义方式 array( 1 , 2 , 3 , 4 , 5 ) [ 1 , 2 , 3 , 45 , 56 ]

SELECT [ 'zss' , 'lss' , 'ww' ] AS names,toTypeName(names);

┌─names──────────────┬─toTypeName([ 'zss' , 'lss' , 'ww' ])─┐

│ [ 'zss' , 'lss' , 'ww' ] │ Array(String) │

└────────────────────┴──────────────────────────────────┘

select array( 1 , 2 , 3 , 4 , 5 ) as num , toTypeName(num);

┌─num─────────┬─toTypeName(array( 1 , 2 , 3 , 4 , 5 ))─┐

│ [ 1 , 2 , 3 , 4 , 5 ] │ Array(UInt8) │

└─────────────┴──────────────────────────────────┘

create table t_friend(

id UInt8 ,

name String ,

fs Array(String) comment ' 用户的朋友列表 '

)ENGINE = Memory ;

insert into t_friend values ( 1 , 'lny' ,array( 'lyf' , 'lzx' , 'wyf' , 'wdl' )) ;

┌─id─┬─name─┬─fs────────────────────────┐

│ 1 │ lny │ [ 'lyf' , 'lzx' , 'wyf' , 'wdl' ] │

└────┴──────┴───────────────────────────┘

select id , fs .size 0 from t_friend ;

arr .size 0

length(arr) 数组的长度

select id , empty(fs) , length(fs) , fs[ 1 ] ,fs[ 2 ] ,arrayElement(fs, 3 ) from t_friend ;

-- 数组中的高阶函数

select id , name , arrayMap(e->upper(e) , fs) from t_friend ;

Nested 数据类型

-- 可以对多个数组的元素的个数进行每行统一

CREATE TABLE t_nested(

id Int16 ,

name String ,

properties Nested(

pid UInt32,

k String ,

v String

)

) engine = Memory ;

┌─name───────────┬─type──────────┬

│ id │ Int16 │

│ name │ String │

│ properties .pid │ Array(UInt32) │

│ properties .k │ Array(String) │

│ properties .v │ Array(String) │

└────────────────┴───────────────┴

insert into t_nested values ( 1 , 'hadoop' ,[ 1 , 2 , 3 ],[ 'hostname' , 'Ip' , 'role' ],[ 'doit01' , '192.168.133.11' , 'NN' ]) ; 3.12 元组 Tuple

3.13 IP 类型

3.14 Map 数据类型

(4) 引擎

4.1 数据库引擎

4.1.1 Atomic

insert into t_nested values ( 2 , 'hbase' ,[ 1 , 2 ],[ 'hostname' , 'Ip' ],

[ 'doit01' , '192.168.133.11' ]) ;

SELECT tuple( 1 , 'a' ) AS x, toTypeName(x);

CREATE TABLE named_tuples (`a` Tuple(s String, i Int64)) ENGINE = Memory;

INSERT INTO named_tuples VALUES (( 'y' , 10 )), (( 'x' ,- 10 ));

SELECT a .s FROM named_tuples;

SELECT a. 2 FROM named_tuples;

create table tb_user(

uid String ,

info Tuple(name String, age UInt8 , gender String)

)ENGINE = Memory;

insert into tb_user values ( 1 , ( 'zss' , 23 , 'M' )),( 2 ,( 'lss' , 33 , 'F' )) ;

--create table tb_user(

--uid String ,

--name String,

--age UInt8 ,

--gender String

--)ENGINE = Memory;

IPv4 类型可以对 ip 地址格式的校验

create table t_ip(

url String ,

ip String

)engine = Memory ;

insert into t_ip values ( 'doitedu.cn' , '192.168.133.33' ) ;

insert into t_ip values ( '51doit.com' , '192.168.133' ) ;

insert into t_ip values ( '51doit.com' , '192.168.133.567' ) ;

create table t_ip2(

url String ,

ip IPv4

)engine = Memory ;

insert into t_ip2 values ( 'doitedu.cn' , '192.168.133.33' ) ;

insert into t_ip2 values ( '51doit.com' , '192.168.133.567' ) ;

Map 数据类型

create table t_map(

id Int8 ,

m Map(String,UInt8)

)engine = Memory ;

-- 插入数据

insert into t_map values ( 1 , map( 'zss' , 23 , 'lss' , 33 , 'ww' , 44 )) ;

insert into t_map values ( 2 , { 'zss' :23, 'lss' :33, 'ww' :44}) ,( 3 , { 'zss' :23, 'lss' :33, 'ww' :44}) ;

强制类型转换

SELECT cast( 12.23 + 3 , 'Int8' )

┌─CAST(plus( 12.23 , 3 ), 'Int8' )─┐

│ 15 │

└──────────────────────────────┘

SELECT cast( 12.23 + 3 as Int8 )

select ( 12.23 + 3 ) :: Int8 ; 它支持非阻塞的 DROP TABLE 和 RENAME TABLE 查询和原子的 EXCHANGE TABLES t1 AND t2 查询。默认情况下使用 Atomic 数据库

引擎。

4.1.2 MySQL

MySQL 引擎用于将远程的 MySQL 服务器中的表映射到 ClickHouse 中，并允许您对表进行 INSERT 和 SELECT 查询，以方便您在

ClickHouse 与 MySQL 之间进行数据交换

MySQL 数据库引擎会将对其的查询转换为 MySQL 语法并发送到 MySQL 服务器中，因此您可以执行诸如 SHOW TABLES 或 SHOW CREATE

TABLE 之类的操作。

但您无法对其执行以下操作：

RENAME

CREATE TABLE

ALTER

创建数据库

引擎参数

host:port — MySQL 服务地址

database — MySQL 数据库名称

user — MySQL 用户名

password — MySQL 用户密码

4.2 表引擎

表引擎 (table engine) 。如果对 MySQL 熟悉的话，或许你应该听说过 InnoDB 和 MyISAM 存储引擎。不同的存储引擎提供不同的存储机

制、索引方式、锁定水平等功能，也可以称之为表类型。 ClickHouse 提供了丰富的表引擎，这些不同的表引擎也代表着不同的表类型。比如

数据表拥有何种特性、数据以何种形式被存储以及如何被加载

作用

决定表存储在哪里以及以何种方式存储

支持哪些查询以及如何支持

并发数据访问

索引的使用

是否可以执行多线程请求

数据复制参数

4.2.1 MergeTree

MergeTree 系列的表引擎是 ClickHouse 数据存储功能的核心。它们提供了用于弹性和高性能数据检索的大多数功能：列存储，自定义分

区，稀疏的主索引，辅助数据跳过索引等。

基本 MergeTree 表引擎可以被认为是单节点 ClickHouse 实例的默认表引擎，因为它在各种用例中通用且实用。

根据 city 分区

CREATE DATABASE test[ ENGINE = Atomic];

CREATE DATABASE [IF NOT EXISTS] db_name [ ON CLUSTER cluster]

ENGINE = MySQL( 'host:port' , [ 'database' | database], 'user' , 'password' )

create database db_mysql

ENGINE = MySQL( '192.168.60.116:3306' , 'ngsoc' , 'ngsoc,' ngSOC1234 ');

MergeTree 引擎表

--1) 表必须指定主键

--2) order by uid 不指定主键的情况下 , 排序字段就是主键字段

-- 根据 city 分区

create table tb_merge(

uid UInt8 , 根据月份分区

4.2.2 ReplacingMergeTree

该引擎和 MergeTree 的不同之处在于它会删除 排序键值相同 的重复项 ( 同一分区内 ) 。

name String ,

age UInt8 ,

city String

)engine = MergeTree

order by uid

partition by city ;

insert into tb_merge values

( 4 , 'zss' , 23 , 'BJ' ) ,

( 2 , 'lss' , 33 , 'BJ' ) ,

( 3 , 'ww' , 13 , 'SH' ) ;

insert into tb_merge values

( 1 , 'lny' , 33 , 'BJ' ) ,

( 5 , 'lyy' , 43 , 'BJ' ) ,

( 6 , 'yyl' , 53 , 'SH' ) ;

insert into tb_merge values

( 8 , 'taoge' , 44 , 'GZ' ) ,

( 9 , 'xingge' , 43 , 'GZ' ) ,

( 7 , 'najie' , 33 , 'BJ' ) ;

select * from tb_merge ;

-- 分区数据整合

optimize table tb_merge final ;

-- 没有分区

create table tb_merge2(

uid UInt8 ,

name String ,

age UInt8 ,

city String

)engine = MergeTree

order by uid ;

insert into tb_merge2 values

( 4 , 'zss' , 23 , 'BJ' ) ,

( 2 , 'lss' , 33 , 'BJ' ) ,

( 3 , 'ww' , 13 , 'SH' ) ;

insert into tb_merge2 values

( 1 , 'lny' , 33 , 'BJ' ) ,

( 5 , 'lyy' , 43 , 'BJ' ) ,

( 6 , 'yyl' , 53 , 'SH' ) ;

insert into tb_merge2 values

( 8 , 'taoge' , 44 , 'GZ' ) ,

( 9 , 'xingge' , 43 , 'GZ' ) ,

( 7 , 'najie' , 33 , 'BJ' ) ;

insert into tb_merge2 values

( 8 , 'taoge' , 44 , 'GZ' ) ,

( 9 , 'xingge' , 43 , 'GZ' ) ,

( 7 , 'najie' , 33 , 'BJ' ),

( 7 , 'najie' , 33 , 'BJ' );

select toYYYYMM(now()) ; -- 202109

-- 创建一个分区表月份

create table tb_orders(

oid UUID ,

money Decimal ( 10 , 2 ) ,

ctime DateTime

)engine =MergeTree()

primary key oid

order by (oid , ctime)

partition by toYYYYMM(ctime) ;

insert into tb_orders values (generateUUIDv4() , 100.00 , now()) ;

insert into tb_orders values (generateUUIDv4() , 200.00 , now()) ;

insert into tb_orders values (generateUUIDv4() , 300.00 , now()) ;

insert into tb_orders values (generateUUIDv4() , 400.00 , '2021-01-21 11:11:11' ) ;

insert into tb_orders values (generateUUIDv4() , 500.00 , '2021-01-20 11:11:11' ) ;

insert into tb_orders values (generateUUIDv4() , 400.00 , now()) ;

insert into tb_orders values (generateUUIDv4() , 500.00 , now()) ;

select * from tb_orders ;

optimize table tb_orders final ; 数据的去重只会在数据合并期间进行。合并会在后台一个不确定的时间进行，因此你无法预先作出计划。有一些数据可能仍未被处理。

尽管你可以调用 OPTIMIZE 语句发起计划外的合并，但请不要依靠它，因为 OPTIMIZE 语句会引发对数据的大量读写。因此，

ReplacingMergeTree 适用于在后台清除重复的数据以节省空间，但是它不保证没有重复的数据出现。

ver — 版本列。类型为 UInt* , Date 或 DateTime 。可选参数。

在数据合并的时候， ReplacingMergeTree 从所有具有相同排序键的行中选择一行留下：

如果 ver 列未指定，保留最后一条。

如果 ver 列已指定，保留 ver 值最大的版本。

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ ON CLUSTER cluster]

(

name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],

name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],

...

) ENGINE = ReplacingMergeTree([ver])

[PARTITION BY expr]

[ ORDER BY expr]

[SAMPLE BY expr]

[SETTINGS name=value, ...]

-- ReplacingMergeTree 去重区内排序相同的数据去重

drop table if exists test_replacingMergeTree ;

create table test_replacingMergeTree(

oid Int8 ,

ctime DateTime ,

cost Decimal ( 10 , 2 )

)engine = ReplacingMergeTree()

order by oid

partition by toDate(ctime) ;

-- 天分区同一天的 oid 相同的数据会被去重

-- 插入数据

insert into test_replacingMergeTree values ( 3 , '2021-01-01 11:11:11' , 30 ) ;

insert into test_replacingMergeTree values ( 1 , '2021-01-01 11:11:14' , 40 ) ;

insert into test_replacingMergeTree values ( 1 , '2021-01-01 11:11:11' , 10 );

insert into test_replacingMergeTree values ( 2 , '2021-01-01 11:11:11' , 20 ) ;

insert into test_replacingMergeTree values ( 3 , '2021-01-02 11:11:11' , 41 ) ;

insert into test_replacingMergeTree values ( 4 , '2021-01-02 11:11:11' , 10 );

-- 手动合并

optimize table test_replacingMergeTree final ;

-- 相同分区内排序字段相同的数据去重

-- 保留下来的数据我们不确定可能不是最新的数据

/*┌─oid─┬───────────────ctime─┬─cost─┐

│ 3 │ 2021-01-02 11:11:11 │ 41 │

│ 4 │ 2021-01-02 11:11:11 │ 10 │

└─────┴─────────────────────┴──────┘

┌─oid─┬───────────────ctime─┬─cost─┐

│ 1 │ 2021-01-01 11:11:11 │ 10 │***

│ 2 │ 2021-01-01 11:11:11 │ 20 │

│ 3 │ 2021-01-01 11:11:11 │ 30 │

└─────┴─────────────────────┴──────┘*/

-- 由于 ReplacingMergeTree(Version) 保留分区内主键相同的版本比较大的数据

-- Version 可以是时间格式的数据也可以是 UInt8

-- 版本是时间

drop table if exists test_replacingMergeTree3 ;

create table test_replacingMergeTree3(

oid Int8 ,

ctime DateTime ,

cost Decimal ( 10 , 2 )

)engine = ReplacingMergeTree(ctime)

order by oid

partition by toDate(ctime) ;

insert into test_replacingMergeTree3 values ( 3 , '2021-01-01 11:11:11' , 30 ) ;

insert into test_replacingMergeTree3 values ( 1 , '2021-01-01 11:11:14' , 40 ) ;

insert into test_replacingMergeTree3 values ( 1 , '2021-01-01 11:11:11' , 10 );

insert into test_replacingMergeTree3 values ( 2 , '2021-01-01 11:11:11' , 20 ) ;

insert into test_replacingMergeTree3 values ( 3 , '2021-01-02 11:11:11' , 41 ) ;

insert into test_replacingMergeTree3 values ( 4 , '2021-01-02 11:11:11' , 10 );

optimize table test_replacingMergeTree3 final ;

select * from test_replacingMergeTree3;

-- 版本是数字

drop table if exists test_replacingMergeTree4 ;

create table test_replacingMergeTree4(

oid Int8 ,

ctime DateTime ,

cost Decimal ( 10 , 2 ) ,

version UInt8

)engine = ReplacingMergeTree(version)

order by oid 4.2.3 CollapsingMergeTree

该引擎继承于 MergeTree ，并在数据块合并算法中添加了折叠行的逻辑。

CollapsingMergeTree 会异步的删除（折叠）这些除了特定列 Sign 有 1 和 - 1 的值以外，其余所有字段的值都相等的成对的行。没有

成对的行会被保留。

因此，该引擎可以显著的降低存储量并提高 SELECT 查询效率。

sign — 类型列的名称： 1 是 « 状态 » 行， - 1 是 « 取消 » 行。

列数据类型 — Int8 。

partition by toDate(ctime) ;

insert into test_replacingMergeTree4 values ( 1 , '2021-01-01 11:11:14' , 40 , 3 ) ;

insert into test_replacingMergeTree4 values ( 1 , '2021-01-01 11:11:11' , 10 , 2 );

insert into test_replacingMergeTree4 values ( 1 , '2021-01-01 11:11:11' , 10 , 5 );

insert into test_replacingMergeTree4 values ( 1 , '2021-01-01 11:11:11' , 10 , 1 );

optimize table test_replacingMergeTree4 final ;

select * from test_replacingMergeTree4 ;

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ ON CLUSTER cluster]

(

name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],

name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],

...

) ENGINE = CollapsingMergeTree(sign)

[PARTITION BY expr]

[ ORDER BY expr]

[SAMPLE BY expr]

[SETTINGS name=value, ...]

/**

在数据进行合并以后实现了数据的更新

drop table if exists tb_cps_merge_tree1 ;

CREATE TABLE tb_cps_merge_tree1

(

user_id UInt64,

name String,

age UInt8,

sign Int8

)

ENGINE = CollapsingMergeTree(sign)

ORDER BY user_id;

-- 插入数据

insert into tb_cps_merge_tree1 values ( 1 , 'xiaoluo' , 23 , 1 ),( 2 , 'xiaoyu' , 24 , 1 ),( 3 , 'xiaofeng' , 25 , 1 ) ;

insert into tb_cps_merge_tree1 values ( 1 , 'xiaoluo_' , 23 ,- 1 ),( 2 , 'xiaoyu_' , 24 ,- 1 ),( 3 , 'xiaofeng2' , 25 , 1 ) ;

select * from tb_cps_merge_tree1 ;

/*┌─user_id─┬─name─────┬─age─┬─sign─┐

│ 1 │ xiaoluo │ 23 │ 1 │

│ 2 │ xiaoyu │ 24 │ 1 │

│ 3 │ xiaofeng │ 25 │ 1 │

└─────────┴──────────┴─────┴──────┘

┌─user_id─┬─name──────┬─age─┬─sign─┐

│ 1 │ xiaoluo_ │ 23 │ -1 │

│ 2 │ xiaoyu_ │ 24 │ -1 │

│ 3 │ xiaofeng2 │ 25 │ 1 │

└─────────┴───────────┴─────┴──────┘

│ 3 │ xiaofeng2 │ 25 │ 1 │

-- 分区内排序相同状态为 1 -1 删除

-- 分区内排序相同状态为 1 1 保留后面一条数据

-- 分区内排序相同状态为 -1 1 无法折叠

-- 分区内排序相同状态为 -1 -1 去重保留第一个数据

insert into tb_cps_merge_tree1 values ( 4 , 'lny' , 49 ,- 1 ) ;

insert into tb_cps_merge_tree1 values ( 4 , 'lny2' , 49 , 1 ) ;

optimize table tb_cps_merge_tree1 final ;

select * from tb_cps_merge_tree1 ;

insert into tb_cps_merge_tree1 values ( 5 , 'lnn' , 79 ,- 1 ) ;

insert into tb_cps_merge_tree1 values ( 5 , 'lnn2' , 89 ,- 1 ) ;

optimize table tb_cps_merge_tree1 final ;

select * from tb_cps_merge_tree1 ;

/**

可以实现数据的更新和数据的删除

*/ 4.2.4 VersionedCollapsingMergeTree

引擎继承自 MergeTree 并将折叠行的逻辑添加到合并数据部分的算法中。 VersionedCollapsingMergeTree 用于相同的目的折叠树

但使用不同的折叠算法，允许以多个线程的任何顺序插入数据。特别是， Version 列有助于正确折叠行，即使它们以错误的顺序插入。

相比之下 , CollapsingMergeTree 只允许严格连续插入。

处理 CollapsingMergeTree 多线程插入执行下， sign -1 出现在 1 之前，导致数据无法删除的问题 . 通过增加 version 来合并 sign(1 -1) 的数

据

sign — 指定行类型的列名 : 1 是一个 “state” 行 , - 1 是一个 “cancel” 行

列数据类型应为 Int8 .

version — 指定对象状态版本的列名。

列数据类型应为 UInt* .

4.2.5 SummingMergeTree

该引擎继承自 MergeTree 。区别在于，当合并 SummingMergeTree 表的数据片段时， ClickHouse 会把所有具有相同主键的行合并为一

行，该行包含了被合并的行中具有数值数据类型的列的汇总值。如果主键的组合方式使得单个键值对应于大量的行，则可以显著的减少存储

空间并加快数据查询的速度。

我们推荐将该引擎和 MergeTree 一起使用。例如，在准备做报告的时候，将完整的数据存储在 MergeTree 表中，并且使用

SummingMergeTree 来存储聚合数据。这种方法可以使你避免因为使用不正确的主键组合方式而丢失有价值的数据。

特点：指定字段进行聚合操作

数据多线程执行出现 - 1 在前 1 在后数据无法删除的情况

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ ON CLUSTER cluster]

(

name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],

name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],

...

) ENGINE = VersionedCollapsingMergeTree(sign, version)

[PARTITION BY expr]

[ ORDER BY expr]

[SAMPLE BY expr]

[SETTINGS name=value, ...]

VersionedCollapsingMergeTree(sign, version)

drop table if exists tb_cps_merge_tree_version1 ;

CREATE TABLE tb_cps_merge_tree_version1

(

user_id UInt64,

name String,

age UInt8,

sign Int8 ,

version UInt8

)

ENGINE = VersionedCollapsingMergeTree(sign,version)

ORDER BY user_id;

-- 插入数据

-- version 相同 ,sign 相反的数据折叠删除

-- version 相同 ,sign 相同的数据都保留

insert into tb_cps_merge_tree_version1 values ( 1 , 'sh' , 23 ,- 1 , 1 ),( 2 , 'bj' , 24 ,- 1 , 1 ),( 3 , 'js' , 25 , 1 , 1 ) ;

insert into tb_cps_merge_tree_version1 values ( 1 , 'sh' , 23 , 1 , 1 ),( 2 , 'bj' , 24 , 1 , 1 ),( 3 , 'js' , 25 , 1 , 1 ) ;

insert into tb_cps_merge_tree_version1 values ( 4 , 'gz' , 26 ,- 1 , 2 ) ;

insert into tb_cps_merge_tree_version1 values ( 4 , 'gz' , 25 ,- 1 , 2 ) ;

insert into tb_cps_merge_tree_version1 values ( 4 , 'gz' , 26 , 1 , 2 ) ;

optimize table tb_cps_merge_tree_version1 final ;

select * from tb_cps_merge_tree_version1 ;

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ ON CLUSTER cluster]

(

name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],

name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],

...

) ENGINE = SummingMergeTree([columns])

[PARTITION BY expr]

[ ORDER BY expr]

[SAMPLE BY expr]

[SETTINGS name=value, ...] 4.2.6 AggregatingMergeTree

该引擎继承自 MergeTree ，并改变了数据片段的合并逻辑。 ClickHouse 会将一个数据片段内所有具有相同主键（准确的说是排序键）的行

替换成一行，这一行会存储一系列聚合函数的状态。

可以使用 AggregatingMergeTree 表来做增量数据的聚合统计，包括物化视图的数据聚合。

引擎使用以下类型来处理所有列：

AggregateFunction

SimpleAggregateFunction

AggregatingMergeTree 适用于能够按照一定的规则缩减行数的情况。

明细表 detail_table

聚合表 agg_table

drop table if exists tb_summing_mergr_tree_goods;

create table tb_summing_mergr_tree_goods(

id UInt64,

price UInt64,

version UInt64

)ENGINE =SummingMergeTree(price)

ORDER BY id

partition by version;

-- 同一分区，相同的 id ，对 price 进行 sum 操作

insert into tb_summing_mergr_tree_goods values ( 1 , 12 , 1 ),( 2 , 11 , 1 ),( 3 , 11 , 1 );

insert into tb_summing_mergr_tree_goods values ( 1 , 1 , 1 ),( 2 , 71 , 1 );

insert into tb_summing_mergr_tree_goods values ( 1 , 17 , 2 );

optimize table tb_summing_mergr_tree_goods final ;

select * from tb_summing_mergr_tree_goods ;

select id,price,version from tb_summing_mergr_tree_goods order by id;

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ ON CLUSTER cluster]

(

name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],

name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],

...

) ENGINE = AggregatingMergeTree()

[PARTITION BY expr]

[ ORDER BY expr]

[SAMPLE BY expr]

[TTL expr]

[SETTINGS name=value, ...]

-- 1) 建立明细表

CREATE TABLE detail_table

(id UInt8,

ctime Date ,

money UInt64

) ENGINE = MergeTree()

PARTITION BY toDate(ctime)

ORDER BY id;

-- 2) 插入明细数据

INSERT INTO detail_table VALUES ( 1 , '2021-08-06' , 100 );

INSERT INTO detail_table VALUES ( 2 , '2021-08-07' , 200 );

-- 3) 建立预先聚合表，

-- 注意：其中 allMoney 一列的类型为： AggregateFunction(sum, UInt64)

CREATE TABLE agg_table

(id UInt8,

ctime Date ,

allMoney AggregateFunction(sum, UInt64)

) ENGINE = AggregatingMergeTree()

PARTITION BY toDate(ctime)

ORDER BY id;

-- 4) 从明细表中读取数据，插入聚合表。

-- 注意：子查询中使用的聚合函数为 sumState ，对应于写入语法 <agg>-State

INSERT INTO agg_table

select id, ctime, sumState(money)

from detail_table

group by id, ctime ;

-- 不能使用普通 insert 语句向 AggregatingMergeTree 中插入数据。 4.2.7 外部引擎 Mysql

MySQL 引擎可以对存储在远程 MySQL 服务器上的数据执行 SELECT 查询。

host:port — MySQL 服务器地址。

database — 数据库的名称。

table — 表名称。

user — 数据库用户。

password — 用户密码。

replace_query — 将 INSERT INTO 查询是否替换为 REPLACE INTO 的标志。如果 replace_query=1 ，则替换查询

'on_duplicate_clause' — 将 ON DUPLICATE KEY UPDATE 'on_duplicate_clause' 表达式添加到 INSERT 查询语句中。例

如： impression = VALUES(impression) + impression 。如果需要指定 'on_duplicate_clause' ，则需要设置

replace_query=0 。如果同时设置 replace_query = 1 和 'on_duplicate_clause' ，则会抛出异常。

此时，简单的 WHERE 子句（例如 =, !=, >, >=, <, <= ）是在 MySQL 服务器上执行。

其余条件以及 LIMIT 采样约束语句仅在对 MySQL 的查询完成后才在 ClickHouse 中执行。

4.3 视图

4.3.1 普通视图 View

它不存储数据，仅存储指定的 SELECT 查询。从表中读取时，它会运行此查询（并从查询中删除所有不必要的列）。

-- 本 SQL 会报错： Cannot convert UInt64 to AggregateFunction(sum, UInt64)

INSERT INTO agg_table VALUES ( 1 , '2020-08-06' , 1 );

-- 5) 从聚合表中查询。

-- 注意： select 中使用的聚合函数为 uniqMerge ，对应于查询语法 <agg>-Merge

SELECT

id, ctime ,

sumMerge(allMoney) AS state

FROM agg_table

GROUP BY id, ctime;

---- 总结

select

id, ctime,

sum(money) -- 计算

from detail_table

group by id, ctime ;

-- 高效

SELECT

id, ctime ,

sumMerge(allMoney) AS all -- 取值

FROM agg_table

GROUP BY id, ctime;

MySQL( 'host:port' , 'database' , 'table' , 'user' , 'password' [, replace_query, 'on_duplicate_clause' ]);

--Mysql 引擎

drop table if exists tb_mysql_user;

create table tb_mysql_user(

id Int32 ,

name String,

age UInt8

)engine =MySQL( '192.168.60.116:3306' , 'cchDemo' , 't_user' , 'ngsoc' , 'ngSOC1234' ) ;

-- 直接读取 mysql 中的数据

select * from tb_mysql_user;

insert into tb_mysql_user values ( 2 , 'zjl' , 45 );

--- 表函数 from

-- mysql

select * from mysql( '192.168.60.116:3306' , 'cchDemo' , 't_user' , 'ngsoc' , 'ngSOC1234' ) ;

-- 普通视图（不会存储表数据）

CREATE VIEW tb_user_view as select * from tb_mysql_user;

show tables;

select * from tb_user_view; 4.3.2 物化视图 MaterializedView

它需要使用一个不同的引擎来存储数据，这个引擎要在创建物化视图时指定。当从表中读取时，它就会使用该引擎。

一般和 AggregatingMergeTree 引擎配合使用

(5) 基础语法

5.1 DDL 基础

5.1.1 创建表

5.1.2 修改表结构 ( 主键排序分区列不可以修改 )

5.1.3 移动表

-- 物化视图

drop table if exists m_person_view;

Create materialized view m_person_view

engine =AggregatingMergeTree()

PARTITION BY toDate(ctime)

ORDER BY id

populate

select

id,

ctime,

sumState(money) as sm -- 注意别名

from detail_table group by id,ctime ;

desc m_person_view;

select * from detail_table;

select id, ctime,sumMerge(sm) from m_person_view group by id,ctime;

-- 新增数据

INSERT INTO detail_table VALUES ( 2 , '2021-08-07' , 100 );

select

id, ctime,

sum(money) -- 计算

from detail_table

group by id, ctime ;

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ ON CLUSTER cluster]

(

name1 [type1] [ NULL |NOT NULL ] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1],

name2 [type2] [ NULL |NOT NULL ] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2],

...

) ENGINE = engine

-- 查看表结构

desc tb_user;

-- 添加字段

alter table tb_user add column age UInt8;

-- 删除字段

alter table tb_user drop column age ;

-- 修改字段的数据类型

alter table tb_user modify column age UInt64 default 18 ;

-- 增加字段备注

alter table tb_user comment colum age ' 年龄 ' ; 5.2 DML 基础

5.2.1 插入数据

5.2.2 删除数据（可以通过 CollapsingMergeTree 实现）

不建议使用删除分区数据只针对与 MergeTree

5.3 分区操作

(6) 函数

6.1 日期类型函数

-- 修改表名

rename table tb_book to tb_book1;

-- 修改多张表

rename table tb_book to tb_book1, tb_user to tb_user1;

-- 移动表到另一个数据库

rename table tb_book1 to cch_demo .tb_book ;

-- 查看当前数据库所有表

show tables from cch_demo;

（一） sql 语句

insert into tb_user values ( 11 , 'cch' , 18 );

（二）命令行导入

cat user .txt | clickhouse-client --host 172.17.0.2 -q 'insert into default.tb_user format CSV'

clickhouse-client --host 172.17.0.2 -q 'insert into default.tb_user format CSV'< user.txt

-- 指定分隔符

clickhouse-client --format_csv_delimiter="-" --host 172.17.0.2 -q 'insert into default.tb_user format CSV'<

user.txt

-- 条件删除数据

alter table t_user delete where id= 3 ;

-- 条件更新数据

alter table t_user update name= 'zde' where id = 1 ;

-- 查询表的分区

select name, table ,partition from system .parts ;

-- 删除分区数据

alter table t_user drop partition '2021-11' ;

-- 复制分区数据 ( 将 t_user1 里面分区： '2021-11' 复制到 t_user2)

alter table t_user2 replace partition '2021-11' from t_user1;

-- 卸载分区

alter table t_user detach partition '2021-11'

-- 装载分区

alter table t_user attach partition '2021-11' 函数

用途

举例

结果

toYear()

取日期或时间日期的年份

toYear(toDateTime(‘2021-12-11 11:12:13’))

返回 2021

toMonth()

取日期或时间日期的月份

toMonth(toDateTime(‘2021-12-11 11:12:13’))

返回 12

toDayOfMonth()

取日期或时间日期的天（ 1-31 ）

toMonth(toDayOfMonth(‘2021-12-11 11:12:13’))

返回 11

toDayOfWeek()

取日期或时间日期的星期（星期一

为 1 ，星期日为 7 ）。

toDayOfWeek(toDateTime(‘2021-12-11 11:12:13’))

返回 6

toHour()

取时间日期的小时

toHour(toDateTime(‘2021-12-11 11:12:13’))

返回 11

toMinute()

取时间日期的分钟

toMinute(toDateTime(‘2021-12-11 11:12:13’))

返回 12

toSecond()

取时间日期的秒

toSecond(toDateTime(‘2021-12-11 11:12:13’))

返回 13

toMonday()

取时间日期最近的周一（返回日

期）

toMonday(toDate(‘2021-12-11’))

toMonday(toDateTime(‘2021-12-11 11:12:13’))

返回 2021-12-

toTime()

将时间日期的日期固定到某一天，

保留原始时间

toTime(toDateTime(‘2021-12-11 11:12:13’))

返回 1970-01-

02 11:12:13

函数

用途

举例

结果

toStartOfMonth()

取日期或时间日期的月份的

第一天，返回日期

toStartOfMonth(toDateTime(‘2018-12-11

11:12:13’))toStartOfMonth(toDate(‘2018-12-

11’))

返回 2018-12-01

toStartOfQuarter()

取日期或时间日期的季度的

第一天，返回日期

toStartOfQuarter(toDateTime(‘2018-12-11

11:12:13’))toStartOfQuarter(toDate(‘2018-

12-11’))

返回 2018-10-01

toStartOfYear()

取日期或时间日期的年份的

第一天，返回日期

toStartOfYear(toDateTime(‘2018-12-11

11:12:13’))toStartOfYear(toDate(‘2018-12-

11’))

返回 2018-01-01

toStartOfMinute()

截取时间日期到分钟（之后

归零），返回日期

toStartOfMinute(toDateTime(‘2018-12-11

11:12:13’))

返回 2018-12-11

11:12:00

toStartOfFiveMinute()

截取时间日期到最近的 5 的倍

数分钟（之后归零），返回

日期

toStartOfFiveMinute(toDateTime(‘2018-12-

11 11:12:13’))

返回 2018-12-11

11:10:00

toStartOfFifteenMinutes()

截取时间日期到最近的 15 的

倍数分钟（之后归零），返

回日期

toStartOfFifteenMinutes(toDateTime(‘2018-

12-11 11:12:13’))

返回 2018-12-11

11:00:00

toStartOfHour()

截取时间日期到小时（之后

归零），返回日期

toStartOfHour(toDateTime(‘2018-12-11

11:12:13’))

返回 2018-12-11

11:00:00

toStartOfDay()

截取时间日期到天（之后归

零），返回日期

toStartOfDay(toDateTime(‘2018-12-11

11:12:13’))

返回 2018-12-11

00:00:00

timeSlot()

将时间日期中，分钟大于等

于 30 的归于 30 ，分钟数小于

30 的归为 00

timeSlot(toDateTime(‘2018-12-11

11:33:13’))timeSlot(toDateTime(‘2018-12-11

11:33:13’))

返回 2018-12-11

11:00:00 返回

2018-12-11

11:30:00

函数

用途

举例

结果

now()

生成当前时间日期

now()

返回 2018-12-13 10:10:12

today()

生成今天的日期

today()

返回 2018-12-13

yesterday()

生成昨天的日期

yesterday()

返回 2018-12-12

函数

用途

举例

结果

toDecimal32(‘whdwjfew’,8)

将数值型或者含有非数字的字符

串进行精度保留

toDecimal32(23.12291,

3)toDecimal32(’_23.12291’, 3)

返回 23.122 返

回 0.000

toDecimal64(‘whdwjfew’,8)

将数值型或者含有非数字的字符

串进行精度保留

toDecimal64(23.12291,

3)toDecimal64(’_23.12291’, 3)

返回 23.122 返

回 0.000

toDecimal128(‘whdwjfew’,8)

将数值型或者含有非数字的字符

串进行精度保留

toDecimal128(23.12291,

3)toDecimal128(’_23.12291’, 3)

返回 23.122 返

回 0.000

6.2 日期或时间生成函数

6.3 类型转化类函数 函数

用途

举例

结果

toUInt8OrZero()

将无符号整数字符型转化为整数型，否

则返回 0

toUInt8OrZero(‘123’)toUInt8OrZero(‘123.12’)

123 返回

toInt8OrZero()

将整数字符型转化为整数型，否则返回 0

toInt8OrZero(‘123’)toInt8OrZero(’-123’)

123 返回

-123

toFloat32OrZero()

将数值字符串型转化为数值型，注意：

从 toFloat32OrZero 开始，丢 32 的没有

对应的函数

toFloat32OrZero(‘-123’)toFloat32OrZero(‘123.123’)

-123 返

回

123.123

函数

用途

举例

结果

toDate()

将字符型日期转化为日期型

toDate(‘2018-12-24’)

返回 2018-12-24

toDateTime()

将字符型时间日期转化为时间日期型

toDateTime(‘2018-12-24 10:10:00’)

返回 2018-12-24 10:10:00

函数

用途

举例

结果

toString()

将数值型、字符型、日期等转化为字符型

toString(‘2018-12-24’)toString(‘123’)

返回 2018-12-24 返回 123

函数

用途

举例

结果

toTypeName()

返回数据的类型

toTypeName(toString(‘123’))toTypeName(toDate(‘2018-12-24’))

返回 String 返回 Date

函数

用途

举例

结果

empty()

判断字符串是空为 1 ，否则为 0

empty(’’)empty(‘123a’)

返回 1 返

回 0

notEmpty()

判断字符串是非空为 1 ，否则为 0

notEmpty(’’)notEmpty(‘123a’)

返回 0 返

回 1

length()

返回字符串的长度

length(’’)length(‘123a’)

返回 0 返

回 4

lower()

将字符串转为小写

lower(‘aBc’)

返回 abc

upper()

将字符串转为大写

upper(‘aBc’)

ABC

reverse()

将字符串反转

reverse(‘abc’)

返回 cba

substring(s, offffset, length)

字符串截取

substring(‘123abcABC’, 2, 3)

返回 23a

appendTrailingCharIfAbsent(s,

如果字符串 s 非空，则将 s 后追加一个

字符 c(s 最后一个字符与 c 不同 ) ，否则

不处理

appendTrailingCharIfAbsent(‘123abc’,

‘b’)appendTrailingCharIfAbsent(‘123abc’,

‘c’)

123abcb

123abc

函数

用途

举例

结果

match(haystack,pattern)

字符串正则匹配，返

回 0 或 1

match(‘avhsca’,'vh’)

返回 1

extract(haystack,pattern)

返回匹配到的第一个

子串

extract(‘iioomAj12123124OOBJB’, ‘\d+’)

返回 12123124

extractAll(haystack,pattern)

返回匹配到的所有子

串，输出列表

extractAll(‘iioomAj12123124OOBJ123B’, ‘\d+’)

[12123124,123]

like(haystack,pattern)

匹配到的数据返回 1 ，

否则返回 0

like(‘avhsca’,’%vh%’)like(‘avhsca’,’%vabjh%’)

返回 1 返回 0

notLike(haystack, pattern)

与 like() 函数相反

notLike(‘avhsca’,’%vh%’)notLike(‘avhsca’,’%vabjh%’)

返回 0 返回 1

6.4 字符串操作 函数

用途

举例

结果

replaceOne(haystack,pattern,replacement)

替换第一个匹

配到的

pattern

replaceOne(‘asd123cbbj464sd’,

‘sd’, ‘-’)

返回 a-

123cbbj464sd

replaceAll(haystack,pattern,replacement)

替换所有匹配

到的 pattern

replaceOne(‘asd123cbbj464sd’,

‘sd’, ‘-’)

返回 a-

123cbbj464-

replaceRegexpOne(haystack, pattern,

replacement)

正则匹配替换

第一个匹配到

的 pattern

replaceRegexpOne(‘Hello, World!’,

‘o’, '- ')

返回 Hell- ,

World!

replaceRegexpAll(haystack,pattern,replacement)

正则匹配替换

所有匹配到的

pattern

replaceRegexpAll(‘Hello, World!’,

‘^’, 'here: ')replaceRegexpAll(‘Hello,

World!’, ‘o’, '-- ')

返回 here: Hello,

World! 返回 Hell--

, W-- rld!

函数

用途

举例

结果

splitByChar(separator,

以单个字符分

割字符串

splitByChar(’-’, ‘qw-asaf-asfqw-2312-asd’)

[‘qw’,‘asaf’,‘asfqw’,‘2312’,‘asd’]

splitByString(separator,

以单个或多个

字符分割字符

串

splitByString(’-’, ‘qw-asaf-asfqw-2312-

asd’)splitByString(’-a’, ‘qw-asaf-asfqw-2312-

asd’)

[‘qw’,‘asaf’,‘asfqw’,‘2312’,‘asd’] 返

回 [‘qw’,‘saf’,‘sfqw-2312’,‘sd’]

函数

用途

举例

结果

concat(s1,s2,…)

将字符串拼接

concat(‘123’, ‘abc’, ‘ABC’)

返回 123abcABC

函数

用途

举例

结果

if(cond,then,else)

条件输出

if(1 > 2, ‘ 正确 ’, ‘ 错误 ’)

返回错误

multiIf(cond_1, then_1, cond_2, then_2…else)

多条件输出

multiIf(1 > 2, ‘ 正确 ’, 2 < 0, ‘ 正确 ’, ‘ 错误 ’)

返回错误

函数

用途

举例

结果

e()

返回 e 的值

e()

返回 2.718281828459045

pi()

返回 pi 的值

pi()

返回 3.141592653589793

exp(x)

返回 e 的 x 次方

exp(1)

返回 2.718281828459045

exp2(x)

返回 2 的 x 次方

exp2(2)

返回 4

exp10(x)

返回 10 的 x 次方

exp10(1)

返回 10

log(x)

返回 log 以 e 为底的对数值

log(e())

返回 1

log2(x)

返回 log 以 2 为底的对数值

log2(2)

返回 1

log10(x)

返回 log 以 10 为底的对数值

log10(100)

返回 2

sqrt(x)

对 x 开平方

sqrt(4)

返回 2

cbrt(x)

对 x 开立方

cbrt(8)

返回 2

pow(x, y)

返回 x 的 y 次方

pow(2, 3)

返回 8

函数

用途

举例

结果

flfloor(x[, N])

向下取数

flfloor(123.883, 1)flfloor(123.883, -1)

返回 123.8 返回 120

ceil(x[, N])

向上取数

ceil(123.883, 1)ceil(123.883, -1)

返回 123.9 返回 130

round(x[, N])

四舍五入

round(123.883, 1)round(123.883, -1)

返回 123.9 返回 120

6.5 条件语句

6.6 数学函数

6.7 舍入函数

6.8 URL 操作函数 函数

用途

举例

结果

protocol()

返回 URL 的协

议类型

protocol(‘http://www.baidu.com.cn’)

返回 http

domain()

返回 URL 的域

名

domain(‘http://www.baidu.com.cn’)

返回 www.baidu.com.cn

domainWithoutWWW()

返回 URL 不带

www 的域名

domainWithoutWWW(‘http://www.baidu.com.cn’)

返回 baidu.com.cn

topLevelDomain()

返回顶级域

名

topLevelDomain(‘http://www.baidu.com.cn’)

返回 cn

fifirstSignifificantSubdomain()

Returns the

“fifirst

signifificant

subdomain”.

fifirstSignifificantSubdomain(‘http://www.baidu.com.cn’)

返回 baidu

cutToFirstSignifificantSubdomain()

Returns the

part of the

domain that

includes

top-level

subdomains

up to the

“fifirst

signifificant

subdomain”

(see the

explanation

above).

cutToFirstSignifificantSubdomain(‘http://www.baidu.com.cn’)

返回 baidu.com.cn

path()

返回 URL 的路

径

path(‘https://www.baidu.com/s?

wd=SQL%E4%B8%AD%E7%9A%84split’)

返回 /s

pathFull()

返回 URL 的完

整路径

pathFull(‘https://www.baidu.com/s?

wd=SQL%E4%B8%AD%E7%9A%84split’)

返回 /s?wd=SQL%E4%B8%AD%E7%

queryString()

返回 URL 的参

数（查询字

符串）

queryString(‘https://www.baidu.com/s?

wd=SQL%E4%B8%AD%E7%9A%84split’)

返回 wd=SQL%E4%B8%AD%E7%9A

extractURLParameters()

以列表的形

式返回 URL 的

参数

extractURLParameters(‘https://www.baidu.com/s?

wd=SQL%E4%B8%AD%E7%9A%84split&ur=qwguq’)

[‘wd=SQL%E4%B8%AD%E7%9A%84

extractURLParameterNames()

以列表的形

式返回 URL 的

参数名

extractURLParameterNames(‘https://www.baidu.com/s?

wd=SQL%E4%B8%AD%E7%9A%84split&ur=qwguq’)

返回 [‘wd’,‘ur’]

cutQueryString()

返回 URL ？

（参数）前

面的内容

cutQueryString(‘https://www.baidu.com/s?

wd=SQL%E4%B8%AD%E7%9A%84split&ur=qwguq’)

返回 https://www.baidu.com/s

函数

用途

举例

结果

IPv4StringToNum(s)

将 IPV4

转为数

值，非

IPV4 的

转化为

IPv4StringToNum(‘23.217.198.69’)IPv4StringToNum(‘adwh.124.qwfqw’)

400148037 返

回 0

IPv4NumToString(num)

将数值

转为

IPV4

IPv4NumToString(400148037)

23.217.198.69

IPv4NumToStringClassC(num)

将数值

转为

IPV4 ，

且最后

的段位

用 xxx

代替

IPv4NumToStringClassC(400148037)

23.217.198.xxx

6.9 IP 操作函数 函数

用途

举例

结果

INNER JOIN

内连接

A 表 INNER JOIN B 表

A 表与 B 表的公共部分

LEFT OUTER JOIN

左外连接

A 表 LEFT OUTER JOIN B 表

A 表不在 B 表中的部分

RIGHT OUTER JOIN

右外连接

A 表 RIGHT OUTER JOIN B 表

B 表不在 A 表中的部分

FULL OUTER JOIN

全外连接

A 表 FULL OUTER JOIN B 表

A 与 B 表全部，没有为 NULL

函数

用途

举例

结果

LIMIT N

查询 N 条数据，一般跟

ORDER BY 连用

ORDER BY hit DESC LIMIT 10

按照 hit 列

降排取前

LIMIT N

Clause

按照 Clause 列查询 N 条数

据，一般跟 ORDER BY 连用

SELECT date ， domain, count(1) AS hit from db.tb where…GROUP

BY date ， domain,ORDER BY hit DESCLIMIT 10 BY date

取每天

TOP10 的

域名

函数

用途

举例

结果

dictGetString()

字典映射

dictGetString(‘ck_abc_dic’, ‘ck_value’, tuple(_abc))

6.10 表操作

6.11 字典操作

学无止境的csdn

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
clickhouse学习笔记

官网地址点击进入简介 Clickhouse是一款分布式实时分析型列式数据库服务。高性能，开箱即用，企业特性支持。应用于流量分析，广告营销分析，行为分析，人群划分，客户画像，敏捷BI，数据集市，网络监控，分布式服务和链路监控等业务场景。优点： 1. 真正的面向列的DBMS 2. 数据高效压缩 3. 磁盘存储的数据 4. 多核并行处理 5. 支持分布式处理 6. SQL语法支持 7. 实时数据更新 8. 支持近似预...
复制链接

扫一扫

专栏目录

clickhouse学习笔记

简介

“相关推荐”对你有帮助么？