clickhouse系列学习笔记——（三）（下）clickhouse基本语法

最新推荐文章于 2023-06-10 18:49:26 发布

肿么肥四啊哈

最新推荐文章于 2023-06-10 18:49:26 发布

阅读量1k

点赞数 1

分类专栏： clickhouse 文章标签：大数据

本文链接：https://blog.csdn.net/niuyan666/article/details/117091861

版权

clickhouse 专栏收录该内容

6 篇文章 1 订阅

订阅专栏

三 clickhouse基础入门

2 基本语法

2.1 DDL基础

建表

目前只有MergeTree、Merge和Distributed这三类表引擎支持 ALTER查询,所以在进行alter操作的时候注意表的引擎!

注意在建表的时候一般要求指定表的引擎

CREATE TABLE tb_test1
(
    `id` Int8,
    `name` String
)
ENGINE = Memory() ;

修改表结构

-- 查看表结构
desc tb_test1 ;
┌─name─┬─type───┬
│ id   │ Int8   │
│ name │ String │
└──────┴────────┴
-- 添加字段
alter table tb_test1 add column age UInt8 ;-- 报错 , 因为修改的表引擎是内存引擎,不支持表结构的修改 
-- 创建一张MergeTree引擎的表
CREATE TABLE tb_test2
(
    `id` Int8,
    `name` String
)
ENGINE = MergeTree()
ORDER BY id ;
┌─name─┬─type───┬
│ id   │ Int8   │
│ name │ String │
└──────┴────────┴
-- 添加字段 
alter table tb_test2 add column age UInt8 ;
┌─name─┬─type───┬
│ id   │ Int8   │
│ name │ String │
│ age  │ UInt8  │
└──────┴────────┴
alter table tb_test2 add column gender String after name ; 
┌─name───┬─type───┬
│ id     │ Int8   │
│ name   │ String │
│ gender │ String │
│ age    │ UInt8  │
└────────┴────────┴
-- 删除字段
alter table tb_test2 drop column age ;
-- 修改字段的数据类型 
alter  table  tb_test2 modify column  gender UInt8 default 0 ;
┌─name───┬─type───┬─default_type─┬─default_expression─┬
│ id     │ Int8   │              │                    │
│ name   │ String │              │                    │
│ gender │ UInt8  │ DEFAULT      │ 0                  │
└────────┴────────┴──────────────┴────────────────────┴
-- 作为一个优秀的程序员,表的字段使用注释一种良好的习惯, 所以建议大家在操作的时候使用注释来描述字段的意义
-- 修改 / 添加字段的注释
alter table tb_test2 comment column name '用户名' ;
┌─name───┬─type───┬─default_type─┬─default_expression─┬─comment─┬
│ id     │ Int8   │              │                    │         │
│ name   │ String │              │                    │ 用户名  │
│ gender │ UInt8  │ DEFAULT      │ 0                  │         │
└────────┴────────┴──────────────┴────────────────────┴─────────┴

移动表

在Linux系统中，mv命令的本意是将一个文件从原始位置A移动到目标位置B，但是如果位置A与位置B相同，则可以变相实现重命名的作用。ClickHouse的RENAME查询就与之有着异曲同工之妙，RENAME语句的完整语法如下所示：

-- 修改表名 
rename table tb_test1 to t1 ;
-- 修改多张表名
rename table tb_test2 to t2 , t1 to tt1 ;
-- 移动表到另一数据库中 
rename table t2 to test1.t ;
-- 查看数据库下的所有的表 
show tables ;
show tables from db_name ;

设置表属性

-- 设置列的默认值 
create table tb_test3(
   id Int8 ,
    name String ,
    role String default 'VIP'
)engine = Log ;
┌─name─┬─type───┬─default_type─┬─default_expression─┬
│ id   │ Int8   │              │                    │
│ name │ String │              │                    │
│ role │ String │ DEFAULT      │ 'VIP'              │
└──────┴────────┴──────────────┴────────────────────┴
insert into tb_test3 (id , name) values(1,'HANGGE') ;
SELECT *
FROM tb_test3 ;
┌─id─┬─name───┬─role─┐
│  1 │ HANGGE │ VIP  │
└────┴────────┴──────┘

2.2 DML基础

1) 插入数据

INSERT语句支持三种语法范式，三种范式各有不同，可以根据写入的需求灵活运用。

第一种方式

使用VALUES格式的常规语法

INSERT INTO [db.]table [(c1, c2, c3…)] VALUES (v11, v12, v13…), (v21, v22, v23…), …

其中，c1、c2、c3是列字段声明，可省略。VALUES后紧跟的是由元组组成的待写入数据，通过下标位与列字段声明一一对应。数据支持批量声明写入，多行数据之间使用逗号分隔

第二种方式

静态数据: cat user.txt 
1,zss,23,BJ,M
2,lss,33,NJ,M
3,ww,21,SH,F
create table test_load1(
    id UInt8 ,
    name String ,
    age UInt8 ,
    city String ,
    gender String
)engine=Log ;
-- 将数据导入到表中
cat user.txt  | clickhouse-client  -q 'insert into default.test_load1 format CSV' 
 
上面的两种方式都可以将数据导入到表中  
-- 我们还可以执行数据行属性的分割符
clickhouse-client --format_csv_delimiter=',' -q 'insert into default.test_load1 format CSV'  <   user.txt

第三种方式

INSERT INTO [db.]table [(c1, c2, c3…)] SELECT …

虽然VALUES和SELECT子句的形式都支持声明表达式或函数，但是表达式和函数会带来额外的性能开销，从而导致写入性能的下降。所以如果追求极致的写入性能，就应该尽可能避免使用它们。

create table log3 as log2 ;
Insert into log3 select * from log2 ;

ClickHouse内部所有的数据操作都是面向Block数据块的，所以INSERT查询最终会将数据转换为Block数据块。也正因如此，INSERT语句在单个数据块的写入过程中是具有原子性的。在默认的情况下，每个数据块最多可以写入1048576行数据（由max_insert_block_size参数控制）。也就是说，如果一条INSERT语句写入的数据少于max_insert_block_size行，那么这批数据的写入是具有原子性的，即要么全部成功，要么全部失败。需要注意的是，只有在ClickHouse服务端处理数据的时候才具有这种原子写入的特性，例如使用JDBC或者HTTP接口时。因为max_insert_block_size参数在使用CLI命令行或者INSERT SELECT子句写入时是不生效的。

2) 更新删除数据

ClickHouse提供了DELETE和UPDATE的能力，这类操作被称为Mutation查询，它可以看作ALTER语句的变种。虽然Mutation能最终实现修改和删除，但不能完全以通常意义上的UPDATE和DELETE来理解，我们必须清醒地认识到它的不同：首先，Mutation语句是一种“很重”的操作，更适用于批量数据的修改和删除；其次，它不支持事务，一旦语句被提交执行，就会立刻对现有数据产生影响，无法回滚；最后， Mutation语句的执行是一个异步的后台过程，语句被提交之后就会立即返回。所以这并不代表具体逻辑已经执行完毕，它的具体执行进度需要通过system.mutations系统表查询。注意数据的修改和删除操作是使用用MergeTree家族引擎:

删除分区数据

-- 创建表
create table test_muta(
	id UInt8 ,
    name String ,
    city String
)engine=MergeTree() 
partition  by city 
order by id ;
-- 导入数据 
clickhouse-client  -q 'insert into test_muta format CSV' < data.csv 
-- 删除分区数据 
alter table test_muta drop  partition 'SH' ;

条件删除数据

alter table test_muta delete where id=3 ;

条件更新数据

ALTER TABLE [db_name.]table_name UPDATE column1 = expr1 [, ...] WHERE filter_expr 
ALTER TABLE test_ud
UPDATE name = 'my', job = 'teacher' WHERE id = '2' ; 

alter table test_muta update name='李思思'  where id=3 ;

但是注意的时候一定指定where条否则会报错,这种语法的where条件也可以使用子查询 ;

2.3 分区表操作

目前只有MergeTree系列的表引擎支持数据分区,分区的基本概念和意义和hive中的意义一样,这里不过多赘述!

create table test_partition1(
id String , 
ctime DateTime
)engine=MergeTree() 
partition by toYYYYMM(ctime)
order by (id) ;
-- 查看建表语句 
│ CREATE TABLE default.test_partition1
(
    `id` String,
    `ctime` DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(ctime)
ORDER BY id
SETTINGS index_granularity = 8192 │
-- 插入数据
insert into test_partition1 values(1,now()) ,(2,'2021-06-11 11:12:13') ;
-- 查看数据
SELECT *
FROM test_partition1 ;
┌─id─┬───────────────ctime─┐
│ 2  │ 2021-06-11 11:12:13 │
└────┴─────────────────────┘
┌─id─┬───────────────ctime─┐
│ 1  │ 2021-05-19 13:38:29 │
└────┴─────────────────────┘
--  查看表中的分区 
ClickHouse内置了许多system系统表，用于查询自身的状态信息。 其中parts系统表专门用于查询数据表的分区信息。
SELECT 
    name,
    table,
    partition
FROM system.parts
WHERE table = 'test_partition1' ;
┌─name─────────┬─table───────────┬─partition─┐
│ 202105_1_1_0 │ test_partition1 │ 202105    │
│ 202106_2_2_0 │ test_partition1 │ 202106    │
└──────────────┴─────────────────┴───────────┘
insert into test_partition1 values(1,now()) ,(2,'2021-06-12 11:12:13') ;

┌─name─────────┬─table───────────┬─partition─┐
│ 202105_1_1_0 │ test_partition1 │ 202105    │
│ 202105_3_3_0 │ test_partition1 │ 202105    │
│ 202106_2_2_0 │ test_partition1 │ 202106    │
│ 202106_4_4_0 │ test_partition1 │ 202106    │
└──────────────┴─────────────────┴───────────┘
-- 删除分区 
alter table test_partition1 drop partition '202105' ;
删除分区以后 , 分区中的所有的数据全部删除 
SELECT 
    name,
    table,
    partition
FROM system.parts
WHERE table = 'test_partition1'
┌─name─────────┬─table───────────┬─partition─┐
│ 202106_2_2_0 │ test_partition1 │ 202106    │
│ 202106_4_4_0 │ test_partition1 │ 202106    │
└──────────────┴─────────────────┴───────────┘
SELECT *
FROM test_partition1
┌─id─┬───────────────ctime─┐
│ 2  │ 2021-06-12 11:12:13 │
└────┴─────────────────────┘
┌─id─┬───────────────ctime─┐
│ 2  │ 2021-06-11 11:12:13 │
└────┴─────────────────────┘

-- 复制分区 
clickHouse支持将A表的分区数据复制到B表，这项特性可以用于快速数据写入、多表间数据同步和备份等场景，它的完整语法如下：
ALTER TABLE  B  REPLACE PARTITION partition_expr FROM A 
不过需要注意的是，并不是任意数据表之间都能够相互复制，它们还需要满足两个前提 条件：
·两张表需要拥有相同的分区键
·它们的表结构完全相同。
create table test_partition2  as  test_partition1 ;
show  create table test_partition2 ;  -- 查看表2的建表语句 
│ CREATE TABLE default.test_partition2
    (
        `id` String,
        `ctime` DateTime
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(ctime)
    ORDER BY id
    SETTINGS index_granularity = 8192 │   -- 两张表的结构完全一致
-- 复制一张表的分区到另一张表中 
SELECT *
FROM test_partition2
┌─id─┬───────────────ctime─┐
│ 2  │ 2021-06-12 11:12:13 │
└────┴─────────────────────┘
┌─id─┬───────────────ctime─┐
│ 2  │ 2021-06-11 11:12:13 │
└────┴─────────────────────┘
┌─id─┬───────────────ctime─┐
│ 2  │ 2021-06-21 11:12:13 │
└────┴─────────────────────┘
----------------------------
alter table test_partition2  replace  partition '202106' from  test_partition1
SELECT 
    name,
    table,
    partition
FROM system.parts
WHERE table = 'test_partition2'
┌─name─────────┬─table───────────┬─partition─┐
│ 202106_2_2_0 │ test_partition2 │ 202106    │
│ 202106_3_3_0 │ test_partition2 │ 202106    │
│ 202106_4_4_0 │ test_partition2 │ 202106    │
└──────────────┴─────────────────┴───────────┘

-- 重置分区数据 
如果数据表某一列的数据有误，需要将其重置为初始值，如果设置了默认值那么就是默认值数据,如果没有设置默认值,系统会给出默认的初始值,此时可以使用下面的语句实现：
ALTER TABLE tb_name CLEAR COLUMN column_name IN PARTITION partition_expr ;
注意: 不能重置主键和分区字段
示例: 
alter  table test_rep clear column name in partition '202105' ;

-- 卸载分区 
表分区可以通过DETACH语句卸载，分区被卸载后，它的物理数据并没有删除，而是被转移到了当前数据表目录的detached子目录下。而装载分区则是反向操作，它能够将detached子目录下的某个分区重新装载回去。卸载与装载这一对伴生的操作，常用于分区数据的迁移和备份场景

┌─id─┬─name─┬───────────────ctime─┐
│  1 │      │ 2021-05-19 13:59:49 │
│  2 │      │ 2021-05-19 13:59:49 │
└────┴──────┴─────────────────────┘
┌─id─┬─name─┬───────────────ctime─┐
│  3 │ ww   │ 2021-04-11 11:12:13 │
└────┴──────┴─────────────────────┘
alter table test_rep detach partition '202105' ;
┌─id─┬─name─┬───────────────ctime─┐
│  3 │ ww   │ 2021-04-11 11:12:13 │
└────┴──────┴─────────────────────┘
-- 装载分区
alter table test_rep attach partition '202105' ;
┌─id─┬─name─┬───────────────ctime─┐
│  1 │      │ 2021-05-19 13:59:49 │
│  2 │      │ 2021-05-19 13:59:49 │
└────┴──────┴─────────────────────┘
┌─id─┬─name─┬───────────────ctime─┐
│  3 │ ww   │ 2021-04-11 11:12:13 │
└────┴──────┴─────────────────────┘
-- 记住，一旦分区被移动到了detached子目录，就代表它已经脱离 了ClickHouse的管理，ClickHouse并不会主动清理这些文件。这些分区文件会一直存在，除非我们主动删除或者使用ATTACH语句重新装载

2.4 视图

1) 普通视图

ClickHouse拥有普通和物化两种视图，其中物化视图拥有独立的存储，而普通视图只是一层简单的查询代理

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name AS SELECT ...

普通视图不会存储任何数据，它只是一层单纯的SELECT查询映射，起着简化查询、明晰语义的作用，对查询性能不会有任何增强。

create view  test3_view as select id , upper(name) , role from tb_test3 ;
┌─name────────────┐
│ tb_test3        │
│ test3_view      │
│ test_partition1 │
│ test_partition2 │
│ test_rep        │
│ tt1             │
└─────────────────┘
drop view test3_view ;   -- 删除视图

2) 物化视图

物化视图支持表引擎，数据保存形式由它的表引擎决定，创建物化视图的完整语法如下所示

create materialized view mv_log engine=Log populate as select * from log ;

物化视图创建好之后，如果源表被写入新数据，那么物化视图也会同步更新。POPULATE修饰符决定了物化视图的初始化策略：如果使用了POPULATE修饰符，那么在创建视图的过程中，会连带将源表中已存在的数据一并导入，如同执行了INTO SELECT 一般；反之，如果不使用POPULATE修饰符，那么物化视图在创建之后是没有数据的，它只会同步在此之后被写入源表的数据。物化视图目前并不支持同步删除，如果在源表中删除了数据，物化视图的数据仍会保留。

create materialized view test3_view engine = Log populate as select * from tb_test3 ;
-- 建表的时候同步数据 , 当数据更新以后 物化视图中的数据会同步更新 , 但是当删除数据以后,物化视图中的数据不会被删除
SELECT *
FROM test3_view ;
┌─id─┬─name───┬─role─┐
│  1 │ HANGGE │ VIP  │
│  2 │ BENGE  │ VIP  │
│  3 │ PINGGE │ VIP  │
└────┴────────┴──────┘
-- 向源表中擦混入数据
SELECT *
FROM test3_view

┌─id─┬─name──┬─role─┐
│  4 │ TAOGE │ VIP  │
└────┴───────┴──────┘
┌─id─┬─name───┬─role─┐
│  1 │ HANGGE │ VIP  │
│  2 │ BENGE  │ VIP  │
│  3 │ PINGGE │ VIP  │
└────┴────────┴──────┘
-- 删除源表中的数据 , 物化视图中的数据 不会变化   
注意: 数据删除语法只适用于MergeTree引擎的表   基本语法如下
ALTER TABLE db_name.table_name DROP PARTITION '20210601'
ALTER TABLE db_name.table_name DELETE WHERE day = '20210618'
ALTER TABLE <table_name> UPDATE col1 = expr1, ... WHERE <filter>

Show tables ; 其实物化视图就是一种特殊的表

肿么肥四啊哈

关注

1
点赞
踩
8

收藏

觉得还不错? 一键收藏
0
评论
clickhouse系列学习笔记——（三）（下）clickhouse基本语法

三 clickhouse基础入门2 基本语法2.1 DDL基础建表目前只有MergeTree、Merge和Distributed这三类表引擎支持 ALTER查询,所以在进行alter操作的时候注意表的引擎!注意在建表的时候一般要求指定表的引擎CREATE TABLE tb_test1( `id` Int8, `name` String)ENGINE = Memory() ;修改表结构-- 查看表结构desc tb_test1 ;┌─name─┬─type─
复制链接

扫一扫