ClickHouse CREATE TABLE 语法详解
ClickHouse 提供了强大的 CREATE TABLE
语法来创建表,支持多种表引擎和灵活的列定义。以下是主要特性的详细介绍:
基本语法
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
column1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1],
column2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2],
...
) ENGINE = engine_name()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]
关键组成部分
1. 列定义
- 数据类型:支持
UInt8/16/32/64
,Int8/16/32/64
,Float32/64
,String
,FixedString(N)
,Date
,DateTime
,Decimal(P,S)
等 - 默认值:
DEFAULT
:显式指定的默认值MATERIALIZED
:计算列,不能插入值ALIAS
:列别名,不实际存储
CREATE TABLE example (
id UInt32,
event_date Date DEFAULT today(),
processed_flag UInt8 MATERIALIZED 0,
description ALIAS concat('ID: ', toString(id))
)
2. 表引擎 (ENGINE)
ClickHouse 的核心特性,决定表的存储方式和特性:
常用引擎
-
MergeTree 系列(最常用):
ENGINE = MergeTree() ENGINE = ReplacingMergeTree([ver]) ENGINE = SummingMergeTree([columns]) ENGINE = AggregatingMergeTree()
-
Log 系列(小数据量场景):
ENGINE = TinyLog() ENGINE = StripeLog() ENGINE = Log()
-
集成引擎(用于外部系统):
ENGINE = Kafka() ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password')
3. 分区与排序
- PARTITION BY:分区表达式,常用按日期分区
- ORDER BY:排序键,决定数据在分区内的物理排序
- PRIMARY KEY:主键(默认与ORDER BY相同)
CREATE TABLE events (
event_date Date,
event_type String,
user_id UInt64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, event_type, user_id)
4. 高级特性
-
TTL (Time To Live):自动删除过期数据
TTL event_date + INTERVAL 1 MONTH
-
压缩编解码器:
column_name String CODEC(ZSTD(5))
-
采样表达式:
SAMPLE BY intHash32(user_id)
分布式表创建
使用 ON CLUSTER
语法在集群上创建表:
CREATE TABLE distributed_table ON CLUSTER my_cluster
(
id UInt64,
data String
) ENGINE = Distributed(my_cluster, default, local_table, rand())
实际示例
CREATE TABLE IF NOT EXISTS analytics.user_events ON CLUSTER analytics_cluster
(
event_date Date,
event_time DateTime,
user_id UInt64,
event_type Enum8('click'=1, 'view'=2, 'purchase'=3),
product_id String,
revenue Decimal(10,2) DEFAULT 0,
session_duration UInt32 MATERIALIZED toUnixTimestamp(event_time) - toUnixTimestamp(session_start),
session_start DateTime
) ENGINE = ReplacingMergeTree(event_time)
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, event_type, user_id)
PRIMARY KEY (event_date, event_type)
TTL event_date + INTERVAL 6 MONTH
SETTINGS index_granularity = 8192
ClickHouse 的 CREATE TABLE 语法非常灵活,可以根据具体的数据特性和查询模式进行优化配置。