派生表是由SQL查询动态创建的表,仅存在与查询过程中的临时表,不会持久化在数据库中。ClickHouse提供派生表用于简化复杂查询、减少需要处理的数据量。本文通过多个实例展示如何创建、连接派生表,以及如何多次使用派生表。
隐式创建派生表
创建派生表
要创建派生表,可以在主查询中包括子查询,使用子查询结果作为派生表,举例:
SELECT *
FROM (
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
GROUP BY customer_id
) AS customer_orders
WHERE num_orders > 10;
该示例中子查询创建派生表,负责计算每个客户端的订单数。主查询过滤派生表结果,仅保留订单数量超过10的记录。
连接派生表
派生表还可以与其他表连接实现更复杂查询,举例:
SELECT *
FROM products
JOIN (
SELECT product_id, AVG(price) AS avg_price
FROM prices
GROUP BY product_id
) AS product_prices
ON products.product_id = product_prices.product_id
WHERE product_prices.avg_price > 50;
该示例中,子查询创建派生表,负责计算每个产品的平均价格;主查询连接派生表和产品表,并过滤仅保留平均价格大于50的产品。
多次使用派生表
可以在一个查询中多次使用派生表,举例:
SELECT *
FROM (
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
GROUP BY customer_id
) AS customer_orders
JOIN (
SELECT customer_id, SUM(total_price) AS total_spent
FROM orders
GROUP BY customer_id
) AS customer_totals
ON customer_orders.customer_id = customer_totals.customer_id
WHERE customer_orders.num_orders > 10 AND customer_totals.total_spent > 1000;
该实例中创建了两个派生表:一个负责计算每个客户的订单数量,另一个计算每个客户的消费总额。主查询连接两个派生表,并过滤符合条件的记录:订单数量大于10,消费总额大于1000。
显示创建派生表
CTE(common table expression)是临时命名数据集,是定义在SELECT, INSERT, UPDATE, DELETE 语句中派生表,利用CTE可以提升查询SQL的可读性。但CTE和临时表或嵌套表有一定差异:
-
如果子查询是关联查询 (也称为同步子查询),它有可能对查询中每行需要重复执行,极大地增加查询执行负载。
-
大量数据填充临时表可能会占用磁盘空间,由于存储临时表特性,使用临时表执行查询会增加执行时间。
CTE语法
ClickHouse支持两种语法:
WITH <expression> AS <identifier>
-- 或者
WITH <identifier> AS <subquery expression>
如果一次定义多个CTE,可以使用逗号分隔。
举例:
WITH locations AS
(
SELECT location
FROM table
WHERE date > (today() - 10)
)
SELECT *
FROM locations
-- 另一个简单示例
WITH ('USA', 'BRA') AS locations
SELECT 'ARG' IN (locations)
完整CTE示例
首先创建表并插入示例数据:
CREATE TABLE SpareParts
(
`id` UInt32,
`partName` String,
`partOrigin` String,
`storeID` UInt32
)
ENGINE = MergeTree()
ORDER BY id
-- 插入数据
INSERT INTO SpareParts VALUES (1, 'headlight', 'USA', 1)
,(2, 'hood', 'JPN', 1)
, (3, 'bumper', 'USA', 1)
, (4, 'radiator', 'BRA', 3)
, (5, 'wheel', 'BRA', 2)
,(6, 'stabilizer', 'ARG', 3)
, (7, 'absorber', 'TUR', 2)
, (8, 'cable', 'MEX', 1)
,(9, 'spring', 'MEX', 3)
, (10, 'door', 'USA', 2);
定义CTE查询:
WITH
originsByStore AS
(
SELECT
storeID,
groupArray(partOrigin) AS origins
FROM SpareParts
GROUP BY storeID
),
partsByStore AS
(
SELECT
storeID,
groupArray(partName) AS partNames
FROM SpareParts
GROUP BY storeID
),
has(origins, 'USA') = 1 AS isUSA
SELECT
storeID,
origins,
partNames,
isUSA
FROM originsByStore AS t1
LEFT JOIN
(
SELECT
storeID,
partNames
FROM partsByStore
) AS t2 USING (storeID)
返回结果:
┌─storeID─┬─origins───────────────────┬─partNames─────────────────────────────┬─isUSA─┐
│ 3 │ ['BRA','ARG','MEX'] │ ['radiator','stabilizer','spring'] │ 0 │
│ 2 │ ['BRA','TUR','USA'] │ ['wheel','absorber','door'] │ 1 │
│ 1 │ ['USA','JPN','USA','MEX'] │ ['headlight','hood','bumper','cable'] │ 1 │
└─────────┴───────────────────────────┴───────────────────────────────────────┴───────┘
总结
本文介绍了ClickHouse派生表,如何显示或隐式创建派生表,利用好派生表不仅可以提升SQL可读性,还可以提升查询性能。