在Hive SQL 中如何从多个维度去计算同一个指标?
假如有一张订单表,表中有 时间,省份,商品分类,销售金额 共4个字段,如何1条sql计算出如下每天的以下指标:
a)、每天销售额;
b)、每天各省份销售额;
c)、每天各商品分类销售额;
d)、每天各省份、各商品分类销售额;
在Hive SQL中,可以使用GROUPING SETS
、CUBE
或ROLLUP
来一次性计算多个维度的同一指标。以下是几种实现方式:
方法1:使用 GROUPING SETS(推荐)
SELECT
time AS 日期,
province AS 省份,
category AS 商品分类,
SUM(sales_amount) AS 销售额,
CASE
WHEN GROUPING(province) = 1 AND GROUPING(category) = 1 THEN '每天销售额'
WHEN GROUPING(category) = 1 THEN '每天各省份销售额'
WHEN GROUPING(province) = 1 THEN '每天各商品分类销售额'
ELSE '每天各省份各商品分类销售额'
END AS 指标类型
FROM order_table
GROUP BY time, province, category
GROUPING SETS (
(time), -- 每天销售额
(time, province), -- 每天各省份销售额
(time, category), -- 每天各商品分类销售额
(time, province, category) -- 每天各省份各商品分类销售额
)
ORDER BY time,
GROUPING(province),
GROUPING(category),
province,
category;
方法2:使用 CUBE
SELECT
time AS 日期,
province AS 省份,
category AS 商品分类,
SUM(sales_amount) AS 销售额,
CASE
WHEN province IS NULL AND category IS NULL THEN '每天销售额'
WHEN category IS NULL THEN '每天各省份销售额'
WHEN province IS NULL THEN '每天各商品分类销售额'
ELSE '每天各省份各商品分类销售额'
END AS 指标类型
FROM order_table
GROUP BY time, province, category
WITH CUBE
HAVING time IS NOT NULL -- 排除时间维度的汇总
ORDER BY time, province, category;
方法3:使用 ROLLUP + UNION ALL
-- 每天销售额
SELECT time, NULL AS province, NULL AS category, SUM(sales_amount), '每天销售额'
FROM order_table
GROUP BY time
UNION ALL
-- 每天各省份销售额
SELECT time, province, NULL AS category, SUM(sales_amount), '每天各省份销售额'
FROM order_table
GROUP BY time, province
UNION ALL
-- 每天各商品分类销售额
SELECT time, NULL AS province, category, SUM(sales_amount), '每天各商品分类销售额'
FROM order_table
GROUP BY time, category
UNION ALL
-- 每天各省份各商品分类销售额
SELECT time, province, category, SUM(sales_amount), '每天各省份各商品分类销售额'
FROM order_table
GROUP BY time, province, category
ORDER BY 1, 2, 3;
方法4:更清晰的输出格式
如果希望结果更易读,可以这样处理NULL值:
SELECT
time AS 日期,
COALESCE(province, '全部省份') AS 省份,
COALESCE(category, '全部分类') AS 商品分类,
SUM(sales_amount) AS 销售额,
CASE
WHEN province IS NULL AND category IS NULL THEN '每天销售额'
WHEN category IS NULL THEN '每天各省份销售额'
WHEN province IS NULL THEN '每天各商品分类销售额'
ELSE '每天各省份各商品分类销售额'
END AS 指标类型
FROM order_table
GROUP BY time, province, category
GROUPING SETS (
(time),
(time, province),
(time, category),
(time, province, category)
)
ORDER BY time, 指标类型, province, category;
各方法对比:
- GROUPING SETS:最灵活,明确指定需要的维度组合
- CUBE:生成所有可能的维度组合(包含不需要的汇总)
- UNION ALL:逻辑清晰但代码冗长,性能较差
推荐使用GROUPING SETS,因为它:
- 代码简洁易懂
- 性能最优
- 可以精确控制需要计算的维度组合
- 使用GROUPING()函数可以准确识别汇总级别