【SQL练习】Hive SQL多维度指标计算方法

在Hive SQL 中如何从多个维度去计算同一个指标?
假如有一张订单表,表中有 时间,省份,商品分类,销售金额 共4个字段,如何1条sql计算出如下每天的以下指标:
a)、每天销售额;
b)、每天各省份销售额;
c)、每天各商品分类销售额;
d)、每天各省份、各商品分类销售额;

在Hive SQL中,可以使用GROUPING SETSCUBEROLLUP来一次性计算多个维度的同一指标。以下是几种实现方式:

方法1:使用 GROUPING SETS(推荐)

SELECT 
    time AS 日期,
    province AS 省份,
    category AS 商品分类,
    SUM(sales_amount) AS 销售额,
    CASE 
        WHEN GROUPING(province) = 1 AND GROUPING(category) = 1 THEN '每天销售额'
        WHEN GROUPING(category) = 1 THEN '每天各省份销售额' 
        WHEN GROUPING(province) = 1 THEN '每天各商品分类销售额'
        ELSE '每天各省份各商品分类销售额'
    END AS 指标类型
FROM order_table
GROUP BY time, province, category
GROUPING SETS (
    (time),                    -- 每天销售额
    (time, province),          -- 每天各省份销售额  
    (time, category),          -- 每天各商品分类销售额
    (time, province, category) -- 每天各省份各商品分类销售额
)
ORDER BY time, 
         GROUPING(province), 
         GROUPING(category),
         province, 
         category;

方法2:使用 CUBE

SELECT 
    time AS 日期,
    province AS 省份,
    category AS 商品分类,
    SUM(sales_amount) AS 销售额,
    CASE 
        WHEN province IS NULL AND category IS NULL THEN '每天销售额'
        WHEN category IS NULL THEN '每天各省份销售额'
        WHEN province IS NULL THEN '每天各商品分类销售额'
        ELSE '每天各省份各商品分类销售额'
    END AS 指标类型
FROM order_table
GROUP BY time, province, category
WITH CUBE
HAVING time IS NOT NULL  -- 排除时间维度的汇总
ORDER BY time, province, category;

方法3:使用 ROLLUP + UNION ALL

-- 每天销售额
SELECT time, NULL AS province, NULL AS category, SUM(sales_amount), '每天销售额'
FROM order_table
GROUP BY time

UNION ALL

-- 每天各省份销售额
SELECT time, province, NULL AS category, SUM(sales_amount), '每天各省份销售额'
FROM order_table  
GROUP BY time, province

UNION ALL

-- 每天各商品分类销售额
SELECT time, NULL AS province, category, SUM(sales_amount), '每天各商品分类销售额'
FROM order_table
GROUP BY time, category

UNION ALL

-- 每天各省份各商品分类销售额
SELECT time, province, category, SUM(sales_amount), '每天各省份各商品分类销售额'
FROM order_table
GROUP BY time, province, category

ORDER BY 1, 2, 3;

方法4:更清晰的输出格式

如果希望结果更易读,可以这样处理NULL值:

SELECT 
    time AS 日期,
    COALESCE(province, '全部省份') AS 省份,
    COALESCE(category, '全部分类') AS 商品分类,
    SUM(sales_amount) AS 销售额,
    CASE 
        WHEN province IS NULL AND category IS NULL THEN '每天销售额'
        WHEN category IS NULL THEN '每天各省份销售额'
        WHEN province IS NULL THEN '每天各商品分类销售额'
        ELSE '每天各省份各商品分类销售额'
    END AS 指标类型
FROM order_table
GROUP BY time, province, category
GROUPING SETS (
    (time),
    (time, province), 
    (time, category),
    (time, province, category)
)
ORDER BY time, 指标类型, province, category;

各方法对比:

  • GROUPING SETS:最灵活,明确指定需要的维度组合
  • CUBE:生成所有可能的维度组合(包含不需要的汇总)
  • UNION ALL:逻辑清晰但代码冗长,性能较差

推荐使用GROUPING SETS,因为它:

  1. 代码简洁易懂
  2. 性能最优
  3. 可以精确控制需要计算的维度组合
  4. 使用GROUPING()函数可以准确识别汇总级别
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值