【Hive SQL Daily Problem】Count product sales over the last 1, 7, and 30 days

License: CC 4.0 BY-SA

Test data

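create table if not exists sales (
  id         int,     -- sale record ID
  product_id int,     -- product ID
  quantity   int,     -- units sold in this sale
  sale_date  string   -- sale date, formatted as 'yyyy-MM-dd'
);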

INSERT INTO sales (id, product_id, quantity, sale_date) VALUES
(1, 101, 2, '2024-05-16'),
(2, 102, 1, '2024-05-15'),
(3, 101, 3, '2024-05-15'),
(4, 103, 4, '2024-05-14'),
(5, 102, 2, '2024-05-14'),
(6, 101, 1, '2024-05-13'),
(7, 103, 3, '2024-05-13'),
(8, 104, 5, '2024-05-12'),
(9, 102, 4, '2024-05-11'),
(10, 105, 2, '2024-05-11'),
(11, 104, 2, '2024-05-11'),
(12, 106, 2, '2024-05-10'),
(13, 102, 2, '2024-05-10'),
(14, 101, 2, '2024-05-08'),
(15, 101, 2, '2024-05-08'),
(16, 105, 2, '2024-05-05'),
(17, 104, 2, '2024-05-01'),
(18, 106, 2, '2024-04-29'),
(19, 102, 2, '2024-04-20'),
(20, 101, 2, '2024-04-15');

Requirements

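Count the sales of each product over the last 1, 7, and 30 days, assuming today is 2024-05-17. As in the implementation below, every window ends on the previous day, 2024-05-16: the last 1 day is 2024-05-16 alone, and the last 7 days run from 2024-05-10 to 2024-05-16.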
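The window boundaries follow from date_sub(end, n - 1). A minimal sketch that prints them, assuming the Hive CLI or Beeline, where `set hivevar:` defines a substitution variable; the variable name dt is hypothetical. In a scheduled job the end date could instead be derived with date_sub(current_date, 1).

```sql
-- Hypothetical variable dt: the last day included in every window (yesterday).
set hivevar:dt=2024-05-16;

select
  date_sub('${hivevar:dt}', 0)  as start_1d,    -- 2024-05-16
  date_sub('${hivevar:dt}', 6)  as start_7d,    -- 2024-05-10
  date_sub('${hivevar:dt}', 29) as start_30d,   -- 2024-04-17
  '${hivevar:dt}'               as window_end;  -- 2024-05-16
```

Because sale_date is stored as a 'yyyy-MM-dd' string, string order matches date order, which is why the queries can compare sale_date against these boundaries directly.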

Sample result:

product_id	recent_days	total_quantity	total_sales
101	1	2	1
101	7	6	3
101	30	10	5
…	…	…	…

Sort the result by recent_days ascending, then by total_quantity descending.

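where:

product_id is the product ID;
recent_days is the window length n (the last n days);
total_quantity is the number of units of the product sold;
total_sales is the number of sales of the product (if a user buys several units in one purchase, it counts as a single sale).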
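To see how the two measures differ, here is a hand check for product 101 over the 30-day window (2024-04-17 to 2024-05-16) against the test data above. The product 101 row with id 20 (2024-04-15) falls outside the window and is not counted.

```sql
-- Product 101, 30-day window: the rows with id 1, 3, 6, 14 and 15 qualify.
select
  sum(quantity) as total_quantity,  -- 2 + 3 + 1 + 2 + 2 = 10 units
  count(*)      as total_sales      -- 5 sale records
from sales
where product_id = 101
  and sale_date >= '2024-04-17'
  and sale_date <= '2024-05-16';
```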

Implementation

-- last 1 day
select
  product_id,
  1 recent_days,
  sum(quantity) total_quantity, 
  count(product_id) total_sales 
from
  sales
where
  sale_date = "2024-05-16"
group by
  product_id
union all
-- last 7 days
select
  product_id,
  7 recent_days,
  sum(quantity) total_quantity, 
  count(product_id) total_sales 
from
  sales
where
  sale_date >= date_sub("2024-05-16",6) and sale_date <= "2024-05-16"
group by
  product_id
union all
-- last 30 days
select
  product_id,
  30 recent_days,
  sum(quantity) total_quantity, 
  count(product_id) total_sales 
from
  sales
where
  sale_date >= date_sub("2024-05-16",29) and sale_date <= "2024-05-16"
group by
  product_id
order by
  recent_days,total_quantity desc;

The output is as follows:

[Figure: output of the UNION ALL query]

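This approach produces the correct result, but it is inefficient: the sales table is read three times, once per UNION ALL branch, and the three partial results are then merged. With a large table this becomes slow. Is there a better way? Yes.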
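The faster version relies on Hive's LATERAL VIEW with explode(), which emits one output row per array element and joins it back to the source row. A small illustration against the test table; the id = 1 filter is only there to keep the output short.

```sql
-- LATERAL VIEW explode(array(1, 7, 30)) pairs every sales row with each
-- array element, exposed here as the column rds.
select id, product_id, sale_date, rds
from sales lateral view explode(array(1, 7, 30)) tmp as rds
where id = 1;
-- Returns three rows for sale id 1 (product 101, 2024-05-16): rds = 1, 7 and 30.
```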

Here is the optimized SQL:

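select
  product_id,
  rds recent_days,
  sum(quantity) total_quantity,
  count(product_id) total_sales
from
  -- each sales row is repeated three times, once per window length rds = 1, 7, 30
  sales lateral view explode(array(1,7,30)) tmp as rds
where
  -- keep a copy only if the sale lies in its own window [2024-05-16 - (rds - 1), 2024-05-16]
  sale_date >= date_sub("2024-05-16",rds - 1) and sale_date <= "2024-05-16"
group by
  -- aggregate per window and product, with a single read of sales
  rds,product_id
order by
  recent_days,total_quantity desc;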
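Because rds is part of the GROUP BY key, each window is aggregated separately even though sales is read only once. One possible refinement, not from the original post: explode still triples every row of sales before the WHERE clause discards most copies, so when the table holds far more history than 30 days it may help to cut it down to the widest window in a subquery first. A sketch, assuming the same sales table and end date:

```sql
-- Sketch: restrict to the widest (30-day) window before exploding,
-- so only rows that can land in some window are tripled.
select
  product_id,
  rds recent_days,
  sum(quantity) total_quantity,
  count(*) total_sales
from (
  select product_id, quantity, sale_date
  from sales
  where sale_date >= date_sub('2024-05-16', 29)
    and sale_date <= '2024-05-16'
) t
lateral view explode(array(1, 7, 30)) tmp as rds
-- the upper bound was already applied in the subquery
where sale_date >= date_sub('2024-05-16', rds - 1)
group by rds, product_id
order by recent_days, total_quantity desc;
```

Every row that passes the outer filter also lies inside the 30-day window, so the result should match the optimized query above. In the single-subquery form the lower bound depends on rds, so in general it cannot be pushed below the lateral view; whether the pre-filter is measurably faster still depends on the Hive version and engine, and comparing both plans with EXPLAIN is a cheap way to check.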