Platform address:vue-sqleditor
题目需求:
从订单明细表(order_detail)中查询累积销售件数高于其所属品类平均数的商品
期望结果如下:
sku_id<string> name<string> sum_num<bigint> cate_avg_num<bigint>
2 手机壳 6044 1546
5 破壁机 242 194
7 热水壶 252 194
8 微波炉 253 194
12 遮阳伞 20682 5373
需要用到的表:
商品信息表:sku_info
sku_id(商品id) name(商品名称) category_id(分类id) from_date(上架日期) price(商品价格)
1 xiaomi 10 1 2020-01-01 2000
6 洗碗机 2 2020-02-01 2000
9 自行车 3 2020-01-01 1000
订单明细表:order_detail
order_detail_id(订单明细id) order_id(订单id) sku_id(商品id) create_date(下单日期) price(商品单价) sku_num(商品件数)
1 1 1 2021-09-30 2000.00 2
2 1 3 2021-09-30 5000.00 5
22 10 4 2020-10-02 6000.00 1
23 10 5 2020-10-02 500.00 24
24 10 6 2020-10-02 2000.00 5
建表语句:
CREATE TABLE `sku_info`(
`sku_id` bigint,
`name` string,
`category_id` bigint,
`from_date` string,
`price` bigint)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
CREATE TABLE `order_detail`(
`order_detail_id` bigint,
`order_id` bigint,
`sku_id` bigint,
`create_date` string,
`price` double,
`sku_num` bigint)ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
插入样例数据:
Insert into sku_info
Select 1,'xiaomi 10',1,'2020-01-01',2000
Union all
Select 6,'洗碗机',2,'2020-02-01',2000
Union all
Select 9,'自行车',3,'2020-01-01',1000;
Insert into category_info
Select 1,'数码'
Union all
Select 2,'厨卫'
Union all
Select 3,'户外';
Insert into order_detail
Select 1,1,1,'2021-09-30',2000.00,2
Union all
Select 2,1,3,'2021-09-30',5000.00,5
Union all
Select 22,10,4,'2020-10-02',6000.00,1
Union all
Select 23,10,5,'2020-10-02',500.00,24
Union all
Select 24,10,6,'2020-10-02',2000.00,5;
需求分析:
select category_id,s1.sku_id,name,sum(sku_num) as sum_num from order_detail s1 join sku_info s2 on s1.sku_id=s2.sku_id group by category_id,s1.sku_id,name
首先获取到每个商品对应的品类号、名称及销量;
select category_id,sku_id,name,sum_num,avg(sum_num) over(partition by category_id) as cate_avg_num from (select category_id,s1.sku_id,name,sum(sku_num) as sum_num from order_detail s1 join sku_info s2 on s1.sku_id=s2.sku_id group by category_id,s1.sku_id,name)s3 group by category_id,sku_id,name,sum_num
其次对于每个品类开窗求得每个品类的销量均值;
select sku_id,name,sum_num,cast(cate_avg_num as bigint) as cate_avg_num from (select category_id,sku_id,name,sum_num,avg(sum_num) over(partition by category_id) as cate_avg_num from (select category_id,s1.sku_id,name,sum(sku_num) as sum_num from order_detail s1 join sku_info s2 on s1.sku_id=s2.sku_id group by category_id,s1.sku_id,name)s3 group by category_id,sku_id,name,sum_num)rs where sum_num>cate_avg_num
之后取出每个销量大于其品类销量均值的商品列表,并将销量均值由float转为bigint;
最终SQL:
select
sku_id,
name,
sum_num,
cast(cate_avg_num as bigint) as cate_avg_num
from
(
select
category_id,
sku_id,
name,
sum_num,
avg(sum_num) over(partition by category_id) as cate_avg_num
from
(
select
category_id,
s1.sku_id,
name,
sum(sku_num) as sum_num
from
order_detail s1
join sku_info s2 on s1.sku_id = s2.sku_id
group by
category_id,
s1.sku_id,
name
) s3
group by
category_id,
sku_id,
name,
sum_num
) rs
where
sum_num > cate_avg_num
AC时间:
2022年12月20日22:29:58