SQL: The Product Recommendation Problem

Contents

0 Requirements

1 Creating the table

2 Analysis

3 Summary


0 Requirements

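Given a log of user purchases (one row per purchase), return the products each user is likely to want to buy. If another user has bought at least two of the same products as a given user, then the products that the other user bought and the given user has not bought are the ones the given user is likely to want.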

The data is as follows:

user_id product_id
A 1
A 2
A 1
A 3
B 2
B 3
B 4
B 5
B 2
C 1
C 2
C 1
D 1
D 3
D 6
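
As a cross-check on the rule stated above, here is a minimal Python sketch that applies it directly with sets. The function name `recommend` and the `min_shared` parameter are illustrative assumptions and not part of the original solution; the rows are the sample data listed above.

```python
from collections import defaultdict

# Sample purchase log: (user_id, product_id), repeated purchases included.
ROWS = [
    ("A", "1"), ("A", "2"), ("A", "1"), ("A", "3"),
    ("B", "2"), ("B", "3"), ("B", "4"), ("B", "5"), ("B", "2"),
    ("C", "1"), ("C", "2"), ("C", "1"),
    ("D", "1"), ("D", "3"), ("D", "6"),
]


def recommend(rows, min_shared=2):
    """Return {user: set of recommended products} under the stated rule."""
    bought = defaultdict(set)  # one set per user, so repeats collapse
    for user, product in rows:
        bought[user].add(product)

    result = {}
    for user, mine in bought.items():
        candidates = set()
        for other, theirs in bought.items():
            # A "similar" user shares at least min_shared distinct products.
            if other != user and len(mine & theirs) >= min_shared:
                candidates |= theirs
        result[user] = candidates - mine  # drop what the user already bought
    return result


if __name__ == "__main__":
    for user, products in sorted(recommend(ROWS).items()):
        print(user, sorted(products))
```

On the sample data this yields 4, 5, 6 for A, 1 for B, 3 for C, and 2 for D, which is the target the SQL below should reproduce.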

1 Creating the table

create table product as 
select 'A' as user_id,'1' product_id
UNION ALL
select 'A' as user_id,'2' product_id
UNION ALL
select 'A' as user_id,'1' product_id
UNION ALL
select 'A' as user_id,'3' product_id
UNION ALL
select 'B' as user_id,'2' product_id
UNION ALL
select 'B' as user_id,'3' product_id
UNION ALL
select 'B' as user_id,'4' product_id
UNION ALL
select 'B' as user_id,'5' product_id
UNION ALL
select 'B' as user_id,'2' product_id
UNION ALL
select 'C' as user_id,'1' product_id
UNION ALL
select 'C' as user_id,'2' product_id
UNION ALL
select 'C' as user_id,'1' product_id
UNION ALL
select 'D' as user_id,'1' product_id
UNION ALL
select 'D' as user_id,'3' product_id
UNION ALL
select 'D' as user_id,'6' product_id

2 Analysis

(1) First, deduplicate the table by user and product

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
select user_id,product_id
from t1

user_id product_id
A       1
A       2
A       3
B       5
B       4
B       3
B       2
C       2
C       1
D       6
D       1
D       3

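(2) How do we find out which other users bought the same products as a given user? Relating rows of a table to one another in this way is usually done with a self-join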

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id

user_id1        user_id2        a.product_id
A       C       1
A       D       1
A       B       2
A       C       2
A       B       3
A       D       3
B       A       2
B       C       2
B       A       3
B       D       3
C       A       1
C       D       1
C       A       2
C       B       2
D       A       1
D       C       1
D       A       3
D       B       3
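
To make the semantics of this self-join concrete, the sketch below reproduces it in plain Python as a filtered Cartesian product of the deduplicated rows with themselves. The function name `self_join` is a hypothetical helper introduced only for illustration.

```python
from itertools import product as cartesian

# Deduplicated (user_id, product_id) pairs, i.e. the output of step 1.
T1 = [
    ("A", "1"), ("A", "2"), ("A", "3"),
    ("B", "2"), ("B", "3"), ("B", "4"), ("B", "5"),
    ("C", "1"), ("C", "2"),
    ("D", "1"), ("D", "3"), ("D", "6"),
]


def self_join(t1):
    """Pair each row with every row of a different user that has the same product."""
    return [
        (u1, u2, p1)
        for (u1, p1), (u2, p2) in cartesian(t1, repeat=2)
        if p1 == p2 and u1 != u2
    ]


if __name__ == "__main__":
    pairs = self_join(T1)
    print(len(pairs))  # number of joined rows
    print(("A", "B", "2") in pairs, ("B", "A", "2") in pairs)
```

Every match appears in both directions (A,B and B,A). That is intentional, because recommendations are computed per user. Products 4, 5, and 6 never appear, since only one user bought each of them.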

(3) Step 2 gives every pair of users who share a product. From these, keep only the pairs of users who share at least 2 products

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
,t2 as
(
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id
)
select user_id1,user_id2
from t2
group by user_id1,user_id2
having count(1) >=2

user_id1        user_id2
A       B
A       C
A       D
B       A
C       A
D       A

Step 3 produces the relation table of like-minded users: the pairs of users who have bought at least 2 of the same products.
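
The count(1) >= 2 test is only correct because step 1 removed repeated purchases first. The sketch below, in which `shared_counts` and `similar_pairs` are hypothetical helper names, counts matching rows per user pair with and without deduplication to show the difference on the sample data.

```python
from collections import Counter

# Raw purchase log, repeated purchases included.
ROWS = [
    ("A", "1"), ("A", "2"), ("A", "1"), ("A", "3"),
    ("B", "2"), ("B", "3"), ("B", "4"), ("B", "5"), ("B", "2"),
    ("C", "1"), ("C", "2"), ("C", "1"),
    ("D", "1"), ("D", "3"), ("D", "6"),
]


def shared_counts(rows):
    """Count joined rows per ordered user pair, like count(1) over the self-join."""
    counts = Counter()
    for u1, p1 in rows:
        for u2, p2 in rows:
            if u1 != u2 and p1 == p2:
                counts[(u1, u2)] += 1
    return counts


def similar_pairs(rows, min_shared=2):
    """Ordered user pairs whose joined-row count reaches min_shared."""
    return {pair for pair, n in shared_counts(rows).items() if n >= min_shared}


if __name__ == "__main__":
    deduped = sorted(set(ROWS))
    print(sorted(similar_pairs(deduped)))
    # Pairs that qualify only because repeated purchases inflate the count.
    print(sorted(similar_pairs(ROWS) - similar_pairs(deduped)))
```

Without deduplication, B and C would be matched through product 2 alone (B bought it twice), and C and D through product 1 alone (C bought it twice), even though each of those pairs shares only one distinct product.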

(4) Using the relation table, fetch for each user the products bought by the users with similar tastes

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
,t2 as
(
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id
)
,t3 as
(select user_id1,user_id2
from t2
group by user_id1,user_id2
having count(1) >=2
)
select t3.user_id1,t3.user_id2,a.product_id product_id_2
from t3
left join t1 a
on t3.user_id2 = a.user_id

t3.user_id1     t3.user_id2     product_id_2
A       B       2
A       B       3
A       B       4
A       B       5
A       C       1
A       C       2
A       D       1
A       D       3
A       D       6
B       A       1
B       A       2
B       A       3
C       A       1
C       A       2
C       A       3
D       A       1
D       A       2
D       A       3

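Next, collect the candidate products for each user. The same product can be suggested by several similar users, so the candidates are deduplicated; at this point they still include products the user has already bought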

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
,t2 as
(
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id
)
,t3 as
(select user_id1,user_id2
from t2
group by user_id1,user_id2
having count(1) >=2
)
select user_id1,product_id_2
from
(select t3.user_id1,t3.user_id2,a.product_id product_id_2
from t3
left join t1 a
on t3.user_id2 = a.user_id
) t
group by user_id1,product_id_2

user_id1        product_id_2
A       1
A       2
A       3
A       4
A       5
A       6
B       1
B       2
B       3
C       1
C       2
C       3
D       1
D       2
D       3

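(5) Compute the difference to get the actual recommendations. In Hive, a common way to compute a set difference is left join + is null: rows with no match on the right side are the ones to keep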

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
,t2 as
(
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id
)
,t3 as
(select user_id1,user_id2
from t2
group by user_id1,user_id2
having count(1) >=2
)
,t4 as
(select user_id1,product_id_2
from
(select t3.user_id1,t3.user_id2,a.product_id product_id_2
from t3
left join t1 a
on t3.user_id2 = a.user_id
) t
group by user_id1,product_id_2
) 
select t4.user_id1 as user_id,t4.product_id_2 as product_id
from t4
left join t1
on t4.user_id1 = t1.user_id
and t4.product_id_2 = t1.product_id
where t1.product_id is null

user_id product_id
A       4
A       5
A       6
B       1
C       3
D       2
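
The whole chain can be tried without a Hive cluster. The sketch below is an assumption-laden stand-in: it runs a condensed version of the CTE chain in SQLite through Python's `sqlite3` module rather than in Hive, and compares the left join + is null form with a `not exists` formulation, which is an alternative not used in the original solution. The names `CTES`, `LEFT_JOIN`, `NOT_EXISTS`, and `run` are illustrative.

```python
import sqlite3

ROWS = [
    ("A", "1"), ("A", "2"), ("A", "1"), ("A", "3"),
    ("B", "2"), ("B", "3"), ("B", "4"), ("B", "5"), ("B", "2"),
    ("C", "1"), ("C", "2"), ("C", "1"),
    ("D", "1"), ("D", "3"), ("D", "6"),
]

# Steps 1 to 4 from the walkthrough, condensed into one CTE chain.
CTES = """
with t1 as (
    select user_id, product_id from product group by user_id, product_id
),
t2 as (
    select a.user_id as user_id1, b.user_id as user_id2, a.product_id
    from t1 a join t1 b on a.product_id = b.product_id
    where a.user_id != b.user_id
),
t3 as (
    select user_id1, user_id2 from t2
    group by user_id1, user_id2 having count(1) >= 2
),
t4 as (
    select t3.user_id1, a.product_id as product_id_2
    from t3 left join t1 a on t3.user_id2 = a.user_id
    group by t3.user_id1, a.product_id
)
"""

# Step 5 as written in the walkthrough: anti-join via left join + is null.
LEFT_JOIN = CTES + """
select t4.user_id1, t4.product_id_2
from t4 left join t1
  on t4.user_id1 = t1.user_id and t4.product_id_2 = t1.product_id
where t1.product_id is null
order by 1, 2
"""

# Alternative formulation of the same anti-join (not from the original post).
NOT_EXISTS = CTES + """
select t4.user_id1, t4.product_id_2
from t4
where not exists (
    select 1 from t1
    where t1.user_id = t4.user_id1 and t1.product_id = t4.product_id_2
)
order by 1, 2
"""


def run(sql):
    """Load the sample rows into an in-memory SQLite table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table product (user_id text, product_id text)")
    conn.executemany("insert into product values (?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(LEFT_JOIN))
    print(run(LEFT_JOIN) == run(NOT_EXISTS))
```

Both queries return the six rows shown above. SQLite and Hive differ in many details, so this only checks the logic of the joins, not Hive-specific behavior or performance.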

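3 Summary

This problem mainly tests your understanding of joins: the result is reached through a series of join transformations. Two takeaways: to relate rows of a table to one another, a self-join is the usual tool; to compute a set difference, use left join + is null. Hive has traditionally lacked built-in functions for array intersection, difference, and union, so this solution relies on joins throughout.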
