hive like在join中的应用一(基础)

我们先看看生产上的两张表:

1.用户cpu信息表:

create table test.test_user as
select '10001' as uid,'AMD A8-5600K APU with Radeon(tm) HD Graphics   _AMD64 Family 21 Model 16 Stepping 1_AuthenticAMD' as cpu
union all
select '10002' as uid,'Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz_Intel64 Family 6 Model 142 Stepping 9_GenuineIntel' as cpu
union all
select '10003' as uid,'Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz_Intel64 Family 6 Model 58 Stepping 9_GenuineIntel' as cpu
union all
select '10004' as uid,'AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G _AMD64 Family 21 Model 48 Stepping 1_AuthenticAMD' as cpu;

2.cpu评分配置表:

create table test.test_cpu_config as
select 'A8-5600K' as cpu,2 as cpu_grade
union all
select 'i3-7100' as cpu,2 as cpu_grade
union all
select 'E3-1230' as cpu,3 as cpu_grade
union all
select 'A10-7800' as cpu,2 as cpu_grade;

注:这两张表都是从生产环境中截取的部分有代表性的数据

问题:

现有一张用户cpu信息表,一张cpu评分配置表,需要通过cpu评分表将用户的cpu打上评分的标签

分析:

用户cpu信息:

AMD A8-5600K APU with Radeon(tm) HD Graphics   _AMD64 Family 21 Model 16 Stepping 1_AuthenticAMD

配置cpu信息:

A8-5600K
  1. 用户表cpu信息比较复杂且没有规律,里面包含部分配置表cpu信息
  2. 若通过截取cpu信息得到配置表相似的cpu信息比较困难
  3. 可以通过模糊匹配方式关联去解决,但模糊匹配方式关联是否可行存在疑问
测试代码:

关联条件:

on t1.cpu like concat('%',t2.cpu,'%')
spark-sql> select 
         >   t1.uid,
         >   t1.cpu,
         >   t2.cpu,
         >   t2.cpu_grade
         > from 
         > (select 
         >   uid,
         >   cpu
         > from test.test_user) t1
         > 
         > left join
         > (select 
         >   cpu,
         >   cpu_grade
         > from test.test_cpu_config) t2
         > on t1.cpu like concat('%',t2.cpu,'%');
10001   AMD A8-5600K APU with Radeon(tm) HD Graphics   _AMD64 Family 21 Model 16 Stepping 1_AuthenticAMD	A8-5600K	2
10002	Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz_Intel64 Family 6 Model 142 Stepping 9_GenuineIntel			i3-7100		2
10003	Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz_Intel64 Family 6 Model 58 Stepping 9_GenuineIntel			E3-1230		3
10004	AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G _AMD64 Family 21 Model 48 Stepping 1_AuthenticAMD	A10-7800	2
Time taken: 1.795 seconds, Fetched 4 row(s)

注:查看结果,用户cpu信息全部打上了评分的标签,方案可行

总结: 上述问题在实际操作中可能存在很多的坑
例如:A8-5600K可以模糊匹配到A8-5600A8-5600K,所以我们在关联的时候可以根据实际数据进行调整,例如:数据...AMD A8-5600K APU...A8-5600K两边有空格,我们模糊匹配条件可以改成这样on t1.cpu like concat('%',t2.cpu,'','%')

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值