前言:Hive ql自己设计总结
1,遇到复杂的查询情况,就分步处理。将一个复杂的逻辑,分成几个简单子步骤处理。
2,但能合在一起的,尽量和在一起的。比如同级别的多个concat函数合并一个select
也就是说,字段之间是并行的同级别处理,则放在一个hive ql;而字段间有前后处理逻辑依赖(判断、补值、计算)则可分步执行,提前将每个字段分别处理好,然后进行相应的分步简单逻辑处理。
一、 场景:日志中region数据处理(国家,省份,城市)
select city_id,province_id,country_id
from wizad_mdm_cleaned_hdfs
where city_id = '' or country_id = '' or province_id = ''
group by city_id,province_id,country_id
二 、发现日志中有空数据:
38 1
73 1
75 1
64 81
76 1
(全空)
77
三、设定过滤逻辑
if country_id=''
if province_id != '' then
if city_id = '' thenCONCAT('region_','1','_',province_id)
elseCONCAT('region_','1','_',province_id,'_',city_id)
else
if city_id != '' thenCONCAT('region_','1','_',parent_region_id,'_',city_id)
else
if province_id=''
if city_id !='' thenCONCAT('region_',country_id,'_',parent_region_id,'_',city_id)
四、hive ql实现
SET mapred.queue.names=queue3;
SET mapred.reduce.tasks=14;
DROP TABLE IF EXISTS test_lmj_mdm_tmp1;
CREATE TABLE test_lmj_mdm_tmp1 AS
SELECT
guid,
(CASE country_id
WHEN '' THEN (CASE WHEN province_id='' THENIF(city_id = '','',CONCAT('region_','1','_',parent_region_id,'_',city_id)) ELSEIF(city_id='',CONCAT('region_','1','_',province_id),CONCAT('region_','1','_',province_id,'_',city_id))END)
ELSE (CASE when province_id='' THENIF(city_id='',CONCAT('region_',country_id),CONCAT('region_',country_id,'_',parent_region_id,'_',city_id))ELSE IF(city_id = '', CONCAT('region_',country_id,'_',province_id),CONCAT('region_',country_id,