4. Hive Functions

菠萝橡皮刀

Last modified 2023-05-09 10:33:35

Article tags: hive hadoop data warehouse

First published 2023-05-08 16:10:24

Article link: https://blog.csdn.net/m0_58420188/article/details/130545293

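This article covers the nvl and coalesce functions for handling null values in Hive, and branching with if and case. It also explains row-to-column and column-to-row conversion in detail, using functions such as collect_set, collect_list, concat, and explode. Finally, it works through a practical example that classifies countries by weather state.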

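1. Hive functions

1.1 Null replacement

Two arguments: nvl(col, default_num): returns col if col is not null; otherwise returns default_num.

Multiple arguments: coalesce(col1, col2, col3, ...): returns the first non-null value, scanning from left to right.

For example, to compute the average salary of all employees:

select avg(nvl(salary, 0)) from emp;

avg() skips nulls automatically, so replacing null with 0 first ensures that employees with a null salary are also counted in the average.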
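
To see the difference concretely, here is a small sketch that does not need the emp table. The three inline rows (1000, 3000, and one null) are made-up values for illustration, and the query assumes a Hive version that allows select without a from clause (0.13 or later).

```sql
-- Hypothetical inline data: two salaries and one null.
with t as (
    select 1000 as salary
    union all
    select 3000
    union all
    select cast(null as int)
)
select avg(salary)         as avg_skip_null,     -- null row is ignored
       avg(nvl(salary, 0)) as avg_null_as_zero   -- null row counts as 0
from t;
```

With these rows, the first column averages only the two non-null salaries (2000), while the second divides the same total by all three rows (about 1333.33).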

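Another example:

select ename, job, sal, coalesce(job, sal, '啥也没有') from emp;

Explanation: if job is not null, output job; otherwise, if sal is not null, output sal; if both are null, output '啥也没有' ("nothing at all").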

1.2 Branching

Data preparation:

create table emp_sex(
    name string,     -- name
    dept_id string,  -- department id
    sex string       -- sex
)
row format delimited fields terminated by "\t";

load data local inpath '/opt/module/hive/datas/emp_sex.txt'
into table emp_sex;

1.2.1 The if function:

if(boolean, result1, result2): returns result1 if boolean is true; otherwise returns result2.

For example:

Count the men and women in each department of the emp_sex table.

select dept_id,
       count(if(sex = '男', name, null)) male,
       count(if(sex = '女', name, null)) female
from emp_sex
group by dept_id;

1.2.2 The case expression:

-- case col
--   when value1 then result1
--   when value2 then result2
--   else result3
--   end
-- If col equals value1, return result1; if it equals value2, return result2; otherwise return result3.

-- case
--   when boolean1 then result1
--   when boolean2 then result2
--   else result3
--   end
-- If boolean1 is true, return result1; if boolean1 is false and boolean2 is true, return result2;
-- otherwise return result3.

For example:

Count the men and women in each department of the emp_sex table.

select dept_id,
       count(case sex when '男' then name else null end) male,
       count(case when sex = '女' then name else null end) female
from emp_sex
group by dept_id;
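
The same counts can also be written by summing a 0/1 flag instead of counting non-null names. This is a sketch of an alternative, not the approach used in this post; it reuses the emp_sex table defined above.

```sql
-- Alternative: add 1 for every matching row, 0 otherwise.
select dept_id,
       sum(if(sex = '男', 1, 0))                   as male,
       sum(case when sex = '女' then 1 else 0 end) as female
from emp_sex
group by dept_id;
```

The two styles differ only when name itself can be null: count(name) skips such rows, while the sum still counts them.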

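1.3 Row/column conversion

1.3.1 Rows to columns

Aggregate function	collect_set(col)	Collects values into an array; a set removes duplicates
Aggregate function	collect_list(col)	Collects values into an array; a list keeps duplicates
String concatenation	concat(V1,V2,V3)	Concatenates strings

Example:

-- Data preparation
create table person_info(
    name string,           -- name
    constellation string,  -- constellation
    blood_type string      -- blood type
)
row format delimited fields terminated by "\t";

load data local inpath "/opt/module/hive/datas/constellation.txt"
into table person_info;

Requirement: group together the people who share both constellation and blood type.

P.S. Rows to columns: 大海 and 凤姐 used to sit in the same column (on separate rows); now they appear in the same row.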

Initial approach:

select constellation,
       blood_type,
       collect_list(name) names
from person_info
group by constellation, blood_type;

Taking it a step further:

select concat(constellation, ',', blood_type) xzxx,  -- string concatenation
       concat_ws('|', collect_list(name)) names      -- concat_ws = concat with separator
from person_info
group by constellation, blood_type;
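
The difference between collect_list and collect_set is easiest to see on data with a duplicate. The sketch below uses hypothetical inline rows (key 'a' with values 'x', 'x', 'y') rather than person_info, and assumes a Hive version that allows select without a from clause.

```sql
-- Hypothetical inline data: the value 'x' appears twice for key 'a'.
with t as (
    select 'a' as k, 'x' as v
    union all
    select 'a', 'x'
    union all
    select 'a', 'y'
)
select k,
       collect_list(v)                 as as_list,    -- keeps both 'x' entries
       collect_set(v)                  as as_set,     -- keeps 'x' only once
       concat_ws('|', collect_set(v))  as as_string   -- array joined into one string
from t
group by k;
```

Neither function guarantees the order of elements in the resulting array, so avoid relying on it.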

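1.3.2 Columns to rows

Data preparation: a movie_info table with a movie column and a category column that holds comma-separated categories (as the query below assumes).

Requirement: expand the array data in the movie categories.

explode(array or map): turns one input row into multiple rows. An array produces one column; a map produces two columns (key and value).

split(str, separator): splits str into an array of strings on the given separator.

Usage: from source_table lateral view udtf(expression) table_alias as column_alias

select m.movie,
       tbl.category_id
from movie_info m
lateral view explode(split(category, ',')) tbl as category_id;
-- The exploded result is aliased as tbl; it has a single column, named category_id

Convert the table above into category_id, movies:

P.S. Wrap the previous query in a subquery: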
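
The original query for this step did not survive, so the following is one possible sketch rather than the author's exact statement. It wraps the explode query above in a subquery and regroups by category; the '|' separator and the alias t are assumptions made for illustration.

```sql
-- Outer query: one row per category, with its movies joined into a string.
select t.category_id,
       concat_ws('|', collect_set(t.movie)) as movies
from (
    -- Inner query: one row per (movie, category) pair, as in the previous step.
    select m.movie,
           tbl.category_id
    from movie_info m
    lateral view explode(split(m.category, ',')) tbl as category_id
) t
group by t.category_id;
```

collect_set is used here so that a movie listed twice under the same category appears only once; collect_list would keep the duplicate.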

Exercise: given the test table, query name, child, and age.

select name, child, age
from test
lateral view explode(children) tbl as child, age;
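
Because explode is given two output aliases here, children must be a map column: each key/value pair becomes one row. The sketch below shows that behaviour on a single hypothetical inline row (the name 'Alice' and the map contents are invented for illustration), assuming a Hive version that allows select without a from clause.

```sql
-- Hypothetical one-row input: a parent with a map of child name -> age.
select t.name,
       tbl.child,   -- map key
       tbl.age      -- map value
from (
    select 'Alice' as name,
           map('Tom', 5, 'Amy', 7) as children
) t
lateral view explode(t.children) tbl as child, age;
```

This produces one output row per map entry, each repeating the parent's name.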

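2. Exercise

Dataset: a countries table (country_id, country_name) and a weather table (country_id, weather_state, day), as used in the queries below.

Return the weather type of each country for November:

The weather type is determined by the average weather_state: avg(weather_state) <= 15 is 寒冷 (cold), greater than 15 and below 25 is 温暖 (warm), and >= 25 is 炎热 (hot).

Solve it with a subquery:

First write the inner query:

select country_id, avg(weather_state) avg_w
from weather
where substring(day, 1, 7) = '2019-11'
group by country_id;

Then write the outer query:

select c.country_name,
       case when avg_w <= 15 then '寒冷'  -- cold
            when avg_w < 25 then '温暖'   -- warm
            else '炎热'                   -- hot
       end
from countries c
join (
    select country_id, avg(weather_state) avg_w
    from weather
    where substring(day, 1, 7) = '2019-11'
    group by country_id
) t1
on c.country_id = t1.country_id;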