聚合函数是hive内置函数,聚合函数对一组数据进行计算,并返回单个值。
常用聚合函数
count 计数
- count(*) --统计检索出的行的个数,包括NULL值的行
- count(col) --返回指定列非空值个数
- count(distinct col[, col_.]) --返回指定列不同的非空值的个数
sum 求和
- sum(col) --col累计加和
- sum(distinct col) --不同col值累计加和
max 最大值
- max(col) --col中最大值
min 最小值
- min(col) --col中最小值
avg 均值
- avg(col) --col的平均值
- avg(distinct col) --不同col值相加的平均值
collect_list/collect_set
其返回值类型是 array
- collect_list(col) --对col列做归并计算,不去重
- collect_set(col) --对col列做归并计算,去重
select name,collect_list(hobby) as hobby,collect_set(hobby) as distinct_hobby
from
(
select '李一' as name,'乒乓球' as hobby union all
select '李一' as name,'跑步' as hobby union all
select '赵小小' as name,'铅球' as hobby union all
select '赵小小' as name,'跑步' as hobby union all
select '赵小小' as name,'跑步' as hobby
) t
group by name;
结果:
+-------+-------------------+-----------------+--+
| name | hobby | distinct_hobby |
+-------+-------------------+-----------------+--+
| 李一 | ["乒乓球","跑步"] | ["乒乓球","跑步"] |
| 赵小小 | ["铅球","跑步","跑步"] | ["铅球","跑步"] |
+-------+-------------------+-----------------+--+