Hive SQL整理

collect_set

  • 作用:把一个字段中的值组合到一个集合中,搭配where(筛选)、group by使用;
  • 用途:可将非group by字段进行转换,使其可以查询;
  • 示例:
-- id:用户;time:登陆日期
select id,collect_set(time) as t from t_action_login where time<='20150906' group by id

该查询结果如下:

  123@163.com                                                                                      | ["20150620","20150619"]                                                                                    |
| abc@163.com                                                                                      | ["20150816"]                                                                                               |
| cde@qq.com                                                                                       | ["20150606","20150608","20150607","20150609","20150613","20150610","20150616","20150615"]                  |
| 789@sohu.com                                                                                     | ["20150827","20150623","20150627","20150820","20150823","20150612","20150717"]                             |
| 987@163.com                                                                                      | ["20150701","20150829","20150626","20150625","20150726","20150722","20150629","20150824","20150716","20150 |
| ddsf@163.com                                                                                     | ["20150804","20150803","20150801","20150809","20150807","20150806","20150905","20150904","20150730","20150 |
| 182@163.com                                                                                      |["20150803","20150801","20150809","20150808","20150805","20150806","20150906","20150904","20150730","20150 |
| 22225@163.com                                                                                    | ["20150604","20150609","20150622","20150827","20150625","20150620","20150613","20150610","20150614","20150 |
| 18697@qq.com                                                                                     | ["20150902"]                                                                                               |
| 1905@qq.com                                                                                      | ["20150709"]                                       


该结果显示每个用户在2015年9月6日之前的全部登陆时间

  • 场景应用:在hive中求出一个数据表中在2015年9月6日首次登陆的人数
    对应SQL语句:
select count(a.id) from (select id,collect_set(time) as t from t_action_login where time<='20150906' group by id) as a where size(a.t)=1 and a.t[0]='20150906'

注意:size(a.t)=1 and a.t[0]='20150906'

参考:https://blog.csdn.net/liyantianmin/article/details/48262109

group by

  • hive不允许直接查询访问非group by的字段,错误示例:

select name,sex from people group by sex;
该语句会报错,因为name是非group by字段,解决方法如下:

  1. 只查询group by字段:
    select sex from people group by sex;
  2. 对于非group by字段,只通过聚合函数查询:
    select sex,count(name) from people group by sex;
  3. 对于非group by字段,通过collect_set等函数转换后查询:
    select sex,collect_set(name)[0] as name from people group by sex;
    注:可以用原字段名称;

grouping sets

  • 用途:可以按照多个标准进行分组,增强group by的功能;
  • 示例:查询每个部门每类工作平均工资、每个部门的平均工资及整个企业中每个雇员的平均工资
select department_id,job_id,avg(salary)
from hr.employees
group by grouping sets ((department_id,job_id),department_id,()); 
-- ()表示对整体进行一次聚类
-- 也可以写成:
select department_id,job_id,avg(salary)
from hr.employees
group by department_id,grouping sets ((department_id,job_id),());

limit

  • 用途:限制查询/返回的条数;
  • 用法:
    • 指定初始位置: limit 初始位置,条数
    • 不指定初始位置: limit 条数
    • 由offset指定初始位置: limit 条数 offset 初始位置
  • 说明:
    • 初始位置:下标(从0开始),表示从该位置开始返回;
    • 条数:表示返回的记录数

参考:https://blog.csdn.net/qq_33801641/article/details/107196036

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值