分组查询
-- 有如下数据
-- 求8月4号以后,每天每个页面的总访问次数,及访问者中ip地址中最大的,且,只查询出总访问次数>2 的记录
/*
192.168.33.3,http://www.edu360.cn/stu,2017-08-04 15:30:20
192.168.33.3,http://www.edu360.cn/teach,2017-08-04 15:35:20
192.168.33.4,http://www.edu360.cn/stu,2017-08-04 15:30:20
192.168.33.4,http://www.edu360.cn/job,2017-08-04 16:30:20
192.168.33.5,http://www.edu360.cn/job,2017-08-04 15:40:20
192.168.33.3,http://www.edu360.cn/stu,2017-08-05 15:30:20
192.168.44.3,http://www.edu360.cn/teach,2017-08-05 15:35:20
192.168.33.44,http://www.edu360.cn/stu,2017-08-05 15:30:20
192.168.33.46,http://www.edu360.cn/job,2017-08-05 16:30:20
192.168.33.55,http://www.edu360.cn/job,2017-08-05 15:40:20
192.168.133.3,http://www.edu360.cn/register,2017-08-06 15:30:20
192.168.111.3,http://www.edu360.cn/register,2017-08-06 15:35:20
192.168.34.44,http://www.edu360.cn/pay,2017-08-06 15:30:20
192.168.33.46,http://www.edu360.cn/excersize,2017-08-06 16:30:20
192.168.33.55,http://www.edu360.cn/job,2017-08-06 15:40:20
192.168.33.46,http://www.edu360.cn/excersize,2017-08-06 16:30:20
192.168.33.25,http://www.edu360.cn/job,2017-08-06 15:40:20
192.168.33.36,http://www.edu360.cn/excersize,2017-08-06 16:30:20
192.168.33.55,http://www.edu360.cn/job,2017-08-06 15:40:20
*/
-- 方式1:用group by 进行查询
select dt,url,count(1) as cnts,max(ip)
from t_access
where dt>'2017-08-04'
group by dt,url having cnts>2;
结果:
+----------------+---------------------------------+-----------------------+--------------+--+
| t_access.ip | t_access.url | t_access.access_time | t_access.dt |
+----------------+---------------------------------+-----------------------+--------------+--+
| 192.168.33.46 | http://www.edu360.cn/job | 2017-08-05 16:30:20 | 2017-08-05 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-05 15:40:20 | 2017-08-05 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 2017-08-06 |
| 192.168.33.25 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 2017-08-06 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 2017-08-06 |
+----------------+---------------------------------+-----------------------+--------------+
注意: 一旦有group by子句,那么,在select子句中就不能有 (分组字段,聚合函数) 以外的字段
为什么where必须写在group by的前面,为什么group by后面的条件只能用having
因为,where是用于在真正执行查询逻辑之前过滤数据用的
having是对group by聚合之后的结果进行再过滤;
上述语句的执行逻辑:
1、where过滤不满足条件的数据
2、用聚合函数和group by进行数据运算聚合,得到聚合结果
3、用having条件过滤掉聚合结果中不满足条件的数据
子查询
子查询就是在HiveSql语句中在嵌套一个查询语句:
select id,name,father
from
(select id,name,family_members['brother'] as father from t_person) tmp
where father is not null;```