1. hive查询语法
提示:在做小数据量查询测试时,可以让hive将mrjob提交给本地运行器运行,可以在hive会话中设置如下参数:
hive> set hive.exec.mode.local.auto=true;
1.1. 基本查询示例
select * from t_access;
select count(*) from t_access;
select max(ip) from t_access;
1.2. 条件查询
select * from t_access where access_time<'2017-08-06 15:30:20'
select * from t_access where access_time<'2017-08-06 16:30:20' and ip>'192.168.33.3';
1.3. join关联查询示例
假如有a.txt文件
a,1 b,2 c,3 d,4 |
假如有b.txt文件
a,xx b,yy d,zz e,pp |
进行各种join查询:
1、inner join(join)
select a.name as aname, a.numb as anumb, b.name as bname, b.nick as bnick from t_a a join t_b b on a.name=b.name |
结果:
+--------+--------+--------+--------+--+ | aname | anumb | bname | bnick | +--------+--------+--------+--------+--+ | a | 1 | a | xx | | b | 2 | b | yy | | d | 4 | d | zz | +--------+--------+--------+--------+--+ |
2、left outerjoin(left join)
select a.name as aname, a.numb as anumb, b.name as bname, b.nick as bnick from t_a a left outer join t_b b on a.name=b.name |
3、right outerjoin(right join)
select a.name as aname, a.numb as anumb, b.name as bname, b.nick as bnick from t_a a right outer join t_b b on a.name=b.name |
4、full outerjoin(full join)
select a.name as aname, a.numb as anumb, b.name as bname, b.nick as bnick from t_a a full join t_b b on a.name=b.name; |
1.4. left semi join
hive中不支持exist/IN子查询,可以用leftsemi join来实现同样的效果:
select a.name as aname, a.numb as anumb from t_a a left semi join t_b b on a.name=b.name; |
注意: left semi join的 select子句中,不能有右表的字段、
1.5. group by分组聚合
select dt,count(*),max(ip) as cnt fromt_access group by dt;
select dt,count(*),max(ip) as cnt fromt_access group by dt having dt>'20170804';
select
dt,count(*),max(ip) as cnt
from t_access
where url='http://www.edu360.cn/job'
group by dt having dt>'20170804';
注意: 一旦有group by子句,那么,在select子句中就不能有 (分组字段,聚合函数) 以外的字段;
## 为什么where必须写在group by的前面,为什么groupby后面的条件只能用having
因为,where是用于在真正执行查询逻辑之前过滤数据用的
having是对group by聚合之后的结果进行再过滤;
上述语句的执行逻辑:
1、where过滤不满足条件的数据
2、用聚合函数和groupby进行数据运算聚合,得到聚合结果
3、用having条件过滤掉聚合结果中不满足条件的数据
1.6. 子查询
select id,name,father
from
(select id,name,family_members['brother']as father from t_person) tmp
where father is not null;