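I. Basic Syntax
1.1 select … from … where …
Note: for a partitioned table in strict mode, the where clause must include a filter on the partition column.
Without one, the query fails as follows: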
select user_name from user_trade limit 10;
FAILED: SemanticException [Error 10056]: Queries against partitioned tables without a partition filter are disabled for safety reasons. If you know what you are doing, please set hive.strict.checks.no.partition.filter to false and make sure that hive.mapred.mode is not set to 'strict' to proceed. Note that you may get errors or incorrect results if you make a mistake while using some of the unsafe features. No partition predicate for Alias "user_trade" Table "user_trade"
With a filter on the partition column, the query runs:
select user_name from user_trade where dt = '2000-03-03';
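If it is unclear which partition values exist, or a scan across all partitions really is intended, the checks can be inspected or relaxed for the current session. The following is a minimal sketch, assuming dt is the partition column of user_trade (as the query above suggests) and a Hive version that prints the error shown earlier; the two property names are taken from that error message.

```sql
-- List the existing partitions to pick a value for the filter.
show partitions user_trade;

-- Relax the strict checks for this session only.
-- Assumption: both properties are settable at runtime on your cluster.
set hive.strict.checks.no.partition.filter=false;
set hive.mapred.mode=nonstrict;

-- The unfiltered query is now accepted, but it may scan every partition.
select user_name from user_trade limit 10;
```

Relaxing the check trades safety for convenience, so on large tables filtering on the partition column is usually the better option.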
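1.2 group by
Used together with aggregate functions:
- count(): count rows
- count(distinct …): count distinct values
- sum(): sum
- avg(): average
- max(): maximum
- min(): minimum
Example: from January through April 2019, how many people bought in each category, and what was the cumulative amount?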
select goods_category,count(distinct user_name),sum(pay_amount)
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by goods_category;
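The other aggregate functions listed above can be combined in the same query. This is a sketch that reuses the table and columns from the example; the column aliases (row_cnt, buyer_cnt, and so on) are illustrative names, not part of the original notes.

```sql
select goods_category,
       count(*)                  as row_cnt,       -- rows per category
       count(distinct user_name) as buyer_cnt,     -- distinct buyers per category
       sum(pay_amount)           as total_amount,  -- cumulative amount
       avg(pay_amount)           as avg_amount,    -- average amount per row
       max(pay_amount)           as max_amount,    -- largest single amount
       min(pay_amount)           as min_amount     -- smallest single amount
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by goods_category;
```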
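Filtering aggregates: group by … having
having filters the data after it has been grouped.
Example: users whose total payments from January through April 2019 exceed 50,000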
select user_name,sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by user_name
having sum(pay_amount)>50000;
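where filters individual rows before grouping, while having filters whole groups after aggregation, so the two can appear in the same query. A sketch on the same table; the per-row threshold of 100 is an arbitrary value chosen for illustration.

```sql
select user_name, sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-04-30'
  and pay_amount > 100            -- row-level filter, applied before grouping
group by user_name
having sum(pay_amount) > 50000;   -- group-level filter, applied after aggregation
```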
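order by: sorting
order by … asc / desc; ascending is the default.
Note: clauses are evaluated in the order from > where > group by > having > select > order by. Keep this in mind wherever a column is renamed with as.
Example: sort by payment amount
Incorrect on older versions: order by cannot take the original expression and must use the alias defined with as. Hive 3.1.1 does recognize the original expression, but older versions report an error.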
select user_name,sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-01-30'
group by user_name
order by sum(pay_amount) desc;
Correct: order by uses the alias defined with as.
select user_name,sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-01-30'
group by user_name
order by total_amount desc;
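The same evaluation order explains why an alias defined in select is not visible in where, which runs earlier. One common workaround is to wrap the aggregation in a subquery so the alias becomes an ordinary column. A sketch, where t is a hypothetical subquery alias and the threshold and limit are illustrative values:

```sql
-- where runs before select, so it cannot see total_amount in the same query block.
-- Inside the subquery the alias is defined; outside it is a regular column.
select t.user_name, t.total_amount
from (
    select user_name, sum(pay_amount) as total_amount
    from user_trade
    where dt between '2019-01-01' and '2019-01-30'
    group by user_name
) t
where t.total_amount > 50000
order by t.total_amount desc
limit 10;
```

For a filter on an aggregate, having achieves the same result without a subquery; the subquery form is mainly useful when the alias names a non-aggregate expression.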