例子1:
select * from business;
jack 2017-01-01 10
jack 2017-01-02 10
tom 2017-01-01 10
tom 2017-01-02 10
tom 2017-01-03 10
andy 2017-01-01 10
andy 2017-01-02 10
andy 2017-01-03 10
lucy 2017-01-01 10
lucy 2017-01-02 10
lucy 2017-01-03 10
jack 2017-01-03 10
jack 2017-02-01 10
jack 2017-02-02 10
tom 2017-02-01 10
tom 2017-02-02 10
tom 2017-02-03 10
andy 2017-02-01 10
求二月份购买的顾客以及总人数:
方法1: 注意无条件join的方法1=1
select t1.name, t2.count
from
(select name from business where substr(orderdate,1,7)="2017-02" group by name) t1
left join
(select count(distinct name) count from business where substr(orderdate,1,7)="2017-02") t2
on 1 = 1;
方法二:
select name, count(*) over() from business where substr(orderdate,1,7)="2017-02" group by name;
最终输出:
OK
andy 3
jack 3
tom 3
Time taken: 19.661 seconds, Fetched: 3 row(s)
例子2
hive> select * from person_info1;
悟空 A 男
大海 A 男
宋宋 B 男
凤姐 A 女
婷姐 B 女
婷婷 B 女
hive> desc person_info1;
OK
name string
type string
sex string
需求:需要将type 的男女各个数量区分开来如下。
方法1:
select type,
sum(case when sex = '男' then 1 else 0 end) male_count,
sum(case when sex = '女' then 1 else 0 end) female_count
from person_info1
group by type;
方法2:
select type,
sum(case sex when '男' then 1 else 0 end) male_count,
sum(case sex when '女' then 1 else 0 end) female_count
from person_info1
group by type;
type 男数量 女数量
A 2 1
B 1 2
例子3
窗口函数,如果over里面一旦有sort by 或者order by那么前面的count(*) 就是累加的和,注意时累加的和!!!
hive> desc test33;
OK
dt string
order_id string
user_id string
amount decimal(10,2)
Time taken: 0.063 seconds, Fetched: 4 row(s)
===================================================================
hive> select * from test33;
OK
2017-01-01 10029028 1000003251 33.57
2017-01-01 10029029 1000003251 33.57
2017-01-01 100290288 1000003252 33.57
2017-02-02 10029088 1000003251 33.57
2017-02-02 100290281 1000003251 33.57
2017-02-02 100290282 1000003253 33.57
2017-11-02 10290282 100003253 234
2018-11-02 10290284 100003243 234
2017-01-00 10029028 1000003251 33.57
2017-02-03 100290281 1000003251 33.57
Time taken: 0.082 seconds, Fetched: 10 row(s)
==========================================================
select *, count(1) over(distribute by user_id sort by dt asc) from test33;
本质上是省略了这个rows between unbounded preceding and current row
即:
select *, count(1) over(distribute by user_id sort by dt asc
rows between unbounded preceding and current row
) from test33;
2017-01-00 10029028 1000003251 33.57 1
2017-01-01 10029028 1000003251 33.57 3
2017-01-01 10029029 1000003251 33.57 3
2017-02-02 10029088 1000003251 33.57 5
2017-02-02 100290281 1000003251 33.57 5
2017-02-03 100290281 1000003251 33.57 6
2017-01-01 100290288 1000003252 33.57 1
2017-02-02 100290282 1000003253 33.57 1
2018-11-02 10290284 100003243 234 1
2017-11-02 10290282 100003253 234 1
Time taken: 6.653 seconds, Fetched: 10 row(s)
============================================================
select *, count(1) over(distribute by user_id sort by user_id asc) from test33;
2017-01-00 10029028 1000003251 33.57 6
2017-01-01 10029028 1000003251 33.57 6
2017-01-01 10029029 1000003251 33.57 6
2017-02-02 10029088 1000003251 33.57 6
2017-02-02 100290281 1000003251 33.57 6
2017-02-03 100290281 1000003251 33.57 6
2017-01-01 100290288 1000003252 33.57 1
2017-02-02 100290282 1000003253 33.57 1
2018-11-02 10290284 100003243 234 1
2017-11-02 10290282 100003253 234 1
Time taken: 24.297 seconds, Fetched: 10 row(s)