思路
整个sql考一个sum if 一个开窗函数
if(a=b,1,2) 三目运算符,三元表达式 如果a=b为true,返回1,否则返回false.
如果满足条件计算,不满足条件不计数
第一步 查询出各个用户 一月份,二月份和三月份并且三月份订单金额大于10,个数
select uid,
sum(if(date_format(otime,'yyyy-MM')='2018-01' ,1,0)) month1count,
sum(if(date_format(otime,'yyyy-MM')='2018-02' ,1,0)) month2count,
sum(if(date_format(otime,'yyyy-MM')='2018-03' and oamount>10 ,1,0)) month3gt10count
from t_order
group by uid
结果一
uid month1count month2count month3gt10count
1 1 1 2
6 0 2 0
3 1 3 0
5 0 1 0
4 2 0 2
2 3 0 1
第二步
查询出那个用户1月份下过单,二月份没下过单,月份下单金额大于10的用户
在第一步的基础上加了个having
select uid,
sum(if(date_format(otime,'yyyy-MM')='2018-01' ,1,0)) month1count,
sum(if(date_format(otime,'yyyy-MM')='2018-02' ,1,0)) month2count,
sum(if(date_format(otime,'yyyy-MM')='2018-03' and oamount>10 ,1,0)) month3gt10count
from t_order
group by uid
having month1count>0 and month2count=0
第二步结果
uid month1count month2count month3gt10count
4 2 0 2
2 3 0 1
第三步 在一个范围内进行排序,使用开窗函数
需要提前开窗 取首单和末单,我关心的是时间顺序
开窗函数做分组排序的目的是,没一行数据在小组的排名取出来,如果有排名的话,一过滤就出来了.
按年和月进行分组,按照时间进行排序
嵌套了个子查询 先查询出子查询的来
sql语句
把相同日期的时间放到一个分区,得到monthcout按月份进行的统计
按照日期进行分组,并且排序的
select o.* , row_number()over(partition by date_format(otime,'yyyy-MM') order by otime) rk ,
count(*)over(partition by date_format(otime,'yyyy-MM') ) monthcount
from t_order o
结果
oid uid otime oamount rk monthcount
1001 1 2018-03-01 120 1 7
1003 2 2018-03-02 11 2 7
1002 2 2018-03-03 5 3 7
1004 3 2018-03-03 1 4 7
1005 4 2018-03-04 20 5 7
1007 1 2018-03-04 50 6 7
1006 4 2018-03-09 30 7 7
1001 5 2018-02-01 110 1 7
1002 3 2018-02-01 110 2 7
1004 3 2018-02-03 20 3 7
1003 1 2018-02-03 100 4 7
1005 3 2018-02-04 30 5 7
1006 6 2018-02-04 100 6 7
1007 6 2018-02-04 130 7 7
1003 2 2018-01-01 100 1 7
1004 2 2018-01-02 20 2 7
1005 2 2018-01-02 100 3 7
1006 4 2018-01-02 30 4 7
1007 1 2018-01-03 130 5 7
1008 4 2018-01-03 5 6 7
1009 3 2018-01-03 5 7 7
下一步
select o.* , row_number()over(partition by date_format(otime,'yyyy-MM'),uid order by otime) rk ,
count(*)over(partition by date_format(otime,'yyyy-MM'),uid ) monthcount
from t_order o
oid uid otime oamount rk monthcount
1005 4 2018-03-04 20 1 2
1006 4 2018-03-09 30 2 2
1007 1 2018-01-03 130 1 1
1003 2 2018-01-01 100 1 3
1004 2 2018-01-02 20 2 3
1005 2 2018-01-02 100 3 3
1001 5 2018-02-01 110 1 1
1003 1 2018-02-03 100 1 1
1001 1 2018-03-01 120 1 2
1007 1 2018-03-04 50 2 2
1006 6 2018-02-04 100 1 2
1007 6 2018-02-04 130 2 2
1004 3 2018-03-03 1 1 1
1002 3 2018-02-01 110 1 3
1004 3 2018-02-03 20 2 3
1005 3 2018-02-04 30 3 3
1009 3 2018-01-03 5 1 1
1006 4 2018-01-02 30 1 2
1008 4 2018-01-03 5 2 2
1003 2 2018-03-02 11 1 2
1002 2 2018-03-03 5 2 2
最后的sql语句 使用group by 前面必须要有这个字段
select uid,
sum(if(date_format(otime,'yyyy-MM')='2018-01' ,1,0)) month1count,
sum(if(date_format(otime,'yyyy-MM')='2018-02' ,1,0)) month2count,
sum(if(date_format(otime,'yyyy-MM')='2018-03' and oamount>10 ,1,0)) month3gt10count,
sum(if( oo.rk=1 and date_format(otime,'yyyy-MM')='2018-03' ,oamount,0)) firstamount,
sum(if( oo.rk=monthcount and date_format(otime,'yyyy-MM')='2018-03' ,oamount,0)) lastamount
from ( select o.* , row_number()over(partition by date_format(otime,'yyyy-MM'),uid order by otime) rk ,
count(*)over(partition by date_format(otime,'yyyy-MM'),uid ) monthcount
from t_order o) oo
group by uid
having month1count>0 and month2count=0
第三步结果 查询出一月份下过单,二月份没有下过单,三月份订单金额超过10的订单数,三月份首次下单,三月份最后一次下单的金额.
uid month1count month2count month3gt10count firstamount lastamount
4 2 0 2 20 30
2 3 0 1 11 5
搭建环境做数据
[pp@hadoop101 spark]$ bin/spark-sql
创建表,并且把数据导入进去
create table t_order
(oid int ,
uid int ,
otime date,
oamount int
)
insert into table t_order values(1003,2,'2018-01-01',100);
insert into table t_order values(1004,2,'2018-01-02',20);
insert into table t_order values(1005,2,'2018-01-02',100);
insert into table t_order values(1006,4,'2018-01-02',30);
insert into table t_order values(1007,1,'2018-01-03',130);
insert into table t_order values(1008,4,'2018-01-03',5);
insert into table t_order values(1009,3,'2018-01-03',5);
insert into table t_order values(1001,5,'2018-02-01',110);
insert into table t_order values(1002,3,'2018-02-01',110);
insert into table t_order values(1003,1,'2018-02-03',100);
insert into table t_order values(1004,3,'2018-02-03',20);
insert into table t_order values(1005,3,'2018-02-04',30);
insert into table t_order values(1006,6,'2018-02-04',100);
insert into table t_order values(1007,6,'2018-02-04',130);
insert into table t_order values(1001,1,'2018-03-01',120);
insert into table t_order values(1002,2,'2018-03-03',5);
insert into table t_order values(1003,2,'2018-03-02',11);
insert into table t_order values(1004,3,'2018-03-03',1);
insert into table t_order values(1005,4,'2018-03-04',20);
insert into table t_order values(1006,4,'2018-03-09',30);
insert into table t_order values(1007,1,'2018-03-04',50);
查询到的数据
oid uid otime oamount
1003 2 2018-01-01 100
1004 2 2018-01-02 20
1004 3 2018-02-03 20
1005 3 2018-02-04 30
1006 6 2018-02-04 100
1007 6 2018-02-04 130
1001 1 2018-03-01 120
1002 2 2018-03-03 5
1003 2 2018-03-02 11
1004 3 2018-03-03 1
1005 4 2018-03-04 20
1006 4 2018-03-09 30
1005 2 2018-01-02 100
1007 1 2018-03-04 50
1006 4 2018-01-02 30
1007 1 2018-01-03 130
1008 4 2018-01-03 5
1009 3 2018-01-03 5
1001 5 2018-02-01 110
1002 3 2018-02-01 110
1003 1 2018-02-03 100