有如下数据表dz_table:biz_code业务方,record_date清算日期,trade_type交易类型,merchant_id商户号,aggregate_type=10代表业务方对账记录,reco_status=1代表对平的记录。

对于同一天同一业务方的对账记录中可能会有多条,因为它的商户号或交易类型可能不同,对于这样的数据表,要求查找出满足连续10天对平的业务方。
首先查找出在同一天全部是对平记录的业务方,我们构建一个临时表如下,接下来的操作都是建立在这张临时表之上的:
select
biz_code,
record_date,
sum(reco_status = 1) dp, -- 对平的数量,实际没有用到
sum(reco_status != 1) bp -- 未对平的数量,等于0代表对平
from dz_table
where aggregate_type = 10 and biz_code != '' -- 只关心业务方数据
group by biz_code, record_date -- 根据业务方和日期分组
having bp = 0 -- 这样就能去掉同一业务方同一天但是有未对平记录的业务方
接下来就是从上面临时表中按业务方取得清算日期连续的记录,这里先不考虑连续10天,将所有大于一天的连续记录取出如下:
select
biz_code,
record_date
from (
select
biz_code,
record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table2
where exists(
select
my_table3.biz_code,
my_table3.record_date
from
(
select
biz_code,
record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table3
group by my_table3.biz_code, my_table3.record_date
having my_table3.record_date = my_table2.record_date + 1 and my_table3.biz_code = my_table2.biz_code -- 满足日期连续的条件
)
my_table2和my_table3就是步骤一中的临时表,在exists子句中,在临时表my_table3中对biz_code和record_date分组得出的就是同一天中所有记录都是对平的业务方,这里看似和my_table3没什么区别,其实是为了having子句服务的,将having子句写到my_table3中会造成找不到临时表my_table2,所以不得不在包一层写having子句,having子句就是保证统一业务方连续的条件。
但是上面having子句会有一个问题:那就是my_table2中连续记录的最后一条记录会丢失。所以下面就是将连续记录的最后一条补全。
select
biz_code,
max(record_date) + 1 record_date
from (
select
biz_code,
record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table2
where exists(
select
my_table3.biz_code,
my_table3.record_date
from
(
select
biz_code,
record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table3
group by my_table3.biz_code, my_table3.record_date
having my_table3.record_date = my_table2.record_date + 1 and my_table3.biz_code = my_table2.biz_code
)
group by biz_code
上面这句sql看似将连续记录的最后一条补全了,其实不然加入有一个业务方1到10连续,15号到25号连续,那么这段sql补全的是后一段的25号,所以上面这段sql是不能满足要求的。
还记得我们满足连续的条件吗:
my_table3.record_date = my_table2.record_date + 1 and my_table3.biz_code = my_table2.biz_code这么写丢弃连续记录的最后一条,那么我们在写一个丢弃连续记录的第一条,这两个条件组成一个或的形式不就找出了所有连续的记录吗!
select
biz_code,
record_date
from (
select
biz_code,
str_to_date(record_date, '%Y%m%d') record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table2
where exists(
select
my_table3.biz_code,
my_table3.record_date
from
(
select
biz_code,
record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table3
group by my_table3.biz_code, my_table3.record_date
having (my_table3.record_date = date_add(my_table2.record_date, INTERVAL 1 DAY) or
my_table3.record_date = date_add(my_table2.record_date, INTERVAL -1 DAY)) and
my_table3.biz_code = my_table2.biz_code
)
那么接下来是不是在group by biz_code下,满足max(record_date)-min(record_date)>=9就行了呢?
不行,还是因为断档问题,1到5连续,15到20连续,那么就是20减1显然是错误结果。
我们是不是可以将每段连续数据分个组呢,然后group by这个组在做max(record_date)-min(record_date)>=9不就行了吗!!!
所以我们的思路是将上述sql查询的数据依次生成一个连续的编号,相同的record_date-编号的差为一组。
select
biz_code,
record_date,
@row_num := @row_num + 1 as row_num
from
(
select
biz_code,
record_date
from (
select
biz_code,
str_to_date(record_date, '%Y%m%d') record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table2
where exists(
select
my_table3.biz_code,
my_table3.record_date
from
(
select
biz_code,
record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table3
group by my_table3.biz_code, my_table3.record_date
having (my_table3.record_date = date_add(my_table2.record_date, INTERVAL 1 DAY) or
my_table3.record_date = date_add(my_table2.record_date, INTERVAL -1 DAY)) and
my_table3.biz_code = my_table2.biz_code
)
) my_table, (select @row_num := 0) as row_num
效果如下:
...
有了编号,就可以分组了,看一下分组后连续的范围。
select
biz_code,
min(record_date) min,max(record_date) max
from
(
select
biz_code,
record_date,
@row_num := @row_num + 1 as row_num
from
(
select
biz_code,
record_date
from (
select
biz_code,
str_to_date(record_date, '%Y%m%d') record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table2
where exists(
select
my_table3.biz_code,
my_table3.record_date
from
(
select
biz_code,
record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table3
group by my_table3.biz_code, my_table3.record_date
having (my_table3.record_date = date_add(my_table2.record_date, INTERVAL 1 DAY) or
my_table3.record_date = date_add(my_table2.record_date, INTERVAL -1 DAY)) and
my_table3.biz_code = my_table2.biz_code
)
) my_table, (select @row_num := 0) as row_num
) b group by biz_code,date_add(record_date, INTERVAL -row_num DAY);
结果如下:

最后加上连续10天的条件,完整sql如下:
select
biz_code,
min(record_date) min,max(record_date) max
from
(
select
biz_code,
record_date,
@row_num := @row_num + 1 as row_num
from
(
select
biz_code,
record_date
from (
select
biz_code,
str_to_date(record_date, '%Y%m%d') record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table2
where exists(
select
my_table3.biz_code,
my_table3.record_date
from
(
select
biz_code,
record_date,
sum(reco_status = 1) dp,
sum(reco_status != 1) bp
from dz_table
where aggregate_type = 10 and biz_code != ''
group by biz_code, record_date
having bp = 0
) my_table3
group by my_table3.biz_code, my_table3.record_date
having (my_table3.record_date = date_add(my_table2.record_date, INTERVAL 1 DAY) or
my_table3.record_date = date_add(my_table2.record_date, INTERVAL -1 DAY)) and
my_table3.biz_code = my_table2.biz_code
)
) my_table, (select @row_num := 0) as row_num
) my_table4
group by biz_code,date_add(record_date, INTERVAL -row_num DAY)
having max-min>=9;
结果:

449

被折叠的 条评论
为什么被折叠?



