声明:
数据库为impala
需求:
1、3个数据源有一个相同的字段“batch_no”,根据此相同字段将3个数据源中的数据放到一个报表展示
2、其中需展示的”所属账期“字段数据源中没有,根据数据源中的账期开始日和账期结束日判断,开始日取最早日期,结束日取最晚日期
所属账期:账期最早2020-05-20 00:00:00,账期最晚2020-07-10 00:00:00
所属账期就是202005/202006/202007或者202005~202007
3、其中某系字段例如“发票号”“凭证号”“产业”“渠道名称”等包含多个内容时,在同一行显示。并且,如果发票号或者凭证号是连续的使用“~”,如果不是联系的使用“/”隔开,重复的数据只显示一个(也就是去重)。
例如:发票号:1,2,3,4,2;是连续的且2是重复的,则结果为:1~4
发票号:1,3,4,8,3;非连续的且3是重复的,则结果为:1/3/4/8,而不是1/3/4/8/3
解决步骤:
第一步:查看源表数据,如下
客户希望展现的结果:
第二步:impala的group_concat(column)方法,将某一个列的数据放到一起,需要配合group by使用。(同组的数据,通过group_concat(column)指定要放到一起的数据是哪列);
首先我们来实现多行合并到一行
select b.batch_no,
GROUP_CONCAT(b.invoice_no, '/') invoice_no,
GROUP_CONCAT(b.voucher_no, '/') voucher_no
from receive_data.CVP_V_GET_BATCH_NBMS a
left join receive_data.cvp_v_get_invoice_nbms b
on a.batch_no=b.batch_no
where a.deleted_flag='0'
and b.deleted_flag='0'
and a.batch_no='INBMS202005290100'
group by b.batch_no;
结果如下:
细心的你会发现,invoice_no是连续的啊,要使用“~”,不能使用“/”拼接啊
第三步:把连续的使用“~”,不连续的使用“/”连接
select b.batch_no,
case when max(invoice_no) = min(invoice_no) then max(invoice_no)
when cast(max(invoice_no) as int)+1 - cast(min(invoice_no) as int)=count(invoice_no) then concat(min(invoice_no),'~',max(invoice_no))
else GROUP_CONCAT(b.invoice_no, '/')
end invoice_no,
case when max(voucher_no) = min(voucher_no) then max(voucher_no)
when cast(max(voucher_no) as int)+1 - cast(min(voucher_no) as int)=count(voucher_no) then concat(min(voucher_no),'~',max(voucher_no))
else GROUP_CONCAT(b.voucher_no, '/')
end voucher_no
from receive_data.CVP_V_GET_BATCH_NBMS a
left join receive_data.cvp_v_get_invoice_nbms b
on a.batch_no=b.batch_no
where a.deleted_flag='0'
and b.deleted_flag='0'
and a.batch_no='INBMS202005290100'
group by b.batch_no;
第四步:现在离我们的目标又进了一步,但是需求还有一点就是要去重。你会想到使用distinct啊。我们来看一下结果:
select b.batch_no,
case when max(invoice_no) = min(invoice_no) then max(invoice_no)
when cast(max(invoice_no) as int)+1 - cast(min(invoice_no) as int)=count(distinct invoice_no) then concat(min(invoice_no),'~',max(invoice_no))
else GROUP_CONCAT(distinct b.invoice_no, '/')
end invoice_no,
case when max(voucher_no) = min(voucher_no) then max(voucher_no)
when cast(max(voucher_no) as int)+1 - cast(min(voucher_no) as int)=count(distinct voucher_no) then concat(min(voucher_no),'~',max(voucher_no))
else GROUP_CONCAT(distinct b.voucher_no, '/')
end voucher_no
from receive_data.CVP_V_GET_BATCH_NBMS a
left join receive_data.cvp_v_get_invoice_nbms b
on a.batch_no=b.batch_no
where a.deleted_flag='0'
and b.deleted_flag='0'
and a.batch_no='INBMS202005290100'
group by b.batch_no;
唉呀,报错了!
这里我把报错信息贴出来:
[Code: 500051, SQL State: HY000] [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT invoice_no); deviating function: count(DISTINCT voucher_no)
), Query: select b.batch_no,
case when max(invoice_no) = min(invoice_no) then max(invoice_no)
when cast(max(invoice_no) as int)+1 - cast(min(invoice_no) as int)=count(distinct invoice_no) then concat(min(invoice_no),'~',max(invoice_no))
else GROUP_CONCAT(distinct b.invoice_no, '/')
end invoice_no,
case when max(voucher_no) = min(voucher_no) then max(voucher_no)
when cast(max(voucher_no) as int)+1 - cast(min(voucher_no) as int)=count(distinct voucher_no) then concat(min(voucher_no),'~',max(voucher_no))
else GROUP_CONCAT(distinct b.voucher_no, '/')
end voucher_no
from receive_data.CVP_V_GET_BATCH_NBMS a
left join receive_data.cvp_v_get_invoice_nbms b
on a.batch_no=b.batch_no
where a.deleted_flag='0'
and b.deleted_flag='0'
and a.batch_no='INBMS202005290100'
group by b.batch_no.
我们只需要看这一部分:
[Code: 500051, SQL State: HY000] [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT invoice_no); deviating function: count(DISTINCT voucher_no)
这句话的意思:所有不同的聚合函数都需要有相同的参数集
原因:DISTINCT只允许在一个列组合上使用。
第五步:解决多个GROUP_CONCAT在一个select使用。我们可以使用join啊。
select
i.batch_no,
invoice_no,
voucher_no
from (
select b.batch_no,
max(invoice_no) invoice_no_max, --查询出最大的invoice_no
min(invoice_no) invoice_no_min, --查询出最小的invoice_no
count(distinct invoice_no) num, --计算同一个batch_no下invoice_no总数(去重)
case when max(invoice_no) = min(invoice_no) then max(invoice_no) --判断是否只有一个invoice_no
when cast(max(invoice_no) as int)+1 - cast(min(invoice_no) as int)=count(distinct invoice_no) then concat(min(invoice_no),'~',max(invoice_no)) --判断是否是连续的,如果是连续的使用“~”连接invoice_no
else GROUP_CONCAT(distinct b.invoice_no, '/') --如果不连续使用“/”拼接invoice_no
end invoice_no
from receive_data.CVP_V_GET_BATCH_NBMS a
left join receive_data.cvp_v_get_invoice_nbms b
on a.batch_no=b.batch_no
where a.deleted_flag='0'
and b.deleted_flag='0'
and a.batch_no='INBMS202005290100'
group by b.batch_no) i
join
(select b.batch_no,
max(voucher_no) voucher_no_max,
min(voucher_no) voucher_no_min,
count(distinct voucher_no) num,
case when max(voucher_no) = min(voucher_no) then max(voucher_no)
when cast(max(voucher_no) as int)+1 - cast(min(voucher_no) as int)=count(distinct voucher_no) then concat(min(voucher_no),'~',max(voucher_no))
else GROUP_CONCAT(distinct b.voucher_no, '/')
end voucher_no
from receive_data.CVP_V_GET_BATCH_NBMS a
left join receive_data.cvp_v_get_invoice_nbms b
on a.batch_no=b.batch_no
where a.deleted_flag='0'
and b.deleted_flag='0'
and a.batch_no='INBMS202005290100'
group by b.batch_no ) v
on i.batch_no = v.batch_no;
结果如下:
细心的你是不是又发现了,我们还有”所属账期“没处理呢。接下来就是处理“所属账期”。
我们来看一下所属账期的问题。
查看一下源表:
select c.batch_no,
sheet_start_date, --开始时间
sheet_end_date --结束时间
from receive_data.CVP_V_GET_BATCH_NBMS a
left join receive_data.cvp_v_get_sheet_nbms c on a.batch_no=c.batch_no
where a.deleted_flag='0'
and c.deleted_flag='0'
and a.batch_no='INBMS202005290100';
结果如下:共85条,只展现部分数据
客户希望的结果
我这里直接贴出sql,原理跟上面一样,这里就不废话了。
select batch_no,sheet_start_date,sheet_end_date,
case when start_date = end_date then end_date --判断是否只有一个月份
else concat(start_date,'~',end_date) end account_date --如果不是,使用“~”连接
from (
select c.batch_no,
min(c.sheet_start_date) sheet_start_date, --获取最早账期
max(c.sheet_end_date) sheet_end_date, --获取最晚账期
concat(substr(min(c.sheet_start_date),1,4),substr(min(c.sheet_start_date),6,2)) start_date, --取最早账期年月
concat(substr(max(c.sheet_end_date),1,4),substr(max(c.sheet_end_date),6,2)) end_date --取最晚账期年月
from receive_data.CVP_V_GET_BATCH_NBMS a
left join receive_data.cvp_v_get_sheet_nbms c on a.batch_no=c.batch_no
where a.deleted_flag='0'
and c.deleted_flag='0'
and a.batch_no='INBMS202005290100'
group by c.batch_no ) t
到此,我们就把客户提的需求完美解决。
前台展现:这里使用的是帆软(FineReport)
顺便秀一下我的个人证书:
关注我,带你体验更多真实案例。