Results
The later months have no data yet, so they are not included in the screenshot.
Preparing the base data
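The base data consists of three parts:
the number of work orders per month, grouped by month;
the most frequently handled item in each month, grouped by month and item name;
a continuous sequence of months.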
-- Number of work orders per month, grouped by month
SELECT date_format( create_time, 'yyyy-MM' ) ym, COUNT( * ) total
FROM pro_wo_form
GROUP BY date_format( create_time, 'yyyy-MM' )
This part is straightforward, so its result screenshot is omitted. Add a time-range filter as needed.
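Hive SQL needs a cluster to run, so here is a minimal local sketch of the same GROUP BY using Python's built-in `sqlite3` and a few hypothetical rows. SQLite has no `date_format`, so `strftime('%Y-%m', ...)` stands in for it; the half-open `[start, end)` filter is just one possible way to add the time range mentioned above, and the function name `monthly_counts` is hypothetical.

```python
import sqlite3

# Hypothetical in-memory copy of pro_wo_form with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pro_wo_form (create_time TEXT, act_name TEXT)")
conn.executemany(
    "INSERT INTO pro_wo_form VALUES (?, ?)",
    [
        ("2020-01-05 09:00:00", "A"),
        ("2020-01-20 10:30:00", "B"),
        ("2020-02-03 14:00:00", "A"),
        ("2020-03-11 08:15:00", "C"),
    ],
)


def monthly_counts(conn, start=None, end=None):
    """Work orders per month, optionally limited to create_time in [start, end)."""
    sql = """
        SELECT strftime('%Y-%m', create_time) AS ym, COUNT(*) AS total
        FROM pro_wo_form
        WHERE (:start_time IS NULL OR create_time >= :start_time)
          AND (:end_time IS NULL OR create_time < :end_time)
        GROUP BY strftime('%Y-%m', create_time)
        ORDER BY ym
    """
    return conn.execute(sql, {"start_time": start, "end_time": end}).fetchall()


print(monthly_counts(conn))                              # [('2020-01', 2), ('2020-02', 1), ('2020-03', 1)]
print(monthly_counts(conn, "2020-02-01", "2020-03-01"))  # [('2020-02', 1)]
```

Passing `None` binds SQL `NULL`, so each bound is optional and the unfiltered call behaves like the Hive query above.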
-- Most frequently handled item in each month, grouped by month and item name
SELECT * FROM (
SELECT row_number() over ( PARTITION BY ym ORDER BY total DESC ) rn, *
FROM (SELECT date_format(create_time, 'yyyy-MM') ym, act_name, COUNT(*) total
FROM pro_wo_form
GROUP BY date_format(create_time, 'yyyy-MM'), act_name) t1
) t2
WHERE rn < 2
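Numbering the rows within each month (highest count first) and keeping row 1 gives the most frequently handled item of that month. For details on the syntax, see the optimization section of my earlier article
"Grouping by two fields and taking the first row of every subset of field a"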
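To see the row-number pattern work on concrete rows, here is a minimal sketch assuming Python's built-in `sqlite3` (window functions need SQLite 3.25 or later) and hypothetical sample data. It keeps the structure of the Hive query: count per (month, item), number the rows inside each month by count descending, then keep `rn < 2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pro_wo_form (create_time TEXT, act_name TEXT)")
# Hypothetical sample rows: A is the top item in 2020-01, B in 2020-02.
conn.executemany(
    "INSERT INTO pro_wo_form VALUES (?, ?)",
    [
        ("2020-01-03 09:00:00", "A"),
        ("2020-01-08 11:00:00", "A"),
        ("2020-01-15 16:00:00", "B"),
        ("2020-02-02 10:00:00", "B"),
        ("2020-02-09 13:00:00", "B"),
        ("2020-02-20 15:00:00", "C"),
    ],
)

TOP_ITEM_SQL = """
SELECT ym, act_name, total FROM (
    -- number the items inside each month, largest count first
    SELECT row_number() OVER (PARTITION BY ym ORDER BY total DESC) AS rn, t1.*
    FROM (
        SELECT strftime('%Y-%m', create_time) AS ym, act_name, COUNT(*) AS total
        FROM pro_wo_form
        GROUP BY strftime('%Y-%m', create_time), act_name
    ) t1
) t2
WHERE rn < 2  -- keep only the first row of each month
ORDER BY ym
"""

top_items = conn.execute(TOP_ITEM_SQL).fetchall()
print(top_items)  # [('2020-01', 'A', 2), ('2020-02', 'B', 2)]
```

`row_number()` always assigns distinct numbers, so when two items tie for the top count only one of them is kept, and which one is not specified unless the `ORDER BY` adds a tiebreaker.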
-- Continuous sequence of months
SELECT pos, SUBSTR(
add_months(FROM_UNIXTIME(unix_timestamp(year, 'yyyy')), pos - 1)
, 1, 7) AS ym
FROM (SELECT '2020' AS year) tmp lateral VIEW posexplode
( split ( space( 12 ), '' ) ) t AS pos, val
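This returns the consecutive months together with their position numbers, which are used later to compute the month-over-month change. posexplode numbers the elements from 0, and add_months(..., pos - 1) shifts January 1 of the given year by pos - 1 months, so pos = 0 is December of the previous year (2019-12) and pos = 1 onward are January, February, and so on of 2020. For more on this syntax, see my article "Getting consecutive dates or months in Hive".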
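SQLite has neither `posexplode` nor `LATERAL VIEW`, so the following sketch swaps in a recursive CTE to produce the same `pos` values 0 to 12, and uses `date(..., 'N months')` in place of Hive's `add_months`. It assumes Python's built-in `sqlite3`; `month_sequence` is a hypothetical helper name. The point it illustrates is the same as in the Hive query: with the `pos - 1` offset, `pos = 0` falls on the previous December.

```python
import sqlite3

MONTHS_SQL = """
WITH RECURSIVE seq(pos) AS (   -- stand-in for posexplode: pos = 0, 1, ..., 12
    SELECT 0
    UNION ALL
    SELECT pos + 1 FROM seq WHERE pos < 12
)
-- shift Jan 1 of the year by (pos - 1) months and keep 'YYYY-MM'
SELECT pos, substr(date(? || '-01-01', (pos - 1) || ' months'), 1, 7) AS ym
FROM seq
ORDER BY pos
"""


def month_sequence(year):
    """Return [(pos, 'YYYY-MM'), ...] for pos 0..12; pos 0 is December of year - 1."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(MONTHS_SQL, (year,)).fetchall()
    finally:
        conn.close()


months = month_sequence("2020")
print(months[0], months[1], months[-1])  # (0, '2019-12') (1, '2020-01') (12, '2020-12')
```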
Full query
-- base: number of work orders per month
with base as (
SELECT date_format( create_time, 'yyyy-MM' ) ym, COUNT( * ) total
FROM pro_wo_form
GROUP BY date_format( create_time, 'yyyy-MM' )
),
-- t0: the most frequently handled item in each month (first row per month by count)
t0 as (
SELECT * FROM (
SELECT row_number() over ( PARTITION BY ym ORDER BY total DESC ) rn, *
FROM (SELECT date_format(create_time, 'yyyy-MM') ym, act_name, COUNT(*) total
FROM pro_wo_form
GROUP BY date_format(create_time, 'yyyy-MM'), act_name) t1
) t2
WHERE rn < 2
),
-- t1: consecutive months of 2020; pos = 0 is 2019-12, the baseline for January
t1 as (
SELECT pos, SUBSTR(
add_months(FROM_UNIXTIME(unix_timestamp(year, 'yyyy')), pos - 1)
, 1, 7) AS ym
FROM (SELECT '2020' AS year) tmp lateral VIEW posexplode
( split ( space( 12 ), '' ) ) t AS pos, val
),
-- t2: the current month, numbered mm = pos (January onward)
t2 as (
select pos mm, base.total, t0.act_name
from t1 left join base on t1.ym = base.ym
left join t0 on t1.ym = t0.ym
where pos > 0
),
-- t3: each month's total shifted forward by one month (mm = pos + 1)
t3 as (
select pos + 1 mm, base.total
from t1 left join base on t1.ym = base.ym
)
-- month, work orders, month-over-month change in percent, top item; NULLs become 0 or ''
select
t2.mm,
coalesce(t2.total, 0) total,
coalesce(cast((t2.total - t3.total) / t3.total * 100 as decimal(10,2)), 0) mom_pct,
coalesce(t2.act_name, '') act_name
from t2 left join t3 on t2.mm = t3.mm;
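Here base, t0 and t1 correspond to the three parts of the base data described above, so they can be skipped.
The part worth attention is how t2 and t3 compute the month-over-month change by joining on the position number. t2 keeps the rows with pos > 0 and uses pos as the month number mm, while t3 shifts every month's total forward by one (mm = pos + 1). Joining t2 and t3 on mm therefore pairs each month with the previous month's total, and the pos = 0 row (December of the previous year) supplies the baseline for January. The change is (current total - previous total) / previous total × 100, cast to decimal(10,2); when either month has no data the expression is NULL and the query reports 0.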
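The whole pipeline can be checked end to end with a minimal sketch, assuming Python's built-in `sqlite3` (3.25 or later for window functions) and hypothetical sample rows. It mirrors base, t0, t1, t2 and t3 from the Hive query, with a recursive CTE replacing `posexplode` and `date(..., 'N months')` replacing `add_months`. `100.0` is used so SQLite does real division the way Hive's `/` does, and `round(..., 2)` approximates the `decimal(10,2)` cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pro_wo_form (create_time TEXT, act_name TEXT)")
# Hypothetical rows: 2019-12 has 2 orders, 2020-01 has 3, 2020-02 none, 2020-03 one.
conn.executemany(
    "INSERT INTO pro_wo_form VALUES (?, ?)",
    [
        ("2019-12-10 09:00:00", "A"), ("2019-12-20 10:00:00", "A"),
        ("2020-01-02 09:00:00", "B"), ("2020-01-05 11:00:00", "B"),
        ("2020-01-09 15:00:00", "A"), ("2020-03-04 08:00:00", "C"),
    ],
)

MOM_SQL = """
WITH RECURSIVE
base AS (  -- work orders per month
    SELECT strftime('%Y-%m', create_time) AS ym, COUNT(*) AS total
    FROM pro_wo_form GROUP BY strftime('%Y-%m', create_time)
),
t0 AS (    -- top item per month
    SELECT ym, act_name FROM (
        SELECT row_number() OVER (PARTITION BY ym ORDER BY total DESC) AS rn, g.*
        FROM (
            SELECT strftime('%Y-%m', create_time) AS ym, act_name, COUNT(*) AS total
            FROM pro_wo_form
            GROUP BY strftime('%Y-%m', create_time), act_name
        ) g
    ) WHERE rn = 1
),
seq(pos) AS (SELECT 0 UNION ALL SELECT pos + 1 FROM seq WHERE pos < 12),
t1 AS (    -- pos 0 = 2019-12, pos 1..12 = 2020-01..2020-12
    SELECT pos, substr(date('2020-01-01', (pos - 1) || ' months'), 1, 7) AS ym FROM seq
),
t2 AS (    -- current month, mm = pos
    SELECT t1.pos AS mm, base.total, t0.act_name
    FROM t1 LEFT JOIN base ON t1.ym = base.ym
            LEFT JOIN t0 ON t1.ym = t0.ym
    WHERE t1.pos > 0
),
t3 AS (    -- previous month's total, moved onto mm = pos + 1
    SELECT t1.pos + 1 AS mm, base.total
    FROM t1 LEFT JOIN base ON t1.ym = base.ym
)
SELECT t2.mm,
       coalesce(t2.total, 0) AS total,
       coalesce(round((t2.total - t3.total) * 100.0 / t3.total, 2), 0) AS mom_pct,
       coalesce(t2.act_name, '') AS act_name
FROM t2 LEFT JOIN t3 ON t2.mm = t3.mm
ORDER BY t2.mm
"""

result = conn.execute(MOM_SQL).fetchall()
for row in result[:3]:
    print(row)
# (1, 3, 50.0, 'B')
# (2, 0, 0, '')
# (3, 1, 0, 'C')
```

January compares 3 orders against December's 2, giving 50%. February and March show 0 because, as in the Hive query, a NULL on either side of the division is reported as 0 rather than as -100% or an undefined ratio; if that distinction matters, the `coalesce` on the ratio is the place to change.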