1.场景说明
平台每天都会接收分公司上报的指标(mq发消息给平台,平台解析入库,分别进device表和kpi表)。kpi表是其他指标表,device表是设备信息指标表,每天要统计分公司上报指标的完整率,一类指标发送一条就算发送,重复发送还是只算一条。其中,device表只存一类指标,kpi表存27类指标,27类指标根据字段kpi_id区分。
现在需要统计分公司每天的指标完整率和当月的指标完整率。
2.sql实现
2.1当天的指标完整率
2.1.1设备指标当天推送情况统计
SELECT
count( 1 )
FROM
monitor_device_info
WHERE
DATE_FORMAT( now( ), '%Y-%m-%d' ) = DATE_FORMAT( updated_at, '%Y-%m-%d' )
AND corporation_id = #{bcId}
上面那条sql是我第一次使用的sql,需要遍历分公司的id做参数,连续查询多次,效率较低。
SELECT
(IF(COUNT(d.device_id) > 0,1,0)) AS kpi_num,
bc.bc_id,
bc.bc_short_name
FROM
public_org_busicorp bc
LEFT JOIN ( SELECT * FROM monitor_device_info WHERE DATE_FORMAT(NOW(), '%Y-%m-%d' ) = DATE_FORMAT( updated_at, '%Y-%m-%d' ) ) d ON d.corporation_id = bc.bc_id
WHERE
bc.bc_id > 0 AND bc.bc_id < 99
GROUP BY
bc.bc_id
ORDER BY
bc.bc_desc
这条是改良后的sql,可以一次查询获取所有分公司设备信息类指标的当天推送情况,效率提升。
但是后来发现这样写是有问题的,如果今天设备第一次插入,那么updated_at
是null,但是你不能说分公司今天没有推送指标,所以我对sql进行了一下改良
SELECT
( IF ( COUNT( d.device_id ) > 0,1,0 ) ) AS kpi_num,
bc.bc_id,
bc.bc_short_name
FROM
public_org_busicorp bc
LEFT JOIN (
SELECT
*
FROM
monitor_device_info
WHERE
DATE_FORMAT( now(), '%Y-%m-%d' ) = DATE_FORMAT( IF ( updated_at IS NULL, created_at, updated_at ), '%Y-%m-%d' )
) d ON d.corporation_id = bc.bc_id
WHERE
bc.bc_id > 0
AND bc.bc_id < 99
GROUP BY
bc.bc_id
ORDER BY
bc.bc_desc
2.1.2其他指标当天推送情况统计
SELECT
COUNT( DISTINCT h.kpi_id ) AS kpi_num,
bc.bc_id,
bc.bc_short_name
FROM
public_org_busicorp bc
LEFT JOIN (
SELECT
*
FROM
monitor_statistics_history
WHERE
DATE_FORMAT(now(), '%Y-%m-%d' ) = DATE_FORMAT( kpi_time, '%Y-%m-%d' )
) h ON h.bc_id = bc.bc_id
WHERE
bc.bc_id > 0
AND bc.bc_id < 99
GROUP BY
bc.bc_id
ORDER BY
bc.bc_desc
后来随着数据量的增大,发现接口耗时较长,排查发现是sql查询效率、耗时长导致的,所以又对sql进行了优化,速度的提升还是比较明显的。优化后的sql如下:
SELECT
if(h.kpi_num is null,0,h.kpi_num) kpi_num,
bc.bc_id,
bc.bc_short_name
FROM
public_org_busicorp bc
LEFT JOIN (
SELECT
bc_id,
COUNT( DISTINCT kpi_id ) AS kpi_num
FROM
monitor_statistics_history
WHERE
DATE_FORMAT( NOW( ), '%Y-%m-%d' ) = DATE_FORMAT( kpi_time, '%Y-%m-%d' )
GROUP BY
CRC32(bc_id)
) h ON h.bc_id = bc.bc_id
WHERE
bc.bc_id > 0
AND bc.bc_id < 99
GROUP BY
CRC32(bc.bc_id)
ORDER BY
bc.bc_desc
这条改良后的sql查询速度提升主要来自两个方面:
①左连接的数据量大大减少,在进行左连接之前,对子查询进行了group by操作,使得子查询的数据得以整理,数据量大大减少;
②group by的字段如果是varchar类型,当表的数据量达到万级别的时候,就会明显感觉到sql语句的缓慢。所以使用了CRC32()函数对目标字段取int数,并按此int来分组,查询速度会得到极大的提升;
2.2当月指标完整率
2.2.1设备指标当月推送情况统计
SELECT
COUNT(DISTINCT d.corporation_id ,DATE_FORMAT(d.updated_at, '%Y-%m-%d' )) AS kpi_num,
bc.bc_id,
bc.bc_short_name
FROM
public_org_busicorp bc
LEFT JOIN (
SELECT
*
FROM
monitor_device_info
WHERE
DATE_FORMAT(NOW(), '%Y-%m' ) = DATE_FORMAT( updated_at, '%Y-%m' )
) d ON d.corporation_id = bc.bc_id
WHERE
bc.bc_id > 0 AND bc.bc_id < 99
GROUP BY
bc.bc_id
ORDER BY
bc.bc_desc
这是第一次写的,根据省公司id和日期的“年月日”去重查询作为当天设备指标推送的依据。
因为上面发现了设备指标统计sql的漏洞,所以,月统计的sql也要修改,而且要注意,这条sql里面有两处使用updated_at
的地方,都要做修改。
SELECT
COUNT(DISTINCT d.corporation_id ,DATE_FORMAT(IF ( d.updated_at IS NULL, d.created_at, d.updated_at ), '%Y-%m-%d' )) AS kpi_num,
bc.bc_id,
bc.bc_short_name
FROM
public_org_busicorp bc
LEFT JOIN ( SELECT * FROM monitor_device_info WHERE DATE_FORMAT(now(), '%Y-%m' ) = DATE_FORMAT( IF ( updated_at IS NULL, created_at, updated_at ), '%Y-%m' ) ) d ON d.corporation_id = bc.bc_id
WHERE
bc.bc_id > 0 AND bc.bc_id < 99
GROUP BY
bc.bc_id
ORDER BY
bc.bc_desc
2.2.2其他指标当月推送情况统计
SELECT
COUNT( DISTINCT h.kpi_id, DATE_FORMAT( h.kpi_time, '%Y-%m-%d' ) ) AS kpi_num,
bc.bc_id,
bc.bc_short_name
FROM
public_org_busicorp bc
LEFT JOIN ( SELECT * FROM monitor_statistics_history WHERE DATE_FORMAT( NOW( ), '%Y-%m' ) = DATE_FORMAT( kpi_time, '%Y-%m' ) ) h ON h.bc_id = bc.bc_id
WHERE
bc.bc_id > 0
AND bc.bc_id < 99
GROUP BY
bc.bc_id
ORDER BY
bc.bc_desc
统计思路与设备指标一样。
按照同样的优化思路,优化后的sql如下:
SELECT
h.kpi_num,
bc.bc_id,
bc.bc_short_name
FROM
public_org_busicorp bc
LEFT JOIN (
SELECT
bc_id,
COUNT( DISTINCT kpi_id, DATE_FORMAT( kpi_time, '%Y-%m-%d' ) ) AS kpi_num
FROM
monitor_statistics_history
WHERE
DATE_FORMAT( NOW( ), '%Y-%m' ) = DATE_FORMAT( kpi_time, '%Y-%m' )
GROUP BY
CRC32(bc_id)
) h ON h.bc_id = bc.bc_id
WHERE
bc.bc_id > 0
AND bc.bc_id < 99
GROUP BY
CRC32(bc.bc_id)
ORDER BY
bc.bc_desc
注意一点,查出来的kpi_num有可能是null,求和运算的时候需要用三目运算符处理一下。