MY sql 性能调优
现在做一个海量数据的报表查询 , 有两个报表查询时间太长 , 需要进行优化 .
首先做的工作就是增加索引 , 按照查询的条件的顺序 , 增加索引
create index I_SALE_DAY_TAC_MANU_TIME_PROV on RPT_SALE_DAY_TAC
(MERGE_MANU_ID,BIND_DAY,PROV_ID);
create index I_SALE_DAY_PROV_MANU_TIME_MODEL ON RPT_SALE_DAY_PROV(MERGE_MANU_ID,BIND_DAY,MODEL_ID);
create index I_SALE_DAY_PROV_TIME_MODEL_PROV ON RPT_SALE_DAY_PROV(BIND_DAY,MODEL_ID,PROV_ID);
……
然后通过用 explain 检查执行计划 , 是否按照预计的索引进行查询 .
但加上索引后 , 速度还是比较慢 .
接下来分析 sql 打算对 sql 语句进行优化
第一个报表 “ 按照软件版本终端销量统计 ” 原始 sql 为 :
select
t1.MERGE_MANU_ID,
t3.MANU_NAME,
t1.MODEL_ID,
t4.MODEL_NAME,
t1.FW_VER,
t1.PROV_ID,
t5.PROV_NAME,
sum(t1.SUM_MSISDN) as SELLNUM
from RPT_SALE_DAY_TAC t1
LEFT JOIN SRC_MODEL_SERVICE t2 on t1.MODEL_ID=t2.MODEL_ID
LEFT JOIN SRC_MANUFACTURER t3 on t1.MERGE_MANU_ID=t3.MERGE_MANU_ID
LEFT JOIN SRC_MODEL t4 on t1.MODEL_ID=t4.MODEL_ID
LEFT JOIN SRC_PROVINCE t5 on t1.PROV_ID=t5.PROV_ID
where 1=1
and t1.MERGE_MANU_ID=681753
and t1.BIND_DAY>='2009-08-01'
and t1.BIND_DAY<='2010-10-09'
group by t1.MERGE_MANU_ID,
t3.MANU_NAME,
t1.MODEL_ID,
t4.MODEL_NAME,
t1.FW_VER,
t1.PROV_ID,
t5.PROV_NAME
order by t3.MANU_NAME,t4.MODEL_NAME,t1.FW_VER,t5.PROV_NAME
这个 RPT_SALE_DAY_TAC 是有 2700 万条记录的大表 , 本身对此表进行查询就会比较慢 , 查询用了 45-50 秒
分析结果打算把这个报表查询 groupby 和联表显示和排序分别进行处理 , 再进行合并
调整后的 sql 语句如下 :
select sql_no_cache t0.MERGE_MANU_ID,
t3.MANU_NAME,
t0.MODEL_ID,
t4.MODEL_NAME,
t0.FW_VER,
t0.PROV_ID,
t5.PROV_NAME,
t0.SELLNUM from
(select t1.MERGE_MANU_ID,
t1.MODEL_ID,
t1.FW_VER,
t1.PROV_ID,
sum(t1.SUM_MSISDN) as SELLNUM
from RPT_SALE_DAY_TAC t1
LEFT JOIN SRC_MODEL_SERVICE t2 on t1.MODEL_ID=t2.MODEL_ID
where 1=1
and t1.MERGE_MANU_ID=681753
and t1.BIND_DAY>='2009-08-01'
and t1.BIND_DAY<='2010-10-09'
group by t1.MERGE_MANU_ID,
t1.MODEL_ID,
t1.PROV_ID,
t1.FW_VER ) t0
LEFT JOIN SRC_MANUFACTURER t3 on t0.MERGE_MANU_ID=t3.MERGE_MANU_ID
LEFT JOIN SRC_MODEL t4 on t0.MODEL_ID=t4.MODEL_ID
LEFT JOIN SRC_PROVINCE t5 on t0.PROV_ID=t5.PROV_ID
order by t3.MANU_NAME,t4.MODEL_NAME,t0.FW_VER,t5.PROV_NAME
优化后时间为 18-19 秒 .
事实上单独查询报表数据内容也需要花费 18 秒左右 . 并且索引已经生效 .
目前 19 秒的速度还算可接受范围
如果想进一步提高性能 , 只有考虑 , 进一步生成一个周统计表 , 用以减少数据条数 .
第二个需要优化的报表是 ” 不同数据业务终端销量统计 ” 原始 sql 为
select
'WAP_AVAIL' as SCOLUMN,'WAP' as SNAME,sum(t1.SUM_MSISDN) as SELLNUM
from RPT_SALE_DAY_PROV t1
LEFT JOIN SRC_MODEL_SERVICE t2 on t1.MODEL_ID=t2.MODEL_ID
where 1=1
and t1.BIND_DAY>='2010-02-01'
and t1.BIND_DAY<='2010-10-09'
and t1.MODEL_ID in
(
select
distinct t3.MODEL_ID
from SRC_MODEL_STATIC_INFO t3
where t3.WAP_AVAIL>=1
)
union
select
'GPRS_AVAIL' as SCOLUMN,'GPRS' as SNAME,sum(t1.SUM_MSISDN) as SELLNUM
from RPT_SALE_DAY_PROV t1
LEFT JOIN SRC_MODEL_SERVICE t2 on t1.MODEL_ID=t2.MODEL_ID
where 1=1
and t1.BIND_DAY>='2010-02-01'
and t1.BIND_DAY<='2010-10-09'
and t1.MODEL_ID in
(
select
distinct t3.MODEL_ID
from SRC_MODEL_STATIC_INFO t3
where t3.GPRS_AVAIL>=1
)
union
……
Union
……
有 16 的 sum 的 sql union 在一起 , 如果查询时间范围为 2 个月 , 每个 sql 需要 2.5 秒中左右 , 总共需要 50 秒 . 如果时间范围扩大为 8 个月 , 总共就需要 2-3 分钟 . 这样的确是个大问题
跟老大一起研究 , 老大提出个天才的方案 . 现将能力信息表根据 group 进行 sum, 然后和统计表进行计算 , 果然效果显著
优化后 sql 如下 :
select
sum(t0.SUM_MSISDN ) as TOTAL_NUM ,
sum(t0.SUM_MSISDN * t3.WAP_AVAIL) as sum_WAP_AVAIL,
sum(t0.SUM_MSISDN * t3.GPRS_AVAIL) as sum_GPRS_AVAIL,
sum(t0.SUM_MSISDN * t3.MMS_AVAIL) as sum_MMS_AVAIL,
sum(t0.SUM_MSISDN * t3.PIM_AVAIL) as sum_PIM_AVAIL,
sum(t0.SUM_MSISDN * t3.DM_AVAIL) as sum_DM_AVAIL,
sum(t0.SUM_MSISDN * t3.JAVA_AVAIL) as sum_JAVA_AVAIL,
sum(t0.SUM_MSISDN * t3.EDGE_STATUS) as sum_EDGE_STATUS,
sum(t0.SUM_MSISDN * t3.FLASH_STATUS) as sum_FLASH_STATUS,
sum(t0.SUM_MSISDN * t3.BARRECOG_STATUS) as sum_BARRECOG_STATUS,
sum(t0.SUM_MSISDN * t3.DCD_STATUS) as sum_DCD_STATUS,
sum(t0.SUM_MSISDN * t3.MOBILESTOCK_STATUS) as sum_MOBILESTOCK_STATUS,
sum(t0.SUM_MSISDN * t3.MOBILEEMAIL_STATUS) as sum_MOBILEEMAIL_STATUS,
sum(t0.SUM_MSISDN * t3.AGPS_STATUS) as sum_AGPS_STATUS,
sum(t0.SUM_MSISDN * t3.MUSICBOX_STATUS) as sum_MUSICBOX_STATUS,
sum(t0.SUM_MSISDN * t3.FETION_STATUS) as sum_FETION_STATUS,
sum(t0.SUM_MSISDN * t3.STREAMINGTV_STATUS) as sum_STREAMINGTV_STATUS
FROM (SELECT t1.MODEL_ID, SUM(t1.SUM_MSISDN) as SUM_MSISDN FROM RPT_SALE_DAY_PROV t1 LEFT JOIN SRC_MODEL_SERVICE t2 on t1.MODEL_ID=t2.MODEL_ID where 1=1 and t1.BIND_DAY>='2010-02-07' and t1.BIND_DAY<='2010-10-09' and t1.PROV_ID=210 and t2.REG_FLAG=1 group by t1.model_id) t0,
(
select
model_id,
IF(sum(WAP_AVAIL)=0,1,0) as WAP_AVAIL,
IF(sum(GPRS_AVAIL)=0,1,0) as GPRS_AVAIL,
IF(sum(MMS_AVAIL)=0,1,0) as MMS_AVAIL,
IF(sum(PIM_AVAIL)=0,1,0) as PIM_AVAIL,
IF(sum(DM_AVAIL)=0,1,0) as DM_AVAIL,
IF(sum(JAVA_AVAIL)=0,1,0) as JAVA_AVAIL,
IF(sum(EDGE_STATUS)=0,1,0) as EDGE_STATUS,
IF(sum(FLASH_STATUS)=0,1,0) as FLASH_STATUS,
IF(sum(BARRECOG_STATUS)=0,1,0) as BARRECOG_STATUS,
IF(sum(DCD_STATUS)=0,1,0) as DCD_STATUS,
IF(sum(MOBILESTOCK_STATUS)=0,1,0) as MOBILESTOCK_STATUS,
IF(sum(MOBILEEMAIL_STATUS)=0,1,0) as MOBILEEMAIL_STATUS,
IF(sum(AGPS_STATUS)=0,1,0) as AGPS_STATUS,
IF(sum(MUSICBOX_STATUS)=0,1,0) as MUSICBOX_STATUS,
IF(sum(FETION_STATUS)=0,1,0) as FETION_STATUS,
IF(sum(STREAMINGTV_STATUS)=0,1,0) as STREAMINGTV_STATUS
from SRC_MODEL_STATIC_INFO
group by model_id
)
t3
where t0.MODEL_ID=t3.MODEL_ID
将报表和能力信息表分成两个子查询 , 查询的结果再进行运算 , 这样就相当于只对海量表查询了一次 , 并且用避免了 select in 效率低的问题 .
现在一次查询 , 只需要 2 秒 , 提高了 30-60 倍的速度.