sql优化记录
关于sql语句的优化
现象
之前一直在开发环境使用的一个查询功能没有问题,然后换到sit环境也没有问题,之后突然查不到数据了
原因
仔细看了日志也没有报错
找到了其中的主逻辑查询sql
select
distinct p.PARTY_ID,
p.CUST_NAME,
p.CUST_SOURCE,
p.GENDER,
p.AGE,
p.PPC,
p.OPEN_DATE,
p.CURRENT_DEPOSIT_BAL,
p.CUST_LEVEL_CODE CUST_LEVEL,
p.COLLECT_FLG LODGE_DEPOSIT,
p.PAYROLL_CREDIT_FLAG PROXY_CUST,
p.existing_customer,
p.RECOMMEND_PERSON,
p.RECOMMEND_PERSON_ORG,
p.OPEN_OWNER_NAME,
p.OPEN_OWNER_ORG_NAME,
p.CUST_INTEGRAL,
pb.belong_org_name,
tm.CONTACT_NUMBER,
p.CREDIT_CARD,
p.PHONE_BANK,
p.WX_BANK,
p.DIRECT_SALES_BANK,
p.E_MALL,
p.OTHER_PAY,
p.PHONE_PAY,
p.YM_PAY,
p.CREDIT,
p.FINAL_ASSET_RISK ,
pi.PARTY_IDENTIFICATION_TYPE_ID ID_TYPE,
pi.ID_VALUE ID_NO,
p.MALL_INTEGRAL
from
person p
left join
party_belong pb
on p.PARTY_ID = pb.party_id
left join
party_identification pi
on p.PARTY_ID = pi.PARTY_ID
and pi.MAIN_FLAG = 'Y'
left join
(
select
pcmp.CONTACT_MECH_ID,
tn.PARTY_ID,
tn.CONTACT_NUMBER
from
PARTY_CONTACT_MECH_PURPOSE pcmp,
TELECOM_NUMBER tn,
PARTY_CONTACT_MECH pcm
where
pcmp.CONTACT_MECH_ID = tn.CONTACT_MECH_ID
and pcm.CONTACT_MECH_ID = pcmp.CONTACT_MECH_ID
and tn.POST_PHONE_FLAG = 'Y'
) tm
on tm.PARTY_ID = p.PARTY_ID
where
pb.status = 'ENABLED'
and p.existing_customer = 'Y'
and p.PARTY_ID = '1012260009'
and pb.belong_org_id = '664000'
and pb.user_login_id = '300012'
order by
p.PARTY_ID limit 0,
10
放到mysql中去执行,发现500秒以上都执行不出来数据
解决
加索引
先考虑到在不更改代码的前提下做优化,那就是增加索引
检查数据库 person表的party_id是主键,不需要加索引
party_belong表的party_id及belong_org_id ,user_login_id 已添加过索引,status 是一个枚举值,当全表有超过80%的数据是一样的时候,没有添加索引的必要
party_identification 表中party_id是联合主键,且已经尝试加了索引
一通操作之后,该sql还是几百秒查不出数据
被逼无奈尝试给枚举值,status MAIN_FLAG 等添加索引
但是数据库在查询无障碍的情况下,加不上索引,也是几百秒没有成功,当时觉得可能是数据量太大,导致索引加不上(事实上也不是很大,只有15w,可能跟mysql卡顿或者别的有关系)
于是尝试把数据库整个备份一下,然后重新建个表,加上索引,把数据还原回去
然后进行数据库备份
mysqldump -uroot -p密码 --databases 数据库 --tables 表名1 表名2 表名3 > bakup.sql
注:不采用Navicat直接备份是因为,dump会生成一个大的insert ,而工具导出会生成一个个小的insert ,dump会更快一点
这样dump的情况下也是很久没有成功,于是考虑到可能是mysql挂掉的问题,于是重启了mysql
service mysqld restart
之后再重新dump很快就成功了
但是想操作索引的增删,依然操作不了
于是考虑另一条路
优化sql
从sql中可以看出,有一段非常奇怪
left join
(
select
pcmp.CONTACT_MECH_ID,
tn.PARTY_ID,
tn.CONTACT_NUMBER
from
PARTY_CONTACT_MECH_PURPOSE pcmp,
TELECOM_NUMBER tn,
PARTY_CONTACT_MECH pcm
where
pcmp.CONTACT_MECH_ID = tn.CONTACT_MECH_ID
and pcm.CONTACT_MECH_ID = pcmp.CONTACT_MECH_ID
and tn.POST_PHONE_FLAG = 'Y'
) tm
on tm.PARTY_ID = p.PARTY_ID
就是这段三个表内连接查询,查询结果中需要的也仅仅是 tn.CONTACT_NUMBER 这个字段,查询条件中也仅仅用到tn.POST_PHONE_FLAG = ‘Y’,完全没有三个表内连接查询的必要,
在三个表数据都超过10w的情况下,101010笛卡尔积达到1000w,查询非常缓慢,单独拿出这条sql查询
select
pcmp.CONTACT_MECH_ID,
tn.PARTY_ID,
tn.CONTACT_NUMBER
from
PARTY_CONTACT_MECH_PURPOSE pcmp,
TELECOM_NUMBER tn,
PARTY_CONTACT_MECH pcm
where
pcmp.CONTACT_MECH_ID = tn.CONTACT_MECH_ID
and pcm.CONTACT_MECH_ID = pcmp.CONTACT_MECH_ID
and tn.POST_PHONE_FLAG = 'Y'
10分钟也查询不出结果
使用explain分析
explain
select
pcmp.CONTACT_MECH_ID,
tn.PARTY_ID,
tn.CONTACT_NUMBER
from
PARTY_CONTACT_MECH_PURPOSE pcmp,
TELECOM_NUMBER tn,
PARTY_CONTACT_MECH pcm
where
pcmp.CONTACT_MECH_ID = tn.CONTACT_MECH_ID
and pcm.CONTACT_MECH_ID = pcmp.CONTACT_MECH_ID
and tn.POST_PHONE_FLAG = 'Y'
前几张表的访问类型是ok的,但是后两张表的访问类型及其效率低下
rows显示也是需要10几w(此处截的是开发环境的图,即使这样依然很大)
于是修改sql为
left join
TELECOM_NUMBER tn
on tn.PARTY_ID = p.PARTY_ID
and tn.POST_PHONE_FLAG = 'Y'
执行sql,差不多5s内可以查出结果
select
distinct p.PARTY_ID,
p.CUST_NAME,
p.CUST_SOURCE,
p.GENDER,
p.AGE,
p.PPC,
p.OPEN_DATE,
p.CURRENT_DEPOSIT_BAL,
p.CUST_LEVEL_CODE CUST_LEVEL,
p.COLLECT_FLG LODGE_DEPOSIT,
p.PAYROLL_CREDIT_FLAG PROXY_CUST,
p.existing_customer,
p.RECOMMEND_PERSON,
p.RECOMMEND_PERSON_ORG,
p.OPEN_OWNER_NAME,
p.OPEN_OWNER_ORG_NAME,
p.CUST_INTEGRAL,
pb.belong_org_name,
tn.CONTACT_NUMBER,
p.CREDIT_CARD,
p.PHONE_BANK,
p.WX_BANK,
p.DIRECT_SALES_BANK,
p.E_MALL,
p.OTHER_PAY,
p.PHONE_PAY,
p.YM_PAY,
p.CREDIT,
p.FINAL_ASSET_RISK ,
pi.PARTY_IDENTIFICATION_TYPE_ID ID_TYPE,
pi.ID_VALUE ID_NO,
p.MALL_INTEGRAL
from
person p
left join
party_belong pb
on p.PARTY_ID = pb.party_id
left join
party_identification pi
on p.PARTY_ID = pi.PARTY_ID
and pi.MAIN_FLAG = 'Y'
left join
TELECOM_NUMBER tn
on tn.PARTY_ID = p.PARTY_ID
and tn.POST_PHONE_FLAG = 'Y'
where
pb.status = 'ENABLED'
and p.existing_customer = 'Y'
and p.PARTY_ID = '1012260009'
and pb.belong_org_id = '664000'
and pb.user_login_id = '300012'
order by
p.PARTY_ID limit 0,
10
之前这样修改的时候是15w数据 后来数据变成了300w执行一遍要5min显然是不可容忍的,因此需要继续优化
继续优化sql
经过测试发现,这段代码主要慢在distinct 这个部分,改为先查出来再distinct limit
ps 括号后面的a是不可省略的,否则会报错
select
distinct * from (
select p.PARTY_ID,
p.CUST_NAME,
p.CUST_SOURCE,
p.GENDER,
p.AGE,
p.PPC,
p.OPEN_DATE,
p.CURRENT_DEPOSIT_BAL,
p.CUST_LEVEL_CODE CUST_LEVEL,
p.COLLECT_FLG LODGE_DEPOSIT,
p.PAYROLL_CREDIT_FLAG PROXY_CUST,
p.existing_customer,
p.RECOMMEND_PERSON,
p.RECOMMEND_PERSON_ORG,
p.OPEN_OWNER_NAME,
p.OPEN_OWNER_ORG_NAME,
p.CUST_INTEGRAL,
pb.belong_org_name,
tn.CONTACT_NUMBER,
p.CREDIT_CARD,
p.PHONE_BANK,
p.WX_BANK,
p.DIRECT_SALES_BANK,
p.E_MALL,
p.OTHER_PAY,
p.PHONE_PAY,
p.YM_PAY,
p.CREDIT,
p.FINAL_ASSET_RISK ,
pi.PARTY_IDENTIFICATION_TYPE_ID ID_TYPE,
pi.ID_VALUE ID_NO,
p.MALL_INTEGRAL
from
person p
left join
party_belong pb
on p.PARTY_ID = pb.party_id
left join
party_identification pi
on p.PARTY_ID = pi.PARTY_ID
and pi.MAIN_FLAG = 'Y'
left join
TELECOM_NUMBER tn
on tn.PARTY_ID = p.PARTY_ID
and tn.POST_PHONE_FLAG = 'Y'
where
pb.status = 'ENABLED'
and p.existing_customer = 'Y'
and p.PARTY_ID = '1012260009'
and pb.belong_org_id = '664000'
and pb.user_login_id = '300012'
order by
p.PARTY_ID) a limit 0,
10
继续优化索引
之前只是给非枚举型的加上了索引,之后做了测试,将枚举类型的加上索引之后是可以提升速度的,即existing_customer ,status ,POST_PHONE_FLAG ,MAIN_FLAG
因为要根据aum排序,所以将sql改成了这样
select
distinct * from (
select p.PARTY_ID,
p.CUST_NAME,
p.CUST_SOURCE,
p.GENDER,
p.AGE,
p.PPC,
p.OPEN_DATE,
p.CURRENT_DEPOSIT_BAL,
p.CUST_LEVEL_CODE CUST_LEVEL,
p.COLLECT_FLG LODGE_DEPOSIT,
p.PAYROLL_CREDIT_FLAG PROXY_CUST,
p.existing_customer,
p.RECOMMEND_PERSON,
p.RECOMMEND_PERSON_ORG,
p.OPEN_OWNER_NAME,
p.OPEN_OWNER_ORG_NAME,
p.CUST_INTEGRAL,
pb.belong_org_name,
tn.CONTACT_NUMBER,
p.CREDIT_CARD,
p.PHONE_BANK,
p.WX_BANK,
p.DIRECT_SALES_BANK,
p.E_MALL,
p.OTHER_PAY,
p.PHONE_PAY,
p.YM_PAY,
p.CREDIT,
p.FINAL_ASSET_RISK ,
pi.PARTY_IDENTIFICATION_TYPE_ID ID_TYPE,
pi.ID_VALUE ID_NO,
p.MALL_INTEGRAL
from
person p
left join
party_belong pb
on p.PARTY_ID = pb.party_id
left join
party_identification pi
on p.PARTY_ID = pi.PARTY_ID
and pi.MAIN_FLAG = 'Y'
left join
TELECOM_NUMBER tn
on tn.PARTY_ID = p.PARTY_ID
and tn.POST_PHONE_FLAG = 'Y'
where
pb.status = 'ENABLED'
and p.existing_customer = 'Y'
and p.PARTY_ID = '1012260009'
and pb.belong_org_id = '664000'
and pb.user_login_id = '300012'
order by
p.aum desc) a limit 0,
10
所以aum也要加上索引 ,没加之前300w数据第一次查大概需要40s 第二次查需要20s,之后每次都需要20s左右
之前的代码在排序之后distinct ,distinct会重新对结果order by ,最后的输出结果会无序,所以改成这样
select
distinct * from (
select p.PARTY_ID,
p.CUST_NAME,
p.CUST_SOURCE,
p.GENDER,
p.AGE,
p.PPC,
p.OPEN_DATE,
p.CURRENT_DEPOSIT_BAL,
p.CUST_LEVEL_CODE CUST_LEVEL,
p.COLLECT_FLG LODGE_DEPOSIT,
p.PAYROLL_CREDIT_FLAG PROXY_CUST,
p.existing_customer,
p.RECOMMEND_PERSON,
p.RECOMMEND_PERSON_ORG,
p.OPEN_OWNER_NAME,
p.OPEN_OWNER_ORG_NAME,
p.CUST_INTEGRAL,
pb.belong_org_name,
tn.CONTACT_NUMBER,
p.CREDIT_CARD,
p.PHONE_BANK,
p.WX_BANK,
p.DIRECT_SALES_BANK,
p.E_MALL,
p.OTHER_PAY,
p.PHONE_PAY,
p.YM_PAY,
p.CREDIT,
p.FINAL_ASSET_RISK ,
pi.PARTY_IDENTIFICATION_TYPE_ID ID_TYPE,
pi.ID_VALUE ID_NO,
p.MALL_INTEGRAL
from
person p
left join
party_belong pb
on p.PARTY_ID = pb.party_id
left join
party_identification pi
on p.PARTY_ID = pi.PARTY_ID
and pi.MAIN_FLAG = 'Y'
left join
TELECOM_NUMBER tn
on tn.PARTY_ID = p.PARTY_ID
and tn.POST_PHONE_FLAG = 'Y'
where
pb.status = 'ENABLED'
and p.existing_customer = 'Y'
and p.PARTY_ID = '1012260009'
and pb.belong_org_id = '664000'
and pb.user_login_id = '300012'
) a
order by
aum desc limit 0,
10
但是改成这样后又没有使用索引了,于是决定不去重了,删除distinct
关于in条件查询的记录
in条件里有1000条数据
只查询一个字段
0.205s
查询全表(全表224个字段)
0.235s
in条件里有1500条数据
只查询一个字段
0.233
查询全表
0.286
in条件里有2000条数据
只查询一个字段
4.308
查询全表
72.879
in条件里有3000条数据
只查询一个字段
4.731s
查询全表
73.741
in条件里有5000条数据
只查询一个字段
4.924s
查询全表
73.319
in条件里有8000条数据
只查询一个字段
5.057s
查询全表
74.472
in条件里有10000条数据
只查询一个字段
47.835s 5.401 8.468
查询全表
81.580