最近,公司上线一个新项目,但上线后没几天,数据库的性能问题尤为明显,其中一个较为特殊,查询缓慢,还经常导致应用端服务内存溢出而崩溃。
原语句:
SELECT
media_id,ad_id,ad_name,
advertiser_id,ad_modify_time,
ad_create_time,
`status`,opt_status,delivery_range,inventory_type,open_url,bid,budget,budget_mode,mm.smart_bid_type,start_time,end_time,pricing,flow_control_mode,mm.campaign_id,download_url,hide_if_converted,
cam.campaign_name,
cam.campaign_status,
cam.campaign_type,
cam.promoted_object_type,
cam.campaign_budget,
cam.campaign_create_time,
cam.campaign_budget_mode,
cam.marketing_purpose,
cam.landing_type,
cam.marketing_goal,
cam.marketing_scene,
t.task_id
FROM
(
SELECT m.*,
IFNULL(h.view_count,0) AS `show`,
IFNULL(h.click_count,0) AS `click`,
IFNULL(h.cost,0) AS cost,
IFNULL(h.convert_cnt,0) AS `convert`,
IFNULL(h.avg_show_cost,0) AS avg_show_cost,
IFNULL(h.avg_click_cost,0) AS avg_click_cost,
IFNULL(h.ctr,0) AS ctr,
IFNULL(h.convert_cost,0) AS convert_cost,
IFNULL(h.convert_rate,0) AS convert_rate
FROM bigdata_media_ad_list m
LEFT JOIN (
SELECT adgroup_id,
SUM(view_count) AS view_count,
SUM(click_count) AS click_count,
SUM(cost) AS cost,
SUM(convert_cnt) AS convert_cnt,
SUM(avg_show_cost) AS avg_show_cost,
SUM(avg_click_cost) AS avg_click_cost,
SUM(ctr) AS ctr,
SUM(convert_cost) AS convert_cost,
SUM(convert_rate) AS convert_rate
FROM bigdata_saasreport_projects_adgroup_hour
WHERE stat_date BETWEEN '2022-05-19' AND '2022-05-26' GROUP BY adgroup_id) h
ON m.ad_id=h.adgroup_id
WHERE m.advertiser_id IN ( "22075143","22075142","22075141","22075140","22075138","22033241","21422817","19957653","19957651","19957646","19957642","19957640","24044093","24044092","24044091","24044090","24044088","24044086","24044085","24044084","24044083","24044082","23638767","23638766","23638764","23638762","23638760","23638758","23638755","23638752","23638749","23638744","22033339","22033337","19957649","19957647","24067537","24067536","24067534","24067532","24067531","24067519","24067518","24067516","24067514","24067513","24067512","24067511","24067509","24067508","24067507","24273760","24273759","24273758","24273757","24273756","24273753","24273752","24273751","24273750","24273749","24273061","24273060","24273059","24273058","24273057","24273054","24273053","24273050","24273048","24273047" )
) AS mm
LEFT JOIN media_campaign cam
ON mm.campaign_id=cam.campaign_id
LEFT JOIN ( SELECT aa.task_id,aa.adid FROM(
SELECT *,
JSON_UNQUOTE(JSON_EXTRACT(task_status_data, CONCAT('$."', api, '"'))) adid
FROM tenant_jtnpct.mk_task_record, JSON_TABLE(apis, '$[*]' COLUMNS (rowid FOR ORDINALITY , api VARCHAR(100) PATH '$')) apis ) aa
WHERE aa.api='/ad/create') t ON mm.ad_id=t.adid
ORDER BY ad_create_time DESC
LIMIT 0, 10
;
当获取到这个语句之后,利用sql查询器进行查询,最快的时候需要接近10秒钟,慢的时候要二十多秒,而且执行很不稳定。仔细分析,where 条件中的stat_date和advertiser_id如果范围大些慢的程度会是急速加大,况且系统刚上线,只有部分业务从老系统搬过来,bigdata_media_ad_list和bigdata_saasreport_projects_adgroup_hour 这两个大表目前数据量不大。
经过对业务的了解和对语句的分析,发现这个语句存在以下的不合理地方:
1. 语句的查询是以ad_id作为业务的键值,业务的键值应该设计一个基础表作为数据驱动,而不能从业务的执行表中取值,这样既容易遗漏数据,也容易出现用户不想要的数据。
2. 两个表中的stat_date和ad_create_time本来是日期类型,但都设计成varchar(32)类型,其中ad_create_time不统一,有些存字符串,有些存unix时间戳,真正用的时候,按照字符串的顺序,每次都是匹配到最后一位进行,导致查询的效率比较低,而且经常不走索引。
3. where 条件中的advertiser_id中的值过多,未来容易出现in里面有几十万个,导致送给数据库端一个超级庞大的文本。作者之前曾遇过有个语句放到文本中都过百兆,传到互联网服务器都要一两分钟,别说执行效果。
4. where条件中有个字段adid,这个字段没有在原表mk_task_record中分离单独作为一个字段,而是草草的一个json内容。
5. 查询中有3个比率的字段,应该在group by求和后后进行计算,而不是原字段上使用sum求和。
刚刚做了以上的分析,开发负责人就定了优化的目标,要求查询响应在1秒以内。从数据库角度看,可以从改变数据库结构简化查询,存储过程等,但都会导致现阶段代价太大,只能从优化这个语句入手。这个语句优化过程如下:
1. 这个语句多次出现派生表,阅读起来很困难。为了简化,先把最外层的select去掉。再次去掉bigdata_media_ad_list和bigdata_saasreport_projects_adgroup_hour的内层派生表。
SELECT
m.ad_create_time,
t.task_id,
m.media_id,
m.ad_id,
m.advertiser_id,
m.ad_modify_time,
m.ad_create_time,
m.`status`,
m.opt_status,
m.delivery_range,
m.inventory_type,
m.open_url,
m.bid,
m.budget,
m.budget_mode,
m.smart_bid_type,
m.start_time,
m.end_time,pricing,
m.flow_control_mode,
m.campaign_id,
m.download_url,
m.hide_if_converted,
IFNULL(h.view_count,0) AS `show`,
IFNULL(h.click_count,0) AS `click`,
IFNULL(h.cost,0) AS cost,
IFNULL(h.convert_cnt,0) AS `convert`,
IFNULL(h.avg_show_cost,0) AS avg_show_cost,
IFNULL(h.avg_click_cost,0) AS avg_click_cost,
IFNULL(h.ctr,0) AS ctr,
IFNULL(h.convert_cost,0) AS convert_cost,
IFNULL(h.convert_rate,0) AS convert_rate
FROM bigdata_media_ad_list m
LEFT JOIN (
SELECT
adgroup_id,
SUM(view_count) AS view_count,
SUM(click_count) AS click_count,
SUM(cost) AS cost,
SUM(convert_cnt) AS convert_cnt,
SUM(avg_show_cost) AS avg_show_cost,
SUM(avg_click_cost) AS avg_click_cost,
SUM(ctr) AS ctr,
SUM(convert_cost) AS convert_cost,
SUM(convert_rate) AS convert_rate
FROM bigdata_saasreport_projects_adgroup_hour h1
WHERE stat_date BETWEEN '2022-05-24' AND '2022-05-26'
GROUP BY adgroup_id
) h
ON m.ad_id=h.adgroup_id
LEFT JOIN media_campaign cam
ON m.campaign_id=cam.campaign_id
LEFT JOIN ( SELECT aa.task_id,aa.adid FROM(
SELECT *,
JSON_UNQUOTE(JSON_EXTRACT(task_status_data, CONCAT('$."', api, '"'))) adid
FROM tenant_jtnpct.mk_task_record,
JSON_TABLE(apis, '$[*]' COLUMNS (rowid FOR ORDINALITY , api VARCHAR(100) PATH '$')) apis ) aa
WHERE aa.api='/ad/create') t ON m.ad_id=t.adid
WHERE m.advertiser_id IN ( '22075143','22075142','22075141','22075140','22075138','22033241','21422817','19957653','19957651','19957646','19957642','19957640','24044093','24044092','24044091','24044090','24044088','24044086','24044085','24044084','24044083','24044082','23638767','23638766','23638764','23638762','23638760','23638758','23638755','23638752','23638749','23638744','22033339','22033337','19957649','19957647','24067537','24067536','24067534','24067532','24067531','24067519','24067518','24067516','24067514','24067513','24067512','24067511','24067509','24067508','24067507','24273760','24273759','24273758','24273757','24273756','24273753','24273752','24273751','24273750','24273749','24273061','24273060','24273059','24273058','24273057','24273054','24273053','24273050','24273048','24273047' )
ORDER BY m.ad_create_time DESC
LIMIT 0, 10
;
2. 在bigdata_media_ad_list,bigdata_saasreport_projects_adgroup_hour和media_campaign建立相关的索引,只关联这3个表发现查询效率并没有很多的提高。用explain查看,除了media_campaign这个表的where条件走了索引,其它的都不起作用。
3. 解决bigdata_saasreport_projects_adgroup_hour这个表的group by聚集行数慢的问题。
单独查询下面的子查询:
SELECT
adgroup_id,
SUM(view_count) AS view_count,
SUM(click_count) AS click_count,
SUM(cost) AS cost,
SUM(convert_cnt) AS convert_cnt,
SUM(avg_show_cost) AS avg_show_cost,
SUM(avg_click_cost) AS avg_click_cost,
SUM(ctr) AS ctr,
SUM(convert_cost) AS convert_cost,
SUM(convert_rate) AS convert_rate
FROM bigdata_saasreport_projects_adgroup_hour h1
WHERE stat_date BETWEEN '2022-05-24' AND '2022-05-26'
GROUP BY adgroup_id
;
竟然花了0.6到0.8秒的时间。检查表的索引,发现在stat_date和adgroup_id上都单独有独立的索引,考虑到group by 的字段是adgroup_id, 把这个字段的索引改成覆盖索引,包括stat_date. 然后执行查询。一下子变成0.02秒。
关联bigdata_media_ad_list,bigdata_saasreport_projects_adgroup_hour派生表和media_campaign,查询效果有所起色,一下子提高到了1.2秒。看来离目标迈进了一大步。
3. 当关联tenant_jtnpct.mk_task_record这个派生表时,结果查询时间又变成两到四秒钟。检查看下where条件中的adid是表mk_task_record中task_status_data字段提取出来的,key是“/ad/create”,看着这个/ad/create有两个不友好的字符/,简直崩溃,不能通过建立索引实现。那只能建立虚拟列和增加索引了。
ALTER TABLE mk_task_record ADD COLUMN (ad_create VARCHAR(32) GENERATED ALWAYS AS (task_status_data->'$."/ad/create"'));
CREATE INDEX idx_mk_task_record_02 ON mk_task_record(ad_create );
原来的关联变成:
SELECT
m.ad_create_time,
mtr.task_id,
m.media_id,
m.ad_id,
m.advertiser_id,
m.ad_modify_time,
m.ad_create_time,
m.`status`,
m.opt_status,
m.delivery_range,
m.inventory_type,
m.open_url,
m.bid,
m.budget,
m.budget_mode,
m.smart_bid_type,
m.start_time,
m.end_time,pricing,
m.flow_control_mode,
m.campaign_id,
m.download_url,
m.hide_if_converted,
IFNULL(h.view_count,0) AS `show`,
IFNULL(h.click_count,0) AS `click`,
IFNULL(h.cost,0) AS cost,
IFNULL(h.convert_cnt,0) AS `convert`,
IFNULL(h.avg_show_cost,0) AS avg_show_cost,
IFNULL(h.avg_click_cost,0) AS avg_click_cost,
IFNULL(h.ctr,0) AS ctr,
IFNULL(h.convert_cost,0) AS convert_cost,
IFNULL(h.convert_rate,0) AS convert_rate
FROM bigdata_media_ad_list m
LEFT JOIN (
SELECT SQL_BIG_RESULT
adgroup_id,
SUM(view_count) AS view_count,
SUM(click_count) AS click_count,
SUM(cost) AS cost,
SUM(convert_cnt) AS convert_cnt,
SUM(avg_show_cost) AS avg_show_cost,
SUM(avg_click_cost) AS avg_click_cost,
SUM(ctr) AS ctr,
SUM(convert_cost) AS convert_cost,
SUM(convert_rate) AS convert_rate
FROM bigdata_saasreport_projects_adgroup_hour h1
WHERE stat_date BETWEEN '2022-05-24' AND '2022-05-26'
GROUP BY adgroup_id
) h
ON m.ad_id=h.adgroup_id
LEFT JOIN media_campaign cam
ON m.campaign_id=cam.campaign_id
LEFT OUTER JOIN tenant_jtnpct.mk_task_record mtr
ON m.ad_id = mtr.ad_create
WHERE m.advertiser_id IN ( '22075143','22075142','22075141','22075140','22075138','22033241','21422817','19957653','19957651','19957646','19957642','19957640','24044093','24044092','24044091','24044090','24044088','24044086','24044085','24044084','24044083','24044082','23638767','23638766','23638764','23638762','23638760','23638758','23638755','23638752','23638749','23638744','22033339','22033337','19957649','19957647','24067537','24067536','24067534','24067532','24067531','24067519','24067518','24067516','24067514','24067513','24067512','24067511','24067509','24067508','24067507','24273760','24273759','24273758','24273757','24273756','24273753','24273752','24273751','24273750','24273749','24273061','24273060','24273059','24273058','24273057','24273054','24273053','24273050','24273048','24273047' )
ORDER BY m.ad_create_time DESC
LIMIT 0, 10
;
执行查询,结果在1.0到1.2秒之间。
4. 上一步完成之后,已经接近目标。要达到目标必须解决ORDER BY和limit较慢的问题。这里采用重读表join方法。语句变成:
SELECT
m.ad_create_time,
mtr.task_id,
m.media_id,
m.ad_id,
m.advertiser_id,
m.ad_modify_time,
m.ad_create_time,
m.`status`,
m.opt_status,
m.delivery_range,
m.inventory_type,
m.open_url,
m.bid,
m.budget,
m.budget_mode,
m.smart_bid_type,
m.start_time,
m.end_time,pricing,
m.flow_control_mode,
m.campaign_id,
m.download_url,
m.hide_if_converted,
IFNULL(h.view_count,0) AS `show`,
IFNULL(h.click_count,0) AS `click`,
IFNULL(h.cost,0) AS cost,
IFNULL(h.convert_cnt,0) AS `convert`,
IFNULL(h.avg_show_cost,0) AS avg_show_cost,
IFNULL(h.avg_click_cost,0) AS avg_click_cost,
IFNULL(h.ctr,0) AS ctr,
IFNULL(h.convert_cost,0) AS convert_cost,
IFNULL(h.convert_rate,0) AS convert_rate
FROM media_ad_list m
JOIN (SELECT id FROM media_ad_list
WHERE advertiser_id IN ( '22075143','22075142','22075141','22075140','22075138','22033241','21422817','19957653','19957651','19957646','19957642','19957640','24044093','24044092','24044091','24044090','24044088','24044086','24044085','24044084','24044083','24044082','23638767','23638766','23638764','23638762','23638760','23638758','23638755','23638752','23638749','23638744','22033339','22033337','19957649','19957647','24067537','24067536','24067534','24067532','24067531','24067519','24067518','24067516','24067514','24067513','24067512','24067511','24067509','24067508','24067507','24273760','24273759','24273758','24273757','24273756','24273753','24273752','24273751','24273750','24273749','24273061','24273060','24273059','24273058','24273057','24273054','24273053','24273050','24273048','24273047' )
ORDER BY ad_create_time DESC
LIMIT 0, 10
) m1
ON m.id = m1.id
LEFT JOIN (
SELECT SQL_BIG_RESULT
adgroup_id,
SUM(view_count) AS view_count,
SUM(click_count) AS click_count,
SUM(cost) AS cost,
SUM(convert_cnt) AS convert_cnt,
SUM(avg_show_cost) AS avg_show_cost,
SUM(avg_click_cost) AS avg_click_cost,
SUM(ctr) AS ctr,
SUM(convert_cost) AS convert_cost,
SUM(convert_rate) AS convert_rate
FROM projects_adgroup_hour h1
WHERE stat_date BETWEEN '2022-05-24' AND '2022-05-26'
GROUP BY adgroup_id
) h
ON m.ad_id=h.adgroup_id
LEFT JOIN media_campaign cam
ON m.campaign_id=cam.campaign_id
LEFT OUTER JOIN task_record mtr
ON m.ad_id = mtr.ad_create
WHERE m.advertiser_id IN ( '22075143','22075142','22075141','22075140','22075138','22033241','21422817','19957653','19957651','19957646','19957642','19957640','24044093','24044092','24044091','24044090','24044088','24044086','24044085','24044084','24044083','24044082','23638767','23638766','23638764','23638762','23638760','23638758','23638755','23638752','23638749','23638744','22033339','22033337','19957649','19957647','24067537','24067536','24067534','24067532','24067531','24067519','24067518','24067516','24067514','24067513','24067512','24067511','24067509','24067508','24067507','24273760','24273759','24273758','24273757','24273756','24273753','24273752','24273751','24273750','24273749','24273061','24273060','24273059','24273058','24273057','24273054','24273053','24273050','24273048','24273047' )
ORDER BY m.ad_create_time DESC
LIMIT 0, 10
;
执行查询。响应时间在0.5到0.7秒之间,这可宣布大功告成。其它的优化在后面迭代中慢慢在优化了。