一个sql优化案例

最近,公司上线一个新项目,但上线后没几天,数据库的性能问题尤为明显,其中一个较为特殊,查询缓慢,还经常导致应用端服务内存溢出而崩溃。

原语句:

SELECT
    media_id,ad_id,ad_name,
    advertiser_id,ad_modify_time,
    ad_create_time,
  `status`,opt_status,delivery_range,inventory_type,open_url,bid,budget,budget_mode,mm.smart_bid_type,start_time,end_time,pricing,flow_control_mode,mm.campaign_id,download_url,hide_if_converted,
    cam.campaign_name,
    cam.campaign_status,
    cam.campaign_type,
    cam.promoted_object_type,
    cam.campaign_budget,
    cam.campaign_create_time,
    cam.campaign_budget_mode,
    cam.marketing_purpose,
    cam.landing_type,
    cam.marketing_goal,
    cam.marketing_scene,
    t.task_id
    FROM
    (
    SELECT m.*,
    IFNULL(h.view_count,0) AS `show`,
    IFNULL(h.click_count,0) AS `click`,
    IFNULL(h.cost,0) AS cost,
    IFNULL(h.convert_cnt,0) AS `convert`,
    IFNULL(h.avg_show_cost,0) AS avg_show_cost,
    IFNULL(h.avg_click_cost,0) AS avg_click_cost,
    IFNULL(h.ctr,0) AS ctr,
    IFNULL(h.convert_cost,0) AS convert_cost,
    IFNULL(h.convert_rate,0) AS convert_rate

    FROM bigdata_media_ad_list m 
    LEFT JOIN (
    SELECT adgroup_id,
    SUM(view_count) AS view_count,
    SUM(click_count) AS click_count,
    SUM(cost) AS cost,
    SUM(convert_cnt) AS convert_cnt,
    SUM(avg_show_cost) AS avg_show_cost,
    SUM(avg_click_cost) AS avg_click_cost,
    SUM(ctr) AS ctr,
    SUM(convert_cost) AS convert_cost,
    SUM(convert_rate) AS convert_rate
    FROM bigdata_saasreport_projects_adgroup_hour 
        WHERE stat_date BETWEEN '2022-05-19' AND '2022-05-26' GROUP BY adgroup_id) h 
        ON m.ad_id=h.adgroup_id 

       WHERE m.advertiser_id IN ( "22075143","22075142","22075141","22075140","22075138","22033241","21422817","19957653","19957651","19957646","19957642","19957640","24044093","24044092","24044091","24044090","24044088","24044086","24044085","24044084","24044083","24044082","23638767","23638766","23638764","23638762","23638760","23638758","23638755","23638752","23638749","23638744","22033339","22033337","19957649","19957647","24067537","24067536","24067534","24067532","24067531","24067519","24067518","24067516","24067514","24067513","24067512","24067511","24067509","24067508","24067507","24273760","24273759","24273758","24273757","24273756","24273753","24273752","24273751","24273750","24273749","24273061","24273060","24273059","24273058","24273057","24273054","24273053","24273050","24273048","24273047" )
           ) AS mm 
     LEFT JOIN media_campaign cam 
       ON mm.campaign_id=cam.campaign_id 
     LEFT JOIN ( SELECT aa.task_id,aa.adid FROM(
                          SELECT *, 
                          JSON_UNQUOTE(JSON_EXTRACT(task_status_data, CONCAT('$."', api, '"'))) adid 
                          FROM tenant_jtnpct.mk_task_record, JSON_TABLE(apis, '$[*]' COLUMNS (rowid FOR ORDINALITY , api VARCHAR(100) PATH '$')) apis ) aa 
        WHERE aa.api='/ad/create') t ON mm.ad_id=t.adid 
ORDER BY ad_create_time DESC 
        LIMIT 0, 10

        当获取到这个语句之后,利用sql查询器进行查询,最快的时候需要接近10秒钟,慢的时候要二十多秒,而且执行很不稳定。仔细分析,where 条件中的stat_date和advertiser_id如果范围大些慢的程度会是急速加大,况且系统刚上线,只有部分业务从老系统搬过来,bigdata_media_ad_list和bigdata_saasreport_projects_adgroup_hour 这两个大表目前数据量不大。

        经过对业务的了解和对语句的分析,发现这个语句存在以下的不合理地方:

        1. 语句的查询是以ad_id作为业务的键值,业务的键值应该设计一个基础表作为数据驱动,而不能从业务的执行表中取值,这样既容易遗漏数据,也容易出现用户不想要的数据。

        2. 两个表中的stat_date和ad_create_time本来是日期类型,但都设计成varchar(32)类型,其中ad_create_time不统一,有些存字符串,有些存unix时间戳,真正用的时候,按照字符串的顺序,每次都是匹配到最后一位进行,导致查询的效率比较低,而且经常不走索引。

        3. where 条件中的advertiser_id中的值过多,未来容易出现in里面有几十万个,导致送给数据库端一个超级庞大的文本。作者之前曾遇过有个语句放到文本中都过百兆,传到互联网服务器都要一两分钟,别说执行效果。

        4. where条件中有个字段adid,这个字段没有在原表mk_task_record中分离单独作为一个字段,而是草草的一个json内容。

        5. 查询中有3个比率的字段,应该在group by求和后后进行计算,而不是原字段上使用sum求和。

        刚刚做了以上的分析,开发负责人就定了优化的目标,要求查询响应在1秒以内。从数据库角度看,可以从改变数据库结构简化查询,存储过程等,但都会导致现阶段代价太大,只能从优化这个语句入手。这个语句优化过程如下:

        1. 这个语句多次出现派生表,阅读起来很困难。为了简化,先把最外层的select去掉。再次去掉bigdata_media_ad_list和bigdata_saasreport_projects_adgroup_hour的内层派生表。


SELECT 
    m.ad_create_time,
    t.task_id,
    m.media_id,
    m.ad_id,
    m.advertiser_id,
    m.ad_modify_time,
    m.ad_create_time,
    m.`status`,
    m.opt_status,
    m.delivery_range,
    m.inventory_type,
    m.open_url,
    m.bid,
    m.budget,
    m.budget_mode,
    m.smart_bid_type,
    m.start_time,
    m.end_time,pricing,
    m.flow_control_mode,
    m.campaign_id,
    m.download_url,
    m.hide_if_converted,
    IFNULL(h.view_count,0) AS `show`,
    IFNULL(h.click_count,0) AS `click`,
    IFNULL(h.cost,0) AS cost,
    IFNULL(h.convert_cnt,0) AS `convert`,
    IFNULL(h.avg_show_cost,0) AS avg_show_cost,
    IFNULL(h.avg_click_cost,0) AS avg_click_cost,
    IFNULL(h.ctr,0) AS ctr,
    IFNULL(h.convert_cost,0) AS convert_cost,
    IFNULL(h.convert_rate,0) AS convert_rate
    FROM bigdata_media_ad_list m   
    LEFT JOIN (
           SELECT
                   adgroup_id,
                   SUM(view_count) AS view_count,
                   SUM(click_count) AS click_count,
                   SUM(cost) AS cost,
                   SUM(convert_cnt) AS convert_cnt,
                   SUM(avg_show_cost) AS avg_show_cost,
                   SUM(avg_click_cost) AS avg_click_cost,
                   SUM(ctr) AS ctr,
                   SUM(convert_cost) AS convert_cost,
                   SUM(convert_rate) AS convert_rate
                 FROM bigdata_saasreport_projects_adgroup_hour h1
        WHERE stat_date BETWEEN '2022-05-24' AND '2022-05-26'
        GROUP BY adgroup_id
        ) h 
        ON m.ad_id=h.adgroup_id 
      LEFT JOIN media_campaign cam 
       ON m.campaign_id=cam.campaign_id 
      LEFT JOIN ( SELECT aa.task_id,aa.adid FROM(
                          SELECT *, 
                          JSON_UNQUOTE(JSON_EXTRACT(task_status_data, CONCAT('$."', api, '"'))) adid 
                          FROM tenant_jtnpct.mk_task_record, 
                          JSON_TABLE(apis, '$[*]' COLUMNS (rowid FOR ORDINALITY , api VARCHAR(100) PATH '$')) apis ) aa 
        WHERE aa.api='/ad/create') t ON m.ad_id=t.adid 
WHERE m.advertiser_id IN ( '22075143','22075142','22075141','22075140','22075138','22033241','21422817','19957653','19957651','19957646','19957642','19957640','24044093','24044092','24044091','24044090','24044088','24044086','24044085','24044084','24044083','24044082','23638767','23638766','23638764','23638762','23638760','23638758','23638755','23638752','23638749','23638744','22033339','22033337','19957649','19957647','24067537','24067536','24067534','24067532','24067531','24067519','24067518','24067516','24067514','24067513','24067512','24067511','24067509','24067508','24067507','24273760','24273759','24273758','24273757','24273756','24273753','24273752','24273751','24273750','24273749','24273061','24273060','24273059','24273058','24273057','24273054','24273053','24273050','24273048','24273047' )
   ORDER BY m.ad_create_time DESC 
LIMIT 0, 10

;

2. 在bigdata_media_ad_list,bigdata_saasreport_projects_adgroup_hour和media_campaign建立相关的索引,只关联这3个表发现查询效率并没有很多的提高。用explain查看,除了media_campaign这个表的where条件走了索引,其它的都不起作用。

3. 解决bigdata_saasreport_projects_adgroup_hour这个表的group by聚集行数慢的问题。

    单独查询下面的子查询:

    SELECT
                   adgroup_id,
                   SUM(view_count) AS view_count,
                   SUM(click_count) AS click_count,
                   SUM(cost) AS cost,
                   SUM(convert_cnt) AS convert_cnt,
                   SUM(avg_show_cost) AS avg_show_cost,
                   SUM(avg_click_cost) AS avg_click_cost,
                   SUM(ctr) AS ctr,
                   SUM(convert_cost) AS convert_cost,
                   SUM(convert_rate) AS convert_rate
                 FROM bigdata_saasreport_projects_adgroup_hour h1
        WHERE stat_date BETWEEN '2022-05-24' AND '2022-05-26'
        GROUP BY adgroup_id

     ;

     竟然花了0.6到0.8秒的时间。检查表的索引,发现在stat_date和adgroup_id上都单独有独立的索引,考虑到group by 的字段是adgroup_id, 把这个字段的索引改成覆盖索引,包括stat_date. 然后执行查询。一下子变成0.02秒。

     关联bigdata_media_ad_list,bigdata_saasreport_projects_adgroup_hour派生表和media_campaign,查询效果有所起色,一下子提高到了1.2秒。看来离目标迈进了一大步。

3. 当关联tenant_jtnpct.mk_task_record这个派生表时,结果查询时间又变成两到四秒钟。检查看下where条件中的adid是表mk_task_record中task_status_data字段提取出来的,key是“/ad/create”,看着这个/ad/create有两个不友好的字符/,简直崩溃,不能通过建立索引实现。那只能建立虚拟列和增加索引了。

ALTER TABLE mk_task_record ADD COLUMN (ad_create VARCHAR(32) GENERATED ALWAYS AS (task_status_data->'$."/ad/create"'));

CREATE INDEX idx_mk_task_record_02 ON mk_task_record(ad_create );

原来的关联变成:

SELECT 
    m.ad_create_time,
    mtr.task_id,
    m.media_id,
    m.ad_id,
    m.advertiser_id,
    m.ad_modify_time,
    m.ad_create_time,
    m.`status`,
    m.opt_status,
    m.delivery_range,
    m.inventory_type,
    m.open_url,
    m.bid,
    m.budget,
    m.budget_mode,
    m.smart_bid_type,
    m.start_time,
    m.end_time,pricing,
    m.flow_control_mode,
    m.campaign_id,
    m.download_url,
    m.hide_if_converted,
    IFNULL(h.view_count,0) AS `show`,
    IFNULL(h.click_count,0) AS `click`,
    IFNULL(h.cost,0) AS cost,
    IFNULL(h.convert_cnt,0) AS `convert`,
    IFNULL(h.avg_show_cost,0) AS avg_show_cost,
    IFNULL(h.avg_click_cost,0) AS avg_click_cost,
    IFNULL(h.ctr,0) AS ctr,
    IFNULL(h.convert_cost,0) AS convert_cost,
    IFNULL(h.convert_rate,0) AS convert_rate
    FROM bigdata_media_ad_list m 
    
    LEFT JOIN (
           SELECT SQL_BIG_RESULT
                   adgroup_id,
                   SUM(view_count) AS view_count,
                   SUM(click_count) AS click_count,
                   SUM(cost) AS cost,
                   SUM(convert_cnt) AS convert_cnt,
                   SUM(avg_show_cost) AS avg_show_cost,
                   SUM(avg_click_cost) AS avg_click_cost,
                   SUM(ctr) AS ctr,
                   SUM(convert_cost) AS convert_cost,
                   SUM(convert_rate) AS convert_rate
                 FROM bigdata_saasreport_projects_adgroup_hour h1
        WHERE stat_date BETWEEN '2022-05-24' AND '2022-05-26'
        GROUP BY adgroup_id
        ) h 
        ON m.ad_id=h.adgroup_id 
      LEFT JOIN media_campaign cam 
       ON m.campaign_id=cam.campaign_id 
      LEFT OUTER JOIN tenant_jtnpct.mk_task_record mtr
        ON m.ad_id = mtr.ad_create
WHERE m.advertiser_id IN ( '22075143','22075142','22075141','22075140','22075138','22033241','21422817','19957653','19957651','19957646','19957642','19957640','24044093','24044092','24044091','24044090','24044088','24044086','24044085','24044084','24044083','24044082','23638767','23638766','23638764','23638762','23638760','23638758','23638755','23638752','23638749','23638744','22033339','22033337','19957649','19957647','24067537','24067536','24067534','24067532','24067531','24067519','24067518','24067516','24067514','24067513','24067512','24067511','24067509','24067508','24067507','24273760','24273759','24273758','24273757','24273756','24273753','24273752','24273751','24273750','24273749','24273061','24273060','24273059','24273058','24273057','24273054','24273053','24273050','24273048','24273047' )
   ORDER BY m.ad_create_time DESC 
LIMIT 0, 10

;

执行查询,结果在1.0到1.2秒之间。

4. 上一步完成之后,已经接近目标。要达到目标必须解决ORDER BY和limit较慢的问题。这里采用重读表join方法。语句变成:

SELECT 
    m.ad_create_time,
    mtr.task_id,
    m.media_id,
    m.ad_id,
    m.advertiser_id,
    m.ad_modify_time,
    m.ad_create_time,
    m.`status`,
    m.opt_status,
    m.delivery_range,
    m.inventory_type,
    m.open_url,
    m.bid,
    m.budget,
    m.budget_mode,
    m.smart_bid_type,
    m.start_time,
    m.end_time,pricing,
    m.flow_control_mode,
    m.campaign_id,
    m.download_url,
    m.hide_if_converted,
    IFNULL(h.view_count,0) AS `show`,
    IFNULL(h.click_count,0) AS `click`,
    IFNULL(h.cost,0) AS cost,
    IFNULL(h.convert_cnt,0) AS `convert`,
    IFNULL(h.avg_show_cost,0) AS avg_show_cost,
    IFNULL(h.avg_click_cost,0) AS avg_click_cost,
    IFNULL(h.ctr,0) AS ctr,
    IFNULL(h.convert_cost,0) AS convert_cost,
    IFNULL(h.convert_rate,0) AS convert_rate
    FROM media_ad_list m 
    JOIN (SELECT id FROM media_ad_list 
          WHERE advertiser_id IN ( '22075143','22075142','22075141','22075140','22075138','22033241','21422817','19957653','19957651','19957646','19957642','19957640','24044093','24044092','24044091','24044090','24044088','24044086','24044085','24044084','24044083','24044082','23638767','23638766','23638764','23638762','23638760','23638758','23638755','23638752','23638749','23638744','22033339','22033337','19957649','19957647','24067537','24067536','24067534','24067532','24067531','24067519','24067518','24067516','24067514','24067513','24067512','24067511','24067509','24067508','24067507','24273760','24273759','24273758','24273757','24273756','24273753','24273752','24273751','24273750','24273749','24273061','24273060','24273059','24273058','24273057','24273054','24273053','24273050','24273048','24273047' )
           ORDER BY ad_create_time DESC 
           LIMIT 0, 10
          ) m1
      ON m.id = m1.id     
    LEFT JOIN (
           SELECT SQL_BIG_RESULT
                   adgroup_id,
                   SUM(view_count) AS view_count,
                   SUM(click_count) AS click_count,
                   SUM(cost) AS cost,
                   SUM(convert_cnt) AS convert_cnt,
                   SUM(avg_show_cost) AS avg_show_cost,
                   SUM(avg_click_cost) AS avg_click_cost,
                   SUM(ctr) AS ctr,
                   SUM(convert_cost) AS convert_cost,
                   SUM(convert_rate) AS convert_rate
                 FROM projects_adgroup_hour h1
        WHERE stat_date BETWEEN '2022-05-24' AND '2022-05-26'
        GROUP BY adgroup_id
        ) h 
        ON m.ad_id=h.adgroup_id 
      LEFT JOIN media_campaign cam 
       ON m.campaign_id=cam.campaign_id 
      LEFT OUTER JOIN task_record mtr
        ON m.ad_id = mtr.ad_create
WHERE m.advertiser_id IN ( '22075143','22075142','22075141','22075140','22075138','22033241','21422817','19957653','19957651','19957646','19957642','19957640','24044093','24044092','24044091','24044090','24044088','24044086','24044085','24044084','24044083','24044082','23638767','23638766','23638764','23638762','23638760','23638758','23638755','23638752','23638749','23638744','22033339','22033337','19957649','19957647','24067537','24067536','24067534','24067532','24067531','24067519','24067518','24067516','24067514','24067513','24067512','24067511','24067509','24067508','24067507','24273760','24273759','24273758','24273757','24273756','24273753','24273752','24273751','24273750','24273749','24273061','24273060','24273059','24273058','24273057','24273054','24273053','24273050','24273048','24273047' )
   ORDER BY m.ad_create_time DESC 
LIMIT 0, 10

执行查询。响应时间在0.5到0.7秒之间,这可宣布大功告成。其它的优化在后面迭代中慢慢在优化了。

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值