需求:
小表数据量20w条左右,大表数据量在4kw条左右,需要根据大表筛选出150w条左右的数据并关联更新小表中5k左右的数据。
性能问题:
对筛选条件中涉及的字段加index后,如下常规的update语句仍耗时半小时左右。
UPDATEWMOCDCREPORT.DM_WM_TRADINGALL ASET(
A.RELATIONSHIPNO,
A.PACKAGE
)=(SELECTB.RELATIONSHIPNO,CASE
WHEN (B.SEGMENTCODE='52'
OR B.SEGMENTCODE ='55'
OR B.SEGMENTCODE ='56'
OR B.SEGMENTCODE ='59')THEN 'BC'
WHEN (B.SEGMENTCODE='66')THEN 'PW'
WHEN (B.SEGMENTCODE='60')THEN 'MM'
WHEN (B.SEGMENTCODE='65')THEN 'EB'
WHEN (B.SEGMENTCODE='61')THEN 'PB'
ELSEB.SEGMENTCODEEND
FROMDATACORE.DF_CUST_HISTORY BWHERE B.ACCOUNT_NO=A.SETTLEMENTACCOUNTAND B.DATA_DATE = '2018-11-30'
AND rownum = 1)WHERE A.MONTH = 'SEP'
AND A.DATA_DATE = '2018-09-30'
AND EXISTS(SELECT 1 FROMDATACORE.DF_CUST_HISTORY CWHERE C.ACCOUNT_NO=A.SETTLEMENTACCOUNTAND C.DATA_DATE = '2018-11-30');
经过数次搜索,发现同关联更新有关的技术博客基本上是更新大表数据,比如here.(使用批量更新)。
也分析过执行计划,同预想的性能瓶颈一样,主要由以下两个方面造成
(1) DATACORE.DF_CUST_HISTORY数据量太大,本想将某一天的数据select出来提前插入到一张表中,但估计效果不会太明显,因为插入150w条数据本身也会耗时很长。
(2) 需要更新5k条数据,且每条数据需要到150w条数据中做关联查询(时间主要耗在这)。
性能优化:
小表5k,大表150w,理所应当想到采用join的方式并保留小表中的数据。接下来是怎么把join后的数据更新到小表中(不用update)?merge into!
这里还涉及到一个小问题,merge into中的on条件需要保证一一对应,而大表中很可能出现重复的ACCOUNT_NO,所以需要排重,怎么做?用partition by !
优化后的sql(运行时间8-10s):
merge intowmocdcreport.dm_wm_tradingall a
using (selectt.rid,
t.settlementaccount,
tx.relationshipno,case
when (tx.segmentcode = '52' or tx.segmentcode = '55' ortx.segmentcode= '56' or tx.segmentcode = '59') then
'BC'
when (tx.segmentcode = '66') then
'PW'
when (tx.segmentcode = '60') then
'MM'
when (tx.segmentcode = '65') then
'EB'
when (tx.segmentcode = '61') then
'PB'
elsetx.segmentcodeend aspackagefrom(selectrowid rid,
dwt.settlementaccountfromwmocdcreport.dm_wm_tradingall dwtwhere dwt.month = 'SEP'
and dwt.data_date = '2018-09-30') tinner join(select row_number() over (partition by c.account_no order byc.relationshipno) seq,
c.account_no,
c.relationshipno,
c.segmentcodefromdatacore.df_cust_history cwhere c.data_date = '2018-11-30') txon tx.account_no = t.settlementaccount and tx.seq = 1) bon (a.rowid =b.rid)when matched then
update set a.relationshipno =b.relationshipno,
a.package= b.package;