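In Teradata SQL, the clauses of a query are executed in the order WHERE -> JOIN -> GROUP BY -> QUALIFY -> SELECT. The goal of optimization is to minimize I/O, so it is important that each step filters out as many rows as possible.
The following example shows how QUALIFY affects a statement: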
CREATE VOLATILE MULTISET TABLE A10_tmp, NO LOG AS
-- get area_id
(
SELECT
    a.*
    ,b.area_id custom_area_id
FROM xx_asset_k a LEFT JOIN
    xxx.OFR_ASSET_MKT_HIST_a b
ON a.Asset_Row_Id = b.Asset_Row_Id
QUALIFY ROW_NUMBER() OVER (PARTITION BY b.Asset_Row_Id
                           ORDER BY Serv_Seq_Nbr DESC) = 1
) WITH DATA
PRIMARY INDEX (Asset_Row_Id)
ON COMMIT PRESERVE ROWS;
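The QUALIFY clause in this statement references only one table, xxx.OFR_ASSET_MKT_HIST_a, and each asset_row_id has anywhere from 1 to more than 10 Serv_Seq_Nbr values. Because the optimizer joins the two tables first, the join has to process many duplicate rows and produces a larger intermediate result set, which QUALIFY only trims afterwards.
So to optimize this statement, it is enough to change the execution plan so that the optimizer deduplicates table b before the join.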
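One way to get that plan, sketched here as an illustration rather than as the author's exact rewrite, is to move the QUALIFY into a derived table over b, so that the join sees at most one row per Asset_Row_Id. The sketch assumes that Serv_Seq_Nbr and area_id are both columns of xxx.OFR_ASSET_MKT_HIST_a, which matches the statement above that QUALIFY references only that table.

```sql
CREATE VOLATILE MULTISET TABLE A10_tmp, NO LOG AS
(
SELECT
    a.*
    ,b.area_id AS custom_area_id
FROM xx_asset_k a
LEFT JOIN
(
    -- Deduplicate b first: keep only the row with the highest
    -- Serv_Seq_Nbr for each Asset_Row_Id.
    SELECT Asset_Row_Id, area_id
    FROM xxx.OFR_ASSET_MKT_HIST_a
    QUALIFY ROW_NUMBER() OVER (PARTITION BY Asset_Row_Id
                               ORDER BY Serv_Seq_Nbr DESC) = 1
) b
ON a.Asset_Row_Id = b.Asset_Row_Id
) WITH DATA
PRIMARY INDEX (Asset_Row_Id)
ON COMMIT PRESERVE ROWS;
```

The two forms are not strictly equivalent. In the original, every row of a without a match in b has a NULL b.Asset_Row_Id, so those rows fall into a single partition and only one of them survives the QUALIFY. In this sketch, all unmatched rows of a are kept, as are any repeated Asset_Row_Id values in a. Prefixing each statement with EXPLAIN is a reasonable way to check whether the plan actually changed.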
SELECT prodid, SUM(sales) AS sumsales,
RANK( ) OVER (ORDER BY sumsales DESC)
AS "Ranking"
FROM salestbl
GROUP BY 1
QUALIFY Ranking <= 3;