Performance Tuning - NO.2 Execution Plan

http://www.codeproject.com/KB/database/RefactorTSQLs.aspx
How to analyze and identify the scope for improvement in your TSQLs?  

In an ideal world, you prevent diseases rather than cure them. In reality, though, you can't always prevent them. Your team may be composed of brilliant professionals and have a good review process, yet bad code still gets written and poor designs still slip through. Why? Because no matter how advanced your technology is, the client's requirements will always be a step ahead; that is a universal truth in software development. As a result, designing, developing and delivering a system that meets those requirements will always be a challenging job.

So, it is equally important to know how to cure: you need to know how to troubleshoot a performance problem after it happens. You need to learn how to analyze your TSQLs, identify the bottlenecks and re-factor them to resolve the problem. There are numerous ways to troubleshoot database and TSQL performance problems, but at the most basic level you have to understand and review the execution plans of the TSQLs you are analyzing.

Understanding the query execution plan  

Whenever you issue a SQL statement to the SQL Server engine, SQL Server first has to determine the best possible way to execute it. To do this, the query optimizer (the component that generates the execution plan before the query runs) uses information such as data distribution statistics, index structures and metadata to evaluate several candidate execution plans, and finally selects the one that is likely to perform best in most cases.

Did you know? You can use SQL Server Management Studio to preview and analyze the estimated execution plan for the query you are about to issue. After writing the SQL in Management Studio, click the estimated execution plan icon (see below) to see the execution plan before actually executing the query.

(Note: Alternatively, you can switch the "Include Actual Execution Plan" option on before executing the query. If you do this, Management Studio will include the actual execution plan along with the result set in the results window.)
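If you prefer typing T-SQL over clicking toolbar buttons, the same plans can also be requested through session options. A minimal sketch (the sample query is just a placeholder):

-- Estimated plan: returns the XML plan instead of executing the query
SET SHOWPLAN_XML ON;
GO
SELECT * FROM HumanResources.Employee WHERE Gender = 'M';
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: executes the query and returns the XML plan with the results
SET STATISTICS XML ON;
GO
SELECT * FROM HumanResources.Employee WHERE Gender = 'M';
GO
SET STATISTICS XML OFF;
GO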

Estimated_execution_plan.jpg

Figure: Estimated execution plan in Management Studio

Understanding the query execution plan in detail

Each icon in the execution plan graph represents one action item (Operator) in the plan. The execution plan has to be read from right to left and each action item has a percentage of cost relative to the total execution cost of the query (100%).

In the above execution plan graph, the right-most icon represents a "Clustered Index Scan" operation (reading all rows via the clustered index) on the HumanResources.Employee table, which accounts for 100% of the total query execution cost, and the left-most icon represents the SELECT operation, which accounts for roughly 0% of the total cost.

Following are the important icons and their corresponding operators you are going to see frequently in the graphical query execution plans:

QueryPlanOperators.JPG

(Each icon in the graphical execution plan represents a particular action item in the query. For a complete list of the icons and their corresponding action item, go to http://technet.microsoft.com/en-us/library/ms175913.aspx )

Note the "Query cost" in the execution plan above. It is 100% relative to the batch, which means this particular query accounts for 100% of the cost simply because it is the only query in the batch. If multiple queries were executed together in the query window, each query would show its own percentage of the total cost (less than 100%).
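For example, putting two queries in one batch (both tables are from the AdventureWorks sample database) produces two plans in the results window, each showing its own share of the total batch cost:

SELECT * FROM HumanResources.Employee WHERE Gender = 'M';
SELECT * FROM HumanResources.Department;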

To see more detail about a particular action item in the query plan, hover the mouse pointer over its icon. You will see a window that looks like the following:

Query_plan_info.jpg

This window provides detailed estimated information about a particular operator in the execution plan. The window above shows the estimated details for the Clustered Index Scan that looks for the rows with Gender = 'M' in the HumanResources.Employee table of the AdventureWorks database. It also shows the estimated I/O cost, CPU cost, number of rows, row size and other costs that the optimizer uses to compare this plan with the other candidate plans when selecting the optimal one.

I found an article that can help you further understand and analyze TSQL execution plans in detail. You can take a look at it here: http://www.simple-talk.com/sql/performance/execution-plan-basics/

What information do we get by viewing the execution plans?  

Whenever one of your queries performs slowly, you can view the estimated (and, if required, actual) execution plan and identify the item that takes the largest share of the cost (in terms of percentage). When you start reviewing a TSQL for optimization, in most cases the first thing you will want to do is view its execution plan; it usually lets you quickly identify the part of the SQL that is creating the bottleneck.

Watch for the following costly operators in the execution plan of your query. If you find any of them, you probably have a problem in your TSQL and should re-factor it to improve performance (a couple of the typical fixes are sketched right after this list).

Table Scan: Occurs when the corresponding table does not have a clustered index. In most cases, creating a clustered index or defragmenting the existing indexes will get rid of it.

Clustered Index Scan: Sometimes considered equivalent to a Table Scan. Takes place when no non-clustered index is available on an eligible column. In most cases, creating a non-clustered index will get rid of it.

Hash Join: The most expensive join methodology. It takes place when the joining columns between two tables are not indexed. Creating indexes on those columns will get rid of it.

Nested Loops: In most cases this happens when a non-clustered index does not include (cover) a column used in the SELECT list. For each row found through the non-clustered index, the database server has to seek into the clustered index to retrieve the other column values in the SELECT list. Creating a covering index will get rid of it.

RID Lookup: Takes place when you have a non-clustered index but the same table has no clustered index. The database engine has to look up the actual row using the row ID, which is an expensive operation. Creating a clustered index on the corresponding table will get rid of it.
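As a minimal sketch of two of those fixes (the table and index names below are hypothetical, not taken from this article):

-- Remove a Table Scan / RID Lookup by giving the heap a clustered index
CREATE CLUSTERED INDEX IX_SalesLog_SalesLogID
    ON dbo.SalesLog (SalesLogID);

-- Avoid a Hash Join on an unindexed join column by indexing that column
CREATE NONCLUSTERED INDEX IX_SalesLog_CustomerID
    ON dbo.SalesLog (CustomerID);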

TSQL Refactoring: A real-life story

Knowledge becomes valuable only when applied to solve real-life problems. No matter how knowledgeable you are, you need to apply that knowledge effectively in order to solve your problems.

Let's read a real-life story. In this story, Mr. Tom is one of the members of the development team that built the application we mentioned earlier.

When we started our optimization mission in the data access routines (TSQLs) of our application, we identified a stored procedure that was performing well below the expected level. It was taking more than 50 seconds to process and retrieve sales data for particular sales items over a date range in the production database. This is how the stored procedure was invoked to retrieve sales data for 'Caps' for the year 2009:

exec uspGetSalesInfoForDateRange '1/1/2009', '31/12/2009', 'Cap'

Accordingly, Mr. Tom was assigned to optimize the Stored Procedure.

Following is a stored procedure that is fairly close to the original one (I can't include the original stored procedure for proprietary reasons, as you might guess).

ALTER PROCEDURE uspGetSalesInfoForDateRange
    @startYear DateTime,
    @endYear DateTime,
    @keyword nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        Name,
        ProductNumber,
        ProductRates.CurrentProductRate Rate,
        ProductRates.CurrentDiscount Discount,
        OrderQty Qty,
        dbo.ufnGetLineTotal(SalesOrderDetailID) Total,
        OrderDate,
        DetailedDescription
    FROM
        Products INNER JOIN OrderDetails
            ON Products.ProductID = OrderDetails.ProductID
        INNER JOIN Orders
            ON Orders.SalesOrderID = OrderDetails.SalesOrderID
        INNER JOIN ProductRates
            ON Products.ProductID = ProductRates.ProductID
    WHERE
        OrderDate BETWEEN @startYear AND @endYear
        AND
        (
            ProductName LIKE '' + @keyword + ' %' OR
            ProductName LIKE '% ' + @keyword + ' ' + '%' OR
            ProductName LIKE '% ' + @keyword + '%' OR
            Keyword LIKE '' + @keyword + ' %' OR
            Keyword LIKE '% ' + @keyword + ' ' + '%' OR
            Keyword LIKE '% ' + @keyword + '%'
        )
    ORDER BY
        ProductName
END
GO






Analyzing the indexes  

As a first step, Mr. Tom wanted to review the indexes on the tables being queried in the stored procedure. He had a quick look at the query and identified the columns that the tables should have indexes on (for example, columns used in the joins, in the WHERE conditions and in the ORDER BY clause). He immediately found that several indexes were missing. For example, indexes on the following two columns were missing:

OrderDetails.ProductID  

OrderDetails.SalesOrderID  
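A minimal sketch of those index definitions (the index names are my assumptions; the article does not show them):

CREATE NONCLUSTERED INDEX IX_OrderDetails_ProductID
    ON OrderDetails (ProductID);

CREATE NONCLUSTERED INDEX IX_OrderDetails_SalesOrderID
    ON OrderDetails (SalesOrderID);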

He created non-clustered indexes on those two columns (along the lines of the sketch above) and executed the stored procedure as follows:

exec uspGetSalesInfoForDateRange '1/1/2009', '31/12/2009', 'Cap' with recompile

The stored procedure's performance improved, but it was still below the expected level (35 seconds). (Note the WITH RECOMPILE clause. It forces the SQL Server engine to recompile the stored procedure and regenerate the execution plan so that it can take advantage of the newly built indexes.)

Analyzing the query execution plan  

Mr. Tom's next step was to look at the execution plan in SQL Server Management Studio. He did this by writing the 'exec' statement for the stored procedure in the query window and viewing the "Estimated execution plan". (The execution plan is not included here, as it is quite large and would not fit on screen.)

Analyzing the execution plan, he identified some important scopes for improvement:

  • A table scan was taking place on one of the tables while executing the query, even though the table had proper indexes. The table scan was taking 30% of the overall query execution time.
  • A "Nested loop join" (one of the three kinds of join implementation) was occurring to fetch a column, used in the SELECT list, from one of the tables.

Being curious about the table scan issue, Mr. Tom wanted to know whether any index fragmentation had taken place (because all indexes were properly implemented). He ran a TSQL that reports index fragmentation information for the database (he collected it from a CodeProject article on data access optimization) and was surprised to see that two of the existing indexes (in the tables used by the stored procedure's TSQL) were fragmented badly enough to be responsible for the table scan operation. He immediately defragmented those two indexes, the table scan disappeared, and the stored procedure now took 25 seconds to execute.
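A sketch of that kind of fragmentation check and fix, using the standard dynamic management function (the index and table names are placeholders):

-- Report fragmentation for every index in the current database
SELECT
    OBJECT_NAME(ips.object_id)       AS TableName,
    i.name                           AS IndexName,
    ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
INNER JOIN sys.indexes i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Defragment a badly fragmented index (use REORGANIZE for light fragmentation)
ALTER INDEX IX_OrderDetails_ProductID ON OrderDetails REBUILD;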

In order to get rid of the "Nested loop join", he implemented a covering index on the corresponding table, including the column from the SELECT list. As a result, when selecting that column, the database engine could retrieve its value directly from the non-clustered index pages. This brought the query execution time down to 23 seconds.
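Such a covering index would look something like the sketch below; the included columns are assumptions, since the article does not name the column that was being fetched through the nested loop:

-- INCLUDE keeps Name and ProductNumber in the index leaf pages,
-- so no extra lookup into the clustered index is needed for the SELECT list
CREATE NONCLUSTERED INDEX IX_Products_ProductID_Covering
    ON Products (ProductID)
    INCLUDE (Name, ProductNumber);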

Implementing some best practices   

Mr. Tom now decided to look for any code in the stored procedure that did not conform to best practices. Following are the changes he made to implement some of those practices:

Getting rid of the “Procedural code”   

Mr. Tom identified that a UDF, ufnGetLineTotal(SalesOrderDetailID), was being executed for each row in the result set. The UDF simply executed another TSQL using the supplied parameter value and returned a scalar value. Following was the UDF definition:

 

ALTER FUNCTION [dbo].[ufnGetLineTotal]
(
    @SalesOrderDetailID int
)
RETURNS money
AS
BEGIN
    DECLARE @CurrentProductRate money
    DECLARE @CurrentDiscount money
    DECLARE @Qty int

    SELECT
        @CurrentProductRate = ProductRates.CurrentProductRate,
        @CurrentDiscount = ProductRates.CurrentDiscount,
        @Qty = OrderQty
    FROM
        ProductRates INNER JOIN OrderDetails
            ON OrderDetails.ProductID = ProductRates.ProductID
    WHERE
        OrderDetails.SalesOrderDetailID = @SalesOrderDetailID

    RETURN (@CurrentProductRate - @CurrentDiscount) * @Qty
END






This looked like a "procedural approach" to calculating the order total, so Mr. Tom decided to implement the UDF's TSQL as inline SQL in the original query. Following is the simple change he had to make in the stored procedure:

-- Old code
dbo.ufnGetLineTotal(SalesOrderDetailID) Total

-- New code
(CurrentProductRate - CurrentDiscount) * OrderQty Total






Immediately after executing the query, Mr. Tom found that it now took 14 seconds to run.

Getting rid of the unnecessary Text column in the SELECT list 

Exploring further optimization scopes, Mr. Tom decided to look at the column types in the SELECT list. He soon discovered that a Text column (Products.DetailedDescription) was included in the SELECT list. Reviewing the application code, he found that this column's values were not processed by the application immediately: a few columns from the result set were displayed in a listing page, and only when the user clicked a particular item in the list did a detail page appear containing the Text column value.

Excluding that Text column from the SELECT list dramatically reduced the query execution time from 14 seconds to 6 seconds! So Mr. Tom decided to apply a "lazy loading" strategy: the Text column is loaded by a separate stored procedure that accepts an "ID" parameter and selects only the Text column value. After implementing it, he found that the new stored procedure executed in a reasonable amount of time when the user opened the detail page for an item in the list. He also converted the two Text columns to VARCHAR(MAX) columns, which let him use the LEN() function on one of them in TSQLs elsewhere (that also saved some query execution time, because the earlier version of the code calculated the length with LEN(CAST(Text_Column AS VARCHAR(8000)))).
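A minimal sketch of that change; the procedure name is hypothetical, and the conversion assumes the columns live on the Products table as the story suggests:

-- Convert the Text column so functions such as LEN() can be used on it directly
ALTER TABLE Products ALTER COLUMN DetailedDescription VARCHAR(MAX);

-- Lazy-load the large column only when the detail page actually needs it
CREATE PROCEDURE uspGetProductDetailedDescription
    @ProductID int
AS
BEGIN
    SET NOCOUNT ON;
    SELECT DetailedDescription
    FROM Products
    WHERE ProductID = @ProductID;
END
GO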

Optimizing further: Process of elimination

What's next? All the optimization steps so far had reduced the execution time to 6 seconds. Compared to the 50 seconds before optimization, that is a big achievement. But Mr. Tom thought the query might still have scope for improvement. Reviewing the TSQL, he did not find any significant option left, so he indented and re-arranged it (so that each individual condition, say Product.ProductID = OrderDetail.ProductID, sat on its own line) and started executing the stored procedure again and again, commenting out each line he suspected of having room for improvement.

Surprise! Surprise! The TSQL had some LIKE conditions (the actual stored procedure basically performed a keyword search over some tables) matching several patterns against some column values. When he commented out the LIKE conditions, the stored procedure execution time suddenly dropped below 1 second. Wow!

It appeared that, with all the other optimizations done, the LIKE searches were taking most of the remaining time in the TSQL. After looking carefully at the LIKE conditions, Mr. Tom was pretty sure that the LIKE-based search could easily be implemented using Full-Text Search. Two columns needed to be full-text enabled: ProductName and Keyword.

It took him just 5 minutes to implement FTS (creating the full-text catalog, enabling full-text indexing on the two columns and replacing the LIKE clauses with the FREETEXT function), and the query now executed within a stunning 1 second!
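A sketch of that kind of change, assuming both columns live on the Products table and that the table has a unique index named PK_Products to serve as the full-text key (both are assumptions; the article does not show the schema):

CREATE FULLTEXT CATALOG ProductSearchCatalog;
GO

CREATE FULLTEXT INDEX ON Products (ProductName, Keyword)
    KEY INDEX PK_Products
    ON ProductSearchCatalog;
GO

-- In the stored procedure, the block of LIKE conditions becomes:
--   AND (FREETEXT(ProductName, @keyword) OR FREETEXT(Keyword, @keyword))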

Great achievement, isn’t it? 

 

 

 

Additional

 

http://www.simple-talk.com/sql/performance/execution-plan-basics/

Execution Plan Formats

SQL Server offers only one type of execution plan (be it estimated or actual), but three different formats in which to view that execution plan.

· Graphical Plans

· Text Plans

· XML Plans

The one you choose will depend on the level of detail you want to see, and on the individual DBA's preferences and methods.

Graphical Plans

These are quick and easy to read but the detailed data for the plan is masked. Both Estimated and Actual execution plans can be viewed in graphical format.

Text Plans

These are a bit harder to read, but more information is immediately available. There are three text plan formats:

· SHOWPLAN_ALL: a reasonably complete set of data showing the Estimated execution plan for the query

· SHOWPLAN_TEXT: provides a very limited set of data, for use with tools like osql.exe. It too only shows the Estimated execution plan

· STATISTICS PROFILE: similar to SHOWPLAN_ALL except it represents the data for the Actual execution plan
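These text formats are turned on as session options before running the query; for example (the sample query is again just a placeholder):

-- Estimated plan as text rows, without executing the query
SET SHOWPLAN_ALL ON;
GO
SELECT * FROM HumanResources.Employee WHERE Gender = 'M';
GO
SET SHOWPLAN_ALL OFF;
GO

-- Actual plan data, collected while the query executes
SET STATISTICS PROFILE ON;
GO
SELECT * FROM HumanResources.Employee WHERE Gender = 'M';
GO
SET STATISTICS PROFILE OFF;
GO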

XML Plans

XML plans present the most complete set of data available on a plan, all on display in the structured XML format. There are two varieties of XML plan:

· SHOWPLAN_XML: The plan generated by the optimizer prior to execution.

· STATISTICS_XML: The XML format of the Actual execution plan.

 

....

Saving XML Plans as Graphical Plans

You can save the execution plan without opening it by right-clicking within the results and selecting "Save As." You then have to change the filter to "*.*" and, when you type the name of the file you want to save, add the extension ".sqlplan". This is how Books Online recommends saving an XML execution plan. In fact, what you get when you save it this way is actually a graphical execution plan file. This can be a very useful feature: for example, you might collect multiple plans in XML format, save them to files, and then open them in an easy-to-view (and compare) graphical format.

One of the benefits of extracting an XML plan and saving it as a separate file is that you can share it with others. For example, you can send the XML plan of a slow-running query to a DBA friend and ask them their opinion on how to rewrite the query. Once the friend receives the XML plan, they can open it up in Management Studio and review it as a graphical execution plan.

In order to actually save an XML plan as XML, you need to first open the results in the XML window. If you attempt to save to XML directly from the results window, you only get what is displayed there. Another option is to go to the location where the plan is stored, as defined above, and copy it.
