Eliminate the Use of Temporary Tables For HUGE Performance Gains

envykok

http://www.sql-server-performance.com/articles/per/derived_temp_tables_p2.aspx

By: Justin Gunther
May 30, 2002

As queries become more complex, temporary tables are used more and more. While temporary tables may sometimes be unavoidable, they can often be sidestepped by using derived tables instead. In brief, a derived table is the result of using another SELECT statement in the FROM clause of a SELECT statement. By using derived tables instead of temporary tables, we can often boost our application's performance. Let's find out more.

How the Use of Temporary Tables Affects Performance

Temporary tables can slow performance dramatically. The problem with temporary tables is the overhead that comes with using them. To get the fastest queries possible, our goal must be to make them do as little work as possible. With a plain SELECT statement, SQL Server reads the data (from disk, or from memory if it is already cached) and returns it. Temporary tables require the system to do much more.

A piece of Transact-SQL code that uses a temporary table will usually:

1) CREATE the temporary table
2) INSERT data into the newly created table
3) SELECT data from the temporary table (usually by JOINing to other physical tables), possibly while holding locks in tempdb until the transaction has completed.
4) DROP the temporary table

This represents a lot of disk activity, along with the potential for contention problems. And all of this adds up to poor performance.
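The four steps above can be traced in a small, self-contained sketch. It is an illustration under stated assumptions, not SQL Server code: SQLite (through Python's built-in sqlite3 module) stands in for SQL Server, and SQLite's CREATE TEMP TABLE, which places the table in a separate "temp" schema, plays roughly the role of a #-prefixed table in tempdb. The two category rows and the helper name temp_table_lifecycle are made up for this example, and nothing here measures SQL Server's locking or disk I/O.

```python
import sqlite3


def temp_table_lifecycle():
    """Walk a temporary table through CREATE, INSERT, SELECT and DROP.

    Returns the temp-schema tables seen after CREATE, the joined result,
    and the temp-schema tables seen after DROP.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)")
    # Hypothetical sample rows
    conn.executemany("INSERT INTO Categories VALUES (?, ?)",
                     [(1, "Beverages"), (2, "Condiments")])

    def temp_tables():
        # sqlite_temp_master lists the objects in SQLite's separate temp schema
        cur = conn.execute("SELECT name FROM sqlite_temp_master WHERE type = 'table'")
        return [name for (name,) in cur]

    # 1) CREATE the temporary table (SQLite uses TEMP instead of a # prefix)
    conn.execute("CREATE TEMP TABLE Temp_Example (CategoryID INTEGER NOT NULL)")
    after_create = temp_tables()

    # 2) INSERT data into the newly created table
    conn.execute("INSERT INTO Temp_Example SELECT CategoryID FROM Categories")

    # 3) SELECT from the temporary table, joined to a permanent table
    cur = conn.execute("SELECT C.CategoryName FROM Categories C "
                       "JOIN Temp_Example T ON C.CategoryID = T.CategoryID "
                       "ORDER BY C.CategoryName")
    joined = [name for (name,) in cur]

    # 4) DROP the temporary table
    conn.execute("DROP TABLE Temp_Example")
    after_drop = temp_tables()

    conn.close()
    return after_create, joined, after_drop


print(temp_table_lifecycle())
```

Every one of those steps is a separate statement the engine has to execute. A derived table folds the same work into the single SELECT that needs it.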

Eliminate A Few Steps!

The biggest benefit of using derived tables instead of temporary tables is that they require fewer steps, and the work can often stay in memory instead of being split between memory and disk. Fewer steps and less I/O mean faster performance.

Here are the steps when you use a temporary table:

1) Acquire locks in tempdb
2) CREATE the temporary table (write activity)
3) SELECT data & INSERT data (read & write activity)
4) SELECT data from temporary table and permanent table(s) (read activity)
5) DROP TABLE (write activity)
6) Release the locks

Compare the above to the number of steps it takes for a derived table:

1) Acquire locks, unless the READ UNCOMMITTED isolation level is used
2) SELECT data (read activity)
3) Release the locks

As this comparison shows, using derived tables instead of temporary tables reduces disk I/O and can boost performance. Now let's see how to use them.
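One rough way to see the difference in step counts is to log every statement the database engine starts. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for SQL Server. The schema is a simplified, hypothetical two-table sample, and count_statements is a made-up helper name. Connection.set_trace_callback records each statement as it begins. The sketch counts statements only; it says nothing about locks or physical I/O, which SQLite and SQL Server handle very differently.

```python
import sqlite3


def count_statements():
    """Return (statements run by the temp-table approach,
    statements run by the derived-table approach,
    whether both approaches returned the same rows)."""
    # isolation_level=None means autocommit, so the module adds no implicit BEGIN
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT);
        CREATE TABLE Products (ProductID INTEGER PRIMARY KEY,
                               ProductName TEXT, CategoryID INTEGER);
        INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Condiments');
        INSERT INTO Products VALUES (1, 'Chang', 1), (2, 'Gumbo Mix', 2);
    """)

    log = []
    conn.set_trace_callback(log.append)  # record each statement as it starts

    # Temporary-table approach: CREATE, INSERT, SELECT, DROP
    conn.execute("CREATE TEMP TABLE Temp_Example (CategoryID INTEGER, Category_Count INTEGER)")
    conn.execute("INSERT INTO Temp_Example "
                 "SELECT CategoryID, COUNT(*) FROM Products GROUP BY CategoryID")
    temp_rows = conn.execute(
        "SELECT P.ProductName, T.Category_Count FROM Products P "
        "JOIN Temp_Example T ON P.CategoryID = T.CategoryID "
        "ORDER BY P.ProductName").fetchall()
    conn.execute("DROP TABLE Temp_Example")
    temp_statements = len(log)

    # Derived-table approach: a single SELECT
    log.clear()
    derived_rows = conn.execute(
        "SELECT P.ProductName, CT.Category_Count FROM Products P "
        "JOIN (SELECT CategoryID, COUNT(*) AS Category_Count "
        "      FROM Products GROUP BY CategoryID) CT "
        "ON P.CategoryID = CT.CategoryID "
        "ORDER BY P.ProductName").fetchall()
    derived_statements = len(log)

    conn.set_trace_callback(None)
    conn.close()
    return temp_statements, derived_statements, temp_rows == derived_rows


print(count_statements())
```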

Using Derived Tables

Derived tables are essentially SELECT statements within SELECT statements. Let's start with a very simple query that SELECTs data from a table:

USE northwind

GO

SELECT * FROM categories

Now, instead of selecting data from the categories table, let’s select our data from a derived table. For example:

USE northwind

GO

SELECT * FROM (SELECT * FROM categories) dt_categories

This is all there is to derived tables. Remember, a derived table is just the result of using a SELECT statement in the FROM clause of another SELECT statement. Simply put the inner query in parentheses and add a table name (an alias, such as dt_categories) after the closing parenthesis.

Both of the above examples produce exactly the same results. This example is only meant to show what a derived table is and how easy one is to create. In the real world, if you needed a query like the one above, you would of course use the simpler of the two. But if your query is complex, consider using a derived table instead of a temporary table, as the following real-world example shows.
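To check the claim that both queries return the same rows, here is a small sketch that runs them side by side. SQLite, through Python's built-in sqlite3 module, stands in for SQL Server. The three category rows are an assumed sample rather than the full Northwind data, and compare_simple_queries is a hypothetical helper name. An ORDER BY is added to both queries because neither engine guarantees row order without one. One difference worth knowing: SQL Server requires the alias after the closing parenthesis, while SQLite also accepts a derived table without one.

```python
import sqlite3


def compare_simple_queries():
    """Run the plain query and the derived-table query; return both results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)")
    # Hypothetical sample rows; the real Northwind table has more
    conn.executemany("INSERT INTO Categories VALUES (?, ?)",
                     [(1, "Beverages"), (2, "Condiments"), (3, "Confections")])

    plain = conn.execute("SELECT * FROM Categories ORDER BY CategoryID").fetchall()
    # Same query wrapped as a derived table named dt_categories
    derived = conn.execute(
        "SELECT * FROM (SELECT * FROM Categories) dt_categories "
        "ORDER BY CategoryID").fetchall()

    conn.close()
    return plain, derived


plain, derived = compare_simple_queries()
print(plain == derived)
```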

An Example: Rewriting A Stored Procedure Using Temp Tables

For this example, we will use the Northwind database. We need a listing of categories and products, with a column stating how many products are in each category. This is based on a real case where I needed the count of products within each category; for the curious, I used it to generate JavaScript dynamically that populated a second listbox on the fly.

The desired output looks something like this:

Category Name	Product Name	Category Count
Beverages	Outback Lager	2
Beverages	Chang	2
Condiments	Aniseed Syrup	3
Condiments	Cajun Seasoning	3
Condiments	Gumbo Mix	3

This result says that the Beverages category contains two products.
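The Category Count column is simply the number of products that share a row's category. A plain-Python sketch over only the five sample rows in the table above (not the full Northwind data) makes the calculation explicit before it is expressed in SQL:

```python
from collections import Counter

# The five sample rows from the table above: (category name, product name)
rows = [
    ("Beverages", "Outback Lager"),
    ("Beverages", "Chang"),
    ("Condiments", "Aniseed Syrup"),
    ("Condiments", "Cajun Seasoning"),
    ("Condiments", "Gumbo Mix"),
]

# First pass: count products per category (the GROUP BY ... COUNT(*) part)
category_count = Counter(category for category, _ in rows)

# Second pass: attach that count to every row (the JOIN part)
report = [(category, product, category_count[category]) for category, product in rows]

for line in report:
    print(line)
```

The two passes mirror the structure of both queries that follow: one aggregate that counts per category, then a join that attaches the count to each product row.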

The first query below produces the required results, but it is slow because it uses a temporary table.

SELECT GETDATE()

GO

-- CREATE OUR TEMPORARY TABLE

CREATE TABLE #Temp_Example (
[CategoryID] INT NOT NULL,
[Category_Count] INT NOT NULL
)

-- INSERT THE VALUES WE LATER NEED INTO THE TEMP TABLE

INSERT INTO #Temp_Example (CategoryID, Category_Count)
SELECT C.CategoryID, COUNT(*) AS Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
GROUP BY C.CategoryID, C.CategoryName

-- JOIN ON THE TEMP TABLE TO GET OUR VALUES

SELECT C.CategoryID, C.CategoryName, P.ProductName, P.UnitPrice, #Temp_Example.Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
INNER JOIN #Temp_Example ON C.CategoryID = #Temp_Example.CategoryID
ORDER BY C.CategoryName

-- DROP TEMPORARY TABLE

DROP TABLE #Temp_Example

GO

SELECT GETDATE()
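For readers without a SQL Server instance, here is a rough port of the temporary-table version, running on SQLite through Python's sqlite3 module. Several things are assumptions rather than part of the original: SQLite's CREATE TEMP TABLE replaces the #Temp_Example naming convention, the sample data is just the five rows from the desired-output table, UnitPrice and CategoryID are left out of the output because the sample has no prices, and the GETDATE() timing lines are dropped. ProductName is added to the ORDER BY so the output order is deterministic. The helper names make_db and temp_table_report are made up.

```python
import sqlite3

# Hypothetical sample data: the five rows from the desired-output table
SAMPLE_CATEGORIES = [(1, "Beverages"), (2, "Condiments")]
SAMPLE_PRODUCTS = [  # (ProductID, ProductName, CategoryID)
    (1, "Outback Lager", 1), (2, "Chang", 1),
    (3, "Aniseed Syrup", 2), (4, "Cajun Seasoning", 2), (5, "Gumbo Mix", 2),
]


def make_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)")
    conn.execute("CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, "
                 "ProductName TEXT, CategoryID INTEGER)")
    conn.executemany("INSERT INTO Categories VALUES (?, ?)", SAMPLE_CATEGORIES)
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", SAMPLE_PRODUCTS)
    return conn


def temp_table_report(conn):
    """Temporary-table version: CREATE, INSERT, SELECT with JOIN, DROP."""
    # CREATE our temporary table
    conn.execute("CREATE TEMP TABLE Temp_Example ("
                 "CategoryID INTEGER NOT NULL, Category_Count INTEGER NOT NULL)")
    # INSERT the values we later need into the temp table
    conn.execute("""
        INSERT INTO Temp_Example (CategoryID, Category_Count)
        SELECT C.CategoryID, COUNT(*)
        FROM Categories C
        INNER JOIN Products P ON C.CategoryID = P.CategoryID
        GROUP BY C.CategoryID, C.CategoryName""")
    # JOIN on the temp table to get our values
    rows = conn.execute("""
        SELECT C.CategoryName, P.ProductName, T.Category_Count
        FROM Categories C
        INNER JOIN Products P ON C.CategoryID = P.CategoryID
        INNER JOIN Temp_Example T ON C.CategoryID = T.CategoryID
        ORDER BY C.CategoryName, P.ProductName""").fetchall()
    # DROP the temporary table
    conn.execute("DROP TABLE Temp_Example")
    return rows


conn = make_db()
for row in temp_table_report(conn):
    print(row)
conn.close()
```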

Now let's look at a query that produces the same result using a derived table instead of a temporary table.

SELECT GETDATE()

GO

-- NOTE HOW WE SIMPLY JOIN ON THE TABLE CREATED IN MEMORY BASED ON THE CATEGORY ID

SELECT C.CategoryID, C.CategoryName, P.ProductName, P.UnitPrice, CT.Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
INNER JOIN (
            SELECT C.CategoryID, COUNT(*) AS Category_Count
            FROM Categories C
            INNER JOIN Products P ON C.CategoryID = P.CategoryID
            GROUP BY C.CategoryID, C.CategoryName
            ) AS CT ON C.CategoryID = CT.CategoryID
ORDER BY C.CategoryName

GO

SELECT GETDATE()
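Here is a rough port of the derived-table version, running on SQLite through Python's sqlite3 module so it can be tried without SQL Server. Assumptions: the sample data is only the five rows from the desired-output table, UnitPrice and CategoryID are left out of the output because the sample has no prices, the GETDATE() timing lines are dropped, ProductName is added to the ORDER BY for a deterministic order, and derived_table_report is a made-up helper name. Everything happens in one statement; the aggregate is the derived table CT.

```python
import sqlite3


def derived_table_report():
    """Derived-table version: a single SELECT that joins to an aggregate (CT)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT);
        CREATE TABLE Products (ProductID INTEGER PRIMARY KEY,
                               ProductName TEXT, CategoryID INTEGER);
        -- Hypothetical sample: the five rows from the desired-output table
        INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Condiments');
        INSERT INTO Products VALUES
            (1, 'Outback Lager', 1), (2, 'Chang', 1),
            (3, 'Aniseed Syrup', 2), (4, 'Cajun Seasoning', 2), (5, 'Gumbo Mix', 2);
    """)
    # The nested SELECT is joined like any other table, under the alias CT
    rows = conn.execute("""
        SELECT C.CategoryName, P.ProductName, CT.Category_Count
        FROM Categories C
        INNER JOIN Products P ON C.CategoryID = P.CategoryID
        INNER JOIN (
            SELECT C.CategoryID, COUNT(*) AS Category_Count
            FROM Categories C
            INNER JOIN Products P ON C.CategoryID = P.CategoryID
            GROUP BY C.CategoryID, C.CategoryName
        ) AS CT ON C.CategoryID = CT.CategoryID
        ORDER BY C.CategoryName, P.ProductName""").fetchall()
    conn.close()
    return rows


for row in derived_table_report():
    print(row)
```

The aliases C and P inside the parentheses are scoped to the nested query, so reusing the outer names there is legal, just as it is in the T-SQL version.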

As the second query demonstrates, replacing the temporary table with a derived table can improve the first query's performance dramatically with little effort.

Here are the steps we took to improve the query's performance:

1. We took the SELECT query from our INSERT INTO ... SELECT statement and put it in parentheses, followed by a table name (CT).
2. We joined on that derived table rather than on a temporary table.
3. In the outer SELECT list, we changed the column reference from #Temp_Example.Category_Count to CT.Category_Count, the column produced by the nested query.

As you can see, using derived tables is a great way to solve complex business problems!

Give It a Test

I recently read the chapter on advanced queries in the book Professional SQL Server 2000 Programming, which states that if your result is going to be large, it might be better to create a temporary table with an index, because derived tables do not have an index. Don't let that statement scare you: I've seen a few queries go from using temporary tables or cursors to derived tables, and their performance improved by 50% or more. As always, you will never know until you test!
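In that spirit, here is a minimal timing harness. It compares a derived table, a temporary table, and a temporary table with an index (the option the book suggests for large results). It is only a sketch: SQLite through Python's sqlite3 module stands in for SQL Server, so the numbers say nothing about SQL Server's optimizer or tempdb, and the synthetic data (50 categories, 20,000 products) and all helper names are hypothetical. For real tuning, run the T-SQL versions against your own data, for example with SET STATISTICS IO ON and SET STATISTICS TIME ON.

```python
import sqlite3
import time


def build(conn, n_categories=50, n_products=20000):
    """Fill hypothetical Categories/Products tables with synthetic rows."""
    conn.execute("CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)")
    conn.execute("CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, "
                 "ProductName TEXT, CategoryID INTEGER)")
    conn.executemany("INSERT INTO Categories VALUES (?, ?)",
                     [(c, f"Category {c}") for c in range(1, n_categories + 1)])
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                     [(p, f"Product {p}", p % n_categories + 1)
                      for p in range(1, n_products + 1)])
    conn.commit()


# Per-category product counts, used by every variant
COUNTS = ("SELECT CategoryID, COUNT(*) AS Category_Count "
          "FROM Products GROUP BY CategoryID")
# Final report; {src} is either a derived table or the temp table
REPORT = ("SELECT P.ProductID, T.Category_Count FROM Products P "
          "JOIN {src} T ON P.CategoryID = T.CategoryID ORDER BY P.ProductID")


def derived(conn):
    """Derived-table variant: one statement."""
    return conn.execute(REPORT.format(src=f"({COUNTS})")).fetchall()


def temp(conn, with_index=False):
    """Temporary-table variant, optionally indexed on the join column."""
    conn.execute(f"CREATE TEMP TABLE Temp_Example AS {COUNTS}")
    if with_index:
        conn.execute("CREATE INDEX ix_temp_category ON Temp_Example (CategoryID)")
    rows = conn.execute(REPORT.format(src="Temp_Example")).fetchall()
    conn.execute("DROP TABLE Temp_Example")
    return rows


def time_it(fn, *args, repeat=5):
    """Return the best wall-clock time over `repeat` runs, plus the last result."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


conn = sqlite3.connect(":memory:")
build(conn)
for label, fn, args in [("derived table", derived, (conn,)),
                        ("temp table", temp, (conn,)),
                        ("temp table + index", temp, (conn, True))]:
    seconds, rows = time_it(fn, *args)
    print(f"{label:20s} {seconds * 1000:8.2f} ms  ({len(rows)} rows)")
```

Taking the best of several runs reduces timing noise, and all three variants return the same rows, so a faster but wrong query cannot win. Which variant comes out ahead depends on the engine, the data size, and the indexes, which is exactly why the advice is to test.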