Eliminate the Use of Temporary Tables For HUGE Performance Gains
http://www.sql-server-performance.com/articles/per/derived_temp_tables_p2.aspx
By Justin Gunther, May 30, 2002
As queries become more complex, temporary tables are used more and more. While temporary tables may sometimes be unavoidable, they can often be sidestepped by using derived tables instead. In brief, a derived table is the result of using another SELECT statement in the FROM clause of a SELECT statement. By using derived tables instead of temporary tables, we can boost our application's performance. Let's find out more.
How the Use of Temporary Tables Affects Performance
Temporary tables can slow performance dramatically. The problem with temporary tables is the amount of overhead that goes along with using them. To get the fastest queries possible, our goal must be to make them do as little work as possible. For example, with a plain SELECT statement, SQL Server reads data from the disk and returns it. Temporary tables require the system to do much more.
For example, a piece of Transact-SQL code using temporary tables usually will:
1) CREATE the temporary table
2) INSERT data into the newly created table
3) SELECT data from the temporary table (usually by JOINing to other physical tables), while holding locks on tempdb resources such as its system tables until the transaction has completed
4) DROP the temporary table
This represents a lot of disk activity, along with the potential for contention problems. And all of this adds up to poor performance.
Eliminate A Few Steps!
The biggest benefit of using derived tables over temporary tables is that they require fewer steps, and the intermediate result can usually be processed in memory instead of in a combination of memory and disk. The fewer the steps involved, and the less I/O, the faster the performance.
Here are the steps when you use a temporary table:
1) Acquire locks in the tempdb database
2) CREATE the temporary table (write activity)
3) SELECT data & INSERT data (read & write activity)
4) SELECT data from temporary table and permanent table(s) (read activity)
5) DROP TABLE (write activity)
6) Release the locks
Compare the above to the number of steps it takes for a derived table:
1) Acquire locks, unless the "read uncommitted" isolation level is used
2) SELECT data (read activity)
3) Release the locks
As is rather obvious from this example, using derived tables instead of temporary tables reduces disk I/O and can boost performance. Now let’s see how.
Using Derived Tables
Derived tables are essentially SELECT statements within SELECT statements. Let's start with a very simple query that SELECTs data from a table:
USE northwind
GO
SELECT * FROM categories
Now, instead of selecting data from the categories table, let’s select our data from a derived table. For example:
USE northwind
GO
SELECT * FROM (SELECT * FROM categories) dt_categories
This is all there is to derived tables. Remember, a derived table is just the result of using another SELECT statement in the FROM clause of another SELECT statement. Simply put the query in parentheses and add a table alias after the closing parenthesis.
Both of the above examples produce exactly the same results. This example is designed to show you what a derived table is, and how easy one is to create. In the real world, if you needed to write a query like the one above, you would of course use the simpler of the two examples. But if your query is complex, consider using a derived table instead of a temporary table, as we will see in the following real-world example.
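As a slightly less trivial sketch of the same syntax, the query below aggregates inside the derived table and then filters on the aggregate in the outer query. It assumes the standard Northwind sample database. The alias dt_counts, the column name ProductCount, and the threshold of 10 are illustrative choices only.

```sql
USE northwind
GO
-- The derived table (aliased dt_counts) produces one row per category.
-- dt_counts, ProductCount and the threshold 10 are illustrative names/values.
SELECT dt_counts.CategoryID, dt_counts.ProductCount
FROM (
    SELECT CategoryID, COUNT(*) AS ProductCount
    FROM Products
    GROUP BY CategoryID
) AS dt_counts
-- The outer query can treat the aggregate like any ordinary column.
WHERE dt_counts.ProductCount > 10
ORDER BY dt_counts.ProductCount DESC
```

Note that the alias after the closing parenthesis is required, and every computed column in the inner SELECT needs a name so the outer query can refer to it.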
An Example: Rewriting A Stored Procedure Using Temp Tables
For this example, we will use the Northwind database. The problem we want to solve is that we need a listing of categories and products in the Northwind database, with a column stating how many products are in each category. This is based on a real case where I needed the count of products within each category. For the curious, the reason was so that I could create JavaScript dynamically to populate a second listbox on the fly.
The desired output looks something like this:
Category Name | Product Name | Category Count |
Beverages | Outback Lager | 2 |
Beverages | Chang | 2 |
Condiments | Aniseed Syrup | 3 |
Condiments | Cajun Seasoning | 3 |
Condiments | Gumbo Mix | 3 |
This result says that the Beverages category contains two products.
The first query that follows, which is designed to produce the required results, is slow and uses a temporary table.
SELECT GETDATE()
GO
-- CREATE OUR TEMPORARY TABLE
CREATE TABLE #Temp_Example (
[CategoryID] INT NOT NULL,
[Category_Count] INT NOT NULL
)
-- INSERT THE VALUES WE LATER NEED INTO THE TEMP TABLE
INSERT INTO #Temp_Example (CategoryID, Category_Count)
SELECT C.CategoryID, COUNT(*) AS Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
GROUP BY C.CategoryID, C.CategoryName
-- JOIN ON THE TEMP TABLE TO GET OUR VALUES
SELECT C.CategoryID, C.CategoryName, P.ProductName, P.UnitPrice, #Temp_Example.Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
INNER JOIN #Temp_Example ON C.CategoryID = #Temp_Example.CategoryID
ORDER BY C.CategoryName
-- DROP TEMPORARY TABLE
DROP TABLE #Temp_Example
GO
SELECT GETDATE()
Now, let's take a look at a query that produces the same result but uses a derived table instead of a temporary table.
SELECT GETDATE()
GO
-- NOTE HOW WE SIMPLY JOIN ON THE TABLE CREATED IN MEMORY BASED ON THE CATEGORY ID
SELECT C.CategoryID, C.CategoryName, P.ProductName, P.UnitPrice, CT.Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
INNER JOIN (
SELECT C.CategoryID, COUNT(*) AS Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
GROUP BY C.CategoryID, C.CategoryName
) CT ON C.CategoryID = CT.CategoryID
ORDER BY C.CategoryName
GO
SELECT GETDATE()
With little effort, the second query improves dramatically on the first query's performance by using a derived table.
Here are the steps we took to improve the query's performance:
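Bracketing a batch with SELECT GETDATE() only shows wall-clock time. One way to see the I/O difference as well is to turn on SQL Server's per-statement statistics before running each version. The following is a minimal sketch, assuming the Northwind database and a client that displays messages (for example, Query Analyzer's Messages pane). Swap in the temporary-table batch to compare the numbers.

```sql
USE northwind
GO
-- Report reads per table and CPU/elapsed time for every statement that follows.
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
-- Derived-table version of the query; replace with the temp-table batch to compare.
SELECT C.CategoryID, C.CategoryName, P.ProductName, P.UnitPrice, CT.Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
INNER JOIN (
    SELECT C.CategoryID, COUNT(*) AS Category_Count
    FROM Categories C
    INNER JOIN Products P ON C.CategoryID = P.CategoryID
    GROUP BY C.CategoryID, C.CategoryName
) CT ON C.CategoryID = CT.CategoryID
ORDER BY C.CategoryName
GO
-- Switch the statistics output back off.
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO
```

On a database as small as Northwind the differences may be slight. The comparison is more meaningful against tables of realistic size.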
1. We took the SELECT query from our INSERT INTO ... SELECT statement and put it in parentheses, followed by a table alias (CT).
2. We joined on the derived table, now in memory, rather than on a temporary table.
3. We changed the column reference in the outer SELECT list from #Temp_Example.Category_Count to CT.Category_Count, the name of the column in the nested query.
As you can see, using derived tables is a great way to solve complex business problems!
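The derived table can possibly be trimmed further. Because only CategoryID and the count are needed, the inner query does not have to join back to Categories. This is a sketch that assumes, as Northwind's foreign key guarantees, that every non-NULL Products.CategoryID exists in Categories. It is a variation, not the query the steps above produced.

```sql
USE northwind
GO
-- Variation: count products per category straight from Products.
-- Assumes each non-NULL Products.CategoryID has a matching row in Categories.
SELECT C.CategoryID, C.CategoryName, P.ProductName, P.UnitPrice, CT.Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
INNER JOIN (
    SELECT CategoryID, COUNT(*) AS Category_Count
    FROM Products
    GROUP BY CategoryID
) CT ON C.CategoryID = CT.CategoryID  -- a NULL CategoryID group never matches here
ORDER BY C.CategoryName
```

Whether this is actually faster depends on the plan the optimizer chooses, so it is worth checking against the original on your own data.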
Give It a Test
I recently read the chapter on advanced queries in the book Professional SQL Server 2000 Programming, which states that if your result set is going to be large, it might be better to create a temporary table with an index, because derived tables do not have indexes. Don't let that statement scare you: I've seen a few queries go from using temporary tables or cursors to derived tables, and performance improved by 50% or more. As always, you will never know until you test!
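For the large-result case the book describes, the indexed temporary table alternative might look like the sketch below, so that both approaches can be tested side by side. The table name #Category_Counts is hypothetical, and the PRIMARY KEY constraint is just one way to get an index (it creates a unique clustered index by default).

```sql
USE northwind
GO
-- Hypothetical temp table; the PRIMARY KEY gives it a unique clustered index.
CREATE TABLE #Category_Counts (
    CategoryID INT NOT NULL PRIMARY KEY,
    Category_Count INT NOT NULL
)

-- Skip NULL category IDs, since the key column does not allow them.
INSERT INTO #Category_Counts (CategoryID, Category_Count)
SELECT CategoryID, COUNT(*)
FROM Products
WHERE CategoryID IS NOT NULL
GROUP BY CategoryID

-- The join to the temp table can now use its index.
SELECT C.CategoryID, C.CategoryName, P.ProductName, P.UnitPrice, T.Category_Count
FROM Categories C
INNER JOIN Products P ON C.CategoryID = P.CategoryID
INNER JOIN #Category_Counts T ON C.CategoryID = T.CategoryID
ORDER BY C.CategoryName

DROP TABLE #Category_Counts
GO
```

With only a handful of categories, the index is unlikely to matter here. The pattern is intended for intermediate results with many rows.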