When you add an index and your query blows up...
Alexander Kuznetsov
from: http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/07/16/when-you-add-an-index-and-your-query-blows-up.aspx
You cannot assume that the conditions in your WHERE clause will evaluate in the left-to-write order - making such assumptions leads to unsafe queries. For example, the following query is not safe:
SELECT * FROM dbo.Messages
WHERE ISDATE(
VarcharColumn
)=1
AND CAST(VarcharColumn
) AS DATETIME)='20090707'
It can blow up at any time, and the reason is simple: the conditions in your WHERE clause can evaluate in any order, and the order can change the next time your query executes. If an invalid value casts before the check if it is valid, the query will blow up. Of course, I am not the first to point this out. Right from the top of my head I recall that you can find similar recommendations in Itzik Ben Gan's book on T-SQl Fundamentals, and in a recent Plamen Ratchev's post "Predicates in SQL". However, I would like to demonstrate how little it may take to break such code, how brittle it may be. I will provide a repro script in which a query originally succeeds, but fails after I have added an index.
Do not make assumptions which you cannot guarantee
Whenever you write such code as the previous sample, you clearly assume that the first condition, the call to ISDATE, will evaluate before the CAST. However, this assumption cannot be guaranteed, and when this assumption is not true, the query blows up. Even if this query always succeeds in your test runs, you cannot assume that it will always succeed in the future. The safe way to ensure that invalid values are never cast to datetime is this:
SELECT * FROM dbo.Messages
WHERE CASE WHEN ISDATE(VarcharColumn
)=1
THEN CAST(VarcharColumn
AS DATETIME) END ='20090707'
It is safe becasue it is documented that CASE expression evaluates the WHEN clause before evaluating the THEN clause.
When order in which conditions are evaluated changes - a repro script.
I am going to provide a repro script demostrating that provide that the order in which conditions are evaluated can change. However, I have to begin with a big disclaimer: there is no guarantee that this repro script will work as expected on your server. My repro script does work for me, and most likely will work for you too, but I cannot promise that.
Setting up test data
The following script adds 1M messages, all of them except one have invalid dates:
-- helper table
CREATE TABLE dbo.Numbers(n INT NOT NULL PRIMARY KEY)
GO
DECLARE @i INT;
SET @i = 1;
INSERT INTO dbo.Numbers(n) SELECT 1;
WHILE @i<1024000 BEGIN
INSERT INTO dbo.Numbers(n)
SELECT n + @i FROM dbo.Numbers;
SET @i = @i * 2;
END;
GO
CREATE TABLE dbo.Messages(MessageID INT NOT NULL PRIMARY KEY,
SenderID INT NOT NULL,
ReceiverID INT NOT NULL,
MessageDateAsVarcharColumn
VARCHAR(30) NULL,
SomeMoreData CHAR(200) NULL);
GO
INSERT INTO dbo.Messages(MessageID,
SenderID,
ReceiverID,
MessageDateAs
VarcharColumn
,
SomeMoreData)
SELECT n, n%1000, n/1000, 'Wrong Date', 'SomeMoreData'
FROM dbo.Numbers;
GO
-- only one message has a valid date
UPDATE dbo.Messages SET MessageDateAs
VarcharColumn
='20090707'
WHERE SenderID = 123 AND ReceiverID = 456;
Originally an unsafe query against test data succeeds
The following query succeeds on my server, selecting the only one row which has a valid date:
SELECT * FROM dbo.Messages
WHERE SenderID = 123
AND ReceiverID = 456
AND CAST(MessageDateAs
VarcharColumn
AS DATETIME)='20090707'
Because the query succeeds, clearly for all the rows the first two conditions are evaluated before MessageDate is cast to DATETIME, because only the row for which the first two conditions are turn has a valid datetime value. Let me repeat my disclaimer: there is no guarantee that this query will succeed on your server. Even if you reshuffle the conditions in your WHERE clause, the query still succeeds, so clearly the cast is attempted only after both other conditions are evaluated:
SELECT * FROM dbo.Messages
WHERE CAST(
MessageDateAs
VarcharColumn
AS DATETIME)='20090707'
AND ReceiverID = 456
AND SenderID = 123
As you have seen, conditions in WHERE clause are not evaluated in left-to-right order. You can also play with parenthesis trying to enforce one particular order, but that will not make any difference either.
After adding an index, the same query blows up
Here is the index:
CREATE INDEX Messages_SenderID_MessageDate
ON dbo.Messages(SenderID, MessageDateAs
VarcharColumn
);
After it is created, the query blows up:
Msg 241, Level 16, State 1, Line 1
Conversion failed when converting datetime from character string.
My explanation is simple: the optimizer chooses to use the new non-clustered index, and it chooses to cast the MessageDate value and evaluate the third condition CAST(
MessageDateAs
VarcharColumn
AS DATETIME)='20090707'
) before doing an expensive bookmark lookup. Let me repeat my disclaimer again: there is no guarantee that this query will blow upon your server.
Also I played a little bit with parenthesis, trying to use them to enforce a particular order of evaluation - of course that did not make any difference.
As you have seen, it is unsafe to make assumptions about the order in which conditions in your WHERE clause are evaluated. Whenever you see a query developed under such an assumption, rewrite it eliminating the unsafe assumption altogether.
If you are still reading this, can you do me a favor and post if this repro script worked on your server the way it did on mine. Please also post the output of SELECT @@VERSION. I will be on vacation for a couple of weeks. This means that I will respond to questions, but not immediately.
This post continues my series on defensive database programming. Here are my previous posts from the series:
Defensive database programming: fun with triggers
Defensive database programming: fun with ROWCOUNT
Summarizing previous posts about defensive database programming
Defensive database programming: SET vs. SELECT.
Defensive database programming: fun with UPDATE.
Defensive database programming: eliminating IF statements.
Defensive database programming: fun with changing column widths.
Avoid mixing old and new styles of error handling.
Defensive database programming: adding ESCAPE clauses.
Defensive database programming: qualifying column names.
Defensive database programming: rewriting queries with NOT IN().