There's a very tricky programming error involving subqueries that I've seen occasionally and have even had the misfortune to introduce into production code myself. I'll first describe the bug, and then make recommendations on what you can do to avoid it.
Suppose that you are asked to return the shippers from the Northwind database that did not ship orders to customer LAZYK. Examining the data, shipper 1 (Speedy Express) is the only one that qualifies. The following query is supposed to return the desired result:
SELECT ShipperID, CompanyName FROM dbo.Shippers WHERE ShipperID NOT IN (SELECT ShipperID FROM dbo.Orders WHERE CustomerID = N'LAZYK');
Surprisingly, this query returns an empty set. Can you tell why? Can you identify the elusive bug in my code?
Well, apparently the column in the Orders table holding the ShipperID is called ShipVia and not ShipperID. There is no ShipperID column in the Orders table. Realizing this, you'd probably expect the query to have failed because of the invalid column name. Sure enough, if you run only the part that was supposed to be a self-contained subquery, it does fail: Invalid column name 'ShipperID'. However, in the context of the outer query, apparently the subquery is valid! The name resolution process works from the inner nesting level outward. The query processor first looked for a ShipperID column in the Orders table, which is referenced in the current level. Not having found such a column name, it looked for one in the Shippers tablethe outer leveland found it. Unintentionally, the subquery became correlated, as if it were written as the following illustrative code:
SELECT ShipperID, CompanyName FROM dbo.Shippers AS S WHERE ShipperID NOT IN (SELECT S.ShipperID FROM dbo.Orders AS O WHERE O.CustomerID = N'LAZYK');
Logically, the query doesn't make much sense of course; nevertheless, it is technically valid.
You can now understand why you got an empty set back. Unless there's no order for customer LAZYK anywhere in the Orders table, obviously shipper n is always going to be in the set (SELECT n FROM dbo.Orders WHERE CustomerID = 'LAZYK'). And the NOT IN predicate will always yield FALSE. This buggy query logically became a nonexistence query equivalent to the following illustrative code:
SELECT ShipperID, CompanyName FROM dbo.Shippers WHERE NOT EXISTS (SELECT * FROM dbo.Orders WHERE CustomerID = N'LAZYK');
To fix the problem, of course, you should use the correct name for the column from Orders that holds the ShipperIDShipVia:
SELECT ShipperID, CompanyName FROM dbo.Shippers AS S WHERE ShipperID NOT IN (SELECT ShipVia FROM dbo.Orders AS O WHERE CustomerID = N'LAZYK');
This will generate the expected result shown in Table 4-9.
ShipperID | CompanyName |
---|---|
1 | Speedy Express |
However, to avoid such bugs in the future, it's a good practice to always include the table name or alias for all attributes in a subquery, even when the subquery is self-contained. Had I aliased the ShipperID column in the subquery (as shown in the following code), a name resolution error would have been generated and the bug would have been detected:
SELECT ShipperID, CompanyName FROM dbo.Shippers AS S WHERE ShipperID NOT IN (SELECT O.ShipperID FROM dbo.Orders AS O WHERE O.CustomerID = N'LAZYK'); Msg 207, Level 16, State 1, Line 4 Invalid column name 'ShipperID'.
Finally, correcting the bug, here's how the solution query should look:
SELECT ShipperID, CompanyName FROM dbo.Shippers AS S WHERE ShipperID NOT IN (SELECT O.ShipVia FROM dbo.Orders AS O WHERE O.CustomerID = N'LAZYK');