Misbehaving Subqueries (from "Inside MSSQL")

There's a very tricky programming error involving subqueries that I've seen occasionally and have even had the misfortune to introduce into production code myself. I'll first describe the bug, and then make recommendations on what you can do to avoid it.

Suppose that you are asked to return the shippers from the Northwind database that did not ship orders to customer LAZYK. Examining the data, shipper 1 (Speedy Express) is the only one that qualifies. The following query is supposed to return the desired result:

SELECT ShipperID, CompanyName
FROM dbo.Shippers
WHERE ShipperID NOT IN
  (SELECT ShipperID FROM dbo.Orders
   WHERE CustomerID = N'LAZYK');

 

Surprisingly, this query returns an empty set. Can you tell why? Can you identify the elusive bug in my code?

Well, apparently the column in the Orders table holding the ShipperID is called ShipVia and not ShipperID. There is no ShipperID column in the Orders table. Realizing this, you'd probably expect the query to have failed because of the invalid column name. Sure enough, if you run only the part that was supposed to be a self-contained subquery, it does fail: Invalid column name 'ShipperID'. However, in the context of the outer query, apparently the subquery is valid! The name resolution process works from the inner nesting level outward. The query processor first looked for a ShipperID column in the Orders table, which is referenced in the current level. Not having found such a column name, it looked for one in the Shippers tablethe outer leveland found it. Unintentionally, the subquery became correlated, as if it were written as the following illustrative code:

SELECT ShipperID, CompanyName
FROM dbo.Shippers AS S
WHERE ShipperID NOT IN
  (SELECT S.ShipperID FROM dbo.Orders AS O
   WHERE O.CustomerID = N'LAZYK');

 

Logically, the query doesn't make much sense of course; nevertheless, it is technically valid.

You can now understand why you got an empty set back. Unless there's no order for customer LAZYK anywhere in the Orders table, obviously shipper n is always going to be in the set (SELECT n FROM dbo.Orders WHERE CustomerID = 'LAZYK'). And the NOT IN predicate will always yield FALSE. This buggy query logically became a nonexistence query equivalent to the following illustrative code:

SELECT ShipperID, CompanyName
FROM dbo.Shippers
WHERE NOT EXISTS
  (SELECT * FROM dbo.Orders
   WHERE CustomerID = N'LAZYK');

 

To fix the problem, of course, you should use the correct name for the column from Orders that holds the ShipperIDShipVia:

SELECT ShipperID, CompanyName
FROM dbo.Shippers AS S
WHERE ShipperID NOT IN
  (SELECT ShipVia FROM dbo.Orders AS O
   WHERE CustomerID = N'LAZYK');

 

This will generate the expected result shown in Table 4-9.

Table. Shippers that Did Not Ship Orders to Customer LAZYK

ShipperID

CompanyName

1

Speedy Express

 

However, to avoid such bugs in the future, it's a good practice to always include the table name or alias for all attributes in a subquery, even when the subquery is self-contained. Had I aliased the ShipperID column in the subquery (as shown in the following code), a name resolution error would have been generated and the bug would have been detected:

SELECT ShipperID, CompanyName
FROM dbo.Shippers AS S
WHERE ShipperID NOT IN
  (SELECT O.ShipperID FROM dbo.Orders AS O
   WHERE O.CustomerID = N'LAZYK');

Msg 207, Level 16, State 1, Line 4
Invalid column name 'ShipperID'.

 

Finally, correcting the bug, here's how the solution query should look:

SELECT ShipperID, CompanyName
FROM dbo.Shippers AS S
WHERE ShipperID NOT IN
  (SELECT O.ShipVia FROM dbo.Orders AS O
   WHERE O.CustomerID = N'LAZYK');
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值