4- 子查询

最新推荐文章于 2022-11-14 17:25:02 发布

pengpenglin

最新推荐文章于 2022-11-14 17:25:02 发布

阅读量3.4k

点赞数

分类专栏： Oracle SQL/PLSQL 文章标签： subquery date null query reference sql

本文链接：https://blog.csdn.net/pengpenglin/article/details/1743198

版权

Oracle SQL/PLSQL 专栏收录该内容

12 篇文章 0 订阅

订阅专栏

1.子查询的概念：

A subquery is a SELECT statement that is nested within another SQL statement. For the purpose of this discussion, we will call the SQL statement that contains a subquery the containing statement.

Subqueries are executed prior to execution of the containing SQL statementand. the result set generated by the subquery is discarded( 丢弃) after the containing SQL statement has finished execution. Thus, a subquery can be thought of as a temporary table with statement scope

A subquery may either be correlated with its containing SQL statement, meaning that it references one or more columns from the containing statement, or it might reference nothing outside itself, in which case it is called a noncorrelated subquery.

A less-commonly-used but powerful variety of subquery, called the inline view, occurs in the FROM clause of a select statement. Inline views are always noncorrelated; they are evaluated first and behave like unindexed tables cached in memory for the remainder of the query.

2.非相关性子查询： (Noncorrelated Subquery)：

Noncorrelated subqueries allow each row from the containing SQL statement to be compared to a set of values. Divide noncorrelated subqueries into the following three categories, depending on the number of rows and columns returned in their result set:

A.Single-row, single-column subqueries

B.Multiple-row, single-column subqueries

C.Multiple-column subqueries

Depending on the category, different sets of operators may be employed by the containing SQL statement to interact with the subquery.

A. Single-Row, Single-Column Subqueries ：

A subquery that returns a single row with a single column is treated like a scalar(数值 ) by the containing statement; not surprisingly, these types of subqueries are known as scalar subqueries. The subquery may appear on either side of a condition, and the usual comparison operators (=, <, >, !=, <=, >=) are employed.

SELECT lname

FROM employee

WHERE salary > (SELECT AVG(salary)

FROM EMPLOYEE);

As this query demonstrates, it can be perfectly reasonable for a subquery to reference the same tables as the containing query. In fact, subqueries are frequently used to isolate a subset of records within a table.

非相关性子查询的一些注意问题：

.The FROM clause may contain any type of noncorrelated subquery.

(From子句可以包含任何类型的非相关型子查询 )

.The SELECT and ORDER BY clauses may contain scalar subqueries.

(Select和 Order by字句可以包含数值子查询)

.The GROUP BY clause may not contain subqueries.

( Group by 字句不能包含子查询)

.The START WITH and CONNECT BY clauses, used for querying hierarchical data, may contain

subqueries

(用于级联数据查询的 Start with和Connect by字句都可以包含子查询)

B.Multiple-Row ,Single-column Subqueries ：

When a subquery returns more than one row, it is not possible to use only comparison operators, since a single value cannot be directly compared to a set of values. However, a single value can be compared to each value in a set. To accomplish this, the special keywords ANY and ALL may be used with comparison operators to determine if a value is equal to (or less than, greater than, etc.) any members of the set or all members of the set. ( 当一个子查询返回超过一行的记录时，显然不可能仅仅使用比较运算符来比较，因为一条记录不能和一个记录集进行比较。但是一条记录可以和一个记录集中的每一条记录进行比较，两个特定的关键字Any和All可以帮助我们达到目的，它们可以和比较运算符一起使用用于判断某个值是否等于、小于、大于一个记录集中的任何一条记录的值)

1).ALL 关键字：

SELECT fname, lname

FROM employee

WHERE dept_id = 3 AND salary >= ALL (SELECT salary

FROM employee

WHERE dept_id = 3);

FNAME LNAME

-------------------- --------------------

Mark Russell

The subquery returns the set of salaries for department 3, and the containing query checks each employee in the department to see if her salary is greater or equal to every salary returned by the subquery. Thus, this query retrieves the name of the highest paid person in department 3. While everyone except the lowest paid employee has a salary >= some of the salaries in the departement, only the highest paid employee has a salary >= all of the salaries in the department. If multiple employees tie for the highest salary in the department, multiple names will be returned.

2).ANY 关键字：

SELECT fname, lname

FROM employee

WHERE dept_id = 3 AND NOT salary < ANY (SELECT salary

FROM employee

WHERE dept_id = 3);

There are almost always multiple ways to phrase the same query. One of the challenges of writing SQL is striking the right balance between efficiency and readability. In this case, I might prefer using AND salary >= ALL over AND NOT salary < ANY because the first variation is easier to understand; however, the latter form might prove more efficient, since each evaluation of the subquery results requires from 1 to N comparisons when using ANY versus( 对比) exactly N comparisons when using ALL.

If there are 100 people in the department, each of the 100 salaries needs to be compared to the entire set of 100. When using ANY, the comparison can be suspended( 停止) as soon as a larger salary is identified in the set, whereas using ALL requires 100 comparisons to ensure that there are no smaller salaries in the set.

注：使用 ALL和ANY的比较：

A.可读性：使用 ALL在可读性上可以比ANY更具可读性。具体参见前两例

B.性能：　使用 ALL时，子查询中的所有记录会被查询并进行比较，假如子查询的结果有N条记　　　　　　　录，则使用ALL时必须比较N次。如果使用ANY，因为只要有一条记录满足条件比较　就会停止，所以查询并比较的次数从1到N次不等。如果要从大量数据的筛选少量的

记录，则使用 ANY从性能上来说比较合适

ORA-01427: single-row subquery returns more than one row

What the error message is trying to convey is that a multiple-row subquery has been identified where only a single-row subquery is allowed. If we are not absolutely certain that our subquery will return exactly one row, we must include ANY or ALL to ensure our code doesn't fail in the future.( 如果查询要求的是单行子查询，但返回的结果却是多行的，那么就会出现上述的错误，假如我们不确定子查询一定会返回单条记录，我们必须使用ANY或 ALL 来确保我们的代码不会出现上述的错误)

3).IN 关键字：

Using IN with a subquery is functionally equivalent to using = ANY, and returns TRUE if a match is found in the set returned by the subquery

Finding members of one set that do not exist in another set is referred to as an anti-join. As the name implies, an anti-join is the opposite of a join; rows from table A are returned if the specified data is not found in table B.

C. Multiple-Column Subqueries ：

例： UPDATE monthly_orders SET (tot_orders, max_order_amt, min_order_amt, tot_amt) =　

(SELECT COUNT(*), MAX(sale_price), MIN(sale_price), SUM(sale_price)

FROM cust_order

　　 WHERE order_dt >= TO_DATE('01-NOV-2001','DD-MON-YYYY')

　　 AND order_dt < TO_DATE('01-DEC-2001','DD-MON-YYYY')

　　 AND cancelled_dt IS NULL)

WHERE month = 11 and year = 2001;

such subqueries may also be utilized in the WHERE clause of a SELECT, UPDATE, or DELETE statement

例： DELETE FROM line_item　WHERE (order_nbr, part_nbr) IN

　　　 (SELECT c.order_nbr, p.part_nbr

FROM cust_order c, line_item li, part p

WHERE c.ship_dt IS NULL AND c.cancelled_dt IS NULL

AND c.order_nbr = li.order_nbr

AND li.part_nbr = p.part_nbr

AND p.status = 'DISCONTINUED');

Note the use of the IN operator in the WHERE clause. Two columns are listed together in parentheses prior to the IN keyword. Values in these two columns are compared to the set of two values returned by each row of the subquery. If a match is found, the row is removed from the line_item table.

3.相关性查询 (Correlated subquery)：

A subquery that references one or more columns from its containing SQL statement is called a correlated subquery. Unlike noncorrelated subqueries, which are executed exactly once prior to execution of the containing statement, a correlated subquery is executed once for each candidate row in the intermediate result set of the containing query.(如果一个子查询中引用了包含查询语句的一个或多个字段，则该子查询就称为相关性子查询。和非相关性子查询不同的是－非相关性子查询在包含查询之前执行，而且只执行一次。相关性查询对包含查询中的中间结果集的每一条记录都执行一次子查询 )

例： SELECT p.part_nbr, p.name

　　 FROM supplier s, part p

WHERE s.name = 'Acme Industries'

AND s.supplier_id = p.supplier_id

AND 10 <= (SELECT COUNT(*) FROM cust_order co, line_item li

WHERE li.part_nbr = p.part_nbr

AND li.order_nbr = co.order_nbr

AND co.order_dt >= TO_DATE('01-DEC-2001','DD-MON-YYYY'));

The reference to p.part_nbr is what makes the subquery correlated; values for p.part_nbr must be supplied by the containing query before the subquery can execute. If there are 10,000 parts in the part table, but only 100 are supplied by Acme Industries, the subquery will be executed once for each of the 100 rows in the intermediate result set created by joining the part and supplier tables.( 因为子查询中引用了包含查询的:p.part_nbr字段，所以该子查询是相关性查询，p.part_nbr字段的值必须在子查询执行之前由包含查询给出，如果表part中有1000种零件，但只有100种零件由Acme Industries提供，子查询会在由part表和supplier表联合组成的临时表中的100符合条件的记录上执行100次查询，而不是在包含查询执行前一次执行完成)

Correlated subqueries are often used to test whether relationships exist without regard to cardinality. The EXISTS operator is used for these types of queries (相关性查询通常用于测试某种关系是否存在，而不管返回记录集的大小， Exists运算符就是用于这种类型的查询)

SELECT p.part_nbr, p.name, p.unit_cost

FROM part p

WHERE EXISTS (SELECT 1 FROM line_item li, cust_order co

WHERE li.part_nbr = p.part_nbr

AND li.order_nbr = co.order_nbr

AND co.ship_dt >= TO_DATE('01-JAN-2002','DD-MON-YYYY'))

As long as the subquery returns one or more rows, the EXISTS condition is satisfied without regard for how many rows were actually returned by the subquery. Since the EXISTS operator returns TRUE or FALSE depending on the number of rows returned by the subquery, the actual columns returned by the subquery are irrelevant. The SELECT clause requires at least one column, however, so it is common practice to use either the literal "1" or the wildcard " * ".( 因为Exists的返回值是True或False，所以Exists不会关注返回的记录到底有多少条，我们可以使用1或通配符*来获取返回记录)

4.内嵌视图： (Inline view)

the FROM clause contains a list of data sets. In this light, it is easy to see how the FROM clause can contain tables (permanent data sets), views (virtual data sets), and SELECT statements (temporary data sets). A SELECT statement in the FROM clause of a containing SELECT statement is referred to as an inline view 。Since it is a subquery that executes prior to the containing query, a more palatable name might have been a "pre-query."

Because the result set from an inline view is referenced by other elements of the containing query, we must give our inline view a name and provide aliases for all ambiguous columns. Similar to other types of subqueries, inline views may join multiple tables, call built-in and user-defined functions, specify optimizer hints, and include GROUP BY, HAVING, and CONNECT BY clauses. Unlike other types of subqueries, an inline view may also contain an ORDER BY clause（因为对于包含SQL来说，Inline view的结果集引用了其它的元素，所以我们必须給Inline view一个名称，同时也必须給它里面的字段赋予一个别名。和其它类型的子查询相同，Inline view可以和多个表连接，调用内置、自定义函数，包含Group by,Having,Connect by字句；不同与其它的子查询，Inline view同样也可以包含Order by字句。）

Inline views are particularly useful when we need to combine data at different levels of aggregation（ Inline view特别适用于当我们需要对多个聚集级别不同的数据集进行连接时，例如一个数据集的数据是来自于全体数据，另一个数据集的数据来自于统计后的结果）

When considering using an inline view, ask the following questions:

1).What value does the inline view add to the readability and, more importantly, the performance of the containing query?

2).How large will the result set generated by the inline view be?

3).How often, if ever, will I have need of this particular data set?

In general, using an inline view should enhance the readability and performance of the query, and it should generate a manageable data set that is of no value to other statements or sessions; otherwise, we may want to consider building a permanent or temporary table so that we can share the data between sessions and build additional indexes as needed

5． TOP N查询：

Certain queries that are easily described in English have traditionally been difficult to formulate in SQL. One common example is the "Find the top five salespeople" query. The complexity stems from the fact that data from a table must first be aggregated, and then the aggregated values must be sorted and compared to one another in order to identify the top or bottom performers.

需求：找出销售数量排在前5位的销售员和对应的销售额，并统计其对应的分红(销售额的1%)

准备工作：按销售员的 ID统计所有销售人员的销售记录：

SELECT e.lname employee, SUM(co.sale_price) total_sales

FROM cust_order co, employee e

WHERE co.order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')

AND co.order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')

AND co.ship_dt IS NOT NULL AND co.cancelled_dt IS NULL

AND co.sales_emp_id = e.emp_id

GROUP BY e.lname

ORDER BY 2 DESC;

结果： EMPLOYEE TOTAL_SALES

-------------------- -----------

Blake 1927580

Houseman 1814327

Russell 1784596

Boorman 1768813

Isaacs 1761814

McGowan 1761814

Anderson 1757883

Evans 1737093

Fletcher 1735575

Dunn 1723305

注意：结果中有两个销售员的销售总额是相同的。所以最后结果应该包含6条记录，而不是5条记录

方案一：

SELECT e.lname employee, top5_emp_orders.tot_sales total_sales,

　　　 ROUND(top5_emp_orders.tot_sales * 0.01) bonus

FROM (SELECT all_emp_orders.sales_emp_id emp_id, all_emp_orders.tot_sales tot_sales

FROM (SELECT sales_emp_id, SUM(sale_price) tot_sales

　 FROM cust_order

WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')

　 AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')

　 AND ship_dt IS NOT NULL AND cancelled_dt IS NULL

GROUP BY sales_emp_id

ORDER BY 2 DESC

) all_emp_orders --按销售员的 ID分类统计销售额

WHERE ROWNUM <= 5

) top5_emp_orders, employee e --筛选前 5位的销售记录

WHERE top5_emp_orders.emp_id = e.emp_id; --计算其红利

结果：

EMPLOYEE TOTAL_SALES BONUS

-------------------- ----------- ----------

Blake 1927580 19276

Houseman 1814327 18143

Russell 1784596 17846

Boorman 1768813 17688

McGowan 1761814 17618

可以看到，这个 SQL在执行的时候漏掉了：Isaacs 1761814　这条记录，因为这个SQL语句使用了ROWNUM进行筛选，所以会把这条记录給漏掉了

方案二：

This will require two steps: find the fifth highest sales total last year, and then find all salespeople whose total sales meet or exceed that figure

这个方案需要两个步骤：

1).按业务员的 ID分组统计去年的销售总额

2).找出销售总额排在前 5位的数字

3).找出售额等于这 5个数字中任意一个的所有记录

4).计算其红利

SELECT e.lname employee,

top5_emp_orders.tot_sales total_sales,

ROUND(top5_emp_orders.tot_sales * 0.01) bonus

FROM employee e,

(SELECT sales_emp_id, SUM(sale_price) tot_sales 　　　　　　　

FROM cust_order

WHERE order_dt >= TO_DATE('01-JAN-2001', 'DD-MON-YYYY')

AND order_dt < TO_DATE('01-JAN-2002', 'DD-MON-YYYY')

AND ship_dt IS NOT NULL

AND cancelled_dt IS NULL

GROUP BY sales_emp_id

　 --筛选所有销售额等于前5位销售额的记录

HAVING SUM(sale_price) IN (SELECT all_emp_orders.tot_sales

FROM (SELECT SUM(sale_price) tot_sales

FROM cust_order

WHERE order_dt >=

TO_DATE('01-JAN-2001',

'DD-MON-YYYY')

AND order_dt <

TO_DATE('01-JAN-2002',

'DD-MON-YYYY')

AND ship_dt IS NOT NULL

AND cancelled_dt IS NULL

GROUP BY sales_emp_id

ORDER BY 1 DESC) all_emp_orders

WHERE ROWNUM <= 5)) top5_emp_orders

WHERE top5_emp_orders.sales_emp_id = e.emp_id

ORDER BY 2 DESC;

结果：

EMPLOYEE TOTAL_SALES BONUS

-------------------- ----------- ----------

Blake 1927580 19276

Houseman 1814327 18143

Russell 1784596 17846

Boorman 1768813 17688

McGowan 1761814 17618

Isaacs 1761814 17618

可以看到，现在查询的结果就包含了所有正确的记录了，但是存在以下几个缺陷：

1).统计过程执行了两次

2).可读性很差

方案三：

the RANK function may be used to assign a ranking to each element of a set. The RANK function understands that there may be ties in the set of values being ranked and leaves gaps in the ranking to compensate. The following query illustrates how rankings would be assigned to the entire set of salespeople; notice how the RANK function leaves a gap between the fifth and seventh rankings to compensate for the fact that two rows share the fifth spot in the ranking(如果在要排序的队列中出现了相同的排列项， RANK函数会识别出来并且自动地调整排列项的序号，排列的结果序号可能不会是连续的。例如两个排列项的序号为5，那么下一个的序号就必须是7，而不是6了)

准备工作： RANK函数的用法：

SELECT sales_emp_id, SUM(sale_price) tot_sales,

RANK( ) OVER (ORDER BY SUM(sale_price) DESC) sales_rank

FROM cust_order

WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')

AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')

AND ship_dt IS NOT NULL AND cancelled_dt IS NULL

GROUP BY sales_emp_id;

结果：

SALES_EMP_ID TOT_SALES SALES_RANK

------------ ---------- ----------

11 1927580 1

24 1814327 2

34 1784596 3

18 1768813 4

25 1761814 5

26 1761814 5

30 1757883 7

21 1737093 8

19 1735575 9

可以看到，除了结果正确之外，序号也是正确的。 RANK函数会自动在5和7之间留出空隙。对于这个查询，TOP 5和TOP 6的结果都是相同的

最终解决方案：

SELECT e.lname employee, top5_emp_orders.tot_sales total_sales,

ROUND(top5_emp_orders.tot_sales * 0.01) bonus

FROM (SELECT all_emp_orders.sales_emp_id emp_id,

all_emp_orders.tot_sales tot_sales

FROM (SELECT sales_emp_id, SUM(sale_price) tot_sales,

RANK( ) OVER (ORDER BY SUM(sale_price) DESC) sales_rank

FROM cust_order

WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')

AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')

AND ship_dt IS NOT NULL AND cancelled_dt IS NULL

GROUP BY sales_emp_id

) all_emp_orders -- 使用RANK函数分组统计销售记录

WHERE all_emp_orders.sales_rank <= 5

) top5_emp_orders, employee e -- 取出排名前5位的记录(按RANK的值取)

WHERE top5_emp_orders.emp_id = e.emp_id -- 计算所有销售前5名人员的红利

ORDER BY 2 DESC;

pengpenglin

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
4- 子查询

1.子查询的概念：A subquery is a SELECT statement that is nested within another SQL statement. For the purpose of this discussion, we will call the SQL statement that contains a subquery the containing s
复制链接

扫一扫