4- 子查询

 
1.子查询的概念:
A subquery is a SELECT statement that is nested within another SQL statement. For the purpose of this discussion, we will call the SQL statement that contains a subquery the containing statement.
 
Subqueries are executed prior to execution of the containing SQL statementand. the result set generated by the subquery is discarded( 丢弃) after the containing SQL statement has finished execution. Thus, a subquery can be thought of as a temporary table with statement scope
 
A subquery may either be correlated with its containing SQL statement, meaning that it references one or more columns from the containing statement, or it might reference nothing outside itself, in which case it is called a noncorrelated subquery.
 
A less-commonly-used but powerful variety of subquery, called the inline view, occurs in the FROM clause of a select statement. Inline views are always noncorrelated; they are evaluated first and behave like unindexed tables cached in memory for the remainder of the query.
 
2.非相关性子查询: (Noncorrelated Subquery):
Noncorrelated subqueries allow each row from the containing SQL statement to be compared to a set of values. Divide noncorrelated subqueries into the following three categories, depending on the number of rows and columns returned in their result set:
    A.Single-row, single-column subqueries
    B.Multiple-row, single-column subqueries
    C.Multiple-column subqueries
Depending on the category, different sets of operators may be employed by the containing SQL statement to interact with the subquery.
 
A. Single-Row, Single-Column Subqueries : 
A subquery that returns a single row with a single column is treated like a scalar(数值 ) by the containing statement; not surprisingly, these types of subqueries are known as scalar subqueries. The subquery may appear on either side of a condition, and the usual comparison operators (=, <, >, !=, <=, >=) are employed.
    SELECT lname
     FROM employee
     WHERE salary > (SELECT AVG(salary)
                       FROM EMPLOYEE);
As this query demonstrates, it can be perfectly reasonable for a subquery to reference the same tables as the containing query. In fact, subqueries are frequently used to isolate a subset of records within a table.
 
非相关性子查询的一些注意问题:
.The FROM clause may contain any type of noncorrelated subquery.
 (From子句可以包含任何类型的非相关型子查询 )
.The SELECT and ORDER BY clauses may contain scalar subqueries.
 (Select和 Order by字句可以包含数值子查询)
.The GROUP BY clause may not contain subqueries.
 ( Group by 字句不能包含子查询)
.The START WITH and CONNECT BY clauses, used for querying hierarchical data, may contain    
 subqueries
 (用于级联数据查询的 Start with和Connect by字句都可以包含子查询)
 
B.Multiple-Row ,Single-column Subqueries
When a subquery returns more than one row, it is not possible to use only comparison operators, since a single value cannot be directly compared to a set of values. However, a single value can be compared to each value in a set. To accomplish this, the special keywords ANY and ALL may be used with comparison operators to determine if a value is equal to (or less than, greater than, etc.) any members of the set or all members of the set. ( 当一个子查询返回超过一行的记录时,显然不可能仅仅使用比较运算符来比较,因为一条记录不能和一个记录集进行比较。但是一条记录可以和一个记录集中的每一条记录进行比较,两个特定的关键字Any和All可以帮助我们达到目的,它们可以和比较运算符一起使用用于判断某个值是否等于、小于、大于一个记录集中的任何一条记录的值)
 
    1).ALL 关键字:
    SELECT fname, lname
     FROM employee
      WHERE dept_id = 3 AND salary >= ALL (SELECT salary
                                            FROM employee
                                           WHERE dept_id = 3);
            FNAME                LNAME
    -------------------- --------------------
            Mark                 Russell
    The subquery returns the set of salaries for department 3, and the containing query checks each employee in the department to see if her salary is greater or equal to every salary     returned by the subquery. Thus, this query retrieves the name of the highest paid person    in department 3. While everyone except the lowest paid employee has a salary >= some    of the salaries in the departement, only the highest paid employee has a salary >= all  of the salaries in the department. If multiple employees tie for the highest salary in  the department, multiple names will be returned.
 
    2).ANY 关键字:
    SELECT fname, lname
     FROM employee
     WHERE dept_id = 3 AND NOT salary < ANY (SELECT salary
                                               FROM employee
                                              WHERE dept_id = 3);
    There are almost always multiple ways to phrase the same query. One of the challenges  of writing SQL is striking the right balance between efficiency and readability. In this  case, I might prefer using AND salary >= ALL over AND NOT salary < ANY because the first  variation is easier to understand; however, the latter form might prove more efficient,     since each evaluation of the subquery results requires from 1 to N comparisons when using    ANY versus( 对比) exactly N comparisons when using ALL.
If there are 100 people in the department, each of the 100 salaries needs to be compared    to the entire set of 100. When using ANY, the comparison can be suspended( 停止) as soon    as a larger salary is identified in the set, whereas using ALL requires 100 comparisons  to ensure that there are no smaller salaries in the set.
 
    注:使用 ALL和ANY的比较:
    A.可读性:使用 ALL在可读性上可以比ANY更具可读性。具体参见前两例
    B.性能: 使用 ALL时,子查询中的所有记录会被查询并进行比较,假如子查询的结果有N条记               录,则使用ALL时必须比较N次。如果使用ANY,因为只要有一条记录满足条件比较          就会停止,所以查询并比较的次数从1到N次不等。如果要从大量数据的筛选少量的
记录,则使用 ANY从性能上来说比较合适
 
    ORA-01427: single-row subquery returns more than one row
    What the error message is trying to convey is that a multiple-row subquery has been     identified where only a single-row subquery is allowed. If we are not absolutely certain  that our subquery will return exactly one row, we must include ANY or ALL to ensure our  code doesn't fail in the future.( 如果查询要求的是单行子查询,但返回的结果却是多行的,  那么就会出现上述的错误,假如我们不确定子查询一定会返回单条记录,我们必须使用ANY或     ALL 来确保我们的代码不会出现上述的错误)
 
    3).IN 关键字:
    Using IN with a subquery is functionally equivalent to using = ANY, and returns TRUE    if a match is found in the set returned by the subquery
 
    Finding members of one set that do not exist in another set is referred to as an anti-join.     As the name implies, an anti-join is the opposite of a join; rows from table A are returned    if the specified data is not found in table B.
 
C. Multiple-Column Subqueries
例: UPDATE monthly_orders SET (tot_orders, max_order_amt, min_order_amt, tot_amt) = 
       (SELECT COUNT(*), MAX(sale_price), MIN(sale_price), SUM(sale_price)
          FROM cust_order
       WHERE order_dt >= TO_DATE('01-NOV-2001','DD-MON-YYYY')
         AND order_dt < TO_DATE('01-DEC-2001','DD-MON-YYYY')
         AND cancelled_dt IS NULL)
     WHERE month = 11 and year = 2001;
 
such subqueries may also be utilized in the WHERE clause of a SELECT, UPDATE, or DELETE statement
例: DELETE FROM line_item WHERE (order_nbr, part_nbr) IN
    (SELECT c.order_nbr, p.part_nbr
         FROM cust_order c, line_item li, part p
        WHERE c.ship_dt IS NULL AND c.cancelled_dt IS NULL
          AND c.order_nbr = li.order_nbr
          AND li.part_nbr = p.part_nbr
          AND p.status = 'DISCONTINUED');
Note the use of the IN operator in the WHERE clause. Two columns are listed together in parentheses prior to the IN keyword. Values in these two columns are compared to the set of two values returned by each row of the subquery. If a match is found, the row is removed from the line_item table.
 
3.相关性查询 (Correlated subquery):
A subquery that references one or more columns from its containing SQL statement is called a correlated subquery. Unlike noncorrelated subqueries, which are executed exactly once prior to execution of the containing statement, a correlated subquery is executed once for each candidate row in the intermediate result set of the containing query.(如果一个子查询中引用了包含查询语句的一个或多个字段,则该子查询就称为相关性子查询。和非相关性子查询不同的是-非相关性子查询在包含查询之前执行,而且只执行一次。相关性查询对包含查询中的中间结果集的每一条记录都执行一次子查询 )
 
例: SELECT p.part_nbr, p.name
    FROM supplier s, part p
    WHERE s.name = 'Acme Industries'
      AND s.supplier_id = p.supplier_id
      AND 10 <= (SELECT COUNT(*) FROM cust_order co, line_item li
                 WHERE li.part_nbr = p.part_nbr
                   AND li.order_nbr = co.order_nbr
                   AND co.order_dt >= TO_DATE('01-DEC-2001','DD-MON-YYYY'));
 
The reference to p.part_nbr is what makes the subquery correlated; values for p.part_nbr must be supplied by the containing query before the subquery can execute. If there are 10,000 parts in the part table, but only 100 are supplied by Acme Industries, the subquery will be executed once for each of the 100 rows in the intermediate result set created by joining the part and supplier tables.( 因为子查询中引用了包含查询的:p.part_nbr字段,所以该子查询是相关性查询,p.part_nbr字段的值必须在子查询执行之前由包含查询给出,如果表part中有1000种零件,但只有100种零件由Acme Industries提供,子查询会在由part表和supplier表联合组成的临时表中的100符合条件的记录上执行100次查询,而不是在包含查询执行前一次执行完成)
 
Correlated subqueries are often used to test whether relationships exist without regard to cardinality. The EXISTS operator is used for these types of queries (相关性查询通常用于测试某种关系是否存在,而不管返回记录集的大小, Exists运算符就是用于这种类型的查询)
    SELECT p.part_nbr, p.name, p.unit_cost
     FROM part p
     WHERE EXISTS (SELECT 1 FROM line_item li, cust_order co
                    WHERE li.part_nbr = p.part_nbr
                      AND li.order_nbr = co.order_nbr
                      AND co.ship_dt >= TO_DATE('01-JAN-2002','DD-MON-YYYY'))
As long as the subquery returns one or more rows, the EXISTS condition is satisfied without regard for how many rows were actually returned by the subquery. Since the EXISTS operator returns TRUE or FALSE depending on the number of rows returned by the subquery, the actual columns returned by the subquery are irrelevant. The SELECT clause requires at least one column, however, so it is common practice to use either the literal "1" or the wildcard " * ".( 因为Exists的返回值是True或False,所以Exists不会关注返回的记录到底有多少条,我们可以使用1或通配符*来获取返回记录)
 
4.内嵌视图: (Inline view)
the FROM clause contains a list of data sets. In this light, it is easy to see how the FROM clause can contain tables (permanent data sets), views (virtual data sets), and SELECT statements (temporary data sets). A SELECT statement in the FROM clause of a containing SELECT statement is referred to as an inline view 。Since it is a subquery that executes prior to the containing query, a more palatable name might have been a "pre-query."
 
Because the result set from an inline view is referenced by other elements of the containing query, we must give our inline view a name and provide aliases for all ambiguous columns. Similar to other types of subqueries, inline views may join multiple tables, call built-in and user-defined functions, specify optimizer hints, and include GROUP BY, HAVING, and CONNECT BY clauses. Unlike other types of subqueries, an inline view may also contain an ORDER BY clause( 因为对于包含SQL来说,Inline view的结果集引用了其它的元素,所以我们必须給Inline view一个名称,同时也必须給它里面的字段赋予一个别名。和其它类型的子查询相同,Inline view可以和多个表连接,调用内置、自定义函数,包含Group by,Having,Connect by字句;不同与其它的子查询,Inline view同样也可以包含Order by字句。)
 
Inline views are particularly useful when we need to combine data at different levels of aggregation( Inline view特别适用于当我们需要对多个聚集级别不同的数据集进行连接时,例如一个数据集的数据是来自于全体数据,另一个数据集的数据来自于统计后的结果)
 
When considering using an inline view, ask the following questions:
1).What value does the inline view add to the readability and, more importantly, the performance of the containing query?
2).How large will the result set generated by the inline view be?
3).How often, if ever, will I have need of this particular data set?
 
In general, using an inline view should enhance the readability and performance of the query, and it should generate a manageable data set that is of no value to other statements or sessions; otherwise, we may want to consider building a permanent or temporary table so that we can share the data between sessions and build additional indexes as needed
 
5. TOP N查询:
Certain queries that are easily described in English have traditionally been difficult to formulate in SQL. One common example is the "Find the top five salespeople" query. The complexity stems from the fact that data from a table must first be aggregated, and then the aggregated values must be sorted and compared to one another in order to identify the top or bottom performers.

  

需求:找出销售数量排在前5位的销售员和对应的销售额,并统计其对应的分红(销售额的1%)
 
准备工作:按销售员的 ID统计所有销售人员的销售记录:
SELECT e.lname employee, SUM(co.sale_price) total_sales
FROM cust_order co, employee e
WHERE co.order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
    AND co.order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
    AND co.ship_dt IS NOT NULL AND co.cancelled_dt IS NULL
    AND co.sales_emp_id = e.emp_id
GROUP BY e.lname
ORDER BY 2 DESC;
 
结果:  EMPLOYEE             TOTAL_SALES
--------------------    -----------
Blake                      1927580
Houseman            1814327
Russell                  1784596
Boorman               1768813
Isaacs                   1761814
McGowan             1761814
Anderson               1757883
Evans                     1737093
Fletcher                  1735575
Dunn                      1723305
注意:结果中有两个销售员的销售总额是相同的。所以最后结果应该包含6条记录,而不是5条记录
 
方案一:
SELECT e.lname employee, top5_emp_orders.tot_sales total_sales,
     ROUND(top5_emp_orders.tot_sales * 0.01) bonus
FROM (SELECT all_emp_orders.sales_emp_id emp_id, all_emp_orders.tot_sales tot_sales
FROM (SELECT sales_emp_id, SUM(sale_price) tot_sales
                  FROM cust_order
                  WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
                   AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
                   AND ship_dt IS NOT NULL AND cancelled_dt IS NULL
                  GROUP BY sales_emp_id
                  ORDER BY 2 DESC
                ) all_emp_orders            --按销售员的 ID分类统计销售额
          WHERE ROWNUM <= 5            
         ) top5_emp_orders, employee e       --筛选前 5位的销售记录
WHERE top5_emp_orders.emp_id = e.emp_id;     --计算其红利
 
结果:
EMPLOYEE              TOTAL_SALES      BONUS
-------------------- ----------- ----------
Blake                     1927580      19276
Houseman           1814327      18143
Russell                 1784596      17846
Boorman              1768813      17688
McGowan             1761814      17618
可以看到,这个 SQL在执行的时候漏掉了:Isaacs 1761814 这条记录,因为这个SQL语句使用了ROWNUM进行筛选,所以会把这条记录給漏掉了
 
方案二:
This will require two steps: find the fifth highest sales total last year, and then find all salespeople whose total sales meet or exceed that figure
这个方案需要两个步骤:
1).按业务员的 ID分组统计去年的销售总额
2).找出销售总额排在前 5位的数字
3).找出售额等于这 5个数字中任意一个的所有记录
4).计算其红利
 
SELECT e.lname employee,
       top5_emp_orders.tot_sales total_sales,
       ROUND(top5_emp_orders.tot_sales * 0.01) bonus
 FROM employee e,
       (SELECT sales_emp_id, SUM(sale_price) tot_sales        
          FROM cust_order
         WHERE order_dt >= TO_DATE('01-JAN-2001', 'DD-MON-YYYY')
           AND order_dt < TO_DATE('01-JAN-2002', 'DD-MON-YYYY')
           AND ship_dt IS NOT NULL
           AND cancelled_dt IS NULL
         GROUP BY sales_emp_id
        --筛选所有销售额等于前5位销售额的记录
  HAVING SUM(sale_price) IN (SELECT all_emp_orders.tot_sales
                                    FROM (SELECT SUM(sale_price) tot_sales
                                            FROM cust_order
                                           WHERE order_dt >=
                                                 TO_DATE('01-JAN-2001',
                                                         'DD-MON-YYYY')
                                             AND order_dt <
                                                 TO_DATE('01-JAN-2002',
                                                         'DD-MON-YYYY')
                                             AND ship_dt IS NOT NULL
                                             AND cancelled_dt IS NULL
                                           GROUP BY sales_emp_id
                                           ORDER BY 1 DESC) all_emp_orders
                                   WHERE ROWNUM <= 5)) top5_emp_orders
 WHERE top5_emp_orders.sales_emp_id = e.emp_id
 ORDER BY 2 DESC;
 
结果:
EMPLOYEE              TOTAL_SALES      BONUS
-------------------- ----------- ----------
Blake                     1927580      19276
Houseman           1814327      18143
Russell                 1784596      17846
Boorman               1768813      17688
McGowan              1761814      17618
Isaacs                   1761814      17618
可以看到,现在查询的结果就包含了所有正确的记录了,但是存在以下几个缺陷:
1).统计过程执行了两次
2).可读性很差
 
方案三:
the RANK function may be used to assign a ranking to each element of a set. The RANK function understands that there may be ties in the set of values being ranked and leaves gaps in the ranking to compensate. The following query illustrates how rankings would be assigned to the entire set of salespeople; notice how the RANK function leaves a gap between the fifth and seventh rankings to compensate for the fact that two rows share the fifth spot in the ranking(如果在要排序的队列中出现了相同的排列项, RANK函数会识别出来并且自动地调整排列项的序号,排列的结果序号可能不会是连续的。例如两个排列项的序号为5,那么下一个的序号就必须是7,而不是6了)
 
准备工作: RANK函数的用法:
SELECT sales_emp_id, SUM(sale_price) tot_sales,
  RANK( ) OVER (ORDER BY SUM(sale_price) DESC) sales_rank
FROM cust_order
WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
 AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
 AND ship_dt IS NOT NULL AND cancelled_dt IS NULL
GROUP BY sales_emp_id;
 
结果:
SALES_EMP_ID TOT_SALES SALES_RANK
------------ ---------- ----------
          11    1927580          1
          24    1814327          2
          34    1784596          3
          18    1768813          4
          25    1761814          5
          26    1761814          5
          30    1757883          7
          21    1737093          8
          19    1735575          9
可以看到,除了结果正确之外,序号也是正确的。 RANK函数会自动在5和7之间留出空隙。对于这个查询,TOP 5和TOP 6的结果都是相同的
 
最终解决方案:
SELECT e.lname employee, top5_emp_orders.tot_sales total_sales,
       ROUND(top5_emp_orders.tot_sales * 0.01) bonus
FROM (SELECT all_emp_orders.sales_emp_id emp_id,
             all_emp_orders.tot_sales tot_sales
          FROM (SELECT sales_emp_id, SUM(sale_price) tot_sales,
      RANK( ) OVER (ORDER BY SUM(sale_price) DESC) sales_rank
                  FROM cust_order
                 WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
                   AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
                   AND ship_dt IS NOT NULL AND cancelled_dt IS NULL
                 GROUP BY sales_emp_id
                ) all_emp_orders  -- 使用RANK函数分组统计销售记录
         WHERE all_emp_orders.sales_rank <= 5
        ) top5_emp_orders, employee e -- 取出排名前5位的记录(按RANK的值取)
WHERE top5_emp_orders.emp_id = e.emp_id  -- 计算所有销售前5名人员的红利
ORDER BY 2 DESC;
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值