relational database is based
upon sets, with each table representing a set. However, there are
some types of information that are not directly amenable to the set
data structure. Think, for example, of an organization chart, a
bill of material in a manufacturing and assembly plant, or a family
tree. These types of information are hierarchical in nature, and
most conveniently represented in a tree structure.
To represent hierarchical data, we need to make use of a
relationship such as when one
column of a table references another column of the same table. When
such a relationship is implemented using a database constraint, it
is known as self-referential integrity
constraint(当一个表中的某个字段引用了同一个表中的其他字段,就是我们常说的“自引用”完整性约束)
. The corresponding CREATE TABLE statement will look as
follows
CREATE TABLE EMPLOYEE
(
EMP_ID NUMBER (4) CONSTRAINT EMP_PK PRIMARY
KEY,
LNAME VARCHAR2 (15)NOT NULL,
DEPT_ID NUMBER
(2)NOT NULL,
MANAGER_EMP_ID NUMBER
(4) CONSTRAINT EMP_FK REFERENCES
EMPLOYEE(EMP_ID),
)
The column MANAGER_EMP_ID stores the EMP_ID of the employee's
manager.There is a foreign key constraint on the MANAGER_EMP_ID
column. This enforces the rule that any value we put in the
MANAGER_EMP_ID column must be the EMP_ID of a valid employee. Such
a constraint is not mandatory when representing hierarchical
information. However, it is a good practice to define database
constraints to enforce such business rules.
(在字段MANAGER_EMP_ID上有一个外键索引,该外键索引指向了同一个表中的另一个字段EMP_ID,这就强迫我们向该字段添加的任何值必须是employee表中一个有效的EMP_ID值,像这样的约束并不是强制必要的,但在执行某些商业逻辑上定义这样的约束是一个好习惯。)
1.简单的级联操作:
A.Finding the Root
Node:
Finding the root of a hierarchy tree is easy; we look for the one
node with no parent.
B.Finding a Node's Immediate
Parent:
We may wish to
link nodes to their immediate parents. For example, we might want
to print a report showing each employee's manager. The name of each
employee's manager can be derived by joining the EMPLOYEE table to
itself. This type of join is a self join. The following query
returns the desired result:
SELECT E.LNAME "Employee", M.LNAME "Manager"
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.MANAGER_EMP_ID = M.EMP_ID;
The reason that
only 13 rows are returned from the self join is simple. This query
lists employees and their managers. But since the uppermost
employee KING doesn't have any manager, that row is not produced in
the output. If we want all the employees to be produced in the
result, we need an outer join, as in the following
example:
SELECT E.LNAME "Employee", M.LNAME "Manager"
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.MANAGER_EMP_ID = M.EMP_ID (+);
C.Finding the
leaf nodes:
The opposite problem
from finding the root node, which has no parent, is to find leaf
nodes, which have no children. Employees who do not manage anyone
are the leaf nodes in the hierarchy tree shown in Figure 8-1. At
first glance, the following query seems like it should list all
employees from the EMPLOYEE table who are not managers of any other
employee:
SELECT * FROM EMPLOYEE
WHERE EMP_ID NOT IN (SELECT MANAGER_EMP_ID FROM EMPLOYEE);
However, when we execute this statement, we will see "No rows
selected." Why? It is because the MANAGER_EMP_ID column contains a
NULL value in one row (for the uppermost employee), and NULLs can't
be compared to any data
value.(然而,我们很奇怪地发现:没有一条记录返回!为什么?因为最顶层的节点其MANAGER_EMP_ID列包含了空值,而空值是无法和其他类型的数据进行比较的。)
Therefore, to get the
employees who don't manage anyone, we need to rewrite the query as
follows:
SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY,
HIRE_DATE
FROM EMPLOYEE E
WHERE EMP_ID NOT
IN (SELECT MANAGER_EMP_ID FROM EMPLOYEE
WHERE
MANAGER_EMP_ID IS NOT NULL);
In this example, the
subquery returns the EMP_IDs of all the managers. The outer query
then returns all the employees, except the ones returned by the
subquery. This query can also be written as a correlated subquery
using EXISTS instead of IN:
SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY,
HIRE_DATE
FROM EMPLOYEE E
WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE E1
WHERE E.EMP_ID =
E1.MANAGER_EMP_ID);
2.Oracle SQL扩展:
For example, let's say we want to list each employee with his
manager. Using regular Oracle SQL, we can perform self outer joins
on the EMPLOYEE table, as shown here:
SELECT E_TOP.LNAME, E_2.LNAME, E_3.LNAME, E_4.LNAME
FROM EMPLOYEE E_TOP, EMPLOYEE E_2, EMPLOYEE E_3,
EMPLOYEE E_4
WHERE
E_TOP.MANAGER_EMP_ID IS NULL
AND E_TOP.EMP_ID = E_2.MANAGER_EMP_ID
(+)
AND E_2.EMP_ID = E_3.MANAGER_EMP_ID
(+)
AND E_3.EMP_ID = E_4.MANAGER_EMP_ID
(+);
The query returns eight rows, corresponding to the eight
branches of the tree. To get those results, the query performs a
self join on four instances of the EMPLOYEE table. Four EMPLOYEE
table instances are needed in this statement because there are four
levels to the hierarchy. Each level is represented by one copy of
the EMPLOYEE table. The outer join is required because one employee
(KING) has a NULL value in the MANAGER_EMP_ID column
This type
query has several drawbacks. First of all, we need to know the
number of levels in an organization chart when we write the query,
and it's not realistic to assume that we will know that
information. It's even less realistic to think that the
number of levels will remain stable over time. Moreover, we need to
join four instances of the EMPLOYEE table together for a four level
hierarchy. Imagine an organization with 20 levels—we'd need to join
20 tables. This would cause a huge performance problem.
A.START WITH...CONNECT BY and
PRIOR:
We can extract information in hierarchical form from a table
containing hierarchical data by using the SELECT statement's START
WITH...CONNECT BY clause. The syntax for this clause
is:
[[START WITH
condition1] CONNECT BY condition2]
The syntax elements are:
START WITH condition1
Specifies the
root row(s) of the hierarchy. All rows that satisfy
condition1 are considered root rows. If we don't specify the START
WITH clause, all rows are considered root rows, which is usually
not desirable. We can include
a subquery in condition1.
(Start
with用于指定级联结构中的根节点,我们可以在Start
with子句中使用子查询)
CONNECT BY condition2
Specifies the
relationship between parent rows and child rows in the
hierarchy. The relationship is expressed as a comparison
expression, where columns from the current row are compared to
corresponding parent columns. condition2
must contain the PRIOR operator, which is used to identify columns
from the parent row. condition2 cannot contain a
subquery.
(Connect
by用于指定级联结构中父记录和子记录之间的对应关系,condition2必须包含Prior操作符,该操作符用于表示父记录中的列,condition2不能含有子查询)
PRIOR is a
built-in Oracle SQL operator that is used with hierarchical queries
only. In a hierarchical query, the CONNECT BY clause
specifies the relationship between parent and child rows.
When we use
the PRIOR operator in an expression in the CONNECT BY condition,
the expression following the PRIOR keyword is evaluated for the
parent row of the current row in the query. In the following
example, PRIOR is used to connect each row to its parent by
connecting MANAGER_EMP_ID in the child to EMP_ID in the
parent:
(Prior操作符是Oracle中专门用于级联查询的操作符,当我们在一个表达式中使用Prior操作符时,紧跟着Prior关键字的表达式将在查询中被当成父记录看待,等号右边的表达式将被看成是子记录)
例:
SELECT LNAME, EMP_ID, MANAGER_EMP_ID
FROM EMPLOYEE
START WITH MANAGER_EMP_ID IS NULL
CONNECT BY PRIOR EMP_ID = MANAGER_EMP_ID;
LNAMEEMP_ID
MANAGER_EMP_ID
--------------------
--------------------------
KING7839
JONES7566 7839
SCOTT7788 7566
ADAMS 7876 7788
FORD7902 7566
The PRIOR column does not need to be listed first. The
previous query could be restated as:
SELECT LNAME, EMP_ID, MANAGER_EMP_ID
FROM EMPLOYEE
START WITH MANAGER_EMP_ID IS NULL
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;
Since the
CONNECT BY condition specifies the parent-child relationship, it
cannot contain a loop. If a row is both parent
(direct ancestor) and child (direct descendent) of another row,
then we have a loop. For example, if the EMPLOYEE table had the
following two rows, they would represent a
loop:
EMP_ID LNAMEDEPT_ID MANAGER_EMP_ID SALARY
HIRE_DATE
------ ---------- --------- -------------- ---------
---------
9001
SMITH20 9002 1800 15-NOV-61
9002
ALLEN30 9001 11600 16-NOV-61
***************************************************************************************
When a
parent-child relationship involves two or more columns, we need to
use the PRIOR operator before each parent
column.(当一个父子记录之间的关系受两个或多个字段影响时,我们就必须在每个父字段的前面都加上Prior关键字)
SELECT * FROM
ASSEMBLY
START WITH PARENT_ASSEMBLY_TYPE IS NULL AND
PARENT_ASSEMBLY_ID IS NULL
CONNECT BY PARENT_ASSEMBLY_TYPE = PRIOR
ASSEMBLY_TYPE
AND
PARENT_ASSEMBLY_ID = PRIOR ASSEMBLY_ID;
B.The LEVEL
Pseudocolumn:
In
a hierarchy tree, the term level refers to one layer of
nodes.Oracle provides a pseudocolumn, LEVEL, to represent these
levels in a hierarchy tree. Whenever we use the START
WITH...CONNECT BY clauses in a hierarchical query, we can use the
pseudocolumn LEVEL to return the level number for each row returned
by the
query.(在一棵级联结构的树中,level用来反映某一层的节点,Oracle提供了一个伪列(pseudocolumn)-Level,用来表示级联树中的层次,不管我们在级联查询中如何使用Start
with…Conect by子句,我们可以使用伪列“level”来返回查询结果中任何一条记录的层数)
The following example illustrates the use of the LEVEL
pseudocolumn
SELECT LEVEL,
LNAME, EMP_ID, MANAGER_EMP_ID
FROM EMPLOYEE
START WITH MANAGER_EMP_ID IS
NULL
CONNECT BY
MANAGER_EMP_ID = PRIOR EMP_ID;
LEVEL
LNAMEEMP_ID
MANAGER_EMP_ID
----- ----- -------------------- ---------------
1 KING7839
2 JONES 7566 7839
3 SCOTT 7788 7566
4 ADAMS 7876 778
3.复杂级联运算:
A.Finding the Number of
Levels:
Previously we showed
how the LEVEL pseudocolumn generates a level number for each record
when we use the START WITH...CONNECT BY clause. We can use the
following query to determine the number of levels in the hierarchy
by counting the number of distinct level numbers returned by the
LEVEL pseudocolumn:
SELECT
COUNT(DISTINCT LEVEL)
FROM EMPLOYEE
START WITH
MANAGER_EMP_ID IS NULL
CONNECT BY PRIOR
EMP_ID = MANAGER_EMP_ID;
COUNT(DISTINCTLEVEL)
--------------------
4
B.Listing Records in
Hierarchical Order
One of the very common programming challenges SQL programmers
face is to list records in a hierarchy in their proper hierarchical
order. For example, we might wish to list employees with their
subordinates underneath them, as is in the following query
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) ||
LNAME "EMPLOYEE",EMP_ID, MANAGER_EMP_ID
FROM EMPLOYEE
START WITH MANAGER_EMP_ID IS NULL
CONNECT BY PRIOR EMP_ID = MANAGER_EMP_ID;
LEVEL EmployeeEMP_ID MANAGER_EMP_ID
--------- ------------ --------- --------------
1
KING 7839
2 JONES 7566 7839
3 SCOTT 7788 7566
4 ADAMS 7876 7788
Notice that by using the expression LPAD(' ',2*(LEVEL - 1)),
we are able to align employee names in a manner that corresponds to
their level. As the level number increases, the number of spaces
returned by the expression increases, and the employee name is
further indented.
(注意:通过使用LPAD(' ',2*(LEVEL -
1))表达式,我们可以用一种特定的方式来对齐各个级别的用户名,随着级别数目的增加,该表达式返回的空格数目也随着增加,各个用户名之间更加交错排列)
Instead of reporting out the whole organization chart, we may
want to list only the subtree under a given employee, JONES for
example. To do this, we can modify the START WITH condition so that
it specifies JONES as the root of the query. For example
SELECT LEVEL, LPAD('',2*(LEVEL - 1))
|| LNAME "EMPLOYEE", EMP_ID, MANAGER_EMP_ID, SALARY
FROM EMPLOYEE
START WITH LNAME = 'JONES'
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;
LEVEL EmployeeEMP_ID MANAGER_EMP_ID SALARY
--------- ------------ ---------
-------------- ---------
1
JONES 7566 7839 2000
2 SCOTT 7788 7566 3000
3 ADAMS 7876 7788 1100
2 FORD 7902 7566 3000
3 SMITH 7369 7902 800
Notice that since we asked the query to consider JONES as the
root of the hierarchy, it assigned level 1 to JONES, level 2 to
employees directly reporting to him, and so forth. Be careful while
using conditions such as LNAME = 'JONES' in hierarchical queries.
In this case, if we have two JONES in our organization, the result
returned by the hierarchy may be wrong. It is better to use primary
or unique key columns, such as EMP_ID, as the condition in such
situations
(注意:因为我们在查询中指定JONES作为级联查询的起始点,它指定JONES的级别为1,其直接下属的级别为2,实际上这和我们原来查询得出的结果不同。当在级联查询中使用诸如LNAME
=
‘JONES’的条件时要特别注意,在这种情况下,如果我们刚好有两个同名的JONES,由此返回的结果集将有可能错误。相比之下,使用主键索引或惟一索引,比如EMP_ID作为条件更加合适)
In this example, we listed the portion of the organization
chart headed by a specific employee. There could be situations when
we may need to print the organization chart headed by any employee
that meets a specific condition. For example, we may want to list
all employees under the employee who has been working in the
company for the longest time. In this case, the starting point of
the query (the root) is dependent on a condition. Therefore, we
have to use a subquery to generate this information and pass it to
the main query, as in the following example
SELECT LEVEL, LPAD('',2*(LEVEL - 1))
|| LNAME "EMPLOYEE",EMP_ID, MANAGER_EMP_ID, SALARY
FROM EMPLOYEE
START WITH
HIRE_DATE = (SELECT MIN(HIRE_DATE) FROM EMPLOYEE)
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;
(该级联查询的起始条件中使用了子查询,先查出入职日期最早的时间,作为起始条件,然后查出其下的所有子节点的记录)
While using a subquery in the START WITH clause, be aware of how
many rows will be returned by the subquery. If more than
one row is returned when we are expecting just one row (indicated
by the = sign), the query will generate an error. We can
get around this by replacing = with the IN operator, but be warned
that the hierarchical query may then end up dealing with multiple
roots.
C.Checking for
Ascendancy(检查节点之间的父子关系):
Another common operation on hierarchical data is to check for
ascendancy. In an organization chart, we may ask whether one
employee has authority over another. For example:
"Does JONES have any authority over BLAKE?" To find out, we need to
search for BLAKE in the subtree headed by JONES. If we find BLAKE
in the subtree, then we know that BLAKE either directly or
indirectly reports to JONES. If we don't find BLAKE in the subtree,
then we know that JONES doesn't have any authority over BLAKE. The
following query searches for BLAKE in the subtree headed by
JONES:
SELECT * FROM
EMPLOYEE
WHERE LNAME =
'BLAKE'
START WITH
LNAME = 'JONES'
CONNECT BY
MANAGER_EMP_ID = PRIOR EMP_ID;
The START WITH...CONNECT BY clause in this example generates
the subtree headed by JONES, and the WHERE clause filters this
subtree to find BLAKE. As we can see, no rows were returned. This
means that BLAKE was not found in JONES' subtree, so we know that
JONES has no authority over BLAKE
D.Deleting a
Subtree(删除子树):
Let's assume that the organization we are dealing with splits,
and JONES and all his subordinates form a new company. Therefore,
we don't need to maintain JONES and his subordinates in our
EMPLOYEE table. Furthermore, we need to delete the entire subtree
headed by JONES, as shown in Figure 8-1, from our table. We can do
this by using a subquery as in the following
example:
DELETE FROM EMPLOYEE
WHERE EMP_ID IN (SELECT EMP_ID FROM
EMPLOYEE
START
WITH LNAME = 'JONES'
CONNECT
BY MANAGER_EMP_ID = PRIOR EMP_ID);
In this example, the subquery generates the subtree headed by
JONES, and returns the EMP_IDs of the employees in that subtree,
including JONES'. The outer query then deletes the records with
these EMP_ID values from the EMPLOYEE table.
E.Listing Multiple Root
Nodes(列出所有的根节点):
An interesting variation on the problem of listing the root
node of a hierarchy is to find and list the root nodes from several
hierarchies that are all stored in the same table. For example, we
might consider department manager's to represent root nodes, and we
might further wish to list all department managers found in the
EMPLOYEE table
There are no constraints on the employees belonging to any
department. However, we can assume that if A reports to B and B
reports to C, and A and C belong to the same department, then B
also belongs to the same department.
If an employee's manager belongs to another department, then
that employee is the uppermost employee, or manager, of his
department.Therefore, to find the uppermost employee in each
department, we need to search the tree for those employees whose
managers belong to a different department then their own.
(假如一个雇员的上级隶属于另一个部门,那么该雇员肯定是其所在部门中的最顶端的雇员,或者说是该部门的经理。所以为了找出employee表中所有部门的最顶端的雇员,我们必须查找整棵树,找出那些雇员的上级隶属于另一个部门的节点)
SELECT EMP_ID, LNAME, DEPT_ID,
MANAGER_EMP_ID, SALARY, HIRE_DATE
FROM EMPLOYEE
START WITH
MANAGER_EMP_ID IS NULL
CONNECT BY
MANAGER_EMP_ID = PRIOR EMP_ID
AND DEPT_ID !=
PRIOR DEPT_ID;
EMP_ID
LNAME DEPT_ID MANAGER_EMP_ID SALARY
HIRE_DATE
------ --------
-------- -------------- ------ ---------
7839
KING10 5000 17-NOV-81
7566
JONES20 7839 2975 02-APR-81
7698
BLAKE30 7839 2850 01-MAY-81
F.Listing the Top Few Levels of
a Hierarchy(列出级联记录中的若干层记录):
Another common task in dealing with hierarchical data is
listing the top few levels of a hierarchy tree. For example, we may
want to list top management employees in an organization. Let's
assume that the top two levels in our organization chart constitute
top management. We can then use the LEVEL pseudocolumn to identify
those employees, as in the following example
SELECT EMP_ID, LNAME, DEPT_ID,
MANAGER_EMP_ID, SALARY, HIRE_DATE
FROM EMPLOYEE
WHERE LEVEL
<= 2
START WITH
MANAGER_EMP_ID IS NULL
CONNECT BY
MANAGER_EMP_ID = PRIOR EMP_ID;
G.Aggregating a
Hierarchy(级联记录的统计):
Another challenging requirement on hierarchical data is to
aggregate a hierarchy. For example, we may want to sum the salaries
of all employees reporting to a specific employee. Or, we may want
to consider each employee as a root, and for each employee report
out the sum of the salaries of all subordinate employees
I.统计某个特定雇员及其所有下属的工资总和:
SELECT
SUM(SALARY) FROM EMPLOYEE
START WITH
LNAME = 'JONES'
CONNECT BY
MANAGER_EMP_ID = PRIOR EMP_ID;
II.统计每个雇员及其所有下属的工资总和:
相对于第一个问题,我们必须把所有的节点都看成是根节点,对每个节点都计算该员工及其所有下属的工资总和。明显地,我们必须不断重复执行第一个问题所使用地查询。
SELECT LNAME, SALARY, (SELECT SUM(SALARY)
FROM EMPLOYEE T1
START
WITH LNAME = T2.LNAME
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID) SUM_SALARY
FROM EMPLOYEE T2;
4.级联查询中的限制:
A.A
hierarchical query can't use a join.
B.A
hierarchical query cannot select data from a view that involves a
join
C.We can use an ORDER BY clause within a hierarchical query;
however, the ORDER BY clause takes precedence over the hierarchical
ordering performed by the START WITH...CONNECT BY clause.
Therefore, unless all we care about is the level number, it doesn't
make sense to use ORDER BY in a hierarchical query