《Pro Oracle SQL》Chapter 8 -- 8.7 Other Analytic Functions 之一

最新推荐文章于 2014-12-10 15:16:18 发布

caohong286

最新推荐文章于 2014-12-10 15:16:18 发布

阅读量204

点赞数

分类专栏： Pro Oracle SQL Chapter 8 文章标签： Pro Oracle SQL Oracle SQL Apress DBA

Pro Oracle SQL Chapter 8 专栏收录该内容

9 篇文章 0 订阅

订阅专栏

Other Analytic Functions (page254)
Oracle Database implements a great many other analytic functions. Some of those more commonly used are described in the following subsections. The functions that follow are the ones that should be on your short list of good functions to know about.

Oracle数据库支持多种其他的分析函数。其中的一些使用较多的在后面的章节中给予介绍。下面的函数是那些你应该记在必备函数“ 速查名单 ”中的。

Nth_value (11gR2)
While first_value and last_value functions provide the ability to fetch the first or last row in an ordered
result set, it is not quite straightforward to fetch any arbitrary row with these functions. In fact, fetching
the second row using either the first_value or last_value function is a complex task.

尽管 first_value 和last_value函数提供在一已排序的结果集中取出第一行或第二行的功能。但使用这些函数取出任意行还是那么直接。事实上，使用first_value 和last_value函数取出第二行数据是一项复杂的任务。
Oracle Database Version 11gR2 introduced another analytic function: nth_value, which is a
generalization of first_value and first_value functions. Using nth_value function, you can fetch any
row in the ordered result set, not just first or last values. The first_value function can be written as
nth_value (column_name, 1).

Oracle数据库自版本11gR2起引入另一个分析函数：nth_value，是first_value 和last_value函数的泛化。使用 nth_value 函数，你能从已排序的结果集中取出任一行，不止是第一个或最后一个值。first_value函数可以写作是nth_value (column_name, 1)。

In statistics analysis, outliers can occur in the head or tail of the result set. In some cases, it might be
important to ignore first_value or first_value in an ordered result set and fetch the value from the
next row. The second value in a result set can be fetched using the nth_value function passing two as
the offset to the function.

    在统计分析中，极端值会出现在结果集的头部或者尾部。在某些情况下，忽略在已排序列表中的第一个值或最后一个值，然后从下一行取值是重要的。在结果集中的第二个值能通过向nth_value函数传递2，作为偏移量，取得。
    The nth_value function also supports windowing clauses. As discussed earlier, a windowing clause
provides the ability to implement sliding dynamic window. This, in turn, effectively allows you to write
simple queries to answer complex questions such as Which store had second highest Sales in a span of 12
weeks for a product ?
    nth_value也支持开窗子句。之前讨论过，开窗子句提供执行滑动动态窗口的功能。这样就可以，允许你写简单的查询解答复杂的问题，诸如“ 哪一家商店在过去的12周内某一产品的销售额排前第二名 ”。

Syntax for the nth_value function is: nth_value 函数的句法是：
NTH_VALUE (measure, n ) [ FROM FIRST| FROM LAST] [RESPECT NULLS|IGNORE NULLS]
OVER (partitioning-clause order-by-clause windowing-clause)

The first argument to the nth_value function is the column name, and the second argument is the
offset in a window. For example, the clause nth_value(sale, 2) is accessing the second row in a
window. In the Listing 8-10, the SQL statement is fetching the Week column value with the second
highest Sale column value at product, country, region, and year level. The second row in this result set is
the row with second highest value for the Sale column since the rows are sorted by Sale column in the
descending order. The clause partition by product, country, region, year is specifying the
partitioning columns.

nth_value函数的第一个参数是列名，第二个参数是窗口的偏移值。例如，子句nth_value(sale, 2)访问窗口中的第二行。在列表8-10中，SQL语句取出在product, country, region, 和year层级上Sale列值第二高的Week列值。该结果集的第二行就是Sale列第二高的值所在行，因为行按照Sale列值降序排列了。子句 partition by product, country, region, year指定分区列。

Listing 8-10. Nth Value
1 select year, week, sale,
2 nth_value ( sale, 2) over (
3 partition by product,country, region, year
4 order by sale desc
5 rows between unbounded preceding and unbounded following
6 ) sale_2nd_top
7 from sales_fact
8 where country in ('Australia') and product='Xtend Memory'
9* order by product, country , year, week
YEAR      WEEK      SALE         SALE_2ND_TOP
----------   ----------   ----------      ------------
...
2000        49             42.38        187.48
2000        50             21.19        187.48
2000        52             67.45        187.48
2001      1               92.26        256.7
2001        2              118.38       256.7
2001        3               47.24        256.7
...

For the nth_value function, clauses FROM FIRST and RESPECT NULLS are the defaults. If the clause FROM FIRST is specified, then the nth_value function finds the offset row from the beginning of the window. The clause RESPECT NULLS returns null values if the column contains null values in the offset row. With an ability to specify a windowing clause, the nth_value function is quite powerful in accessing
an arbitrary row in the result set or in a partition.

对nth_value函数而言，子句FROM FIRST and RESPECT NULLS是默认值。如果指定子句FROM FIRST 则nth_value函数从窗口的开端查找偏移行。如果在偏移行的列包含null值则子句RESPECT NULLS返回null值。因为支持开窗子句，nth_value函数在访问结果集或者分区中任意行上相当强大。

Rank
The rank function returns the position of a row, as a number, in an ordered set of rows. If the rows are
sorted by columns, then the position of a row in a window reflects the rank of the value in that window
of rows. In the case of a tie, rows with equal value will have the same rank and the ranks are skipped,
leaving gaps in the rank values. This means that two rows can have the same rank, and the ranks are not
necessarily consecutive.

rank函数用数字返回有序行集中行的位置。如果行已按列排序，则行在窗口中的位置反映出(行在)该窗口中的秩(rank)值。如果值相等，相等值的行将有相同的秩且秩值将被跳过，秩值将留下间隔。这意味着两行能有相同的rank值，且rank值不一定连续。
The rank function is useful to compute the top or bottom N rows. For example, a query to find the
top 10 weeks by sales quantity is a typical retail industry data warehouse query. Such a query will greatly
benefit from the use of rank. If you need to write any query that computes top or bottom N-elements of a result set, use the rank or dense_rank function.

rank函数在计算前或者末尾N行是有用的。例如，查询“找出按销售量排名的前10周”是在零售业数据仓库典型的查询。查询中使用rank函数将极大的受益。如果你要写一个计算一结果集中头或底N个元素的查询，使用rank或dense_rank函数。
The rank function is also useful in finding inner-N rows. For example, if the goal is to fetch rows
from 21 through 40 sorted by sales, then you can use the rank function in a subquery with a predicate
between 21 and 40 filtering 20 inner rows.

rank函数在查找内部N行时也是有用的。例如，如果目标是取出按销售值排列的自21到40的行，则你可以在用带有谓词between 21 and 40，筛选20条内部行，的子查询中使用rank函数。
Syntax for the rank function is: rank函数的句法是：
rank() over (partition-clause order-by-clause)

In the Listing 8-11, you calculate the top 10 rows by sale for a Product, Country, Region, and Year
column values. The clause partition by product, country, region, week is specifying the partitioning
columns and the rows are sorted by Sale column descending order in that data partition using the order
by Sale desc clause. The rank function is calculating the rank of the row in that data partition. This SQL
is wrapped inside an inline view, and then a predicate of sales_rank <=10 is applied to fetch the top ten
weeks by Sale column.

在列表8-11中，你计算按Product, Country, Region, 和Year分区，销售值排名前10的行。子句partition by product, country, region, week指定分区列而数据区中的行按照Sale列降序排列，使用order by Sale desc 子句。rank函数计算在数据分区中行的秩。SQL被封装在一内联视图中，再用谓词sales_rank <=10筛选按Sale列排名的前10周。
Also, notice that the windowing clause is not applicable in the rank functions and the rank function
is applied over all the rows in a data partition.

也要注意开窗子句不能应用于rank函数且rank函数应用于数据分区中的整体行中。

Listing 8-11. Use of Rank Function: Top 10 Sales Weeks
1 select * from (
2 select year, week,sale,
3 rank() over(
4 partition by product, country, region ,year
5 order by sale desc
6 ) sales_rank
7 from sales_fact
8 where country in ('Australia') and product ='Xtend Memory'
9 order by product, country,year, week
10 ) where sales_rank<=10
11* order by 1,4
YEAR     WEEK   SALE       SALES_RANK
----         ----        ----------    ----------
...
2001      16         278.44       1
2001      4          256.70      2
2001       21         233.70     3
2001      48         182.96       4
2001      30         162.91       5
2001      14         162.91       5

2001 22 141.78 7
2001 43 139.58 8
...
The rank function assigns same rank in case of ties. In the output of Listing 8-11, notice that there
are two rows with a sales rank of 5, as the Sale column value is 162.91 for these two rows. Also, notice
that next rank is 7, not 6. In a nutshell , the rank function skips the ranks if there are ties. Number of rank
values skipped equals to number of rows with tied values. If there have been ties for three rows, then the
next rank will be 8.

对于值相等的情况，rank函数赋予相同的秩。在列表8-11输出中，注意有两行的销售秩值是5，因为这两行的Sale列值是162.91。注意下一个秩值是7，而非6。简而言之，rank函数跳过相等情况下的秩值。跳过的秩值数量等于具有相同值的行数量。如果有三行值相同，则下一个秩为8。

Dense_rank
Dense_rank is a variant of the rank function. The difference between the rank and dense_rank functions is that the dense_rank function does not skip the ranks in case of ties. As discussed in the earlier section, the dense_rank function is useful in finding top, bottom, or Inner N rows in a result set. In the Listing 8-12, the dense_rank function is used instead of rank function. Note that rank column for the week=22 is 6 in the Listing 8-12 and it is 7 in the Listing 8-11.

Dense_rank是rank函数的变种。rank和dense_rank之间的区别在于dense_rank函数不跳过相等的情况下的秩。正如之前节讨论的，dense_rank函数对于找出结果集中头，尾或者内部N行很有用。在列表8-12，dense_rank函数用于代替rank函数。注意在列表8-12中，week=22的rank列值是6，而在列表8-11中它是7。.
The dense_rank function is useful in the queries where the ranks need to be consecutive. For
example, ranks may not be skipped in a query to compute the top 10 students in a class roster. On the
other hand, the rank function is useful where the ranks need not be consecutive.

dense_rank函数对于查询需要连续型秩值的时候是有用的。例如，在计算一个班级名册前十名学生的查询中rank值可能不能跳过。另一方面，rank函数在秩值不必连续的时又是有用的。

Listing 8-12. dense_rank Function
1 select * from (
2 select year, week,sale,
3 dense_rank() over(
4 partition by product, country, region ,year
5 order by sale desc
6 ) sales_rank
7 from sales_fact
8 where country in ('Australia') and product ='Xtend Memory'
9 order by product, country,year, week
10 ) where sales_rank<=10
11* order by 1,4
/

/
YEAR     WEEK     SALE        SALES_RANK
-----        ----         ----------       ----------
2001      16           278.44        1
2001       4            256.70         2
2001     21           233.70        3
2001      48          182.96          4
2001      14          162.91          5
2001      30          162.91         5
2001      22           141.78         6
    Sort order for nulls can be controlled by the NULLS FIRST or NULLS LAST clause in the dense_rank
function. NULLS LAST is the default for ascending sort order, and NULLS FIRST is the default for the
descending sort order. In the Listing 8-12, descending sort order is used and the default NULLS FIRST
clause is in effect. Rows with null values will have a rank of 1 in this case.

在dense_rank函数中可以NULLS FIRST或NULLS LAST子句控制null值的排序。NULLS LAST默认为升序，而NULLS FIRST默认为降序。在列表8-12中，使用了降序且默认NULLS FIRST子句生效。这时带有null值的行rank值为1。

Row_number
The row_number function assigns a unique number for each row in the ordered result set. If the
partitioning clause is specified, then each row is assigned a number unique within a data partition,
based upon its position in the sort order in that partition. If the partitioning clause is not specified, then
each row in the result set is assigned a unique number.

row_number函数对一有序结果集中的每行赋予一唯一数值。如果指定了分区子句，则对数据分区中每行赋予一唯一数值，基于它在那个分区中排序的位置。如果分区子句没有指定，则结果集中的每行赋予一唯一值。
The row_number function is also useful to fetch top, bottom, or Inner N queries, similar to rank and
dense_rank functions. Even though the rank, dense_rank, and row_number functions have similar
functionality, there are subtle differences between them. One is that the row_number function does not
allow windowing clauses.

    row_number函数也用于取出头，尾或者内部N行的查询，相似于rank和dense_rank函数。虽然rank，dense_rank和row_number函数有相似的功能，他们间有微妙的差别。有一点就是row_number函数不支持开窗子句。
Syntax for the row_number function is:      row_number 函数的语法是：
    Row_number() over (partition-clause order-by-clause)

The row_number function is a non-deterministic function. The value of the row_number function is
undetermined if two rows have same value in a data partition. For example, in the Listing 8-13, rows
with column values of 19, 8, 12, and 4 have same value of 46.54 in the Sale column. The row_number
function returns values of 31, 32, 34, and 33 respectively for these rows, in the example output. But the
result could just as easily be 34, 31, 32, 33 or 32, 34, 31, 33. In fact, you might get different results with
execution of the query. On the contrary, rank and dense_rank functions are deterministic and will always
return consistent values if a query is re-executed.

row_number函数是一非确定性函数。如果在数据分区中的两行具有相同的值row_number函数的值是不确定的。例如，在列表8-13 中，week列值为19，8，12和4具有相同销售列值46.54。row_number函数对这些行分别返回值31，32，34和33 ，在例子输出中。但是结果也很可能会是 34，31，32，33 或 32，34，31，33。事实上，你可能每次执行查询得到不同的结果集。相比而言，rank和dense_rank函数则是确定性的，且如果查询反复执行，总是会返回一致的值。

Listing 8-13. row_number Function
1 select year, week,sale,
2 row_number() over(
3 partition by product, country, region ,year
4 order by sale desc
5 ) sales_rn,
6 rank() over(
7 partition by product, country, region ,year
8 order by sale desc
9 ) sales_rank
10 from sales_fact
11 where country in ('Australia') and product ='Xtend Memory'
12* order by product, country,year,sales_rank
YEAR        WEEK         SALE         SALES_RN        SALES_RANK
-----          ----           ----------        ----------            ----------
...
2000        19             46.54             31                     31
2000         8             46.54             32                     31
2000        12             46.54            34                     31
2000          4             46.54             33                     31
...

caohong286

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
《Pro Oracle SQL》Chapter 8 -- 8.7 Other Analytic Functions 之一

Other Analytic Functions (page254) Oracle Database implements a great many other analytic functions. Some of those more commonly used are described in the following subsections. The fu...
复制链接

扫一扫

专栏目录