Applying Oracle Analytic Functions

fenglx871004

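I. The OVER clause:

Usage: over(partition by col1, col2, ... order by col3, col4)

① The OVER clause states which columns the analysis is performed over; the PARTITION BY inside it divides the data into groups. Note that PARTITION BY can list several columns.

② OVER can be combined with aggregate functions and other analytic functions to different effect, for example SUM as used here, as well as RANK, DENSE_RANK, and so on.

Example: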

select all_sales.*,
       100 * round(cust_sales / region_sales, 2) || '%' percent
  from (select o.cust_nbr customer,
               o.region_id region,
               sum(o.tot_sales) cust_sales,
               -- add up the per-customer sums within each region
               sum(sum(o.tot_sales)) over (partition by o.region_id) region_sales
          from orders_tmp o
         where o.year = 2001
         group by o.region_id, o.cust_nbr) all_sales
 where all_sales.cust_sales > all_sales.region_sales * 0.2;

  CUSTOMER     REGION CUST_SALES REGION_SALES PERCENT
---------- ---------- ---------- ------------ -------
         4          5      37802        37802 100%
        10          6      64315        68065 94%
        11          7      12204        12204 100%

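How the OVER clause works:

Look at the group by and over parts of the query above. The intent of group by is obvious: group the data by region ID and customer. So what is the over part for? If we only needed the order total for each customer in each region, group by o.region_id, o.cust_nbr would be enough. But we also want every row to show the order total of the region that customer belongs to. That is different: it requires summing by region on top of the existing grouping. group by and sum alone cannot do this, because the two aggregations work at different levels: one covers a single customer, the other covers all the customers in a region.

This is where over comes in. It tells the SQL engine to partition the data by region and then add up the per-customer order totals within each region (sum(sum(o.tot_sales))).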
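The two levels of aggregation can be tried without an Oracle instance. The following is a minimal sketch using Python's sqlite3 module as a stand-in; it assumes SQLite 3.25 or later (the first release with window functions), and the rows inserted into orders_tmp are made-up sample data rather than the article's.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders_tmp (cust_nbr integer, region_id integer, tot_sales integer);
    insert into orders_tmp values
        (1, 5, 100), (1, 5, 50), (2, 5, 850),
        (3, 6, 300), (4, 6, 900);
""")

# Inner sum(): one total per (region, customer) group.
# Outer sum() over (...): adds those group totals up within each region.
region_rows = conn.execute("""
    select cust_nbr, region_id,
           sum(tot_sales) as cust_sales,
           sum(sum(tot_sales)) over (partition by region_id) as region_sales
      from orders_tmp
     group by region_id, cust_nbr
     order by region_id, cust_nbr
""").fetchall()

for row in region_rows:
    print(row)
```

Every customer row carries its own total next to the total of its region, which is what a plain group by cannot return in one pass.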

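II. Window functions:

The analytic functions introduced so far compute a statistic over one clearly defined set of records. Some requirements, however, call for a statistic to be computed while each record of the set is being visited. In other words, the computation happens not once but many times, and it happens while the record set is being traversed rather than after the set has been formed.

1. The Oracle clauses are as follows:

1) For a full-set or rolling statistic, use rows between ... preceding and ... following. Read literally, it means: all records from XXX before the current row to XXX after it.

unbounded preceding and unbounded following runs from the first record before the current one to the last record after it, that is, all records in the result set (or in the current partition when partition by is used).

In rows between 1 preceding and unbounded following, however, the 1 does not mean "start at record 1"; it means the one record immediately before the current record. A bound can also be current row, which stands for the current record; for example, and unbounded following can be replaced with and current row.

Example: list each month's order total, the running total up to the current month, and the total for the whole year. To get averages instead, replace sum with avg.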

select month,
       sum(tot_sales) month_sales,
       sum(sum(tot_sales)) over (order by month
           rows between unbounded preceding and current row) current_total_sales,
       sum(sum(tot_sales)) over (order by month
           rows between unbounded preceding and unbounded following) total_sales
  from orders
 group by month;

     MONTH MONTH_SALES CURRENT_TOTAL_SALES TOTAL_SALES
---------- ----------- ------------------- -----------
         1      610697              610697     6307766
         2      428676             1039373     6307766
         3      637031             1676404     6307766
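To see the two frames side by side on a small input, here is a hedged sketch that runs the same query shape through Python's sqlite3 module. SQLite 3.25 or later is assumed as a stand-in for Oracle, and the three months of data are invented for the illustration.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (month integer, tot_sales integer);
    insert into orders values (1, 60), (1, 40), (2, 200), (3, 250), (3, 50);
""")

running_rows = conn.execute("""
    select month,
           sum(tot_sales) as month_sales,
           -- frame ends at the current row: running total
           sum(sum(tot_sales)) over (order by month
               rows between unbounded preceding and current row) as current_total_sales,
           -- frame spans every row: grand total repeated on each line
           sum(sum(tot_sales)) over (order by month
               rows between unbounded preceding and unbounded following) as total_sales
      from orders
     group by month
     order by month
""").fetchall()

for row in running_rows:
    print(row)
```

The only difference between the two columns is the upper bound of the frame: current row yields a running total, unbounded following yields the same grand total on every row.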

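2) For a statistic over a value range, use range between interval '2' day preceding and interval '2' day following, which picks up the records dated from 2 days before to 2 days after the current date (a five-day span).

Example: compute each day's sales and the average daily sales over the surrounding five days: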

select trunc(order_dt) day,
       sum(sale_price) daily_sales,
       avg(sum(sale_price)) over (order by trunc(order_dt)
           range between interval '2' day preceding
                     and interval '2' day following) five_day_avg
  from cust_order
 where sale_price is not null
   and order_dt between to_date('01-jul-2001', 'dd-mon-yyyy')
                    and to_date('31-jul-2001', 'dd-mon-yyyy')
 group by trunc(order_dt)
 order by trunc(order_dt);
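A range frame is defined by the ordering value (here the date), not by a count of rows, which matters when days are missing. The sketch below emulates that behaviour in plain Python; it is an illustration of the semantics under made-up data, not how Oracle evaluates the clause, and five_day_avg and daily_sales are hypothetical names.

```python
from datetime import date, timedelta

# Hypothetical daily totals; note that 5, 6 and 7 July have no orders at all.
daily_sales = {
    date(2001, 7, 1): 100,
    date(2001, 7, 2): 200,
    date(2001, 7, 3): 300,
    date(2001, 7, 4): 700,
    date(2001, 7, 8): 400,
}

def five_day_avg(day, sales, span=timedelta(days=2)):
    """Average the daily totals whose date lies within +/- span of `day`."""
    window = [amount for d, amount in sales.items() if abs(d - day) <= span]
    return sum(window) / len(window)

range_rows = [(d, daily_sales[d], five_day_avg(d, daily_sales))
              for d in sorted(daily_sales)]

for row in range_rows:
    print(row)
```

For 8 July the window contains only that day, because no other order date falls within two days of it. A frame written as rows between 2 preceding and 2 following would instead have pulled in 3 and 4 July, since those are the two preceding rows.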

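3) first_value and last_value return the first and the last record in the window.

Example: a report must show sales for the current month, the previous month, and the following month, together with the average over each 3-month span.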

select month,
       first_value(sum(tot_sales)) over (order by month
           rows between 1 preceding and 1 following) prev_month,
       sum(tot_sales) monthly_sales,
       last_value(sum(tot_sales)) over (order by month
           rows between 1 preceding and 1 following) next_month,
       avg(sum(tot_sales)) over (order by month
           rows between 1 preceding and 1 following) rolling_avg
  from orders
 where year = 2001
   and region_id = 6
 group by month
 order by month;
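One detail worth checking on a small input is what happens at the edges of the result set, where the frame has no preceding or following row. The sketch below uses Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later assumed) with four invented months of data.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (month integer, tot_sales integer);
    insert into orders values (1, 10), (2, 20), (3, 30), (4, 40);
""")

neighbour_rows = conn.execute("""
    select month,
           first_value(sum(tot_sales)) over (order by month
               rows between 1 preceding and 1 following) as prev_month,
           sum(tot_sales) as monthly_sales,
           last_value(sum(tot_sales)) over (order by month
               rows between 1 preceding and 1 following) as next_month,
           avg(sum(tot_sales)) over (order by month
               rows between 1 preceding and 1 following) as rolling_avg
      from orders
     group by month
     order by month
""").fetchall()

for row in neighbour_rows:
    print(row)
```

In the first month the frame starts at the current row, so prev_month repeats the month's own figure; likewise next_month repeats the current figure in the last month, and the rolling average at both ends covers two months instead of three.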

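4) The lag function is similar to the preceding and following clauses: it is applied by position relative to the current record, and it is especially useful for comparing two adjacent records within the same record set.

Example: show the current month's sales and the previous month's sales.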

select month,
       sum(tot_sales) monthly_sales,
       lag(sum(tot_sales), 1) over (order by month) prev_month_sales
  from orders
 where year = 2001
   and region_id = 6
 group by month
 order by month;
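Unlike the first_value approach, lag returns null when there is no earlier row. A minimal sketch with Python's sqlite3 module (SQLite 3.25 or later assumed as a stand-in for Oracle; the data is invented):

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (month integer, tot_sales integer);
    insert into orders values (1, 10), (2, 20), (3, 30), (4, 40);
""")

lag_rows = conn.execute("""
    select month,
           sum(tot_sales) as monthly_sales,
           lag(sum(tot_sales), 1) over (order by month) as prev_month_sales
      from orders
     group by month
     order by month
""").fetchall()

for row in lag_rows:
    print(row)  # the first month has None (null) as its previous value
```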

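III. Reporting functions:

1. The full-set total above is evaluated once for every record retrieved, 12 times in all, which is very time-consuming. It can be replaced with an empty over().

sum(sum(tot_sales)) over (order by month rows between unbounded preceding and unbounded following) win_sales
is equivalent to
sum(sum(tot_sales)) over () rpt_sales

The empty parentheses of over() mean that every record in the record set is included in the computation. If partition by is used, the data is partitioned first and each partition is then computed in turn.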
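The equivalence of the two expressions can be checked on a toy table. This sketch uses Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later assumed; the data is invented) and only confirms that both forms return the same value, not how either engine executes them.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (month integer, tot_sales integer);
    insert into orders values (1, 100), (2, 200), (3, 300);
""")

report_rows = conn.execute("""
    select month,
           sum(sum(tot_sales)) over (order by month
               rows between unbounded preceding and unbounded following) as win_sales,
           sum(sum(tot_sales)) over () as rpt_sales
      from orders
     group by month
     order by month
""").fetchall()

for row in report_rows:
    print(row)
```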

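2. The RATIO_TO_REPORT function:

Reporting functions (a kind of window function) are especially suited to reports that must show detail rows and summary figures side by side. A sales report, for example, often has to list last year's sales total for each month, the year-end total, and each month's share of the full-year total:

Method ②: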

select region_id, salesperson_id,
       sum(tot_sales) sp_sales,
       round(sum(tot_sales) / sum(sum(tot_sales))
             over (partition by region_id), 2) percent_of_region
  from orders
 where year = 2001
 group by region_id, salesperson_id
 order by region_id, salesperson_id;

Method ③:

select region_id, salesperson_id,
       sum(tot_sales) sp_sales,
       round(ratio_to_report(sum(tot_sales))
             over (partition by region_id), 2) sp_ratio
  from orders
 where year = 2001
 group by region_id, salesperson_id
 order by region_id, salesperson_id;

Oracle's RATIO_TO_REPORT function computes the share each record contributes to its record set, or to the subset (partition) it belongs to.
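The ratio is simply each group's sum divided by the sum over its partition. SQLite has no RATIO_TO_REPORT, so the sketch below, run through Python's sqlite3 module (SQLite 3.25 or later assumed), spells out the division explicitly; the multiplication by 1.0 is there because SQLite would otherwise do integer division. The sample rows are invented.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (region_id integer, salesperson_id integer, tot_sales integer);
    insert into orders values
        (5, 1, 100), (5, 1, 150), (5, 2, 750),
        (6, 3, 500);
""")

ratio_rows = conn.execute("""
    select region_id, salesperson_id,
           sum(tot_sales) as sp_sales,
           round(sum(tot_sales) * 1.0
                 / sum(sum(tot_sales)) over (partition by region_id), 2) as sp_ratio
      from orders
     group by region_id, salesperson_id
     order by region_id, salesperson_id
""").fetchall()

for row in ratio_rows:
    print(row)
```

Within each region the ratios add up to 1; a region with a single salesperson gets a ratio of 1.0.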

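IV. Ranking records with the analytic functions rank, dense_rank, and row_number:

① ROW_NUMBER:

ROW_NUMBER returns a unique value for every row. When rows hold equal values, their numbers still increase one by one, following the order of the records in the record set.

② DENSE_RANK:
DENSE_RANK returns a unique value except when rows hold equal values, in which case all the tied rows receive the same rank. No ranks are skipped after the tie.

③ RANK:
RANK returns a unique value except when rows hold equal values, in which case all the tied rows receive the same rank, and a gap is left between the rank of the last tied row and the rank of the next distinct row.
Adding a grouping clause, partition by region_id, in front of order by inside over() ranks within each group.
In a ranking function, partition by divides the result set into several parts so that the ranking function is applied to each subset separately.

Example:
rank customers by order total within each region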

select region_id, customer_id,
       sum(customer_sales) total,
       rank() over (partition by region_id
                    order by sum(customer_sales) desc) rank,
       dense_rank() over (partition by region_id
                          order by sum(customer_sales) desc) dense_rank,
       row_number() over (partition by region_id
                          order by sum(customer_sales) desc) row_number
  from user_order
 group by region_id, customer_id;

 REGION_ID CUSTOMER_ID      TOTAL       RANK DENSE_RANK ROW_NUMBER
---------- ----------- ---------- ---------- ---------- ----------
         5           4    1878275          1          1          1
         5           2    1224992          2          2          2
         5           5    1169926          3          3          3
         6           6    1788836          1          1          1
         6           9    1208959          2          2          2
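The output above contains no ties, so all three functions return the same numbers. The following sketch adds a tie to make the difference visible. It uses Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later assumed), invented data, and an extra customer_id tiebreaker on row_number so that the result is deterministic.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user_order (region_id integer, customer_id integer, customer_sales integer);
    insert into user_order values
        (5, 1, 300), (5, 2, 200), (5, 3, 200), (5, 4, 100);
""")

rank_rows = conn.execute("""
    select customer_id,
           sum(customer_sales) as total,
           rank() over (partition by region_id
                        order by sum(customer_sales) desc) as rnk,
           dense_rank() over (partition by region_id
                              order by sum(customer_sales) desc) as dense_rnk,
           row_number() over (partition by region_id
                              order by sum(customer_sales) desc, customer_id) as row_num
      from user_order
     group by region_id, customer_id
     order by total desc, customer_id
""").fetchall()

for row in rank_rows:
    print(row)
```

Customers 2 and 3 tie at 200: rank gives both 2 and then jumps to 4, dense_rank gives both 2 and continues with 3, and row_number numbers them 2 and 3.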

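V. Analytic functions: Top/Bottom N, First/Last, NTile

1. Ranking NULL values:

In some rankings, rows whose value is null come out first (Oracle sorts nulls as the largest value, so they lead a descending order). How can this be fixed? Use NULLS LAST or NULLS FIRST to tell Oracle to rank nulls last or first.

Example: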

select region_id, customer_id,
       sum(customer_sales) cust_total,
       sum(sum(customer_sales)) over (partition by region_id) reg_total,
       rank() over (partition by region_id
                    order by sum(customer_sales) desc nulls last) rank
  from user_order
 group by region_id, customer_id;

 REGION_ID CUSTOMER_ID CUST_TOTAL  REG_TOTAL       RANK
---------- ----------- ---------- ---------- ----------
        10          26    1808949    6238901          1
        10          27    1322747    6238901          2
        10          30    1216858    6238901          3
        10          28     986964    6238901          4
        10          29     903383    6238901          5
        10          31               6238901          6

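2. Top/Bottom N queries:

In day-to-day work we often meet queries such as: find the top 5 customers by orders, find the top 10 salespeople, and so on. Take the rank value computed as in 1 and filter on a range of it in an outer query, e.g. where rank <= 3;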
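Because an analytic function cannot be used directly in a where clause, the rank has to be computed in a subquery and filtered outside it. A minimal sketch with Python's sqlite3 module (SQLite 3.25 or later assumed as a stand-in for Oracle; the data and the alias rnk are invented):

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user_order (customer_id integer, customer_sales integer);
    insert into user_order values
        (1, 500), (2, 400), (3, 300), (4, 300), (5, 100);
""")

top_rows = conn.execute("""
    select customer_id, total, rnk
      from (select customer_id,
                   sum(customer_sales) as total,
                   rank() over (order by sum(customer_sales) desc) as rnk
              from user_order
             group by customer_id)
     where rnk <= 3
     order by rnk, customer_id
""").fetchall()

for row in top_rows:
    print(row)
```

With rank, a tie at the cutoff returns more than three rows (customers 3 and 4 both rank 3 here). If exactly N rows are required, row_number is the usual alternative, at the cost of breaking ties arbitrarily unless the order by makes them unique.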

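3. First/Last ranking queries:

Oracle solves this kind of problem with the first and last functions.

Example: find the customers with the largest and the smallest order totals.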

select min(customer_id)
         keep (dense_rank first order by sum(customer_sales) desc) first,
       min(customer_id)
         keep (dense_rank last order by sum(customer_sales) desc) last
  from user_order
 group by customer_id;

     FIRST       LAST
---------- ----------
        31          1

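Explanation:

min guarantees that a single record is returned when several rows tie for First or Last.

The result above shows that Oracle "keeps" only 2 values from the ranking. That is what keep does: it tells Oracle to retain only the records that satisfy the keep condition.

dense_rank tells Oracle which ranking strategy to use, and first/last gives the final filter condition.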
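The role of min as a tiebreaker is easier to see when two customers share the top total. The sketch below emulates the logic of keep (dense_rank first/last ...) in plain Python; it illustrates the semantics on made-up data and is not Oracle's implementation. The names orders, totals, first_customer, and last_customer are hypothetical.

```python
from collections import defaultdict

# Hypothetical (customer_id, customer_sales) rows; customers 2 and 3 tie at the top.
orders = [(1, 50), (1, 50), (2, 300), (3, 300), (4, 20), (4, 60)]

# Step 1: group by customer_id and sum(customer_sales).
totals = defaultdict(int)
for customer_id, sales in orders:
    totals[customer_id] += sales

# Step 2: "dense_rank first ... desc" keeps the rows with the highest total,
# "dense_rank last ... desc" keeps the rows with the lowest total.
highest = max(totals.values())
lowest = min(totals.values())

# Step 3: min(customer_id) picks one value out of the rows that were kept.
first_customer = min(c for c, t in totals.items() if t == highest)
last_customer = min(c for c, t in totals.items() if t == lowest)

print(first_customer, last_customer)
```

Customers 2 and 3 both total 300, so both are kept as "first", and min returns the smaller id, 2. Replacing min with max would return 3 instead.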

4. The NTILE analytic function for tiered queries

Example: find the customers whose order totals rank in the top 1/5.

select region_id,
       customer_id,
       ntile(5) over (order by sum(customer_sales) desc) tile
  from user_order
 group by region_id, customer_id;

 REGION_ID CUSTOMER_ID       TILE
---------- ----------- ----------
        10          31          1
         9          25          1
        10          26          1
         6           6          2
         8          18          2
         5           2          2
         9          23          3
         6           9          3
         7          11          3
         5           3          4
         6           8          4
         8          16          4
         6           7          5
        10          29          5
         5           1          5

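NTILE assigns each record to a bucket according to its position in the ranked record set. Here all the records are divided into 5 tiers, so if we only need the top 1/5 we simply take the records whose TILE value is 1. If we need the top 25% (that is, 1/4), we just use ntile(4) instead.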
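When the row count is not a multiple of the bucket count, the buckets cannot all be the same size. The sketch below shows this with 7 customers and ntile(3), using Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later assumed; the data is invented).

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user_order (customer_id integer, customer_sales integer);
    insert into user_order values
        (1, 700), (2, 600), (3, 500), (4, 400), (5, 300), (6, 200), (7, 100);
""")

ntile_rows = conn.execute("""
    select customer_id,
           ntile(3) over (order by sum(customer_sales) desc) as tile
      from user_order
     group by customer_id
     order by tile, customer_id
""").fetchall()

# Keep only the top third, i.e. the rows that landed in bucket 1.
top_third = [customer_id for customer_id, tile in ntile_rows if tile == 1]
print(ntile_rows)
print(top_third)
```

Seven rows split into three buckets of 3, 2, and 2 rows: bucket sizes differ by at most one, and the larger buckets come first.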
