oracle sql 语法之 over (partitioin by ...)

最新推荐文章于 2019-06-10 12:27:43 发布

cong8684

最新推荐文章于 2019-06-10 12:27:43 发布

阅读量146

点赞数

文章标签：数据库

select * from test

数据:
A B C
1 1 1
1 2 2
1 3 3
2 2 5
3 4 6

---将B栏位值相同的对应的C 栏位值加总
select a,b,c, SUM(C) OVER (PARTITION BY B) C_Sum
from test

A B C C_SUM
1 1 1 1
1 2 2 7
2 2 5 7
1 3 3 3
3 4 6 6

The GROUP BY solution results into one row per group, but the SUM over (partitioin by ...) solution returns the summary value for each detail row. Thus, you have both detail and summary data available at the same time, making it easy to compare each record's value with the sum for that group.

[@more@]

Analytic functions provide powerful ways to view your data.

It's 5:00 p.m. You're tired, hungry, and ready to go home for the day, when your manager calls you into his office to request a new report he needs "right now."

"Find the highest salary in each department, and give me a list of employees earning that amount. Can you do that before you leave? Just that one thing?"

You've heard the "just that one thing" line before. You phone home to say you'll be late for dinner, and then you get back to your keyboard to think about that query. "Find the highest salary in each department" is easy enough. A simple GROUP BY provides that information:

SELECT department_id, MAX(salary)
FROM employees
GROUP BY department_id;

But you also need to list the highest-paid employees in each department. Perhaps a subquery is what you need. Correlated or uncorrelated? It's late, and you really don't want to think through subqueries right now.

Fortunately, there is an easier way. Although you can solve this query problem by using a subquery, you can solve it much more easily by using the analytic function syntax that Oracle has supported since the release of Oracle8i Database. Listing 1 shows the analytic function solution and some of its output. (The examples in this article use the hr sample schema.) Not only is the solution in Listing 1 easier to think through than a subquery solution but verifying at a glance that it's correct is also easier, in our opinion.

Code Listing 1: Listing the highest salary and highest-paid employee

SELECT department_id dept, first_name || ' ' || last_name name, salary
FROM ( 
SELECT department_id, first_name, last_name,
       MAX(salary) OVER (PARTITION BY department_id) dept_max_sal, salary 
FROM employees e
) WHERE salary = dept_max_sal;

      DEPT       NAME                       SALARY
  ---------     ------------------------    --------
        10      Jennifer Whalen             4400
        20      Michael Hartstein           13000
        30      Den Raphaely                11000
...

The query in Listing 1 uses a two-step process to solve our reporting problem:

1. The inner query—in boldface text in Listing 1—generates a list of employees. Each row returned by this query contains a salary column with the employee's salary and also a dept_max_sal column with the maximum salary paid in the employee's department (more on the dept_max_sal column in a moment).

2. The outer query simply compares salary with dept_max_sal, returning only those rows where the two amounts match.

The key to this solution is the analytic function call that computes the maximum salary for each employee's department:

MAX(salary) OVER (
   PARTITION BY department_id
   )

MAX is an aggregate function, and aggregate functions serve double duty as analytic functions. When you see the OVER keyword, you know you are dealing with the analytic version of a function. In this case, the PARTITION BY department_id clause causes MAX to return the maximum salary for each department_id value. Think of department_id as defining a window of rows over which MAX is computed. The GROUP BY solution to the first part of your manager's request collapses the query results into one row per group, but the analytic solution returns the maximum salary value for each detail row. Thus, you have both detail and summary data available at the same time, making it easy to compare each employee's salary with the maximum for that person's department.

Analytic functions are powerful, and they make possible some elegant solutions to problems that previously were difficult to solve in SQL. Your manager's request highlights one problem scenario that can benefit from analytic syntax:

You need access to both detail and summary data at the same time.

Aside from being elegant, analytic function solutions tend to be less brittle and more changeable than their nonanalytic counterparts. Suppose you've taken the report from Listing 1 and handed it to your manager, only to be told, "You know, what I really need is for you to find the maximum salary for each job and then report on the employees in each position that are earning that maximum. Second, please include the maximum department salary for each employee you list."

The first part of this new request is quite easy to carry out. Simply begin with Listing 1 and change all occurrences of department_id to job_id. Also change the column aliases; for example, change dept_max_sal to job_max_sal. The revised query and solution to the first part of your manager's latest request are in Listing 2.

Code Listing 2: Listing the highest-paid employees in each job

SELECT job_id job, first_name || ' ' || last_name name, salary
FROM ( 
SELECT job_id, first_name, last_name,
       MAX(salary) OVER (PARTITION BY job_id) job_max_sal, salary 
       FROM employees e
) WHERE salary = job_max_sal;

The second part of this new request is also straightforward. All you need to do is add the following column to the inner query and include it in the outer query's select list:

 
 MAX(salary) OVER (
    PARTITION BY department_id
    ) dept_max_sal

Listing 3 shows the final query for this new request from your manager, along with some sample output. For good measure, Listing 3 also selects the department_id column for each employee. The query in Listing 3 highlights another problem scenario to which analytic syntax can be applied:

You need to summarize data over two or more dimensions at the same time.

Code Listing 3 Same as Listing 2, but with departmental maximun salary

The amount of effort required to go from Listing 2 to Listing 3 is minimal.

Happy that it was so easy to get through "that one thing," you present the report to your manager, who looks it over and then (of course) asks, "Would it be possible to report on the employees with the top five salaries in each department? I really do need to look by department, not by job. And I'd like the salaries to be ranked."

Walking away and muttering under your breath, you stop and remember that there is a RANK function that assigns ranks to rows within a window. You write the query in Listing 4 and get ready to give the results to your manager. The part of Listing 4 that generates the ranking is the following function call:

RANK() OVER (
   PARTITION BY department_id 
   ORDER BY salary DESC
) dept_rank

Code Listing 4: Listing the employees using RANK

Here is, conceptually, how this RANK function call is evaluated:

1. Rows are grouped by department (PARTITION BY department_id).
2. Within each group, rows are sorted in descending order by salary (ORDER BY salary DESC).
3. Each row in each group is given a number to indicate its rank in the group.

There's no argument to RANK, because the value returned is a function of how the rows are ordered in preparation for ranking. RANK returns 1 for the topmost row, 2 for the next row, and so forth. The ORDER BY clause is mandatory for use of any of the analytic ranking functions, because you use it to specify the basis for ranking. Keep in mind that the ORDER BY for ranking the employees by salary is not the same as the final ORDER BY that ranks the entire result set. In Listing 4, the final result set is in ascending order by department (the dept column) and then by descending order of salary rank. The PARTITION BY clause in Listing 4 causes separate ranking of each department's employees.

Listing 4 highlights yet another problem scenario for which you should consider an analytic solution:

You need to assign a numerical rank to rows as a result of sorting them by one or more columns.

You put on your coat and present the report to your manager, who looks it over and seems quite pleased until ... "Hey, what's with the ranking for department 60?" Examining the report more closely, you notice that employees David Austin and Valli Pataballa have the same salary and are tied for the third-highest in the department. The next employee, Diana Lorentz, is ranked fifth, due to the tie of the previous two employees' salaries. That is the way RANK works. Whenever n rows are tied, n-1 ranks are skipped, so that the next row's ranking will be what it would have been without the tie. Your manager then says, "So, I don't want to see any gaps in the rankings, OK?"

Standing there, you recall the DENSE_RANK function, for just such a request.

You go back to your desk and, making a small adjustment to the query, come up with Listing 5. You race back to your manager's office and present the report. Your manager looks it over and is pleased, "This is perfect; thank you very much." But as you are leaving, he asks, "For each employee in this report, do the rankings correspond to when the employee was hired? I assume that if one employee earns more than another, that employee has most likely been here longer. The report confirms that, right?"

Code Listing 5: Using DENSE_RANK instead of RANK

Stomach growling, you go back to your cube one more time.

To answer your manager's query, you will need to be able to compare the salary of each employee with that of the employee hired immediately afterward. You could write a self-join for this purpose, but you're too tired and hungry to want to think through doing that. Fortunately, you recall that the LAG and LEAD functions let you look at the preceding and following rows. After giving the matter a few moments' thought, you come up with the query in Listing 6, which brings us to our final problem scenario for which analytic functions provide a solution:

You need to compare values that lie in separate rows.

Code Listing 6: Using LEAD

Key to your solution is the following function in the outermost SELECT list in Listing 6:

LEAD(salary) OVER(
   PARTITION BY department_id 
   ORDER BY hire_date)

For the LAG and LEAD functions to work, you must impose order on the rows you are working with. Otherwise there's no sense of "preceding" or "following" among the results. Thus, as with RANK and DENSE_RANK, the ORDER BY clause is mandatory for the LAG and LEAD functions. What's great about analytic syntax is that the order you choose for one function is completely independent of the order you choose for another. Thus, your call to LEAD looks at the next row sorted by hire_date, whereas your call to DENSE_RANK still ranks rows by salary.

The CASE expression in Listing 6 compares each employee's salary with that of the next employee hired (ORDER BY hire_date) within the same department. Anytime the newer employee has a higher salary, the expression returns 'No'. Your manager can tell at a glance whenever salary progression in a department does not reflect strict seniority.

Anticipating your manager's likely next request, you decide to enhance your query even further to return the answer to the following question: "How does each employee's salary compare to the average salary of all employees hired within a one-year moving window centered on that one employee?

To answer this question, you need a moving average. In effect, you need a one-year-wide partition that is redefined for each employee row the query returns. This is another problem scenario to which you can apply analytic syntax:

You need the results of an aggregate function computed over a moving window.

To generate the one-year moving average you seek, you add the following function call to your query:

ROUND(AVG(salary) OVER (
   ORDER BY hire_date
   RANGE BETWEEN 
      hire_date - ADD_MONTHS(
         hire_date,-6) PRECEDING
   AND 
      ADD_MONTHS(hire_date,6) 
         - hire_date FOLLOWING)

3. The two expressions in the RANGE BETWEEN clause, one for PRECEDING and one for FOLLOWING, will both yield values of approximately 180. The numbers will vary somewhat, depending on the current hire_date, because the number of days in a month varies.
When the query is executed, the database engine subtracts the PRECEDING value from hire_date to determine the beginning of the moving window. Likewise, the database adds the FOLLOWING value to hire_date to determine the end of the moving window. This subtraction and addition is done for each row returned by the query.

4. There is no PARTITION BY clause, because you want the moving average to consider all salaries within the window. You could add PARTITION BY department_id if you wanted each moving average to consider only employees in the current department.

5. The ROUND function rounds the result to an integer.

Listing 7 shows the final query, with some sample results. Confident that this is the last time, you present the report to your manager. He is pleased with the report and your initiative. Happily, it's only 5:30 p.m. You've done all this work in a mere 30 minutes. You'll be home for dinner after all.

Code Listing 7: Adding a 12-month, moving-average


   ) moving_average

The key aspects of this analytic function call are as follows:

1. The RANGE BETWEEN clause defines a moving window that is computed anew for each row the query returns.

2. The ORDER BY hire_date clause specifies that hire_date is to be the basis for the moving window.

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/119395/viewspace-912863/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/119395/viewspace-912863/

cong8684

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
oracle sql 语法之 over (partitioin by ...)

select * from test数据:A B C 1 1 1 1 2 2 1 3 3 2 2 5 3 4 6 ---将B栏位值相同的对应的C 栏位值加总select a,b,c...
复制链接

扫一扫