SQLServer Ranking

SQL Server 2005 introduced several new functions that belong to a group termed "Ranking Functions". In Books Online (BOL), Ranking Functions are described as “Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are nondeterministic.” The four ranking functions are ROW_NUMBER(), NTILE(), RANK() and DENSE_RANK(). Note that you might find references to these as “Windowing Functions” in some places on the internet.

BOL describes the ROW_NUMBER() function as “Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.” Its syntax is described as:

 ROW_NUMBER ( )     OVER ( [ <partition_by_clause> ] <order_by_clause> )

Now, it’s time to show examples of how this works. First, let’s set up a business case. Assume that you run a nationwide scholarship program. Candidates can apply each year for up to 10 schools. You want to produce a report for the top 5 schools that each applicant has applied to, where they meet the eligibility requirements to attend, for the current school year. Ignoring all of the related tables that would be involved, the table at the root of this report would be defined as:

declare @Candidates table (CandidateID int NOT NULL,                            [Year] smallint NOT NULL,                            ChoiceNbr tinyint NOT NULL,                            SchoolID smallint NOT NULL,                           MeetsEligibility bit NOT NULL);

Now, let’s add some data to this table:

insert into @Candidates (CandidateID, [Year], ChoiceNbr, SchoolID, MeetsEligibility)select 1,2010,1,1,1  UNION ALL select 1,2010,2,2,1  UNION ALLselect 1,2010,3,3,1  UNION ALL select 1,2010,4,4,1  UNION ALLselect 1,2010,5,5,0  UNION ALL select 1,2010,6,6,0  UNION ALLselect 1,2010,7,7,1  UNION ALL select 1,2010,8,8,1  UNION ALLselect 1,2010,9,9,0  UNION ALL select 1,2010,10,10,1 UNION ALLselect 2,2010,1,1,0  UNION ALL select 2,2010,2,2,1  UNION ALLselect 2,2010,3,3,1  UNION ALL select 2,2010,4,4,0  UNION ALLselect 2,2010,5,5,1  UNION ALL select 2,2010,6,6,1  UNION ALLselect 2,2010,7,7,1  UNION ALL select 2,2010,8,8,0  UNION ALLselect 2,2010,9,9,1  UNION ALL select 2,2010,10,10,1;

If you run the SQL statement select * from @Candidates, you will see that there are two candidates applying to 10 schools each for the year 2010. They each meet the eligibility requirements to 7 schools, though not for the same choice number. To get just the schools where they meet the eligibility requirements, we obviously need to retrieve only those records where MeetsEligibility=1. The SQL Statement:

select * from @Candidates where MeetsEligibility = convert(bit,1);

Returns:

CandidateIDYearChoiceNbrSchoolIDMeetsEligibility
12010111
12010221
12010331
12010441
12010771
12010881
1201010101
22010221
22010331
22010551
22010661
22010771
22010991
2201010101

This removes the schools where they don’t meet the eligibility requirements, but we are still retrieving more than the desired 5 records for each candidate. If we use a top 5, then we only get the first 5 rows, not the first 5 for each candidate. Now, we could go through a looping mechanism, but that would be inefficient. Taking a look at the ROW_NUMBER() function, we see that it applies a sequential number to each row. The ROW_NUMBER() function assigns the row number in the order specified by the ORDER BY clause. Since we want the report for the top 5 schools for each candidate, we’ll use the ChoiceNbr column. So, let’s try it out with this SQL statement:

select *, RowNbr = ROW_NUMBER() OVER (ORDER BY ChoiceNbr)   from @Candidates  where MeetsEligibility = convert(bit,1);

(Note that the ORDER BY clause on the ROW_NUMBER() function is NOT the same as the ORDER BY clause on the SELECT statement. On the ROW_NUMBER() function, it controls the order of the assignment of the row number. Without an ORDER BY clause on the SELECT statement, the overall results of the query are not guaranteed to be in any particular order.)

This query returns:

CandidateIDYearChoiceNbrSchoolIDMeetsEligibilityRowNbr
120101111
120102212
220102213
220103314
120103315
120104416
220105517
220106618
220107719
1201077110
1201088111
2201099112
220101010113
120101010114

Here we see that the rows have been numbered in the RowNbr column, and that the results are ordered by the ChoiceNbr. However, the candidates are intermixed, and there still isn’t an easy way to get the top 5 schools for each candidate. Taking a look again at the ROW_NUMBER() function, we see that there is an optional PARTITION BY clause which will divide the result set into groups where the ROW_NUMBER() assignment is restarted at the start of each partition. Since we want to see the top 5 schools per CandidateID, let’s partition by the CandidateID with this SQL statement:

select *, RowNbr = ROW_NUMBER() OVER (PARTITION BY CandidateID ORDER BY ChoiceNbr)   from @Candidates  where MeetsEligibility = convert(bit,1);

This query returns:

CandidateIDYearChoiceNbrSchoolIDMeetsEligibilityRowNbr
120101111
120102212
120103313
120104414
120107715
120108816
12010101017
220102211
220103312
220105513
220106614
220107715
220109916
22010101017

We’re getting pretty close to what we want to see now: every time the CandidateID changes, the row numbering restarts. To get just the top 5 per candidate, just return the rows where RowNbr <= 5 with:

select CandidateID, [Year], ChoiceNbr, SchoolID, RowNbr   from (select *, RowNbr = ROW_NUMBER() OVER (PARTITION BY CandidateID ORDER BY ChoiceNbr)           from @Candidates          where MeetsEligibility = convert(bit,1)) a  where a.RowNbr <= 5  order by CandidateID, RowNbr;

Which returns:

CandidateIDYearChoiceNbrSchoolIDRowNbr
12010111
12010222
12010333
12010444
12010775
22010221
22010332
22010553
22010664
22010775

This is now returning what we want.

“But wait, there’s more!” What happens if we add data from another year to the table? To do this, I’ll just use the previous year, and use the same SchoolID + 100 as those already existing in the table:

insert into @Candidatesselect CandidateID, [Year]-1, ChoiceNbr, SchoolID + 100, MeetsEligibility  from @Candidates;

When we run the above select SQL statement, we get:

CandidateIDYearChoiceNbrSchoolIDRowNbr
12010111
1200911012
1200921023
12010224
12010335
2200921021
22010222
22010333
2200931034
2200951055

Here we see that we are getting data intermixed between the years. Luckily, both the ORDER BY and PARTITION BY clauses can accept multiple columns. Let’s add the Year to the PARTITION BY clause, and add to the WHERE clause the code necessary to get data for just the current year:

select CandidateID, [Year], ChoiceNbr, SchoolID, RowNbr   from (select *, RowNbr = ROW_NUMBER() OVER (PARTITION BY CandidateID, [Year] ORDER BY ChoiceNbr)           from @Candidates          where MeetsEligibility = convert(bit,1)            and [Year] = 2010) a  where a.RowNbr <= 5 order by CandidateID, RowNbr;

This query returns:

CandidateIDYearChoiceNbrSchoolIDRowNbr
12010111
12010222
12010333
12010444
12010775
22010221
22010332
22010553
22010664
22010775

Now we are returning the desired result.

One of the other operations you can perform is an update statement. If you wanted to update the Year for the 4th of the candidate’s choices to 2020 for Candidate 1, you could run this SQL statement:

UPDATE dtMyUpdate    SET [Year] = 2020   FROM (select *, RowNbr = ROW_NUMBER() OVER (PARTITION BY CandidateID, [Year] ORDER BY ChoiceNbr)           from @Candidates          where MeetsEligibility = convert(bit,1)            and [Year] = 2010) dtMyUpdate  WHERE CandidateID = 1   AND RowNbr = 4

Here SQL builds a derived table that gets the RowNbr using the ROW_NUMBER() function, and then updates it for the specified criteria.

Another nice thing that the ROW_NUMBER() function can be used for is in identifying contiguous data ranges. Instead of covering that in this article, I’m going to refer you to this blog by Dave Ballantyne in which this is covered: http://sqlblogcasts.com/blogs/sqlandthelike/archive/2009/08/27/sql-and-contiguous-data-ranges.aspx

Now, let’s take a look at the remaining ranking functions. BOL describes the RANK() function as “Returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question.” DENSE_RANK() is described as “Returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.” NTILE() is described as “Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.” For RANK() and DENSE_RANK(), their syntax is identical to that of ROW_NUMBER(). The syntax for NTILE() is:

NTILE (integer_expression) OVER([<partition_by_clause>] < order_by_clause > )

The NTILE() function will group the results into the number of groups as specified by the integer_expression. If the groups cannot be evenly distributed, the lower numbered groups will contain one extra record.

Now, let’s look at the difference between RANK() and DENSE_RANK(). For both of these, if there is a tie between records, then all records with the tie will receive the same rank number. The difference is in what happens with the NEXT record. For DENSE_RANK(), the next rank number is used. For RANK(), the value assigned is as if all preceding records had no ties. If there is no partitioning involved, this would be the same as the ROW_NUMBER().

To show how all of these work, let’s look at a small-town graduating class of 12 students, and how all of the ranking functions will work with looking at their grade. Some of the students graduated with the same grade, so this will show the difference between RANK() an dDENSE_RANK(). We’ll also use NTILE() to break this class down into 5 groups.

declare @ClassRank table (StudentID tinyint, Grade tinyint)insert into @ClassRank select 1, 100 UNION ALL select 2, 100 UNION ALL select 3, 99  UNION ALL select 4, 98  UNION ALL select 5, 95  UNION ALL select 6, 95  UNION ALL select 7, 90  UNION ALL select 8, 89  UNION ALL select 9, 89  UNION ALL select 10, 85 UNION ALL select 11, 85 UNION ALL select 12, 82 select StudentID,        Grade,        RowNbr =ROW_NUMBER() OVER(ORDER BY Grade DESC),        [Rank] =RANK() OVER(ORDER BY Grade DESC),        DenseRank =DENSE_RANK() OVER(ORDER BY Grade DESC),        [NTile] =NTILE(5) OVER (ORDER BY Grade DESC)  from @ClassRank

This query will return:

StudentIDGradeRowNbrRankDenseRankNTile
11001111
21002111
3993321
4984432
5955542
6956542
7907753
8898863
9899864
1085101074
1185111075
1282121285

Let’s examine the results for the first three students. The first two have the same grade, so they are both ranked #1 by both the RANK() and DENSE_RANK() functions. The third student has a different grade, and is assigned to a different rank. With the RANK() function, since this is the third student the assigned rank is #3. With the DENSE_RANK() function, the assigned rank is the next rank – in this case, #2.

In the NTILE() function, we specified to return 5 groups. Since there are 12 students, the results break down to all groups having 2 students, with the beginning groups being assigned an additional student until the total number of students are accounted for.

Another analogy to help you understand the RANK() function would be in sports. Multiple teams can be tied for the same position. If two teams are tied for the #2 spot, the 4th team is not ranked #3 – they would be ranked #4.

Hopefully, you will be able to use the ranking functions to eliminate some of the cursors (or other inefficient looping mechanisms) in your code, and to replace it with a more efficient set-based method.

References:

BOL Ranking Functionshttp://msdn.microsoft.com/en-us/library/ms189798.aspx
BOL ROW_NUMBER()http://msdn.microsoft.com/en-us/library/ms186734.aspx
BOL OVER Clausehttp://msdn.microsoft.com/en-us/library/ms189461.aspx
BOL ORDER BY Clausehttp://msdn.microsoft.com/en-us/library/ms188385.aspx
BOL RANK()http://msdn.microsoft.com/en-us/library/ms176102.aspx
BOL DENSE_RANK()http://msdn.microsoft.com/en-us/library/ms173825.aspx
BOL NTILE()http://msdn.microsoft.com/en-us/library/ms175126.aspx

转载于:https://www.cnblogs.com/heitou/archive/2010/04/22/1718184.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值