在R中, 我们要计算一组数据的秩, 可以使用rank函数.-CSDN博客

rank(x, na.last = TRUE,
          ties.method = c("average", "first", "random", "max", "min"))

测试 :　

> x=array(rpois(35,lambda=10), dim=c(5,7))
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]   14    9   10   16   10    9    8
[2,]    3   10   12    7   12   14   15
[3,]   10    8    8   13   11    5   16
[4,]    8    9   17   14    7    4    9
[5,]    7    7   10   13    9    6    8

排好顺序更容易看出rank是如何计算的

> sort(x)
 [1]  3  4  5  6  7  7  7  7  8  8  8  8  8  9  9  9  9  9 10 10 10 10 10 11 12
[26] 12 13 13 14 14 14 15 16 16 17

> rank(sort(x))
 [1]  1.0  2.0  3.0  4.0  6.5  6.5  6.5  6.5 11.0 11.0 11.0 11.0 11.0 16.0 16.0
[16] 16.0 16.0 16.0 21.0 21.0 21.0 21.0 21.0 24.0 25.5 25.5 27.5 27.5 30.0 30.0
[31] 30.0 32.0 33.5 33.5 35.0

默认是平均值方法.

> rank(sort(x), ties.method = "average")
 [1]  1.0  2.0  3.0  4.0  6.5  6.5  6.5  6.5 11.0 11.0 11.0 11.0 11.0 16.0 16.0
[16] 16.0 16.0 16.0 21.0 21.0 21.0 21.0 21.0 24.0 25.5 25.5 27.5 27.5 30.0 30.0
[31] 30.0 32.0 33.5 33.5 35.0

总共有35个值, 所以rank输出1到35的值, 但是为什么 7 7 7 7对应的是6.5呢?

我们看4个7实际上位置对应的是5,6,7,8 , 取平均值就是6.5.

那么接下来的5个8为什么得到的rank是11呢, 因为5个8的位置是9,10,11,12,13, 平均值是11.

这就是ties.method=average的算法.

接下来我们看看其他的.

min对应的是位置的最小值, 例如7777的位置是5,6,7,8, 取最小值5

88888的位置是 9,10,11,12,13 , 取最小值9

> rank(sort(x), ties.method = "min")
 [1]  1  2  3  4  5  5  5  5  9  9  9  9  9 14 14 14 14 14 19 19 19 19 19 24 25
[26] 25 27 27 29 29 29 32 33 33 35

max对应的是位置的最大值, 例如7777的位置是5,6,7,8, 取最大值8

88888的位置是 9,10,11,12,13 , 取最大值13

> rank(sort(x), ties.method = "max")
 [1]  1  2  3  4  8  8  8  8 13 13 13 13 13 18 18 18 18 18 23 23 23 23 23 24 26
[26] 26 28 28 31 31 31 32 34 34 35

first取的是就位置值, 所以输出的其实是连续的值.

> rank(sort(x), ties.method = "first")
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35

random取的是组内的随机值. 例如 7 8 5 6, 6 7 8 5

> rank(sort(x), ties.method = "random")
 [1]  1  2  3  4  7  8  5  6 12 11  9 10 13 18 16 15 14 17 22 23 21 19 20 24 25
[26] 26 28 27 29 31 30 32 34 33 35
> rank(sort(x), ties.method = "random")
 [1]  1  2  3  4  6  7  8  5 12 10 13  9 11 14 15 17 16 18 21 19 23 22 20 24 25
[26] 26 27 28 31 29 30 32 34 33 35

[参考]

help(rank)

rank                   package:base                    R Documentation

Sample Ranks

Description:

     Returns the sample ranks of the values in a vector.  Ties (i.e.,
     equal values) and missing values can be handled in several ways.

Usage:

     rank(x, na.last = TRUE,
          ties.method = c("average", "first", "random", "max", "min"))
     
Arguments:

       x: a numeric, complex, character or logical vector.

 na.last: for controlling the treatment of ‘NA’s.  If ‘TRUE’, missing
          values in the data are put last; if ‘FALSE’, they are put
          first; if ‘NA’, they are removed; if ‘"keep"’ they are kept
          with rank ‘NA’.

ties.method: a character string specifying how ties are treated, see
          ‘Details’; can be abbreviated.

Details:

     If all components are different (and no ‘NA’s), the ranks are well
     defined, with values in ‘seq_len(x)’.  With some values equal
     (called ‘ties’), the argument ‘ties.method’ determines the result
     at the corresponding indices.  The ‘"first"’ method results in a
     permutation with increasing values at each index set of ties.  The
     ‘"random"’ method puts these in random order whereas the default,
     ‘"average"’, replaces them by their mean, and ‘"max"’ and ‘"min"’
     replaces them by their maximum and minimum respectively, the
     latter being the typical sports ranking.

     ‘NA’ values are never considered to be equal: for ‘na.last = TRUE’
     and ‘na.last = FALSE’ they are given distinct ranks in the order
     in which they occur in ‘x’.

     *NB*: ‘rank’ is not itself generic but ‘xtfrm’ is, and
     ‘rank(xtfrm(x), ....)’ will have the desired result if there is a
     ‘xtfrm’ method.  Otherwise, ‘rank’ will make use of ‘==’, ‘>’,
     ‘is.na’ and extraction methods for classed objects, possibly
     rather slowly.

Value:

     A numeric vector of the same length as ‘x’ with names copied from
     ‘x’ (unless ‘na.last = NA’, when missing values are removed).  The
     vector is of integer type unless ‘x’ is a long vector or
     ‘ties.method = "average"’ when it is of double type (whether or
     not there are any ties).

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

See Also:

     ‘order’ and ‘sort’.

Examples:

     (r1 <- rank(x1 <- c(3, 1, 4, 15, 92)))
     x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5)
     names(x2) <- letters[1:11]
     (r2 <- rank(x2)) # ties are averaged
     
     ## rank() is "idempotent": rank(rank(x)) == rank(x) :
     stopifnot(rank(r1) == r1, rank(r2) == r2)
     
     ## ranks without averaging
     rank(x2, ties.method= "first")  # first occurrence wins
     rank(x2, ties.method= "random") # ties broken at random
     rank(x2, ties.method= "random") # and again
     
     ## keep ties ties, no average
     (rma <- rank(x2, ties.method= "max"))  # as used classically
     (rmi <- rank(x2, ties.method= "min"))  # as in Sports
     stopifnot(rma + rmi == round(r2 + r2))