In R, the rank function computes the ranks of a set of values.


rank(x, na.last = TRUE,
          ties.method = c("average", "first", "random", "max", "min"))
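
The na.last argument in the signature above controls where missing values go. A minimal sketch of its four settings, using a small made-up vector (the name v and its values are only for illustration):

```r
# Hypothetical vector with one missing value
v <- c(3, NA, 1)

r_last  <- rank(v, na.last = TRUE)    # NA is ranked last
r_first <- rank(v, na.last = FALSE)   # NA is ranked first
r_drop  <- rank(v, na.last = NA)      # NA is removed from the result
r_keep  <- rank(v, na.last = "keep")  # NA keeps rank NA

r_last   # 2 3 1
r_first  # 3 1 2
r_drop   # 2 1
r_keep   # 2 NA 1
```

Note that only na.last = NA changes the length of the result.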

Test:
> x=array(rpois(35,lambda=10), dim=c(5,7))
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]   14    9   10   16   10    9    8
[2,]    3   10   12    7   12   14   15
[3,]   10    8    8   13   11    5   16
[4,]    8    9   17   14    7    4    9
[5,]    7    7   10   13    9    6    8

Sorting the values first makes it easier to see how rank is computed.
> sort(x)
 [1]  3  4  5  6  7  7  7  7  8  8  8  8  8  9  9  9  9  9 10 10 10 10 10 11 12
[26] 12 13 13 14 14 14 15 16 16 17

> rank(sort(x))
 [1]  1.0  2.0  3.0  4.0  6.5  6.5  6.5  6.5 11.0 11.0 11.0 11.0 11.0 16.0 16.0
[16] 16.0 16.0 16.0 21.0 21.0 21.0 21.0 21.0 24.0 25.5 25.5 27.5 27.5 30.0 30.0
[31] 30.0 32.0 33.5 33.5 35.0

The default is the average method.
> rank(sort(x), ties.method = "average")
 [1]  1.0  2.0  3.0  4.0  6.5  6.5  6.5  6.5 11.0 11.0 11.0 11.0 11.0 16.0 16.0
[16] 16.0 16.0 16.0 21.0 21.0 21.0 21.0 21.0 24.0 25.5 25.5 27.5 27.5 30.0 30.0
[31] 30.0 32.0 33.5 33.5 35.0

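There are 35 values in total, so rank returns values between 1 and 35. But why do the four 7s get 6.5?
The four 7s occupy positions 5, 6, 7, and 8 in the sorted order, and the mean of those positions is 6.5.
Likewise, the five 8s get rank 11 because they occupy positions 9, 10, 11, 12, and 13, whose mean is 11.
This is how ties.method = "average" works.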
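
The averaging rule can be checked by hand on a smaller case. The sketch below assumes a made-up, already sorted vector v (not the data above) and uses ave() to take the mean position within each group of equal values:

```r
# Hypothetical sorted vector: one 3, four 7s, two 8s
v <- c(3, 7, 7, 7, 7, 8, 8)

# Positions 1..7; because v is sorted, these are the sorted positions
pos <- as.numeric(seq_along(v))

# Mean position within each group of equal values
avg_by_hand <- ave(pos, v)

avg_by_hand  # 1.0 3.5 3.5 3.5 3.5 6.5 6.5
rank(v)      # same values

stopifnot(isTRUE(all.equal(avg_by_hand, rank(v))))
```

The by-hand version only works as written because v is sorted; for unsorted input the positions would have to be taken from the sorted order first.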

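Now for the other methods.
"min" assigns every tied value the smallest position in its group: the four 7s occupy positions 5, 6, 7, 8, so each gets 5.
The five 8s occupy positions 9, 10, 11, 12, 13, so each gets 9.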

> rank(sort(x), ties.method = "min")
 [1]  1  2  3  4  5  5  5  5  9  9  9  9  9 14 14 14 14 14 19 19 19 19 19 24 25
[26] 25 27 27 29 29 29 32 33 33 35

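"max" assigns every tied value the largest position in its group: the four 7s occupy positions 5, 6, 7, 8, so each gets 8.
The five 8s occupy positions 9, 10, 11, 12, 13, so each gets 13.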

> rank(sort(x), ties.method = "max")
 [1]  1  2  3  4  8  8  8  8 13 13 13 13 13 18 18 18 18 18 23 23 23 23 23 24 26
[26] 26 28 28 31 31 31 32 34 34 35
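
"min" and "max" also work on unsorted input, and together they bracket the average rank. A small sketch with a made-up score vector (the name scores and its values are illustrative only):

```r
# Hypothetical unsorted scores with ties
scores <- c(7, 3, 8, 7, 7, 8, 7)

r_min <- rank(scores, ties.method = "min")  # 2 1 6 2 2 6 2
r_max <- rank(scores, ties.method = "max")  # 5 1 7 5 5 7 5
r_avg <- rank(scores)                       # 3.5 1.0 6.5 3.5 3.5 6.5 3.5

# For every element, the average rank is the midpoint of min and max
stopifnot(all(r_min + r_max == 2 * r_avg))
```

This is the same identity the Examples section of help(rank) checks with rma + rmi.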

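"first" gives tied values consecutive ranks in their order of appearance, so for sorted input the output is simply the integers 1 to 35.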
> rank(sort(x), ties.method = "first")
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35
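
On unsorted input, "first" breaks ties by order of appearance. One way to see this, sketched on a made-up vector, is that the result matches order(order(v)), which relies on order() leaving ties in their original order:

```r
# Hypothetical unsorted vector with ties
v <- c(7, 3, 8, 7, 7, 8, 7)

r_first <- rank(v, ties.method = "first")
r_first  # 2 1 6 3 4 7 5

# Applying order() twice gives the same first-occurrence ranks
stopifnot(all(r_first == order(order(v))))
```

The four 7s receive ranks 2, 3, 4, 5 from left to right, and the two 8s receive 6 and 7.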

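"random" assigns the positions within each tie group in random order. For example, the four 7s get 7 8 5 6 in the first run below and 6 7 8 5 in the second.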
> rank(sort(x), ties.method = "random")
 [1]  1  2  3  4  7  8  5  6 12 11  9 10 13 18 16 15 14 17 22 23 21 19 20 24 25
[26] 26 28 27 29 31 30 32 34 33 35
> rank(sort(x), ties.method = "random")
 [1]  1  2  3  4  6  7  8  5 12 10 13  9 11 14 15 17 16 18 21 19 23 22 20 24 25
[26] 26 27 28 31 29 30 32 34 33 35
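
Because the tie-breaking is random, the output changes from run to run unless the random seed is fixed. A minimal sketch on a made-up vector; the seed value 42 is arbitrary and only there to make the run repeatable:

```r
set.seed(42)  # arbitrary seed, chosen only for repeatability

# Hypothetical sorted vector: one 3, four 7s, two 8s
v <- c(3, 7, 7, 7, 7, 8, 8)

r_rand <- rank(v, ties.method = "random")

# Whatever the shuffle, each tie group still uses exactly its own positions
stopifnot(r_rand[1] == 1,
          all(sort(r_rand[v == 7]) == 2:5),
          all(sort(r_rand[v == 8]) == 6:7))
```

The checks hold for any seed: only the order inside each tie group is random, never which positions the group occupies.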



[References]
1. help(rank)

rank                   package:base                    R Documentation

Sample Ranks

Description:

     Returns the sample ranks of the values in a vector.  Ties (i.e.,
     equal values) and missing values can be handled in several ways.

Usage:

     rank(x, na.last = TRUE,
          ties.method = c("average", "first", "random", "max", "min"))
     
Arguments:

       x: a numeric, complex, character or logical vector.

 na.last: for controlling the treatment of ‘NA’s.  If ‘TRUE’, missing
          values in the data are put last; if ‘FALSE’, they are put
          first; if ‘NA’, they are removed; if ‘"keep"’ they are kept
          with rank ‘NA’.

ties.method: a character string specifying how ties are treated, see
          ‘Details’; can be abbreviated.

Details:

     If all components are different (and no ‘NA’s), the ranks are well
     defined, with values in ‘seq_len(x)’.  With some values equal
     (called ‘ties’), the argument ‘ties.method’ determines the result
     at the corresponding indices.  The ‘"first"’ method results in a
     permutation with increasing values at each index set of ties.  The
     ‘"random"’ method puts these in random order whereas the default,
     ‘"average"’, replaces them by their mean, and ‘"max"’ and ‘"min"’
     replaces them by their maximum and minimum respectively, the
     latter being the typical sports ranking.

     ‘NA’ values are never considered to be equal: for ‘na.last = TRUE’
     and ‘na.last = FALSE’ they are given distinct ranks in the order
     in which they occur in ‘x’.

     *NB*: ‘rank’ is not itself generic but ‘xtfrm’ is, and
     ‘rank(xtfrm(x), ....)’ will have the desired result if there is a
     ‘xtfrm’ method.  Otherwise, ‘rank’ will make use of ‘==’, ‘>’,
     ‘is.na’ and extraction methods for classed objects, possibly
     rather slowly.

Value:

     A numeric vector of the same length as ‘x’ with names copied from
     ‘x’ (unless ‘na.last = NA’, when missing values are removed).  The
     vector is of integer type unless ‘x’ is a long vector or
     ‘ties.method = "average"’ when it is of double type (whether or
     not there are any ties).

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

See Also:

     ‘order’ and ‘sort’.

Examples:

     (r1 <- rank(x1 <- c(3, 1, 4, 15, 92)))
     x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5)
     names(x2) <- letters[1:11]
     (r2 <- rank(x2)) # ties are averaged
     
     ## rank() is "idempotent": rank(rank(x)) == rank(x) :
     stopifnot(rank(r1) == r1, rank(r2) == r2)
     
     ## ranks without averaging
     rank(x2, ties.method= "first")  # first occurrence wins
     rank(x2, ties.method= "random") # ties broken at random
     rank(x2, ties.method= "random") # and again
     
     ## keep ties ties, no average
     (rma <- rank(x2, ties.method= "max"))  # as used classically
     (rmi <- rank(x2, ties.method= "min"))  # as in Sports
     stopifnot(rma + rmi == round(r2 + r2))
