Ordering Permutation
Description
order returns a permutation which rearranges its first argument into ascending or descending order, breaking ties by further arguments. sort.list is the same, using only one argument.
See the examples for how to use these functions to sort data frames, etc.
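In other words, order returns the index positions that put its first argument (the data) into ascending or descending order, with later arguments used to break ties. sort.list gives the same result but takes only a single vector.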
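As a minimal sketch of what "a permutation" means here (the vector x is made-up illustration data, not taken from the help page): indexing a vector by the result of order gives the sorted vector, and sort.list returns the same permutation for a single atomic vector.

```r
# Hypothetical data, chosen only for illustration.
x <- c(30, 10, 20, 10)

perm <- order(x)   # positions of the elements, smallest value first
perm               # 2 4 3 1
x[perm]            # 10 10 20 30

# For a single atomic vector, sort.list() yields the same permutation.
same_perm <- identical(perm, sort.list(x))

# Indexing by the permutation reproduces sort().
same_sorted <- identical(x[perm], sort(x))
```

Keeping the permutation rather than the sorted values is what allows the same reordering to be applied to other vectors or to the rows of a data frame.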
Usage
order(..., na.last = TRUE, decreasing = FALSE,
      method = c("auto", "shell", "radix"))

sort.list(x, partial = NULL, na.last = TRUE, decreasing = FALSE,
          method = c("auto", "shell", "quick", "radix"))
Arguments
... | a sequence of numeric, complex, character or logical vectors, all of the same length, or a classed R object. |
x | an atomic vector. |
partial | vector of indices for partial sorting. (Non-NULL values are not implemented.) |
decreasing | logical. Should the sort order be increasing or decreasing? For the "radix" method, this can be a vector of length equal to the number of arguments in ...; for the other methods, it must be of length one. |
na.last | for controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed (see 'Note'). |
method | the method to be used: partial matches are allowed. The default ("auto") implies "radix" for short numeric vectors, integer vectors, logical vectors and factors; otherwise it implies "shell". See the help for sort for details. |
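The effect of na.last and of a per-key decreasing vector can be sketched as follows. The vectors a and b are hypothetical illustration data; the calls themselves use only the arguments listed above.

```r
# Hypothetical data: one numeric key with a missing value, one character key.
a <- c(2, NA, 1, 2)
b <- c("w", "y", "z", "x")

o_last  <- order(a)                    # NA placed last:  3 1 4 2
o_first <- order(a, na.last = FALSE)   # NA placed first: 2 3 1 4
o_drop  <- order(a, na.last = NA)      # NA removed:      3 1 4

# A 'decreasing' vector with one entry per key requires method = "radix":
# 'a' ascending, ties in 'a' broken by 'b' descending.
o_mixed <- order(a, b, decreasing = c(FALSE, TRUE), method = "radix")
o_mixed                                # 3 4 1 2
```

With the other methods, decreasing must be a single TRUE or FALSE that applies to every key.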
Details
In the case of ties in the first vector, values in the second are used to break the ties. If the values are still tied, values in the later arguments are used to break the tie (see the first example). The sort used is stable (except for method = "quick"), so any unresolved ties will be left in their original ordering.
Complex values are sorted first by the real part, then the imaginary part.
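A small sketch of the two rules above, using made-up vectors (the names x1, y1 and z1 are illustrative only): unresolved ties keep their original order, a second key breaks ties in the first, and complex values are ordered by real part and then imaginary part.

```r
# Hypothetical keys: x1 has three tied 1s, y1 resolves only some of them.
x1 <- c(1, 1, 2, 1)
y1 <- c(2, 1, 9, 2)

o_one <- order(x1)       # ties stay in original order: 1 2 4 3
o_two <- order(x1, y1)   # y1 breaks ties in x1:        2 1 4 3
                         # (positions 1 and 4 remain tied, so 1 precedes 4)

# Complex values: real part first, then imaginary part.
z1 <- c(2+1i, 1+5i, 2-1i, 1+0i)
o_cplx <- order(z1)      # 4 2 3 1
z1[o_cplx]               # 1+0i 1+5i 2-1i 2+1i
```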
Except for method "radix", the sort order for character vectors will depend on the collating sequence of the locale in use: see Comparison.
The "shell" method is generally the safest bet and is the default method, except for short factors, numeric vectors, integer vectors and logical vectors, where "radix" is assumed. Method "radix" stably sorts logical, numeric and character vectors in linear time. It outperforms the other methods, although there are caveats (see sort). Method "quick" for sort.list is only supported for numeric x with na.last = NA, is not stable, and is slower than "radix".
partial = NULL is supported for compatibility with other implementations of S, but no other values are accepted and ordering is always complete.
For a classed R object, the sort order is taken from xtfrm: as its help page notes, this can be slow unless a suitable method has been defined or is.numeric(x) is true. For factors, this sorts on the internal codes, which is particularly appropriate for ordered factors.
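To illustrate the point about factors, here is a short sketch with a hypothetical factor whose level order differs from alphabetical order. It assumes nothing beyond order, xtfrm and factor from base R.

```r
# Hypothetical factor: levels are deliberately not in alphabetical order.
f <- factor(c("low", "high", "mid", "low"),
            levels = c("low", "mid", "high"))

o_codes <- order(f)                 # by internal codes (level position): 1 4 3 2
o_alpha <- order(as.character(f))   # by the labels as strings:           2 1 4 3

# For a factor, xtfrm() returns the integer codes, so both give the same order.
same_as_xtfrm <- identical(o_codes, order(xtfrm(f)))
```

Converting to character first is therefore the way to get an alphabetical ordering of the labels when the level order is something else.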
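In plain terms:
If the first vector cannot be ordered unambiguously (say it contains two 1s), the second vector is used to order the tied elements (for example, c(2, 1) in the second vector settles the order of the two 1s); if ties remain, the later vectors are used in turn. All vectors must have the same length. Except with method = "quick", the sort is stable, so elements whose ties are never resolved keep their original relative order.
Complex values are ordered by real part first, then by imaginary part.
Except with method "radix", the ordering of character vectors depends on the collating sequence of the current locale (see Comparison).
In general "shell" is the safest choice and is the default, except for short numeric, integer and logical vectors and factors, where "radix" is used. "radix" sorts logical, numeric and character vectors stably in linear time and is faster than the other methods, although it has caveats (see sort). For sort.list, "quick" (presumably quicksort) supports only numeric x with na.last = NA, is not stable, and is slower than "radix".
partial = NULL is accepted for compatibility with other implementations of S; no other value is supported, and the ordering is always complete.
For a classed R object, the sort order comes from xtfrm: as its help page notes, this can be slow unless a suitable method has been defined or the object is numeric. Factors are ordered by their internal codes, which suits ordered factors particularly well.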
Value
An integer vector, unless any of the inputs has 2^31 or more elements, in which case it is a double vector.
Note
sort.list can get called by mistake as a method for sort with a list argument: it gives a suitable error message for list x.
There is a historical difference in behaviour for na.last = NA: sort.list removes the NAs and then computes the order amongst the remaining elements: order computes the order amongst the non-NA elements of the original vector. Thus
x[order(x, na.last = NA)]
zz <- x[!is.na(x)]; zz[sort.list(x, na.last = NA)]
both sort the non-NA values of x.
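Note that with the same na.last = NA, order and sort.list return different indices: sort.list gives positions within the vector that remains after the NAs are removed, whereas order gives positions within the original vector, skipping the NAs.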
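The difference can be made concrete with a small made-up vector. This is only a sketch: method = "shell" is pinned in both calls as an assumption, so that the result does not depend on which method "auto" selects in a given R version.

```r
# Hypothetical data with one missing value.
x <- c(3, NA, 1, 2)

# order(): positions refer to the original x.
o_orig <- order(x, na.last = NA, method = "shell")        # 3 4 1

# sort.list(): positions refer to x with the NA already removed.
o_trim <- sort.list(x, na.last = NA, method = "shell")    # 2 3 1

zz <- x[!is.na(x)]                                        # 3 1 2

# Each index vector sorts the non-NA values when applied to the right vector.
both_sort <- identical(x[o_orig], zz[o_trim])             # both give 1 2 3
```

Applying o_trim to the original x instead of zz would pick the wrong elements, which is the practical consequence of the difference.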
Prior to R 3.3.0 method = "radix" was only supported for integers of range less than 100,000.
Examples
require(stats)
(ii <- order(x <- c(1,1,3:1,1:4,3), y <- c(9,9:1), z <- c(2,1:9)))
## 6 5 2 1 7 4 10 8 3 9
rbind(x, y, z)[,ii] # shows the reordering (ties via 2nd & 3rd arg)
## Suppose we wanted descending order on y.
## A simple solution for numeric 'y' is
rbind(x, y, z)[, order(x, -y, z)]
## More generally we can make use of xtfrm
cy <- as.character(y)
rbind(x, y, z)[, order(x, -xtfrm(cy), z)]
## The radix sort supports multiple 'decreasing' values:
rbind(x, y, z)[, order(x, cy, z, decreasing = c(FALSE, TRUE, FALSE),
                       method = "radix")]
## Sorting data frames:
dd <- transform(data.frame(x, y, z),
                z = factor(z, labels = LETTERS[9:1]))
## Either as above {for factor 'z' : using internal coding}:
dd[ order(x, -y, z), ]
## or along 1st column, ties along 2nd, ... *arbitrary* no.{columns}:
dd[ do.call(order, dd), ]
set.seed(1) # reproducible example:
d4 <- data.frame(x = round(   rnorm(100)), y = round(10*runif(100)),
                 z = round( 8*rnorm(100)), u = round(50*runif(100)))
(d4s <- d4[ do.call(order, d4), ])
(i <- which(diff(d4s[, 3]) == 0))
# in 2 places, needed 3 cols to break ties:
d4s[ rbind(i, i+1), ]
## rearrange matched vectors so that the first is in ascending order
x <- c(5:1, 6:8, 12:9)
y <- (x - 5)^2
o <- order(x)
rbind(x[o], y[o])
## tests of na.last
a <- c(4, 3, 2, NA, 1)
b <- c(4, NA, 2, 7, 1)
z <- cbind(a, b)
(o <- order(a, b)); z[o, ]
(o <- order(a, b, na.last = FALSE)); z[o, ]
(o <- order(a, b, na.last = NA)); z[o, ]
## speed examples on an average laptop for long vectors:
## factor/small-valued integers:
x <- factor(sample(letters, 1e7, replace = TRUE))
system.time(o <- sort.list(x, method = "quick", na.last = NA)) # 0.1 sec
stopifnot(!is.unsorted(x[o]))
system.time(o <- sort.list(x, method = "radix")) # 0.05 sec, 2X faster
stopifnot(!is.unsorted(x[o]))
## large-valued integers:
xx <- sample(1:200000, 1e7, replace = TRUE)
system.time(o <- sort.list(xx, method = "quick", na.last = NA)) # 0.3 sec
system.time(o <- sort.list(xx, method = "radix")) # 0.2 sec
## character vectors:
xx <- sample(state.name, 1e6, replace = TRUE)
system.time(o <- sort.list(xx, method = "shell")) # 2 sec
system.time(o <- sort.list(xx, method = "radix")) # 0.007 sec, 300X faster
## double vectors:
xx <- rnorm(1e6)
system.time(o <- sort.list(xx, method = "shell")) # 0.4 sec
system.time(o <- sort.list(xx, method = "quick", na.last = NA)) # 0.1 sec
system.time(o <- sort.list(xx, method = "radix")) # 0.05 sec, 2X faster