Study notes on Advanced R by Hadley Wickham, the well-known R developer (part 1)
Original book: https://adv-r.hadley.nz/index.html
Data structure
Vectors
Vectors come in two flavours: atomic vectors and lists.
All elements of an atomic vector must be the same type, whereas the elements of a list can have different types.
Three common properties:
- Type, typeof(), what it is.
- Length, length(), how many elements it contains.
- Attributes, attributes(), additional arbitrary metadata.
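As a quick illustration of these three properties, here is a minimal sketch; the variable names x and y are placeholders chosen for this example, not taken from the book.

```r
x <- c(a = 1.5, b = 2.5, c = 3.5)   # a named double vector

typeof(x)       # "double"
length(x)       # 3
attributes(x)   # a list with one element, $names

y <- list(1L, "two", TRUE)          # a list may mix types
typeof(y)       # "list"
length(y)       # 3
attributes(y)   # NULL, because no metadata has been set
```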
Atomic vectors
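Four common types: logical, integer, double (integer and double are both numeric), and character.
Test types: is.logical(), is.integer(), is.double(), is.character(). is.numeric() is TRUE for both integer and double; is.atomic() is TRUE for every atomic vector.
Coercion: combining a character and an integer yields a character.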
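The coercion rule can be checked directly. The sketch below assumes the usual ordering from least to most flexible type (logical, integer, double, character), which these notes do not spell out; mixed is a name made up for the example.

```r
typeof(c(TRUE, 1L))         # "integer"
typeof(c(1L, 2.5))          # "double"
typeof(c(1.5, "a"))         # "character"

mixed <- c("a", 1L)         # the integer is converted to a string
mixed                       # "a" "1"

# Logical values count as 0/1 when used numerically
sum(c(TRUE, FALSE, TRUE))   # 2
```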
Lists
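c() combines several lists into one flat list (atomic vectors among its arguments are first coerced to lists), whereas list() keeps each argument as a separate element.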
> c(list(1, 2), c(3:5))
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
> b = list(list(1, 2), c(3:5))
> b
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[1] 2
[[2]]
[1] 3 4 5
> str(b)
List of 2
$ :List of 2
..$ : num 1
..$ : num 2
$ : int [1:3] 3 4 5
Attributes
For atomic vectors:
Attributes are used to store metadata about the object. Attributes can be thought of as a named list (with unique names). Attributes can be accessed individually with attr() or all at once (as a list) with attributes().
By default, most attributes are lost when modifying a vector.
The only attributes not lost are the three most important:
- Names, a character vector giving each element a name, described in names.
- Dimensions, used to turn vectors into matrices and arrays, described in matrices and arrays.
- Class, used to implement the S3 object system, described in S3.
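A small sketch of setting an attribute and then watching it disappear may help here. The attribute name "my_attribute" and the variables y and z are arbitrary choices for the example.

```r
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"

attr(y, "my_attribute")   # "This is a vector"
attributes(y)             # a list with one element, $my_attribute

# Subsetting or summarising drops the custom attribute
attributes(y[1])          # NULL
attributes(sum(y))        # NULL

# Names, by contrast, survive subsetting
z <- c(a = 1, b = 2, c = 3)
names(z[2:3])             # "b" "c"
```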
You can name a vector in three ways:
- When creating it: x <- c(a = 1, b = 2, c = 3).
- By modifying an existing vector in place: x <- 1:3; names(x) <- c("a", "b", "c").
Or: x <- 1:3; names(x)[[1]] <- c("a").
- By creating a modified copy of a vector: x <- setNames(1:3, c("a", "b", "c")).
You can create a new vector without names using unname(x), or remove names in place with names(x) <- NULL.
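The naming and un-naming operations above can be tried side by side. This is only a sketch; the variables y, p and u are introduced for the example.

```r
x <- c(a = 1, b = 2, c = 3)

y <- setNames(1:3, c("a", "b", "c"))
names(y)            # "a" "b" "c"

# Names need not be complete: unnamed elements get ""
p <- c(a = 1, 2, 3)
names(p)            # "a" "" ""

# unname() returns a copy; x itself keeps its names
u <- unname(x)
names(u)            # NULL
names(x)            # "a" "b" "c"

# Assigning NULL removes the names from x itself
names(x) <- NULL
names(x)            # NULL
```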
Factors
Factors are built on top of integer vectors using two attributes: the class and the levels.
> x <- factor(c("a", "b", "b", "a"))
> typeof(x)
[1] "integer"
> class(x)
[1] "factor"
Although a factor's underlying type is integer, is.integer() and is.numeric() return FALSE for factors. Both functions special-case factors, because the integer codes are not meant to be used as numbers.
> is.integer(x)
[1] FALSE
> is.atomic(x)
[1] TRUE
factor = integer + 2 attributes.
A factor is stored as an integer vector but should not be treated as one. Factors are still atomic vectors.
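To see the "integer + 2 attributes" structure directly, one can strip the class with unclass(). The following is a minimal sketch; codes and sex are names picked for the example.

```r
x <- factor(c("a", "b", "b", "a"))

attributes(x)         # $levels is "a" "b", $class is "factor"

codes <- unclass(x)   # drop the class to expose the integer codes
typeof(codes)         # "integer"
as.vector(codes)      # 1 2 2 1

# Levels may include values that never occur in the data
sex <- factor(c("m", "m", "m"), levels = c("m", "f"))
table(sex)            # m: 3, f: 0
```

Declaring the full set of levels up front is what lets table() report a zero count for "f".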
> library(magrittr)  # provides the %>% and %<>% operators used below
> f1 <- factor(c("a","b","c","d"))
> f1
[1] a b c d
Levels: a b c d
> f1 %>% as.integer
[1] 1 2 3 4
# When we only change levels, the integers behind factors remain unchanged!
> levels(f1) %<>% rev()
> f1
[1] d c b a
Levels: d c b a
> f1 %>% as.integer()
[1] 1 2 3 4
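The transcript above relabels the levels, which changes what each element displays as. If the goal is instead to reorder the levels while keeping the values, one option is to rebuild the factor with an explicit levels argument. A base-R sketch (no magrittr needed; relabelled and reordered are names invented here):

```r
f1 <- factor(c("a", "b", "c", "d"))

# Relabel: same integer codes, new labels, so the values change
relabelled <- f1
levels(relabelled) <- rev(levels(relabelled))
as.character(relabelled)   # "d" "c" "b" "a"
as.integer(relabelled)     # 1 2 3 4

# Reorder: same values, new integer codes
reordered <- factor(f1, levels = rev(levels(f1)))
as.character(reordered)    # "a" "b" "c" "d"
as.integer(reordered)      # 4 3 2 1
```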
Before R 4.0.0, many data-loading functions (such as read.csv()) converted character columns to factors by default; pass stringsAsFactors = FALSE to suppress this. Since R 4.0.0 the default is already FALSE.
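The effect of the argument is easy to see with a small inline CSV. This sketch sets the argument explicitly in both calls so that it behaves the same on any R version; csv_text, chr_df and fct_df are made-up names.

```r
csv_text <- "id,label\n1,a\n2,b\n3,a"

chr_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(chr_df$label)    # "character"

fct_df <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(fct_df$label)    # "factor"
levels(fct_df$label)   # "a" "b"
```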
Matrices and arrays
atomic vector + dim attribute = multi-dimensional array.
A two-dimensional array is a matrix.
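The relationship between a vector, its dim attribute, and a matrix or array can be sketched as follows (x, m and a are example names):

```r
x <- 1:6
dim(x) <- c(2, 3)   # x is now a 2 x 3 matrix, filled column by column
is.matrix(x)        # TRUE
x[2, 3]             # 6

# matrix() builds the same object in one step
m <- matrix(1:6, nrow = 2, ncol = 3)
identical(x, m)     # TRUE

# Three dimensions: an array, but no longer a matrix
a <- array(1:12, dim = c(2, 3, 2))
dim(a)              # 2 3 2
is.matrix(a)        # FALSE
is.array(a)         # TRUE
```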
| Vectors | Matrices | Arrays |
|---|---|---|
| length() | nrow(), ncol() | dim() |
| names() | rownames(), colnames() | dimnames() |
| c() | cbind(), rbind() | abind::abind() |
| | t() | aperm() |
| | is.matrix(), as.matrix() | is.array(), as.array() |
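A few of these generalisations can be tried with base R alone. The sketch below skips abind::abind() because it needs the abind package installed; v, m and a are example objects.

```r
v <- c(a = 1, b = 2, c = 3)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("A", "B", "C")))
a <- array(1:24, dim = c(2, 3, 4))

# Size
length(v)                   # 3
c(nrow(m), ncol(m))         # 2 3
dim(a)                      # 2 3 4

# Names
names(v)                    # "a" "b" "c"
rownames(m)                 # "r1" "r2"
colnames(m)                 # "A" "B" "C"
dimnames(a)                 # NULL, none were set

# Combining
dim(rbind(m, m))            # 4 3
dim(cbind(m, m))            # 2 6

# Transposing
dim(t(m))                   # 3 2
dim(aperm(a, c(3, 1, 2)))   # 4 2 3
```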
Data frames
A data frame is a list of equal-length vectors.
For a data frame df, typeof(df) is "list".
class(df) is "data.frame".
Creating a data frame:
df <- data.frame( # Before R 4.0.0, data.frame() converted strings to factors by default.
x = 1:3,
y = c("a", "b", "c"),
stringsAsFactors = FALSE)
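Because a data frame is a list underneath, both list-style and matrix-style functions apply to it. A short sketch, rebuilding the same df so the block runs on its own:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

typeof(df)          # "list"
class(df)           # "data.frame"
is.data.frame(df)   # TRUE

length(df)          # 2, the number of columns (list elements)
ncol(df)            # 2
nrow(df)            # 3
names(df)           # "x" "y"
class(df$y)         # "character", because stringsAsFactors = FALSE
```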
When combining column-wise with cbind(), the number of rows must match, but row names are ignored.
When combining row-wise with rbind(), the number and names of columns must match.
Use plyr::rbind.fill() to combine data frames that don't have the same columns.
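The two combining rules can be sketched with base R only. plyr::rbind.fill() is left out because it needs the plyr package; wide, tall and mismatch are names invented for the example.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Column-wise: row counts must agree
wide <- cbind(df, data.frame(z = 3:1))
names(wide)   # "x" "y" "z"

# Row-wise: columns are matched by name, so their order may differ
tall <- rbind(df, data.frame(y = "z", x = 10, stringsAsFactors = FALSE))
tall$x        # 1 2 3 10
tall$y        # "a" "b" "c" "z"

# A different number of columns is an error
mismatch <- tryCatch(rbind(df, data.frame(x = 4L)),
                     error = function(e) "error")
mismatch      # "error"
```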
Wrapping a list (or matrix, etc.) in I() makes data.frame() treat it as a single column rather than splitting it into several columns.
> df1 = data.frame("x" = 1:3,
+ "y" = list(4:6,7:9,10:12))
> df1
x y.4.6 y.7.9 y.10.12
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12
> df1 = data.frame("x" = 1:3,
+ "y" = I(list(4:6,7:9,10:12)))
> df1
x y
1 1 4, 5, 6
2 2 7, 8, 9
3 3 10, 11, 12
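Another way to get a list column, without I(), is to add it after the data frame has been created. A minimal sketch (df2 is a name chosen for the example):

```r
df2 <- data.frame(x = 1:3)
df2$y <- list(1:2, 1:3, 1:4)   # one list element per row

ncol(df2)                # 2
class(df2$y)             # "list"
sapply(df2$y, length)    # 2 3 4
df2$y[[2]]               # 1 2 3
```

Unlike ordinary columns, the elements of a list column may have different lengths.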