1. Vector/Matrix/Array
1.1. What are they
- Collection of observations
– Vector – 1 dimensional
– Matrix – 2 dimensional
– Array – 3 (or more) dimensional
- Class in vector/matrix/array
– Only one class per object
– Combined – class determined: factor/logical < integer < numeric < character (when a single vector/matrix/array mixes data of several classes, the resulting class follows this order; for example, character + numeric ends up as character)
Example:
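A minimal sketch of the coercion order, assuming the values are combined with c(). The variable names are made up for illustration.

```r
# Combine values of different classes with c() and inspect the resulting class.
v_logical_integer   <- c(TRUE, 2L)   # logical + integer
v_integer_numeric   <- c(2L, 3.5)    # integer + numeric
v_numeric_character <- c(3.5, "a")   # numeric + character

class(v_logical_integer)    # "integer"
class(v_integer_numeric)    # "numeric"
class(v_numeric_character)  # "character"
v_numeric_character         # "3.5" "a"  (the number was converted to text)
```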
1.2. Vector
1.2.1 Generate Vector
> vec1 <- c(1, 2, 3)
> vec2 <- 1:11
> vec3 <- rep(x = 4, 7)
> vec3
[1] 4 4 4 4 4 4 4
> vec4 <- seq(from = 1, to = 12, by = 1.333)
> vec4
[1] 1.000 2.333 3.666 4.999 6.332 7.665 8.998 10.331 11.664
> vec5 <- seq(1, 12, length.out = 10) # 10 equally spaced numbers from 1 to 12
> vec5 # i.e. 1, 1+(12-1)/(10-1), 1+2*[(12-1)/(10-1)], ..., 1+(10-1)*[(12-1)/(10-1)]
[1] 1.000000 2.222222 3.444444 4.666667 5.888889 7.111111 8.333333
[8] 9.555556 10.777778 12.000000
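The spacing used by length.out can be checked by hand, and rep() has two repetition modes worth comparing. This is only a sketch; the names n, step, manual, rep_times and rep_each are illustrative.

```r
# With length.out, the step is (to - from) / (length.out - 1).
n <- 10
step <- (12 - 1) / (n - 1)
manual <- 1 + (0:(n - 1)) * step
all.equal(manual, seq(1, 12, length.out = n))  # TRUE

# rep(): 'times' repeats the whole vector, 'each' repeats every element in turn.
rep_times <- rep(c(1, 2), times = 3)  # 1 2 1 2 1 2
rep_each  <- rep(c(1, 2), each = 3)   # 1 1 1 2 2 2
```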
1.2.2 Index Vector
- Indexing by []
- A minus sign - in an R index means that element is removed
- Unlike many other languages, R indexing starts at 1
Example
> vec4
[1] 1.000 2.333 3.666 4.999 6.332 7.665 8.998 10.331 11.664
vec4[3]
## [1] 3.666
vec4[1:3]
## [1] 1.000 2.333 3.666
vec4[23:24]
## [1] NA NA
vec4[c(1, 3, 7)]
## [1] 1.000 3.666 8.998
vec4[c(1, 1, 1, 2)]
## [1] 1.000 1.000 1.000 2.333
vec4[-4] #access everything but the 4th element
## [1] 1.000 2.333 3.666 6.332 7.665 8.998 10.331 11.664
vec4[-1:3] # this will cause an error
## Error in vec4[-1:3] : only 0's may be mixed with negative subscripts
# The unary minus is applied before the colon, so -1:3 is read as (-1):3.
-1:3
## [1] -1 0 1 2 3
vec4[-(1:3)] #brackets will help negate all elements of 1:3
## [1] 4.999 6.332 7.665 8.998 10.331 11.664
vec4[-c(4, 5, 7)]
## [1] 1.000 2.333 3.666 7.665 10.331 11.664
vec4[9:1] #listing the vector in reverse
## [1] 11.664 10.331 8.998 7.665 6.332 4.999 3.666 2.333 1.000
1.3. Matrix
1.3.1. Generate Matrix
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
Arguments:
Argument | Description |
---|---|
data | an optional data vector (including a list or expression vector). Non-atomic classed R objects are coerced by as.vector and all attributes discarded. |
nrow | the desired number of rows. |
ncol | the desired number of columns. |
byrow | logical. If FALSE (the default) the matrix is filled by columns, otherwise the matrix is filled by rows. That is, byrow=T fills the data in horizontally (row by row) and byrow=F fills it in vertically (column by column). |
dimnames | A dimnames attribute for the matrix: NULL or a list of length 2 giving the row and column names respectively. An empty list is treated as NULL, and a list of length one as row names. The list can be named, and the list names will be used as names for the dimensions. |
> MT <- matrix(c(1,2,1,2),nrow=2,ncol=2,byrow=T)
> MT
[,1] [,2]
[1,] 1 2
[2,] 1 2
# another example
> MT2<-matrix(c(1,2,1,2),nrow=3,ncol=5,byrow=F)
Warning message:
In matrix(c(1, 2, 1, 2), nrow = 3, ncol = 5, byrow = F) : data length [4] is not a sub-multiple or multiple of the number of rows [3]
> MT2
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 1 2 1
[2,] 2 1 2 1 2
[3,] 1 2 1 2 1
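The effect of byrow and dimnames is easiest to see when the same data is filled both ways. A small sketch; the object names and the row/column labels are invented for the example.

```r
values <- 1:6

by_col <- matrix(values, nrow = 2, ncol = 3)                # default: byrow = FALSE
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)

by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# dimnames takes a list: row names first, then column names.
named <- matrix(values, nrow = 2,
                dimnames = list(c("r1", "r2"), c("a", "b", "c")))
named["r2", "b"]  # 4
```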
1.3.2. Index Matrix
Usage: matrix[row,column]
> MT[1,1] # takes the element in the first row and first column
[1] 1
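Leaving the row or the column position empty selects a whole column or row. The sketch below rebuilds MT so that it runs on its own; the other names are illustrative.

```r
MT <- matrix(c(1, 2, 1, 2), nrow = 2, ncol = 2, byrow = TRUE)

first_row  <- MT[1, ]   # whole first row:     1 2
second_col <- MT[, 2]   # whole second column: 2 2

# By default a single row or column is simplified to a plain vector;
# drop = FALSE keeps the result as a matrix.
row_as_matrix <- MT[1, , drop = FALSE]
dim(row_as_matrix)      # 1 2

# Indexing also works on the left-hand side of an assignment.
MT[2, 1] <- 9
MT[2, ]                 # 9 2
```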
1.4. Array
1.4.1. Generate Array
array(data = NA, dim = length(data), dimnames = NULL)
Argument | Description |
---|---|
data | a vector (including a list or expression vector) giving data to fill the array. Non-atomic classed objects are coerced by as.vector. |
dim | the dim attribute for the array to be created, that is an integer vector of length one or more giving the maximal indices in each dimension. |
dimnames | either NULL or the names for the dimensions. This must be a list (or it will be ignored) with one component for each dimension, either NULL or a character vector of the length given by dim for that dimension. The list can be named, and the list names will be used as names for the dimensions. If the list is shorter than the number of dimensions, it is extended by NULLs to the length required. |
> ARR <- array(1:4, c(1,2,2)) # c(1,2,2) means 1 row, 2 columns, 2 layers; the data filled in is 1:4
> ARR
,,1
[,1] [,2]
[1,] 1 2
,,2
[,1] [,2]
[1,] 3 4
1.4.2. Index Array
Array[row,column,layer]
> ARR[1,2,2]
[1] 4
> ARR[1,1,2]
[1] 3
> ARR[1,1,1]
[1] 1
> ARR[1,2,1]
[1] 2
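With a slightly larger array the fill order and slicing become easier to follow. A sketch under the assumption of a 2 x 3 x 4 array named A (not part of the notes above).

```r
# Data fills rows first, then columns, then layers.
A <- array(1:24, dim = c(2, 3, 4))

A[2, 1, 3]            # 14: position 2 + (1 - 1) * 2 + (3 - 1) * 6

layer1  <- A[, , 1]   # the whole first layer, a 2 x 3 matrix holding 1:6
through <- A[1, 2, ]  # row 1, column 2 across all layers: 3 9 15 21

dim(A)                # 2 3 4
dim(layer1)           # 2 3
```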
1.5. Arithmetic for matrix and arrays
Operations | Function |
---|---|
Number of rows/columns | nrow(x)/ncol(x) |
Total number of elements | length(x) |
Names of rows and columns | rownames(x)/colnames(x), or dimnames(x) |
Dimensions; for a 3-dimensional array this returns (row,col,layer) | dim(x) |
Transpose | t(x) |
Matrix multiplication | x %*% y |
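Cross product, t(x) %*% y | crossprod(x,y) |
Diagonal elements | diag(x) |
NB:
(1) t(x): this article explains it clearly: http://blog.sciencenet.cn/blog-508298-551299.html
(2) x %*% y is matrix multiplication; crossprod(x,y) is the cross product of the two matrices, i.e. t(x) %*% y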
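A short sketch that runs the functions from the table on two small matrices. The matrices x and y are made up for illustration.

```r
x <- matrix(1:6, nrow = 2)   # 2 x 3
y <- matrix(1:6, nrow = 3)   # 3 x 2

nrow(x); ncol(x); length(x)  # 2, 3, 6
dim(t(x))                    # 3 2: the transpose swaps rows and columns

xy <- x %*% y                # 2 x 2 matrix product
xy
#      [,1] [,2]
# [1,]   22   49
# [2,]   28   64
diag(xy)                     # 22 64

# crossprod(x) is a shortcut for t(x) %*% x
all.equal(crossprod(x), t(x) %*% x)  # TRUE
```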
2. List
2.1. Description:
- Collection of variables
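– Different classes/structures are possible (data of different structures can be put into one list, and lists can even be nested inside a list; for example, one element can be a dataframe, another an array, and another a plot or a list)
– Different dimensions are possible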
2.2. Generate & Index List
Created with list()
Indexed with [[ ]]
# create
> my_character <- c("1", "1", "0", "0")
> my_logical <- TRUE
> my_list <- list(my_character, my_logical)
> my_list
[[1]]
[1] "1" "1" "0" "0"
[[2]]
[1] TRUE
# index
> my_list[[1]]
[1] "1" "1" "0" "0"
> my_list[[1]][3]
[1] "0"
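The difference between [ ] and [[ ]] is a common stumbling block, so a small sketch may help. It assumes a named list, which the notes above do not use; the element names numbers, flag and nested are hypothetical.

```r
my_named_list <- list(numbers = c(1, 1, 0, 0),
                      flag    = TRUE,
                      nested  = list(a = "x"))

class(my_named_list[1])      # "list": single brackets return a sub-list
class(my_named_list[[1]])    # "numeric": double brackets return the element itself

# Named elements can also be reached with $ or [["name"]].
my_named_list$flag           # TRUE
my_named_list[["nested"]]$a  # "x"
my_named_list$numbers[3]     # 0
```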
3. Dataframe
3.1. Description:
- Variables in data frame can have different classes
- ncol(x)/nrow(x) returns number of columns/rows
- Columns and rows are named - colnames/rownames
3.2. Generate Dataframe
Generate:
data.frame(..., row.names = NULL, check.rows = FALSE, check.names = TRUE, fix.empty.names = TRUE, stringsAsFactors = default.stringsAsFactors())
Arguments:
Argument | Description |
---|---|
... | these arguments are of either the form value or tag = value. Component names are created based on the tag (if present) or the deparsed argument itself. |
row.names | NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame. |
check.rows | if TRUE then the rows are checked for consistency of length and names. |
check.names | logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names and are not duplicated. If necessary they are adjusted (by make.names) so that they are. |
fix.empty.names | logical indicating if arguments which are "unnamed" (in the sense of not being formally called as someName = arg) get an automatically constructed name or rather name "". Needs to be set to FALSE even when check.names is false if "" names should be kept. |
stringsAsFactors | logical: should character vectors be converted to factors? The ‘factory-fresh’ default has been TRUE previously but has been changed to FALSE for R 4.0.0. Only as short time workaround, you can revert by setting options(stringsAsFactors = TRUE) which now warns about its deprecation. |
Example:
> data <- data.frame(ID=rep(1:10, each=3),
TIME=c(0,6,12),
MDV=0)
> str(data)
'data.frame': 30 obs. of 3 variables:
$ ID : int 1 1 1 2 2 2 3 3 3 4 ...
$ TIME: num 0 6 12 0 6 12 0 6 12 0 ...
$ MDV : num 0 0 0 0 0 0 0 0 0 0 ...
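In the example above, TIME and MDV are shorter than ID and are recycled to fill all 30 rows. The sketch below checks that, and shows what happens when a length does not fit; the name bad is illustrative.

```r
data <- data.frame(ID = rep(1:10, each = 3),
                   TIME = c(0, 6, 12),   # length 3, recycled 10 times
                   MDV = 0)              # length 1, recycled 30 times

dim(data)             # 30 3
colnames(data)        # "ID" "TIME" "MDV"
sapply(data, class)   # ID "integer", TIME "numeric", MDV "numeric"

# Recycling only works when the longest column is a whole multiple of the
# shorter one; 4 rows against 3 values is an error.
bad <- try(data.frame(ID = 1:4, TIME = c(0, 6, 12)), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```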
3.3. Index Dataframe
Usage:
dataframe[row,column]
dataframe$colname[row]
Example:
> data[3,2]
[1] 12
> data$TIME[3]
[1] 12
> data$TIME[data$TIME==12] <- 12.5 # replace every 12 in data$TIME with 12.5
> data$TIME
[1] 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5
[22] 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5
> data[ ,2]
[1] 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5
[22] 0.0 6.0 12.5 0.0 6.0 12.5 0.0 6.0 12.5
> data[3,]
ID TIME MDV
3 1 12.5 0
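The logical condition used in the replacement above can also select whole rows. A minimal sketch on a fresh copy of the data, here called obs (a hypothetical name) so that it runs on its own.

```r
obs <- data.frame(ID = rep(1:10, each = 3), TIME = c(0, 6, 12), MDV = 0)

# Rows where a condition holds; the empty column position keeps all columns.
last_obs <- obs[obs$TIME == 12, ]
nrow(last_obs)                      # 10

# Rows by condition plus columns by name.
id2 <- obs[obs$ID == 2, c("ID", "TIME")]
dim(id2)                            # 3 2

# The same condition applied to a single column.
obs$TIME[obs$ID == 2]               # 0 6 12
```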
3.4. View data frames
Function | Description |
---|---|
View(data.set) | Open and look at data.set |
head(data.set) | Look at the first several lines of a data.set |
tail(data.set) | Look at the last several lines of a data.set |
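Both head() and tail() take an n argument for the number of rows, with a default of 6. A sketch on a hypothetical data frame obs; View() is omitted because it opens an interactive viewer.

```r
obs <- data.frame(ID = rep(1:10, each = 3), TIME = c(0, 6, 12), MDV = 0)

nrow(head(obs))            # 6 rows by default

first3 <- head(obs, n = 3) # first three rows
last2  <- tail(obs, n = 2) # last two rows

last2$ID                   # 10 10
rownames(last2)            # "29" "30": tail() keeps the original row names
```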