Study Notes 2: Advanced Data Structures
Common data structures in R:
- data frame (data.frame)
- matrix (matrix)
- list (list)
- array (array)
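2.1 Data frames
Intuitively, a data frame looks like an Excel spreadsheet with rows and columns. Statistically, each column is a variable and each row is an observation.
Each column of a data frame is actually a vector, and all columns have the same length.
A data frame can hold different data types in different columns, but all elements within a single column must have the same type.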
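To make the per-column typing concrete, here is a minimal sketch that builds a small data frame and inspects the class and length of each column. The data frame `scores` and its column names are made up for illustration. `stringsAsFactors = FALSE` is passed explicitly so the text column stays character on R versions older than 4.0.0 as well.

```r
# Hypothetical example data; all names here are illustrative only.
scores <- data.frame(
  id    = 1:3,                    # integer column
  name  = c("Ann", "Bob", "Cy"),  # character column
  score = c(90.5, 82, 77.25),     # numeric (double) column
  stringsAsFactors = FALSE
)

# Each column is a vector with a single type of its own.
col_classes <- sapply(scores, class)
print(col_classes)

# Every column has the same length, equal to the number of rows.
col_lengths <- sapply(scores, length)
print(col_lengths)
```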
The data.frame function creates a data frame:
> x <- 10:1
> y <- -4:5
> q <- c("Hockey", "Footall", "Baseball", "Curling", "Rugby", "Lacrosse", "Basketball", "Tennis", "Cricket","Soccer")
> theDF <- data.frame(First=x, Second=y, Sport=q)
> theDF
First Second Sport
1 10 -4 Hockey
2 9 -3 Footall
3 8 -2 Baseball
4 7 -1 Curling
5 6 0 Rugby
6 5 1 Lacrosse
7 4 2 Basketball
8 3 3 Tennis
9 2 4 Cricket
10 1 5 Soccer
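The nrow function returns the number of rows of a data frame.
The ncol function returns the number of columns of a data frame.
The dim function returns both the number of rows and the number of columns.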
> nrow(theDF)
[1] 10
> ncol(theDF)
[1] 3
> dim(theDF)
[1] 10 3
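As a small sanity check on how these three functions relate, the sketch below confirms that `dim` returns the row count followed by the column count. The data frame `small` is a hypothetical example, not part of the notes.

```r
# Hypothetical 4-row, 2-column data frame.
small <- data.frame(a = 1:4, b = c("w", "x", "y", "z"))

dims <- dim(small)  # c(number of rows, number of columns)

# dim() agrees with nrow() and ncol(), in that order.
same <- identical(dims, c(nrow(small), ncol(small)))
print(dims)
print(same)
```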
The names function returns the column names of a data frame as a character vector:
> names(theDF)
[1] "First" "Second" "Sport"
> names(theDF)[2]
[1] "Second"
The rownames function gets and sets the row names of a data frame:
> rownames(theDF) <- c("One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten")
> rownames(theDF)
[1] "One" "Two" "Three" "Four" "Five" "Six"
[7] "Seven" "Eight" "Nine" "Ten"
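Both `names` and `rownames` can also be used on the left side of an assignment. The sketch below, using a made-up data frame `pets`, renames a single column and then resets custom row names back to the automatic ones by assigning `NULL`.

```r
# Hypothetical example data.
pets <- data.frame(kind = c("cat", "dog", "fish"), legs = c(4, 4, 0))

# Set custom row names, then read them back.
rownames(pets) <- c("A", "B", "C")
custom_names <- rownames(pets)

# Assigning NULL restores the automatic row names "1", "2", "3".
rownames(pets) <- NULL
default_names <- rownames(pets)

# Rename only the second column by indexing into names().
names(pets)[2] <- "n_legs"
print(names(pets))
```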
The head function shows the first few rows.
The tail function shows the last few rows.
> head(theDF, n=3)
First Second Sport
One 10 -4 Hockey
Two 9 -3 Footall
Three 8 -2 Baseball
> tail(theDF)
First Second Sport
Five 6 0 Rugby
Six 5 1 Lacrosse
Seven 4 2 Basketball
Eight 3 3 Tennis
Nine 2 4 Cricket
Ten 1 5 Soccer
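When `n` is omitted, both functions default to six rows, which is why `tail(theDF)` prints rows Five through Ten. The sketch below, on a hypothetical data frame `nums`, also shows that a negative `n` means "everything except that many rows".

```r
# Hypothetical 10-row data frame.
nums <- data.frame(i = 1:10, sq = (1:10)^2)

first_two <- head(nums, n = 2)          # rows 1 and 2
last_two <- tail(nums, n = 2)           # rows 9 and 10
all_but_last_two <- head(nums, n = -2)  # rows 1 through 8

print(first_two)
print(last_two)
print(nrow(all_but_last_two))
```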
Ways to select specific columns and rows:
Use $ followed by a column name to select a column:
> theDF$Sport
[1] "Hockey" "Footall" "Baseball" "Curling"
[5] "Rugby" "Lacrosse" "Basketball" "Tennis"
[9] "Cricket" "Soccer"
Use square brackets [] to select an element, giving the row index first and the column index second:
> theDF[3,1]
[1] 8
Vectors of indices select several rows and columns at once:
> theDF[c(3, 5), 2:3]
Second Sport
Three -2 Baseball
Five 0 Rugby
Leave the row index empty and give only the column index to get entire columns:
> theDF[, 2:3]
Second Sport
One -4 Hockey
Two -3 Footall
Three -2 Baseball
Four -1 Curling
Five 0 Rugby
Six 1 Lacrosse
Seven 2 Basketball
Eight 3 Tennis
Nine 4 Cricket
Ten 5 Soccer
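The column argument can also be a character vector of column names: theDF[, c("First", "Sport")]
A single column name can be passed as the second argument: theDF[, "Sport"]
and so on.
Some of these forms return a vector (for example theDF[, "Sport"], which selects a single column), while others return a data frame. Passing drop=FALSE as the third argument keeps the result a data frame.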
> theDF[,"Sport",drop=FALSE]
Sport
One Hockey
Two Footall
Three Baseball
Four Curling
Five Rugby
Six Lacrosse
Seven Basketball
Eight Tennis
Nine Cricket
Ten Soccer
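The difference between the vector and data frame results is easiest to see by checking the class of each one. The following is a minimal sketch on a made-up data frame `teams`. It also shows the single-bracket form `teams["sport"]` and the double-bracket form `teams[["sport"]]`, which the notes above do not cover but which behave consistently with `drop`.

```r
# Hypothetical example data.
teams <- data.frame(
  size  = c(6, 11, 9),
  sport = c("Hockey", "Soccer", "Baseball"),
  stringsAsFactors = FALSE
)

as_vector <- teams[, "sport"]                # single column: dropped to a vector
as_frame <- teams[, "sport", drop = FALSE]   # stays a one-column data frame
also_frame <- teams["sport"]                 # list-style indexing keeps a data frame
also_vector <- teams[["sport"]]              # double brackets extract the column itself

print(class(as_vector))
print(class(as_frame))
```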
2.2 Lists
A list is a container that can store elements of the same type or of different types.
> list(theDF, 1:10, c(1, 2, 3))
[[1]]
First Second Sport
One 10 -4 Hockey
Two 9 -3 Footall
Three 8 -2 Baseball
Four 7 -1 Curling
Five 6 0 Rugby
Six 5 1 Lacrosse
Seven 4 2 Basketball
Eight 3 3 Tennis
Nine 2 4 Cricket
Ten 1 5 Soccer
[[2]]
[1] 1 2 3 4 5 6 7 8 9 10
[[3]]
[1] 1 2 3
The names function gets or sets the names of a list's elements.
> list1 <- list(theDF, 1:10, c(1, 2, 3))
> names(list1)
NULL
> names(list1) <- c("DF", "VEC1", "VEC2")
> names(list1)
[1] "DF" "VEC1" "VEC2"
Alternatively, names can be assigned when the list is created:
> list1 <- list(DF=theDF, VEC1=1:10, VEC2=c(1, 2, 3))
> names(list1)
[1] "DF" "VEC1" "VEC2"
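Once a list has names, its elements can be retrieved by name. The notes do not cover this step, so the following is only a sketch of standard R list access on a hypothetical list `bundle`: double brackets and `$` return the element itself, while single brackets return a shorter list.

```r
# Hypothetical named list.
bundle <- list(VEC1 = 1:10, VEC2 = c(1, 2, 3), LABEL = "demo")

by_name <- bundle[["VEC2"]]  # the element itself (a numeric vector)
by_dollar <- bundle$VEC1     # same idea, using $
sub_list <- bundle["VEC2"]   # a list of length 1 that contains the element

print(by_name)
print(class(sub_list))
```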
The vector function can create an empty list of a given length:
> emptyList <- vector(mode="list", length=4)
> emptyList
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
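Assigning to an index adds an element to the list, provided that index does not already exist in the list (assigning to an existing index replaces that element instead):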
> emptyList[[5]] <- 2
> length(emptyList)
[1] 5
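The sketch below contrasts the two cases on a hypothetical list `slots`: assigning to a new index grows the list, while assigning to an existing index only replaces the value. It also shows that skipping ahead leaves `NULL` placeholders in the gap.

```r
# Hypothetical empty list with two slots.
slots <- vector(mode = "list", length = 2)

slots[[3]] <- "appended"  # index 3 did not exist: the list grows to length 3
len_after_append <- length(slots)

slots[[1]] <- 42          # index 1 already exists: the value is replaced
len_after_replace <- length(slots)

slots[[6]] <- TRUE        # indices 4 and 5 are filled with NULL
len_after_gap <- length(slots)

print(c(len_after_append, len_after_replace, len_after_gap))
```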
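2.3 Matrices
A matrix has rows and columns like a data frame, but every element, not just every column, must be of the same type, most commonly numeric.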
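A quick way to see the single-type rule is to try mixing types. In the sketch below (variable names are illustrative), the mixed values are coerced to character by `c()` before the matrix is even built, so the resulting matrix holds only strings.

```r
# A purely numeric matrix keeps its numeric storage type.
num_mat <- matrix(c(1.5, 2, 3, 4), nrow = 2)
type_num <- typeof(num_mat)

# Mixing a string in forces every element to character.
mixed_mat <- matrix(c(1, "two", 3, 4), nrow = 2)
type_mixed <- typeof(mixed_mat)

print(type_num)
print(type_mixed)
```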
The matrix function creates a matrix.
Create two 5-by-2 matrices:
> A <- matrix(1:10, nrow=5)
> B <- matrix(21:30, nrow=5)
> A
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> B
[,1] [,2]
[1,] 21 26
[2,] 22 27
[3,] 23 28
[4,] 24 29
[5,] 25 30
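As the output of A and B shows, `matrix` fills values column by column by default. The sketch below uses illustrative names to contrast that default with `byrow = TRUE`, which fills row by row instead.

```r
# Default: values fill the first column, then the second, and so on.
by_col <- matrix(1:6, nrow = 2)

# byrow = TRUE: values fill the first row, then the second.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

print(by_col)
print(by_row)
```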
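The nrow, ncol, and dim functions also report the number of rows and columns of a matrix.
The t function transposes a matrix.
Matrix multiplication uses %*%. A and B are both 5-by-2, so B must be transposed first to make the dimensions conform (5-by-2 times 2-by-5):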
> A %*% t(B)
[,1] [,2] [,3] [,4] [,5]
[1,] 177 184 191 198 205
[2,] 224 233 242 251 260
[3,] 271 282 293 304 315
[4,] 318 331 344 357 370
[5,] 365 380 395 410 425
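The dimension rule behind this can be checked directly. The sketch below recreates A and B as defined earlier and compares the two ways of making them conformable: transposing B gives a 5-by-5 product, while transposing A gives a 2-by-2 product. The result variable names are illustrative.

```r
A <- matrix(1:10, nrow = 5)
B <- matrix(21:30, nrow = 5)

# (5 x 2) %*% (2 x 5) gives a 5 x 5 matrix.
prod_5x5 <- A %*% t(B)

# (2 x 5) %*% (5 x 2) gives a 2 x 2 matrix.
prod_2x2 <- t(A) %*% B

print(dim(prod_5x5))
print(dim(prod_2x2))
```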
Without the surrounding % signs, the * operator multiplies element by element:
> A*B
[,1] [,2]
[1,] 21 156
[2,] 44 189
[3,] 69 224
[4,] 96 261
[5,] 125 300
> A * t(B)
Error in A * t(B) : non-conformable arrays
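Element-wise multiplication requires both matrices to have the same dimensions, which is why the transposed case fails. One way to guard against this, sketched below with illustrative variable names, is to compare `dim` values first, or to capture the error with `tryCatch`. The exact wording of the error message depends on the R locale.

```r
A <- matrix(1:10, nrow = 5)
B <- matrix(21:30, nrow = 5)

# A and B are both 5 x 2, so A * B is allowed.
same_shape <- identical(dim(A), dim(B))

# t(B) is 2 x 5, so A * t(B) is not allowed.
shape_matches_transposed <- identical(dim(A), dim(t(B)))

# Capture the error message instead of stopping the script.
err_msg <- tryCatch(
  {
    A * t(B)
    NA_character_  # only reached if no error occurs
  },
  error = function(e) conditionMessage(e)
)

print(same_shape)
print(shape_matches_transposed)
print(err_msg)
```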
The colnames and rownames functions get or set the column and row names of a matrix:
> colnames(A) <- c("Left", "Right")
> rownames(A) <- c("1st", "2nd", "3rd", "4th", "5th")
> A
Left Right
1st 1 6
2nd 2 7
3rd 3 8
4th 4 9
5th 5 10
Transposing also swaps the row and column names:
> t(A)
1st 2nd 3rd 4th 5th
Left 1 2 3 4 5
Right 6 7 8 9 10
In matrix multiplication, the row names of the product come from the left matrix and the column names of the product come from the right matrix.
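The naming rule for products can be sketched as follows. The names given to A match the ones set above. The row and column names given to B are assumptions made for this example, since the notes never name B.

```r
A <- matrix(1:10, nrow = 5,
            dimnames = list(c("1st", "2nd", "3rd", "4th", "5th"),
                            c("Left", "Right")))

# Hypothetical names for B, chosen only for illustration.
B <- matrix(21:30, nrow = 5,
            dimnames = list(c("One", "Two", "Three", "Four", "Five"),
                            c("First", "Second")))

named_prod <- A %*% t(B)

# Row names come from A (left); column names come from t(B) (right),
# which are the row names of B.
print(rownames(named_prod))
print(colnames(named_prod))
```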
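2.4 Arrays
An array is a multidimensional vector, and all of its elements must be of the same type.
The array function creates an array.
Square brackets access individual elements: the first argument is the row index, the second is the column index, and the remaining arguments index the outer dimensions.
Create a three-dimensional array with two rows, three columns, and two layers: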
> theArray <- array(1:12, dim=c(2, 3, 2))
> theArray
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
Some examples of accessing array elements:
> theArray[1, , ]
[,1] [,2]
[1,] 1 7
[2,] 3 9
[3,] 5 11
> theArray[ , ,1 ]
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> theArray[ 1, ,1 ]
[1] 1 3 5
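The shape of each result depends on how many indices are left empty. The sketch below recreates `theArray` and checks the dimensions of a few slices. The `drop = FALSE` call is an addition that the notes do not show for arrays, but it works the same way as for data frames. The result variable names are illustrative.

```r
theArray <- array(1:12, dim = c(2, 3, 2))

single <- theArray[2, 3, 2]             # row 2, column 3, layer 2
row_slice <- theArray[1, , ]            # 3 x 2 matrix (columns by layers)
vec <- theArray[1, , 1]                 # plain vector with no dim attribute
kept <- theArray[1, , 1, drop = FALSE]  # 1 x 3 x 1 array, dimensions preserved

print(single)
print(dim(row_slice))
print(dim(kept))
```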