R Data Structures

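I learned Python first, and because R resembles Python in many ways, Python habits sometimes carried over and I never got R's data structures straight. This post is a dedicated review of them.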

Source: https://www.datamentor.io/r-programming/data-frame/

R vector

A vector is R's most basic data structure and holds elements of a single type. The type can be logical, integer, double, character, complex, or raw.

Use typeof() to check the data type.

Use length() to check the length of a vector.

1. Use c() to create a vector.

Because all elements of a vector must share one type, c() coerces mixed elements to a common type.

> x <- c(1, 5, 4, 9, 0)
> typeof(x)
[1] "double"
> length(x)
[1] 5
> x <- c(1, 5.4, TRUE, "hello")
> x
[1] "1"     "5.4"   "TRUE"  "hello"
> typeof(x)
[1] "character"
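
The coercion follows a fixed order of types, from least to most general: logical, integer, double, character. A minimal sketch to check this in a session (the variable names are only for illustration):

```r
# Each c() call mixes two types; the result takes the more general one.
t_int <- typeof(c(TRUE, 2L))     # logical + integer
t_dbl <- typeof(c(2L, 3.5))      # integer + double
t_chr <- typeof(c(3.5, "a"))     # double + character
print(c(t_int, t_dbl, t_chr))    # "integer" "double" "character"

# TRUE becomes 1 when it is coerced to a number.
mixed <- c(TRUE, 2L, 3.5)
print(mixed)                     # 1.0 2.0 3.5
```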

2. To create a vector of consecutive integers, use the : operator.

> x <- 1:7; x
[1] 1 2 3 4 5 6 7
> y <- 2:-2; y
[1]  2  1  0 -1 -2

3. Use seq() to create a vector with a given step size or a given length.

> seq(1, 3, by=0.2)          # specify step size
[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
> seq(1, 5, length.out=4)    # specify length of the vector
[1] 1.000000 2.333333 3.666667 5.000000
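
One pitfall of : is worth a short sketch. Because : can count downward, 1:n is not empty when n is 0. The base functions seq_len() and seq_along() are the usual safer alternatives; the variable names below are only illustrative.

```r
n <- 0
counted_down <- 1:n          # counts down from 1 to 0
empty <- seq_len(n)          # an empty integer vector
print(counted_down)          # 1 0
print(length(empty))         # 0

# seq_along() gives the positions of an existing vector.
positions <- seq_along(c("a", "b", "c"))
print(positions)             # 1 2 3
```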

Indexing vector elements

Elements of a vector are accessed by indexing. The index can be a logical, integer, or character vector.

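1. Indexing with an integer vector. R indices start at 1. A negative index drops the element at that position, but positive and negative indices cannot be mixed.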

> x
[1]  0  2  4  6  8 10
> x[3]           # access 3rd element
[1] 4
> x[c(2, 4)]     # access 2nd and 4th element
[1] 2 6
> x[-1]          # access all but 1st element
[1]  2  4  6  8 10
> x[c(2, -4)]    # cannot mix positive and negative integers
Error in x[c(2, -4)] : only 0's may be mixed with negative subscripts
> x[c(2.4, 3.54)]    # real numbers are truncated to integers
[1] 2 4

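2. Indexing with a logical vector returns the elements at the positions marked TRUE. (The x in this example is a different, four-element vector, for instance c(-3, -1, 0, 3).)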

> x[c(TRUE, FALSE, FALSE, TRUE)]
[1] -3  3
> x[x < 0]  # filtering vectors based on conditions
[1] -3 -1
> x[x > 0]
[1] 3
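
A condition such as x < 0 is itself a logical vector, which is why it can be used directly as an index. The sketch below makes the intermediate mask visible. The values of x are an assumption chosen to be consistent with the output above, since the original x is not shown.

```r
x <- c(-3, -1, 0, 3)     # assumed values, not taken from the source
mask <- x < 0            # logical vector, one entry per element
print(mask)              # TRUE TRUE FALSE FALSE
negatives <- x[mask]     # same result as x[x < 0]
print(negatives)         # -3 -1

# which() turns the mask into integer positions; sum() counts the TRUEs.
pos <- which(mask)
print(pos)               # 1 2
print(sum(mask))         # 2
```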

3. Indexing with a character vector. This works for named vectors, in which each element has a name.

> x <- c("first"=3, "second"=0, "third"=9)
> names(x)
[1] "first"  "second" "third" 
> x["second"]
second 
0 
> x[c("first", "third")]
first third 
3     9

Modifying elements

Any of the indexing methods above can be combined with assignment to modify the selected elements.

> x
[1] -3 -2 -1  0  1  2
> x[2] <- 0; x        # modify 2nd element
[1] -3  0 -1  0  1  2
> x[x<0] <- 5; x   # modify elements less than 0
[1] 5 0 5 0 1 2
> x <- x[1:4]; x      # truncate x to first 4 elements
[1] 5 0 5 0

How to delete a vector

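Assigning NULL to the variable discards the vector's contents, which feels odd at first. Note that the name x still exists afterwards and is bound to NULL; rm(x) removes the variable itself.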

> x
[1] -3 -2 -1  0  1  2
> x <- NULL
> x
NULL
> x[4]
NULL
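
The difference between assigning NULL and actually removing the variable can be checked with exists(). This is a minimal sketch; inherits = FALSE restricts the lookup to the current environment, and the result names are only illustrative.

```r
x <- c(-3, -2, -1, 0, 1, 2)
x <- NULL
# The name x is still bound in the current environment; its value is NULL.
bound_after_null <- exists("x", inherits = FALSE)
value_is_null <- is.null(x)

rm(x)   # removes the binding itself
bound_after_rm <- exists("x", inherits = FALSE)

print(c(bound_after_null, value_is_null, bound_after_rm))   # TRUE TRUE FALSE
```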

R matrix

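A matrix is R's two-dimensional data structure. It is like a vector, but additionally carries a dimension attribute.
All attributes of an object can be inspected with attributes(); the dimensions can be read directly with dim().
Use class() to check whether a variable is a matrix. The outputs in this post come from an older R version; in R 4.0.0 and later, class() of a matrix returns "matrix" "array".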

> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> class(a)
[1] "matrix"
> attributes(a)
$dim
[1] 3 3
> dim(a)
[1] 3 3
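
Because the exact output of class() differs between R versions, is.matrix() is a version-independent way to run the same check. A short sketch, with illustrative variable names:

```r
a <- matrix(1:9, nrow = 3)
v <- 1:9

a_is_matrix <- is.matrix(a)   # TRUE: a has a dim attribute of length 2
v_is_matrix <- is.matrix(v)   # FALSE: a plain vector has no dim attribute
print(c(a_is_matrix, v_is_matrix))

# inherits() also works regardless of how many classes class() reports.
print(inherits(a, "matrix"))  # TRUE
print(attributes(v))          # NULL
```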

How to create a matrix

1. Use matrix(). The dimensions are set by passing suitable nrow and ncol values.

Specifying both is not necessary: if one dimension is given, the other is inferred from the length of the data.

> matrix(1:9, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> # same result is obtained by providing only one dimension
> matrix(1:9, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

As the output shows, the matrix is filled column by column. Pass byrow=TRUE to fill it row by row.

> matrix(1:9, nrow=3, byrow=TRUE)    # fill matrix row-wise
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

In either case, the matrix is stored internally in column-major order.
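
The column-major storage becomes visible when a matrix is flattened back into a vector. In this small sketch, byrow=TRUE changes only how the input is laid out, not how the result is stored:

```r
m <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# as.vector() reads the storage column by column.
flat <- as.vector(m)
print(flat)   # 1 4 2 5 3 6
```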

The rows and columns of a matrix can be named.

> x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"), c("A","B","C")))
> x
  A B C
X 1 4 7
Y 2 5 8
Z 3 6 9

Use colnames() and rownames() to get or set the names.

> colnames(x)
[1] "A" "B" "C"
> rownames(x)
[1] "X" "Y" "Z"
> # It is also possible to change names
> colnames(x) <- c("C1","C2","C3")
> rownames(x) <- c("R1","R2","R3")
> x
   C1 C2 C3
R1  1  4  7
R2  2  5  8
R3  3  6  9

2. Use rbind() and cbind() to build a matrix from rows or columns.

> cbind(c(1,2,3),c(4,5,6))
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> rbind(c(1,2,3),c(4,5,6))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

3. A vector can also be turned into a matrix simply by assigning it a dim attribute.

> x <- c(1,2,3,4,5,6)
> x
[1] 1 2 3 4 5 6
> class(x)
[1] "numeric"
> dim(x) <- c(2,3)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> class(x)
[1] "matrix"

How to index matrix elements

Matrix elements are accessed with the form var[row, column], where row and column are vectors.

1. Indexing with integers

Specify the row and column positions to select. An empty field selects everything along that dimension.

> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[c(1,2),c(2,3)]    # select rows 1 & 2 and columns 2 & 3
# Unlike Python (NumPy), where the equivalent index x[[0, 1], [1, 2]] would return only 4 and 8
     [,1] [,2]
[1,]    4    7
[2,]    5    8
> x[c(3,2),]    # leaving column field blank will select entire columns
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    2    5    8
> x[,]    # leaving row as well as column field blank will select entire matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[-1,]    # select all rows except first
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9

Note that when the result is a single row or a single column, it is returned as a vector.

> x[1,]
[1] 1 4 7
> class(x[1,])
[1] "integer"

Pass drop=FALSE to avoid this behavior.

> x[1,,drop=FALSE]  # now the result is a 1X3 matrix rather than a vector
     [,1] [,2] [,3]
[1,]    1    4    7
> class(x[1,,drop=FALSE])
[1] "matrix"

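A matrix can also be indexed with a single vector. In that case the columns are treated as stacked into one long vector, which is then indexed as usual, and the result is a vector.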

> x
     [,1] [,2] [,3]
[1,]    4    8    3
[2,]    6    0    7
[3,]    1    2    9
> x[1:4]
[1] 4 6 1 8
> x[c(3,5,7)]
[1] 1 0 3
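
Since the columns are stacked, the single index of element [i, j] in a matrix with nrow rows is (j - 1) * nrow + i. The sketch below checks that formula against the matrix above; arrayInd() is the base function for the reverse conversion, and the names i, j, and k are only illustrative.

```r
x <- matrix(c(4, 6, 1, 8, 0, 2, 3, 7, 9), nrow = 3)

i <- 2; j <- 3
k <- (j - 1) * nrow(x) + i    # position in the stacked vector
print(k)                      # 8
print(x[k] == x[i, j])        # TRUE, both are 7

# arrayInd() maps a single index back to (row, column).
rc <- arrayInd(k, dim(x))
print(rc)                     # row 2, column 3
```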

2. Indexing with logical vectors

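Two logical vectors can index a matrix: the rows and columns at the positions marked TRUE are returned. If a logical vector is shorter than the dimension, it is recycled. Logical and integer indices can also be mixed.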

> x
     [,1] [,2] [,3]
[1,]    4    8    3
[2,]    6    0    7
[3,]    1    2    9
> x[c(TRUE,FALSE,TRUE),c(TRUE,TRUE,FALSE)]
     [,1] [,2]
[1,]    4    8
[2,]    1    2
> x[c(TRUE,FALSE),c(2,3)]    # the 2 element logical vector is recycled to 3 element vector
     [,1] [,2]
[1,]    8    3
[2,]    2    9

Indexing with a single logical vector also works.

> x[c(TRUE, FALSE)]
[1] 4 1 0 3 9

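In the example above, x is treated as the vector formed by stacking its columns, just as before. The same applies when filtering by a condition: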

> x[x>5]    # select elements greater than 5
[1] 6 8 7 9
> x[x%%2 == 0]    # select even elements
[1] 4 6 8 0 2

3. Indexing with character vectors

When a matrix has row or column names, it can be indexed with character vectors, and these can be mixed with integer and logical indices.

> x
     A B C
[1,] 4 8 3
[2,] 6 0 7
[3,] 1 2 9
> x[,"A"]
[1] 4 6 1
> x[TRUE,c("A","C")]
     A C
[1,] 4 3
[2,] 6 7
[3,] 1 9
> x[2:3,c("A","C")]
     A C
[1,] 6 7
[2,] 1 9

How to modify a matrix

Combine indexing with assignment to modify elements.

> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[2,2] <- 10; x    # modify a single element
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2   10    8
[3,]    3    6    9
> x[x<5] <- 0; x    # modify elements less than 5
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9

Common matrix operations

Transpose

> t(x)    # transpose a matrix
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0   10    6
[3,]    7    8    9

Adding and removing rows and columns

> cbind(x, c(1, 2, 3))    # add column
     [,1] [,2] [,3] [,4]
[1,]    0    0    7    1
[2,]    0   10    8    2
[3,]    0    6    9    3
> rbind(x,c(1,2,3))    # add row
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9
[4,]    1    2    3
> x <- x[1:2,]; x    # remove last row
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8

Changing the dimensions

> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> dim(x) <- c(3,2); x    # change to 3X2 matrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> dim(x) <- c(1,6); x    # change to 1X6 matrix
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6

R list

A list is a data structure that can hold elements of mixed types.

A vector whose elements all share one type is called an atomic vector; a vector whose elements may differ in type is called a list.

Use typeof() to check whether an object is a list, and length() to get its number of components.

How to create a list

Use list() to create a list.

> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)

This creates a list with three components of different types.

Its structure can be inspected with str().

> str(x)
List of 3
$ a: num 2.5
$ b: logi TRUE
$ c: int [1:3] 1 2 3

In this example a, b, and c are called tags (names); they make it easier to refer to the components.

Tags are optional. Without them, the components are referred to by their position.

How to index list elements

A list can be indexed like a vector: integer, logical, and character vectors all work as indices.

> x
$name
[1] "John"
$age
[1] 19
$speaks
[1] "English" "French" 
> x[c(1:2)]    # index using integer vector
$name
[1] "John"
$age
[1] 19
> x[-2]        # using negative integer to exclude second component
$name
[1] "John"
$speaks
[1] "English" "French" 
> x[c(T,F,F)]  # index using logical vector
$name
[1] "John"
> x[c("age","speaks")]    # index using character vector
$age
[1] 19
$speaks
[1] "English" "French"

Indexing with [ as above returns a sub-list, not the content of the components. To retrieve the content, use [[, which extracts one component at a time.

> x["age"]
$age
[1] 19
> typeof(x["age"])    # single [ returns a list
[1] "list"
> x[["age"]]    # double [[ returns the content
[1] 19
> typeof(x[["age"]])
[1] "double"

An alternative to [[ that is used very often with lists is $.

The two are equivalent, except that $ allows partial matching (giving only the beginning of a tag).

> x$name    # same as x[["name"]]
[1] "John"
> x$a                  # partial matching, same as x$ag or x$age
[1] 19
> x[["a"]]             # cannot do partial match with [[
NULL
> # indexing can be done recursively
> x$speaks[1]
[1] "English"
> x[["speaks"]][2]
[1] "French"
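
Partial matching with $ only succeeds when the prefix identifies exactly one tag. The sketch below also shows the exact = FALSE argument of [[, which opts in to the same partial matching. The list y and its address component are made up for illustration.

```r
x <- list(name = "John", age = 19, speaks = c("English", "French"))

by_dollar  <- x$a                        # "a" matches only "age"
by_exact   <- x[["a"]]                   # exact matching by default
by_partial <- x[["a", exact = FALSE]]    # partial matching on request
print(by_dollar)    # 19
print(by_exact)     # NULL
print(by_partial)   # 19

# With two tags starting with "a", the prefix is ambiguous and $ returns NULL.
y <- list(age = 19, address = "unknown")
ambiguous <- y$a
print(ambiguous)    # NULL
```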

How to modify a list

Combine any of the indexing methods above with assignment to modify a component.

Assigning to an existing tag replaces that component in place; the order of the components does not change.

> x[["name"]] <- "Clair"; x
$name
[1] "Clair"
$age
[1] 19
$speaks
[1] "English" "French" 

How to add new components

Simply assign a value to a new tag.

> x[["married"]] <- FALSE
> x
$name
[1] "Clair"
$age
[1] 19
$speaks
[1] "English" "French" 
$married
[1] FALSE

How to delete list components

Assign NULL to the component.

> x[["age"]] <- NULL
> str(x)
List of 3
$ name   : chr "Clair"
$ speaks : chr [1:2] "English" "French"
$ married: logi FALSE
> x$married <- NULL
> str(x)
List of 2
$ name  : chr "Clair"
$ speaks: chr [1:2] "English" "French"
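
Because assigning NULL deletes a component, storing an actual NULL value needs a different form: single-bracket assignment with list(NULL). A minimal sketch with a made-up list:

```r
x <- list(name = "Clair", speaks = c("English", "French"))

# Single [ with list(NULL) keeps the slot and stores NULL in it.
x["age"] <- list(NULL)
n_with_null <- length(x)
has_age <- "age" %in% names(x)
print(n_with_null)   # 3
print(has_age)       # TRUE

# Double [[ with NULL removes the component again.
x[["age"]] <- NULL
n_after_delete <- length(x)
print(n_after_delete)   # 2
```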

R data frame

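A data frame is a two-dimensional data structure. It is a special case of a list in which every component has the same length.
Each component is a column, and together the columns make up the data frame.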

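Use class() to check whether an object is a data frame. Matrices and data frames are both identified through class() because they are special cases of vectors and lists respectively; typeof() reports only the underlying type.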

> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> typeof(x)    # data frame is a special case of  list
[1] "list"
> class(x)
[1] "data.frame"

In this example x can be viewed as a list of three components, each of which is a vector of two elements.
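
The list view can be checked directly: list functions such as sapply() and lengths() work column by column. This is a sketch that rebuilds the same small data frame; stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

print(is.list(x))             # TRUE

# One type per column, since each column is an atomic vector.
col_types <- sapply(x, typeof)
print(col_types)              # SN "integer", Age "double", Name "character"

# Every component has the same length, equal to the number of rows.
col_lengths <- lengths(x)
print(col_lengths)            # 2 2 2
```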

Commonly used data frame functions

> names(x)
[1] "SN"   "Age"  "Name"
> ncol(x)
[1] 3
> nrow(x)
[1] 2
> length(x)    # returns length of the list, same as ncol()
[1] 3

The data frame above can be created like this:

> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))
> str(x)    # structure of x
'data.frame':   2 obs. of  3 variables:
$ SN  : int  1 2
$ Age : num  21 15
$ Name: Factor w/ 2 levels "Dora","John": 2 1

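Note that Name above is a factor, not a character vector.

In R versions before 4.0.0, data.frame() converts character vectors to factors by default (factors are covered below). Since R 4.0.0 the default is stringsAsFactors = FALSE, so Name stays a character vector.

Pass stringsAsFactors=FALSE to turn the conversion off explicitly.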

> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)
> str(x)    # now the third column is a character vector
'data.frame':   2 obs. of  3 variables:
$ SN  : int  1 2
$ Age : num  21 15
$ Name: chr  "John" "Dora"

Many data import functions, such as read.table(), read.csv(), read.delim(), and read.fwf(), read data into a data frame.

How to index a data frame

The components of a data frame can be accessed like a list or like a matrix.

1. Accessing like a list

Use [, [[, or $ to access the columns of a data frame.

> x["Name"]
  Name
1 John
2 Dora
> x$Name
[1] "John" "Dora"
> x[["Name"]]
[1] "John" "Dora"
> x[[3]]
[1] "John" "Dora"

[[ and $ behave the same way and return the column as a vector, whereas [ returns a data frame.

2. Accessing like a matrix

A data frame can be indexed like a matrix by specifying rows and columns.

The built-in trees dataset is used to illustrate this. The datasets shipped with R can be listed with library(help = "datasets").

The trees dataset contains the Girth, Height, and Volume of black cherry trees.

A data frame can be inspected with str() and head().

> str(trees)
'data.frame':   31 obs. of 3 variables:
$ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
$ Height: num  70 65 63 72 81 83 66 75 80 75 ...
$ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
> head(trees,n=3)
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2

It has 31 rows and 3 columns.

Now index it like a matrix.

> trees[2:3,]    # select 2nd and 3rd row
  Girth Height Volume
2   8.6     65   10.3
3   8.8     63   10.2
> trees[trees$Height > 82,]    # selects rows with Height greater than 82
   Girth Height Volume
6   10.8     83   19.7
17  12.9     85   33.8
18  13.3     86   27.4
31  20.6     87   77.0
> trees[10:12,2]
[1] 75 79 76

The last example returns a vector because only one column was extracted. Pass drop=FALSE to keep the result a data frame.

> trees[10:12,2, drop = FALSE]
   Height
10     75
11     79
12     76

How to modify a data frame

As with a matrix, use indexing together with assignment.

> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x[1,"Age"] <- 20; x
  SN Age Name
1  1  20 John
2  2  15 Dora

Adding components

Use rbind() to add rows.

> rbind(x,list(1,16,"Paul"))
  SN Age Name
1  1  20 John
2  2  15 Dora
3  1  16 Paul

Use cbind() to add columns.

> cbind(x,State=c("NY","FL"))
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL

Because a data frame is implemented as a list, a column can also be added the same way a list component is.

> x
  SN Age Name
1  1  20 John
2  2  15 Dora
> x$State <- c("NY","FL"); x
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL

Deleting components

Assign NULL to delete a column.

> x$State <- NULL
> x
  SN Age Name
1  1  20 John
2  2  15 Dora

Rows are deleted by reassigning a subset of the rows.

> x <- x[-1,]
> x
  SN Age Name
2  2  15 Dora
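
As the output shows, the remaining row keeps its old row name 2. If consecutive numbering is wanted again, assigning NULL to rownames() resets it. A short sketch with a made-up three-row data frame:

```r
x <- data.frame(SN = 1:3, Age = c(20, 15, 16))
x <- x[-1, ]

names_before <- rownames(x)    # old row names survive the deletion
print(names_before)            # "2" "3"

rownames(x) <- NULL            # reset to 1, 2, ...
names_after <- rownames(x)
print(names_after)             # "1" "2"
```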

R factor

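A factor is used for data that can take only a limited set of values known in advance (categorical data); that set of values acts as a fixed vocabulary for the variable.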

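For example, a marital-status variable might take the values single, married, separated, divorced, or widowed. These predefined, distinct values are called levels, as in the following example.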

> x
[1] single  married married single
Levels: married single

Here x has four elements and two levels. Use class() to check whether a variable is a factor, and levels() to see its levels.

> class(x)
[1] "factor"
> levels(x)
[1] "married" "single"

How to create a factor

Use factor() to create a factor. If the levels are not given, they are inferred from the data.

> x <- factor(c("single", "married", "married", "single")); 
> x 
[1] single  married married single
Levels: married single
> x <- factor(c("single", "married", "married", "single"), levels = c("single", "married", "divorced"));
> x
[1] single  married married single
Levels: single married divorced

As the example shows, a level can be defined without being used.

A factor is closely related to a vector: it is stored as an integer vector, which its structure shows clearly.

> x <- factor(c("single","married","married","single"))
> str(x)
Factor w/ 2 levels "married","single": 2 1 1 2
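
The integer codes can be pulled out with as.integer(), and indexing the levels by those codes gives back the labels. The second half of this sketch shows a common pitfall with number-like labels; the factor f is made up for illustration.

```r
x <- factor(c("single", "married", "married", "single"))

codes <- as.integer(x)
print(codes)                   # 2 1 1 2

# The codes are positions in levels(x).
labels_back <- levels(x)[codes]
print(labels_back)             # "single" "married" "married" "single"

# Pitfall: as.numeric() on a factor returns the codes, not the labels.
f <- factor(c("10", "20", "10"))
f_codes <- as.numeric(f)
f_values <- as.numeric(as.character(f))
print(f_codes)                 # 1 2 1
print(f_values)                # 10 20 10
```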

The levels are stored as a character vector, and the factor itself stores integer indices into the levels.

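In R versions before 4.0.0, non-numeric columns read into a data frame are also converted to factors.

Likewise, data.frame() converted character vectors to factors by default before R 4.0.0; passing stringsAsFactors = FALSE prevents this.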

How to access factor elements

Access works much as it does for vectors.

> x
[1] single  married married single
Levels: married single
> x[3]           # access 3rd element
[1] married
Levels: married single
> x[c(2, 4)]     # access 2nd and 4th element
[1] married single
Levels: married single
> x[-1]          # access all but 1st element
[1] married married single
Levels: married single
> x[c(TRUE, FALSE, FALSE, TRUE)]  # using logical vector
[1] single single
Levels: married single

How to modify a factor

Use indexing together with assignment. Note that a value outside the levels cannot be assigned.

> x
[1] single  married married single
Levels: single married divorced
> x[2] <- "divorced"; x    # modify second element
[1] single   divorced married  single  
Levels: single married divorced
> x[3] <- "widowed"    # cannot assign values outside levels
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "widowed") :
invalid factor level, NA generated
> x
[1] single   divorced <NA>     single  
Levels: single married divorced

One workaround is to extend the levels first.

> levels(x) <- c(levels(x), "widowed")    # add new level
> x[3] <- "widowed"
> x
[1] single   divorced widowed  single  
Levels: single married divorced widowed
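
The reverse problem, levels that are defined but no longer used, can be handled with droplevels() from base R. A minimal sketch; table() is used only to make the unused level visible.

```r
x <- factor(c("single", "married", "married", "single"),
            levels = c("single", "married", "divorced"))

# table() counts every level, including unused ones.
counts <- table(x)
print(counts)                  # single 2, married 2, divorced 0

# droplevels() removes levels that do not occur in the data.
y <- droplevels(x)
print(levels(y))               # "single" "married"
```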