Common data structures: vectors, matrices, data frames, arrays, and lists
1. Using the is and as functions
- is functions: test the type of an object
> methods(is)
[1] is.Alignment is.array is.atomic is.Border is.call
[6] is.CellBlock is.CellProtection is.CellStyle is.character is.complex
[11] is.data.frame is.DataFormat is.double is.element is.empty.model
[16] is.environment is.expression is.factor is.Fill is.finite
[21] is.Font is.function is.infinite is.integer is.jnull
[26] is.language is.leaf is.list is.loaded is.logical
[31] is.matrix is.mts is.na is.na.data.frame is.na.numeric_version
[36] is.na.POSIXlt is.na<- is.na<-.default is.na<-.factor is.na<-.numeric_version
[41] is.name is.nan is.null is.numeric is.numeric.Date
[46] is.numeric.difftime is.numeric.POSIXt is.numeric_version is.object is.ordered
[51] is.package_version is.pairlist is.primitive is.qr is.R
[56] is.raster is.raw is.recursive is.relistable is.single
[61] is.stepfun is.symbol is.table is.ts is.tskernel
[66] is.unsorted is.vector
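The exact list printed by methods(is) depends on which packages are attached in the session. As a minimal sketch of how a few of these predicates behave, the following uses only base R and the built-in state.x77 data set; the variable names v, m, and l are illustrative.

```r
# A few objects of different kinds (names are illustrative)
v <- c(1.5, 2, 3)            # numeric vector
m <- state.x77               # numeric matrix shipped with R
l <- list(a = 1, b = "x")    # list

is.vector(v)      # TRUE
is.numeric(v)     # TRUE
is.matrix(m)      # TRUE
is.data.frame(m)  # FALSE: a matrix is not a data frame
is.list(l)        # TRUE
is.vector(l)      # TRUE: a list counts as a vector of mode "list"
```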
- as functions: coerce an object to another type
> methods(as)
[1] as.array as.array.default as.call as.character
[5] as.character.condition as.character.Date as.character.default as.character.error
[9] as.character.factor as.character.hexmode as.character.numeric_version as.character.octmode
[13] as.character.POSIXt as.character.srcref as.complex as.data.frame
[17] as.data.frame.array as.data.frame.AsIs as.data.frame.character as.data.frame.complex
[21] as.data.frame.data.frame as.data.frame.Date as.data.frame.default as.data.frame.difftime
[25] as.data.frame.factor as.data.frame.integer as.data.frame.list as.data.frame.logical
[29] as.data.frame.matrix as.data.frame.model.matrix as.data.frame.noquote as.data.frame.numeric
[33] as.data.frame.numeric_version as.data.frame.ordered as.data.frame.POSIXct as.data.frame.POSIXlt
[37] as.data.frame.raw as.data.frame.table as.data.frame.ts as.data.frame.vector
[41] as.Date as.Date.character as.Date.default as.Date.factor
[45] as.Date.numeric as.Date.POSIXct as.Date.POSIXlt as.dendrogram
[49] as.difftime as.dist as.double as.double.difftime
[53] as.double.POSIXlt as.environment as.expression as.expression.default
[57] as.factor as.formula as.function as.function.default
[61] as.graphicsAnnot as.hclust as.hexmode as.integer
[65] as.list as.list.data.frame as.list.Date as.list.default
[69] as.list.difftime as.list.environment as.list.factor as.list.function
[73] as.list.numeric_version as.list.POSIXct as.list.POSIXlt as.logical
[77] as.logical.factor as.matrix as.matrix.data.frame as.matrix.default
[81] as.matrix.noquote as.matrix.POSIXlt as.name as.null
[85] as.null.default as.numeric as.numeric_version as.octmode
[89] as.ordered as.package_version as.pairlist as.person
[93] as.personList as.POSIXct as.POSIXct.Date as.POSIXct.default
[97] as.POSIXct.numeric as.POSIXct.POSIXlt as.POSIXlt as.POSIXlt.character
[101] as.POSIXlt.Date as.POSIXlt.default as.POSIXlt.factor as.POSIXlt.numeric
[105] as.POSIXlt.POSIXct as.qr as.raster as.raw
[109] as.relistable as.roman as.single as.single.default
[113] as.stepfun as.symbol as.table as.table.default
[117] as.ts as.vector as.vector.factor
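A short sketch of the most common as coercions on atomic values, using base R only. The variable names are illustrative, and suppressWarnings() is used here only to keep the output quiet when a string cannot be parsed as a number.

```r
num <- as.numeric("3.14")    # character -> numeric
int <- as.integer(3.9)       # numeric -> integer, truncates toward zero
chr <- as.character(1:3)     # integer -> character

# A string that is not a number becomes NA (normally with a warning)
bad <- suppressWarnings(as.numeric("abc"))

num   # 3.14
int   # 3
chr   # "1" "2" "3"
bad   # NA
```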
- Converting a matrix to a data frame
> x <- state.x77
> is.data.frame(x) ##x is not a data frame
[1] FALSE
> x <- as.data.frame(x) ##coerce x to a data frame
> is.data.frame(x)
[1] TRUE ##x is now a data frame
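- Converting a data frame to a matrix
All values in a matrix must share one data type, such as character or numeric.
A data frame can hold columns of several different data types.
When a data frame with mixed data types is converted to a matrix, every value is coerced to a character string.
> x <- as.matrix(x) ##coerce x, converted to a data frame above, back to a matrix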
> is.data.frame(x)
[1] FALSE
> is.matrix(x)
[1] TRUE
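To illustrate the coercion rule for mixed data types, the sketch below compares an all-numeric data frame with the built-in iris data set, which has four numeric columns and one factor column. The use of iris and data.matrix() is an addition for illustration and does not appear in the notes above.

```r
num_df <- as.data.frame(state.x77)   # every column is numeric
mix_df <- iris                       # four numeric columns plus a factor

num_mat <- as.matrix(num_df)
mix_mat <- as.matrix(mix_df)

is.numeric(num_mat)     # TRUE: values stay numeric
is.character(mix_mat)   # TRUE: the factor column forces everything to character

# data.matrix() is an alternative that keeps the result numeric
# by replacing each factor with its integer codes
is.numeric(data.matrix(mix_df))   # TRUE
```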
- A data frame cannot be converted to a vector or a factor with the as functions
> x <- as.data.frame(state.x77)
> x <- as.vector(x)
> is.vector(x)
[1] FALSE
> x <- as.factor(x)
Warning message:
In xtfrm.data.frame(x) : cannot xtfrm data frames
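When a plain vector or a factor is actually needed, one possible workaround is to flatten the data frame or to convert one column at a time. This is a sketch using base R functions; the variable names are illustrative, and the choice of the Population column is arbitrary.

```r
x <- as.data.frame(state.x77)

# Flatten all columns into one named numeric vector
v1 <- unlist(x)

# Or go through a matrix first, which gives an unnamed vector
v2 <- as.vector(as.matrix(x))

# A single column is already a vector and can be turned into a factor
pop <- x$Population
f <- as.factor(pop)

is.vector(v1)   # TRUE
is.vector(v2)   # TRUE
length(v1)      # 400, i.e. 50 rows times 8 columns
is.factor(f)    # TRUE
```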
- Converting a vector to a matrix
> x <- state.abb
> class(x)
[1] "character"
> is.vector(x)
[1] TRUE ##confirms that x is a vector
> dim(x) <- c(5,10)
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "AL" "CO" "HI" "KS" "MA" "MT" "NM" "OK" "SD" "VA"
[2,] "AK" "CT" "ID" "KY" "MI" "NE" "NY" "OR" "TN" "WA"
[3,] "AZ" "DE" "IL" "LA" "MN" "NV" "NC" "PA" "TX" "WV"
[4,] "AR" "FL" "IN" "ME" "MS" "NH" "ND" "RI" "UT" "WI"
[5,] "CA" "GA" "IA" "MD" "MO" "NJ" "OH" "SC" "VT" "WY"
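Setting dim fills the matrix column by column, which is why "AL" and "AK" end up in the first column. The sketch below shows the same result with matrix(), plus the byrow = TRUE variant for filling row by row. The names m1, m2, and m3 are illustrative.

```r
x <- state.abb

# Setting dim on a copy fills column by column
m1 <- x
dim(m1) <- c(5, 10)

# matrix() uses the same column-major order by default
m2 <- matrix(x, nrow = 5, ncol = 10)
all(m1 == m2)   # TRUE

# byrow = TRUE fills row by row instead
m3 <- matrix(x, nrow = 5, ncol = 10, byrow = TRUE)
m1[2, 1]   # "AK": second element goes down the first column
m3[1, 2]   # "AK": second element goes across the first row
```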
- Converting a vector to a factor
> x <- state.abb
> as.factor(x)
[1] AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT
[46] VA WA WV WI WY
50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC ... WY
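As the Levels line shows, as.factor() sorts the levels rather than keeping the order of the input. A minimal sketch of how to inspect the levels, and how to keep the original order with factor(), follows; f and f2 are illustrative names.

```r
x <- state.abb
f <- as.factor(x)

nlevels(f)        # 50
levels(f)[1:4]    # sorted levels, "AK" "AL" "AR" "AZ" in the output above

# factor() with an explicit levels argument keeps the input order
f2 <- factor(x, levels = x)
levels(f2)[1:3]   # "AL" "AK" "AZ"

# Converting back to character recovers the original vector
identical(as.character(f), x)   # TRUE
```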
- Converting a vector to a list
> x <- state.abb
> as.list(x)
[[1]]
[1] "AL"
[[2]]
[1] "AK"
[[3]]
[1] "AZ"
[[4]]
[1] "AR"
[[5]]
[1] "CA"
...
[[46]]
[1] "VA"
[[47]]
[1] "WA"
[[48]]
[1] "WV"
[[49]]
[1] "WI"
[[50]]
[1] "WY"
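Each element of the vector becomes its own list element, as the output shows. A brief sketch of accessing one element and converting back with unlist(), using an illustrative name lst:

```r
x <- state.abb
lst <- as.list(x)

length(lst)   # 50: one list element per state
lst[[2]]      # "AK": double brackets return the element itself

# unlist() flattens the list back into a character vector
identical(unlist(lst), x)   # TRUE
```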
2. Subsetting data
> who <- read.csv("E:/R-workplace/1/RData/WHO.csv",header = TRUE) ##data with 202 rows and 358 columns
> View(who)
Showing 1 to 18 of 202 entries, 358 total columns
- Subsetting by index
#####Extracting contiguous rows and columns
> who1 <- who[c(1:50),c(1:10)]
> View(who1)
Showing 1 to 18 of 50 entries, 10 total columns ##50 rows and 10 columns extracted from who
####Extracting non-contiguous rows and columns
> who2 <- who[c(1,4,6,3,10),c(38,234,214,78)]
> View(who2)
Showing 1 to 5 of 5 entries, 4 total columns ##the specified rows and columns extracted from who
##Extract the rows of who where Continent is 7
> who3 <- who[which(who$Continent==7),]
> View(who3)
Showing 1 to 9 of 9 entries, 358 total columns
##Extract the countries in who whose CountryID is between 1 and 10
> who4 <- who[which(who$CountryID>=1&who$CountryID<=10),]
> View(who4)
Showing 1 to 10 of 10 entries, 358 total columns
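The which() call matters when the filtered column contains missing values. The sketch below uses a small hypothetical data frame df in place of the WHO file, whose contents are not reproduced here; the column names mirror the ones used above and the values are made up.

```r
# Hypothetical stand-in for the WHO data, with one missing Continent
df <- data.frame(CountryID = 1:6,
                 Continent = c(1, 7, 7, NA, 2, 7))

# which() returns only the positions where the condition is TRUE
with_which <- df[which(df$Continent == 7), ]

# A bare logical index keeps the NA comparison as an all-NA row
plain <- df[df$Continent == 7, ]

nrow(with_which)   # 3
nrow(plain)        # 4: three matches plus one NA row
```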
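- The subset function
subset(x,subset,select,drop)
x: the object to be subsetted
subset: a logical expression indicating the elements or rows to keep
select: an expression indicating the columns to select from a data frame
> who5 <- subset(who,who$CountryID>=1 & who$CountryID<=10)
> View(who5)
Showing 1 to 10 of 10 entries, 358 total columns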
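Inside subset(), column names can be written without the data frame prefix, and the select argument picks columns in the same call. This is a sketch on a hypothetical data frame df with made-up values, not on the WHO file.

```r
# Hypothetical stand-in for the WHO data
df <- data.frame(CountryID = 1:6,
                 Country   = c("A", "B", "C", "D", "E", "F"),
                 Continent = c(1, 7, 7, NA, 2, 7))

# Columns are looked up inside df, so df$ is not needed
s1 <- subset(df, CountryID >= 2 & CountryID <= 4)

# select keeps only the named columns; rows where the condition is NA are dropped
s2 <- subset(df, Continent == 7, select = c(CountryID, Country))

nrow(s1)    # 3
dim(s2)     # 3 rows, 2 columns
```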
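- The sample function: random sampling
sample(x, size, replace = FALSE, prob = NULL)
x: a vector of one or more elements to choose from, or a positive integer
size: the number of items to draw
replace: whether to sample with replacement; the default is sampling without replacement
######1. Sampling from a vector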
> x <- 1:100
> sample(x,20)
[1] 14 55 65 40 83 71 96 45 21 77 97 28 98 18 93 31 85 88 74 20
##Sort the sampled values with the sort function
> sort(sample(x,20))
[1] 3 12 18 25 33 34 36 38 39 41 45 47 58 59 60 66 87 97 98 100
> sort(sample(x,20,T))
[1] 2 9 22 24 29 36 42 43 47 53 68 73 76 79 82 83 84 90 90 91
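#######2. Sampling from a data frame
> sample(who$CountryID,10)
[1] 189 172 177 176 37 100 44 180 153 142
> who6 <- who[sample(nrow(who),10),] ##sample row positions; sampled CountryID values are valid row indices only if the IDs equal the row numbers
> View(who6)
Showing 1 to 10 of 10 entries, 358 total columns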
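A reproducible way to draw random rows is to fix the seed and sample row positions with nrow(). The sketch below uses a hypothetical data frame df whose CountryID values deliberately differ from the row numbers; the seed value 42 is arbitrary.

```r
set.seed(42)   # arbitrary seed so the draw can be repeated

# Hypothetical data: IDs 101..120 do not match row numbers 1..20
df <- data.frame(CountryID = 101:120, Value = rnorm(20))

# Sample row positions, then index the rows
idx <- sample(nrow(df), 5)
picked <- df[idx, ]

# Using the ID values as row indices points past the last row here,
# so every selected row comes back as NA
wrong <- df[sample(df$CountryID, 5), ]

nrow(picked)             # 5
all(is.na(wrong$Value))  # TRUE
```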
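3. Combining data frames
- Combining by column
The data.frame() function
> USArrests #arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973, plus the percentage of the population living in urban areas
> state.division #a factor giving the census division of each of the 50 US states
> ###Combine the two data sets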
> data.frame(USArrests,state.division)
Murder Assault UrbanPop Rape state.division
Alabama 13.2 236 58 21.2 East South Central
Alaska 10.0 263 48 44.5 Pacific
Arizona 8.1 294 80 31.0 Mountain
Arkansas 8.8 190 50 19.5 West South Central
...
Washington 4.0 145 73 26.2 Pacific
West Virginia 5.7 81 39 9.3 South Atlantic
Wisconsin 2.6 53 66 10.8 East North Central
Wyoming 6.8 161 60 15.6 Mountain
The cbind() function
> cbind(USArrests,state.division)
Murder Assault UrbanPop Rape state.division
Alabama 13.2 236 58 21.2 East South Central
Alaska 10.0 263 48 44.5 Pacific
Arizona 8.1 294 80 31.0 Mountain
Arkansas 8.8 190 50 19.5 West South Central
...
Washington 4.0 145 73 26.2 Pacific
West Virginia 5.7 81 39 9.3 South Atlantic
Wisconsin 2.6 53 66 10.8 East North Central
Wyoming 6.8 161 60 15.6 Mountain
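Both calls pair rows by position, not by state name, so the two objects must already be in the same order. A small sketch that checks the shape of both results and gives the new column an explicit name follows; the column name Division is chosen for illustration.

```r
a <- data.frame(USArrests, state.division)
b <- cbind(USArrests, state.division)

dim(a)   # 50 rows, 5 columns
dim(b)   # 50 rows, 5 columns

# Naming the argument sets the name of the new column
named <- data.frame(USArrests, Division = state.division)
names(named)[5]             # "Division"
is.factor(named$Division)   # TRUE: state.division is a factor
```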
- Combining by row: the two data frames must have the same column names
> dat