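R Language: Data Conversion

Common data structures: vectors, matrices, data frames, arrays, and lists

1. Using the is and as functions

  • is functions: test the type of an object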
> methods(is)
 [1] is.Alignment            is.array                is.atomic               is.Border               is.call                
 [6] is.CellBlock            is.CellProtection       is.CellStyle            is.character            is.complex             
[11] is.data.frame           is.DataFormat           is.double               is.element              is.empty.model         
[16] is.environment          is.expression           is.factor               is.Fill                 is.finite              
[21] is.Font                 is.function             is.infinite             is.integer              is.jnull               
[26] is.language             is.leaf                 is.list                 is.loaded               is.logical             
[31] is.matrix               is.mts                  is.na                   is.na.data.frame        is.na.numeric_version  
[36] is.na.POSIXlt           is.na<-                 is.na<-.default         is.na<-.factor          is.na<-.numeric_version
[41] is.name                 is.nan                  is.null                 is.numeric              is.numeric.Date        
[46] is.numeric.difftime     is.numeric.POSIXt       is.numeric_version      is.object               is.ordered             
[51] is.package_version      is.pairlist             is.primitive            is.qr                   is.R                   
[56] is.raster               is.raw                  is.recursive            is.relistable           is.single              
[61] is.stepfun              is.symbol               is.table                is.ts                   is.tskernel            
[66] is.unsorted             is.vector
  • as functions: coerce an object to another type
> methods(as)
  [1] as.array                      as.array.default              as.call                       as.character                 
  [5] as.character.condition        as.character.Date             as.character.default          as.character.error           
  [9] as.character.factor           as.character.hexmode          as.character.numeric_version  as.character.octmode         
 [13] as.character.POSIXt           as.character.srcref           as.complex                    as.data.frame                
 [17] as.data.frame.array           as.data.frame.AsIs            as.data.frame.character       as.data.frame.complex        
 [21] as.data.frame.data.frame      as.data.frame.Date            as.data.frame.default         as.data.frame.difftime       
 [25] as.data.frame.factor          as.data.frame.integer         as.data.frame.list            as.data.frame.logical        
 [29] as.data.frame.matrix          as.data.frame.model.matrix    as.data.frame.noquote         as.data.frame.numeric        
 [33] as.data.frame.numeric_version as.data.frame.ordered         as.data.frame.POSIXct         as.data.frame.POSIXlt        
 [37] as.data.frame.raw             as.data.frame.table           as.data.frame.ts              as.data.frame.vector         
 [41] as.Date                       as.Date.character             as.Date.default               as.Date.factor               
 [45] as.Date.numeric               as.Date.POSIXct               as.Date.POSIXlt               as.dendrogram                
 [49] as.difftime                   as.dist                       as.double                     as.double.difftime           
 [53] as.double.POSIXlt             as.environment                as.expression                 as.expression.default        
 [57] as.factor                     as.formula                    as.function                   as.function.default          
 [61] as.graphicsAnnot              as.hclust                     as.hexmode                    as.integer                   
 [65] as.list                       as.list.data.frame            as.list.Date                  as.list.default              
 [69] as.list.difftime              as.list.environment           as.list.factor                as.list.function             
 [73] as.list.numeric_version       as.list.POSIXct               as.list.POSIXlt               as.logical                   
 [77] as.logical.factor             as.matrix                     as.matrix.data.frame          as.matrix.default            
 [81] as.matrix.noquote             as.matrix.POSIXlt             as.name                       as.null                      
 [85] as.null.default               as.numeric                    as.numeric_version            as.octmode                   
 [89] as.ordered                    as.package_version            as.pairlist                   as.person                    
 [93] as.personList                 as.POSIXct                    as.POSIXct.Date               as.POSIXct.default           
 [97] as.POSIXct.numeric            as.POSIXct.POSIXlt            as.POSIXlt                    as.POSIXlt.character         
[101] as.POSIXlt.Date               as.POSIXlt.default            as.POSIXlt.factor             as.POSIXlt.numeric           
[105] as.POSIXlt.POSIXct            as.qr                         as.raster                     as.raw                       
[109] as.relistable                 as.roman                      as.single                     as.single.default            
[113] as.stepfun                    as.symbol                     as.table                      as.table.default             
[117] as.ts                         as.vector                     as.vector.factor  
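
As a small illustration of how the two families work together, the sketch below tests a value with is functions and then coerces it with as functions. The variable names (s, n, i, bad) are illustrative only.

```r
# A character string that happens to contain a number
s <- "3.14"
is.character(s)   # TRUE
is.numeric(s)     # FALSE

# Coerce it to numeric, then to integer (the fractional part is dropped)
n <- as.numeric(s)
is.numeric(n)     # TRUE
i <- as.integer(n)

# A coercion that cannot succeed yields NA (with a warning, silenced here)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)        # TRUE
```

Checking the result with the matching is function after each as call is a cheap way to catch silent NA values.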
  • Converting a matrix to a data frame
> x <- state.x77
> is.data.frame(x)          ## x is not a data frame
[1] FALSE

> x <- as.data.frame(x)     ## coerce x to a data frame

> is.data.frame(x)          ## x is now a data frame
[1] TRUE
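  • Converting a data frame to a matrix

All elements of a matrix must share one data type, such as character or numeric.
A data frame can hold columns of different data types.
When a data frame with mixed column types is converted to a matrix, every value is coerced to one common type (character strings when any column holds text or factors).


> x <- as.matrix(x)         ## coerce the data frame x from above back to a matrix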

> is.data.frame(x)
[1] FALSE

> is.matrix(x)
[1] TRUE
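
The state.x77 example contains only numbers, so it does not show the coercion to character described above. The following minimal sketch uses a small made-up data frame (df, with hypothetical columns id and name) to make the effect visible.

```r
# Hypothetical data frame with one numeric and one character column
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

m <- as.matrix(df)
typeof(m)   # "character": the id values are now strings
dim(m)      # 3 rows, 2 columns

# A data frame whose columns are all numeric stays numeric
num <- as.matrix(data.frame(a = 1:2, b = c(1.5, 2.5)))
is.numeric(num)   # TRUE
```

If numeric columns must stay numeric, it is usually safer to select only those columns before calling as.matrix.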
  • as cannot convert a data frame directly to a vector or a factor
> x <- as.data.frame(state.x77)
> x <- as.vector(x)
> is.vector(x)
[1] FALSE
> 
> x <- as.factor(x)
Warning message:
In xtfrm.data.frame(x) : cannot xtfrm data frames
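
A possible workaround, sketched under the assumption that the goal is a plain vector or a factor of the values: extract a single column, or flatten all columns with unlist(). The Region column added below is an illustrative addition built from the built-in state.region dataset; it is not part of state.x77.

```r
df <- as.data.frame(state.x77)

# A single column is already a plain vector
pop <- df$Population
is.vector(pop)                  # TRUE

# unlist() flattens all columns into one vector (50 states x 8 columns)
all_values <- unlist(df, use.names = FALSE)
length(all_values)              # 400

# A factor is built from one column, not from the whole data frame
df$Region <- as.character(state.region)   # illustrative extra column
region <- as.factor(df$Region)
nlevels(region)                 # 4
```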
  • Converting a vector to a matrix
> x <- state.abb
> class(x)
[1] "character"
> is.vector(x)              ## confirm that x is a vector
[1] TRUE

> dim(x) <- c(5,10)         ## reshape into 5 rows and 10 columns, filled column by column
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "AL" "CO" "HI" "KS" "MA" "MT" "NM" "OK" "SD" "VA" 
[2,] "AK" "CT" "ID" "KY" "MI" "NE" "NY" "OR" "TN" "WA" 
[3,] "AZ" "DE" "IL" "LA" "MN" "NV" "NC" "PA" "TX" "WV" 
[4,] "AR" "FL" "IN" "ME" "MS" "NH" "ND" "RI" "UT" "WI" 
[5,] "CA" "GA" "IA" "MD" "MO" "NJ" "OH" "SC" "VT" "WY" 
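
The same reshaping can be written with matrix(), which also makes the fill order explicit. This is a minimal sketch; m and m_row are illustrative names.

```r
x <- state.abb

# Same layout as dim(x) <- c(5, 10): values fill column by column
m <- matrix(x, nrow = 5, ncol = 10)
m[1, 2]       # "CO", the 6th abbreviation

# byrow = TRUE fills row by row instead
m_row <- matrix(x, nrow = 5, ncol = 10, byrow = TRUE)
m_row[1, 2]   # "AK", the 2nd abbreviation
```

Unlike assigning to dim(), matrix() returns a new object and leaves x unchanged.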
  • Converting a vector to a factor
> x <- state.abb
> as.factor(x)
 [1] AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT
[46] VA WA WV WI WY
50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC ... WY
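
Note in the output above that the levels are sorted alphabetically rather than kept in the original order. A related point worth a short sketch: converting a factor back to numbers goes through its internal level codes unless it is converted to character first. The vector below is made up for illustration.

```r
f <- as.factor(c("10", "20", "10", "30"))

levels(f)                     # "10" "20" "30"
as.integer(f)                 # 1 2 1 3: the internal level codes
as.numeric(as.character(f))   # 10 20 10 30: the original values
```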
  • Converting a vector to a list
> x <- state.abb
> as.list(x)
[[1]]
[1] "AL"

[[2]]
[1] "AK"

[[3]]
[1] "AZ"

[[4]]
[1] "AR"

[[5]]
[1] "CA"

...

[[46]]
[1] "VA"

[[47]]
[1] "WA"

[[48]]
[1] "WV"

[[49]]
[1] "WI"

[[50]]
[1] "WY"
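
As a brief sketch of the reverse direction, unlist() turns such a list back into a vector. The names lst and back are illustrative.

```r
x <- state.abb
lst <- as.list(x)

length(lst)          # 50: one list element per state
lst[[1]]             # "AL"

# unlist() reverses the conversion
back <- unlist(lst)
identical(back, x)   # TRUE
```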

2. Subsetting data

> who <- read.csv("E:/R-workplace/1/RData/WHO.csv", header = TRUE)    ## 202 rows, 358 columns
> View(who)
## viewer: Showing 1 to 18 of 202 entries, 358 total columns
  • Subsetting by index
##### Extract contiguous rows and columns
> who1 <- who[1:50, 1:10]                  ## the first 50 rows and first 10 columns of who
> View(who1)
## viewer: Showing 1 to 18 of 50 entries, 10 total columns

#### Extract non-contiguous rows and columns
> who2 <- who[c(1,4,6,3,10), c(38,234,214,78)]    ## only the listed rows and columns of who
> View(who2)
## viewer: Showing 1 to 5 of 5 entries, 4 total columns

## Extract the rows of who where Continent equals 7
> who3 <- who[which(who$Continent == 7), ]
> View(who3)
## viewer: Showing 1 to 9 of 9 entries, 358 total columns

## Extract the countries whose CountryID is between 1 and 10

> who4 <- who[which(who$CountryID >= 1 & who$CountryID <= 10), ]
> View(who4)
## viewer: Showing 1 to 10 of 10 entries, 358 total columns
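
Since WHO.csv is a local file, here is a minimal sketch of the same three indexing patterns on mtcars, a dataset that ships with R (32 rows, 11 columns). Using mtcars is an assumption made only so that the example can be run anywhere.

```r
block  <- mtcars[1:5, 1:3]                     # contiguous rows and columns
picked <- mtcars[c(1, 4, 6), c("mpg", "hp")]   # chosen rows, columns by name
six    <- mtcars[which(mtcars$cyl == 6), ]     # rows meeting a condition

dim(block)    # 5 3
dim(picked)   # 3 2
nrow(six)     # 7
```

Wrapping the condition in which() also drops rows where the condition evaluates to NA, which a bare logical index would return as rows of NA.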
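  • The subset function

subset(x, subset, select, drop)
x: the object to be subsetted
subset: a logical expression indicating the elements or rows to keep
select: an expression indicating the columns to select from a data frame
drop: passed on to the [ indexing operator


> who5 <- subset(who, CountryID >= 1 & CountryID <= 10)    ## column names can be used directly inside subset
> View(who5)
## viewer: Showing 1 to 10 of 10 entries, 358 total columns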
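
The select argument is described above but not used in the example. A minimal sketch on the built-in mtcars dataset (an assumption, chosen so the code runs without WHO.csv) shows rows and columns being filtered in one call.

```r
# Keep 6-cylinder cars with mpg above 20, and only three columns
fast_six <- subset(mtcars, cyl == 6 & mpg > 20, select = c(mpg, cyl, hp))
dim(fast_six)     # 3 3

# A negative select drops columns instead
no_carb <- subset(mtcars, select = -carb)
ncol(no_carb)     # 10
```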
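  • The sample function: random sampling

sample(x, size, replace = FALSE, prob = NULL)
x: a vector of one or more elements to sample from, or a positive integer
size: the number of items to draw
replace: whether to sample with replacement; the default is sampling without replacement
prob: an optional vector of probability weights for the elements of x

###### 1. Sampling from a vector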
> x <- 1:100
> sample(x,20)
 [1] 14 55 65 40 83 71 96 45 21 77 97 28 98 18 93 31 85 88 74 20
 ## Sort the sampled values with sort
> sort(sample(x,20))
 [1]   3  12  18  25  33  34  36  38  39  41  45  47  58  59  60  66  87  97  98 100
 
> sort(sample(x, 20, TRUE))    ## replace = TRUE: a value can be drawn more than once (90 appears twice below)
 [1]  2  9 22 24 29 36 42 43 47 53 68 73 76 79 82 83 84 90 90 91

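####### 2. Sampling rows of a data frame
> sample(who$CountryID,10)
 [1] 189 172 177 176  37 100  44 180 153 142
> who6 <- who[sample(nrow(who), 10), ]     ## sample row positions, so the result does not rely on CountryID matching row numbers
> View(who6)
## viewer: Showing 1 to 10 of 10 entries, 358 total columns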
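
Random draws differ on every run, which is why the outputs above cannot be reproduced exactly. A minimal sketch of reproducible row sampling, assuming the built-in mtcars dataset and an arbitrary seed value of 42:

```r
set.seed(42)                               # arbitrary seed, for reproducibility
rows   <- sample(nrow(mtcars), 10)         # 10 distinct row positions
picked <- mtcars[rows, ]

set.seed(42)                               # same seed, same draw
same_rows <- sample(nrow(mtcars), 10)
identical(rows, same_rows)                 # TRUE

# With replacement, the same row may be drawn several times
boot <- mtcars[sample(nrow(mtcars), nrow(mtcars), replace = TRUE), ]
nrow(boot)                                 # 32
```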

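3. Combining data frames

  • Combining by column
    The data.frame() function
> USArrests        # arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973, plus the percentage of the population living in urban areas
> state.division   # a factor giving the census division of each of the 50 US states
> ### combine the two datasets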
> data.frame(USArrests,state.division)
               Murder Assault UrbanPop Rape     state.division
Alabama          13.2     236       58 21.2 East South Central
Alaska           10.0     263       48 44.5            Pacific
Arizona           8.1     294       80 31.0           Mountain
Arkansas          8.8     190       50 19.5 West South Central
...
Washington        4.0     145       73 26.2            Pacific
West Virginia     5.7      81       39  9.3     South Atlantic
Wisconsin         2.6      53       66 10.8 East North Central
Wyoming           6.8     161       60 15.6           Mountain

    The cbind() function

> cbind(USArrests,state.division)
               Murder Assault UrbanPop Rape     state.division
Alabama          13.2     236       58 21.2 East South Central
Alaska           10.0     263       48 44.5            Pacific
Arizona           8.1     294       80 31.0           Mountain
Arkansas          8.8     190       50 19.5 West South Central
...
Washington        4.0     145       73 26.2            Pacific
West Virginia     5.7      81       39  9.3     South Atlantic
Wisconsin         2.6      53       66 10.8 East North Central
Wyoming           6.8     161       60 15.6           Mountain
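
Both calls pair the rows by position, not by state name; the result is correct here because both datasets list the states in the same alphabetical order. As a small sketch, the new column can also be given a shorter name while combining (the name division is an illustrative choice):

```r
combined <- cbind(USArrests, division = state.division)

names(combined)              # "Murder" "Assault" "UrbanPop" "Rape" "division"
nrow(combined)               # 50
is.factor(combined$division) # TRUE: the factor is kept as a factor
```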
  • Combining by row: the two data frames must have the same column names

> dat
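
A minimal sketch of row-wise combination with rbind(), using two slices of USArrests as stand-in data frames. The names first, last, both, reordered, and both2 are illustrative and are not taken from the original example.

```r
first <- USArrests[1:3, ]      # Alabama, Alaska, Arizona
last  <- USArrests[48:50, ]    # West Virginia, Wisconsin, Wyoming

both <- rbind(first, last)
dim(both)                      # 6 4

# For data frames, rbind() matches columns by name, so column order may differ
reordered <- last[, c("Rape", "UrbanPop", "Assault", "Murder")]
both2 <- rbind(first, reordered)
names(both2)                   # "Murder" "Assault" "UrbanPop" "Rape"
```

If the column names differ, rbind() stops with an error rather than guessing how to align them.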