R Language: Data Transformation


weixin_51077152


Article link: https://blog.csdn.net/weixin_51077152/article/details/120056997


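This article introduces methods for transforming data in R: type conversion with the is and as functions, extracting subsets, combining data frames, removing duplicates, transposing, sorting, the apply family of functions, and the data-processing tools in the reshape2, tidyr, and dplyr packages, such as reshaping, deduplication, sorting, grouping, and merging.

(Summary generated by CSDN's AI.)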

Common data structures: vectors, matrices, data frames, arrays, and lists.
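A minimal sketch that builds one small object of each structure listed above and prints its class. The object names and values are made up for illustration.

```r
v   <- c(1.5, 2, 3)                              # numeric vector
m   <- matrix(1:6, nrow = 2)                     # 2 x 3 matrix
df  <- data.frame(id = 1:2, name = c("a", "b"))  # data frame with mixed column types
arr <- array(1:24, dim = c(2, 3, 4))             # 3-dimensional array
lst <- list(v, m, df)                            # list holding objects of any type

# class() can return more than one class (a matrix is also an "array" in R >= 4.0),
# so keep only the first entry for a compact overview
classes <- sapply(list(v, m, df, arr, lst), function(obj) class(obj)[1])
print(classes)
```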

1. Using the is and as functions

is functions: test whether an object is of a given type (each returns TRUE or FALSE)

> methods(is)
 [1] is.Alignment            is.array                is.atomic               is.Border               is.call                
 [6] is.CellBlock            is.CellProtection       is.CellStyle            is.character            is.complex             
[11] is.data.frame           is.DataFormat           is.double               is.element              is.empty.model         
[16] is.environment          is.expression           is.factor               is.Fill                 is.finite              
[21] is.Font                 is.function             is.infinite             is.integer              is.jnull               
[26] is.language             is.leaf                 is.list                 is.loaded               is.logical             
[31] is.matrix               is.mts                  is.na                   is.na.data.frame        is.na.numeric_version  
[36] is.na.POSIXlt           is.na<-                 is.na<-.default         is.na<-.factor          is.na<-.numeric_version
[41] is.name                 is.nan                  is.null                 is.numeric              is.numeric.Date        
[46] is.numeric.difftime     is.numeric.POSIXt       is.numeric_version      is.object               is.ordered             
[51] is.package_version      is.pairlist             is.primitive            is.qr                   is.R                   
[56] is.raster               is.raw                  is.recursive            is.relistable           is.single              
[61] is.stepfun              is.symbol               is.table                is.ts                   is.tskernel            
[66] is.unsorted             is.vector
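As a sketch, a few of these checks applied to the built-in datasets used later in this article (any object would do):

```r
checks <- c(
  matrix     = is.matrix(state.x77),      # state.x77 is a numeric matrix
  numeric    = is.numeric(state.x77),
  data_frame = is.data.frame(state.x77),  # not a data frame until converted
  character  = is.character(state.abb)    # state.abb is a character vector
)
print(checks)
```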

as functions: coerce (convert) an object to another type

> methods(as)
  [1] as.array                      as.array.default              as.call                       as.character                 
  [5] as.character.condition        as.character.Date             as.character.default          as.character.error           
  [9] as.character.factor           as.character.hexmode          as.character.numeric_version  as.character.octmode         
 [13] as.character.POSIXt           as.character.srcref           as.complex                    as.data.frame                
 [17] as.data.frame.array           as.data.frame.AsIs            as.data.frame.character       as.data.frame.complex        
 [21] as.data.frame.data.frame      as.data.frame.Date            as.data.frame.default         as.data.frame.difftime       
 [25] as.data.frame.factor          as.data.frame.integer         as.data.frame.list            as.data.frame.logical        
 [29] as.data.frame.matrix          as.data.frame.model.matrix    as.data.frame.noquote         as.data.frame.numeric        
 [33] as.data.frame.numeric_version as.data.frame.ordered         as.data.frame.POSIXct         as.data.frame.POSIXlt        
 [37] as.data.frame.raw             as.data.frame.table           as.data.frame.ts              as.data.frame.vector         
 [41] as.Date                       as.Date.character             as.Date.default               as.Date.factor               
 [45] as.Date.numeric               as.Date.POSIXct               as.Date.POSIXlt               as.dendrogram                
 [49] as.difftime                   as.dist                       as.double                     as.double.difftime           
 [53] as.double.POSIXlt             as.environment                as.expression                 as.expression.default        
 [57] as.factor                     as.formula                    as.function                   as.function.default          
 [61] as.graphicsAnnot              as.hclust                     as.hexmode                    as.integer                   
 [65] as.list                       as.list.data.frame            as.list.Date                  as.list.default              
 [69] as.list.difftime              as.list.environment           as.list.factor                as.list.function             
 [73] as.list.numeric_version       as.list.POSIXct               as.list.POSIXlt               as.logical                   
 [77] as.logical.factor             as.matrix                     as.matrix.data.frame          as.matrix.default            
 [81] as.matrix.noquote             as.matrix.POSIXlt             as.name                       as.null                      
 [85] as.null.default               as.numeric                    as.numeric_version            as.octmode                   
 [89] as.ordered                    as.package_version            as.pairlist                   as.person                    
 [93] as.personList                 as.POSIXct                    as.POSIXct.Date               as.POSIXct.default           
 [97] as.POSIXct.numeric            as.POSIXct.POSIXlt            as.POSIXlt                    as.POSIXlt.character         
[101] as.POSIXlt.Date               as.POSIXlt.default            as.POSIXlt.factor             as.POSIXlt.numeric           
[105] as.POSIXlt.POSIXct            as.qr                         as.raster                     as.raw                       
[109] as.relistable                 as.roman                      as.single                     as.single.default            
[113] as.stepfun                    as.symbol                     as.table                      as.table.default             
[117] as.ts                         as.vector                     as.vector.factor
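Coercion does not always do what the function name suggests. A sketch of two common pitfalls, using made-up values: text that cannot be parsed becomes NA (with a warning), and as.numeric() on a factor returns the internal level codes rather than the printed values.

```r
# A string that is not a number becomes NA; suppressWarnings() hides the
# "NAs introduced by coercion" warning for this demonstration
bad <- suppressWarnings(as.numeric(c("3.14", "abc")))
print(bad)

f <- factor(c("10", "5", "20"))
codes  <- as.numeric(f)                # level codes: levels are sorted as text ("10" "20" "5")
values <- as.numeric(as.character(f))  # convert to character first to recover the values
print(codes)
print(values)
```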

Converting a matrix to a data frame

> x <- state.x77                ## state.x77 is a built-in matrix
> is.data.frame(x)              ## x is not a data frame
[1] FALSE

> x <- as.data.frame(x)         ## coerce x to a data frame

> is.data.frame(x)              ## x is now a data frame
[1] TRUE
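A quick check, as a sketch, that the conversion keeps the shape and the row names (the 50 state names) of state.x77, so only the class changes:

```r
m  <- state.x77
df <- as.data.frame(m)

print(class(df))              # "data.frame"
print(dim(df))                # same number of rows and columns as the matrix
print(head(rownames(df), 3))  # row names carried over from the matrix
```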

Converting a data frame to a matrix

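All elements of a matrix must share one data type, such as character or numeric, whereas the columns of a data frame can have different types.
When a data frame with mixed column types is converted to a matrix, the values are coerced to a common type: if any column holds text (a character or factor column), every value becomes a character string.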


> x <- as.matrix(x)             ## coerce the data frame x from above back to a matrix

> is.data.frame(x)
[1] FALSE

> is.matrix(x)
[1] TRUE
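The single-type rule can be seen directly. In this sketch (the small data frame is made up), a data frame with one character column and one numeric column becomes a character matrix, while the all-numeric data frame built from state.x77 stays numeric:

```r
mixed <- data.frame(state = c("AL", "AK"), murder = c(13.2, 10.0))
m_mixed <- as.matrix(mixed)
print(typeof(m_mixed))   # "character": the numbers were converted to text
print(m_mixed)

m_num <- as.matrix(as.data.frame(state.x77))
print(typeof(m_num))     # "double": every column was already numeric
```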

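A whole data frame cannot be turned into a vector or a factor with the as functions: in the session below, is.vector() is still FALSE after as.vector(), and as.factor() raises a warning.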

> x <- as.data.frame(state.x77)
> x <- as.vector(x)
> is.vector(x)
[1] FALSE
> 
> x <- as.factor(x)
Warning message:
In xtfrm.data.frame(x) : cannot xtfrm data frames
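To get a vector or factor out of a data frame, a common approach (not used in the session above) is to work column by column, or to flatten all columns with unlist(). A sketch on state.x77; the "high"/"low" threshold of 10 is arbitrary:

```r
df <- as.data.frame(state.x77)

murder <- df$Murder                          # a single column is a plain numeric vector
all_values <- unlist(df, use.names = FALSE)  # all 8 columns stacked into one vector

# Build a factor from a column rather than from the whole data frame
murder_level <- as.factor(ifelse(df$Murder > 10, "high", "low"))

print(is.vector(murder))
print(length(all_values))     # 50 rows * 8 columns
print(levels(murder_level))
```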

Converting a vector to a matrix

> x <- state.abb
> class(x)
[1] "character"
> is.vector(x)                  ## confirm that x is a vector
[1] TRUE

> dim(x) <- c(5,10)             ## reshape into 5 rows and 10 columns, filled column by column
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "AL" "CO" "HI" "KS" "MA" "MT" "NM" "OK" "SD" "VA" 
[2,] "AK" "CT" "ID" "KY" "MI" "NE" "NY" "OR" "TN" "WA" 
[3,] "AZ" "DE" "IL" "LA" "MN" "NV" "NC" "PA" "TX" "WV" 
[4,] "AR" "FL" "IN" "ME" "MS" "NH" "ND" "RI" "UT" "WI" 
[5,] "CA" "GA" "IA" "MD" "MO" "NJ" "OH" "SC" "VT" "WY"
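Assigning dim() fills the matrix column by column. As a sketch of the alternative, matrix() gives the same result by default and can fill row by row with byrow = TRUE:

```r
x <- state.abb

by_col <- matrix(x, nrow = 5, ncol = 10)                # same as dim(x) <- c(5, 10)
by_row <- matrix(x, nrow = 5, ncol = 10, byrow = TRUE)  # fill across rows first

print(by_col[1, 1:3])   # "AL" "CO" "HI": the first row takes every 5th state
print(by_row[1, 1:3])   # "AL" "AK" "AZ": the first row takes states 1 to 3
```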

Converting a vector to a factor

> x <- state.abb
> as.factor(x)
 [1] AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT
[46] VA WA WV WI WY
50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC ... WY
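The levels printed above are sorted alphabetically (AK comes before AL), not listed in the order the values appear. A sketch of keeping the input order by passing levels explicitly to factor():

```r
f_sorted <- as.factor(state.abb)                   # levels sorted: "AK" "AL" ...
f_input  <- factor(state.abb, levels = state.abb)  # levels in input order: "AL" "AK" ...

print(nlevels(f_sorted))      # 50 distinct abbreviations
print(levels(f_sorted)[1:2])
print(levels(f_input)[1:2])
```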

Converting a vector to a list

> x <- state.abb
> as.list(x)
[[1]]
[1] "AL"

[[2]]
[1] "AK"

[[3]]
[1] "AZ"

[[4]]
[1] "AR"

[[5]]
[1] "CA"

...

[[46]]
[1] "VA"

[[47]]
[1] "WA"

[[48]]
[1] "WV"

[[49]]
[1] "WI"

[[50]]
[1] "WY"
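as.list() makes one list element per vector element, which is why the output above has 50 entries. For contrast, a sketch showing that list() wraps the whole vector as a single element:

```r
x <- state.abb
one_per_element <- as.list(x)  # 50 elements, each a length-1 character vector
wrapped         <- list(x)     # 1 element holding the full 50-element vector

print(length(one_per_element))
print(length(wrapped))
print(identical(unlist(one_per_element), x))  # unlist() reverses as.list() here
```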

2. Subsetting data

> who <- read.csv("E:/R-workplace/1/RData/WHO.csv", header = TRUE)   ## 202 rows, 358 columns
> View(who)
## View pane: Showing 1 to 18 of 202 entries, 358 total columns

Subsetting with indices

##### extract contiguous rows and columns
> who1 <- who[1:50, 1:10]
> View(who1)
## View pane: Showing 1 to 18 of 50 entries, 10 total columns (the first 50 rows and 10 columns of who)

#### extract non-contiguous rows and columns
> who2 <- who[c(1,4,6,3,10), c(38,234,214,78)]
> View(who2)
## View pane: Showing 1 to 5 of 5 entries, 4 total columns (the specified rows and columns of who, in the order given)
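Rows and columns can also be selected by name, and negative indices drop positions instead of keeping them. Since WHO.csv is a local file, this sketch uses the built-in USArrests data instead:

```r
# Keep three named states and two named columns
picked <- USArrests[c("Alabama", "Georgia", "Texas"), c("Murder", "Rape")]

# A negative index removes that column: everything except column 1 (Murder)
no_murder <- USArrests[, -1]

print(picked)
print(names(no_murder))   # "Assault" "UrbanPop" "Rape"
```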

## rows of who where Continent equals 7
> who3 <- who[which(who$Continent == 7), ]
> View(who3)
## View pane: Showing 1 to 9 of 9 entries, 358 total columns

## countries in who whose CountryID is between 1 and 10

> who4 <- who[which(who$CountryID >= 1 & who$CountryID <= 10), ]
> View(who4)
## View pane: Showing 1 to 10 of 10 entries, 358 total columns
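One reason the examples above wrap the condition in which(): a comparison with a missing value yields NA, and a plain logical index turns each NA into an all-NA row, whereas which() keeps only the positions that are TRUE. A sketch with a small made-up data frame:

```r
df <- data.frame(id = 1:5, continent = c(7, 2, NA, 7, 1))

logical_rows <- df[df$continent == 7, ]         # the NA comparison adds a row of NAs
which_rows   <- df[which(df$continent == 7), ]  # only rows 1 and 4

print(nrow(logical_rows))  # 3
print(nrow(which_rows))    # 2
```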

The subset() function

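subset(x, subset, select, drop = FALSE)
x: the object to subset
subset: a logical expression indicating which elements or rows to keep
select: an expression indicating which columns to select from a data frame
drop: passed on to the [ indexing operator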


> who5 <- subset(who, CountryID >= 1 & CountryID <= 10)   ## columns can be referenced by name inside subset()
> View(who5)
## View pane: Showing 1 to 10 of 10 entries, 358 total columns
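subset() also takes a select argument for columns, which the example above does not use. A sketch on USArrests (the threshold of 15 is arbitrary); like which(), subset() treats rows where the condition is NA as not selected:

```r
high_murder <- subset(USArrests, Murder > 15, select = c(Murder, UrbanPop))

print(high_murder)
print(names(high_murder))   # "Murder" "UrbanPop"
```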

The sample() function: random sampling

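sample(x, size, replace = FALSE, prob = NULL)
x: a vector of one or more elements, or a positive integer
size: the number of items to draw
replace: whether to sample with replacement; the default (FALSE) samples without replacement
prob: an optional vector of probability weights for the elements being sampled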

###### 1. Sampling from a vector
> x <- 1:100
> sample(x,20)
 [1] 14 55 65 40 83 71 96 45 21 77 97 28 98 18 93 31 85 88 74 20
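## sort the sampled values with sort()
> sort(sample(x,20))
 [1]   3  12  18  25  33  34  36  38  39  41  45  47  58  59  60  66  87  97  98 100

> sort(sample(x, 20, replace = TRUE))   ## with replacement, a value can be drawn more than once (90 appears twice below)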
 [1]  2  9 22 24 29 36 42 43 47 53 68 73 76 79 82 83 84 90 90 91
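Because sample() is random, the results above change on every run. A sketch of two options the session above does not use: set.seed() makes draws reproducible, and prob gives unequal weights (the seed 42 and the weights are arbitrary):

```r
set.seed(42)
a <- sample(1:100, 20)
set.seed(42)
b <- sample(1:100, 20)
print(identical(a, b))   # TRUE: same seed, same draw

# Weighted sampling with replacement: "heads" is drawn about 90% of the time
set.seed(42)
coins <- sample(c("heads", "tails"), 1000, replace = TRUE, prob = c(0.9, 0.1))
print(table(coins))
```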

####### 2. Sampling rows from a data frame
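> sample(who$CountryID,10)
 [1] 189 172 177 176  37 100  44 180 153 142
> who6 <- who[sample(nrow(who), 10), ]   ## sample 10 row positions; using CountryID values as row numbers only works if they match the row order
> View(who6)
## View pane: Showing 1 to 10 of 10 entries, 358 total columns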
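Indexing a data frame by sampled row positions is the general pattern for drawing random rows. A sketch on USArrests, including a bootstrap-style draw with replacement (the seed and sizes are arbitrary):

```r
set.seed(1)
rows <- sample(nrow(USArrests), 10)   # 10 distinct row positions out of 50
random_states <- USArrests[rows, ]

boot_rows <- sample(nrow(USArrests), nrow(USArrests), replace = TRUE)
boot <- USArrests[boot_rows, ]        # repeated rows get suffixed row names, e.g. "Ohio.1"

print(nrow(random_states))   # 10
print(nrow(boot))            # 50
```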

3. Combining data frames

Combining as columns
The data.frame() function

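> USArrests        # arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973, plus the percentage of the population living in urban areas
> state.division   # a factor giving the census division of each of the 50 US states
> ### combine the two datasets column by column (rows are paired by position, so both must list the states in the same order)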
> data.frame(USArrests,state.division)
               Murder Assault UrbanPop Rape     state.division
Alabama          13.2     236       58 21.2 East South Central
Alaska           10.0     263       48 44.5            Pacific
Arizona           8.1     294       80 31.0           Mountain
Arkansas          8.8     190       50 19.5 West South Central
...
Washington        4.0     145       73 26.2            Pacific
West Virginia     5.7      81       39  9.3     South Atlantic
Wisconsin         2.6      53       66 10.8 East North Central
Wyoming           6.8     161       60 15.6           Mountain
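data.frame() names the new column after the expression passed in, which is why the output header reads state.division. A sketch showing how a named argument sets a cleaner column name:

```r
unnamed <- data.frame(USArrests, state.division)
named   <- data.frame(USArrests, division = state.division)

print(names(unnamed))   # ... "Rape" "state.division"
print(names(named))     # ... "Rape" "division"
```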

The cbind() function

> cbind(USArrests,state.division)
               Murder Assault UrbanPop Rape     state.division
Alabama          13.2     236       58 21.2 East South Central
Alaska           10.0     263       48 44.5            Pacific
Arizona           8.1     294       80 31.0           Mountain
Arkansas          8.8     190       50 19.5 West South Central
...
Washington        4.0     145       73 26.2            Pacific
West Virginia     5.7      81       39  9.3     South Atlantic
Wisconsin         2.6      53       66 10.8 East North Central
Wyoming           6.8     161       60 15.6           Mountain
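Both data.frame() and cbind() pair rows by position. When two tables are not guaranteed to be in the same order, a key-based join with merge() is safer. This is a different technique from the one above; a sketch, assuming the state names serve as the join key:

```r
# Turn the row names of USArrests into an explicit key column
arrests   <- data.frame(state = rownames(USArrests), USArrests, row.names = NULL)
divisions <- data.frame(state = state.name, division = state.division)

# Shuffle one table to show that merge() matches on the key, not on position
set.seed(7)
divisions <- divisions[sample(nrow(divisions)), ]

combined <- merge(arrests, divisions, by = "state")
print(head(combined, 3))
```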

Combining as rows: the two data frames must have the same column names, because rbind() matches columns by name


> dat
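A minimal sketch of combining as rows, using two small made-up data frames: rbind() stacks the rows and matches columns by name, so the column order may differ, but the names must agree.

```r
a <- data.frame(state = c("AL", "AK"), murder = c(13.2, 10.0))
b <- data.frame(murder = 8.1, state = "AZ")   # same names, different column order

stacked <- rbind(a, b)
print(stacked)

# Mismatched column names are an error; capture its message instead of stopping
result <- tryCatch(rbind(a, data.frame(state = "AR", rate = 8.8)),
                   error = function(e) conditionMessage(e))
print(result)
```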
