Data Wrangling and Cleaning in R (Data Frames, Part 1)

羊&鹿

Tags: R, data mining, data analysis

Article link: https://blog.csdn.net/yangyulu1998/article/details/138160222

Cookbook for R: Manipulating Data ~ Data Frames

Renaming columns in a data frame

Sample data

# Build a data frame column by column
d <- data.frame(alpha=1:3, beta=4:6, gamma=7:9)
d
#>   alpha beta gamma
#> 1     1    4     7
#> 2     2    5     8
#> 3     3    6     9

# Get the column names
names(d)
#> [1] "alpha" "beta"  "gamma"

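Renaming columns, method 1: rename() from the plyr package
This is the simplest approach. rename() returns a modified copy, so d itself stays unchanged unless the result is assigned back.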

library(plyr)
rename(d, c("beta"="two", "gamma"="three"))
#>   alpha two three
#> 1     1   4     7
#> 2     2   5     8
#> 3     3   6     9

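Renaming columns, method 2: base R functions
This modifies the original data frame directly, so there is no need to assign the result back.

# Rename the column "beta" to "two"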
names(d)[names(d)=="beta"] <- "two"
d
#>   alpha two gamma
#> 1     1   4     7
#> 2     2   5     8
#> 3     3   6     9

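# Columns can also be renamed by position.
# This is riskier: if the column order changes later, the wrong column gets renamed.

# By position: rename the third column, "gamma", to "three"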
names(d)[3] <- "three"
d
#>   alpha two three
#> 1     1   4     7
#> 2     2   5     8
#> 3     3   6     9
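
When several columns need new names, repeating the name-based assignment for each one gets tedious. A minimal base R sketch, assuming a lookup vector (here called new_names, a name chosen only for this example) whose names are the old column names and whose values are the new ones:

```r
# Sample data frame with the same columns as the example above
d <- data.frame(alpha = 1:3, beta = 4:6, gamma = 7:9)

# Hypothetical lookup: names are the old column names, values are the new ones
new_names <- c(beta = "two", gamma = "three")

# Find the columns that appear in the lookup, then replace only those names
hit <- names(d) %in% names(new_names)
names(d)[hit] <- unname(new_names[names(d)[hit]])

names(d)
#> [1] "alpha" "two"   "three"
```

Because the match is by name, the result does not depend on the column order, and names missing from the lookup are left alone.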

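Renaming columns, method 3: R's string search-and-replace functions
Note: the ^ and $ around alpha anchor the pattern so that it has to match the whole string.
Without them, a column named alphabet would also match.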

names(d) <- sub("^alpha$", "one", names(d))
d
#>   one two three
#> 1   1   4     7
#> 2   2   5     8
#> 3   3   6     9

# Replace every "t" with "X"
names(d) <- gsub("t", "X", names(d))
d
#>   one Xwo Xhree
#> 1   1   4     7
#> 2   2   5     8
#> 3   3   6     9

# As explained previously, sub() replaces only the first match in each name, while gsub() replaces every match
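
The effect of the anchors, and the difference between sub() and gsub(), is easiest to see on plain character vectors. In this small sketch the name "alphabet" and the string "tattoo" are made up for illustration:

```r
# Column names where one name is a prefix of another
cols <- c("alpha", "alphabet", "beta")

# Unanchored pattern: the start of "alphabet" is rewritten as well
unanchored <- sub("alpha", "one", cols)
unanchored
#> [1] "one"    "onebet" "beta"

# Anchored pattern: only the exact name "alpha" is replaced
anchored <- sub("^alpha$", "one", cols)
anchored
#> [1] "one"      "alphabet" "beta"

# sub() replaces the first match in each string, gsub() replaces all of them
first_only <- sub("t", "X", "tattoo")
every_match <- gsub("t", "X", "tattoo")
first_only
#> [1] "Xattoo"
every_match
#> [1] "XaXXoo"
```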

Adding and removing columns from a data frame

There are many ways to add or remove columns.

data <- read.table(header=TRUE, text='
 id weight
  1     20
  2     27
  3     24
')

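# Ways to add a column (these are alternatives; any one of the first three is enough)
data$size      <- c("small", "large", "medium")
data[["size"]] <- c("small", "large", "medium")
data[,"size"]  <- c("small", "large", "medium")
data$size      <- 0   # Use the same value (0) for every row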


# Ways to remove a column (these are alternatives; use any one)
data$size      <- NULL
data[["size"]] <- NULL
data[,"size"]  <- NULL
data[[3]]      <- NULL
data[,3]       <- NULL
data           <- subset(data, select=-size)
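
Two common variations on the operations above are adding a column computed from an existing one and dropping several columns at once. A rough base R sketch, where the column name heavy and the threshold 22 are arbitrary choices made for this example:

```r
data <- read.table(header=TRUE, text='
 id weight   size
  1     20  small
  2     27  large
  3     24 medium
')

# Add a column computed from an existing one
# ("heavy" is a hypothetical column name used only here)
data$heavy <- data$weight > 22

# Drop several columns at once by keeping every name that is not in drop_cols
drop_cols <- c("size", "heavy")
data <- data[setdiff(names(data), drop_cols)]

names(data)
#> [1] "id"     "weight"
```

Using setdiff() on the names does not fail when one of the names in drop_cols is absent, which makes it convenient inside scripts that run on slightly different inputs.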

Reordering the columns in a data frame

Sample data

data <- read.table(header=TRUE, text='
    id weight   size
     1     20  small
     2     27  large
     3     24 medium
')

Reorder by column position or by column name

# Reorder by column position (number)
data[c(1,3,2)]
#>   id   size weight
#> 1  1  small     20
#> 2  2  large     27
#> 3  3 medium     24


# Reorder by column name
data[c("size", "id", "weight")]
#>     size id weight
#> 1  small  1     20
#> 2  large  2     27
#> 3 medium  3     24

# To keep the change, assign the result back
# For example: data <- data[c(1,3,2)]
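
Listing every column becomes impractical for wide data frames. One possible approach, sketched here in base R, is to name only the columns that should come first and let setdiff() supply the rest in their current order (the variable front is just an illustrative name):

```r
data <- read.table(header=TRUE, text='
    id weight   size
     1     20  small
     2     27  large
     3     24 medium
')

# Move "size" to the front and keep the remaining columns in their current order
front <- "size"
data <- data[c(front, setdiff(names(data), front))]

names(data)
#> [1] "size"   "id"     "weight"
```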

The examples above index the data frame as a list (a data frame is essentially a list of column vectors).

Matrix-style indexing, data[row, col], can be used as well:

data[, c(1,3,2)]
#>   id   size weight
#> 1  1  small     20
#> 2  2  large     27
#> 3  3 medium     24

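The drawback of matrix-style indexing is that it gives a different result when only one column is selected: a vector instead of a data frame.

# List-style indexing with a single column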
data[2]
#>   weight
#> 1     20
#> 2     27
#> 3     24

# Matrix-style indexing with a single column
# The result is dropped to a vector
data[,2]
#> [1] 20 27 24

# Matrix-style indexing with a single column and drop=FALSE
# The result is still returned as a data frame
data[, 2, drop=FALSE]
#>   weight
#> 1     20
#> 2     27
#> 3     24

List-style indexing or the drop=FALSE option is usually the safer choice.
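
The difference matters most when the set of selected columns is stored in a variable and may contain one name or several. A short sketch of the pitfall, with keep as a hypothetical variable name:

```r
data <- read.table(header=TRUE, text='
    id weight   size
     1     20  small
     2     27  large
     3     24 medium
')

# A selection that happens to contain a single column name
keep <- "weight"

as_vector <- data[, keep]               # dropped to a plain vector
as_frame  <- data[, keep, drop=FALSE]   # stays a data frame
as_list   <- data[keep]                 # list-style indexing, also a data frame

is.data.frame(as_vector)   # FALSE
is.data.frame(as_frame)    # TRUE
is.data.frame(as_list)     # TRUE

# Code written for data frames breaks on the vector: nrow() returns NULL
nrow(as_vector)
nrow(as_frame)             # 3
```

When a plain vector is actually wanted, data[["weight"]] or data$weight states that intent explicitly.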

Merging data frames

First, create two data frames

stories <- read.table(header=TRUE, text='
   storyid  title
    1       lions
    2      tigers
    3       bears
')

data <- read.table(header=TRUE, text='
    subject storyid rating
          1       1    6.7
          1       2    4.5
          1       3    3.7
          2       2    3.3
          2       3    4.1
          2       1    5.2
')

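Case 1: merging two data frames with merge()

# The first two arguments are the data frames to merge
# The third argument is the column to merge on, which must exist in both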
merge(stories, data, "storyid")
#>   storyid  title subject rating
#> 1       1  lions       1    6.7
#> 2       1  lions       2    5.2
#> 3       2 tigers       1    4.5
#> 4       2 tigers       2    3.3
#> 5       3  bears       1    3.7
#> 6       3  bears       2    4.1
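
By default merge() keeps only the keys that occur in both data frames. The sketch below uses a modified version of the sample data (the story "wolves" and the shortened ratings table are made up for this example) to show how the all.x argument keeps unmatched rows from the first data frame:

```r
stories <- read.table(header=TRUE, text='
   storyid  title
    1       lions
    2      tigers
    3       bears
    4      wolves
')

data <- read.table(header=TRUE, text='
    subject storyid rating
          1       1    6.7
          1       2    4.5
          2       1    5.2
')

# Default: only stories that have at least one rating are kept
inner <- merge(stories, data, by="storyid")
nrow(inner)   # 3

# all.x=TRUE keeps every row of the first data frame; missing values become NA
left <- merge(stories, data, by="storyid", all.x=TRUE)
nrow(left)    # 5

# Titles of the stories without any rating
unrated <- as.character(left$title[is.na(left$rating)])
unrated
```

all.y=TRUE does the same for the second data frame, and all=TRUE keeps unmatched rows from both.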

Case 2: if the columns to match have different names in the two data frames, merge() lets you specify each name

stories2 <- read.table(header=TRUE, text='
   id       title
    1       lions
    2      tigers
    3       bears
')

# Specify which column to merge on in each data frame:
# stories2$id and data$storyid
merge(x=stories2, y=data, by.x="id", by.y="storyid")
#>   id  title subject rating
#> 1  1  lions       1    6.7
#> 2  1  lions       2    5.2
#> 3  2 tigers       1    4.5
#> 4  2 tigers       2    3.3
#> 5  3  bears       1    3.7
#> 6  3  bears       2    4.1

Case 3: merge() can merge on multiple columns

# Two more sample data frames
animals <- read.table(header=TRUE, text='
   size type         name
  small  cat         lynx
    big  cat        tiger
  small  dog    chihuahua
    big  dog "great dane"
')

observations <- read.table(header=TRUE, text='
   number  size type
        1   big  cat
        2 small  dog
        3 small  dog
        4   big  dog
')

merge(observations, animals, c("size","type"))
#>    size type number       name
#> 1   big  cat      1      tiger
#> 2   big  dog      4 great dane
#> 3 small  dog      2  chihuahua
#> 4 small  dog      3  chihuahua
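
Cases 2 and 3 can be combined: by.x and by.y each accept a vector of column names, paired up in order. A small sketch, assuming a second table animals2 whose key columns are named sz and kind (names invented for this example):

```r
# Same animals as above, but with differently named key columns
animals2 <- read.table(header=TRUE, text='
     sz kind       name
  small  cat       lynx
    big  cat      tiger
  small  dog  chihuahua
    big  dog    mastiff
')

observations <- read.table(header=TRUE, text='
   number  size type
        1   big  cat
        2 small  dog
        3 small  dog
        4   big  dog
')

# "size" is matched against "sz" and "type" against "kind"
merged <- merge(observations, animals2,
                by.x=c("size", "type"), by.y=c("sz", "kind"))

names(merged)
#> [1] "size"   "type"   "number" "name"
```

The key columns in the result take their names from the first data frame.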
