The dplyr package in R

zoujiahui_2018

Article link: https://blog.csdn.net/qq_18055167/article/details/123468357

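Introduction

dplyr is a widely used R package for data cleaning and manipulation. Its main functions are:

select() picks columns from the data
filter() keeps the subset of rows that satisfy a condition
group_by() groups the data by one or more variables
summarise() summarises the data (computes summary statistics, one row per group)
arrange() sorts the rows
mutate() creates new variables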
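
To show how these verbs fit together, here is a minimal sketch that chains all six in one pipeline. The `scores` data frame and its column names are made up for illustration and do not come from the original post.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
scores <- tibble(
  student = c("A", "B", "C", "D"),
  class   = c("c1", "c1", "c2", "c2"),
  math    = c(90, 60, 75, 85),
  english = c(80, 70, 65, 95)
)

result <- scores %>%
  select(student, class, math, english) %>%          # pick the columns we need
  mutate(total = math + english) %>%                 # create a new variable
  filter(total >= 140) %>%                           # keep rows meeting a condition
  group_by(class) %>%                                # define the groups
  summarise(mean_total = mean(total), n = n()) %>%   # one summary row per group
  arrange(desc(mean_total))                          # sort by the summary value

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes chaining with `%>%` possible.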

Using mutate()

mutate(df, new_variable = <expression of existing variables>, .keep = c("all", "used", "unused", "none"), .before = NULL, .after = NULL)

Arguments:
df: the data frame to modify
new_variable: the name of the new variable
.keep: an experimental argument that controls which columns from the input data are retained in the output:

"all", the default, retains all variables.
"used" keeps any variables used to make new variables; it's useful for checking your work, as it displays inputs and outputs side by side.
"unused" keeps only existing variables not used to make new variables.
"none" only keeps grouping keys (like transmute()).
Grouping variables are always kept, regardless of .keep.

.before, .after: optionally control where new columns should appear (the default is to add them on the right-hand side).
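
The note about grouping variables is easiest to see on grouped data. The sketch below assumes dplyr 1.0.0 or later (the first release with `.keep`, `.before` and `.after`); the data frame `sales` and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
sales <- tibble(
  g = c("a", "a", "b"),
  x = c(1, 2, 3),
  y = c(10, 20, 30)
)

# .keep = "none" drops x and y, but the grouping variable g is still kept
shares <- sales %>%
  group_by(g) %>%
  mutate(x_share = x / sum(x), .keep = "none")
names(shares)

# .before accepts a column name as well as a position
placed <- sales %>%
  mutate(ratio = y / x, .before = y)
names(placed)
```

Here `x_share` is computed within each group, so the values in group "a" sum to 1.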

Examples


library(dplyr)  # provides mutate(), tibble() and the %>% pipe

# By default, new columns are placed on the far right.
# Experimental: you can override with `.before` or `.after`
df <- tibble(x = 1, y = 2)
df %>% mutate(z = x + y)
# # A tibble: 1 x 3
#       x     y     z
#   <dbl> <dbl> <dbl>
# 1     1     2     3

# Place the new column before the first column (by position)
df %>% mutate(z = x + y, .before = 1)
# # A tibble: 1 x 3
#       z     x     y
#   <dbl> <dbl> <dbl>
# 1     3     1     2

# Place the new column right after column x (by name)
df %>% mutate(z = x + y, .after = x)
# # A tibble: 1 x 3
#       x     z     y
#   <dbl> <dbl> <dbl>
# 1     1     3     2

# By default, mutate() keeps all columns from the input data.
# Experimental: You can override with `.keep`
df <- tibble(x = 1, y = 2, a = "a", b = "b")
df %>% mutate(z = x + y, .keep = "all") # the default
# # A tibble: 1 x 5
#       x     y a     b         z
#   <dbl> <dbl> <chr> <chr> <dbl>
# 1     1     2 a     b         3

df %>% mutate(z = x + y, .keep = "used")
# # A tibble: 1 x 3
#       x     y     z
#   <dbl> <dbl> <dbl>
# 1     1     2     3

df %>% mutate(z = x + y, .keep = "unused")
# # A tibble: 1 x 3
#   a     b         z
#   <chr> <chr> <dbl>
# 1 a     b         3

df %>% mutate(z = x + y, .keep = "none") # same as transmute()
# # A tibble: 1 x 1
#       z
#   <dbl>
# 1     3
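
As a quick check of the last comment, the following sketch compares `.keep = "none"` with `transmute()` on the same ungrouped data frame. It assumes dplyr 1.0.0 or later and only covers this simple case; behaviour on grouped data or across versions may differ in details such as column order.

```r
library(dplyr)

df <- tibble(x = 1, y = 2, a = "a", b = "b")

via_mutate    <- df %>% mutate(z = x + y, .keep = "none")
via_transmute <- df %>% transmute(z = x + y)

# Both keep only the newly created column z
names(via_mutate)
names(via_transmute)
```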