R-tidyverse
Reference: https://bookdown.org/wangminjie/R4DS/tidyverse-readr.html
13.4 Usage
13.4.1 Extracting rows
head and tail show the first 6 and the last 6 rows, respectively (6 is the default).
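As a quick illustration of the defaults described above, here is a minimal sketch. The tibble `tb_demo` is made up for this example and does not appear in the original notes.

```r
library(tibble)

# Hypothetical 10-row tibble, used only to illustrate head() and tail()
tb_demo <- tibble(id = 1:10, value = (1:10)^2)

first_six  <- head(tb_demo)         # n defaults to 6
last_three <- tail(tb_demo, n = 3)  # n can be set explicitly
```

Passing `n` is the usual way to ask for a different number of rows.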
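slice extracts rows at fixed positions.
# Select rows by position; putting a minus sign in front of the positions excludes those rows instead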
penguins %>% slice(2:5)
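The notes mention negative positions but do not show them. A small sketch, assuming a made-up tibble `df_demo` rather than the penguins data:

```r
library(dplyr)
library(tibble)

# Hypothetical data, for illustration only
df_demo <- tibble(id = 1:6, value = letters[1:6])

kept    <- df_demo %>% slice(2:5)     # rows 2 to 5
dropped <- df_demo %>% slice(-(2:5))  # every row except rows 2 to 5
```

Note the parentheses in `-(2:5)`: without them, `-2:5` would be read as the sequence from -2 to 5.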
# Take the first row of each group
penguins %>%
  group_by(species) %>%
  slice(1)
# Take only the first 2 rows of each group
penguins %>%
  group_by(species) %>%
  slice_head(n = 2)
# prop = 0.5 keeps the first half of the rows within each group
penguins %>%
  group_by(species) %>%
  slice_head(prop = 0.5)
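To make the grouped behaviour easy to check by eye, the following sketch repeats the same calls on a tiny made-up tibble (`grp` is hypothetical, with one group of 4 rows and one of 2):

```r
library(dplyr)
library(tibble)

# Hypothetical data: group "a" has 4 rows, group "b" has 2 rows
grp <- tibble(g = c("a", "a", "a", "a", "b", "b"), v = 1:6)

# First row of each group
first_each <- grp %>% group_by(g) %>% slice(1) %>% ungroup()

# First half of each group: 2 rows from "a", 1 row from "b"
half_each <- grp %>% group_by(g) %>% slice_head(prop = 0.5) %>% ungroup()
```

The proportion is applied inside each group separately, not to the table as a whole.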
slice indexes rows by position, so it can also pick out the row holding an extreme value.
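## Rows where bill_length_mm reaches its maximum
### Method 1
penguins %>%
  filter(bill_length_mm == max(bill_length_mm, na.rm = TRUE)) # na.rm = TRUE: otherwise any missing value makes the maximum NA and no row matches
### Method 2
penguins %>%
  arrange(desc(bill_length_mm)) %>%
  slice(1)
### Method 3
penguins %>%
  slice_max(bill_length_mm)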
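The three methods are not fully interchangeable when the maximum is tied or the column has missing values. The sketch below uses a made-up `scores` tibble (not from the original notes) to show the difference:

```r
library(dplyr)
library(tibble)

# Hypothetical data with a tied maximum and one missing value
scores <- tribble(
  ~name, ~score,
  "a",   10,
  "b",   30,
  "c",   30,
  "d",   NA
)

# filter(): keeps every row equal to the maximum; needs na.rm = TRUE here
by_filter <- scores %>% filter(score == max(score, na.rm = TRUE))

# arrange() + slice(1): always exactly one row, the first of the tied rows
by_arrange <- scores %>% arrange(desc(score)) %>% slice(1)

# slice_max(): keeps ties by default
by_slice_max <- scores %>% slice_max(score, n = 1)

# with_ties = FALSE forces a single row
no_ties <- scores %>% slice_max(score, n = 1, with_ties = FALSE)
```

Which one is appropriate depends on whether tied rows should all be returned.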
### Random sampling; replace = TRUE means sampling with replacement (the same row can be drawn more than once)
iris %>% as_tibble() %>% slice_sample(n = 5, replace = TRUE)
separate splits one column into several; unite combines several columns into one.
tb <- tibble::tribble(
  ~day, ~price,
  1, "30-45",
  2, "40-95",
  3, "89-65",
  4, "45-63",
  5, "52-42"
)
tb1 <- tb %>%
  separate(price, into = c("low", "high"), sep = "-")
tb1
tb1 %>%
  unite(col = "price", c(low, high), sep = ":", remove = FALSE)
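One detail worth seeing in code: by default `separate()` leaves the new columns as character vectors. A minimal sketch, assuming a shortened copy of the table (`tb_demo` is a made-up name), shows the `convert = TRUE` argument, which asks tidyr to convert the pieces to a more specific type:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Shortened, hypothetical copy of the price table
tb_demo <- tribble(
  ~day, ~price,
  1,    "30-45",
  2,    "40-95"
)

# Default: low and high stay character
as_text <- tb_demo %>%
  separate(price, into = c("low", "high"), sep = "-")

# convert = TRUE: low and high become numeric
as_num <- tb_demo %>%
  separate(price, into = c("low", "high"), sep = "-", convert = TRUE)
```

With numeric columns, expressions such as `high - low` work without a manual `as.numeric()` step.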
distinct operates on a data.frame and keeps only the unique rows; n_distinct() operates on a vector and counts how many different elements it contains, returning a single number.
df <- tibble::tribble(
  ~x, ~y, ~z,
  1, 1, 1,
  1, 1, 2,
  1, 1, 1,
  2, 1, 2,
  2, 2, 3,
  3, 3, 1
)
df
df %>%
  distinct()
df %>%
  distinct(x)
df %>%
  distinct(x, y)
df %>%
  distinct(x, y, .keep_all = TRUE) # keep all columns, using the first row found for each (x, y) combination
df %>%
  distinct(
    across(c(x, y)),
    .keep_all = TRUE
  )
df %>%
  group_by(x) %>%
  distinct(y, .keep_all = TRUE)
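The examples above cover `distinct()` only. As a complement, here is a small sketch of `n_distinct()` on a made-up tibble (`df_demo` is hypothetical), both on a plain vector and inside `summarise()`:

```r
library(dplyr)
library(tibble)

# Hypothetical data, for illustration only
df_demo <- tribble(
  ~x, ~y,
  1,  1,
  1,  2,
  2,  2,
  2,  2
)

# Number of different values in a single vector
n_x <- n_distinct(df_demo$x)

# Number of different y values within each x group
per_group <- df_demo %>%
  group_by(x) %>%
  summarise(n_y = n_distinct(y))

# Counting unique rows by combining distinct() with nrow()
n_rows_unique <- df_demo %>% distinct() %>% nrow()
```

`distinct()` returns a data frame, while `n_distinct()` returns a count, so the two are often used together.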
across() is used inside mutate() and summarise(). It applies the same function to several columns and returns a data frame.
across(.cols = everything(), .fns = NULL, ..., .names = NULL)
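The signature is easier to read next to a concrete call. The sketch below is only an illustration on a made-up tibble (`df_demo` and the `_scaled` suffix are assumptions, not part of the original notes); it uses `.cols`, `.fns` and `.names` in turn:

```r
library(dplyr)
library(tibble)

# Hypothetical data, for illustration only
df_demo <- tribble(
  ~group, ~a, ~b,
  "g1",   1,  10,
  "g1",   3,  30,
  "g2",   5,  50
)

# summarise(): the same function (mean) applied to columns a and b per group
means <- df_demo %>%
  group_by(group) %>%
  summarise(across(c(a, b), mean))

# mutate(): a lambda applied to every numeric column;
# .names controls how the new columns are named
scaled <- df_demo %>%
  mutate(across(where(is.numeric), ~ .x / 10, .names = "{.col}_scaled"))
```

Without `.names`, `mutate(across(...))` overwrites the selected columns instead of adding new ones.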