R Handbook (Tidy + Transform): dplyr and plyr


dplyr: A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

dplyr

Transforming and adding variables

Function: Description
mutate(data, ...): adds new variables (cf. base::transform); new variables can be built from existing ones and combined with aggregate functions
mutate_all(.tbl, .funs, ...): .funs is a list of function calls generated by funs(), a character vector of function names, or simply a function
mutate_if(.tbl, .predicate, .funs, ...): .predicate is a predicate function applied to the columns, or a logical vector
mutate_at(.tbl, .vars, .funs, ...): .vars is a list of columns generated by vars(), a character vector of column names, or a numeric vector of column positions
transmute(data, ...): adds new variables and drops the existing ones
case_when(...): a vectorised form of if ... else if ... else; conditions are evaluated in order and the first match wins

For example:

mtcars <- mutate(mtcars, hp = ifelse(hp > 300, NA, hp)) # replace outliers with NA
mtcars <- mutate(mtcars, hp = NULL) # assigning NULL drops the variable
mtcars %>% group_by(cyl) %>% mutate(m = mean(mpg)) # add the within-group mean
iris %>% mutate_at(vars(matches("Sepal")), log)
iris %>% as_tibble() %>% mutate_if(is.factor, as.character)
iris %>% group_by(Species) %>% mutate_all(list(inches = ~ . / 2.54)) # funs() is deprecated since dplyr 0.8.0

data %>% mutate(
  newvar = case_when(
    cond1 ~ value1,
    cond2 ~ value2,
    TRUE ~ other_value  # acts as the else branch
  )
)
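The case_when() template above can be made concrete with a built-in data set. This is a minimal sketch; the cut-offs 25 and 18 and the column name economy are illustrative assumptions, not values from the original text.

```r
library(dplyr)

# Label each car by fuel economy; the thresholds are arbitrary examples.
cars_labeled <- mtcars %>%
  mutate(
    economy = case_when(
      mpg >= 25 ~ "high",
      mpg >= 18 ~ "medium",
      TRUE      ~ "low"     # fallback, like a final else
    )
  )

# Number of cars in each label
table(cars_labeled$economy)
```

Because the conditions are checked in order, a car with mpg = 30 is labeled "high" even though it also satisfies mpg >= 18.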

Filtering

Function: Description
filter(data, ...): keeps the rows that satisfy the conditions (cf. base::subset(x, subset, select))
filter_all(.tbl, .vars_predicate): .vars_predicate is a quoted predicate expression as returned by all_vars() or any_vars()
filter_if(.tbl, .predicate, .vars_predicate)
filter_at(.tbl, .vars, .vars_predicate): .vars is a list of columns generated by vars(), a character vector of column names, or a numeric vector of column positions
popular_dests <- flights %>% group_by(dest) %>% filter(n() > 365) # flights comes from the nycflights13 package
slice(.data, ...): selects rows by position
slice(mtcars, n()) # the last row
slice(mtcars, 5:n()) # row 5 through the last row
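A small sketch of the filtering verbs on mtcars, so that it runs without the nycflights13 package. The thresholds (10 cars, 200) are arbitrary choices for illustration.

```r
library(dplyr)

# Keep only the cylinder groups that contain more than 10 cars.
big_groups <- mtcars %>%
  group_by(cyl) %>%
  filter(n() > 10) %>%
  ungroup()

# Row 5 through the last row; n() is the number of rows.
tail_rows <- slice(mtcars, 5:n())

# Rows in which at least one of the selected columns exceeds 200.
any_big <- filter_at(mtcars, vars(disp, hp), any_vars(. > 200))
```

Replacing any_vars() with all_vars() would instead require every selected column to exceed the threshold.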

Select and rename

Function: Description
select(): selects variables by name
select(flights, -(year:day))
select_all(.tbl, .funs = list(), ...)
select_if(.tbl, .predicate, .funs = list(), ...)
select_at(.tbl, .vars, .funs = list(), ...)
rename(): renames variables and keeps all of them
rename(iris, newname = oldname, ...)
rename_all(.tbl, .funs = list(), ...)
rename_if(.tbl, .predicate, .funs = list(), ...)
rename_at(.tbl, .vars, .funs = list(), ...)

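The .vars argument

Helper: Description
starts_with(): names that start with a prefix
select(iris, starts_with("Petal"))
ends_with(): names that end with a suffix
contains(): names that contain a literal string
matches(): names that match a regular expression
num_range(): a numerical range such as x01, x02, x03
one_of(): variables named in a character vector
everything(): all variables; useful for moving columns to the front
select(flights, time_hour, air_time, everything())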
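The selection helpers can be tried on iris, which needs no extra package. This is a short sketch; the new column name species is made up for the rename() call.

```r
library(dplyr)

petal  <- select(iris, starts_with("Petal"))   # Petal.Length, Petal.Width
widths <- select(iris, ends_with("Width"))     # Sepal.Width, Petal.Width
sepal  <- select(iris, matches("^Sepal\\."))   # columns matching a regular expression
front  <- select(iris, Species, everything())  # move Species to the front

# rename() takes new_name = old_name and keeps every column.
renamed <- rename(iris, species = Species)

names(front)
```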

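Grouping

Function: Description
group_by(.data, ..., add = FALSE): groups the data; most dplyr verbs then operate within each group (the add argument is named .add from dplyr 1.0.0 onwards)
group_by_all(.tbl, .funs = list(), ...)
group_by_at(.tbl, .vars, .funs = list(), ...)
group_by_if(.tbl, .predicate, .funs = list(), ...)
ungroup(): removes the grouping
base::by(data, INDICES, FUN): INDICES is the factor, or list of factors, to group by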
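To show what "operate within each group" means, the sketch below centres mpg on its group mean. The column name mpg_centered is an assumption made for the example.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)
group_vars(by_cyl)   # the current grouping variables

# On grouped data, mean(mpg) is computed separately for each cyl value.
centered <- by_cyl %>%
  mutate(mpg_centered = mpg - mean(mpg)) %>%
  ungroup()          # drop the grouping once it is no longer needed
```

Calling ungroup() at the end is a defensive habit: later verbs then work on the whole table again rather than silently per group.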

Aggregation

Function: Description
summarise(): reduces multiple values down to a single summary
summarise_all(.tbl, .funs, ...): .funs is a list of function calls generated by funs(), a character vector of function names, or simply a function
summarise_if(.tbl, .predicate, .funs, ...): .predicate is a predicate function applied to the columns, or a logical vector
summarise_at(.tbl, .vars, .funs, ...): .vars is a list of columns generated by vars(), a character vector of column names, or a numeric vector of column positions
do(data, ...): a general-purpose complement that can apply any function to each group (cf. plyr::dlply)
a <- iris %>% group_by(Species) %>% do(s = summary(.))

For example:

mtcars %>% group_by(cyl) %>% summarise(a = n(), b = a + 1)
iris %>% group_by(Species) %>% summarise_all(mean)
iris %>% group_by(Species) %>% summarise_all(list(min = min, max = max)) # funs() is deprecated since dplyr 0.8.0
iris %>% group_by(Species) %>% summarise_all(c("min", "max"))
iris %>% group_by(Species) %>% summarise_all(list(Q3 = quantile), probs = 0.75)
starwars %>% summarise_at(c("height", "mass"), mean, na.rm = TRUE)
starwars %>% summarise_at(vars(height:mass), mean, na.rm = TRUE)
starwars %>% summarise_if(is.numeric, mean, na.rm = TRUE)
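summarise() expects each result to be a single value per group, whereas do() can store arbitrary objects. The sketch below fits one linear model per group as an illustration; the model formula mpg ~ wt is an arbitrary choice, and newer dplyr releases mark do() as superseded.

```r
library(dplyr)

# Fit one linear model per cylinder group; do() keeps each fit in a list column.
models <- mtcars %>%
  group_by(cyl) %>%
  do(fit = lm(mpg ~ wt, data = .))

# Pull the slope of wt out of each stored model.
slopes <- sapply(models$fit, function(m) coef(m)[["wt"]])
```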

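Sorting and ranking

Sorting: Description
arrange(data, ..., .by_group = FALSE): sorts rows by the given variables; .by_group = TRUE sorts by the grouping variables first
arrange_all, arrange_at, arrange_if: scoped variants of arrange()
desc(x): descending order
Ranking: Description (cf. base::rank)
row_number(x): ties are ranked in order of appearance
min_rank(x): ties share the smallest rank
dense_rank(x): like min_rank(), but with no gaps between ranks
percent_rank(x): min_rank() rescaled to the range [0, 1]
cume_dist(x): cumulative distribution, the proportion of values less than or equal to the current one
ntile(x, n): a rough rank that splits the values into n buckets

For example:
arrange(mtcars, cyl, desc(disp))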
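The ranking functions differ only in how they treat ties, which is easiest to see on a small vector with one tie. The vector below is made up for the illustration.

```r
library(dplyr)

x <- c(10, 20, 20, 30)

ranks <- data.frame(
  x            = x,
  row_number   = row_number(x),    # 1 2 3 4
  min_rank     = min_rank(x),      # 1 2 2 4
  dense_rank   = dense_rank(x),    # 1 2 2 3
  percent_rank = percent_rank(x),  # 0, 1/3, 1/3, 1
  cume_dist    = cume_dist(x),     # 0.25 0.75 0.75 1.00
  ntile        = ntile(x, 2)       # 1 1 2 2
)
ranks
```

For a descending ranking, wrap the input in desc(), for example min_rank(desc(x)).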

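Joins

Function: Description (cf. base::merge)
inner_join(x, y, by = NULL);
left_join; right_join; full_join;
1. When the key columns have the same names: by = c("id1", "id2")
2. When the key columns have different names: by = c("a" = "b")
semi_join: returns the rows of x that have a match in y (columns of x only)
anti_join: returns the rows of x that have no match in y (columns of x only)
Binding rows and columns:
bind_rows(..., .id = NULL): cf. base::rbind
bind_cols(...): cf. base::cbind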
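A minimal sketch of the join variants with differently named keys. The tables orders and prices and all of their values are hypothetical.

```r
library(dplyr)

# Hypothetical example tables; the key is "id" in one and "code" in the other.
orders <- data.frame(id = c(1, 2, 3), item = c("pen", "ink", "pad"))
prices <- data.frame(code = c(2, 3, 4), price = c(5.0, 2.5, 9.0))

inner <- inner_join(orders, prices, by = c("id" = "code"))  # ids 2 and 3
left  <- left_join(orders, prices, by = c("id" = "code"))   # ids 1 to 3, price is NA for id 1
semi  <- semi_join(orders, prices, by = c("id" = "code"))   # matched rows, columns of orders only
anti  <- anti_join(orders, prices, by = c("id" = "code"))   # unmatched rows of orders

# Stack tables and record which input each row came from.
stacked <- bind_rows(list(a = orders, b = orders), .id = "source")
```

semi_join() and anti_join() never add columns from y, which makes them filters rather than merges.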

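Set operations

Function: Description
intersect(x, y): returns the rows that are in both x and y
union(x, y): returns the rows that are in x or y
setdiff(x, y): returns the rows that are in x but not in y
setequal(x, y): logical; whether x and y contain the same rows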
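With dplyr loaded, the set functions work row-wise on data frames that share the same columns. A short sketch with two made-up single-column tables:

```r
library(dplyr)

a <- data.frame(k = c(1, 2, 3))
b <- data.frame(k = c(2, 3, 4))

both   <- intersect(a, b)   # rows with k = 2 and 3
either <- union(a, b)       # rows with k = 1, 2, 3 and 4, duplicates removed
only_a <- setdiff(a, b)     # the row with k = 1

same_ab   <- setequal(a, b)                        # FALSE
same_rows <- setequal(a, a[3:1, , drop = FALSE])   # TRUE, row order is ignored
```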

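Counting

Function: Description (cf. base::nrow)
n(): the number of rows in the current group; used inside summarise(), mutate() and filter()
n_distinct(..., na.rm = FALSE): counts the unique values
tally(x, wt, sort = FALSE): counts rows, e.g. mtcars %>% tally()
count(x, ..., wt = NULL, sort = FALSE): counts rows per group; equivalent to group_by() + tally()
mtcars %>% count(cyl) # counts per cyl group
add_tally(x, wt, sort = FALSE): adds a count column; equivalent to mutate() + tally()
mtcars %>% add_tally()
add_count(x, ..., wt = NULL, sort = FALSE): adds a per-group count column; equivalent to group_by() + add_tally()
mtcars %>% add_count(cyl)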
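The difference between count() and add_count() is whether the rows are collapsed. A brief sketch on mtcars:

```r
library(dplyr)

# count() collapses to one row per group: 11 cars with 4 cylinders, 7 with 6, 14 with 8.
per_cyl <- mtcars %>% count(cyl)

# add_count() keeps all 32 rows and appends the group size as column n.
with_n <- mtcars %>% add_count(cyl)

# n_distinct() counts unique values; gear takes the values 3, 4 and 5.
n_gears <- mtcars %>% summarise(gears = n_distinct(gear))
```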

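Removing duplicates

distinct(.data, ..., .keep_all = FALSE)
(base::duplicated, base::unique)

  • On grouped data, duplicates are removed within each group
  • ...: the columns used to decide whether rows are duplicates
  • .keep_all: whether to keep all other columns

df %>% group_by(col1) %>% distinct(col2, .keep_all = TRUE)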
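The df in the line above is a placeholder. A possible concrete version, with made-up values, looks like this:

```r
library(dplyr)

# Hypothetical data: rows 1 and 2 repeat the same (col1, col2) pair.
df <- data.frame(
  col1 = c("a", "a", "a", "b"),
  col2 = c(1, 1, 2, 1),
  col3 = 1:4
)

# Unique (col1, col2) combinations only; col3 is dropped.
pairs <- distinct(df, col1, col2)

# Within each col1 group, keep the first row for every col2 value, with all columns.
kept <- df %>%
  group_by(col1) %>%
  distinct(col2, .keep_all = TRUE) %>%
  ungroup()
```

With .keep_all = TRUE the first row of each duplicate set is the one that survives, so the input order matters.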

Sampling

For a data.frame:
sample_n(tbl, size, replace = FALSE)
sample_frac(tbl, size, replace = FALSE)

For a vector:
base::sample(x,size,replace=FALSE,prob=NULL)
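A brief sketch of the sampling functions. The seed value 42 is arbitrary and is set only so that repeated runs draw the same rows.

```r
library(dplyr)

set.seed(42)  # arbitrary seed for reproducibility

five        <- sample_n(mtcars, 5)       # 5 random rows
tenth       <- sample_frac(iris, 0.1)    # 10% of 150 rows, i.e. 15 rows
per_species <- iris %>%
  group_by(Species) %>%
  sample_n(2)                            # 2 rows from each species

draw <- sample(1:10, size = 3)           # base R, for plain vectors
```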

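Other functions

Function: Description
nth(x, n, order_by = NULL); first(); last(): extract the first, last or nth value from a vector
top_n(x, n, wt): selects the top n rows; wt is the variable used for ordering
lead(x, n); lag(x, n): return the leading or lagging values; n is the offset
between(x, left, right): shortcut for x >= left & x <= right
if_else(condition, true, false, missing = NULL): a vectorised if with an optional value for missing conditions
near(x, y): tests whether two floating-point numbers are equal within a tolerance
all_vars(expr), any_vars(expr): apply a predicate to all variables
vars(): select variables
funs(): create a list of function calls (deprecated since dplyr 0.8.0)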
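Several of these vector helpers are easiest to understand from their output on a short vector. The values below are made up for the illustration.

```r
library(dplyr)

x <- c(3, 1, 4, 1, 5)

prev_x <- lag(x)      # NA 3 1 4 1
next_x <- lead(x)     # 1 4 1 5 NA
change <- x - lag(x)  # difference from the previous value: NA -2 3 -3 4

ends   <- c(first(x), nth(x, 2), last(x))  # 3 1 5
in_rng <- between(x, 2, 4)                 # TRUE FALSE TRUE FALSE FALSE

exact  <- sqrt(2)^2 == 2      # FALSE because of floating-point rounding
approx <- near(sqrt(2)^2, 2)  # TRUE

size <- if_else(x > 2, "big", "small", missing = "unknown")
```

Note that loading dplyr masks stats::lag(), so lag() here is the dplyr version that shifts values rather than a time-series index.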

plyr: Tools for Splitting, Applying and Combining Data

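Unified naming scheme for the functions:

xyply
x (first letter): the data structure to be processed (input)
y (second letter): the data structure to be returned (output)

a: array
l: list
d: data.frame
m: multiple inputs (input only)
r: repeat multiple times (input only)
_: nothing (output only)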

For example:

aaply(.data, .margins, .fun = NULL, ...,.progress = "none")
ddply(.data, .variables, .fun = NULL, ..., .progress = "none")
mdply(cbind(mean = 1:5, sd = 1:5), rnorm, n = 5)

The .progress argument takes one of "none", "text", "tk" or "win".
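The naming scheme can be checked by running three functions whose names differ only in the input and output letters. This is a minimal sketch that loads plyr on its own; the summary column mean_mpg and the helper function passed to ldply() are made up for the example.

```r
library(plyr)

# d -> d: split a data frame by cyl, summarise each piece, return a data frame.
means <- ddply(mtcars, .(cyl), summarise, mean_mpg = mean(mpg))

# d -> l: same split, but the results come back as a list named by group.
sizes <- dlply(mtcars, .(cyl), nrow)

# l -> d: apply a function to each list element and row-bind the results.
squares <- ldply(1:3, function(i) data.frame(i = i, sq = i^2))
```

If plyr and dplyr are both needed in one session, loading plyr first avoids plyr masking dplyr verbs such as summarise() and mutate().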
