dplyr do: Some Tips for Using and Programming

This post explores some basic concepts of do() and gives some advice on using it and programming with it.

do() is a verb (function) of dplyr. dplyr is a powerful R package for data manipulation, written and maintained by Hadley Wickham. It lets you perform common data manipulation tasks on data frames, such as filtering rows, selecting specific columns, reordering rows, adding new columns, summarizing data and computing arbitrary operations.

First of all, you have to install the dplyr package:

install.packages("dplyr")

and then load it:

library(dplyr)

We will analyze the use of do() with the following dataset, created with random data:

set.seed(100)
ds <- data.frame(group=c(rep("a",100), rep("b",100), rep("c",100)), 
                 x=rnorm(n = 300, mean = 3, sd = 2), y=rnorm(n = 300, mean = 2, sd = 2))

We first convert it into a tbl_df object to get a more compact print method. The data in the input data frame are not changed.

ds <- tbl_df(ds)
ds

Source: local data frame [300 x 3]

    group        x           y
   (fctr)    (dbl)       (dbl)
1       a 1.995615 -1.71089045
2       a 3.263062 -0.03712943
3       a 2.842166 -0.09022217
4       a 4.773570  0.69742469
5       a 3.233943  2.76536531
6       a 3.637260  4.06379942
7       a 1.836419  2.26214995
8       a 4.429065  2.75438347
9       a 1.349481 -1.77539016
10      a 2.280276  3.04043881
..    ...      ...         ...

Basic Concepts of do() (Non-Standard Evaluation Version)

As mentioned above, do() performs arbitrary operations on a data frame and can return more than a single number.

To use do(), you must know that:

  • it always returns a data frame
  • unlike the other data manipulation verbs of dplyr, do() needs the . placeholder to be specified inside the function to apply; the placeholder refers to the data do() has to work with.
    # Head of ds
    ds %>% do(head(.))

    Source: local data frame [6 x 3]
    
       group        x           y
      (fctr)    (dbl)       (dbl)
    1      a 1.995615 -1.71089045
    2      a 3.263062 -0.03712943
    3      a 2.842166 -0.09022217
    4      a 4.773570  0.69742469
    5      a 3.233943  2.76536531
    6      a 3.637260  4.06379942
  • it is designed to be used with dplyr's group_by() to compute operations within groups:
    # Head of ds by group
    ds %>% group_by(group) %>% do(head(.))

    Source: local data frame [18 x 3]
    Groups: group [3]
    
        group          x           y
       (fctr)      (dbl)       (dbl)
    1       a 1.99561530 -1.71089045
    2       a 3.26306233 -0.03712943
    3       a 2.84216582 -0.09022217
    4       a 4.77356962  0.69742469
    5       a 3.23394254  2.76536531
    6       a 3.63726018  4.06379942
    7       b 2.33415330 -0.56965729
    8       b 5.72622741  1.71643653
    9       b 2.06170532  4.87756954
    10      b 4.68575126 -0.08011508
    11      b 0.08401255 -0.04767590
    12      b 2.19938816  4.18954758
    13      c 3.05634353 -0.89257491
    14      c 2.28659319  2.63171152
    15      c 4.70525275  1.31450497
    16      c 4.02673050 -1.86270620
    17      c 5.03640599  2.48564201
    18      c 0.95704183  1.27446410
  • the arguments of do() can be named or unnamed:
    • named arguments (more than one can be supplied) become list-columns, with one element for each group:
    # Tail (last 3 obs) of x by group
    ds %>% group_by(group) %>% do(out=tail(.$x, 3))

    Source: local data frame [3 x 2]
    Groups: <by row>
    
       group      out
      (fctr)    (chr)
    1      a <dbl[3]>
    2      b <dbl[3]>
    3      c <dbl[3]>

    • an unnamed argument (only one can be supplied) must be a data frame; the group labels are repeated for each row it returns:
    # Tail (last 3 obs) of x by group
    ds %>% group_by(group) %>% do(data.frame(out=tail(.$x, 3)))

    Source: local data frame [9 x 2]
    Groups: group [3]
    
       group       out
      (fctr)     (dbl)
    1      a 3.8270397
    2      a 0.6426337
    3      a 0.6519305
    4      b 3.3238824
    5      b 0.8290942
    6      b 4.1538746
    7      c 6.5861213
    8      c 4.6280643
    9      c 0.3599512
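
The difference between the two forms is easier to see when both results are stored and inspected. The following is a minimal sketch, assuming a dplyr version in which do() is still available; the object names named, unnamed, last_a and flat are ours and are used only for illustration.

```r
library(dplyr)

# Same simulated data as above
set.seed(100)
ds <- data.frame(group = rep(c("a", "b", "c"), each = 100),
                 x = rnorm(n = 300, mean = 3, sd = 2),
                 y = rnorm(n = 300, mean = 2, sd = 2))

# Named argument: one row per group, results stored in the list-column `out`
named <- ds %>% group_by(group) %>% do(out = tail(.$x, 3))

# Unnamed argument: one data frame per group, stacked with the label repeated
unnamed <- ds %>% group_by(group) %>% do(data.frame(out = tail(.$x, 3)))

# A single element of the list-column is reached with [[ ]]
last_a <- named$out[[1]]   # last three x values of group "a"

# Flattening the list-column gives the same values as the unnamed version
flat <- unlist(named$out)

dim(named)    # one row per group
dim(unnamed)  # three rows per group
```

The list-column form is convenient when the per-group result is not rectangular; the unnamed form is convenient when the result should go straight into further dplyr verbs.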

do() works in the same way with user-defined functions.

Let us define the following function, which performs two simple operations returning a data frame:

my_fun <- function(x, y){
  res_x <- mean(x) + 2
  res_y <- mean(y) * 5
  return(data.frame(res_x, res_y))
}

If the argument is named, the result is:

# Apply my_fun() function to ds by group
ds %>% group_by(group) %>% do(out=my_fun(x=.$x, y=.$y))

Source: local data frame [3 x 2]
Groups: <by row>

   group                out
  (fctr)              (chr)
1      a <data.frame [1,2]>
2      b <data.frame [1,2]>
3      c <data.frame [1,2]>

Otherwise, if the argument is unnamed, the result is:

# Apply my_fun() function to ds by group
ds %>% group_by(group) %>% do(my_fun(x=.$x, y=.$y))

Source: local data frame [3 x 3]
Groups: group [3]

   group    res_x     res_y
  (fctr)    (dbl)     (dbl)
1      a 5.005825  9.167546
2      b 5.022282  8.683619
3      c 5.025586 11.240558
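
To make the relationship between the two results concrete, the list-column produced by the named call can be stacked back into a regular table and compared with the unnamed call. This is only a sketch: the object names stacked and check are ours, and the summarise() cross-check is added here for illustration and is not part of the original example.

```r
library(dplyr)

# Same simulated data as above
set.seed(100)
ds <- data.frame(group = rep(c("a", "b", "c"), each = 100),
                 x = rnorm(n = 300, mean = 3, sd = 2),
                 y = rnorm(n = 300, mean = 2, sd = 2))

my_fun <- function(x, y) {
  data.frame(res_x = mean(x) + 2, res_y = mean(y) * 5)
}

# Unnamed argument: one row per group, group label added automatically
unnamed <- ds %>% group_by(group) %>% do(my_fun(x = .$x, y = .$y))

# Named argument: each group's data frame is stored in the list-column `out`
named <- ds %>% group_by(group) %>% do(out = my_fun(x = .$x, y = .$y))

# Stacking the list-column recovers the columns of the unnamed result
stacked <- bind_rows(named$out)

# Cross-check: the same quantities computed with summarise()
check <- ds %>%
  group_by(group) %>%
  summarise(res_x = mean(x) + 2, res_y = mean(y) * 5)
```

For a function that returns a one-row data frame, the unnamed form is usually the more direct choice, since no unpacking step is needed afterwards.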

Programming with do_() (Standard Evaluation Version)

How can we enclose the previous operations inside a function? Simple: use do_() (the SE version of do()) together with the interp() function of the lazyeval package.
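
As a point of comparison, here is a sketch of an alternative that is not the do_()/interp() approach: do_() has been deprecated in later dplyr releases, so this version keeps plain do() and passes the column names as character strings. It assumes dplyr 1.0 or later for across() and all_of(); the wrapper name do_by_group and its argument names are hypothetical.

```r
library(dplyr)

# Same simulated data as above
set.seed(100)
ds <- data.frame(group = rep(c("a", "b", "c"), each = 100),
                 x = rnorm(n = 300, mean = 3, sd = 2),
                 y = rnorm(n = 300, mean = 2, sd = 2))

my_fun <- function(x, y) {
  data.frame(res_x = mean(x) + 2, res_y = mean(y) * 5)
}

# Hypothetical wrapper: all column names are supplied as strings
do_by_group <- function(data, group_col, x_col, y_col) {
  data %>%
    group_by(across(all_of(group_col))) %>%       # group by a column given by name
    do(my_fun(x = .[[x_col]], y = .[[y_col]]))    # . is the current group's data
}

res <- do_by_group(ds, "group", "x", "y")

# Reference result from the interactive call used earlier
direct <- ds %>% group_by(group) %>% do(my_fun(x = .$x, y = .$y))
```

Inside the wrapper, .[[x_col]] extracts a column of the current group by name, which takes the place of the .$x shorthand used interactively.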

Continue reading on Quantide blog…

The post dplyr do: Some Tips for Using and Programming appeared first on MilanoR.
