esoteric R | Introducing Closures


Jeffrey A. Ryan
January 1, 2011

The R language provides object-oriented programming through two primary systems, known as S3 and S4. S3 implements a class-based dispatch mechanism, while S4 offers a more traditional object-oriented scheme. Both implementations utilize list-style constructs for objects and separate data from methods. A third mechanism, closures, offers the programmer the option of integrating methods within objects. This can be used as a lightweight object design with benefits that neither S3 nor S4 offer.

The Basics of a Closure

A closure in R is an object that contains functions bound to the environment the closure was created in. These functions keep access to the scope in which they were defined, which allows powerful design patterns that are difficult to achieve with the standard S3/S4 approach to objects in R.
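Before moving to environment objects, here is a minimal sketch of the underlying idea using a plain function. The factory `make_adder()` is hypothetical and not part of the article. Each function it returns keeps the frame of its own `make_adder()` call as its enclosing environment, so `n` is still reachable after the factory has returned.

```r
# Hypothetical function factory (not from the article).
make_adder <- function(n) {
  function(x) x + n  # 'n' is looked up in the enclosing frame, not passed in
}

add2  <- make_adder(2)
add10 <- make_adder(10)
add2(5)   # returns 7
add10(5)  # returns 15

# The captured value can be inspected through the function's environment.
get("n", envir = environment(add2))  # returns 2
```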

To create closures, we use the environment object in R. This allows data and methods to reside within the object instances, making self-aware behavior and selective inheritance easy. It's even possible to mix this with traditional R by assigning a class to the environment.

We'll start the exploration with an example of functionality found in other interpreted languages: the stack.

Example: A Stack in R

A stack implementation consists of three main components:

a container variable, a.k.a. the stack
a push method to add elements
a pop method to remove elements

The general idea is to be able to add elements to a container and modify the container in place. In R this is possible using some assignment tricks into .GlobalEnv, but that can be fraught with unintended consequences. Closures offer us a perfect alternative that keeps surprises to a minimum.
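To see why environments suit this job, compare them with lists, which R copies on modification. A minimal sketch; the helper names `add_to_list` and `add_to_env` are hypothetical:

```r
# Lists have copy-on-modify semantics: the function changes a local copy.
add_to_list <- function(lst, x) {
  lst$data <- c(lst$data, x)
  invisible(NULL)
}

# Environments are reference objects: the caller sees the change.
add_to_env <- function(env, x) {
  env$data <- c(env$data, x)
  invisible(NULL)
}

l <- list(data = numeric(0))
e <- new.env()
e$data <- numeric(0)

add_to_list(l, 1)
add_to_env(e, 1)

length(l$data)  # 0: the original list is unchanged
length(e$data)  # 1: the environment was modified in place
```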

First, we'll create a single environment that will act as the container, and then add to that environment a stack vector and the two methods, push and pop.

      s <- new.env()
  
      s$.Data <- vector()
      s$push <- function(x) .Data <<- c(.Data,x)
      s$pop  <- function() {
          tmp <- .Data[length(.Data)]
          .Data <<- .Data[-length(.Data)]
          return(tmp)
        }
      ls(s, all.names = TRUE)
      [1] ".Data" "pop" "push"

We use the double-arrow <<- assignment operator in the push function so that the assignment is not local: R searches the function's enclosing environments, starting with the parent of its evaluation frame, and rebinds the first .Data it finds (if no binding exists anywhere along that chain, the assignment is made in the global environment). This allows non-local modification of our .Data variable. The push method appends new data to the stack, and pop removes the last element of the stack and returns it to the caller. We use the $ operator to access the methods stored in our environment.
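Before trying the stack, it may help to see the <<- lookup rule in isolation. In this sketch the names `outer_fn`, `set_flag` and `made_by_superassign` are illustrative only. The inner function rebinds `total` in the frame where it was defined, while a function defined at the top level finds no existing binding and falls through to the global environment.

```r
outer_fn <- function() {
  total <- 0
  add <- function(x) total <<- total + x  # rebinds 'total' in outer_fn's frame
  add(5)
  add(10)
  total
}
outer_fn()  # returns 15

# No binding exists along the chain, so <<- assigns in the global environment.
set_flag <- function() made_by_superassign <<- TRUE
set_flag()
exists("made_by_superassign", envir = globalenv())  # TRUE
```

The second half is exactly the kind of side effect that makes relying on .GlobalEnv risky.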

      s$push(1)
      Error in s$push(1) : object '.Data' not found

Oops, something is wrong. It turns out that <<- can't find the .Data object stored in the s object. Because push and pop were created at the top level, their enclosing environment is the global environment rather than s, so R starts its search for .Data in the wrong place. Setting each function's environment to the object's environment with environment() and as.environment() fixes this.

      environment(s$push) <- as.environment(s)
      environment(s$pop) <- as.environment(s)

      s$push(1)   # works now
      s$pop()
      [1] 1
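Instead of creating the functions first and re-pointing their environments afterwards, they can be created inside the object's environment from the start. The sketch below uses base R's `local()` with its `envir` argument. This is an alternative, not the article's approach. Because `s2` is already an environment, no `as.environment()` call is needed.

```r
s2 <- new.env()

# Everything in this block is evaluated in s2, so the functions created here
# get s2 as their enclosing environment and <<- finds .Data directly.
local({
  .Data <- vector()
  push <- function(x) .Data <<- c(.Data, x)
  pop  <- function() {
    tmp <- .Data[length(.Data)]
    .Data <<- .Data[-length(.Data)]
    tmp
  }
}, envir = s2)

s2$push(1)
s2$push(2)
s2$pop()                            # returns 2
identical(environment(s2$pop), s2)  # TRUE
```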

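To make the calls look more like normal R, we can define S3 generics push and pop with methods for the "stack" class. These methods only dispatch on objects whose class is "stack", which the constructor below takes care of.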

      push <- function(x, value, ...) UseMethod("push")
      pop  <- function(x, ...) UseMethod("pop")
      push.stack <- function(x, value, ...) x$push(value)
      pop.stack  <- function(x, ...) x$pop()
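As built so far, s carries no class, so push(s, 1) would fail with "no applicable method". The following sketch repeats the stack so it runs on its own, attaches the class, and adds a `print` method. The print method is an illustrative assumption, not part of the article, and shows that ordinary S3 methods work on the classed environment.

```r
# Generics and methods as in the text.
push <- function(x, value, ...) UseMethod("push")
pop  <- function(x, ...) UseMethod("pop")
push.stack <- function(x, value, ...) x$push(value)
pop.stack  <- function(x, ...) x$pop()

# The stack environment from above, with its methods re-pointed at it.
s <- new.env()
s$.Data <- vector()
s$push <- function(x) .Data <<- c(.Data, x)
s$pop  <- function() {
  tmp <- .Data[length(.Data)]
  .Data <<- .Data[-length(.Data)]
  tmp
}
environment(s$push) <- s
environment(s$pop)  <- s

# UseMethod() dispatches on class(), so the environment needs the class.
class(s) <- "stack"

# Hypothetical print method for the classed environment.
print.stack <- function(x, ...) {
  cat("<stack:", length(x$.Data), "element(s)>\n")
  invisible(x)
}

push(s, 10)
push(s, 20)
pop(s)  # returns 20
s       # prints <stack: 1 element(s)>
```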

That completes our stack object. Unfortunately, we currently need to repeat most of the above code for each new "stack" object we'd like to create. A much better approach is to wrap it in a constructor function.

      new_stack <- function() { 
        stack <- new.env()
        stack$.Data <- vector()
        stack$push <- function(x) .Data <<- c(.Data,x)
        stack$pop  <- function() {
          tmp <- .Data[length(.Data)]
          .Data <<- .Data[-length(.Data)]
          return(tmp)
        }
        environment(stack$push) <- as.environment(stack)
        environment(stack$pop) <- as.environment(stack)
        class(stack) <- "stack"
        stack
      }
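A quick check that each call to the constructor produces an independent object. The constructor is repeated here, slightly condensed, so that the snippet runs on its own. The instance names `a` and `b` are arbitrary.

```r
# Constructor from the text (environment() is given the env directly).
new_stack <- function() {
  stack <- new.env()
  stack$.Data <- vector()
  stack$push <- function(x) .Data <<- c(.Data, x)
  stack$pop  <- function() {
    tmp <- .Data[length(.Data)]
    .Data <<- .Data[-length(.Data)]
    tmp
  }
  environment(stack$push) <- stack
  environment(stack$pop)  <- stack
  class(stack) <- "stack"
  stack
}

a <- new_stack()
b <- new_stack()
a$push(1:3)
b$push("x")

a$pop()  # returns 3
a$.Data  # 1 2
b$.Data  # "x": b has its own, separate .Data
```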

Not only can we now create stacks easily, we can also use this to extend the class with new functionality via inheritance.

Example: Making a Better Stack

A natural next step is to give our stack object additional "shift" and "unshift" methods. Using the new_stack constructor, we can derive from "stack" a new class called "betterstack".

      new_betterstack <- function() {
        stack <- new_stack()
        stack_env <- as.environment(stack)
        stack$shift   <- function(x) .Data <<- c(x, .Data)
        stack$unshift <- function() {
          tmp <- .Data[1]
          .Data <<- .Data[-1]
          return(tmp)
        }
        environment(stack$shift)   <- stack_env
        environment(stack$unshift) <- stack_env
        class(stack) <- c("betterstack", "stack")
        stack
      }

To make the experience more R-like, we again add S3 generics and methods for shift and unshift, as we did for push and pop. Putting it all together gives us a convenient stack-like object for R.

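      shift   <- function(x, value, ...) UseMethod("shift")
      unshift <- function(x, ...) UseMethod("unshift")
      shift.betterstack   <- function(x, value, ...) x$shift(value)
      unshift.betterstack <- function(x, ...) x$unshift()

      nb <- new_betterstack()
      push(nb, 1:3)

      nb$.Data
      [1] 1 2 3

      pop(nb)     # from the back
      [1] 3

      unshift(nb) # from the front
      [1] 1

      shift(nb, 3)
      push(nb, 1)
      nb$.Data
      [1] 3 2 1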

Conclusion

In this first installment on closures in R we covered a few of the basics: creating objects using environment objects, adding methods that act on private data, and even incorporating this into the traditional S3 landscape. Some simple usage patterns one may encounter include keeping track of 'static' data without relying on global variables (hint: create incr and decr methods for the .Data) and allowing methods to be overridden per instance.
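A minimal sketch of the incr/decr hint, combined with a per-instance override. It follows the article's pattern, but the constructor name `new_counter` and the `by` and `start` arguments are hypothetical choices.

```r
# Hypothetical constructor: state lives in .Data, and the methods'
# enclosing environment is the object itself.
new_counter <- function(start = 0) {
  obj <- new.env()
  obj$.Data <- start
  obj$incr <- function(by = 1) .Data <<- .Data + by
  obj$decr <- function(by = 1) .Data <<- .Data - by
  environment(obj$incr) <- obj
  environment(obj$decr) <- obj
  class(obj) <- "counter"
  obj
}

c1 <- new_counter()
c2 <- new_counter()
c1$incr(); c1$incr(); c1$decr()
c1$.Data  # 1

# Per-instance override: only c2 gets a different incr().
c2$incr <- function(by = 1) .Data <<- .Data + 10 * by
environment(c2$incr) <- c2
c2$incr()
c2$.Data  # 10
c1$incr()
c1$.Data  # 2: c1 still uses the original method
```

Because each method is stored in its own instance, replacing `c2$incr` leaves `c1` untouched, which is awkward to express with S3 or S4 dispatch alone.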

In future articles we'll examine some of the more nuanced behavior of closures in general, as well as explore how R's implementation differs from implementations in other well-known programming languages.

About the author

Jeffrey Ryan is the founder of lemnica corp., a Chicago firm specializing in statistical software, training, and on-demand support. He helps organize the R/Finance conference series [www.RinFinance.com], and is a frequent speaker on software-related topics. He is the author or co-author of a variety of R packages involving finance, large data, and visualizations including quantmod, xts, Defaults, IBrokers, RBerkeley, mmap, and indexing. He currently lives in Chicago, Illinois with his wife and three children.