R Programming - Scoping Rules

skyCeleste.x

Last modified 2022-05-21 00:58:31

Tags: R language, development language

First published 2022-05-13 21:43:16

Article link: https://blog.csdn.net/jeonghin/article/details/124757504

Binding Values to Symbols

The global environment (the user’s workspace) is always the first element of the search list, and the base package is always the last.
The order of the packages on the search list matters.
Users can configure which packages get loaded on startup, so you cannot assume that a fixed set of packages will be available.
When a user loads a package with library(), the namespace of that package is put in position 2 of the search list (by default) and everything else is shifted down the list.
Note that R has separate namespaces for functions and non-functions, so it is possible to have an object named c and a function named c.
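These rules can be checked interactively with search(). The sketch below assumes a fresh R session in which the splines package (one of the packages shipped with R) has not been attached yet; splines is used only as an example of a package to load.

```r
# The search list: the global environment comes first, base comes last
before <- search()
print(before)

# Attaching a package puts it in position 2 by default
library(splines)
after <- search()
print(after[2])   # "package:splines"

# A non-function object named c does not hide the function c():
# when a symbol is used in a call, R skips bindings that are not functions
c <- 5
v <- c(1, 2, 3)
print(v)          # [1] 1 2 3
print(c)          # [1] 5
rm(c)             # remove our object so only base::c remains
```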

Scoping Rules

Scoping is the mechanism within R that determines how R finds symbols (i.e. programming language elements) to retrieve their values during the execution of an R script.

The scoping rules are the main feature that makes R different from the original S language.

The scoping rules determine how a value is associated with a free variable in a function
R uses lexical scoping or static scoping. A common alternative is dynamic scoping
Related to the scoping rules is how R uses the search list to bind a value to a symbol
Lexical scoping turns out to be particularly useful for simplifying statistical computations

R supports two types of scoping: lexical scoping, which is implemented automatically at the language level, and dynamic scoping, which is used only in a few select functions to save typing during interactive analysis. Lexical scoping looks up the values of symbols based on how functions were nested when they were written, not on how they are called.

Lexical scoping

Consider the following function:

f <- function(x, y) {
		x^2 + y / z
}

This function has two formal arguments, x and y. In the body of the function there is another symbol, z; in this case z is called a free variable. The scoping rules of a language determine how values are assigned to free variables. Free variables are neither formal arguments nor local variables (variables assigned inside the function body).
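A minimal sketch of what happens to the free variable z, assuming a session where no z has been defined yet (the snippet removes any global z first). The exact error wording can vary with the locale.

```r
f <- function(x, y) {
    x^2 + y / z   # x and y are formal arguments; z is a free variable
}

# Make sure no z is lying around in the workspace
if (exists("z", envir = globalenv(), inherits = FALSE)) rm(z)

# With no binding for z anywhere on the search path, the call fails
res <- tryCatch(f(2, 4), error = function(err) conditionMessage(err))
print(res)        # e.g. "object 'z' not found" in an English locale

# Once z is defined in the global environment (where f was defined),
# the free variable is found there
z <- 2
print(f(2, 4))    # 2^2 + 4/2 = 6
```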

Lexical scoping in R means that the values of free variables are searched for in the environment in which the function was defined

What is an environment?

An environment is a collection of (symbol, value) pairs, e.g. x is a symbol and 3.14 might be its value.
Every environment has a parent environment; it is possible for an environment to have multiple “children”.
The only environment without a parent is the empty environment.
A function + an environment = a closure or function closure
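The properties listed above can be explored directly with new.env(), assign(), get() and parent.env(). In this sketch the names e, e2 and add_x are illustrative only.

```r
# An environment holds (symbol, value) pairs
e <- new.env(parent = globalenv())
assign("x", 3.14, envir = e)      # bind the symbol x to the value 3.14 in e
print(get("x", envir = e))        # [1] 3.14
print(ls(e))                      # [1] "x"

# Every environment has a parent; e's parent is the global environment
print(identical(parent.env(e), globalenv()))      # [1] TRUE

# Several environments can share the same parent ("children")
e2 <- new.env(parent = globalenv())
print(identical(parent.env(e2), parent.env(e)))   # [1] TRUE

# The empty environment is the only one without a parent
msg <- tryCatch(parent.env(emptyenv()),
                error = function(err) conditionMessage(err))
print(msg)

# A closure is a function together with an environment:
# here free variables of add_x are looked up in e
add_x <- function(y) x + y
environment(add_x) <- e
print(add_x(1))                   # [1] 4.14
```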

Searching for the value for a free variable:

If the value of a symbol is not found in the environment in which a function was defined, then the search is continued in the parent environment.
The search continues down the sequence of parent environments until we hit the top-level environment; this is usually the global environment (workspace) or the namespace of a package.
After the top-level environment, the search continues down the search list until we hit the empty environment. If a value for a given symbol cannot be found once the empty environment is arrived at, then an error is thrown.
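The search order can be traced with a small sketch (outer_fn, inner_fn and h are hypothetical examples): a is found in the defining environment, b in the global environment, and pi further down in package:base, while a symbol bound nowhere produces an error. The loop walks parent.env() from the global environment and should visit the same entries that search() lists.

```r
outer_fn <- function() {
    a <- 1
    inner_fn <- function() {
        # a:  found in outer_fn's environment, where inner_fn was defined
        # b:  not found there, so the search moves on to the global environment
        # pi: not in the global environment either, so the search continues
        #     down the search list until it reaches package:base
        c(a, b, pi)
    }
    inner_fn()
}
b <- 2
print(outer_fn())   # [1] 1.000000 2.000000 3.141593

# Walk the chain of parents from the global environment to the empty one
env <- globalenv()
chain <- character(0)
while (!identical(env, emptyenv())) {
    chain <- c(chain, environmentName(env))
    env <- parent.env(env)
}
print(chain)        # starts with "R_GlobalEnv", ends with "base"

# A symbol bound nowhere, not even in base, is an error
h <- function() undefined_symbol_xyz + 1
res <- tryCatch(h(), error = function(err) "error: symbol not found")
print(res)
```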

Why does all this matter?

Typically, a function is defined in the global environment, so that the values of free variables are just found in the user’s workspace.
This behavior is logical for most people and is usually the “right thing” to do
However, in R you can have functions defined inside other functions
- Languages like C don’t let you do this
Now things get interesting: in this case the environment in which a function is defined is the body of another function!

make.power <- function(n) {
        pow <- function(x) {
                x^n
        }
        pow
}

This function returns another function as its value

> cube <- make.power(3)
> square <- make.power(2)
> cube(3)
[1] 27
> square(3)
[1] 9

Exploring a Function Closure

What’s in a function’s environment?

> ls(environment(cube))
[1] "n"   "pow"
> get("n", environment(cube))
[1] 3

> ls(environment(square))
[1] "n"   "pow"
> get("n", environment(square))
[1] 2
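The closures cube and square share the same code but carry different environments. This sketch repeats make.power so that it runs on its own and then checks that relationship; changing n with assign() is done only to demonstrate that the value is read from the closure's environment at call time.

```r
make.power <- function(n) {
    pow <- function(x) {
        x^n
    }
    pow
}
cube <- make.power(3)
square <- make.power(2)

# Each call to make.power() creates a fresh environment for the closure it returns
print(identical(environment(cube), environment(square)))      # [1] FALSE

# That environment's parent is where make.power itself was defined
print(identical(parent.env(environment(cube)), globalenv()))  # [1] TRUE

# The closure reads n from its own environment when it is called,
# so changing the binding there changes what cube() computes
assign("n", 4, envir = environment(cube))
print(cube(3))     # [1] 81
print(square(3))   # [1] 9 (square's environment is untouched)
```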

Lexical vs. Dynamic Scoping

y <- 10
f <- function(x) {
        y <- 2
        y^2 + g(x)
}

g <- function(x) {
        x*y
}

> f(3)
[1] 34

With lexical scoping the value of y in the function g is looked up in the environment in which the function was defined, in this case the global environment, so the value of y is 10
With dynamic scoping, the value of y is looked up in the environment from which the function was called (sometimes referred to as the calling environment)
- In R the calling environment is known as the parent frame
So the value of y would be 2
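R itself always resolves free variables lexically, but the difference can be made visible by looking a value up in the caller's frame by hand. In the sketch below, g_dynamic is a hypothetical helper that emulates dynamic scoping with get() and parent.frame(); ordinary R functions do not behave this way.

```r
y <- 10

# Ordinary R function: y is looked up lexically (in the global environment)
g_lexical <- function(x) {
    x * y
}

# Emulated dynamic scoping: look y up in the calling frame instead
g_dynamic <- function(x) {
    x * get("y", envir = parent.frame())
}

f <- function(x) {
    y <- 2
    c(lexical = y^2 + g_lexical(x),   # 4 + 3 * 10
      dynamic = y^2 + g_dynamic(x))   # 4 + 3 * 2
}

r <- f(3)
print(r)   # lexical = 34, dynamic = 10
```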

When a function is defined in the global environment and is subsequently called from the global environment, then the defining environment and the calling environment are the same. This can sometimes give the appearance of dynamic scoping

> g <- function(x) {
+   x*y
+ }
> g(2)
Error in g(2) : object 'y' not found
> y <- 3
> g(2)
[1] 6

Other Languages

Other languages that support lexical scoping include:

Scheme
Perl
Python
Common Lisp (all languages converge to Lisp)

Consequence of Lexical Scoping

In R, all objects must be stored in memory
All functions must carry a pointer to their respective defining environments, which could be anywhere
In S-PLUS, free variables are always looked up in the global workspace, so everything can be stored on the disk because the “defining environment” of all functions is the same

Application: Optimization

Why is any of this information useful?

Optimization routines in R like optim, nlm, and optimize require you to pass a function whose argument is a vector of parameters (e.g. a log-likelihood).
However, an objective function might depend on a host of other things besides its parameters (like data).
When writing software which does optimization, it may be desirable to allow the user to hold certain parameters fixed

Maximizing a Normal Likelihood

Write a “constructor” function

make.NegLogLik <- function(data, fixed=c(FALSE,FALSE)){
		params <- fixed
		function(p) {
				params[!fixed] <- p
				mu <- params[1]
				sigma <- params[2]
				a <- -0.5*length(data)*log(2*pi*sigma^2)
				b <- -0.5*sum((data-mu)^2) / (sigma^2)
				-(a + b)
		}
}

Note: Optimization functions in R minimize functions, so you need to use the negative log-likelihood
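As a sanity check of the formula, the sketch below restates make.NegLogLik so that it runs on its own and compares its value with the negative sum of log-densities from dnorm(); the test point (mu = 1, sigma = 2) is an arbitrary choice.

```r
make.NegLogLik <- function(data, fixed = c(FALSE, FALSE)) {
    params <- fixed
    function(p) {
        params[!fixed] <- p
        mu <- params[1]
        sigma <- params[2]
        a <- -0.5 * length(data) * log(2 * pi * sigma^2)
        b <- -0.5 * sum((data - mu)^2) / (sigma^2)
        -(a + b)   # negative log-likelihood, so optimizers can minimize it
    }
}

set.seed(1)
normals <- rnorm(100, 1, 2)
nLL <- make.NegLogLik(normals)

# Compare with the negative sum of log-densities from dnorm()
mine <- nLL(c(1, 2))
builtin <- -sum(dnorm(normals, mean = 1, sd = 2, log = TRUE))
print(all.equal(mine, builtin))   # [1] TRUE

# data, fixed and params live in the closure's environment
print(ls(environment(nLL)))       # [1] "data"   "fixed"  "params"
```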

set.seed(1); normals <- rnorm(100, 1, 2)
nLL <- make.NegLogLik(normals)
nLL
function(p) {
				params[!fixed] <- p
				mu <- params[1]
				sigma <- params[2]
				a <- -0.5*length(data)*log(2*pi*sigma^2)
				b <- -0.5*sum((data-mu)^2) / (sigma^2)
				-(a + b)
		}
<environment: 0x165b1a4>
ls(environment(nLL))
[1] "data"   "fixed"  "params"

Estimating Parameters

optim(c(mu = 0, sigma = 1), nLL)$par
      mu    sigma
1.218239  1.787343

Fixing sigma = 2

nLL <- make.NegLogLik(normals, c(FALSE, 2))
optimize(nLL, c(-1, 3))$minimum
[1] 1.217775

Fixing mu = 1

nLL <- make.NegLogLik(normals, c(1, FALSE))
optimize(nLL, c(1e-6, 10))$minimum
[1] 1.800596
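For a normal sample the maximum likelihood estimates have closed forms: the sample mean for mu, and the square root of the average squared deviation (dividing by n, not n - 1) for sigma. This sketch, which restates make.NegLogLik so it is self-contained, compares those formulas with optim() and optimize(); the tolerances are rough choices, since both optimizers stop at a finite precision.

```r
make.NegLogLik <- function(data, fixed = c(FALSE, FALSE)) {
    params <- fixed
    function(p) {
        params[!fixed] <- p
        mu <- params[1]
        sigma <- params[2]
        a <- -0.5 * length(data) * log(2 * pi * sigma^2)
        b <- -0.5 * sum((data - mu)^2) / (sigma^2)
        -(a + b)
    }
}

set.seed(1)
normals <- rnorm(100, 1, 2)
n <- length(normals)

# Closed-form maximum likelihood estimates
mu_hat <- mean(normals)                          # MLE of mu
sigma_hat <- sqrt(sum((normals - mu_hat)^2) / n) # MLE of sigma
sigma_hat_mu1 <- sqrt(sum((normals - 1)^2) / n)  # MLE of sigma when mu is fixed at 1

# Both parameters free (Nelder-Mead by default)
est <- optim(c(mu = 0, sigma = 1), make.NegLogLik(normals))$par
# sigma fixed at 2: the best mu is still the sample mean
est_mu <- optimize(make.NegLogLik(normals, c(FALSE, 2)), c(-1, 3))$minimum
# mu fixed at 1
est_sigma <- optimize(make.NegLogLik(normals, c(1, FALSE)), c(1e-6, 10))$minimum

print(rbind(numeric = est, closed_form = c(mu_hat, sigma_hat)))
print(c(est_mu, mu_hat))
print(c(est_sigma, sigma_hat_mu1))
```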

Plotting the Likelihood

nLL <- make.NegLogLik(normals, c(1, FALSE))   # mu fixed at 1: likelihood for sigma
x <- seq(1.7, 1.9, len=100)
y <- sapply(x, nLL)
plot(x, exp(-(y-min(y))), type="l")

nLL <- make.NegLogLik(normals, c(FALSE, 2))   # sigma fixed at 2: likelihood for mu
x <- seq(0.5, 1.5, len=100)
y <- sapply(x, nLL)
plot(x, exp(-(y-min(y))), type="l")

Coding Standards

always use text files / text editor
indent your code
limit the width of your code (80 columns?)
limit the length of your individual functions

Dates and Times in R

R has developed a special representation of dates and times

Dates are represented by the Date class.
Times are represented by the POSIXct or the POSIXlt class.
Dates are stored internally as the number of days since 1970-01-01.
Times (POSIXct) are stored internally as the number of seconds since 1970-01-01 00:00:00 UTC.

> x <- as.Date("1970-01-01")
> x
[1] "1970-01-01"
> unclass(x)
[1] 0
> unclass(as.Date("1970-01-02"))
[1] 1
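The internal representation can be checked in both directions. This small sketch assumes R's standard 1970-01-01 origin; the time zone is pinned to UTC so the POSIXct value does not depend on the machine's local time zone.

```r
d <- as.Date("2012-01-01")
print(unclass(d))                              # [1] 15340 days since 1970-01-01
print(as.numeric(d - as.Date("1970-01-01")))   # the same count, via subtraction

# Going the other way: a day count plus an origin gives back a Date
print(as.Date(15340, origin = "1970-01-01"))   # [1] "2012-01-01"

# POSIXct counts seconds: one day after the origin (in UTC) is 86400
t1 <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
print(as.numeric(t1))                          # [1] 86400
```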

Times in R

Times are represented using the POSIXct or the POSIXlt class

POSIXct is just a very large number (seconds since 1970-01-01 UTC) under the hood; it is a useful class when you want to store times in something like a data frame.
POSIXlt is a list underneath, and it stores a bunch of other useful information like the day of the week, day of the year, month, and day of the month.

There are a number of generic functions that work on dates and times

weekdays: give the day of the week
months: give the month name
quarters: give the quarter number (“Q1”, “Q2”, “Q3”, or “Q4”)
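A short sketch of the three helpers on a few sample dates. Day and month names come out in the language of the current locale, so only quarters() is shown with a fixed expected value.

```r
d <- as.Date(c("2012-01-01", "2012-05-13", "2012-10-25"))
print(weekdays(d))   # day names, in the current locale's language
print(months(d))     # month names, in the current locale's language
print(quarters(d))   # [1] "Q1" "Q2" "Q4"

# The same helpers also work on POSIXct/POSIXlt times
tm <- as.POSIXct("2012-10-25 01:00:00", tz = "UTC")
print(quarters(tm))  # [1] "Q4"
```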

> x <- Sys.time()
> x
[1] "2022-05-13 21:03:02 CST"
> p <- as.POSIXlt(x)
> names(unclass(p))
 [1] "sec"    "min"    "hour"   "mday"   "mon"    "year"  
 [7] "wday"   "yday"   "isdst"  "zone"   "gmtoff"
> p$sec
[1] 2.773116

Times can be coerced from a character string using the as.POSIXlt or as.POSIXct function

> x <- Sys.time()
> x
[1] "2022-05-13 21:05:18 CST"
> unclass(x)
[1] 1652447118
> x$sec
Error in x$sec : $ operator is invalid for atomic vectors
> p <- as.POSIXlt(x)
> p$sec
[1] 18.18829
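The console session above converts a POSIXct value but never actually parses a character string. A minimal sketch of that coercion, pinning the time zone to UTC so the numbers do not depend on the local machine:

```r
# Parse the same clock time as POSIXct and as POSIXlt
ct <- as.POSIXct("2012-10-25 01:00:00", tz = "UTC")
lt <- as.POSIXlt("2012-10-25 01:00:00", tz = "UTC")

print(class(ct))        # [1] "POSIXct" "POSIXt"
print(as.numeric(ct))   # [1] 1351126800 (seconds since 1970-01-01 UTC)

# POSIXlt fields: note the offsets, which often trip people up
print(lt$hour)          # [1] 1
print(lt$mon)           # [1] 9   months are 0-based (9 = October)
print(lt$year)          # [1] 112 years since 1900
print(lt$wday)          # [1] 4   day of the week, 0 = Sunday (4 = Thursday)

# Both classes describe the same instant
print(as.numeric(as.POSIXct(lt)) == as.numeric(ct))  # [1] TRUE
```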

Finally, there is the strptime function in case your dates are written in a different format

> datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")
> x <- strptime(datestring, "%B %d, %Y %H:%M")
> x
[1] "2012-01-10 10:40:00 CST" "2011-12-09 09:10:00 CST"
> class(x)
[1] "POSIXlt" "POSIXt"
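The %B (full month name) conversion in the example relies on the locale's month names. A sketch using purely numeric fields avoids that dependency; the dates mirror the example above, and tz = "UTC" is an added assumption for reproducibility.

```r
# Day/month/year with numeric fields, so parsing does not depend on the locale
x <- strptime(c("10/01/2012 10:40", "09/12/2011 09:10"),
              "%d/%m/%Y %H:%M", tz = "UTC")
print(x)             # [1] "2012-01-10 10:40:00 UTC" "2011-12-09 09:10:00 UTC"
print(x$mday)        # [1] 10  9

# A string that does not match the format gives NA rather than an error
bad <- strptime("2012-13-45", "%Y-%m-%d", tz = "UTC")
print(is.na(bad))    # [1] TRUE

# as.Date() accepts a format string too
print(as.Date("10/01/2012", format = "%d/%m/%Y"))  # [1] "2012-01-10"
```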

Operations on Dates and Times

You can use mathematical operations on dates and times. Well, really just + and -. You can do comparisons too (e.g. ==, <=).

> x <- as.Date("2012-01-01")
> y <- strptime("9 Jan 2011 11:34:21", "%d %b %Y %H:%M:%S")
> x-y
Error in x - y : non-numeric argument to binary operator
In addition: Warning message:
Incompatible methods ("-.Date", "-.POSIXt") for "-" 
> x <- as.POSIXlt(x)
> x-y
Time difference of 356.8511 days
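The 356.8511-day result above most likely includes a time-zone offset: as.POSIXlt() on a Date gives midnight UTC, whereas strptime() without tz uses the local time zone (CST, UTC+8, in this session), which adds 8 hours, or one third of a day. Giving both values the same class and an explicit time zone, as in this sketch, makes the difference reproducible.

```r
# Give both times the same class and an explicit time zone
x <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")
y <- as.POSIXct("2011-01-09 11:34:21", tz = "UTC")

d <- x - y
print(d)                                            # Time difference of 356.5178 days
print(as.numeric(difftime(x, y, units = "hours")))  # about 8556.43 hours

# Comparisons work as well
print(x > y)                                        # [1] TRUE
```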

R even keeps track of leap years, leap seconds, daylight saving time, and time zones.

> x <- as.Date("2012-03-01") 
> y <- as.Date("2012-02-28")
> x-y
Time difference of 2 days
> x <- as.POSIXct("2012-10-25 01:00:00")
> y <- as.POSIXct("2012-10-25 06:00:00", tz="GMT")
> y-x
Time difference of 13 hours
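The 13-hour answer above depends on the session's local time zone: it is consistent with CST meaning China Standard Time (UTC+8), since 01:00 in UTC+8 is 17:00 GMT on the previous day. The sketch below names the zone explicitly (Asia/Shanghai is an assumption standing in for the author's zone) and adds a non-leap-year comparison.

```r
# Leap years: 2012 has a February 29th, 2011 does not
print(as.Date("2012-03-01") - as.Date("2012-02-28"))   # Time difference of 2 days
print(as.Date("2011-03-01") - as.Date("2011-02-28"))   # Time difference of 1 days

# Time zones: 01:00 in Shanghai (UTC+8) is 17:00 GMT on the previous day
x <- as.POSIXct("2012-10-25 01:00:00", tz = "Asia/Shanghai")
y <- as.POSIXct("2012-10-25 06:00:00", tz = "GMT")
print(y - x)                                           # Time difference of 13 hours
```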