1. Some prerequisites
Assume you find some R functions become bottlenecks in your R scripts and dramatically reduce the speed of your code, you can consider transforming them into some equivalent C++ function. You can store these C++ equivalents in a “.cpp” file and call them in your R script with sourceCpp()
.
In R source file, where you want to call the functions in test.cpp
file, use the following code
library(Rcpp)
Sys.setenv("PKG_CXXFLAGS"="-std=c++11")
sourceCpp("test.cpp")
The second line above can let us use C++11 with Rcpp.
In test.cpp
file, use the following code
#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
After, //[[Rcpp::export]]
, you can add your customized C++ functions.
2. Some examples of Rcpp functions with different types of input and output
2.1. Scalar input and Scalar output
In the first example, we implement a univariate sign(x) function which return 1 if x is positive, -1 if x is negative and 0 if x is 0.
R code:
signR <- function(x) {
if (x > 0) {
1
} else if (x == 0) {
0
} else {
-1
}
}
C++ equivalent:
int signC(int x) {
if (x > 0) {
return 1;
} else if (x == 0) {
return 0;
} else {
return -1;
}
}
2.2. Vector input and Scalar output
Here we want to calculate the summation of a vector:
R code:
sumR <- function(x) {
total <- 0
for (i in seq_along(x)) {
total <- total + x[i]
}
total
}
C++ equivalent:
double sumC(NumericVector x) {
int n = x.size();
double total = 0;
for(int i = 0; i < n; ++i) {
total += x[i];
}
return total;
}
sumR()
will be several orders of magnitude slower thansumC()
.- use
.size()
to calculate the length of a vector.
2.3. Vector input and Vector output
In this example, we computes the Euclidean distance between a value and a vector of values.
R code:
pdistR <- function(x, ys) {
sqrt((x - ys) ^ 2)
}
C++ equivalent:
NumericVector pdistC(double x, NumericVector ys) {
int n = ys.size();
NumericVector out(n);
for(int i = 0; i < n; ++i) {
out[i] = sqrt(pow(ys[i] - x, 2.0));
}
return out;
}
- In C++, use
pow()
instead of^
. - We create a new numeric vector of length n with a constructor:
NumericVector out(n)
. Another useful way of making a vector is to copy an existing one:NumericVector zs = clone(ys)
.
2.4. Matrix input and Vector output
In this example, we calculate the row sums of a matrix with rowSums() and a C++ equivalent.
cppFunction('NumericVector rowSumsC(NumericMatrix x) {
int nrow = x.nrow(), ncol = x.ncol();
NumericVector out(nrow);
for (int i = 0; i < nrow; i++) {
double total = 0;
for (int j = 0; j < ncol; j++) {
total += x(i, j);
}
out[i] = total;
}
return out;
}')
set.seed(1014)
x <- matrix(sample(100), 10)
rowSums(x)
## [1] 458 558 488 458 536 537 488 491 508 528
rowSumsC(x)
## [1] 458 558 488 458 536 537 488 491 508 528
- Each vector type has a matrix equivalent:
NumericMatrix
,IntegerMatrix
,CharacterMatrix
, andLogicalMatrix
. - In C++, you subset a matrix with (), not [].
- Use
.nrow()
and.ncol()
methods to get the dimensions of a matrix.
2.5. Matrix input and Matrix output
We have two p×q matrices and we want to calculate the sum of these two matrices. I do it elementwisely.
NumericMatrix sumC(NumericMatrix x, NumericMatrix y){
int nrow=x.nrow(), ncol=x.ncol();
NumericMatrix out(nrow,ncol);
for (int i=0;i<nrow;++i){
for (int j=0;j<ncol;++j){
out(i,j)=x(i,j)+y(i,j);
}
}
return out;
}
- We create a
p×q
matrix with
NumericMatrix out(p,q);
.
3. Core data types
3.1. Rcpp Classes
Here are some commonly used Rcpp classes.
Rcpp Classes | Detailes |
---|---|
IntegerVector | for vectors of type integer |
NumericVector | for vectors of type numeric |
LogicalVector | for vectors of type logical |
CharacterVector | for vectors of type character |
GenericVector | for generic vectors which implement List types |
ExpressionVector | for vectors of expression types |
RawVector | for vectors of type raw |
IntegerMatrix | for matrix of type integer |
NumericMatrix | for matrix of type numeric |
List | for lists |
DataFrame | for data frames |
3.2. Vector Classes
Vector is one of the most commonly used classes in Rcpp. Rcpp functions can directly accept R vectors as input and output a Rcpp Vector classes (will be automatically transformed to R vectors when call this function in R). Moreover, if the Rcpp function receives a list, for example, which contains a R vector, and we want to assign this R vector to a C++ object, we can use as<>()
for this conversion. For instance,
X=as<NumericVector>(mod["residuals"]);
where X
is C++ vector and mod
is R list which contains a R vector mod["residuals"]
. Also, we can use wrap()
function for the inverse direction, but this is often done implicitly when return a C++ vector.
3.2.1. The NumericVector Class
NumericVector
is quite the most commonly used vector type among Rcpp classes. This object corresponds to the numeric R vector which stores real-values floating-point variables. Its storage is double
.
Initialization:
NumericVector x(n);
Here we initialized a
NumericVector
calledx
which has lengthn
(n
is selective).Slicing:
We can take the i th element ofx
asx[i]
.Methods:
x.size()
: length of the vector.
x.push_back()
: add a new element in the end
x.push_front()
: add a new element in frontExample:
Let’s calculate the exponential of each element of a vector with a given exponent.// [[Rcpp::export]] NumericVector powC(NumericVector x, double y){ NumericVector out(x.size()); for (auto i=x.begin(),j=out.begin();i!=x.end();++i,++j){ *j=pow((*i),y); } for (int i=0;i<out.size();++i){ std::cout<<out[i]<<" "; } std::cout<<std::endl; return out; } /*** R x=rnorm(5,0,1) y=2; C_square=powC(x,y); R_square=x^2; cbind(x,R_square,C_square) */ > C_square=powC(x,y); 0.547511 0.107676 0.471773 2.08374 0.0868845 > cbind(x,R_square,C_square) x R_square C_square [1,] -0.7399398 0.54751089 0.54751089 [2,] -0.3281404 0.10767615 0.10767615 [3,] -0.6868573 0.47177289 0.47177289 [4,] 1.4435153 2.08373656 2.08373656 [5,] 0.2947617 0.08688446 0.08688446
3.2.2. Multidimensional arrays induced by vector
Array also plays an important roles in modeling, and internally, array is implemented by vector with dimension attributes. Matrix classes we will talk later is a special case of array with two dimensional attributes.
- Initialization:
NumericVector out=NumericVector(Dimension(2,2,3));
3.2.3. Other Vector Class
LogicalVector
// [[Rcpp::export]] LogicalVector logicalVector(){ LogicalVector out(5); out[0]=false; out[1]=true; out[2]=R_NaN; out[3]=R_PosInf; out[4]=NA_REAL; return out; } /*** R logicalVector() */ > logicalVector() [1] FALSE TRUE NA NA NA
CharacterVector
// [[Rcpp::export]] CharacterVector charVector(){ CharacterVector out(3); out[0]="Hello"; out[1]="A"; out[2]="World"; return out; } /*** R charVector() */ > charVector() [1] "Hello" "A" "World"
3.3. Matrix Classes
As we mentioned above, each vector has a matrix equivalent. Here we only talk about NumericMatrix
.
3.3.1 NumericMatrix
Initialization:
NumericMatrix mat(nrow,ncol);
Here we initialized a
NumericMatrix
calledmat
which hasnrow
rows andncol
columns.Slicing:
We can take the(i,j) th element ofmat
asmat(i,j)
. If we want to make some change on a subset of a matrix, we may need other library such as RcppArmadillo.Method:
mat.ncol
: number of columns
mat.nrow
: number of rows.
3.4. List aka GenericVector
List
is known as a container for data with different type which is the same as the list object in R.
Example 1 (
List
to retrieve parameters from R)// [[Rcpp::export]] double retrieve_list(List mod) { double out=0; NumericVector x=as<NumericVector>(mod["a"]); NumericVector y=as<NumericVector>(mod["b"]); for (auto i=x.begin(),j=y.begin();i!=x.end()&&j!=y.end();++i,++j){ out+=(*i)/((*i)+(*j)); } return out; } /*** R a=c(1,4,3); b=c(12,23,24); mod=list(a=a,b=b); retrieve_list(mod) */ > retrieve_list(mod) [1] 0.3361823
Example 2 (
List
to return parameters to R)// [[Rcpp::export]] List return_list() { NumericMatrix y1(2,2); NumericVector y2(5); NumericVector tmp=rnorm(4); for (int i=0;i<2;++i){ for (int j=0;j<2;++j){ y1(i,j)=tmp[i+j]; } } y2=rnorm(5); return List::create(Named("y1")=y1, Named("y2")=y2); } /*** R return_list() */ > return_list() $y1 [,1] [,2] [1,] 2.838554 -1.1276921 [2,] -1.127692 -0.5148732 $y2 [1] -0.1708179 0.6451966 -0.5324484 -1.3813581 0.2408375
3.5. Data Frames
Data frames are an essential object type in R and are used by almost all modeling functions, so naturally Rcpp supports this type too. Internally, data frames are represented as lists.
// [[Rcpp::export]]
DataFrame create_DataFrame()
{
NumericVector a=NumericVector::create(1,2,3);
CharacterVector b=CharacterVector::create("a","b","c");
std::vector<std::string> c(3);
c[0]="A";c[1]="B";c[2]="C";
return DataFrame::create(Named("col1")=a,
Named("col2")=b,
Named("col3")=c);
}
/*** R
create_DataFrame()
*/
> create_DataFrame()
col1 col2 col3
1 100 a A
2 200 b B
3 300 c C
4. Rcpp Sugar
4.1. Operators
4.1.1. Binary arithmetic operators
Rcpp sugar defines the usual binary arithmetic operators : +, -, *, /.
// two numeric vectors of the same size
NumericVector x ;
NumericVector y ;
// expressions involving two vectors
NumericVector res = x + y ;
NumericVector res = x - y ;
NumericVector res = x * y ;
NumericVector res = x / y ;
// one vector, one single value
NumericVector res = x + 2.0 ;
NumericVector res = 2.0 - x;
NumericVector res = y * 2.0 ;
NumericVector res = 2.0 / y;
// two expressions
NumericVector res = x * y + y / 2.0 ;
NumericVector res = x * ( y - 2.0 ) ;
NumericVector res = x / ( y * y ) ;
4.1.2. Binary logical operators
Binary logical operators create a logical sugar expression from either two sugar expressions of the same type or one sugar expression and a primitive value of the associated type.
// two integer vectors of the same size
NumericVector x ;
NumericVector y ;
// expressions involving two vectors
LogicalVector res = x < y ;
LogicalVector res = x > y ;
LogicalVector res = x <= y ;
LogicalVector res = x >= y ;
LogicalVector res = x == y ;
LogicalVector res = x != y ;
// one vector, one single value
LogicalVector res = x < 2 ;
LogicalVector res = 2 > x;
LogicalVector res = y <= 2 ;
LogicalVector res = 2 != y;
// two expressions
LogicalVector res = ( x + y ) < ( x*x ) ;
LogicalVector res = ( x + y ) >= ( x*x ) ;
LogicalVector res = ( x + y ) == ( x*x ) ;
4.1.3. Unary operators
The unary operator- can be used to negate a (numeric) sugar expression, whereas the unary operator! negates a logical sugar expression:
4.2. Functions
is.na()
Produces a logical sugar expression of the same length. Each element of the result expression evaluates to TRUE if the corresponding input is a missing value, or FALSE otherwise.seq_len()
seq_len( 10 )
will generate an integer vector from 1 to 10 (Note: not from 0 to 9), which is very useful in conjugation withsapply()
andlapply()
.pmin(a,b)
andpmax(a,b)
a
andb
are two vectors.pmin()
(orpmax()
) compares the i <script type="math/tex" id="MathJax-Element-5">i</script>th elements ofa
andb
and return the smaller (larger) one.ifelse()
ifelse( x > y, x+y, x-y )
means ifx>y
is true, then do the addition; otherwise do the subtraction.sapply()
sapply
applies a C++ function to each element of the given expression to create a new expression. The type of the resulting expression is deduced by the compiler from the result type of the function.
The function can be a free C++ function such as the overload generated by the template function below:template <typename T> T square( const T& x){ return x * x ; } sapply( seq_len(10), square<int> ) ;
Alternatively, the function can be a functor whose type has a nested type called result_type
template <typename T>
struct square : std::unary_function<T,T> {
T operator()(const T& x){
return x * x ;
}
}
sapply( seq_len(10), square<int>() ) ;
lappy()
lapply
is similar to sapply except that the result is allways an list expression (an expression of typeVECSXP
).sign()
diff()
other functions
IntegerVector x; abs( x ) exp( x ) floor( x ) ceil( x ) pow(x, z) # x to the power of z
Reference
- Eddelbuettel, D. (2013). Seamless R and C++ Integration with Rcpp. Springer Publishing Company, Incorporated.
- Allaire, J.J. (2016). Rcpp Attributes.
- Eddelbuettel, D. (2016). Rcpp syntactic sugar.