Getting Started with Rcpp, by Nick Ulle

Latest recommended article published 2024-05-10 09:31:20

ronghuilin


Mixed-language programming with C and C++

1.Introduction

Compiled C and C++ routines can be called from R through its built-in foreign-function interface (for example, .Call).

R objects passed to these routines have type SEXP. A SEXP is a pointer to an encapsulated structure that holds the object’s type, value, and other attributes used by the R interpreter.

The R application programming interface (API) provides a limited set of macros and C routines for manipulating SEXPs and calling R functions.

The level of abstraction in the R API is low. Even simple tasks may require writing lengthy boilerplate code.

Using the R API from C++ is especially uncomfortable, because it doesn’t take advantage of any of C++’s features.

Rcpp is an R package that makes it easier to interface R and C++ code. Rcpp does this by providing a set of C++ wrapper classes for common R data types, as well as tools for automating the process of compiling and loading C++ routines for R.

2.Examples from the Article

Create a blank text file and enter the code:

    #include <Rcpp.h>

    // [[Rcpp::export]]
    void hello()
    {
        Rprintf("Hello, world!\n");
    }

Save the file as hello.cpp. Rprintf is part of the R API; its syntax is the same as printf.

Time to test the code! Start R and enter the commands:

    library(Rcpp)
    sourceCpp("hello.cpp")  # compile and load hello.cpp
    hello()

You should see “Hello, world!” printed on the R console.

3.The Rcpp Interface

3.1 Data Structures

Most of Rcpp’s functionality is provided through a set of C++ classes that wrap R data structures. A few of them are:

• IntegerVector, NumericVector, LogicalVector, CharacterVector (integer, numeric, logical, and character vectors)
• List, DataFrame
• Named, Dimension
• IntegerMatrix, NumericMatrix
• Function
• Environment

Memory management is handled automatically by the class constructors and destructors. These classes also have methods that mimic various R functions. A few of the most useful methods are:

• isNULL
• attributeNames, hasAttribute, attr
• length, nrow, ncol

The vector and list classes have constructors that accept the number of elements as a parameter, similar to their counterparts in R. (Translator's note: in R, a list is itself a special kind of vector.)

The helper class Dimension can be used to create a multidimensional vector:

    // Create a 2-by-3-by-4 vector.
    NumericVector a = NumericVector( Dimension(2, 3, 4) );

They also have a static create method, for specifying the elements of the new vector. The helper class Named represents named vector elements. For instance,

    IntegerVector q1_days = IntegerVector::create(
        Named("January") = 31,   // element named "January" with value 31
        Named("February") = 28,
        Named("March") = 31
    );

creates an integer vector with 3 named elements.

3.3 Other Details

Rcpp converts R objects to C++ objects with the templated routine as, and converts C++ objects back to R objects with the templated routine wrap. It’s rarely necessary to call these routines explicitly, but since Rcpp makes frequent implicit use of them, it’s important to know what they do.

The clone routine makes a copy of an Rcpp object. Rcpp objects are thin wrappers around pointers to R data, so copying or assigning one yields a second handle to the same data. You must explicitly call clone when you want an independent copy.

/**

Translator's note: a C++ reference is an alias for an existing variable rather than a separate object with its own storage.

**/
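The sharing behaviour that makes clone necessary can be illustrated without R installed. The following is a minimal sketch in standard C++, not Rcpp itself: SharedVector is a hypothetical stand-in class whose copies share one buffer, and its clone method plays the role of Rcpp's clone routine.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an Rcpp vector (not part of Rcpp): a thin handle
// whose copies all point at the same underlying buffer.
class SharedVector {
public:
    explicit SharedVector(std::size_t n)
        : data_(std::make_shared<std::vector<double>>(n, 0.0)) {}

    // Element access; the index must be in range.
    double& operator[](std::size_t i) {
        assert(i < data_->size());
        return (*data_)[i];
    }

    // Deep copy: allocates a new buffer, playing the role of clone.
    SharedVector clone() const {
        SharedVector out(0);
        out.data_ = std::make_shared<std::vector<double>>(*data_);
        return out;
    }

private:
    std::shared_ptr<std::vector<double>> data_;
};
```

With this class, after `SharedVector b = a; b[0] = 10.0;` the element a[0] is also 10, while a copy made earlier with a.clone() keeps its old value.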

Missing values can be specified with the constants NA_INTEGER, NA_REAL, NA_LOGICAL, and NA_STRING. The special values NaN, Inf, and -Inf can be specified with the constants R_NaN, R_PosInf, and R_NegInf. These constants all come from the R API rather than Rcpp.
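Missing and special values are ordinary bit patterns as far as C++ is concerned, so compiled code has to check for them itself. The sketch below uses standard C++ only; kNaInteger, kNaN, kPosInf, and kNegInf are hypothetical stand-ins for the R API constants so that the code compiles without R headers. It assumes, as in R, that the integer missing value is the smallest int.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical stand-ins for NA_INTEGER, R_NaN, R_PosInf, and R_NegInf.
const int kNaInteger = INT_MIN;
const double kNaN = std::numeric_limits<double>::quiet_NaN();
const double kPosInf = std::numeric_limits<double>::infinity();
const double kNegInf = -std::numeric_limits<double>::infinity();

// Sum the entries of x, skipping the missing-value sentinel.
long long sum_skip_na(const std::vector<int>& x) {
    long long total = 0;
    for (int v : x)
        if (v != kNaInteger) total += v;
    return total;
}

// Largest finite entry of x; NaN and the infinities are ignored.
// Requires at least one finite entry.
double max_finite(const std::vector<double>& x) {
    bool found = false;
    double best = 0.0;
    for (double v : x) {
        if (!std::isfinite(v)) continue;  // NaN cannot be detected with ==
        if (!found || v > best) { best = v; found = true; }
    }
    assert(found);
    return best;
}
```

Note that a NaN never compares equal to anything, including itself, which is why the check uses std::isfinite rather than a comparison against kNaN.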

4.Programming Strategy

Generally speaking, you should write most of your code in R, to take advantage of its high level of abstraction. Then you can profile your code to identify bottlenecks where R is unacceptably slow, and replace those sections with C++ code for a performance boost. The most straightforward way to do this is to rewrite an entire function. As long as your C++ routine has the same call signature as the R function it replaces, the change should be invisible to the rest of your application.

5.Example: Row Maximums

Suppose we want to compute the maximum element of each row in a matrix. To achieve this, we loop over each row of the matrix and use the sugar routine max:

    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    NumericVector row_max(NumericMatrix m)
    {
        int nrow = m.nrow();       // number of rows
        NumericVector max(nrow);   // one result per row
        for (int i = 0; i < nrow; i++) {
            // Get row i with m(i, _).
            max[i] = Rcpp::max( m(i, _) );
        }
        return max;
    }

Notice that the matrix classes in Rcpp use parentheses ( ) as the subset operator rather than square brackets [ ]. This is due to limitations in C++.
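The limitation is that, before C++23, operator[] accepts exactly one argument, so two-index access has to go through operator(). The sketch below shows the idea in standard C++ without Rcpp. Matrix is a hypothetical stand-in for NumericMatrix, and this row_max is a plain-loop counterpart of the routine above rather than Rcpp's own implementation. Storage is column-major, which is how R lays out matrices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for NumericMatrix with column-major storage.
struct Matrix {
    std::size_t nrow, ncol;
    std::vector<double> data;  // element (i, j) lives at data[i + j * nrow]

    Matrix(std::size_t r, std::size_t c) : nrow(r), ncol(c), data(r * c, 0.0) {}

    // Two-index access uses operator(), since operator[] takes one argument.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < nrow && j < ncol);
        return data[i + j * nrow];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < nrow && j < ncol);
        return data[i + j * nrow];
    }
};

// Maximum of each row; assumes the matrix has at least one column.
std::vector<double> row_max(const Matrix& m) {
    assert(m.ncol > 0);
    std::vector<double> out(m.nrow);
    for (std::size_t i = 0; i < m.nrow; ++i) {
        double best = m(i, 0);
        for (std::size_t j = 1; j < m.ncol; ++j)
            best = std::max(best, m(i, j));
        out[i] = best;
    }
    return out;
}
```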

6.Example: Box Packing

Suppose we want to simulate a discrete box-packing Markov chain. At each time step, an item with weight randomly distributed in {1,...,w} arrives for packing. Items are placed in the same box so long as the box weight does not exceed w. If an item would make the current box’s weight exceed w, a new box is started with that item. We might be interested in the weight of the current box at each time step, as well as the times at which a new box is started.

A simulation of the box-packing chain can be implemented in R, but suppose we want to run the simulation for a large number of time steps in order to estimate long-run statistics. In that case, the simulation might be unacceptably slow. We can use C++ and Rcpp to write a much faster version.

Implementation

Create a blank text file and enter the code skeleton:

    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    List pack_boxes(int n, NumericVector p) {
        // ...
    }

The pack_boxes routine will contain our simulation. It needs to sample item weights, add each item weight to the previous time step’s box weight, and then check whether the box is too heavy, starting a new box when necessary. The routine has parameters n, the number of steps to simulate, and p, the probabilities of the item weights. We don’t need to make w a parameter, since w can be inferred from the length of p. The routine has return type List. Rcpp implicitly converts between SEXP and these input/output types.

If we were implementing the simulation in R, we could sample the item weights with the sample function. The R API doesn’t have a corresponding C routine. Fortunately, Rcpp’s Function class makes calling R functions from C++ simple. The constructor takes the name of the desired function as parameter. After creating a Function object for sample, we can call it with the same parameters as the original R function. A word of caution: calling R functions from C++ code is at least as slow as calling them from R itself, so use them sparingly.
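If the call back into R ever became the bottleneck, one alternative, which this article does not use, would be to draw the weights in C++ directly. The sketch below uses std::discrete_distribution from the standard library. The function name sample_weights and its seed parameter are hypothetical. It does not use R's random number generator, so set.seed in R has no effect on it and its draws will not match those of R's sample.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Draw n item weights from {1, ..., w}, where w = p.size() and p[k] is the
// probability (or relative weight) of item weight k + 1.
std::vector<int> sample_weights(int n, const std::vector<double>& p,
                                unsigned seed) {
    assert(n >= 0 && !p.empty());
    std::mt19937 gen(seed);
    std::discrete_distribution<int> dist(p.begin(), p.end());
    std::vector<int> item(n);
    for (int& x : item)
        x = dist(gen) + 1;  // shift the 0-based index to the range 1..w
    return item;
}
```

Sampling is with replacement, matching the third argument (true) that the article passes to sample.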

For the rest of the simulation, we need a vector weight of length n to hold the weight of the box at each time step, and another vector, first, to hold the first item times. We also need a variable n_boxes to keep track of how many boxes have been packed.

    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    List pack_boxes(int n, NumericVector p)   // p: probabilities of the item weights
    {
        // Look up R's sample function.
        Function sample = Environment("package:base")["sample"];

        // Sample item weights.
        int w = p.size();   // w is inferred from the length of p
        IntegerVector item = sample(w, n, true, p);

        // Initialize loop variables.
        IntegerVector weight(n);   // box weight at each time step
        weight[0] = item[0];
        IntegerVector first(n);    // times at which a new box starts
        first[0] = 1;
        int n_boxes = 1;           // number of boxes so far
        // ...

We don’t know how long first needs to be, but we can ensure it’s long enough by making it length n, as above. Alternatively, if we were concerned about memory usage, we could’ve used a data structure from C++’s standard template library and converted to a correctly-sized IntegerVector at the end of the simulation with Rcpp’s wrap routine.
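The memory-saving alternative can be sketched in standard C++ as follows. This is an illustration rather than the article's implementation: first_item_times is a hypothetical helper that takes item weights that have already been sampled, applies the packing rule from the problem statement, and grows a std::vector only when a new box starts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Record the 1-based time steps at which a new box starts.
// item holds the sampled item weights; w is the box capacity.
std::vector<int> first_item_times(const std::vector<int>& item, int w) {
    assert(w > 0);
    std::vector<int> first;
    if (item.empty()) return first;

    first.push_back(1);      // the first item always opens a box
    int weight = item[0];    // weight of the current box
    for (std::size_t i = 1; i < item.size(); ++i) {
        if (weight + item[i] <= w) {
            weight += item[i];                         // item fits
        } else {
            weight = item[i];                          // start a new box
            first.push_back(static_cast<int>(i) + 1);  // 1-based time
        }
    }
    return first;
}
```

The result has exactly one entry per box, so nothing needs trimming. Inside an Rcpp routine it would then be handed to wrap, as the text describes.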

The core of our simulation is a for loop. Unlike R, where for loops should be avoided in favor of vectorized code, there’s no penalty for using for loops in C++.

        for (int i = 1; i < n; i++)
        {
            int new_weight = weight[i - 1] + item[i];
            if (new_weight <= w) {
                // Continue with current box.
                weight[i] = new_weight;
            } else {
                // Start a new box.
                weight[i] = item[i];
                first[n_boxes++] = i + 1;
            }
        }
        // ...