Building Hybrid Systems with Boost.Python

Author: David Abrahams
Contact: dave@boost-consulting.com
Organization: Boost Consulting
Date: 2003-03-19
Author: Ralf W. Grosse-Kunstleve
Copyright: Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved
Translators: 王志勇 (Wang Zhiyong), Slowness Chen, 金庆 (Jin Qing)
Translation updated: 2008-05-29

Abstract

Boost.Python is an open source C++ library which provides a concise IDL-like interface for binding C++ classes and functions to Python. Leveraging the full power of C++ compile-time introspection and of recently developed metaprogramming techniques, this is achieved entirely in pure C++, without introducing a new syntax. Boost.Python's rich set of features and high-level interface make it possible to engineer packages from the ground up as hybrid systems, giving programmers easy and coherent access to both the efficient compile-time polymorphism of C++ and the extremely convenient run-time polymorphism of Python.

Introduction

Python and C++ are in many ways as different as two languages could be: while C++ is usually compiled to machine-code, Python is interpreted. Python's dynamic type system is often cited as the foundation of its flexibility, while in C++ static typing is the cornerstone of its efficiency. C++ has an intricate and difficult compile-time meta-language, while in Python, practically everything happens at runtime.

Yet for many programmers, these very differences mean that Python and C++ complement one another perfectly. Performance bottlenecks in Python programs can be rewritten in C++ for maximal speed, and authors of powerful C++ libraries choose Python as a middleware language for its flexible system integration capabilities. Furthermore, the surface differences mask some strong similarities:

  • 'C'-family control structures (if, while, for...)
  • Support for object-orientation, functional programming, and generic programming (these are both multi-paradigm programming languages.)
  • Comprehensive operator overloading facilities, recognizing the importance of syntactic variability for readability and expressivity.
  • High-level concepts such as collections and iterators.
  • High-level encapsulation facilities (C++: namespaces, Python: modules) to support the design of re-usable libraries.
  • Exception-handling for effective management of error conditions.
  • C++ idioms in common use, such as handle/body classes and reference-counted smart pointers, mirror Python reference semantics.

Given Python's rich 'C' interoperability API, it should in principle be possible to expose C++ type and function interfaces to Python with an analogous interface to their C++ counterparts. However, the facilities provided by Python alone for integration with C++ are relatively meager. Compared to C++ and Python, 'C' has only very rudimentary abstraction facilities, and support for exception-handling is completely missing. 'C' extension module writers are required to manually manage Python reference counts, which is both annoyingly tedious and extremely error-prone. Traditional extension modules also tend to contain a great deal of boilerplate code repetition which makes them difficult to maintain, especially when wrapping an evolving API.

These limitations have led to the development of a variety of wrapping systems. SWIG is probably the most popular package for the integration of C/C++ and Python. A more recent development is SIP, which was specifically designed for interfacing Python with the Qt graphical user interface library. Both SWIG and SIP introduce their own specialized languages for customizing inter-language bindings. This has certain advantages, but having to deal with three different languages (Python, C/C++ and the interface language) also introduces practical and mental difficulties. The CXX package demonstrates an interesting alternative. It shows that at least some parts of Python's 'C' API can be wrapped and presented through a much more user-friendly C++ interface. However, unlike SWIG and SIP, CXX does not include support for wrapping C++ classes as new Python types.

The features and goals of Boost.Python overlap significantly with many of these other systems. That said, Boost.Python attempts to maximize convenience and flexibility without introducing a separate wrapping language. Instead, it presents the user with a high-level C++ interface for wrapping C++ classes and functions, managing much of the complexity behind-the-scenes with static metaprogramming. Boost.Python also goes beyond the scope of earlier systems by providing:

  • Support for C++ virtual functions that can be overridden in Python.
  • Comprehensive lifetime management facilities for low-level C++ pointers and references.
  • Support for organizing extensions as Python packages, with a central registry for inter-language type conversions.
  • A safe and convenient mechanism for tying into Python's powerful serialization engine (pickle).
  • Coherence with the rules for handling C++ lvalues and rvalues that can only come from a deep understanding of both the Python and C++ type systems.

The key insight that sparked the development of Boost.Python is that much of the boilerplate code in traditional extension modules could be eliminated using C++ compile-time introspection. Each argument of a wrapped C++ function must be extracted from a Python object using a procedure that depends on the argument type. Similarly the function's return type determines how the return value will be converted from C++ to Python. Of course argument and return types are part of each function's type, and this is exactly the source from which Boost.Python deduces most of the information required.

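As a loose runtime analogy to the deduction described above (Boost.Python does this at compile time with templates, not at run time), the Python 3 sketch below picks a converter for each argument and for the return value from the function's own signature. The names `wrap` and `greet_index` are hypothetical and exist only for illustration.

```python
import inspect


def wrap(func):
    """Build a wrapper whose conversions are driven by func's own signature."""
    sig = inspect.signature(func)

    def wrapper(*raw_args):
        bound = sig.bind(*raw_args)  # raises TypeError on an arity mismatch
        converted = {
            # The annotation of each parameter doubles as its converter.
            name: sig.parameters[name].annotation(value)
            for name, value in bound.arguments.items()
        }
        result = func(**converted)
        # The return annotation decides how the result is converted.
        return sig.return_annotation(result)

    return wrapper


def greet_index(x: int) -> str:
    return ("hello", "Boost.Python", "world!")[x]


wrapped = wrap(greet_index)
```

Calling `wrapped("1")` converts the string to an `int` before the call, because the parameter type says so; nothing about the conversion is written by hand in the wrapper.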

This approach leads to user guided wrapping: as much information is extracted directly from the source code to be wrapped as is possible within the framework of pure C++, and some additional information is supplied explicitly by the user. Mostly the guidance is mechanical and little real intervention is required. Because the interface specification is written in the same full-featured language as the code being exposed, the user has unprecedented power available when she does need to take control.

Boost.Python Design Goals

The primary goal of Boost.Python is to allow users to expose C++ classes and functions to Python using nothing more than a C++ compiler. In broad strokes, the user experience should be one of directly manipulating C++ objects from Python.

However, it's also important not to translate all interfaces too literally: the idioms of each language must be respected. For example, though C++ and Python both have an iterator concept, they are expressed very differently. Boost.Python has to be able to bridge the interface gap.

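To make the iterator gap concrete, here is a minimal Python 3 sketch that adapts a C++-style half-open range (a begin position and an end position) to Python's iterator protocol. `RangeIterator` is a hypothetical name, and this is not how Boost.Python implements its own bridge; it only shows the two idioms side by side.

```python
class RangeIterator:
    """Adapt a C++-style [begin, end) pair of positions to a Python iterator."""

    def __init__(self, container, begin, end):
        self._container = container
        self._pos = begin
        self._end = end

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos == self._end:          # C++: it == end
            raise StopIteration             # Python signals exhaustion by exception
        value = self._container[self._pos]  # C++: *it
        self._pos += 1                      # C++: ++it
        return value


words = ["hello", "Boost.Python", "world!"]
collected = list(RangeIterator(words, 0, len(words)))
```

In C++ the caller compares against `end`; in Python the iterator itself reports exhaustion, which is why a literal translation of the C++ interface would feel foreign to Python users.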

It must be possible to insulate Python users from crashes resulting from trivial misuses of C++ interfaces, such as accessing already-deleted objects. By the same token the library should insulate C++ users from the low-level Python 'C' API, replacing error-prone 'C' interfaces like manual reference-count management and raw PyObject pointers with more-robust alternatives.

Support for component-based development is crucial, so that C++ types exposed in one extension module can be passed to functions exposed in another without loss of crucial information like C++ inheritance relationships.

Finally, all wrapping must be non-intrusive, without modifying or even seeing the original C++ source code. Existing C++ libraries have to be wrappable by third parties who only have access to header files and binaries.

Hello Boost.Python World

And now for a preview of Boost.Python, and how it improves on the raw facilities offered by Python. Here's a function we might want to expose:

#include <stdexcept>

char const* greet(unsigned x)
{
    static char const* const msgs[] = { "hello", "Boost.Python", "world!" };

    if (x > 2)
        throw std::range_error("greet: index out of range");

    return msgs[x];
}

To wrap this function in standard C++ using the Python 'C' API, we'd need something like this:

#include <Python.h> // Python 2 'C' API

extern "C" // all Python interactions use 'C' linkage and calling convention
{
    // Wrapper to handle argument/result conversion and checking.
    // A METH_VARARGS function receives (self, args); self is unused here.
    PyObject* greet_wrap(PyObject* /*self*/, PyObject* args)
    {
        int x;
        if (PyArg_ParseTuple(args, "i", &x)) // extract/check arguments
        {
            char const* result = greet(x); // invoke wrapped function
            return PyString_FromString(result); // convert result to Python
        }
        return 0; // error occurred
    }

    // Table of wrapped functions to be exposed by the module
    static PyMethodDef methods[] = {
        { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
        , { NULL, NULL, 0, NULL } // sentinel
    };

    // module initialization function; Python 2 looks for init<modulename>
    DL_EXPORT(void) inithello()
    {
        (void) Py_InitModule("hello", methods); // add the methods to the module
    }
}

Now here's the wrapping code we'd use to expose it with Boost.Python:

#include <boost/python.hpp>
using namespace boost::python;
BOOST_PYTHON_MODULE(hello)
{
    def("greet", greet, "return one of 3 parts of a greeting");
}

and here it is in action:

>>> import hello
>>> for x in range(3):
...     print hello.greet(x)
...
hello
Boost.Python
world!

Aside from the fact that the 'C' API version is much more verbose, it's worth noting a few things that it doesn't handle correctly:

  • The original function accepts an unsigned integer, and the Python 'C' API only gives us a way of extracting signed integers. The Boost.Python version will raise a Python exception if we try to pass a negative number to hello.greet, but the other one will proceed to do whatever the C++ implementation does when converting a negative integer to unsigned (usually wrapping to some very large number), and pass the incorrect translation on to the wrapped function.

  • That brings us to the second problem: if the C++ greet() function is called with a number greater than 2, it will throw an exception. Typically, if a C++ exception propagates across the boundary with code generated by a 'C' compiler, it will cause a crash. As you can see in the first version, there's no C++ scaffolding there to prevent this from happening. Functions wrapped by Boost.Python automatically include an exception-handling layer which protects Python users by translating unhandled C++ exceptions into a corresponding Python exception.

  • A slightly more-subtle limitation is that the argument conversion used in the Python 'C' API case can only get that integer x in one way. PyArg_ParseTuple can't convert Python long objects (arbitrary-precision integers) which happen to fit in an unsigned int but not in a signed long, nor will it ever handle a wrapped C++ class with a user-defined implicit operator unsigned int() conversion. Boost.Python's dynamic type conversion registry allows users to add arbitrary conversion methods.

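The first point in the list above, a negative number silently wrapping to a very large unsigned value, can be reproduced from Python 3 with the standard `ctypes` module. This is only an illustration: the 32-bit width, the helper names, and the choice of `OverflowError` are assumptions made for the sketch, not a statement about which exception Boost.Python raises.

```python
import ctypes


def to_unsigned_unchecked(value):
    """Mimic a C-style conversion of a signed integer to a 32-bit unsigned int."""
    return ctypes.c_uint32(value).value  # no range check: the value wraps around


def to_unsigned_checked(value):
    """Reject values that do not fit, as a range-checking converter would."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise OverflowError("value does not fit in a 32-bit unsigned int")
    return value


wrapped_value = to_unsigned_unchecked(-1)  # 4294967295, far outside msgs[0..2]
```

With the unchecked conversion, `greet` would receive 4294967295 and fail for a reason unrelated to what the caller actually passed; the checked version reports the real problem at the language boundary.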

Library Overview

This section outlines some of the library's major features. Except as necessary to avoid confusion, details of library implementation are omitted.

Exposing Classes

C++ classes and structs are exposed with a similarly-terse interface. Given:

#include <string>

struct World
{
    void set(std::string msg) { this->msg = msg; }
    std::string greet() { return msg; }
    std::string msg;
};

The following code will expose it in our extension module:

#include <boost/python.hpp>
using namespace boost::python;

BOOST_PYTHON_MODULE(hello)
{
    class_<World>("World")
        .def("greet", &World::greet)
        .def("set", &World::set)
        ;
}

Although this code has a certain pythonic familiarity, people sometimes find the syntax a bit confusing because it doesn't look like most of the C++ code they're used to. All the same, this is just standard C++. Because of their flexible syntax and operator overloading, C++ and Python are great for defining domain-specific (sub)languages (DSLs), and that's what we've done in Boost.Python. To break it down:

class_<World>("World")

constructs an unnamed object of type class_<World> and passes "World" to its constructor. This creates a new-style Python class called World in the extension module, and associates it with the C++ type World in the Boost.Python type conversion registry. We might have also written:

class_<World> w("World");

but that would've been more verbose, since we'd have to name w again to invoke its def() member function:

w.def("greet", &World::greet)

There's nothing special about the location of the dot for member access in the original example: C++ allows any amount of whitespace on either side of a token, and placing the dot at the beginning of each line allows us to chain as many successive calls to member functions as we like with a uniform syntax. The other key fact that allows chaining is that class_<> member functions all return a reference to *this.

So the example is equivalent to:

class_<World> w("World");
w.def("greet", &World::greet);
w.def("set", &World::set);

It's occasionally useful to be able to break down the components of a Boost.Python class wrapper in this way, but the rest of this article will stick to the terse syntax.

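The chaining idiom described above is not specific to C++. As a minimal sketch in Python 3, the hypothetical `ClassBuilder` below plays the role of `class_<>`: each call returns the builder itself (the counterpart of returning a reference to `*this`), so calls can be strung together with a leading dot on each line. The method is spelled `def_` only because `def` is a Python keyword.

```python
class ClassBuilder:
    """Hypothetical stand-in for class_<T>: every def_() returns self."""

    def __init__(self, name):
        self.name = name
        self.methods = {}

    def def_(self, name, func):
        self.methods[name] = func
        return self  # returning self is what makes chaining possible

    def build(self):
        # Create a new Python class carrying the registered methods.
        return type(self.name, (object,), dict(self.methods))


World = (
    ClassBuilder("World")
    .def_("set", lambda self, msg: setattr(self, "msg", msg))
    .def_("greet", lambda self: self.msg)
    .build()
)

planet = World()
planet.set("howdy")
```

If `def_` returned `None` instead of `self`, the second call in the chain would fail, which is the same reason the `class_<>` member functions return a reference to the object they were called on.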

For completeness, here's the wrapped class in use:

>>> import hello
>>> planet = hello.World()
>>> planet.set('howdy')
>>> planet.greet()
'howdy'

Constructors

Since our World class is just a plain struct, it has an implicit no-argument (nullary) constructor. Boost.Python exposes the nullary constructor by default, which is why we were able to write:

>>> planet = hello.World()

However, well-designed classes in any language may require constructor arguments in order to establish their invariants. Unlike in Python, where __init__ is just a specially-named method, C++ constructors cannot be handled like ordinary member functions. In particular, we can't take their address: &World::World is an error. The library provides a different interface for specifying constructors. Given:

struct World
{
    World(std::string msg); // added constructor
    ...

we can modify our wrapping code as follows:

class_<World>("World", init<std::string>())
    ...

Of course, a C++ class may have additional constructors, and we can expose those as well by passing more instances of init<...> to def():

class_<World>("World", init<std::string>())
    .def(init<double, double>())
    ...

Boost.Python allows wrapped functions, member functions, and constructors to be overloaded to mirror C++ overloading.

Data Members and Properties

Any publicly-accessible data members in a C++ class can be easily exposed as either readonly or readwrite attributes:

class_<World>("World", init<std::string>())
    .def_readonly("msg", &World::msg)
    ...

and can be used directly in Python:

>>> planet = hello.World('howdy')
>>> planet.msg
'howdy'

This does not result in adding attributes to the World instance __dict__, which can result in substantial memory savings when wrapping large data structures. In fact, no instance __dict__ will be created at all unless attributes are explicitly added from Python. Boost.Python owes this capability to the new Python 2.2 type system, in particular the descriptor interface and property type.

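The same effect, attributes served by descriptors on the class with no per-instance `__dict__`, can be observed in pure Python 3. The sketch below uses `__slots__` and a read-only `property`; it is an analogy for the behaviour described above, not a description of how Boost.Python lays out its instances.

```python
class World:
    __slots__ = ("_msg",)  # storage is a slot descriptor; no instance __dict__

    def __init__(self, msg):
        self._msg = msg

    @property
    def msg(self):
        # Read-only attribute: the property has a getter but no setter.
        return self._msg


planet = World("howdy")
```

Here `planet.msg` is resolved through a descriptor that lives on the class, and assigning to it raises `AttributeError`, much as one would expect of an attribute exposed with `def_readonly`.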

In C++, publicly-accessible data members are considered a sign of poor design because they break encapsulation, and style guides usually dictate the use of "getter" and "setter" functions instead. In Python, however, __getattr__, __setattr__, and since 2.2, property mean that attribute access is just one more well-encapsulated syntactic tool at the programmer's disposal. Boost.Python bridges this idiomatic gap by making Python property creation directly available to users. If msg were private, we could still expose it as an attribute in Python as follows:

class_<World>("World", init<std::string>())
    .add_property("msg", &World::greet, &World::set)
    ...

The example above mirrors the familiar usage of properties in Python 2.2+:

>>> class World(object):
...     def __init__(self, msg):
...         self.__msg = msg
...     def greet(self):
...         return self.__msg
...     def set(self, msg):
...         self.__msg = msg
...     msg = property(greet, set)

Operator Overloading

The ability to write arithmetic operators for user-defined types has been a major factor in the success of both languages for numerical computation, and the success of packages like NumPy attests to the power of exposing operators in extension modules. Boost.Python provides a concise mechanism for wrapping operator overloads. The example below shows a fragment from a wrapper for the Boost rational number library:

class_<rational<int> >("rational_int")
    .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
    .def("numerator", &rational<int>::numerator)
    .def("denominator", &rational<int>::denominator)
    .def(-self)        // __neg__ (unary minus)
    .def(self + self)  // __add__ (homogeneous)
    .def(self * self)  // __mul__
    .def(self + int()) // __add__ (heterogeneous)
    .def(int() + self) // __radd__
    ...

The magic is performed using a simplified application of "expression templates" [VELD1995], a technique originally developed for optimization of high-performance matrix algebra expressions. The essence is that instead of performing the computation immediately, operators are overloaded to construct a type representing the computation. In matrix algebra, dramatic optimizations are often available when the structure of an entire expression can be taken into account, rather than evaluating each operation "greedily". Boost.Python uses the same technique to build an appropriate Python method object based on expressions involving self.

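The core idea, that an operator applied to a placeholder builds a description of the operation instead of computing anything, can be sketched in Python 3. Everything here (`Expr`, `SelfPlaceholder`, `define`, `Meters`) is hypothetical and far simpler than the C++ machinery; it only shows how an expression such as `-self_` can be turned into an installed special method.

```python
class Meters:
    """Plain value type that starts out with no operators of its own."""

    def __init__(self, value):
        self.value = value


def _value(operand):
    return operand.value if isinstance(operand, Meters) else operand


class Expr:
    """Describes an operation: which special method it maps to and how to run it."""

    def __init__(self, method_name, impl):
        self.method_name = method_name
        self.impl = impl


class SelfPlaceholder:
    """Operators on the placeholder return an Expr instead of a computed result."""

    def __neg__(self):
        return Expr("__neg__", lambda a: Meters(-a.value))

    def __add__(self, other):
        return Expr("__add__", lambda a, b: Meters(a.value + _value(b)))

    def __radd__(self, other):
        return Expr("__radd__", lambda a, b: Meters(_value(b) + a.value))


def define(cls, expr):
    # Install the described operation under the matching special-method name.
    setattr(cls, expr.method_name, expr.impl)


self_ = SelfPlaceholder()
define(Meters, -self_)         # __neg__
define(Meters, self_ + self_)  # __add__
define(Meters, int() + self_)  # __radd__
```

After these three calls, `-Meters(3)`, `Meters(1) + Meters(2)` and `5 + Meters(2)` all work, even though `Meters` itself never mentioned an operator.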

Inheritance

C++ inheritance relationships can be represented to Boost.Python by adding an optional bases<...> argument to the class_<...> template parameter list as follows:

class_<Derived, bases<Base1,Base2> >("Derived")
    ...

This has two effects:

  1. When the class_<...> is created, Python type objects corresponding to Base1 and Base2 are looked up in Boost.Python's registry, and are used as bases for the new Python Derived type object, so methods exposed for the Python Base1 and Base2 types are automatically members of the Derived type. Because the registry is global, this works correctly even if Derived is exposed in a different module from either of its bases.
  2. C++ conversions from Derived to its bases are added to the Boost.Python registry. Thus wrapped C++ methods expecting (a pointer or reference to) an object of either base type can be called with an object wrapping a Derived instance. Wrapped member functions of class T are treated as though they have an implicit first argument of T&, so these conversions are necessary to allow the base class methods to be called for derived objects.
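
The two effects listed above can be modelled in a few lines of Python 3. The sketch assumes a hypothetical global `registry` keyed by C++ type name and a hypothetical `expose` helper; the real registry is keyed by C++ type information and also stores converters, which this toy model leaves out.

```python
registry = {}  # hypothetical: maps a C++ type name to its Python class


def expose(cpp_name, bases=(), methods=None):
    """Create a Python class for cpp_name, using already-registered bases."""
    python_bases = tuple(registry[b] for b in bases) or (object,)
    cls = type(cpp_name, python_bases, dict(methods or {}))
    registry[cpp_name] = cls  # later modules can now find this class by name
    return cls


Base1 = expose("Base1", methods={"one": lambda self: 1})
Base2 = expose("Base2", methods={"two": lambda self: 2})
Derived = expose("Derived", bases=("Base1", "Base2"))

d = Derived()
```

Because the bases are looked up in one shared table, `Derived` could be exposed from a different module than `Base1` and `Base2`. Calling `Base1.one(d)` also illustrates the implicit first argument: a base-class method accepts a derived instance.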

Of course it's possible to derive new Python classes from wrapped C++ class instances. Because Boost.Python uses the new-style class system, that works very much as for the Python built-in types. There is one significant detail in which it differs: the built-in types generally establish their invariants in their __new__ function, so that derived classes do not need to call __init__ on the base class before invoking its methods:

>>> class L(list):
...     def __init__(self):
...         pass
...
>>> L().reverse()
>>>

Because C++ object construction is a one-step operation, C++ instance data cannot be constructed until the arguments are available, in the __init__ function:

>>> class D(SomeBoostPythonClass):
...     def __init__(self):
...         pass
...
>>> D().some_boost_python_method()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation

This happened because Boost.Python couldn't find instance data of type SomeBoostPythonClass within the D instance; D's __init__ function masked construction of the base class. It could be corrected by either removing D's __init__ function or having it call SomeBoostPythonClass.__init__(...) explicitly.

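The failure and its fix can be reproduced without any extension module. In the Python 3 sketch below, the hypothetical class `Wrapped` stands in for a Boost.Python class by creating its instance data in `__init__`. The error type differs from the transcript above (a pure-Python class raises `AttributeError`), but the cause is the same: the base initializer never ran.

```python
class Wrapped:
    """Stand-in for a wrapped C++ class: instance data is created in __init__."""

    def __init__(self):
        self._held = "C++ instance data"

    def method(self):
        return self._held


class Broken(Wrapped):
    def __init__(self):
        pass  # masks Wrapped.__init__, so _held is never created


class Fixed(Wrapped):
    def __init__(self):
        Wrapped.__init__(self)  # explicitly construct the base part
```

`Fixed().method()` works, while `Broken().method()` fails because nothing ever constructed the data that `method` depends on.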

Virtual Functions

Deriving new types in Python from extension classes is not very interesting unless they can be used polymorphically from C++. In other words, Python method implementations should appear to override the implementation of C++ virtual functions when called through base class pointers/references from C++. Since the only way to alter the behavior of a virtual function is to override it in a derived class, the user must build a special derived class to dispatch a polymorphic class' virtual functions:

//
// interface to wrap:
//
class Base
{
public:
    virtual int f(std::string x) { return 42; }
    virtual ~Base() {}
};

// Takes a non-const reference because f is a non-const member function
int calls_f(Base& b, std::string x) { return b.f(x); }

//
// Wrapping Code
//

// Dispatcher class
struct BaseWrap : Base
{
    // Store a pointer to the Python object
    BaseWrap(PyObject* self_) : self(self_) {}
    PyObject* self;

    // Default implementation, for when f is not overridden
    int f_default(std::string x) { return this->Base::f(x); }
    // Dispatch implementation
    int f(std::string x) { return call_method<int>(self, "f", x); }
};

...
    def("calls_f", calls_f);
    class_<Base, BaseWrap>("Base")
        .def("f", &Base::f, &BaseWrap::f_default)
        ;

Now here's some Python code which demonstrates:

>>> class Derived(Base):
...     def f(self, s):
...         return len(s)
...
>>> calls_f(Base(), 'foo')
42
>>> calls_f(Derived(), 'forty-two')
9

Things to notice about the dispatcher class:

  • The key element which allows overriding in Python is the call_method invocation, which uses the same global type conversion registry as the C++ function wrapping does to convert its arguments from C++ to Python and its return type from Python to C++.
  • Any constructor signatures you wish to wrap must be replicated with an initial PyObject* argument.
  • The dispatcher must store this argument so that it can be used to invoke call_method.
  • The f_default member function is needed when the function being exposed is not pure virtual; there's no other way Base::f can be called on an object of type BaseWrap, since it overrides f.
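
What the dispatcher does at run time can be imitated in Python 3. The `call_method` function below is a hypothetical pure-Python stand-in that takes the result type as an ordinary argument (in C++ it is a template parameter); it looks the method up by name on the stored Python object and converts the result. `calls_f` plays the C++ caller that only knows the `Base` interface.

```python
def call_method(result_type, py_self, name, *args):
    """Look up `name` on the Python object, call it, and convert the result."""
    return result_type(getattr(py_self, name)(*args))


class Base:
    def f(self, x):
        return 42  # default implementation


def calls_f(obj, x):
    # Stand-in for the C++ function: dispatch goes through the name lookup,
    # so an override defined in a Python subclass is found automatically.
    return call_method(int, obj, "f", x)


class Derived(Base):
    def f(self, s):
        return len(s)
```

The lookup by name is the step that lets a Python override win: `calls_f(Derived(), "forty-two")` reaches `Derived.f` even though `calls_f` was written against `Base`.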

Deeper Reflection on the Horizon?

Admittedly, this formula is tedious to repeat, especially on a project with many polymorphic classes. That it is necessary reflects some limitations in C++'s compile-time introspection capabilities: there's no way to enumerate the members of a class and find out which are virtual functions. At least one very promising project has been started to write a front-end which can generate these dispatchers (and other wrapping code) automatically from C++ headers.

Pyste is being developed by Bruno da Silva de Oliveira. It builds on GCC_XML, which generates an XML version of GCC's internal program representation. Since GCC is a highly-conformant C++ compiler, this ensures correct handling of the most-sophisticated template code and full access to the underlying type system. In keeping with the Boost.Python philosophy, a Pyste interface description is neither intrusive on the code being wrapped, nor expressed in some unfamiliar language: instead it is a 100% pure Python script. If Pyste is successful it will mark a move away from wrapping everything directly in C++ for many of our users. It will also allow us the choice to shift some of the metaprogram code from C++ to Python. We expect that soon, not only our users but the Boost.Python developers themselves will be "thinking hybrid" about their own code.

(Translator's note: Pyste is no longer maintained; its successor is Py++.)

Serialization

Serialization is the process of converting objects in memory to a form that can be stored on disk or sent over a network connection. The serialized object (most often a plain string) can be retrieved and converted back to the original object. A good serialization system will automatically convert entire object hierarchies. Python's standard pickle module is just such a system. It leverages the language's strong runtime introspection facilities for serializing practically arbitrary user-defined objects. With a few simple and unintrusive provisions this powerful machinery can be extended to also work for wrapped C++ objects. Here is an example:

#include <string>

struct World
{
    World(std::string a_msg) : msg(a_msg) {}
    std::string greet() const { return msg; }
    std::string msg;
};

#include <boost/python.hpp>
using namespace boost::python;

struct World_picklers : pickle_suite
{
    static tuple
    getinitargs(World const& w) { return make_tuple(w.greet()); }
};

BOOST_PYTHON_MODULE(hello)
{
    class_<World>("World", init<std::string>())
        .def("greet", &World::greet)
        .def_pickle(World_picklers())
        ;
}

Now let's create a World object and put it to rest on disk:

>>> import hello
>>> import pickle
>>> a_world = hello.World("howdy")
>>> pickle.dump(a_world, open("my_world", "w"))

In a potentially different script on a potentially different computer with a potentially different operating system:

>>> import pickle
>>> resurrected_world = pickle.load(open("my_world", "r"))
>>> resurrected_world.greet()
'howdy'

Of course the cPickle module can also be used for faster processing.

Boost.Python's pickle_suite fully supports the pickle protocol defined in the standard Python documentation. Like a __getinitargs__ function in Python, the pickle_suite's getinitargs() is responsible for creating the argument tuple that will be used to reconstruct the pickled object. The other elements of the Python pickling protocol, __getstate__ and __setstate__ can be optionally provided via C++ getstate and setstate functions. C++'s static type system allows the library to ensure at compile-time that nonsensical combinations of functions (e.g. getstate without setstate) are not used.

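For readers more familiar with current Python, the role played by getinitargs() can be sketched in pure Python 3 with `__reduce__`, which returns a callable plus the argument tuple needed to rebuild the object. This is an analogy chosen for the sketch, not a description of how pickle_suite is implemented; the `World` class here is a plain Python class, not the wrapped one.

```python
class World:
    def __init__(self, msg):
        self.msg = msg

    def greet(self):
        return self.msg

    def __reduce__(self):
        # (callable, argument tuple): everything needed to rebuild the object
        # from scratch, which is the job getinitargs() does on the C++ side.
        return (World, (self.msg,))


original = World("howdy")

# Roughly what unpickling does: call the stored callable with the stored args.
factory, init_args = original.__reduce__()
rebuilt = factory(*init_args)
```

When the class lives in an importable module, `pickle.dumps` and `pickle.loads` go through this same `__reduce__` hook, so the rebuilt object is produced by calling the constructor with the saved arguments.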

Enabling serialization of more complex C++ objects requires a little more work than is shown in the example above. Fortunately the object interface (see next section) greatly helps in keeping the code manageable.

Object interface

Experienced 'C' language extension module authors will be familiar with the ubiquitous PyObject*, manual reference-counting, and the need to remember which API calls return "new" (owned) references or "borrowed" (raw) references. These constraints are not just cumbersome but also a major source of errors, especially in the presence of exceptions.

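The bookkeeping that 'C' extension authors do by hand is visible from Python itself. The short Python 3 sketch below uses `sys.getrefcount` on CPython to watch the count change as a second reference is created and released; the absolute numbers are an implementation detail, so only the differences matter.

```python
import sys

payload = ["hello", "Boost.Python", "world!"]

before = sys.getrefcount(payload)
alias = payload                      # a second owned reference to the same list
after_alias = sys.getrefcount(payload)
del alias                            # releasing it is the analogue of Py_DECREF
after_release = sys.getrefcount(payload)
```

In Python the interpreter performs each increment and decrement automatically; in a hand-written 'C' extension every one of them is a line of code that can be forgotten, especially on an error path.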

Boost.Python provides a class object which automates
