A declarative data-processing pipeline on top of actors? Why not?

cullen2012

Published on 2020-09-09 02:46:45

Original article: https://habr.com/en/post/460123/

Some time ago, in a discussion of one of SObjectizer's releases, we were asked: "Is it possible to make a DSL to describe a data-processing pipeline?" In other words, is it possible to write something like this:

A | B | C | D

and get a working pipeline where messages go from A to B, then to C, and then to D, with a compile-time check that B receives exactly the type that A returns, that C receives exactly the type that B returns, and so on.

It was an interesting task with a surprisingly simple solution. For example, this is what the creation of a pipeline can look like:

auto pipeline = make_pipeline(env, stage(A) | stage(B) | stage(C) | stage(D));

Or, in a more complex case (that will be discussed below):

auto pipeline = make_pipeline( sobj.environment(),
        stage(validation) | stage(conversion) | broadcast(
            stage(archiving),
            stage(distribution),
            stage(range_checking) | stage(alarm_detector{}) | broadcast(
                stage(alarm_initiator),
                stage( []( const alarm_detected & v ) {
                        alarm_distribution( cerr, v );
                    } )
                )
            ) );

In this article, we'll talk about the implementation of such a pipeline DSL. We'll mostly discuss the parts related to the stage(), broadcast() and operator|() functions, with several examples of C++ template usage. So I hope it will be interesting even for readers who don't know about SObjectizer (if you have never heard of SObjectizer, here is an overview of this tool).

A couple of words about the demo used

The example used in the article was influenced by my old (and largely forgotten) experience in the SCADA area.

The idea of the demo is to handle data read from a sensor. The data is acquired from the sensor periodically, then it has to be validated (incorrect data should be ignored) and converted into actual values. For example, the raw data read from a sensor can be two 8-bit integer values that should be converted into one floating-point number.
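As a worked illustration of that conversion, here is a minimal sketch that joins two 8-bit values into one 16-bit integer and scales the result to a floating-point number. The 0x7 limit for the high byte and the 0.5 scale factor are taken from the validation and conversion stages shown later in the article; the helper names `is_valid_raw` and `to_celsius` are invented for this sketch only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a raw reading is accepted only if its high byte
// does not exceed 0x7 (the same limit the validation stage uses).
inline bool is_valid_raw(std::uint8_t high_bits) {
    return high_bits <= 0x7;
}

// Hypothetical helper: joins the two bytes into a 16-bit value and scales
// it by 0.5, as the conversion stage does. Expects an already validated
// high byte.
inline float to_celsius(std::uint8_t high_bits, std::uint8_t low_bits) {
    assert(is_valid_raw(high_bits));
    const auto raw = static_cast<std::uint16_t>(
        (static_cast<std::uint16_t>(high_bits) << 8) | low_bits);
    return 0.5f * raw;
}
```

With these assumptions, the bytes 0x00 and 0x5A (decimal 90) give 45.0, and the bytes 0x01 and 0x00 (decimal 256) give 128.0.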

Then the valid, converted values should be archived, distributed somewhere (to different nodes for visualization, for example), and checked for "alarms" (if values are out of safe ranges, that should be handled specially). These operations are independent and can be performed in parallel.

Operations related to a detected alarm can be performed in parallel too: an "alarm" should be initiated (so the part of SCADA on the current node can react to it), and the information about the "alarm" should be distributed elsewhere (for example, stored in a historical database and/or visualized on a SCADA operator's display).

This logic can be expressed in textual form as follows:

optional(valid_raw_data) = validate(raw_data);
if valid_raw_data is not empty then {
   converted_value = convert(valid_raw_data);
   do_async archive(converted_value);
   do_async distribute(converted_value);
   do_async {
      optional(suspicious_value) = check_range(converted_value);
      if suspicious_value is not empty then {
         optional(alarm) = detect_alarm(suspicious_value);
         if alarm is not empty then {
            do_async initiate_alarm(alarm);
            do_async distribute_alarm(alarm);
         }
      }
   }
}

Or, in graphical form:

It's a rather artificial example, but it has some interesting features I want to show. The first is the presence of parallel stages in a pipeline (the broadcast() operation exists precisely because of that). The second is the presence of state in some stages. For example, alarm_detector is a stateful stage.

Pipeline capabilities

A pipeline is built from separate stages. Each stage is a function or a functor of the following format:

opt<Out> func(const In &);

or

void func(const In &);

Stages that return void can only be used as the last stage of a pipeline.

Stages are bound into a chain. Each stage receives the object returned by the previous stage. If the previous stage returns an empty opt<Out> value, the next stage is not called.
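The chaining rule can be sketched without SObjectizer at all. In the sketch below, std::optional plays the role of opt<Out>, and `chain`, `keep_non_negative`, `twice` and `twice_calls` are hypothetical names that exist only for this illustration. It is not how the article's pipeline is implemented (that is built on agents and is described later); it only shows the "empty result stops the chain" behavior.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical helper: joins two stage functions so that the second one
// runs only when the first one returns a non-empty value.
template <typename First, typename Second>
auto chain(First first, Second second) {
    return [first = std::move(first), second = std::move(second)](const auto& input) {
        auto intermediate = first(input);
        using result_type = decltype(second(*intermediate));
        if (!intermediate)
            return result_type{};  // empty result: `second` is not called
        return second(*intermediate);
    };
}

// Toy stage: passes only non-negative numbers further.
inline std::optional<int> keep_non_negative(const int& v) {
    if (v < 0)
        return std::nullopt;
    return v;
}

// Toy stage: doubles the value and counts how many times it was called.
inline int twice_calls = 0;
inline std::optional<int> twice(const int& v) {
    ++twice_calls;
    return v * 2;
}

inline void chain_demo() {
    auto pipeline = chain(keep_non_negative, twice);
    const auto accepted = pipeline(7);
    const auto dropped = pipeline(-1);
    assert(accepted == std::optional<int>{14});
    assert(!dropped.has_value());
    assert(twice_calls == 1);  // the second stage ran only for the value 7
}
```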

There is a special broadcast stage. It is constructed from several pipelines. A broadcast stage receives an object from the previous stage and broadcasts it to every subsidiary pipeline.

From the pipeline's point of view, a broadcast stage looks like a function of the following format:

void func(const In &);

Because a broadcast stage has no return value, it can only be the last stage in a pipeline.

Why does the pipeline stage return an optional value?

It's because some incoming values need to be dropped. For example, the validate stage returns nothing if a raw value is incorrect, since there is no point in handling it.

Another example: the alarm_detector stage returns nothing if the current suspicious value doesn't produce a new alarm case.

Implementation details

Let's start with the data types and functions related to the application logic. In the example discussed, the following data types are used for passing information from one stage to another:

// Raw data from a sensor.
struct raw_measure
{
    int m_meter_id;

    uint8_t m_high_bits;
    uint8_t m_low_bits;
};

// Type of input for validation stage with raw data from a sensor.
struct raw_value
{
    raw_measure m_data;
};

// Type of input for conversion stage with valid raw data from a sensor.
struct valid_raw_value
{
    raw_measure m_data;
};

// Data from a sensor after conversion to Celsius degrees.
struct calculated_measure
{
    int m_meter_id;

    float m_measure;
};

// The type for result of conversion stage with converted data from a sensor.
struct sensor_value
{
    calculated_measure m_data;
};

// Type with value which could mean a dangerous level of temperature.
struct suspicious_value
{
    calculated_measure m_data;
};

// Type with information about detected dangerous situation.
struct alarm_detected
{
    int m_meter_id;
};

An instance of raw_value goes to the first stage of our pipeline. This raw_value contains information acquired from a sensor in the form of a raw_measure object. Then raw_value is transformed into valid_raw_value. Then valid_raw_value is transformed into sensor_value, which holds the actual sensor value in the form of calculated_measure. If an instance of sensor_value contains a suspicious value, an instance of suspicious_value is produced. That suspicious_value can later be transformed into an alarm_detected instance.

Or, in graphical form:

Now we can take a look at the implementation of our pipeline stages:

//
// The first stage of a pipeline. Validation of raw data from a sensor.
//
// Returns valid_raw_value or nothing if value is invalid.
//
stage_result_t< valid_raw_value >
validation( const raw_value & v )
{
    if( 0x7 >= v.m_data.m_high_bits )
        return make_result< valid_raw_value >( v.m_data );
    else
        return make_empty< valid_raw_value >();
}

//
// The second stage of a pipeline. Conversion from raw data to a value
// in Celsius degrees.
//
stage_result_t< sensor_value >
conversion( const valid_raw_value & v )
{
    return make_result< sensor_value >(
        calculated_measure{ v.m_data.m_meter_id,
            0.5f * ((static_cast< uint16_t >( v.m_data.m_high_bits ) << 8) +
                v.m_data.m_low_bits) } );
}

//
// Simulation of the data archiving.
//
void
archiving( const sensor_value & v )
{
    clog << "archiving (" << v.m_data.m_meter_id << ","
        << v.m_data.m_measure << ")" << endl;
}

//
// Simulation of the data distribution.
//
void
distribution( const sensor_value & v )
{
    clog << "distributing (" << v.m_data.m_meter_id << ","
        << v.m_data.m_measure << ")" << endl;
}

//
// The first stage of a child pipeline at third level of the main pipeline.
//
// Checking for too high a value of the temperature.
//
// Returns suspicious_value message or nothing.
//
stage_result_t< suspicious_value >
range_checking( const sensor_value & v )
{
    if( v.m_data.m_measure >= 45.0f )
        return make_result< suspicious_value >( v.m_data );
    else
        return make_empty< suspicious_value >();
}

//
// The next stage of a child pipeline.
//
// Checks for two suspicious_value-es in 25ms time window.
//
class alarm_detector
{
    using clock = chrono::steady_clock;

public :
    stage_result_t< alarm_detected >
    operator()( const suspicious_value & v )
    {
        if( m_previous )
            if( *m_previous + chrono::milliseconds(25) > clock::now() )
            {
                m_previous = nullopt;
                return make_result< alarm_detected >( v.m_data.m_meter_id );
            }

        m_previous = clock::now();
        return make_empty< alarm_detected >();
    }

private :
    optional< clock::time_point > m_previous;
};

//
// One of last stages of a child pipeline.
// Imitates beginning of the alarm processing.
//
void
alarm_initiator( const alarm_detected & v )
{
    clog << "=== alarm (" << v.m_meter_id << ") ===" << endl;
}

//
// Another of last stages of a child pipeline.
// Imitates distribution of the alarm.
//
void
alarm_distribution( ostream & to, const alarm_detected & v )
{
    to << "alarm_distribution (" << v.m_meter_id << ")" << endl;
}

Just skip things like stage_result_t, make_result and make_empty for now; we'll discuss them in the next section.

I hope that the code of those stages is rather trivial. The only part that requires some additional explanation is the implementation of the alarm_detector stage.

In this example, an alarm is initiated only if there are at least two suspicious_value instances within a 25ms time window. So we have to remember the time of the previous suspicious_value instance in the alarm_detector stage. That is why alarm_detector is implemented as a stateful functor with a function call operator.
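To see the windowing rule in isolation, here is a stand-alone sketch of the same logic. Unlike alarm_detector above, it takes the current time as an argument instead of calling clock::now(), which is an assumption made here so that the behavior can be checked without waiting on a real clock. The names `window_detector` and `on_suspicious_value` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Stand-alone sketch of the 25ms window rule with an explicit time point.
class window_detector {
    using clock = std::chrono::steady_clock;

public:
    // Returns true when `now` is less than 25ms after the remembered
    // suspicious value; otherwise remembers `now` and returns false.
    bool on_suspicious_value(clock::time_point now) {
        if (m_previous && *m_previous + std::chrono::milliseconds(25) > now) {
            m_previous = std::nullopt;  // the pair is consumed by the alarm
            return true;
        }
        m_previous = now;
        return false;
    }

private:
    std::optional<clock::time_point> m_previous;
};

inline void window_detector_demo() {
    using namespace std::chrono;
    const steady_clock::time_point start{};
    window_detector detector;

    const bool first = detector.on_suspicious_value(start);
    const bool second = detector.on_suspicious_value(start + milliseconds(10));
    const bool third = detector.on_suspicious_value(start + milliseconds(12));
    const bool fourth = detector.on_suspicious_value(start + milliseconds(100));

    assert(!first);   // nothing remembered yet
    assert(second);   // second value within 25ms: alarm
    assert(!third);   // state was reset by the alarm, so this one is only remembered
    assert(!fourth);  // 88ms after the remembered value: the window restarts
}
```

Note that the state is reset after an alarm, so three suspicious values in quick succession produce one alarm, not two.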

Stages return SObjectizer's type instead of std::optional

I said earlier that a stage can return an optional value. But std::optional is not used in the code; a different type, stage_result_t, appears in the implementation of the stages.

This is because some of SObjectizer's specifics play a role here. The returned values will be distributed as messages between SObjectizer's agents (aka actors). Every message in SObjectizer is sent as a dynamically allocated object. So we have a kind of "optimization" here: instead of returning std::optional and then allocating a new message object, we just allocate a message object and return a smart pointer to it.

In fact, stage_result_t is just an alias for SObjectizer's analog of shared_ptr:

template< typename M >
using stage_result_t = message_holder_t< M >;

And make_result and make_empty are just helper functions for constructing stage_result_t with or without an actual value inside:

template< typename M, typename... Args >
stage_result_t< M >
make_result( Args &&... args )
{
    return stage_result_t< M >::make(forward< Args >(args)...);
}

template< typename M >
stage_result_t< M >
make_empty()
{
    return stage_result_t< M >();
}

For simplicity, it's safe to say that the validation stage could be expressed this way:

std::shared_ptr< valid_raw_value >
validation( const raw_value & v )
{
    if( 0x7 >= v.m_data.m_high_bits )
        return std::make_shared< valid_raw_value >( v.m_data );
    else
        return std::shared_ptr< valid_raw_value >{};
}

But, because of SObjectizer's specifics, we can't use std::shared_ptr and have to deal with the so_5::message_holder_t type. We hide those specifics behind the stage_result_t, make_result and make_empty helpers.

stage_handler_t and stage_builder_t separation

An important point of the pipeline implementation is the separation of stage handler and stage builder concepts. This is done for simplicity. The presence of these concepts allowed me to have two steps in the pipeline definition.

At the first step, a user describes pipeline stages. As a result, I receive an instance of stage_t that holds all pipeline stages inside.

At the second step, a set of underlying SObjectizer agents is created. Those agents receive messages with the results of the previous stages, call the actual stage handlers, and then send the results to the next stages.

But to create this set of agents, every stage has to have a stage builder. A stage builder can be seen as a factory that creates an underlying SObjectizer agent.

So we have the following relation: every pipeline stage produces two objects: a stage handler that holds the stage-related logic, and a stage builder that creates an underlying SObjectizer agent for calling the stage handler at the appropriate time:

A stage handler is represented in the following way:

template< typename In, typename Out >
class stage_handler_t
{
public :
    using traits = handler_traits_t< In, Out >;
    using func_type = function< typename traits::output(const typename traits::input &) >;

    stage_handler_t( func_type handler )
        : m_handler( move(handler) )
    {}

    template< typename Callable >
    stage_handler_t( Callable handler ) : m_handler( handler ) {}

    typename traits::output
    operator()( const typename traits::input & a ) const
    {
        return m_handler( a );
    }

private :
    func_type m_handler;
};

where handler_traits_t is defined in the following way:

//
// We have to deal with two types of stage handlers:
// - intermediate handlers which will return some result (e.g. some new
//   message);
// - terminal handlers which can return nothing (e.g. void instead of
//   stage_result_t<M>);
//
// This template with specialization defines `input` and `output`
// aliases for both cases.
//
template< typename In, typename Out >
struct handler_traits_t
{
    using input = In;
    using output = stage_result_t< Out >;
};

template< typename In >
struct handler_traits_t< In, void >
{
    using input = In;
    using output = void;
};

A stage builder is represented by just a std::function:

using stage_builder_t = function< mbox_t(coop_t &, mbox_t) >;

Helper types lambda_traits_t and callable_traits_t

Because stages can be represented by free functions or functors (like instances of the alarm_detector class or anonymous compiler-generated classes representing lambdas), we need some helpers to detect the types of a stage's argument and return value. I used the following code for that purpose:

// 
// Helper type for `arg_type` and `result_type` aliases definition.
//
template< typename R, typename A >
struct callable_traits_typedefs_t
{
    using arg_type = A;
    using result_type = R;
};

//
// Helper type for dealing with stateful objects with operator()
// (they could be user-defined objects or generated by compiler
// like lambdas).
//
template< typename T >
struct lambda_traits_t;

template< typename M, typename A, typename T >
struct lambda_traits_t< stage_result_t< M >(T::*)(const A &) const >
    :   public callable_traits_typedefs_t< M, A >
{};

template< typename A, typename T >
struct lambda_traits_t< void (T::*)(const A &) const >
    :   public callable_traits_typedefs_t< void, A >
{};

template< typename M, typename A, typename T >
struct lambda_traits_t< stage_result_t< M >(T::*)(const A &) >
    :   public callable_traits_typedefs_t< M, A >
{};

template< typename A, typename T >
struct lambda_traits_t< void (T::*)(const A &) >
    :   public callable_traits_typedefs_t< void, A >
{};

//
// Main type for definition of `arg_type` and `result_type` aliases.
// With specialization for various cases.
//
template< typename T >
struct callable_traits_t
    :   public lambda_traits_t< decltype(&T::operator()) >
{};

template< typename M, typename A >
struct callable_traits_t< stage_result_t< M >(*)(const A &) >
    :   public callable_traits_typedefs_t< M, A >
{};

template< typename A >
struct callable_traits_t< void(*)(const A &) >
    :   public callable_traits_typedefs_t< void, A >
{};

I hope this code is quite understandable for readers with a good knowledge of C++. If not, feel free to ask me in the comments; I'll be glad to explain the logic behind lambda_traits_t and callable_traits_t in detail.
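To make the deduction concrete, the sketch below repeats the traits in a compact form so that it compiles on its own, and then checks with static_assert what they produce for a free function, a functor and a lambda. It rests on one loud assumption: std::shared_ptr stands in for so_5::message_holder_t, so this stage_result_t is not the real one. The types `in_msg`, `out_msg`, `free_stage` and `stateful_stage` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Assumption of this sketch: std::shared_ptr replaces so_5::message_holder_t.
template <typename M>
using stage_result_t = std::shared_ptr<M>;

template <typename R, typename A>
struct callable_traits_typedefs_t {
    using arg_type = A;
    using result_type = R;
};

// Pattern-matches the type of a pointer to operator().
template <typename T>
struct lambda_traits_t;
template <typename M, typename A, typename T>
struct lambda_traits_t<stage_result_t<M> (T::*)(const A&) const>
    : callable_traits_typedefs_t<M, A> {};
template <typename A, typename T>
struct lambda_traits_t<void (T::*)(const A&) const>
    : callable_traits_typedefs_t<void, A> {};
template <typename M, typename A, typename T>
struct lambda_traits_t<stage_result_t<M> (T::*)(const A&)>
    : callable_traits_typedefs_t<M, A> {};
template <typename A, typename T>
struct lambda_traits_t<void (T::*)(const A&)>
    : callable_traits_typedefs_t<void, A> {};

// Functors and lambdas go through operator(); function pointers are
// matched directly by the specializations.
template <typename T>
struct callable_traits_t : lambda_traits_t<decltype(&T::operator())> {};
template <typename M, typename A>
struct callable_traits_t<stage_result_t<M> (*)(const A&)>
    : callable_traits_typedefs_t<M, A> {};
template <typename A>
struct callable_traits_t<void (*)(const A&)>
    : callable_traits_typedefs_t<void, A> {};

// Hypothetical message types and stages used only for the checks below.
struct in_msg {};
struct out_msg {};

inline stage_result_t<out_msg> free_stage(const in_msg&) {
    return std::make_shared<out_msg>();
}

struct stateful_stage {
    int m_calls = 0;
    stage_result_t<out_msg> operator()(const in_msg&) {
        ++m_calls;
        return {};
    }
};

inline void callable_traits_demo() {
    // A pointer to a free function.
    using free_traits = callable_traits_t<decltype(&free_stage)>;
    static_assert(std::is_same_v<free_traits::arg_type, in_msg>);
    static_assert(std::is_same_v<free_traits::result_type, out_msg>);

    // A functor with a non-const operator().
    using functor_traits = callable_traits_t<stateful_stage>;
    static_assert(std::is_same_v<functor_traits::arg_type, in_msg>);
    static_assert(std::is_same_v<functor_traits::result_type, out_msg>);

    // A lambda that returns nothing (a terminal stage).
    auto terminal = [](const out_msg&) {};
    using lambda_traits = callable_traits_t<decltype(terminal)>;
    static_assert(std::is_same_v<lambda_traits::arg_type, out_msg>);
    static_assert(std::is_void_v<lambda_traits::result_type>);

    assert(free_stage(in_msg{}) != nullptr);
    terminal(out_msg{});
}
```

The key point is that result_type is the message type M, not stage_result_t<M>: the wrapper is stripped by the partial specializations, which is what lets stage() later instantiate stage_t<In, Out> with plain message types.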

stage(), broadcast() and operator|() functions

Now we can look inside the main pipeline-building functions. But before that, it's necessary to take a look at the definition of a template class stage_t:

template< typename In, typename Out >
struct stage_t
{
    stage_builder_t m_builder;
};

It's a very simple struct that holds just a stage_builder_t instance. The template parameters are not used inside stage_t, so why are they present here?

They are necessary for compile-time checking of type compatibility between pipeline stages. We'll see that soon.

Let's look at the simplest pipeline-building function, stage():

template<
    typename Callable,
    typename In = typename callable_traits_t< Callable >::arg_type,
    typename Out = typename callable_traits_t< Callable >::result_type >
stage_t< In, Out >
stage( Callable handler )
{
    stage_builder_t builder{
            [h = std::move(handler)](
                coop_t & coop,
                mbox_t next_stage) -> mbox_t
            {
                return coop.make_agent< a_stage_point_t<In, Out> >(
                        std::move(h),
                        std::move(next_stage) )
                    ->so_direct_mbox();
            }
    };

    return { std::move(builder) };
}

It receives an actual stage handler as its single parameter. It can be a pointer to a function, a lambda, or a functor. The types of the stage's input and output are deduced automatically thanks to the "template magic" behind the callable_traits_t template.

An instance of the stage builder is created inside, and that instance is returned in a new stage_t object as the result of the stage() function. The actual stage handler is captured by the stage builder lambda; it will then be used for the construction of an underlying SObjectizer agent (we'll talk about that in the next section).

The next function to review is operator|(), which concatenates two stages and returns a new stage:

template< typename In, typename Out1, typename Out2 >
stage_t< In, Out2 >
operator|(
    stage_t< In, Out1 > && prev,
    stage_t< Out1, Out2 > && next )
{
    return {
        stage_builder_t{
            [prev, next]( coop_t & coop, mbox_t next_stage ) -> mbox_t
            {
                auto m = next.m_builder( coop, std::move(next_stage) );
                return prev.m_builder( coop, std::move(m) );
            }
        }
    };
}

The simplest way to explain the logic of operator|() is to try to draw a picture. Let's assume we have the expression:

stage(A) | stage(B) | stage(C) | stage(D)

This expression will be transformed in the following way:

Here we can also see how compile-time type-checking works: the definition of operator|() requires that the output type of the first stage is the input type of the second stage. If this is not the case, the code won't compile.
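The same type-matching trick can be tried out without SObjectizer. The sketch below is a simplified stand-in, not the article's code: `simple_stage_t` stores the stage logic directly as a std::function returning std::optional instead of storing a builder of agents, and `can_pipe` is a hypothetical detection helper that reports at compile time whether two stage types can be joined with operator|().

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

// Simplified stand-in for stage_t (assumption: logic is stored directly).
template <typename In, typename Out>
struct simple_stage_t {
    std::function<std::optional<Out>(const In&)> m_run;
};

// Out1 appears in both parameters, so template argument deduction succeeds
// only when the output of `prev` is exactly the input of `next`.
template <typename In, typename Out1, typename Out2>
simple_stage_t<In, Out2> operator|(simple_stage_t<In, Out1>&& prev,
                                   simple_stage_t<Out1, Out2>&& next) {
    return {[prev = std::move(prev), next = std::move(next)](const In& v) {
        auto r = prev.m_run(v);
        if (!r)
            return std::optional<Out2>{};
        return next.m_run(*r);
    }};
}

// Hypothetical helper: true if `A | B` is a valid expression.
template <typename A, typename B, typename = void>
struct can_pipe : std::false_type {};
template <typename A, typename B>
struct can_pipe<A, B,
                std::void_t<decltype(std::declval<A>() | std::declval<B>())>>
    : std::true_type {};

// int -> double can be followed by double -> string ...
static_assert(can_pipe<simple_stage_t<int, double>,
                       simple_stage_t<double, std::string>>::value);
// ... but not by a stage that expects int.
static_assert(!can_pipe<simple_stage_t<int, double>,
                        simple_stage_t<int, std::string>>::value);

inline void pipe_demo() {
    auto halve = simple_stage_t<int, double>{
        [](const int& v) { return std::optional<double>{v / 2.0}; }};
    auto describe = simple_stage_t<double, std::string>{
        [](const double& v) { return std::optional<std::string>{std::to_string(v)}; }};

    auto pipeline = std::move(halve) | std::move(describe);
    const auto result = pipeline.m_run(3);
    assert(result.has_value());
    assert(*result == std::to_string(1.5));
}
```

In real use, a mismatch simply shows up as a "no match for operator|" compile error; the `can_pipe` helper is only there to make that visible as a static_assert.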

And now we can take a look at the most complex pipeline-building function, broadcast(). The function itself is rather simple:

template< typename In, typename Out, typename... Rest >
stage_t< In, void >
broadcast( stage_t< In, Out > && first, Rest &&... stages )
{
    stage_builder_t builder{
        [broadcasts = collect_sink_builders(
                move(first), forward< Rest >(stages)...)]
        ( coop_t & coop, mbox_t ) -> mbox_t
        {
            vector< mbox_t > mboxes;
            mboxes.reserve( broadcasts.size() );

            for( const auto & b : broadcasts )
                mboxes.emplace_back( b( coop, mbox_t{} ) );

            return broadcast_mbox_t::make( coop.environment(), std::move(mboxes) );
        }
    };

    return { std::move(builder) };
}

The main difference between an ordinary stage and a broadcast stage is that a broadcast stage has to hold a vector of subsidiary stage builders. So we have to create that vector and pass it into the main stage builder of the broadcast stage. Because of that, we can see a call to collect_sink_builders in the lambda's capture list inside the broadcast() function:

stage_builder_t builder{
    [broadcasts = collect_sink_builders(
            move(first), forward< Rest >(stages)...)]

If we look into collect_sink_builders, we'll see the following code:

//
// A series of helper functions for building a description of the
// `broadcast` stage.
//
// Those functions are used for collecting
// `builders` functions for every child pipeline.
//
// Please note that these functions check that each child pipeline has the
// same In type.
//
template< typename In, typename Out, typename... Rest >
void
move_sink_builder_to(
    vector< stage_builder_t > & receiver,
    stage_t< In, Out > && first,
    Rest &&... rest )
{
    receiver.emplace_back( move( first.m_builder ) );
    if constexpr( 0u != sizeof...(rest) )
        move_sink_builder_to<In>( receiver, forward< Rest >(rest)... );
}

template< typename In, typename Out, typename... Rest >
vector< stage_builder_t >
collect_sink_builders( stage_t< In, Out > && first, Rest &&... stages )
{
    vector< stage_builder_t > receiver;
    receiver.reserve( 1 + sizeof...(stages) );
    move_sink_builder_to<In>(
            receiver,
            move(first),
            std::forward<Rest>(stages)... );

    return receiver;
}

Compile-time type-checking works here too, because the call to move_sink_builder_to is explicitly parameterized by the type In. It means that a call of the form collect_sink_builders(stage_t<In1, Out1>, stage_t<In2, Out2>, ...) will lead to a compile error, because the compiler rejects the call move_sink_builder_to<In1>(receiver, stage_t<In2, Out2>, ...).

I can also note that, because the number of subsidiary pipelines for broadcast() is known at compile time, we could use std::array instead of std::vector and avoid some memory allocations. But std::vector is used here just for simplicity.
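For the curious, here is one way the std::array variant might look. This is a sketch under stated assumptions, not the article's code: `toy_builder_t` and `toy_stage_t` are simplified stand-ins (the real stage_builder_t takes a coop_t& and an mbox_t), and `collect_into_array` is a hypothetical replacement for collect_sink_builders. Spelling the parameter pack as toy_stage_t<In, Outs>&&... keeps the "same In for every child pipeline" check, because a conflicting In makes template argument deduction fail.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Simplified stand-in for stage_builder_t (assumption of this sketch).
using toy_builder_t = std::function<int(int)>;

// Simplified stand-in for stage_t: In and Out are used only for checks.
template <typename In, typename Out>
struct toy_stage_t {
    toy_builder_t m_builder;
};

// Collects the builders into a std::array whose size is the number of
// arguments, so no dynamic allocation is needed for the container itself.
template <typename In, typename... Outs>
std::array<toy_builder_t, sizeof...(Outs)>
collect_into_array(toy_stage_t<In, Outs>&&... stages) {
    return {{std::move(stages.m_builder)...}};
}

inline void collect_demo() {
    auto builders = collect_into_array(
        toy_stage_t<int, void>{[](int next) { return next + 1; }},
        toy_stage_t<int, double>{[](int next) { return next + 2; }});

    static_assert(
        std::is_same_v<decltype(builders), std::array<toy_builder_t, 2>>);
    assert(builders[0](10) == 11);
    assert(builders[1](10) == 12);
}
```

The std::function objects themselves may still allocate, so this only removes the allocation made by the vector.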

Relation between stages and SObjectizer's agents/mboxes

The idea behind the implementation of the pipeline is the creation of a separate agent for every pipeline stage. An agent receives an incoming message, passes it to the corresponding stage handler, analyzes the result and, if the result is not empty, sends the result as an incoming message to the next stage. It can be illustrated by the following sequence diagram:

Some SObjectizer-related things have to be discussed, at least briefly. If you have no interest in such details you can skip the sections below and go to the conclusion directly.

A coop is a group of agents that work together

Agents are introduced into SObjectizer not individually but in groups named coops. A coop is a group of agents that should work together, and there is no point in continuing the work if one of the agents of the group is missing.

So the introduction of agents into SObjectizer looks like this: a coop instance is created, filled with the appropriate agents, and then registered in SObjectizer.

Because of that, the first argument of a stage builder is a reference to a new coop. This coop is created in the make_pipeline() function (discussed below), then it's populated by stage builders and then registered (again in the make_pipeline() function).

Message boxes

SObjectizer implements several concurrency-related models. The Actor Model is just one of them. Because of that, SObjectizer can differ significantly from other actor frameworks. One of the differences is the addressing scheme for messages.

Messages in SObjectizer are addressed not to actors but to message boxes (mboxes). Actors have to subscribe to messages from an mbox. If an actor has subscribed to a particular message type from an mbox, it will receive messages of that type:

This fact is crucial because it's necessary to send messages from one stage to another. It means that every stage should have its own mbox, and that mbox should be known to the previous stage.

Every actor (aka agent) in SObjectizer has a direct mbox. This mbox is associated only with its owner agent: no other agent can subscribe to it, although other agents can send messages to it. The direct mboxes of the agents created for stages will be used for interaction between stages.

This specific feature of SObjectizer dictates some pipeline-implementation details.

The first is the fact that a stage builder has the following prototype:

mbox_t builder(coop_t &, mbox_t);

It means that a stage builder receives the mbox of the next stage and should create a new agent that will send the stage's results to that mbox. The mbox of the new agent should be returned by the stage builder. That mbox will be used for the creation of an agent for the previous stage.

The second is the fact that agents for stages are created in reverse order. It means that if we have a pipeline:

stage(A) | stage(B) | stage(C)

an agent for stage C will be created first, then its mbox will be used for the creation of an agent for stage B, and then the mbox of the B-stage agent will be used for the creation of an agent for stage A.
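The reverse creation order is easy to reproduce in a toy model. Everything in the sketch below is an assumption made for illustration: a "mailbox" is just the name of the stage that owns it, a builder writes to a log instead of creating a SObjectizer agent, and `order_stage` and `join` are hypothetical helpers. `join` follows the same idea as the builder produced by operator|(): the builder of the next stage is called first, and the mailbox it returns is handed to the builder of the previous stage.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model: a mailbox is the owner's name, a builder logs what it creates.
using order_mbox_t = std::string;
using creation_log_t = std::vector<std::string>;
using order_builder_t =
    std::function<order_mbox_t(creation_log_t&, order_mbox_t)>;

// Builder for one stage: "creates" the stage, records where its results
// will go, and returns the stage's own mailbox.
inline order_builder_t order_stage(std::string name) {
    return [name = std::move(name)](creation_log_t& log, order_mbox_t next) {
        log.push_back(name + " -> " + (next.empty() ? std::string{"(end)"} : next));
        return name;
    };
}

// Joins two builders: the next stage is built first, the previous one second.
inline order_builder_t join(order_builder_t prev, order_builder_t next) {
    return [prev = std::move(prev), next = std::move(next)](
               creation_log_t& log, order_mbox_t next_stage) {
        auto m = next(log, std::move(next_stage));
        return prev(log, std::move(m));
    };
}

inline void reverse_order_demo() {
    creation_log_t log;
    // The toy equivalent of stage(A) | stage(B) | stage(C).
    auto pipeline = join(join(order_stage("A"), order_stage("B")), order_stage("C"));

    const auto first_mbox = pipeline(log, order_mbox_t{});

    assert(first_mbox == "A");  // messages enter the pipeline through A
    assert((log == creation_log_t{"C -> (end)", "B -> C", "A -> B"}));
}
```

The log shows C created first and A last, while the mailbox returned to the caller belongs to A, the entry point of the pipeline.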

It's also worth noting that operator|() doesn't create agents:

stage_builder_t{
    [prev, next]( coop_t & coop, mbox_t next_stage ) -> mbox_t
    {
        auto m = next.m_builder( coop, std::move(next_stage) );
        return prev.m_builder( coop, std::move(m) );
    }
}

The operator|() creates a builder that only calls other builders but doesn't introduce additional agents. So in the case of:

stage(A) | stage(B)

only two agents will be created (for the A-stage and the B-stage), and they will be linked together in the stage builder created by operator|().

There is no agent for `broadcast()` implementation

An obvious way to implement a broadcasting stage is to create a special agent that receives an incoming message and then resends it to a list of destination mboxes. That approach was used in the first implementation of the described pipeline DSL.

But our companion project, so5extra, now has a special variant of mbox: a broadcasting one. That mbox does exactly what is required here: it takes a new message and delivers it to a set of destination mboxes.
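The fan-out behavior itself can be sketched in a few lines of plain C++. This is only a toy model of the idea and says nothing about how so5extra implements its broadcasting mbox: here a "mailbox" is assumed to be a callable that accepts a message, and `sink_t` and `make_fan_out` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a mailbox: a callable that accepts a message.
template <typename Msg>
using sink_t = std::function<void(const Msg&)>;

// Builds one sink that hands every incoming message to each destination.
template <typename Msg>
sink_t<Msg> make_fan_out(std::vector<sink_t<Msg>> destinations) {
    return [destinations = std::move(destinations)](const Msg& msg) {
        for (const auto& deliver : destinations)
            deliver(msg);
    };
}

inline void fan_out_demo() {
    std::vector<std::string> received;
    std::vector<sink_t<int>> destinations{
        [&received](const int& v) { received.push_back("archive:" + std::to_string(v)); },
        [&received](const int& v) { received.push_back("display:" + std::to_string(v)); }};

    const auto broadcast = make_fan_out<int>(std::move(destinations));
    broadcast(42);

    assert((received == std::vector<std::string>{"archive:42", "display:42"}));
}
```

From the sender's side there is a single destination, which is why the previous stage does not need to know that it feeds several child pipelines.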

Because of that, there is no need to create a separate broadcasting agent; we can just use the broadcasting mbox from so5extra:

//
// A special mbox for broadcasting of a message to a set of destination
// mboxes.
//
using broadcast_mbox_t = so_5::extra::mboxes::broadcast::fixed_mbox_template_t<>;
...
//
// Inside the broadcast() function:
//
stage_builder_t builder{
    [broadcasts = collect_sink_builders(
            move(first), forward< Rest >(stages)...)]
    ( coop_t & coop, mbox_t ) -> mbox_t
    {
        vector< mbox_t > mboxes;
        mboxes.reserve( broadcasts.size() );

        for( const auto & b : broadcasts )
            mboxes.emplace_back( b( coop, mbox_t{} ) );

        // That is the creation of broadcasting mbox instance.
        return broadcast_mbox_t::make( coop.environment(), std::move(mboxes) );
    }
};
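
For readers who don't use SObjectizer, the idea of a broadcasting mbox can be approximated in plain C++. This is only a sketch of the concept, not of how so5extra implements it: sink_fn and make_fanout() are hypothetical names, and a "mailbox" is modeled as a simple callable.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mailbox: anything a message can be handed to.
template <typename Msg>
using sink_fn = std::function<void(const Msg &)>;

// Fan-out sink: holds a fixed set of destinations and forwards every
// incoming message to each of them, in order. No extra worker is involved.
template <typename Msg>
sink_fn<Msg> make_fanout(std::vector<sink_fn<Msg>> destinations) {
    return [dests = std::move(destinations)](const Msg & msg) {
        for (const auto & d : dests) {
            assert(d && "every destination must be callable");
            d(msg);
        }
    };
}

// Delivers the values 1, 2, 3 to two collecting sinks and returns what
// each of them saw.
inline std::pair<std::vector<int>, std::vector<int>> fanout_demo() {
    std::vector<int> first, second;
    std::vector<sink_fn<int>> dests;
    dests.push_back([&first](const int & v) { first.push_back(v); });
    dests.push_back([&second](const int & v) { second.push_back(v * 10); });
    auto out = make_fanout<int>(std::move(dests));
    for (int v : {1, 2, 3}) out(v);
    return { first, second };
}
```

The point of the sketch is the same as in the text above: the fan-out is a property of the destination itself, so no separate forwarding stage has to exist.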

Implementation of the stage agent

Now we can take a look at the implementation of the stage agent:

//
// An agent which will be used as intermediate or terminal pipeline stage.
// It will receive input message, call the stage handler and pass
// handler result to the next stage (if any).
//
template< typename In, typename Out >
class a_stage_point_t final : public agent_t
{
public :
    a_stage_point_t(
        context_t ctx,
        stage_handler_t< In, Out > handler,
        mbox_t next_stage )
        :   agent_t{ ctx }
        ,   m_handler{ move( handler ) }
        ,   m_next{ move(next_stage) }
    {}

    void so_define_agent() override
    {
        if( m_next )
            // Because there is the next stage the appropriate
            // message handler will be used.
            so_subscribe_self().event( [=]( const In & evt ) {
                    auto r = m_handler( evt );
                    if( r )
                        so_5::send( m_next, r );
                } );
        else
            // There is no next stage. A very simple message handler
            // will be used for that case.
            so_subscribe_self().event( [=]( const In & evt ) {
                    m_handler( evt );
                } );
    }

private :
    const stage_handler_t< In, Out > m_handler;
    const mbox_t m_next;
};

//
// A specialization of a_stage_point_t for the case of terminal stage of
// a pipeline. This type will be used for stage handlers with void
// return type.
//
template< typename In >
class a_stage_point_t< In, void > final : public agent_t
{
public :
    a_stage_point_t(
        context_t ctx,
        stage_handler_t< In, void > handler,
        mbox_t next_stage )
        :   agent_t{ ctx }
        ,   m_handler{ move( handler ) }
    {
        if( next_stage )
            throw std::runtime_error( "sink point cannot have next stage" );
    }

    void so_define_agent() override
    {
        so_subscribe_self().event( [=]( const In & evt ) {
                m_handler( evt );
            } );
    }

private :
    const stage_handler_t< In, void > m_handler;
};

It's rather trivial if you understand SObjectizer's basics. If not, it will be quite hard to explain in a few words (so feel free to ask questions in the comments).

The main implementation of a_stage_point_t agent creates a subscription to a message of type In. When a message of this type arrives the stage handler is called. If the stage handler returns an actual result the result is sent to the next stage (if that stage exists).

There is also a version of a_stage_point_t for the case when the corresponding stage is the terminal stage and there can't be the next stage.

The implementation of a_stage_point_t can look a bit complicated but believe me, it's one of the simplest agents I've written.
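
If the agent-specific details get in the way, the forwarding logic alone can be modeled in a few lines of plain C++. This is a simplified sketch, assuming the handler result is represented as std::optional<Out> and the next stage as a callable; stage_point and stage_point_demo() are hypothetical names used only here.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one stage. `next` plays the role of the next
// stage's mailbox; an empty `next` means "there is no next stage".
template <typename In, typename Out>
class stage_point {
public:
    using handler_t = std::function<std::optional<Out>(const In &)>;
    using next_t = std::function<void(const Out &)>;

    stage_point(handler_t handler, next_t next)
        : m_handler{std::move(handler)}, m_next{std::move(next)} {
        assert(m_handler && "a stage needs a handler");
    }

    // Models the arrival of one message of type In.
    void on_message(const In & msg) const {
        std::optional<Out> r = m_handler(msg);
        // Forward only when there is a result and a next stage.
        if (r && m_next) m_next(*r);
    }

private:
    handler_t m_handler;
    next_t m_next;
};

// Feeds a few integers through a stage that drops negative values and
// stringifies the rest; returns what reached the "next stage".
inline std::vector<std::string> stage_point_demo() {
    std::vector<std::string> received;
    stage_point<int, std::string> st{
        [](const int & v) -> std::optional<std::string> {
            if (v < 0) return std::nullopt;
            return std::to_string(v);
        },
        [&received](const std::string & s) { received.push_back(s); }
    };
    for (int v : {3, -1, 7}) st.on_message(v);
    return received;
}
```

In this model the value -1 is consumed by the stage and never reaches the next one, which mirrors the "if the stage handler returns an actual result" condition described above.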

make_pipeline() function

It's time to discuss the last pipeline-building function, make_pipeline():

template< typename In, typename Out, typename... Args >
mbox_t
make_pipeline(
    // SObjectizer Environment to work in.
    so_5::environment_t & env,
    // Definition of a pipeline.
    stage_t< In, Out > && sink,
    // Optional args to be passed to make_coop() function.
    Args &&... args )
{
    auto coop = env.make_coop( forward< Args >(args)... );

    auto mbox = sink.m_builder( *coop, mbox_t{} );

    env.register_coop( move(coop) );

    return mbox;
}

There is no magic and there are no surprises here. We just need to create a new coop for the underlying agents of the pipeline, fill that coop with agents by calling the top-level stage builder, and then register that coop in SObjectizer. That's all.

The result of make_pipeline() is the mbox of the left-most (the first) stage of the pipeline. That mbox should be used for sending messages to the pipeline.

The simulation and experiments with it

So now we have data types and functions for our application logic and the tools for chaining those functions into a data-processing pipeline. Let's do it and see the result:

int main()
{
    // Launch SObjectizer in a separate thread.
    wrapped_env_t sobj;

    // Make a pipeline.
    auto pipeline = make_pipeline( sobj.environment(),
            stage(validation) | stage(conversion) | broadcast(
                stage(archiving),
                stage(distribution),
                stage(range_checking) | stage(alarm_detector{}) | broadcast(
                    stage(alarm_initiator),
                    stage( []( const alarm_detected & v ) {
                            alarm_distribution( cerr, v );
                        } )
                    )
                ) );

    // Send messages to a pipeline in a loop with 10ms delays.
    for( uint8_t i = 0; i < static_cast< uint8_t >(250); i += 10 )
    {
        send< raw_value >(
                pipeline,
                raw_measure{ 0, 0, i } );
        std::this_thread::sleep_for( chrono::milliseconds{10} );
    }
}

If we run that example we'll see the following output:

archiving (0,0)
distributing (0,0)
archiving (0,5)
distributing (0,5)
archiving (0,10)
distributing (0,10)
archiving (0,15)
distributing (0,15)
archiving (0,20)
distributing (0,20)
archiving (0,25)
distributing (0,25)
archiving (0,30)
distributing (0,30)
...
archiving (0,105)
distributing (0,105)
archiving (0,110)
distributing (0,110)
=== alarm (0) ===
alarm_distribution (0)
archiving (0,115)
distributing (0,115)
archiving (0,120)
distributing (0,120)
=== alarm (0) ===
alarm_distribution (0)

It works.

But it looks like the stages of our pipeline work sequentially, one after another. Is that right?

Yes, it is. This is because all pipeline agents are bound to SObjectizer's default dispatcher, and that dispatcher uses just one worker thread to process the messages of all agents.

But this can be easily changed. Just pass an additional argument to the make_pipeline() call:

// Make a pipeline.
auto pipeline = make_pipeline( sobj.environment(),
        stage(validation) | stage(conversion) | broadcast(
            stage(archiving),
            stage(distribution),
            stage(range_checking) | stage(alarm_detector{}) | broadcast(
                stage(alarm_initiator),
                stage( []( const alarm_detected & v ) {
                        alarm_distribution( cerr, v );
                    } )
                )
            ),
        disp::thread_pool::make_dispatcher( sobj.environment() ).binder(
            disp::thread_pool::bind_params_t{}.fifo(
                disp::thread_pool::fifo_t::individual ) )
);

This creates a new thread pool and binds all pipeline agents to that pool. Each agent will be served by the pool independently from other agents.

If we run the modified example, we can see something like this:

archiving (0,0)
distributing (0,0)
distributing (0,5)
archiving (0,5)
archiving (0,10)
distributing (0,10)
distributing (archiving (0,15)
0,15)
archiving (0,20)
distributing (0,20)
archiving (0,25)
distributing (0,25)
archiving (0,distributing (030)
,30)
...
archiving (0,distributing (0,105)
105)
archiving (0,alarm_distribution (0)
distributing (0,=== alarm (0) ===
110)
110)
archiving (distributing (0,0,115)
115)
archiving (distributing (=== alarm (0) ===
0alarm_distribution (0)
0,120)
,120)

So we can see that different stages of the pipeline work in parallel.
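
The torn lines in the output above are what one would expect when several worker threads write to the same stream without coordination. A minimal sketch of one way to keep lines whole, in plain C++ and independent of SObjectizer, could look like this. The line_log class and run_two_stages() function are hypothetical names introduced only for this illustration; the example in this article does not use them.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: collects complete lines from many threads. Each
// line is formatted first and then appended under a lock, so lines from
// different threads can interleave but never tear.
class line_log {
public:
    void write(const std::string & stage, int value) {
        std::ostringstream line;
        line << stage << " (0," << value << ")";
        std::lock_guard<std::mutex> lock{m_lock};
        m_lines.push_back(line.str());
    }

    std::vector<std::string> lines() const {
        std::lock_guard<std::mutex> lock{m_lock};
        return m_lines;
    }

private:
    mutable std::mutex m_lock;
    std::vector<std::string> m_lines;
};

// Runs two "stages" on their own threads, each logging `count` values.
inline std::vector<std::string> run_two_stages(int count) {
    assert(count >= 0);
    line_log log;
    auto worker = [&log, count](const char * name) {
        for (int i = 0; i < count; ++i) log.write(name, i * 5);
    };
    std::thread a{worker, "archiving"};
    std::thread b{worker, "distributing"};
    a.join();
    b.join();
    return log.lines();
}
```

The relative order of "archiving" and "distributing" lines still varies from run to run, which is exactly the parallelism we wanted to see; only the tearing inside a single line goes away.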

But is it possible to go further and bind individual stages to different dispatchers?

Yes, it is possible, but we have to implement another overload of the stage() function:

template<
    typename Callable,
    typename In = typename callable_traits_t< Callable >::arg_type,
    typename Out = typename callable_traits_t< Callable >::result_type >
stage_t< In, Out >
stage( 
    disp_binder_shptr_t disp_binder,
    Callable handler )
{
    stage_builder_t builder{
            [binder = std::move(disp_binder), h = std::move(handler)](
                coop_t & coop,
                mbox_t next_stage) -> mbox_t
            {
                return coop.make_agent_with_binder< a_stage_point_t<In, Out> >(
                        std::move(binder),
                        std::move(h),
                        std::move(next_stage) )
                    ->so_direct_mbox();
            }
    };

    return { std::move(builder) };
}

This version of stage() accepts not only a stage handler but also a dispatcher binder. A dispatcher binder is a way to bind an agent to a particular dispatcher. So to assign a stage to a specific working context, we can create an appropriate dispatcher and then pass a binder for that dispatcher to the stage() function. Let's do that:

// An active_obj dispatcher to be used for some stages.
auto ao_disp = disp::active_obj::make_dispatcher( sobj.environment() );

// Make a pipeline.
auto pipeline = make_pipeline( sobj.environment(),
        stage(validation) | stage(conversion) | broadcast(
            stage(ao_disp.binder(), archiving),
            stage(ao_disp.binder(), distribution),
            stage(range_checking) | stage(alarm_detector{}) | broadcast(
                stage(ao_disp.binder(), alarm_initiator),
                stage(ao_disp.binder(), []( const alarm_detected & v ) {
                        alarm_distribution( cerr, v );
                    } )
                )
            ),
        disp::one_thread::make_dispatcher( sobj.environment() ).binder() );

In that case the archiving, distribution, alarm_initiator and alarm_distribution stages will each work on their own worker thread. All other stages will work on the same single worker thread.
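
The rule at work here (an explicit per-stage binder wins, otherwise the binder given to the whole pipeline is used) can be written down as a tiny model. This is only a sketch of the rule, assuming a "binder" can be represented by a dispatcher name; stage_desc and resolve_binders() are hypothetical and have no counterpart in SObjectizer.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: a "binder" is just the name of a dispatcher, and a
// stage description optionally carries its own binder.
struct stage_desc {
    std::string name;
    std::optional<std::string> binder; // empty: use the pipeline default
};

// Resolves the dispatcher for every stage: an explicit per-stage binder
// wins, otherwise the default given to the whole pipeline is used.
inline std::map<std::string, std::string> resolve_binders(
        const std::vector<stage_desc> & stages,
        const std::string & default_binder) {
    assert(!default_binder.empty());
    std::map<std::string, std::string> result;
    for (const auto & s : stages)
        result[s.name] = s.binder.value_or(default_binder);
    return result;
}
```

With the pipeline above as input, validation, conversion, range_checking and alarm_detector would resolve to the one_thread dispatcher, and the four stages created with ao_disp.binder() to the active_obj one.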

The conclusion

This was an interesting experiment, and I was surprised by how easily SObjectizer could be used for something like reactive programming or data-flow programming.

However, I don't think that this pipeline DSL can be practically meaningful. It's too simple and maybe not flexible enough. But I hope it can be a base for more interesting experiments for those who need to deal with different workflows and data-processing pipelines, or at least a base for some ideas in that area. The C++ language is rather good here, and some (not so complicated) template magic can help to catch various errors at compile time.

In conclusion, I want to say that we see SObjectizer not as a specialized tool for solving a particular problem, but as a basic set of tools to be used in solutions for different problems. And, more importantly, that basic set can be extended for your needs. Just take a look at SObjectizer, try it, and share your feedback. Maybe you missed something in SObjectizer? Perhaps you don't like something? Tell us, and we can try to help you.

If you want to help the further development of SObjectizer, please share a link to it or to this article wherever you like (Reddit, HackerNews, LinkedIn, Facebook, Twitter, ...). The more attention and feedback we get, the more new features will be incorporated into SObjectizer.

And many thanks for reading this ;)

PS. The source code for that example can be found in that repository.

Translated from: https://habr.com/en/post/460123/
