MXNet Operators, Part 2

huareal

Article link: https://blog.csdn.net/huareal/article/details/72855677

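This post continues the notes on MXNet operators.

Unifying the NDArray operator and the symbolic operator

The two are similar. The difference is that the symbolic operator carries a complete dependency graph, while the underlying logic is essentially the same.

SimpleOp: the new unified operator API

To build the dependency graph, you need to know whether the gradient requires the output value, the input data, or nothing besides the head gradient.

Gradient functions in the unified operator API are distinguished by which of these operands the computation uses.

Before going further into the unified operator API, read the mshadow library guide, because all computation is done on the mshadow::TBlob structure.

Example: we create an operator function for smooth l1 loss, which is a mixture of l1 loss and l2 loss: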

 loss = outside_weight .* f(inside_weight .* (data - label))
 grad = outside_weight .* inside_weight .* f'(inside_weight .* (data - label))

Here .* denotes element-wise multiplication.
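The post does not spell out f at this point. Reading it off the smooth_l1_loss mapper shown near the end of this post, and writing σ² for the square of the scalar argument (sigma2 in the code), f and its derivative would be as follows. The derivative is obtained here by differentiating each branch, so treat it as a derived sketch rather than a quote from the library:

```latex
f(x) =
\begin{cases}
  \tfrac{1}{2}\,\sigma^{2} x^{2}, & |x| \le 1/\sigma^{2} \\[4pt]
  |x| - \dfrac{1}{2\sigma^{2}}, & |x| > 1/\sigma^{2}
\end{cases}
\qquad
f'(x) =
\begin{cases}
  \sigma^{2} x, & |x| \le 1/\sigma^{2} \\[4pt]
  \operatorname{sign}(x), & |x| > 1/\sigma^{2}
\end{cases}
```

Both branches of f agree at |x| = 1/σ², where each evaluates to 1/(2σ²), so the loss is continuous at the turning point.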

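SimpleOp: the unified operator API

Defining shapes

The mshadow library requires explicit memory allocation, so all data shapes must be provided before any computation takes place.

Before running the function and gradient we define, we check that the input shapes are consistent and provide the output shape: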

    typedef TShape (*UnaryShapeFunction)(const TShape& src,
                                         const EnvArguments& env);
    typedef TShape (*BinaryShapeFunction)(const TShape& lhs,
                                          const TShape& rhs,
                                          const EnvArguments& env);

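Use mshadow::TShape to check the input data shape and to assign the output data shape.

If no shape function is defined, the output shape defaults to the input shape.

A shape function can also check that any additional arguments and resources are available; see the notes on EnvArguments further down.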
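To make the shape check concrete, here is a minimal standalone sketch of a binary shape function. It is not MXNet code: Shape is a std::vector stand-in for mshadow::TShape, the name SameShapeBinary is made up for illustration, and the EnvArguments parameter of the real signature is left out so the snippet compiles on its own.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for mshadow::TShape so the sketch compiles without MXNet.
using Shape = std::vector<std::size_t>;

// Hypothetical binary shape function: both inputs must have the same
// shape, and the output takes that shape. The real BinaryShapeFunction
// also receives a const EnvArguments&, omitted here.
inline Shape SameShapeBinary(const Shape& lhs, const Shape& rhs) {
  if (lhs != rhs) {
    throw std::invalid_argument("lhs and rhs must have the same shape");
  }
  return lhs;
}
```

Running the check up front means a shape mismatch is reported before any memory is allocated for the output.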

    #include <mxnet/operator_util.h>
    #if defined(__CUDACC__)
    #define XPU gpu
    #else
    #define XPU cpu
    #endif

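Before starting the smooth l1 loss example, we define XPU as cpu or gpu in the header smooth_l1_unary-inl.h, as in the block above,

so that smooth_l1_unary.cc and smooth_l1_unary.cu can reuse the same code.

In the smooth l1 loss example the default behavior is sufficient. Written out explicitly, the shape function would be:

    inline TShape SmoothL1Shape_(const TShape& src,
                                 const EnvArguments& env) {
      return TShape(src);
    }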

Defining functions

Create a unary or binary function that operates on mshadow::TBlob:

    typedef void (*UnaryFunction)(const TBlob& src,
                                  const EnvArguments& env,
                                  TBlob* ret,
                                  OpReqType req,
                                  RunContext ctx);
    typedef void (*BinaryFunction)(const TBlob& lhs,
                                   const TBlob& rhs,
                                   const EnvArguments& env,
                                   TBlob* ret,
                                   OpReqType req,
                                   RunContext ctx);

Functions are distinguished by the types of their input arguments.

RunContext ctx carries the information the executor needs at runtime:

    struct RunContext {
      void *stream;  // the stream of the device, can be NULL or Stream<gpu>* in GPU mode
      template<typename xpu> inline mshadow::Stream<xpu>* get_stream();  // get mshadow stream from Context
    };

To get a stream from ctx:

    mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();

OpReqType req specifies how the computation result is written into ret:

    enum OpReqType {
          kNullOp,  // no operation, do not write anything
          kWriteTo,  // write gradient to provided space
          kWriteInplace,  // perform an in-place write
          kAddTo  // add to the provided space
        };

operator_util.h defines a macro, ASSIGN_DISPATCH(out, req, exp), that simplifies handling OpReqType.

It checks req and performs the corresponding assignment.
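As an illustration of what a req-aware assignment has to do, here is a standalone sketch. It is not the macro itself: AssignWithReq is a hypothetical helper, and plain std::vector<float> stands in for the tensor and the expression. The enum is repeated only so the snippet compiles on its own.

```cpp
#include <cstddef>
#include <vector>

enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

// Hypothetical helper mimicking a req-aware assignment.
// Assumes value has at least as many elements as out.
inline void AssignWithReq(std::vector<float>* out, OpReqType req,
                          const std::vector<float>& value) {
  switch (req) {
    case kNullOp:
      break;  // nothing is written
    case kWriteTo:
    case kWriteInplace:
      // overwrite the destination
      for (std::size_t i = 0; i < out->size(); ++i) (*out)[i] = value[i];
      break;
    case kAddTo:
      // accumulate into the destination
      for (std::size_t i = 0; i < out->size(); ++i) (*out)[i] += value[i];
      break;
  }
}
```

Centralizing this switch is what lets an operator body state the computation once and leave the write mode to the caller.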

    template<typename xpu>
    void SmoothL1Forward_(const TBlob& src,
                          const EnvArguments& env,
                          TBlob *ret,
                          OpReqType req,
                          RunContext ctx) {
      using namespace mshadow;
      using namespace mshadow::expr;
      mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
      real_t sigma2 = env.scalar * env.scalar;
      MSHADOW_TYPE_SWITCH(ret->type_flag_, DType, {
        mshadow::Tensor<xpu, 2, DType> out = ret->get<xpu, 2, DType>(s);
        mshadow::Tensor<xpu, 2, DType> in = src.get<xpu, 2, DType>(s);
        ASSIGN_DISPATCH(out, req,
                        F<mshadow_op::smooth_l1_loss>(in, ScalarExp<DType>(sigma2)));
      });
    }

Notes:

MSHADOW_TYPE_SWITCH(type, DType, ...) handles the details of the different data types.

ASSIGN_DISPATCH(out, req, exp) checks OpReqType and performs the corresponding action.
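The idea behind a type switch is to turn a runtime type flag into a compile-time type so that templated code can run. The sketch below writes that mapping out by hand; the TypeFlag enum and both function names are hypothetical and are not the values or helpers mshadow actually defines.

```cpp
#include <cstddef>

// Hypothetical type flags; the real ones are defined by mshadow.
enum TypeFlag { kFloat32Flag, kFloat64Flag };

// Templated body: the part that would sit inside the switch macro.
template <typename DType>
std::size_t ElementSize() {
  return sizeof(DType);
}

// Runtime flag -> compile-time type, written out by hand.
inline std::size_t ElementSizeOf(TypeFlag flag) {
  switch (flag) {
    case kFloat32Flag: return ElementSize<float>();
    case kFloat64Flag: return ElementSize<double>();
  }
  return 0;  // unknown flag
}
```

A macro saves repeating this switch in every operator: the body is written once in terms of DType and instantiated for each supported type.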

Defining gradients

Create a gradient function whose type matches the inputs it depends on:

    // depending only on out_grad
    typedef void (*UnaryGradFunctionT0)(const OutputGrad& out_grad,
                                        const EnvArguments& env,
                                        TBlob* in_grad,
                                        OpReqType req,
                                        RunContext ctx);
    // depending only on out_value
    typedef void (*UnaryGradFunctionT1)(const OutputGrad& out_grad,
                                        const OutputValue& out_value,
                                        const EnvArguments& env,
                                        TBlob* in_grad,
                                        OpReqType req,
                                        RunContext ctx);
    // depending only on in_data
    typedef void (*UnaryGradFunctionT2)(const OutputGrad& out_grad,
                                        const Input0& in_data0,
                                        const EnvArguments& env,
                                        TBlob* in_grad,
                                        OpReqType req,
                                        RunContext ctx);

Gradient functions for binary operators have a similar structure, except that Input, TBlob, and OpReqType are doubled.
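To picture the doubling, here is a standalone sketch of a backward pass for the element-wise product out = lhs .* rhs. Nothing in it is MXNet API: Vec stands in for TBlob, Store and MulBackward are hypothetical names, and the enum is repeated only so the snippet compiles by itself.

```cpp
#include <cstddef>
#include <vector>

enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };
using Vec = std::vector<float>;  // stand-in for TBlob

// Writes or accumulates one value according to req.
inline void Store(float* dst, OpReqType req, float v) {
  if (req == kAddTo) {
    *dst += v;
  } else if (req != kNullOp) {
    *dst = v;
  }
}

// Hypothetical gradient of out = lhs .* rhs. Compared with the unary
// case there are two inputs, two gradient outputs, and two req flags.
inline void MulBackward(const Vec& out_grad, const Vec& lhs, const Vec& rhs,
                        Vec* lhs_grad, Vec* rhs_grad,
                        OpReqType req_lhs, OpReqType req_rhs) {
  for (std::size_t i = 0; i < out_grad.size(); ++i) {
    Store(&(*lhs_grad)[i], req_lhs, out_grad[i] * rhs[i]);  // d(out)/d(lhs) = rhs
    Store(&(*rhs_grad)[i], req_rhs, out_grad[i] * lhs[i]);  // d(out)/d(rhs) = lhs
  }
}
```

Each gradient output gets its own req because the two operands may be consumed differently downstream, for example one overwritten and one accumulated.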

Input0, Input, OutputValue, and OutputGrad all share the structure of GradFunctionArgument, which is defined as:

    struct GradFunctionArgument {
      TBlob data;
    };

For the smooth l1 loss example:

    template<typename xpu>
    void SmoothL1BackwardUseIn_(const OutputGrad& out_grad,
                                const Input0& in_data0,
                                const EnvArguments& env,
                                TBlob *in_grad,
                                OpReqType req,
                                RunContext ctx) {
      using namespace mshadow;
      using namespace mshadow::expr;
      mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
      real_t sigma2 = env.scalar * env.scalar;
      MSHADOW_TYPE_SWITCH(in_grad->type_flag_, DType, {
        mshadow::Tensor<xpu, 2, DType> src = in_data0.data.get<xpu, 2, DType>(s);
        mshadow::Tensor<xpu, 2, DType> ograd = out_grad.data.get<xpu, 2, DType>(s);
        mshadow::Tensor<xpu, 2, DType> igrad = in_grad->get<xpu, 2, DType>(s);
        ASSIGN_DISPATCH(igrad, req,
                        ograd * F<mshadow_op::smooth_l1_gradient>(src, ScalarExp<DType>(sigma2)));
      });
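    }

Note that the gradient, f'(x), is computed from the input, so UnaryGradFunctionT2 is the suitable type.

To apply the chain rule, we also multiply out_grad, the gradient arriving from the top, into the result to obtain in_grad.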

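Registering a SimpleOp with MXNet

After defining the shape, function, and gradient, register them as both an NDArray operator and a symbolic operator. The registration macro in operator_util.h simplifies this step:

    MXNET_REGISTER_SIMPLE_OP(Name, DEV)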
    .set_shape_function(Shape)
    .set_function(DEV::kDevMask, Function<XPU>, SimpleOpInplaceOption)
    .set_gradient(DEV::kDevMask, Gradient<XPU>, SimpleOpInplaceOption)
    .describe("description");

SimpleOpInplaceOption is defined as:

    enum SimpleOpInplaceOption {
      kNoInplace,  // do not allow inplace in arguments
      kInplaceInOut,  // allow inplace in with out (unary)
      kInplaceOutIn,  // allow inplace out_grad with in_grad (unary)
      kInplaceLhsOut,  // allow inplace left operand with out (binary)
      kInplaceOutLhs  // allow inplace out_grad with lhs_grad (binary)
    };

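A note on in-place writes:

In this example the gradient function depends on the input data, so the forward function cannot write in place. The output gradient has no further use once the gradient has been computed, so the gradient can be written in place.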

    MXNET_REGISTER_SIMPLE_OP(smooth_l1, XPU)
    .set_function(XPU::kDevMask, SmoothL1Forward_<XPU>, kNoInplace)
    .set_gradient(XPU::kDevMask, SmoothL1BackwardUseIn_<XPU>, kInplaceOutIn)
    .set_enable_scalar(true)
    .describe("Calculate Smooth L1 Loss(lhs, scalar)");

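Recall from the discussion of shape functions that, without set_shape_function, the default behavior requires the inputs to have the same shape and produces an output of that same shape.

set_enable_scalar is discussed further below.

NDArray operator summary

1. Create a shape function to determine the output shape.

2. Create a function as the forward routine by choosing a suitable function type.

3. Create a gradient as the backward routine by choosing a suitable gradient type.

4. Register the operator using the registration process.

Additional information on SimpleOp

Using SimpleOp with EnvArguments

Some operators need a scalar as input (for example a gradient scale), a set of keyword arguments controlling behavior, or a temporary space to speed up calculations.

EnvArguments provides these additional arguments and resources to make calculations more scalable and efficient.

    struct EnvArguments {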
      real_t scalar;  // scalar argument, if enabled
      std::vector<std::pair<std::string, std::string> > kwargs;  // keyword arguments
      std::vector<Resource> resource;  // pointer to the resources requested
    };

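These additional features need more registration parameters.

To prevent confusion between parameters, scalar and kwargs cannot be enabled at the same time.

To enable scalar, register with set_enable_scalar(bool enable_scalar). The forward function and the gradient can then read the scalar as env.scalar from the EnvArguments env parameter.

To enable kwargs, register with set_enable_kwargs(bool enable_kwargs). The forward function and the gradient can then read the extra arguments from env.kwargs,

which is defined as std::vector<std::pair<std::string, std::string> >. The DMLC parameter structure simplifies parsing keyword arguments;

see the guide on parameter structure for details.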
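For a sense of what reading from such a key/value vector involves, here is a hand-rolled lookup. It deliberately does not use the DMLC parameter structure; GetFloatArg and the keys in the usage note are hypothetical and exist only for illustration.

```cpp
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

using KwArgs = std::vector<std::pair<std::string, std::string> >;

// Hypothetical helper: return the value stored under key as a float,
// or fallback when the key is absent.
inline float GetFloatArg(const KwArgs& kwargs, const std::string& key,
                         float fallback) {
  for (const auto& kv : kwargs) {
    if (kv.first == key) {
      return std::strtof(kv.second.c_str(), nullptr);
    }
  }
  return fallback;
}
```

With kwargs holding a pair such as ("sigma", "2.5"), a call like GetFloatArg(kwargs, "sigma", 1.0f) returns the parsed value, and an absent key yields the fallback. A declarative parameter structure replaces this kind of per-key boilerplate.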

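Additional resources such as mshadow::Random<xpu> and temporary memory space can be requested and accessed through EnvArguments.resource.

The registration routine is:

set_resource_request(ResourceRequest req) or set_resource_request(const std::vector<ResourceRequest>)

where ResourceRequest is defined as:

    struct ResourceRequest {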
      enum Type {  // Resource type, indicating what the pointer type is
        kRandom,  // mshadow::Random<xpu> object
        kTempSpace  // A dynamic temp space that can be arbitrary size
      };
      Type type;  // type of resources
    };

Registration requests the declared resources from mxnet::ResourceManager

and places them in std::vector<Resource> resource in EnvArguments.

To access the resources:

    auto tmp_space_res = env.resource[0].get_space(some_shape, some_stream);
    auto rand_res = env.resource[0].get_random(some_stream);

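In the smooth l1 loss example, a scalar input is needed to mark the turning point of the loss function.

So in the registration we call set_enable_scalar(true), and we use env.scalar in the function and gradient definitions.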

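Crafting a tensor operation

Because computation uses the mshadow library, we can craft tensor operations inside the operator implementation.

If you want to define an element-wise function, implement it as an mxnet::op::mshadow_op in src/operator/mshadow_op.h.

See the mshadow expression API guide.

If an operation cannot be defined element-wise, such as the softmax loss and gradient, you need to create a new tensor operation, which means implementing it for both mshadow and mshadow::cuda.

Smooth l1 loss is element-wise, so for this example we create two mappers: the scalar cases of the loss and of its gradient. The loss mapper is:

    namespace mshadow_op {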
    struct smooth_l1_loss {
      // a is x, b is sigma2
      MSHADOW_XINLINE static real_t Map(real_t a, real_t b) {
        if (a > 1.0f / b) {
          return a - 0.5f / b;
        } else if (a < -1.0f / b) {
          return -a - 0.5f / b;
        } else {
          return 0.5f * a * a * b;
        }
      }
    };
    }

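Beyond two operands

The new unified API is designed to cover the fundamentals of implementing an operator. For operators with more than two inputs or outputs, refer to the Operator API.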
