Caffe2 - (12) Custom Operators

AIHGF

Article link: https://blog.csdn.net/zziahgf/article/details/79039225

Caffe2 - Custom Operators

Caffe2 ships with many operators; see the Operators Catalogue.

If your application needs an operator that does not exist yet, you can implement it as follows.

1. Custom Basic Operators

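In general, each operator consists of two files:

.cc - registers the operator
.h - contains the actual implementation

There are exceptions, for example:

Some operators are implemented directly in the .cc file;
Some operators also have a GPU/CUDA implementation, which lives in a .cu file.

If the CUDA implementation contains its own CUDA kernels, it goes in a .cu file and is compiled with NVCC.

If it only calls existing CUDA libraries, the file can instead be named _gpu.cc, so that it does not have to go through NVCC, which saves compile time.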

1.1 .cc - Registering the Operator

Take the FC operator's .cc file, fully_connected_op.cc, as an example.

#include "caffe2/operators/fully_connected_op.h"

namespace caffe2 {
namespace {

REGISTER_CPU_OPERATOR(FC, FullyConnectedOp<float, CPUContext>);
REGISTER_CPU_OPERATOR(FCGradient, FullyConnectedGradientOp<float, CPUContext>);

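First, the operator and its gradient operator are registered under their names. The C++ class FullyConnectedOp is exposed under the name FC, which is the name used from Python (for example, core.CreateOperator("FC", ...)). The template arguments float and CPUContext specify the data type and the context, respectively. Choose CPUContext or CUDAContext depending on whether the operator runs on the CPU or the GPU.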

The GPU implementation of FC is registered in fully_connected_op_gpu.cc:

#include "caffe2/core/context_gpu.h"
#include "caffe2/operators/fully_connected_op.h"

namespace caffe2 {
namespace {
REGISTER_CUDA_OPERATOR(FC, FullyConnectedOp<float, CUDAContext>);
REGISTER_CUDA_OPERATOR(FCGradient, FullyConnectedGradientOp<float, CUDAContext>);
}  // namespace
}  // namespace caffe2

The basic difference between the GPU and CPU versions is that the GPU version uses REGISTER_CUDA_OPERATOR and CUDAContext instead of REGISTER_CPU_OPERATOR and CPUContext.
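Roughly speaking, REGISTER_CPU_OPERATOR and REGISTER_CUDA_OPERATOR put the same name, FC, into separate per-device registries, and an operator is looked up by name and device type when it is created. The toy Python model below only illustrates that idea; REGISTRIES, register_operator, create_operator and the two classes are hypothetical names, not Caffe2 internals.

```python
# Toy model of per-device operator registries (hypothetical, not Caffe2's code).
REGISTRIES = {"CPU": {}, "CUDA": {}}


def register_operator(device, name):
    """Decorator that records a class under `name` in the registry for `device`."""
    def wrap(cls):
        REGISTRIES[device][name] = cls
        return cls
    return wrap


def create_operator(name, device="CPU"):
    """Look up `name` in the registry for `device` and instantiate it."""
    try:
        cls = REGISTRIES[device][name]
    except KeyError:
        raise KeyError(
            f"operator {name!r} is not registered for device {device!r}") from None
    return cls()


# Same operator name, two registrations: one per device, as in the .cc files above.
@register_operator("CPU", "FC")
class FullyConnectedOpCPU:
    context = "CPUContext"


@register_operator("CUDA", "FC")
class FullyConnectedOpCUDA:
    context = "CUDAContext"


print(create_operator("FC", "CPU").context)   # CPUContext
print(create_operator("FC", "CUDA").context)  # CUDAContext
```

Asking for a name that was never registered for a device (for example "FCGradient" on "CUDA" in this toy) raises an error, which mirrors why each device needs its own registration line.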

The header context_gpu.h is required for the GPU implementation.

In fully_connected_op.cc, the FC operator's schema declares its arguments (.Arg), inputs (.Input), and outputs (.Output):

OPERATOR_SCHEMA(FC)
  .NumInputs(3)
  .NumOutputs(1)
  .SetDoc(R"DOC(
Computes the result of passing an input vector X into a fully connected layer with 2D weight matrix W and 1D bias vector b.

The layer computes Y = X * W + b, where X has size (M x K), W has size (K x N), b has size (N), and Y has size (M x N), where M is the batch size. Even though b  is 1D, it is resized to size (M x N) implicitly and added to each vector in the batch. These dimensions must be matched correctly, or else the operator will throw errors.
)DOC")
  .Arg("axis", "(int32_t) default to 1; describes the axis of the inputs; "
  "defaults to one because the 0th axis most likely describes the batch_size")
  .Input(0, "X", "2D input of size (MxK) data")
  .Input(1, "W", "2D blob of size (KxN) containing fully connected weight "
  "matrix")
  .Input(2, "b", "1D blob containing bias vector")
  .Output(0, "Y", "2D output tensor of size (MxN)");

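The operator computes the result of passing an input vector X through an FC layer with a 2D weight matrix W and a 1D bias vector b.

The FC layer computes $Y = XW + b$, where:

X: M x K
W: K x N
b: N (1D)
Y: M x N
M: batch size

Although b is 1D, it is implicitly broadcast to M x N and added to every vector in the batch.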
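To make the shapes and the bias broadcast concrete, here is a minimal pure-Python reference for Y = XW + b using the (K x N) weight layout from the docstring above; fc_reference is a hypothetical helper, not part of Caffe2. Note that the Operators Catalogue for later Caffe2 releases describes W as (N x K) and the computation as Y = XWᵀ + b, so check which layout your version uses.

```python
def fc_reference(X, W, b):
    """Compute Y = X * W + b for X (M x K), W (K x N), b (N); returns Y (M x N)."""
    M, K = len(X), len(X[0])
    if len(W) != K:
        raise ValueError("W must have K rows")
    N = len(W[0])
    if len(b) != N:
        raise ValueError("b must have N entries")
    Y = []
    for i in range(M):
        row = []
        for j in range(N):
            acc = sum(X[i][k] * W[k][j] for k in range(K))
            # Broadcast: the same b[j] is added for every batch row i.
            row.append(acc + b[j])
        Y.append(row)
    return Y


X = [[1, 2], [3, 4]]          # M = 2, K = 2
W = [[1, 0, 1], [0, 1, 1]]    # K = 2, N = 3
b = [10, 20, 30]              # N = 3
print(fc_reference(X, W, b))  # [[11, 22, 33], [13, 24, 37]]
```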

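The FC operator has 3 inputs and 1 output, declared with .NumInputs and .NumOutputs respectively.

.SetDoc - defines the operator's documentation, written as a C++ raw string literal: .SetDoc(R"DOC(docs go here)DOC").

.Arg - documents an optional argument; here, axis defaults to 1, because axis 0 usually describes the batch size.

.Input - describes one input of the operator, such as the FC weight matrix. The first argument is the input index, starting from 0; the second is the input name (X, W, or b here); the third is a description.

.Output - describes an output; its arguments are the same as for .Input.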
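Caffe2 checks an operator definition against its schema when the operator is created, so a mismatched number of inputs or outputs is rejected early. The toy builder below sketches the chaining style of OPERATOR_SCHEMA and the count check; ToyOpSchema and its method names are hypothetical and only mirror the C++ calls by analogy. It also shows the range form used below by FCGradient, NumOutputs(2, 3).

```python
class ToyOpSchema:
    """Hypothetical stand-in for OPERATOR_SCHEMA: every setter returns self,
    so calls chain the way .NumInputs(...).NumOutputs(...) do."""

    def __init__(self, name):
        self.name = name
        self.in_range = (0, 0)
        self.out_range = (0, 0)
        self.doc = ""
        self.args = {}
        self.inputs = {}
        self.outputs = {}

    def num_inputs(self, lo, hi=None):
        # One value means "exactly lo"; two values mean the range [lo, hi].
        self.in_range = (lo, lo if hi is None else hi)
        return self

    def num_outputs(self, lo, hi=None):
        self.out_range = (lo, lo if hi is None else hi)
        return self

    def set_doc(self, doc):
        self.doc = doc
        return self

    def arg(self, name, desc):
        self.args[name] = desc
        return self

    def input(self, idx, name, desc):
        self.inputs[idx] = (name, desc)
        return self

    def output(self, idx, name, desc):
        self.outputs[idx] = (name, desc)
        return self

    def verify(self, n_inputs, n_outputs):
        """True if an op with this many inputs and outputs satisfies the schema."""
        lo_in, hi_in = self.in_range
        lo_out, hi_out = self.out_range
        return lo_in <= n_inputs <= hi_in and lo_out <= n_outputs <= hi_out


fc_schema = (ToyOpSchema("FC")
             .num_inputs(3)
             .num_outputs(1)
             .set_doc("Y = X * W + b")
             .arg("axis", "(int32_t) default to 1")
             .input(0, "X", "2D input of size (MxK)")
             .input(1, "W", "2D weight matrix of size (KxN)")
             .input(2, "b", "1D bias vector of size (N)")
             .output(0, "Y", "2D output of size (MxN)"))
fc_grad_schema = ToyOpSchema("FCGradient").num_inputs(3).num_outputs(2, 3)

print(fc_schema.verify(3, 1))       # True
print(fc_grad_schema.verify(3, 3))  # True
print(fc_grad_schema.verify(3, 4))  # False
```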

FCGradient operator:

OPERATOR_SCHEMA(FCGradient).NumInputs(3).NumOutputs(2, 3);
class GetFCGradient : public GradientMakerBase {
  using GradientMakerBase::GradientMakerBase;
  vector<OperatorDef> GetGradientDefs() override {
    CHECK_EQ(def_.input_size(), 3);
    return SingleGradientDef(
        "FCGradient", "",
        vector<string>{I(0), I(1), GO(0)},
        vector<string>{GI(1), GI(2), GI(0)});
  }
};
REGISTER_GRADIENT(FC, GetFCGradient);
}  // namespace
}  // namespace caffe2

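The inputs and outputs of the FCGradient operator are specified by overriding GradientMakerBase::GetGradientDefs() in GetFCGradient. This states explicitly how the gradient operator's inputs and outputs relate to those of the forward operator: the first vector lists the gradient operator's inputs, the second its outputs. I(i) is the forward operator's i-th input, GO(i) the gradient of its i-th output, and GI(i) the gradient of its i-th input. So FCGradient takes X, W and the gradient of Y, and produces the gradients of W, b and X, in that order; since the schema allows 2 or 3 outputs (NumOutputs(2, 3)), the last one, the gradient of X, can be omitted. REGISTER_GRADIENT(FC, GetFCGradient) associates this gradient maker with FC.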
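For Y = XW + b with W of size (K x N) as in the schema docstring, the three gradients are dW = Xᵀ·dY, db = the sum of dY over the batch, and dX = dY·Wᵀ. The pure-Python sketch below computes them from the same inputs FCGradient receives ({I(0), I(1), GO(0)} = X, W, dY) and returns them in the output order {GI(1), GI(2), GI(0)}; fc_gradient is a hypothetical helper, not Caffe2's kernel, and b is not needed because it does not affect any gradient.

```python
def fc_gradient(X, W, dY):
    """Gradients of Y = X*W + b (X: MxK, W: KxN) given dY = dL/dY (MxN).

    Returns (dW, db, dX), matching the output order GI(1), GI(2), GI(0).
    """
    M, K, N = len(X), len(W), len(W[0])
    # dW = X^T * dY, size K x N
    dW = [[sum(X[i][k] * dY[i][j] for i in range(M)) for j in range(N)]
          for k in range(K)]
    # db = dY summed over the batch dimension, size N
    db = [sum(dY[i][j] for i in range(M)) for j in range(N)]
    # dX = dY * W^T, size M x K
    dX = [[sum(dY[i][j] * W[k][j] for j in range(N)) for k in range(K)]
          for i in range(M)]
    return dW, db, dX


X = [[1, 2], [3, 4]]
W = [[1, 0, 1], [0, 1, 1]]
dY = [[1, 0, 2], [0, 1, 0]]
dW, db, dX = fc_gradient(X, W, dY)
print(dW)  # [[1, 3, 2], [2, 4, 4]]
print(db)  # [1, 1, 2]
print(dX)  # [[3, 2], [0, 1]]
```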

1.2 .h - Operator Implementation

Usually the implementation details of an operator live in the .h header file; a CUDA implementation goes in the .cu file.

For FC, see fully_connected_op.h.
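One detail the FC implementation handles is the axis argument: as we understand Caffe2's FC, an N-D input X is viewed as a 2D matrix, with the dimensions before axis folded into M (the batch) and the remaining ones into K. A minimal sketch of that shape computation, assuming axis must lie in [0, ndim) after negative values are wrapped; fc_flatten_dims is a hypothetical helper.

```python
from math import prod


def fc_flatten_dims(shape, axis=1):
    """Return (M, K), the 2D view an FC-style op would use for an N-D input.

    Dimensions before `axis` are folded into M, the rest into K.
    """
    ndim = len(shape)
    if axis < 0:
        axis += ndim  # a negative axis counts from the end
    if not 0 <= axis < ndim:
        raise ValueError(f"axis out of range for shape {shape}")
    return prod(shape[:axis]), prod(shape[axis:])


print(fc_flatten_dims((32, 3, 4, 4)))          # (32, 48): default axis=1
print(fc_flatten_dims((32, 3, 4, 4), axis=2))  # (96, 16)
```

With the default axis=1, a batch of 32 images of size 3 x 4 x 4 becomes a 32 x 48 matrix, which is why axis defaults to 1.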

2. Unit Testing Caffe2 Operators

Unit tests help ensure that an operator is implemented correctly.

Caffe2 provides some helper libraries for testing custom operators:

Hypothesis - a property-based testing library.

caffe2/python/hypothesis_test_util.py - provides HypothesisTestCase.

Operator unit tests can be added to Caffe2's test directory: caffe2/caffe2/python/operator_tests/.

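The main functions are:

assertDeviceChecks(devices, op, inputs, outputs)

Checks that the operator produces the same outputs on every device in devices. See Example 1.

assertGradientChecks(device, op, inputs, output_, outputs_with_grads)

Checks the operator's gradient against a standard numerical (finite-difference) estimate. The fourth argument is the index of the input whose gradient is checked; outputs_with_grads lists the indices of the outputs that receive a gradient. See Example 1.

assertReferenceChecks(device, op, inputs, reference)

Runs the reference function by calling reference(*inputs) and compares its return values with the operator's outputs. hypothesis_test_util.py contains some usage examples. See Example 2.

hu.gcs

A dict of Hypothesis strategies, passed to @given as **hu.gcs, that supplies the gradient checker device (gc) and the device checker devices (dc).

hu.gcs_cpu_only

The same gc and dc, restricted to the CPU, for operators that only have a CPU implementation.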
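The idea behind assertGradientChecks is to compare an analytic gradient with a numerical one obtained by nudging each input element up and down. The pure-Python sketch below shows that idea with central differences on a flat list; numerical_gradient, check_gradient, the step size and the tolerance scheme are assumptions for illustration, not Caffe2's GradientChecker.

```python
def numerical_gradient(f, x, stepsize=1e-3):
    """Central-difference estimate of the gradient of scalar function f at x (a list)."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += stepsize
        minus[i] -= stepsize
        grad.append((f(plus) - f(minus)) / (2.0 * stepsize))
    return grad


def check_gradient(f, analytic_grad, x, stepsize=1e-3, threshold=1e-4):
    """True if analytic_grad(x) agrees with the numerical estimate at every element.

    The tolerance is relative for large values and absolute for small ones.
    """
    numeric = numerical_gradient(f, x, stepsize)
    analytic = analytic_grad(x)
    return all(abs(a - n) <= threshold * max(1.0, abs(n))
               for a, n in zip(analytic, numeric))


def sum_of_squares(x):
    return sum(v * v for v in x)


def sum_of_squares_grad(x):
    return [2.0 * v for v in x]  # correct analytic gradient


def wrong_grad(x):
    return [v for v in x]  # deliberately missing the factor 2


x = [1.0, -2.0, 0.5]
print(check_gradient(sum_of_squares, sum_of_squares_grad, x))  # True
print(check_gradient(sum_of_squares, wrong_grad, x))           # False
```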

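Example 1:

from hypothesis import given
from caffe2.python import core
import caffe2.python.hypothesis_test_util as hu

class TestAveragedLoss(hu.HypothesisTestCase):
    @given(X=hu.tensor(), **hu.gcs)
    def test_averaged_loss(self, X, gc, dc):
        op = core.CreateOperator("AveragedLoss", ["X"], ["loss"])
        self.assertDeviceChecks(dc, op, [X], [0])
        self.assertGradientChecks(gc, op, [X], 0, [0])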
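Example 1 only runs device and gradient checks. Assuming AveragedLoss outputs the mean of all elements of X, the plain-Python sketch below shows what a reference for it, and the gradient that the gradient check verifies, would look like; averaged_loss_ref and averaged_loss_grad_ref are hypothetical names, and X is taken as a flat list for simplicity.

```python
def averaged_loss_ref(X):
    """Mean of all elements of X (a flat list of floats here).

    Returned as a 1-tuple, the convention the reference functions in
    Examples 2 and 3 use with assertReferenceChecks.
    """
    return (sum(X) / len(X),)


def averaged_loss_grad_ref(X, d_loss=1.0):
    """Each element contributes 1/len(X) to the mean, so dX_i = d_loss / len(X)."""
    return [d_loss / len(X)] * len(X)


X = [1.0, 2.0, 3.0, 6.0]
print(averaged_loss_ref(X))       # (3.0,)
print(averaged_loss_grad_ref(X))  # [0.25, 0.25, 0.25, 0.25]
```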

Example 2:

import numpy as np
from hypothesis import given
import hypothesis.strategies as st
from caffe2.python import core
import caffe2.python.hypothesis_test_util as hu


class TestAdam(hu.HypothesisTestCase):
    @given(inputs=hu.tensors(n=3),
           in_place=st.booleans(),
           beta1=st.floats(min_value=0.1, max_value=0.9),
           beta2=st.floats(min_value=0.1, max_value=0.9),
           lr=st.floats(min_value=0.1, max_value=0.9),
           iters=st.integers(min_value=1, max_value=10000),
           epsilon=st.floats(min_value=1e-5, max_value=1e-2),
           **hu.gcs)
    def test_adam(self, inputs, in_place, beta1, beta2, lr, iters, epsilon,
                  gc, dc):
        grad, m1, m2 = inputs
        # Make the second moment positive so m2_o stays positive under sqrt.
        m2 += np.abs(m2) + 0.01
        lr = np.asarray([lr], dtype=np.float32)
        iters = np.asarray([iters], dtype=np.int32)
        op = core.CreateOperator(
            "Adam",
            ["grad", "m1", "m2", "lr", "iters"],
            ["grad" if in_place else "grad_o",
             "m1" if in_place else "m1_o",
             "m2" if in_place else "m2_o"],
            beta1=beta1, beta2=beta2, epsilon=epsilon,
            device_option=gc)
        # Keep lr and iters on the CPU regardless of the device under test.
        input_device_options = {"lr": hu.cpu_do, "iters": hu.cpu_do}
        self.assertDeviceChecks(
            dc, op, [grad, m1, m2, lr, iters], [0], input_device_options)

        # Reference implementation in NumPy
        def adam(grad, m1, m2, lr, iters):
            lr = lr[0]
            iters = iters[0]
            t = iters + 1
            corrected_local_rate = lr * np.sqrt(1. - np.power(beta2, t)) / \
                (1. - np.power(beta1, t))

            m1_o = (beta1 * m1) + (1. - beta1) * grad
            m2_o = (beta2 * m2) + (1. - beta2) * np.square(grad)
            grad_o = corrected_local_rate * m1_o / \
                (np.sqrt(m2_o) + epsilon)
            return (grad_o, m1_o, m2_o)

        self.assertReferenceChecks(gc, op, [grad, m1, m2, lr, iters],
                                   adam, input_device_options)
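To see what the adam reference computes, here is the same update rewritten for a single scalar with the math module; adam_step and its default hyperparameters are assumptions for illustration. With zero-initialized moments and iters=0 (so t=1), the bias correction makes the first output approximately lr times the sign of grad, regardless of the gradient's magnitude.

```python
import math


def adam_step(grad, m1, m2, lr, iters, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Scalar version of the `adam` reference: returns (grad_o, m1_o, m2_o)."""
    t = iters + 1  # step count used for bias correction
    corrected_local_rate = lr * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    m1_o = beta1 * m1 + (1.0 - beta1) * grad         # first-moment moving average
    m2_o = beta2 * m2 + (1.0 - beta2) * grad * grad  # second-moment moving average
    grad_o = corrected_local_rate * m1_o / (math.sqrt(m2_o) + epsilon)
    return grad_o, m1_o, m2_o


g, m1, m2 = adam_step(grad=1.0, m1=0.0, m2=0.0, lr=0.01, iters=0)
print(round(g, 6))  # 0.01
```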

Example 3:

import numpy as np
from hypothesis import given
import hypothesis.strategies as st
from caffe2.python import core
import caffe2.python.hypothesis_test_util as hu


class TestAccuracy(hu.HypothesisTestCase):
    @given(prediction=hu.arrays(dims=[10, 3],
                                elements=st.floats(allow_nan=False,
                                                   allow_infinity=False,
                                                   min_value=0,
                                                   max_value=1)),
           labels=hu.arrays(dims=[10],
                            dtype=np.int32,
                            elements=st.integers(min_value=0, max_value=3 - 1)),
           **hu.gcs)
    def test_accuracy(self, prediction, labels, gc, dc):
        op = core.CreateOperator("Accuracy",
                                 ["prediction", "labels"],
                                 ["accuracy"])

        def op_ref(prediction, labels):
            N = prediction.shape[0]
            correct = 0
            max_ids = np.argmax(prediction, axis=1)
            for i in range(0, N):
                if max_ids[i] == labels[i]:
                    correct += 1
            # float() keeps this a true division under Python 2 as well
            accuracy = float(correct) / N
            return (accuracy,)

        self.assertReferenceChecks(device_option=gc,
                                   op=op,
                                   inputs=[prediction, labels],
                                   reference=op_ref)
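The op_ref function above can also be written without NumPy, which makes the tie-breaking rule explicit: like np.argmax, Python's max returns the first index among equal scores. A minimal sketch; accuracy_ref is a hypothetical name, and prediction is a list of rows.

```python
def accuracy_ref(prediction, labels):
    """Fraction of rows whose argmax (first index on ties) equals the label."""
    if len(prediction) != len(labels):
        raise ValueError("prediction and labels must have the same number of rows")
    correct = 0
    for row, label in zip(prediction, labels):
        top = max(range(len(row)), key=row.__getitem__)  # index of the largest score
        if top == label:
            correct += 1
    return (correct / len(labels),)  # a 1-tuple, like op_ref


prediction = [[0.1, 0.7, 0.2],
              [0.5, 0.5, 0.0],   # tie: index 0 wins, as with np.argmax
              [0.2, 0.3, 0.5],
              [0.9, 0.05, 0.05]]
labels = [1, 0, 0, 0]
print(accuracy_ref(prediction, labels))  # (0.75,)
```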