Caffe2 - Custom Operators
Caffe2 ships with many operators; see the Operators Catalogue.
If your application needs an operator that does not exist yet, you can implement one as described below.
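1. Writing a Basic Operator
Typically, each operator consists of two files:
- .cc - registers the operator
- .h - contains the actual implementation
There are exceptions:
- some operators are implemented directly in the .cc file;
- some operators have a GPU/CUDA implementation, which lives in a .cu file.
If the CUDA implementation contains CUDA kernels, put it in a .cu file so that it is compiled with NVCC.
If the implementation only calls existing CUDA libraries, the file can be named _gpu.cc instead, which saves compile time.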
1.1 .cc - Operator Registration
Take the .cc file of the FC operator, fully_connected_op.cc, as an example.
#include "caffe2/operators/fully_connected_op.h"
namespace caffe2 {
namespace {
REGISTER_CPU_OPERATOR(FC, FullyConnectedOp<float, CPUContext>);
REGISTER_CPU_OPERATOR(FCGradient, FullyConnectedGradientOp<float, CPUContext>);
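The first step registers the operator name together with its gradient operator. From Python, FullyConnectedOp is used under the registered name FC. The template arguments float and CPUContext give the input data type and the context, respectively. Choose CPUContext or CUDAContext depending on whether the operator runs on the CPU or the GPU.
The GPU implementation of Fully Connected, fully_connected_op_gpu.cc: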
#include "caffe2/core/context_gpu.h"
#include "caffe2/operators/fully_connected_op.h"
namespace caffe2 {
namespace {
REGISTER_CUDA_OPERATOR(FC, FullyConnectedOp<float, CUDAContext>);
REGISTER_CUDA_OPERATOR(FCGradient, FullyConnectedGradientOp<float, CUDAContext>);
} // namespace
} // namespace caffe2
The GPU registration differs from the CPU one only in that it uses REGISTER_CUDA_OPERATOR and CUDAContext instead of REGISTER_CPU_OPERATOR and CPUContext.
The header context_gpu.h is required for the GPU implementation.
The FC operator schema declares the operator's arguments, inputs and outputs through .Arg, .Input and .Output:
OPERATOR_SCHEMA(FC)
.NumInputs(3)
.NumOutputs(1)
.SetDoc(R"DOC(
Computes the result of passing an input vector X into a fully connected layer with 2D weight matrix W and 1D bias vector b.
The layer computes Y = X * W + b, where X has size (M x K), W has size (K x N), b has size (N), and Y has size (M x N), where M is the batch size. Even though b is 1D, it is resized to size (M x N) implicitly and added to each vector in the batch. These dimensions must be matched correctly, or else the operator will throw errors.
)DOC")
.Arg("axis", "(int32_t) default to 1; describes the axis of the inputs; "
"defaults to one because the 0th axis most likely describes the batch_size")
.Input(0, "X", "2D input of size (MxK) data")
.Input(1, "W", "2D blob of size (KxN) containing fully connected weight "
"matrix")
.Input(2, "b", "1D blob containing bias vector")
.Output(0, "Y", "1D output tensor");
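The operator computes the result of passing an input X through a fully connected layer with a 2D weight matrix W and a 1D bias vector b.
The FC layer computes Y = X * W + b, where:
X - M x K
W - K x N
b - N (1D)
Y - M x N
M - batch size
Although b is 1D, it is implicitly expanded to M x N and added to every vector in the batch.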
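The shape rules above can be sketched in plain Python. This is only an illustration of Y = X * W + b with the bias added to every row; the function name fc_forward and the nested-list representation are assumptions made for this example and have nothing to do with how Caffe2 implements the operator.

```python
def fc_forward(X, W, b):
    """Compute Y = X * W + b for X (M x K), W (K x N) and b (N,).

    Hypothetical helper for illustration only; matrices are nested lists.
    """
    M, K, N = len(X), len(W), len(b)
    if any(len(row) != K for row in X):
        raise ValueError("each row of X must have K entries")
    if any(len(row) != N for row in W):
        raise ValueError("each row of W must have N entries")
    # The same bias vector b is added to every row of the batch, which is
    # the implicit (N) -> (M x N) expansion described in the text.
    return [
        [sum(X[m][k] * W[k][n] for k in range(K)) + b[n] for n in range(N)]
        for m in range(M)
    ]


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # M = 2, K = 3
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # K = 3, N = 2
b = [0.5, -0.5]                           # N = 2
Y = fc_forward(X, W, b)                   # M x N = 2 x 2
print(Y)
```

Mismatched dimensions raise an error here, which mirrors the schema's remark that the dimensions must match or the operator will throw.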
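The FC operator has 3 inputs and 1 output, as declared by .NumInputs and .NumOutputs.
.SetDoc - sets the operator's documentation, in the form .SetDoc(R"DOC(docs go here)DOC").
.Arg - declares an optional argument; here it is axis, which defaults to 1.
.Input - declares an input of the operator, such as the FC layer's weight matrix. The first argument is the index of the input, starting from 0; the second is the variable name (X, W or b here); the third is a description.
.Output - declares an output. Its arguments are analogous to those of .Input.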
The FCGradient operator:
OPERATOR_SCHEMA(FCGradient).NumInputs(3).NumOutputs(2, 3);
class GetFCGradient : public GradientMakerBase {
  using GradientMakerBase::GradientMakerBase;
  vector<OperatorDef> GetGradientDefs() override {
    CHECK_EQ(def_.input_size(), 3);
    return SingleGradientDef(
        "FCGradient", "",
        vector<string>{I(0), I(1), GO(0)},
        vector<string>{GI(1), GI(2), GI(0)});
  }
};
REGISTER_GRADIENT(FC, GetFCGradient);
} // namespace
} // namespace caffe2
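The inputs and outputs of the FCGradient operator are specified by overriding GradientMakerBase::GetGradientDefs(). This makes the relationship between the gradient operator's inputs and outputs and those of the forward operator explicit. The first vector lists the inputs of the gradient operator; the second lists its outputs. I(i) denotes the i-th input of the forward operator, GO(i) the gradient of its i-th output, and GI(i) the gradient of its i-th input. Here the gradient operator takes X, W and the gradient of Y, and produces the gradients of W, b and X, in that order.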
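To make the input/output mapping concrete, here is a rough plain-Python sketch of what a gradient operator with inputs {X, W, dY} and outputs {dW, db, dX} has to compute. The formulas are derived from the Y = X * W + b convention in the schema text above, not taken from the Caffe2 source; the name fc_gradient and the nested-list representation are assumptions for this example.

```python
def fc_gradient(X, W, dY):
    """Return (dW, db, dX) for Y = X * W + b.

    Hypothetical helper. X is M x K, W is K x N, dY is M x N.
    Output order follows the second vector above: GI(1), GI(2), GI(0).
    """
    M, K, N = len(X), len(W), len(W[0])
    # dW = X^T * dY  (K x N)
    dW = [[sum(X[m][k] * dY[m][n] for m in range(M)) for n in range(N)]
          for k in range(K)]
    # db = column sums of dY  (N), because b is added to every batch row
    db = [sum(dY[m][n] for m in range(M)) for n in range(N)]
    # dX = dY * W^T  (M x K)
    dX = [[sum(dY[m][n] * W[k][n] for n in range(N)) for k in range(K)]
          for m in range(M)]
    return dW, db, dX


X = [[1.0, 2.0], [3.0, 4.0]]   # M = 2, K = 2
W = [[1.0], [2.0]]             # K = 2, N = 1
dY = [[1.0], [1.0]]            # gradient of the loss w.r.t. Y, M x N
dW, db, dX = fc_gradient(X, W, dY)
print(dW, db, dX)
```

Each gradient has the same shape as the tensor it belongs to: dW matches W, db matches b, and dX matches X.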
1.2 .h - Operator Implementation
In general, the implementation details of an operator live in its .h header file. A CUDA implementation lives in a .cu file.
2. Unit Testing Caffe2 Operators
Unit tests make sure that an operator is implemented correctly.
Caffe2 relies on a few helper libraries for testing custom operators:
Hypothesis - a property-based testing library.
caffe2/python/hypothesis_test_util.py - provides HypothesisTestCase.
Unit tests for operators can be added to the test directory that ships with Caffe2: caffe2/caffe2/python/operator_tests/.
The main helper functions are:
assertDeviceChecks(devices, op, inputs, outputs)
Asserts that the operator produces the same outputs regardless of the device it runs on.
See [Example 1].
assertGradientChecks(device, op, inputs, output_, outputs_with_grads)
Checks the operator's gradient against a standard numerical gradient computation.
See [Example 1].
assertReferenceChecks(device, op, inputs, reference)
Runs the reference function as reference(*inputs) and compares its result with the operator's output. hypothesis_test_util.py contains several usage examples.
See [Example 2].
hu.gcs
Supplies the gradient checker device (gc) and the device checker devices (dc).
hu.gcs_cpu_only
Supplies gc and dc for operators that only have a CPU implementation.
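The idea behind a numerical gradient check can be sketched without Caffe2: estimate each partial derivative with central differences and compare it with the analytic gradient. The code below is a simplified illustration under that assumption, not the implementation used by assertGradientChecks; all function names are hypothetical, and averaged_loss is just a stand-in that takes the mean of its inputs.

```python
def numeric_grad(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of a scalar function f."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def check_gradient(f, analytic_grad, x, tol=1e-4):
    """Return True if the analytic gradient agrees with the numeric estimate."""
    numeric = numeric_grad(f, x)
    analytic = analytic_grad(x)
    return all(abs(a - n) <= tol * max(1.0, abs(a), abs(n))
               for a, n in zip(analytic, numeric))


def averaged_loss(x):
    """Stand-in forward function: the mean of the inputs."""
    return sum(x) / len(x)


def averaged_loss_grad(x):
    """Correct analytic gradient of the mean: 1 / len(x) for every entry."""
    return [1.0 / len(x)] * len(x)


def wrong_grad(x):
    """Deliberately wrong gradient, to show that the check can fail."""
    return [1.0] * len(x)


x = [0.3, -1.2, 2.5, 0.0]
ok = check_gradient(averaged_loss, averaged_loss_grad, x)
bad = check_gradient(averaged_loss, wrong_grad, x)
print(ok, bad)
```

A relative tolerance is used so that the check behaves sensibly for both small and large gradient values.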
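Example 1 (like the other examples, this is a method of a hu.HypothesisTestCase subclass; hu is caffe2.python.hypothesis_test_util, core comes from caffe2.python, given comes from hypothesis, st is hypothesis.strategies and np is numpy):
@given(X=hu.tensor(), **hu.gcs)
def test_averaged_loss(self, X, gc, dc):
    op = core.CreateOperator("AveragedLoss", ["X"], ["loss"])
    # Output 0 must be identical on all devices in dc
    self.assertDeviceChecks(dc, op, [X], [0])
    # Check the gradient of output 0 with respect to input 0
    self.assertGradientChecks(gc, op, [X], 0, [0])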
Example 2:
@given(inputs=hu.tensors(n=3),
       in_place=st.booleans(),
       beta1=st.floats(min_value=0.1, max_value=0.9),
       beta2=st.floats(min_value=0.1, max_value=0.9),
       lr=st.floats(min_value=0.1, max_value=0.9),
       iters=st.integers(min_value=1, max_value=10000),
       epsilon=st.floats(min_value=1e-5, max_value=1e-2),
       **hu.gcs)
def test_adam(self, inputs, in_place, beta1, beta2, lr, iters, epsilon,
              gc, dc):
    grad, m1, m2 = inputs
    m2 += np.abs(m2) + 0.01
    lr = np.asarray([lr], dtype=np.float32)
    iters = np.asarray([iters], dtype=np.int32)

    op = core.CreateOperator(
        "Adam",
        ["grad", "m1", "m2", "lr", "iters"],
        ["grad" if in_place else "grad_o",
         "m1" if in_place else "m1_o",
         "m2" if in_place else "m2_o"],
        beta1=beta1, beta2=beta2, epsilon=epsilon,
        device_option=gc)
    input_device_options = {"lr": hu.cpu_do, "iters": hu.cpu_do}
    self.assertDeviceChecks(
        dc, op, [grad, m1, m2, lr, iters], [0], input_device_options)

    # Reference
    def adam(grad, m1, m2, lr, iters):
        lr = lr[0]
        iters = iters[0]
        t = iters + 1
        corrected_local_rate = lr * np.sqrt(1. - np.power(beta2, t)) / \
            (1. - np.power(beta1, t))
        m1_o = (beta1 * m1) + (1. - beta1) * grad
        m2_o = (beta2 * m2) + (1. - beta2) * np.square(grad)
        grad_o = corrected_local_rate * m1_o / \
            (np.sqrt(m2_o) + epsilon)
        return (grad_o, m1_o, m2_o)

    self.assertReferenceChecks(gc, op, [grad, m1, m2, lr, iters],
                               adam, input_device_options)
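The reference function in Example 2 can also be tried on plain scalars, which makes the role of the bias-corrected learning rate easier to see. The sketch below restates the same formulas with the standard math module; adam_step and its default values are assumptions for this illustration, not part of Caffe2.

```python
import math


def adam_step(grad, m1, m2, lr, iters, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Scalar version of the reference formulas in Example 2 (hypothetical helper)."""
    t = iters + 1
    # Bias-corrected step size
    corrected_local_rate = lr * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    # Updated first and second moment estimates
    m1_o = beta1 * m1 + (1.0 - beta1) * grad
    m2_o = beta2 * m2 + (1.0 - beta2) * grad * grad
    grad_o = corrected_local_rate * m1_o / (math.sqrt(m2_o) + epsilon)
    return grad_o, m1_o, m2_o


# First step with zero-initialised moments, for a small and a large gradient.
small = adam_step(grad=1.0, m1=0.0, m2=0.0, lr=0.1, iters=0,
                  beta1=0.5, beta2=0.75)
large = adam_step(grad=100.0, m1=0.0, m2=0.0, lr=0.1, iters=0,
                  beta1=0.5, beta2=0.75)
print(small[0], large[0])
```

With zero-initialised moments and a negligible epsilon, the first update comes out close to lr for both gradients, since the bias correction cancels the (1 - beta) factors in the moment estimates.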
Example 3:
@given(prediction=hu.arrays(dims=[10, 3],
                            elements=st.floats(allow_nan=False,
                                               allow_infinity=False,
                                               min_value=0,
                                               max_value=1)),
       labels=hu.arrays(dims=[10],
                        dtype=np.int32,
                        elements=st.integers(min_value=0, max_value=3 - 1)),
       **hu.gcs)
def test_accuracy(self, prediction, labels, gc, dc):
    op = core.CreateOperator("Accuracy",
                             ["prediction", "labels"],
                             ["accuracy"])

    # Reference: fraction of rows whose arg-max equals the label
    def op_ref(prediction, labels):
        N = prediction.shape[0]
        correct = 0
        max_ids = np.argmax(prediction, axis=1)
        for i in range(0, N):
            if max_ids[i] == labels[i]:
                correct += 1
        # True division; under Python 2 this needs
        # "from __future__ import division" at the top of the file
        accuracy = correct / N
        return (accuracy,)

    self.assertReferenceChecks(device_option=gc,
                               op=op,
                               inputs=[prediction, labels],
                               reference=op_ref)
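A reference function such as op_ref is ordinary Python, so it can be sanity-checked on a hand-made input before it is used to judge an operator. Below is a dependency-free restatement of the same accuracy computation; accuracy_ref and the sample data are assumptions for this illustration.

```python
def accuracy_ref(prediction, labels):
    """Fraction of rows whose highest-scoring class equals the label.

    Hypothetical stand-alone version of op_ref that works on nested lists.
    """
    correct = 0
    for row, label in zip(prediction, labels):
        # Index of the largest score; ties resolve to the first maximum
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


prediction = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.5, 0.5, 0.0],  # tie between 0 and 1, resolved to class 0
]
labels = [1, 0, 1, 0]
accuracy = accuracy_ref(prediction, labels)
print(accuracy)
```

Three of the four rows match their label here, so the function should report 0.75; an empty labels list would divide by zero and is not handled in this sketch.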