NVIDIA has been slow to update TensorRT on edge devices, so there was no choice but to register the op myself.
I read several experts' blog posts ("TensorRT5.1.5.0 实践 onnx-TensorRT的自定义op") and, in the walkthrough of builtin_op_importers.cpp, one point puzzled me: why do input[0], input[1] and input[2] stand for a Conv node's input tensor, weight and bias respectively? It became clear after looking at the ONNX graph:
graph(%0 : Float(1!, 3!, 112, 112!)
      %1 : Float(64, 3, 3, 3)
      %2 : Float(64)
      %3 : Float(64)
{
  %147 : Float(1, 64, 56, 56) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[2, 2]](%0, %1, %2), scope: Resnet20/Conv2d[conv1_1]
As shown above, the "[]" after each op name holds the layer's structural attributes, while the "()" holds "%n" references: first the previous layer's output tensor, then the op's parameters ("parameter 1", "parameter 2", ...).
For Conv, parameters 1 and 2 (%1 and %2) are therefore the weight and the bias.
So for PReLU, its weight (the per-channel slope) is naturally input[1].
I later found a workaround; for those details, refer to the other blogger's post. But that approach requires modifying the network structure, which is cumbersome, so below I describe an alternative that leaves the network untouched.
Another workaround
This has been tested and works, but it is not a good fit for larger networks. Modify builtin_op_importers.cpp:
DEFINE_BUILTIN_OP_IMPORTER(PRelu) {
  ASSERT(inputs.at(0).is_tensor(),  ErrorCode::kUNSUPPORTED_NODE);
  ASSERT(inputs.at(1).is_weights(), ErrorCode::kUNSUPPORTED_NODE);
  ShapedWeights weights = inputs.at(1).weights();
  ASSERT(weights.type == ::ONNX_NAMESPACE::TensorProto::FLOAT,
         ErrorCode::kUNSUPPORTED_NODE);
  // The slope tensor (inputs[1]) must hold exactly one alpha per channel.
  int nchan = weights.shape.d[0];
  nvinfer1::Dims scalar_shape{1, {nchan}};
  ASSERT(weights.shape == scalar_shape, ErrorCode::kUNSUPPORTED_NODE);
  size_t nweight = nchan;
  std::vector<float> alpha;
  for (size_t i = 0; i < nweight; i++) {
    alpha.push_back((static_cast<float const*>(weights.values))[i]);
  }
  // Split the input along the channel axis into nchan slices of width 1.
  std::vector<int> output_lengths(nweight, 1);
  int noutput = nweight;
  nvinfer1::IPluginV2Layer* split_layer =
      ctx->addPluginV2(new SplitPlugin(0, output_lengths),
                       {&convertToTensor(inputs.at(0), ctx)});
  ASSERT(split_layer, ErrorCode::kUNSUPPORTED_NODE);
  ASSERT(split_layer->getNbOutputs() == noutput, ErrorCode::kINTERNAL_ERROR);
  std::vector<TensorOrWeights> outputs;
  for (int i = 0; i < noutput; ++i) {
    outputs.push_back(split_layer->getOutput(i));
  }
  // Apply LeakyReLU with that channel's alpha to each slice.
  std::vector<nvinfer1::ITensor*> after_leaky_relu(nweight);
  for (size_t i = 0; i < nweight; i++) {
    nvinfer1::IPluginV2Layer* leaky_layer =
        ctx->addPluginV2(new FancyActivationPlugin(FancyActivationPlugin::LEAKY_RELU, alpha[i]),
                         {&convertToTensor(outputs[i], ctx)});
    after_leaky_relu[i] = leaky_layer->getOutput(0);
  }
  // Concatenate the slices back together along the channel axis.
  auto* concat_layer = ctx->network()->addConcatenation(after_leaky_relu.data(),
                                                        after_leaky_relu.size());
  ASSERT(concat_layer, ErrorCode::kUNSUPPORTED_NODE);
  concat_layer->setAxis(0);
  RETURN_FIRST_OUTPUT(concat_layer);
}
The real, final solution: registering PReLU in TensorRT
DEFINE_BUILTIN_OP_IMPORTER(PRelu) {
  ASSERT(inputs.at(0).is_tensor(),  ErrorCode::kUNSUPPORTED_NODE);
  ASSERT(inputs.at(1).is_weights(), ErrorCode::kUNSUPPORTED_NODE);
  // Sanity-check that the slope tensor holds one alpha per channel.
  nvinfer1::ITensor& tensor = convertToTensor(inputs.at(0), ctx);
  nvinfer1::Dims dims = tensor.getDimensions();
  int nchan = dims.d[0];
  nvinfer1::Dims scalar_shape{1, {nchan}};
  ShapedWeights weights = inputs.at(1).weights();
  ASSERT(weights.shape == scalar_shape, ErrorCode::kUNSUPPORTED_NODE);
  // Hand the raw slope weights straight to a custom PReLU plugin.
  RETURN_FIRST_OUTPUT(ctx->addPluginV2(new PReluPlugin(weights), {&tensor}));
}
This solution requires writing preluplugin.cu yourself; since it is part of a company project, I cannot release the source code.