For work I need to run neural-network inference with Armnn. I had previously trained my own SSD model with the TensorFlow 2 detection API, converted it to tflite, and ran inference with TensorFlow Lite with decent results, but deploying it on Armnn failed with the error:
what(): MemCopyQueueDescriptor: input->10 & output->40 must have the same number of elements.
Inference with tflite itself works fine, so the model is not the problem; it has to be something about Armnn's support. MobileNet-SSD has no exotic operators, and comparing against a model that used to run fine, the likely suspect is the
TFLite_Detection_PostProcess operator. Yet Armnn's list of supported operators plainly includes it. I found a related issue in the Armnn repository and posted my problem there, but the Armnn team has plenty on its plate, so I had to dig in myself. While chasing the bug I walked through how Armnn runs a model, and this post records that walkthrough.
1. Armnn usage example
Clone the Armnn source; it contains a /samples/ObjectDetection folder with an object-detection sample program that can run SSD and YOLO models. There is also a /tests/ExecuteNetwork folder whose test programs likewise show how to run inference with Armnn. The flow boils down to a few steps.
1) Parse the model file with parser_ to produce a network
//the current Armnn release only ships parsers for these two formats
armnnTfLiteParser::ITfLiteParserPtr parser_
armnnOnnxParser::IOnnxParserPtr
parser_ reads the model file and parses it into an Armnn network:
parser_ = armnnOnnxParser::IOnnxParser::Create();
network_ = parser_->CreateNetworkFromBinaryFile(model_path);
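The snippet above happens to show the ONNX variant; for a .tflite model the TfLite parser is used the same way. A minimal sketch, with the model path as a placeholder:
#include <armnnTfLiteParser/ITfLiteParser.hpp>

// parse a .tflite file into an Armnn network (the path is a placeholder)
armnnTfLiteParser::ITfLiteParserPtr parser_ = armnnTfLiteParser::ITfLiteParser::Create();
armnn::INetworkPtr network_ = parser_->CreateNetworkFromBinaryFile("ssd_mobilenet.tflite");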
2) Create runtime_ and load the optimized network
This is the most central structure in Armnn, and it involves three files:
IRuntime.hpp
Runtime.cpp
Runtime.hpp
First, create a runtime_ from a set of options:
armnn::IRuntime::CreationOptions runtimeOptions;
runtime_ = armnn::IRuntime::Create(runtimeOptions);
Next, the network produced in step 1 is optimized; the optimization needs some of runtime_'s configuration:
armnn::IOptimizedNetworkPtr opt_net = armnn::Optimize(*network_, preferred_backends, runtime_->GetDeviceSpec(), optimizer_options);
Of course, a few optimizer options need to be set before optimizing:
armnn::OptimizerOptions optimizer_options;
optimizer_options.m_ReduceFp32ToFp16 = false;
std::vector<armnn::BackendId> preferred_backends;
preferred_backends.push_back(armnn::Compute::GpuAcc);
armnn::BackendOptions gpuAcc("GpuAcc",
{
{ "FastMathEnabled", true },
{ "TuningLevel", 2},
});
optimizer_options.m_ModelOptions.push_back(gpuAcc);
preferred_backends.push_back(armnn::Compute::CpuAcc);
armnn::BackendOptions cpuAcc("CpuAcc",
{
{ "FastMathEnabled", true },
{ "NumberOfThreads", num_threads },
});
optimizer_options.m_ModelOptions.push_back(cpuAcc);
preferred_backends.push_back(armnn::Compute::CpuRef);
Finally, load the network into runtime_:
runtime_->LoadNetwork(networkIdentifier_, std::move(opt_net));
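LoadNetwork returns an armnn::Status, and there is an overload that also reports an error message, which is handy when debugging problems like the one in this post. A minimal sketch:
// load the optimized network and report any error message (sketch)
std::string errorMessage;
armnn::Status status = runtime_->LoadNetwork(networkIdentifier_, std::move(opt_net), errorMessage);
if (status != armnn::Status::Success)
{
    ARMNN_LOG(error) << "LoadNetwork failed: " << errorMessage;
}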
3) Configure the input/output tensor info and enqueue the workload
This step ultimately calls EnqueueWorkload, which runs the inference, but first the input and output tensors need their TensorInfo configured: dimensions, memory regions and so on. (This is exactly where my problem sits, specifically in parser_'s GetNetworkOutputBindingInfo() function.) A caller-side sketch of this step follows the line below.
runtime_->EnqueueWorkload(networkIdentifier_, list_armnntensor_in_, list_armnntensor_out_);
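A minimal sketch of how the two tensor lists passed to EnqueueWorkload above can be built, assuming a float model and binding just one output for brevity; the tensor names are placeholders that must match your own .tflite model:
// look up binding id + TensorInfo by tensor name (names are placeholders)
armnnTfLiteParser::BindingPointInfo inputBinding =
    parser_->GetNetworkInputBindingInfo(0, "normalized_input_image_tensor");
armnnTfLiteParser::BindingPointInfo outputBinding =
    parser_->GetNetworkOutputBindingInfo(0, "TFLite_Detection_PostProcess");

std::vector<float> inputData(inputBinding.second.GetNumElements());   // filled with the preprocessed image
std::vector<float> outputData(outputBinding.second.GetNumElements()); // receives one of the four outputs

armnn::TensorInfo inputInfo = inputBinding.second;
inputInfo.SetConstant(true); // newer Armnn releases require a const TensorInfo for ConstTensor

armnn::InputTensors  list_armnntensor_in_  {{ inputBinding.first,  armnn::ConstTensor(inputInfo, inputData.data()) }};
armnn::OutputTensors list_armnntensor_out_ {{ outputBinding.first, armnn::Tensor(outputBinding.second, outputData.data()) }};
In a real SSD deployment all four outputs of TFLite_Detection_PostProcess would be bound and added to list_armnntensor_out_.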
2. Problem investigation and source walkthrough
Bug hunting usually works backwards from the error; here I will instead go through the code in forward order and lay out where the problem is.
1) Parsing the tflite file
First, make sure TfLiteParser parses the file without errors. Three files are involved:
ITfLiteParser.hpp
TfLiteParser.hpp
TfLiteParser.cpp
TfLiteParser is only the interface; TfLiteParserImpl is its implementation, and it is TfLiteParserImpl's functions that are ultimately called.
INetworkPtr TfLiteParserImpl::CreateNetworkFromBinaryFile(const char* graphFile)
{
ResetParser();
m_Model = LoadModelFromFile(graphFile);// using ModelPtr = std::unique_ptr<tflite::ModelT>;
return CreateNetworkFromModel();
}
The functions it calls:
// load the binary file
TfLiteParserImpl::ModelPtr TfLiteParserImpl::LoadModelFromFile(const char * fileName)
{
if (fileName == nullptr)
{
throw InvalidArgumentException(fmt::format("Invalid (null) file name {}",
CHECK_LOCATION().AsString()));
}
std::error_code errorCode;
fs::path pathToFile(fileName);
if (!fs::exists(pathToFile, errorCode))
{
//fmt::format() could not be used here (format error)
std::stringstream msg;
msg << "Cannot find the file (" << fileName << ") errorCode: " << errorCode
<< " " << CHECK_LOCATION().AsString();
throw FileNotFoundException(msg.str());
}
std::ifstream file(fileName, std::ios::binary);
std::string fileContent((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
return LoadModelFromBinary(reinterpret_cast<const uint8_t *>(fileContent.c_str()),
fileContent.size());
}
// parse the model out of the binary buffer
TfLiteParserImpl::ModelPtr TfLiteParserImpl::LoadModelFromBinary(const uint8_t * binaryContent, size_t len)
{
if (binaryContent == nullptr)
{
throw InvalidArgumentException(fmt::format("Invalid (null) binary content {}",
CHECK_LOCATION().AsString()));
}
flatbuffers::Verifier verifier(binaryContent, len);
if (verifier.VerifyBuffer<tflite::Model>() == false)
{
throw ParseException(
fmt::format("Buffer doesn't conform to the expected Tensorflow Lite "
"flatbuffers format. size:{} {}",
len,
CHECK_LOCATION().AsString()));
}
// convert the binary buffer into a tflite model, using a function from the tflite library
return tflite::UnPackModel(binaryContent);
}
The code above turns the binary file into the tflite-format model m_Model; the next step converts m_Model into the Armnn network m_Network:
INetworkPtr TfLiteParserImpl::CreateNetworkFromModel()
{
using NetworkOptions = std::vector<BackendOptions>;
NetworkOptions networkOptions = {};
if (m_Options && m_Options.value().m_InferAndValidate)
{
// shape inference/validation method
BackendOptions shapeInferenceMethodOption("ShapeInferenceMethod",
{
{ "InferAndValidate", true }
});
networkOptions.push_back(shapeInferenceMethodOption);
}
// create an empty network
m_Network = INetwork::Create(networkOptions);//only the shape validation method is set here
ARMNN_ASSERT(m_Model.get() != nullptr);
if (m_Model->subgraphs.size() != 1)
{
throw ParseException(
fmt::format("Current TfLite parser only supports 1 subgraph. Current one has: {} {}",
m_Model->subgraphs.size(),
CHECK_LOCATION().AsString()));
}
size_t subgraphIndex = 0;
size_t operatorIndex = 0;
try
{
for (SubgraphPtr const& subgraph : m_Model->subgraphs)
{
m_SubgraphConnections.emplace_back(subgraph->tensors.size());// total number of tensors in the subgraph
for (OperatorPtr const& op : subgraph->operators)// iterate over every operator
{
auto const& opCodePtr = m_Model->operator_codes[op->opcode_index];//get the operator code, which is really just an enum
auto builtinCode = opCodePtr->builtin_code;
// there are only so many builtin codes, and PostProcess is clearly not one of them
// (custom operators are in fact dispatched through a builtin code of their own, BuiltinOperator_CUSTOM)
if (builtinCode > tflite::BuiltinOperator_MAX)
{
throw ParseException(fmt::format("Operator code {} is out of range 0-{}. "
"subgraph:{} operator idx:{}. {}",
builtinCode, tflite::BuiltinOperator_MAX, subgraphIndex,
operatorIndex, CHECK_LOCATION().AsString()));
}
// lookup and call the parser function
// look through all the registered builtin operator parsers and call the matching one,
// building the armnn network piece by piece; the postprocess op is not handled here directly
auto& parserFunction = m_ParserFunctions[builtinCode];
(this->*parserFunction)(subgraphIndex, operatorIndex);
++operatorIndex;
}
SetupInputLayers(subgraphIndex);
SetupOutputLayers(subgraphIndex);
SetupConstantLayers(subgraphIndex);
++subgraphIndex;
operatorIndex = 0;
}
}
catch (const ParseException& e)
{
std::stringstream errorString;
errorString << "Failed to parse operator #" << operatorIndex << " within subgraph #"
<< subgraphIndex << " error: " << e.what();
ARMNN_LOG(error) << errorString.str();
std::stringstream errors;
errors << errorString.str() << "\n";
throw ParseException(errors.str());
}
// establish the connections from the layer outputs to the inputs of the subsequent layers
// connect each layer's outputs to the inputs of the following layers
for (subgraphIndex = 0; subgraphIndex < m_SubgraphConnections.size(); ++subgraphIndex)
{
for (size_t tensorIndex = 0; tensorIndex < m_SubgraphConnections[subgraphIndex].size(); ++tensorIndex)
{
if (m_SubgraphConnections[subgraphIndex][tensorIndex].outputSlot != nullptr)
{
for (size_t inputSlotIdx = 0;
inputSlotIdx < m_SubgraphConnections[subgraphIndex][tensorIndex].inputSlots.size();
++inputSlotIdx)
{
m_SubgraphConnections[subgraphIndex][tensorIndex].outputSlot->Connect(
*(m_SubgraphConnections[subgraphIndex][tensorIndex].inputSlots[inputSlotIdx]));
}
}
}
}
return std::move(m_Network);
}
The most important part of this code is the body of the try block: it iterates over every operator in the subgraph and calls the corresponding parser function, turning each operator into a layer of the network. There are many ParserFunctions, looked up by builtinCode; entry 32 (BuiltinOperator_CUSTOM) is the parser for custom operators:
m_ParserFunctions[tflite::BuiltinOperator_CUSTOM] = &TfLiteParserImpl::ParseCustomOperator;
void TfLiteParserImpl::ParseCustomOperator(size_t subgraphIndex, size_t operatorIndex)
{
CHECK_MODEL(m_Model, subgraphIndex, operatorIndex);
// NOTE: By default we presume the custom operator is not supported
auto customParserFunction = &TfLiteParserImpl::ParseUnsupportedOperator;
// Identify custom code defined for custom operator
const auto& operatorPtr = m_Model->subgraphs[subgraphIndex]->operators[operatorIndex];
const auto& customCode = m_Model->operator_codes[operatorPtr->opcode_index]->custom_code;
// if the operator has a custom code, the string found here is: TFLite_Detection_PostProcess
// Find parser function that correspondes to custom code (if any)
auto iterator = m_CustomParserFunctions.find(customCode);
if (iterator != m_CustomParserFunctions.end())
{
customParserFunction = iterator->second;
}
// Run parser function
(this->*customParserFunction)(subgraphIndex, operatorIndex);// 0 95
}
This function checks whether a parser function is registered for the customCode (TFLite_Detection_PostProcess) and calls it; in our case the one invoked is:
m_CustomParserFunctions["TFLite_Detection_PostProcess"] = &TfLiteParserImpl::ParseDetectionPostProcess;
As you can see, this is currently the only custom parser function registered; it also suggests that if a network carried its own special operator, a parser for it could be added here, as sketched below.
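For instance, a hypothetical custom op could in principle be hooked in the same way (ParseMyCustomOp is a made-up member function you would have to implement yourself, following the pattern of ParseDetectionPostProcess):
// hypothetical registration, mirroring the existing TFLite_Detection_PostProcess entry
m_CustomParserFunctions["MyCustomOp"] = &TfLiteParserImpl::ParseMyCustomOp;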
Next comes the place where my problem actually lives: Armnn's parsing of this particular operator has an issue.
void TfLiteParserImpl::ParseDetectionPostProcess(size_t subgraphIndex, size_t operatorIndex)// 0 95
{
CHECK_MODEL(m_Model, subgraphIndex, operatorIndex);
const auto & operatorPtr = m_Model->subgraphs[subgraphIndex]->operators[operatorIndex];
// both calls below return a TensorRawPtrVector
auto inputs = GetInputs(m_Model, subgraphIndex, operatorIndex);
auto outputs = GetOutputs(m_Model, subgraphIndex, operatorIndex);
// there are four outputs
CHECK_VALID_SIZE(outputs.size(), 4);
// Obtain custom options from flexbuffers
auto custom_options = operatorPtr->custom_options;
const flexbuffers::Map& m = flexbuffers::GetRoot(custom_options.data(), custom_options.size()).AsMap();
// Obtain descriptor information from tf lite
DetectionPostProcessDescriptor desc;
desc.m_MaxDetections = m["max_detections"].AsUInt32();
desc.m_MaxClassesPerDetection = m["max_classes_per_detection"].AsUInt32();
desc.m_NmsScoreThreshold = m["nms_score_threshold"].AsFloat();
desc.m_NmsIouThreshold = m["nms_iou_threshold"].AsFloat();
desc.m_NumClasses = m["num_classes"].AsUInt32();
desc.m_ScaleH = m["h_scale"].AsFloat();
desc.m_ScaleW = m["w_scale"].AsFloat();
desc.m_ScaleX = m["x_scale"].AsFloat();
desc.m_ScaleY = m["y_scale"].AsFloat();
if (!(m["use_regular_nms"].IsNull()))
{
desc.m_UseRegularNms = m["use_regular_nms"].AsBool();
}
if (!(m["detections_per_class"].IsNull()))
{
desc.m_DetectionsPerClass = m["detections_per_class"].AsUInt32();
}
if (desc.m_NmsIouThreshold <= 0.0f || desc.m_NmsIouThreshold > 1.0f)
{
throw InvalidArgumentException("DetectionPostProcessTFLiteParser: Intersection over union threshold "
"must be positive and less than or equal to 1.");
}
// we know the two real inputs here are 1917x4 and 1917xnum_classes; inputs[2] is the anchor tensor
armnn::TensorInfo anchorTensorInfo = ToTensorInfo(inputs[2]);
auto anchorTensorAndData = CreateConstTensorNonPermuted(inputs[2], anchorTensorInfo);
auto layerName = fmt::format("DetectionPostProcess:{}:{}", subgraphIndex, operatorIndex);
IConnectableLayer* layer = m_Network->AddDetectionPostProcessLayer(desc, anchorTensorAndData,
layerName.c_str());
ARMNN_ASSERT(layer != nullptr);
// The model does not specify the output shapes.
// The output shapes are calculated from the max_detection and max_classes_per_detection.
unsigned int numDetectedBox = desc.m_MaxDetections * desc.m_MaxClassesPerDetection;
m_OverridenOutputShapes.push_back({ 1, numDetectedBox, 4 }); //40
m_OverridenOutputShapes.push_back({ 1, numDetectedBox }); //10
m_OverridenOutputShapes.push_back({ 1, numDetectedBox }); //10
m_OverridenOutputShapes.push_back({ 1 }); //1
// the order here needs to change (m_OverridenOutputShapes_2 below is my own addition, in subgraph-output order)
// 1
m_OverridenOutputShapes_2.push_back({ 1, numDetectedBox }); //10
// 2
m_OverridenOutputShapes_2.push_back({ 1, numDetectedBox, 4 }); //40
// 3
m_OverridenOutputShapes_2.push_back({ 1 }); //1
// 4
m_OverridenOutputShapes_2.push_back({ 1, numDetectedBox }); //10
for (unsigned int i = 0 ; i < outputs.size() ; ++i)
{
armnn::TensorInfo detectionBoxOutputTensorInfo = ToTensorInfo(outputs[i], m_OverridenOutputShapes[i]);
layer->GetOutputSlot(i).SetTensorInfo(detectionBoxOutputTensorInfo);
}
// Register the input connection slots for the layer, connections are made after all layers have been created
// only the tensors for the inputs are relevant, exclude the const tensors
auto inputTensorIndexes = AsUnsignedVector(GetInputTensorIds(m_Model, subgraphIndex, operatorIndex));
RegisterInputSlots(subgraphIndex, operatorIndex, layer, {inputTensorIndexes[0], inputTensorIndexes[1]});
// Register the output connection slots for the layer, connections are made after all layers have been created
auto outputTensorIndexes = AsUnsignedVector(GetOutputTensorIds(m_Model, subgraphIndex, operatorIndex));
// the ids here come out in ...4, ...5, ...6, ...7 order (244,245,246,247), which is correct
// so the registration below should be correct as well
RegisterOutputSlots(subgraphIndex, operatorIndex, layer, {outputTensorIndexes[0],
outputTensorIndexes[1],
outputTensorIndexes[2],
outputTensorIndexes[3]});
}
Specifically, the problem is how the output shapes are set:
m_OverridenOutputShapes.push_back({ 1, numDetectedBox, 4 }); //40
m_OverridenOutputShapes.push_back({ 1, numDetectedBox }); //10
m_OverridenOutputShapes.push_back({ 1, numDetectedBox }); //10
m_OverridenOutputShapes.push_back({ 1 }); //1
The order of these push_back calls corresponds to the order in which the layer's output TensorShapes are set afterwards:
layer->GetOutputSlot(i).SetTensorInfo(detectionBoxOutputTensorInfo);
and that order in turn follows the order of outputs:
auto outputs = GetOutputs(m_Model, subgraphIndex, operatorIndex);
TfLiteParserImpl::TensorRawPtrVector TfLiteParserImpl::GetOutputs(const ModelPtr & model,
size_t subgraphIndex,
size_t operatorIndex)
{
CHECK_MODEL(model, subgraphIndex, operatorIndex);
const auto & subgraphPtr = model->subgraphs[subgraphIndex];
const auto & operatorPtr = subgraphPtr->operators[operatorIndex];
size_t outputCount = operatorPtr->outputs.size();
TensorRawPtrVector result(outputCount);
for (size_t i=0; i<outputCount; ++i)
{
uint32_t outputId = CHECKED_NON_NEGATIVE(operatorPtr->outputs[i]);
CHECK_TENSOR(model, subgraphIndex, outputId);
result[i] = subgraphPtr->tensors[outputId].get();
}
return result;
}
This outputs order comes from the operator's own output list. So far there is no problem: the model parses and converts fine. But later, at inference time, the output TensorShapes are checked again, and there they are checked in the order of the subgraph's outputs. In models trained with TensorFlow 1 the two orders happen to be identical, but in the model I trained with TensorFlow 2.7 they are not. I have not studied the TensorFlow Lite source, so I do not know why they differ; but it is exactly this order mismatch that produces the error quoted at the top of this post. Let's bookmark this spot, since we will need to come back and modify it.
2) The model inference flow
Status LoadedNetwork::EnqueueWorkload(const InputTensors& inputTensors,
const OutputTensors& outputTensors)
{
const Graph& graph = m_OptimizedNetwork->pOptimizedNetworkImpl->GetGraph();
// Walk graph to determine the order of execution.
if (graph.GetNumLayers() < 2)
{
ARMNN_LOG(warning) << "IRuntime::EnqueueWorkload()::Less than two nodes in graph";
return Status::Failure;
}
// Data that must be kept alive for the entire execution of the workload.
WorkloadData workloadData(inputTensors, outputTensors);
if (graph.GetNumInputs() != inputTensors.size())
{
throw InvalidArgumentException("Number of inputs provided does not match network.");
}
// For each input to the network, call EnqueueInput with the data passed by the user.
{
ARMNN_SCOPED_PROFILING_EVENT(Compute::Undefined, "PrepareInputs");
m_InputQueue.clear();
m_InputQueue.reserve(graph.GetNumInputs());
for (const BindableLayer* inputLayer : graph.GetInputLayers())
{
const TensorPin& pin = workloadData.GetInputTensorPin(inputLayer->GetBindingId());
EnqueueInput(*inputLayer, pin.GetTensorHandle(), pin.GetTensorInfo());
}
}
// For each output to the network, call EnqueueOutput with the data passed by the user.
{
ARMNN_SCOPED_PROFILING_EVENT(Compute::Undefined, "PrepareOutputs");
m_OutputQueue.clear();
m_OutputQueue.reserve(graph.GetNumOutputs());
for (const BindableLayer* outputLayer : graph.GetOutputLayers())
{
const TensorPin& pin = workloadData.GetOutputTensorPin(outputLayer->GetBindingId());
EnqueueOutput(*outputLayer, pin.GetTensorHandle(), pin.GetTensorInfo());
}
}
std::unique_ptr<TimelineUtilityMethods> timelineUtils =
TimelineUtilityMethods::GetTimelineUtils(m_ProfilingService);
ProfilingGuid inferenceGuid = m_ProfilingService.GetNextGuid();
if (timelineUtils)
{
// Add inference timeline trace if profiling is enabled.
ProfilingGuid networkGuid = m_OptimizedNetwork->GetGuid();
timelineUtils->CreateTypedEntity(inferenceGuid, LabelsAndEventClasses::INFERENCE_GUID);
timelineUtils->CreateRelationship(ProfilingRelationshipType::RetentionLink,
networkGuid,
inferenceGuid,
LabelsAndEventClasses::EXECUTION_OF_GUID);
timelineUtils->RecordEvent(inferenceGuid, LabelsAndEventClasses::ARMNN_PROFILING_SOL_EVENT_CLASS);
}
bool executionSucceeded = true;
{
if (m_ProfilingService.IsProfilingEnabled())
{
m_ProfilingService.IncrementCounterValue(armnn::profiling::INFERENCES_RUN);
}
ARMNN_SCOPED_PROFILING_EVENT(Compute::Undefined, "Execute");
ARMNN_SCOPED_HEAP_PROFILING("Executing");
executionSucceeded = Execute(timelineUtils, inferenceGuid);
}
if (timelineUtils)
{
// Add end of life of the inference timeline if profiling is enabled.
timelineUtils->RecordEvent(inferenceGuid, LabelsAndEventClasses::ARMNN_PROFILING_EOL_EVENT_CLASS);
timelineUtils->Commit();
}
return executionSucceeded ? Status::Success : Status::Failure;
}
As we saw from the sample program, EnqueueWorkload is the function that performs the inference. Inside it, this call
EnqueueOutput(*outputLayer, pin.GetTensorHandle(), pin.GetTensorInfo());
creates a
std::unique_ptr<IWorkload> outputWorkload =
std::make_unique<CopyMemGenericWorkload>(outputQueueDescriptor, info);
When the CopyMemGenericWorkload is constructed, its descriptor is validated, and that is exactly where the error is thrown.
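Roughly speaking, the validation that fails here just compares the element counts of the copy's input and output TensorInfo. A simplified sketch of that check, not the actual Armnn source (the real code sits behind MemCopyQueueDescriptor's validation):
#include <armnn/Exceptions.hpp>
#include <armnn/Tensor.hpp>
#include <sstream>

// simplified stand-in for the element-count check behind the error at the top of this post
void CheckSameNumElements(const armnn::TensorInfo& input, const armnn::TensorInfo& output)
{
    if (input.GetNumElements() != output.GetNumElements())
    {
        std::stringstream msg;
        msg << "MemCopyQueueDescriptor: input->" << input.GetNumElements()
            << " & output->" << output.GetNumElements()
            << " must have the same number of elements.";
        throw armnn::InvalidArgumentException(msg.str());
    }
}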
But in fact things are already wrong by the time the workload is created, because in
Status LoadedNetwork::EnqueueWorkload(const InputTensors& inputTensors,
const OutputTensors& outputTensors)
the OutputTensors passed in here are already wrong. From the sample program we know this argument is obtained via
GetNetworkOutputBindingInfo
so let's look at how that function is implemented.
BindingPointInfo TfLiteParserImpl::GetNetworkOutputBindingInfo(size_t subgraphId,
const std::string& name) const
{
CHECK_SUBGRAPH(m_Model, subgraphId);
// struct TensorT : public flatbuffers::NativeTable {
// name id elements
// call_0 --> 247 --> 1
// call_1 --> 246 --> 10
// call_2 --> 245 --> 10
// call_3 --> 244 --> 40
auto outputs = GetSubgraphOutputs(m_Model, subgraphId);
// outputId : 246
// outputId : 244
// outputId : 247
// outputId : 245
// among the 4 outputs, find the one whose name matches; if it is the i-th one, read the i-th shape
for (unsigned int i = 0; i < outputs.size(); ++i)
{
auto const output = outputs[i];// outputs vector<std::pair<size_t, TensorRawPtr>>,
if (output.second->name == name)// the tensor's name
{
auto bindingId = GenerateLayerBindingId(subgraphId, output.first);
std::vector<unsigned int> shape = m_OverridenOutputShapes.size() > 0 ?
m_OverridenOutputShapes[i] : AsUnsignedVector(output.second->shape);
return std::make_pair(bindingId, ToTensorInfo(output.second, shape));
}
}
std::stringstream bindings;
for (auto const & output : outputs)
{
bindings << "'" << output.second->name << "' ";
}
throw ParseException(
fmt::format("No output binding found for subgraph:{} and name:{}. "
"Possible outputs are: [{}] {}",
subgraphId,
name,
bindings.str(),
CHECK_LOCATION().AsString()));
}
You can see that the outputs here come from the subgraph:
TfLiteParserImpl::TensorIdRawPtrVector TfLiteParserImpl::GetSubgraphOutputs(const ModelPtr & model,
size_t subgraphIndex)
{
CHECK_SUBGRAPH(model, subgraphIndex);
const auto & subgraphPtr = model->subgraphs[subgraphIndex];
size_t outputCount = subgraphPtr->outputs.size();// 4
TensorIdRawPtrVector result(outputCount);
// using TensorIdRawPtr = std::pair<size_t, TensorRawPtr>;
// using TensorIdRawPtrVector = std::vector<TensorIdRawPtr>;
for (size_t i=0; i<outputCount; ++i)
{
uint32_t outputId = CHECKED_NON_NEGATIVE(subgraphPtr->outputs[i]);
// the outputId here is still the tensorflow lite tensor id, not the 6xxxx binding id
// outputId : 246
// outputId : 244
// outputId : 247
// outputId : 245
result[i] = std::make_pair(outputId, subgraphPtr->tensors[outputId].get());
}
return result;
}
The root cause is that m_OverridenOutputShapes is used here, and its contents were pushed in a hard-coded order:
// name id elements
// call_0 --> 247 --> 1
// call_1 --> 246 --> 10
// call_2 --> 245 --> 10
// call_3 --> 244 --> 40
auto outputs = GetSubgraphOutputs(m_Model, subgraphId);
// outputId : 246
// outputId : 244
// outputId : 247
// outputId : 245
m_OverridenOutputShapes.push_back({ 1, numDetectedBox, 4 }); //40
m_OverridenOutputShapes.push_back({ 1, numDetectedBox }); //10
m_OverridenOutputShapes.push_back({ 1, numDetectedBox }); //10
m_OverridenOutputShapes.push_back({ 1 }); //1
Say we are now looking up call_0: the name matches at the third entry of the subgraph outputs (246, 244, 247, 245), so m_OverridenOutputShapes[2] is read. But that third entry was pushed for the operator's third output, tensor 246, i.e. call_1, so call_0 (1 element) is reported with a 10-element shape. Likewise call_3, which really holds 40 elements, sits at index 1 of the subgraph outputs and is reported with the 10-element shape pushed at index 1, which is exactly the 10-versus-40 mismatch in the error at the top of this post.
Nor can we simply reorder m_OverridenOutputShapes, because the same vector is also consumed in operator-output order when ParseDetectionPostProcess calls SetTensorInfo on the layer's output slots; one fixed order can never satisfy both uses. Why the operator's outputs and the subgraph's outputs are listed in different orders in the first place is a question for the tensorflow source.
Either way, Armnn's hard-coded approach here is clearly fragile, so let's change it; one possible direction is sketched below.
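I won't paste the full patch here; as a rough sketch of one possible direction (not Armnn's actual code, and m_OverridenShapeByTensorId is a made-up member), instead of keeping a second hard-coded list like m_OverridenOutputShapes_2, each overridden shape could be recorded against its tflite tensor id in ParseDetectionPostProcess, and GetNetworkOutputBindingInfo could look it up by output.first rather than by position:
// hypothetical member of TfLiteParserImpl: overridden shapes keyed by tflite tensor id
std::unordered_map<size_t, std::vector<unsigned int>> m_OverridenShapeByTensorId;

// in ParseDetectionPostProcess, after outputTensorIndexes has been computed:
m_OverridenShapeByTensorId[outputTensorIndexes[0]] = { 1, numDetectedBox, 4 }; //40
m_OverridenShapeByTensorId[outputTensorIndexes[1]] = { 1, numDetectedBox };    //10
m_OverridenShapeByTensorId[outputTensorIndexes[2]] = { 1, numDetectedBox };    //10
m_OverridenShapeByTensorId[outputTensorIndexes[3]] = { 1 };                    //1

// in GetNetworkOutputBindingInfo, look the shape up by tensor id instead of by the position i:
auto overridden = m_OverridenShapeByTensorId.find(output.first);
std::vector<unsigned int> shape = overridden != m_OverridenShapeByTensorId.end()
                                      ? overridden->second
                                      : AsUnsignedVector(output.second->shape);
return std::make_pair(bindingId, ToTensorInfo(output.second, shape));
With the shape tied to the tensor id, the relative order of the operator's outputs and the subgraph's outputs no longer matters.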
PS:
The TFLite_Detection_PostProcess operator cannot be GPU-accelerated and only runs on CpuRef, so any network that contains it will see its inference time grow.