A third-party framework may be implemented by the user or be another open-source framework in the industry. Such frameworks generally support online graph construction, that is, they can build multiple operators into a single sub-graph and dispatch it to the device for execution.
Adding a Custom Delegate Class
A custom delegate must inherit from the Delegate class. Configuration related to how the third-party framework schedules the hardware device can be initialized in the constructor, such as the NPU frequency or the number of CPU threads.
The code is as follows:
class XXXDelegate : public Delegate {
 public:
  XXXDelegate() = default;

  ~XXXDelegate() = default;

  Status Init() override;

  Status Build(DelegateModel *model) override;
};
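The default constructor above can be extended to carry the scheduling configuration mentioned earlier (NPU frequency, CPU thread count). A hypothetical sketch, with XXXDelegateConfig and its fields invented purely for illustration and not part of the MindSpore Lite API:

```cpp
#include <cassert>

// Hypothetical scheduling configuration passed to the delegate constructor.
struct XXXDelegateConfig {
  int npu_frequency = 3;   // a vendor-specific NPU frequency level (assumption)
  int cpu_thread_num = 2;  // CPU threads used by the third-party framework (assumption)
};

class XXXDelegateConfigSketch {
 public:
  XXXDelegateConfigSketch() = default;
  explicit XXXDelegateConfigSketch(const XXXDelegateConfig &config) : config_(config) {}

  int npu_frequency() const { return config_.npu_frequency; }
  int cpu_thread_num() const { return config_.cpu_thread_num; }

 private:
  XXXDelegateConfig config_;  // Stored here, then consumed by Init()/Build()
};
```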
After completing the above, implement the Init interface.
The code is as follows:
Status XXXDelegate::Init() {
  // 1. Check whether the inference device matches the delegate framework.
  // 2. Initialize delegate-related resources.
  return RET_OK;
}
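What Init actually does is framework-specific. As an illustration of the two commented steps, a minimal self-contained sketch; InitStatus, DeviceAvailable, and the driver_present flag are all assumptions, not part of the MindSpore Lite API:

```cpp
#include <cassert>

// Stand-in status codes following the doc's RET_OK convention (hypothetical values).
enum InitStatus { kInitOk = 0, kInitNotSupport = -1 };

// Hypothetical device probe; a real delegate would query the vendor driver instead.
inline bool DeviceAvailable(bool driver_present) { return driver_present; }

struct XXXDelegateInitSketch {
  bool initialized = false;

  InitStatus Init(bool driver_present) {
    // 1. Check whether the inference device matches the delegate framework.
    if (!DeviceAvailable(driver_present)) {
      return kInitNotSupport;  // The framework then falls back to built-in inference.
    }
    // 2. Initialize delegate-related resources (clients, command queues, buffers, ...).
    initialized = true;
    return kInitOk;
  }
};
```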
Implementing the Build Interface
Build is called from the Model's Build interface, specifically from the Schedule::Schedule function inside MindSpore Lite. At that point, built-in operator selection has finished, and the selected operators are stored in the DelegateModel's kernel list. Build needs to implement the following:
1. Traverse the kernel list and call GetPrimitive to obtain the attribute values of each operator; parse the attributes to decide whether the delegate framework supports the operator.
2. For a contiguous run of supported operators, build a delegate sub-graph and call Replace to replace the run with the sub-graph kernel.
The code is as follows:
Status XXXDelegate::Build(DelegateModel *model) {
  KernelIter from = model->BeginKernelIterator();  // Start of the current run of supported kernels
  KernelIter end = model->BeginKernelIterator();   // End of the current run of supported kernels
  bool has_supported = false;                      // Whether the current run contains any kernel
  for (KernelIter iter = model->BeginKernelIterator(); iter != model->EndKernelIterator(); iter++) {
    kernel::Kernel *kernel = *iter;
    if (IsSupport(kernel, model->GetPrimitive(kernel))) {  // Check support according to the primitive
      if (!has_supported) {
        from = iter;  // A new run of supported kernels starts here
        has_supported = true;
      }
      end = iter;
    } else {  // The current kernel is not supported; the sub-graph is truncated
      if (has_supported) {
        auto xxx_graph_kernel = CreateXXXGraph(from, end, model);  // Create a Delegate sub-graph kernel
        iter = model->Replace(from, end + 1, xxx_graph_kernel);    // Replace the supported kernels with the sub-graph kernel
        has_supported = false;
      }
    }
  }
  if (has_supported) {  // Handle a run of supported kernels at the end of the list
    auto xxx_graph_kernel = CreateXXXGraph(from, end, model);
    model->Replace(from, end + 1, xxx_graph_kernel);
  }
  return RET_OK;
}
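The run-splitting logic above can be checked in isolation. A minimal self-contained sketch, with a plain bool vector standing in for the kernel list and IsSupport; these stand-in types are assumptions, not the MindSpore Lite API:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split a kernel list into maximal contiguous runs of supported kernels.
// Each pair is the inclusive [from, end] index range of one delegate sub-graph.
std::vector<std::pair<std::size_t, std::size_t>> SplitSupportedRuns(const std::vector<bool> &supported) {
  std::vector<std::pair<std::size_t, std::size_t>> runs;
  std::size_t from = 0;
  bool in_run = false;
  for (std::size_t i = 0; i < supported.size(); ++i) {
    if (supported[i]) {
      if (!in_run) {
        from = i;  // A new run of supported kernels starts here
        in_run = true;
      }
    } else if (in_run) {  // An unsupported kernel truncates the run
      runs.emplace_back(from, i - 1);
      in_run = false;
    }
  }
  if (in_run) {  // Trailing run of supported kernels
    runs.emplace_back(from, supported.size() - 1);
  }
  return runs;
}
```

Note that both the single-kernel run and the trailing run produce a sub-graph, matching the Build sketch above.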
Implementing the Sub-graph Kernel
The CreateXXXGraph interface above must return a delegate sub-graph.
The code is as follows:
kernel::Kernel *XXXDelegate::CreateXXXGraph(KernelIter from, KernelIter end, DelegateModel *model) {
  auto in_tensors = GraphInTensors(…);    // Find the input tensors of the Delegate sub-graph
  auto out_tensors = GraphOutTensors(…);  // Find the output tensors of the Delegate sub-graph
  auto graph_kernel = new (std::nothrow) XXXGraph(in_tensors, out_tensors);
  if (graph_kernel == nullptr) {
    MS_LOG(ERROR) << "New XXX Graph failed.";
    return nullptr;
  }
  // Build graph online, load model, etc.
  return graph_kernel;
}
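GraphInTensors and GraphOutTensors are left elided above. Their usual job is a set computation: a sub-graph input is a tensor consumed inside the kernel range but not produced inside it, and a sub-graph output is a tensor produced inside the range and consumed by a kernel outside it. A hedged, self-contained sketch of that computation; OpDesc and integer tensor ids are stand-ins for illustration, not the MindSpore Lite API:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in operator description: tensors are identified by integer ids (hypothetical).
struct OpDesc {
  std::vector<int> in_tensor_ids;
  std::vector<int> out_tensor_ids;
};

// Tensors read inside the range but produced outside it are sub-graph inputs.
std::vector<int> GraphInTensorIds(const std::vector<OpDesc> &ops) {
  std::vector<int> produced;
  for (const auto &op : ops) {
    produced.insert(produced.end(), op.out_tensor_ids.begin(), op.out_tensor_ids.end());
  }
  std::vector<int> inputs;
  for (const auto &op : ops) {
    for (int id : op.in_tensor_ids) {
      bool internal = std::find(produced.begin(), produced.end(), id) != produced.end();
      bool seen = std::find(inputs.begin(), inputs.end(), id) != inputs.end();
      if (!internal && !seen) inputs.push_back(id);
    }
  }
  return inputs;
}

// Tensors produced inside the range and consumed by ops outside it are sub-graph outputs.
std::vector<int> GraphOutTensorIds(const std::vector<OpDesc> &ops, const std::vector<OpDesc> &outside_ops) {
  std::vector<int> outputs;
  for (const auto &op : ops) {
    for (int id : op.out_tensor_ids) {
      for (const auto &consumer : outside_ops) {
        bool consumed = std::find(consumer.in_tensor_ids.begin(), consumer.in_tensor_ids.end(), id) !=
                        consumer.in_tensor_ids.end();
        bool seen = std::find(outputs.begin(), outputs.end(), id) != outputs.end();
        if (consumed && !seen) outputs.push_back(id);
      }
    }
  }
  return outputs;
}
```

A real implementation would also treat tensors that are model outputs as sub-graph outputs.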
The sub-graph kernel must be defined to inherit from kernel::Kernel.
Note: determine the correct in_tensors and out_tensors from the original kernel list, so that at Execute time the kernel can find the correct input tensors and input data, and write the output data back to the correct addresses.
The code is as follows:
class XXXGraph : public kernel::Kernel {
 public:
  XXXGraph(const std::vector<tensor::MSTensor *> &inputs, const std::vector<tensor::MSTensor *> &outputs)
      : kernel::Kernel(inputs, outputs, nullptr, nullptr) {}

  ~XXXGraph() override;

  int Prepare() override {
    // Generally, the model is built only once, so Prepare is also called only once.
    // Do work that needs no input data here, such as packing the constant weight tensors.
    return RET_OK;
  }

  int Execute() override {
    // Obtain input data from in_tensors.
    // Run the inference process.
    // Write the results back to out_tensors.
    return RET_OK;
  }

  int ReSize() override {
    // Support dynamic shapes: the input shape may change between executions.
    return RET_OK;
  }
};
Scheduling by the Lite Framework
To have the Lite framework schedule the custom delegate, set the custom delegate pointer via SetDelegate when creating the Context, as shown in the sample code below, and then pass the Context to the Lite framework through Build. If the Delegate in the Context is a null pointer, the inference flow falls back to the built-in inference of the Lite framework.
The code is as follows:
auto context = std::make_shared<mindspore::Context>();
if (context == nullptr) {
  MS_LOG(ERROR) << "New context failed";
  return RET_ERROR;
}
auto delegate = std::make_shared<XXXDelegate>();
if (delegate == nullptr) {
  MS_LOG(ERROR) << "New XXX delegate failed";
  return RET_ERROR;
}
context->SetDelegate(delegate);
auto model = new (std::nothrow) mindspore::Model();
if (model == nullptr) {
  std::cerr << "New Model failed." << std::endl;
  return RET_ERROR;
}
// Assuming that we have read a ms file and stored it in the address pointed to by model_buf
auto build_ret = model->Build(model_buf, size, mindspore::kMindIR, context);
delete[] model_buf;
if (build_ret != mindspore::kSuccess) {
  std::cerr << "Build model failed." << std::endl;
  return RET_ERROR;
}
Implementing the Build Interface
The Build interface parses the DelegateModel instance and mainly implements operator support checking, sub-graph construction, and online graph building.
The code is as follows:
Status NPUDelegate::Build(DelegateModel *model) {
  KernelIter from, end;          // Record the start and end positions of the kernels supported by the NPU sub-graph.
  std::vector<NPUOp *> npu_ops;  // Save all NPUOps used to construct an NPU sub-graph.
  int graph_index = 0;
  for (KernelIter iter = model->BeginKernelIterator(); iter != model->EndKernelIterator(); iter++) {
    kernel::Kernel *kernel = *iter;
    // Obtain an NPUOp from the kernel and its primitive. Each NPUOp contains information
    // such as input tensors, output tensors, and operator attributes.
    auto npu_op = GetOP(kernel, model->GetPrimitive(kernel));
    if (npu_op != nullptr) {  // NPU supports the current kernel.
      if (npu_ops.size() == 0) {
        from = iter;
      }
      npu_ops.push_back(npu_op);
      end = iter;
    } else {  // NPU does not support the current kernel.
      if (npu_ops.size() > 0) {
        auto npu_graph_kernel = CreateNPUGraph(npu_ops);  // Create an NPU sub-graph kernel.
        if (npu_graph_kernel == nullptr) {
          MS_LOG(ERROR) << "Create NPU Graph failed.";
          return RET_ERROR;
        }
        npu_graph_kernel->set_name("NpuGraph" + std::to_string(graph_index++));
        iter = model->Replace(from, end + 1, npu_graph_kernel);  // Replace the supported kernels with an NPU sub-graph kernel.
        npu_ops.clear();
      }
    }
  }
  if (npu_ops.size() > 0) {  // Handle a run of supported kernels at the end of the list.
    auto npu_graph_kernel = CreateNPUGraph(npu_ops);
    if (npu_graph_kernel == nullptr) {
      MS_LOG(ERROR) << "Create NPU Graph failed.";
      return RET_ERROR;
    }
    npu_graph_kernel->set_name("NpuGraph" + std::to_string(graph_index++));
    model->Replace(from, end + 1, npu_graph_kernel);
    npu_ops.clear();
  }
  auto ret = npu_manager_->LoadOMModel();  // Build the model online and load the NPU model.
  if (ret != RET_OK) {
    MS_LOG(ERROR) << "NPU client load model failed.";
    return RET_ERROR;
  }
  return RET_OK;
}
Implementing the Graph-Construction Code
CreateNPUGraph is used to generate an NPU sub-graph.
The code is as follows:
kernel::Kernel *NPUDelegate::CreateNPUGraph(const std::vector<NPUOp *> &ops) {
  auto in_tensors = GraphInTensors(ops);
  auto out_tensors = GraphOutTensors(ops);
  auto graph_kernel = new (std::nothrow) NPUGraph(ops, npu_manager_, in_tensors, out_tensors);
  if (graph_kernel == nullptr) {
    MS_LOG(ERROR) << "New NPU Graph failed.";
    return nullptr;
  }
  auto ret = graph_kernel->Init();
  if (ret != RET_OK) {
    MS_LOG(ERROR) << "NPU Graph Init failed.";
    delete graph_kernel;
    return nullptr;
  }
  return graph_kernel;
}