Part 2: Creating a simple Keras model for inference on microcontrollers

Welcome to the second article about running machine learning algorithms on microcontrollers. In the previous article, we created and trained a simple Keras model that was able to classify 3 different classes from the CIFAR-10 dataset. We also looked at how to prepare that model for running on a microcontroller: we quantized it and saved it to disk as a C file.

It is now time to look at how to set up the development environment and how to run inference on an actual microcontroller.

To follow along, we will need two things. First, we will need to clone the MicroML project, which will provide us with the example code.

Second, we will need some kind of ARM development board. The example code uses ST’s Nucleo-F767ZI board. For this demonstration, we will stick with that choice, as we can then run the example code as it is, with no modifications needed.

If you do not have that specific board at hand, there is no need to worry! MicroML was created with customization in mind, so we will show you how to modify the code to make it run on the development board of your choice.

One thing before we start: it is recommended to read the prerequisites section of MicroML’s README.md, so you can see what you will need to get this project running on Windows or Linux.

1. Setup

We will start by cloning the MicroML repo recursively and checking out the keras_article branch.

git clone --recurse-submodules https://github.com/SkobecSlo/MicroML.git
cd MicroML
git checkout keras_article

NOTE: MicroML pulls in the whole TensorFlow library, which is around 2.5 GB in size, so it is perfectly normal for this step to take some time.

TensorFlow setup

Before we can even compile a simple example for our target, we need to run the hello_world example, which executes on our host machine. This is needed because the makefiles written by the TensorFlow team pull in several necessary repositories from third parties, which are missing by default. After compilation, these libraries can be found under tensorflow/lite/micro/tools/make/downloads. To get them, we first move inside the tensorflow folder and run make:

cd tensorflow
sudo make -f tensorflow/lite/micro/tools/make/Makefile hello_world

This will call several scripts that download the libraries and then compile the source files for the hello_world example. It might take a while, but we only have to do this step once.

In short, the TensorFlow hello_world example feeds values into a model that approximates the sine function, continuously printing both the input and output values. You can see the program in action by running:

./tensorflow/lite/micro/tools/make/gen/linux_x86_64/bin/hello_world

libopencm3 setup

libopencm3 will provide us with libraries for our targets; it will also generate the necessary linker and startup files. Visit their GitHub page and make sure that your target is supported.

To generate all the necessary files, just run the following command inside MicroML’s root directory. Again, this is needed only once.

make -C libopencm3

2. Building the CIFAR-10 project

Before compiling the CIFAR-10 project we need to build the microlite.a and testlite.a files.

This is done with the following two commands, executed from the main directory:

make -C tensorflow/ -f ../archive_makefile PROJECT=cifar_stm32f7
make -C tensorflow/ -f ../archive_makefile PROJECT=cifar_stm32f7 test

These two commands compile all TensorFlow-specific source files with the flags for a specific microcontroller (by default the Nucleo-F767ZI) and create the microlite.a and testlite.a files. The former is used in the linking step when we are building the code for our specific target, while the latter is used in the linking step when we are building test code that will run on our development machine. Running tests on the development machine while we are building our application is preferable, as we can avoid unnecessary flashing cycles to the device.

To run the CIFAR-10 code on our development machine, we move into the projects/cifar_stm32f7 folder and run make test. The code will be compiled and executed, and you will see output similar to this:

$ make test 
SIZE test_build/test_firmware
text data bss dec hex filename
254341 2864 51296 308501 4b515 test_build/test_firmware
Testing TestInvoke
Input:
Dimension: 4
First Dimension: 1
Rows: 32
Columns: 32
Channels: 1
Input type: 9
Output:
Dimension size: 2
First Dimension: 1
Rows: 3
Output type: 1
Picture 0
[[0.000000 0.929688 0.070312]]
Inference time: 12.585000 ms
Picture 1
[[0.000000 0.000000 0.996094]]
Inference time: 7.100000 ms
Picture 2
[[0.000000 0.000000 0.996094]]
Inference time: 7.040000 ms
Picture 3
[[0.000000 0.000000 0.996094]]
Inference time: 6.987000 ms
Picture 4
[[0.996094 0.003906 0.000000]]
Inference time: 6.904000 ms
Picture 5
[[0.812500 0.187500 0.000000]]
Inference time: 7.114000 ms
1/1 tests passed
~~~ALL TESTS PASSED~~~

If you check the output values, they should match the ones from the previous article (see section 14, Testing model with python interpreter). This means that our neural network is working as expected.

To run the same code on the Nucleo-F767ZI, we can run make flash and open a serial terminal to see the output. The output should be similar to the one above, but the inference times will be longer.

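On Linux, opening the serial terminal might look like this (the device path and baud rate here are assumptions; they depend on your system and UART configuration):

screen /dev/ttyACM0 115200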

3. Preparing image files

Before jumping into the code explanation, it is important to say a word about how the picture files were changed. When we converted our Keras model to a tflite model, we specified that we want our inputs to be signed 8-bit integers. By default they would be floating-point, which is unnecessary for our purpose. When we fed our picture data to the xxd tool, the pictures were already in int8 format, as we cast them that way. By default, xxd puts the data into an unsigned char array, like so:

unsigned char picture0[] = {

It is necessary to change that to:

const signed char picture0[] = {

That way we won’t have problems later when we feed the picture data to the TensorFlow interpreter.

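For reference, the xxd conversion step might have looked like this (the file names here are hypothetical, and the generated array name then needs the manual edit described above):

xxd -i picture0.bin > picture0.c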

To make our picture arrays available to the main program, we also need to create a header file where we declare our picture arrays; this header file should then be included in each picture .c file. Likewise, the CIFAR model C file should also have its own header file. A sketch of what such a header could look like is shown below.

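A minimal sketch of pictures/pictures.h, assuming the array names from above (the actual names and the number of pictures in your project may differ):

#ifndef PICTURES_H
#define PICTURES_H

// Declarations for the arrays generated by xxd and edited by hand.
extern const signed char picture0[];
extern const signed char picture1[];
// ... one declaration per converted picture

#endif // PICTURES_H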

4. Code explanation

Now we will explain step by step how the code works. This will be a walk-through of the code in cifar_test.cc, the source file we used to run the program on our development machine. We will not cover the Nucleo-F767ZI version of the code inside main.cpp, as it is the same except for the hardware-specific sections.

We first include several TensorFlow-specific header files. These give us access to a small unit test environment, the micro interpreter and the operator resolvers. We also include our CIFAR model and pictures. In model_settings.h we specify the size of our picture input.

#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/micro/kernels/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/testing/micro_test.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/kernels/micro_ops.h"#include <stdio.h>
#include <time.h>
#include "cifar_model.h"
#include "pictures/pictures.h"
#include "model_settings.h"

We need to define the size of our tensor arena, which takes some trial and error. The simplest method is to start with a big value and gradually make it smaller; if the arena is too small, tensor allocation will fail at runtime, and if it is too big for the target’s RAM, linking will fail instead. For our specific case, 50 KB of memory was enough.

constexpr int tensor_arena_size = 50 * 1024;
uint8_t tensor_arena[tensor_arena_size];

Below we can see how picture data is loaded as an input.

void load_data(const signed char * data, TfLiteTensor * input)
{
  for (int i = 0; i < input->bytes; ++i)
  {
    input->data.int8[i] = data[i];
  }
}

Because input->data is a union that contains all supported data types, we can reuse the same line to load different types of data, for example floating-point:

input->data.f[i] = data[i];

We wrap the program with TF_LITE_MICRO_TESTS_BEGIN and TF_LITE_MICRO_TESTS_END, which enable us to do assert-style tests (these are extremely useful when you are starting with new, unknown code).

The main program starts with the code below:

TF_LITE_MICRO_TESTS_BEGIN

TF_LITE_MICRO_TEST(TestInvoke) {
// Set up logging.
tflite::MicroErrorReporter micro_error_reporter;
tflite::ErrorReporter* error_reporter = &micro_error_reporter;

// Map the model into a usable data structure. This doesn't involve any
// copying or parsing, it's a very lightweight operation.
const tflite::Model* model = ::tflite::GetModel(cifar_quant_8bit_tflite);
if (model->version() != TFLITE_SCHEMA_VERSION) {
  TF_LITE_REPORT_ERROR(error_reporter,
                       "Model provided is schema version %d not equal "
                       "to supported version %d.\n",
                       model->version(), TFLITE_SCHEMA_VERSION);
}

We create an error_reporter instance, which we will use to print out any errors that happen. We load our tflite model into the model instance and check that the model’s version matches the schema version; this is needed to make sure that our model is interpreted correctly.

Next, we need to register operator implementations for the different layers, like the convolution and fully connected layers, and for activation functions like relu or softmax. This can be done either by using the AllOpsResolver, which pulls in all implementations, or by creating a MicroOpResolver and registering each operator individually. In our case, it makes sense to use the MicroOpResolver: this way we register only the operators that we need and therefore save space.

tflite::MicroOpResolver<6> micro_op_resolver;
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_CONV_2D,
    tflite::ops::micro::Register_CONV_2D(),
    3 // version number
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_MAX_POOL_2D,
    tflite::ops::micro::Register_MAX_POOL_2D(),
    2 // version number
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_RESHAPE,
    tflite::ops::micro::Register_RESHAPE()
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_FULLY_CONNECTED,
    tflite::ops::micro::Register_FULLY_CONNECTED(),
    4 // version number
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_SOFTMAX,
    tflite::ops::micro::Register_SOFTMAX(),
    2 // version number
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_DEQUANTIZE,
    tflite::ops::micro::Register_DEQUANTIZE(),
    2 // version number
);

When we are not sure exactly which operators to use, we can simply pass micro_op_resolver to the interpreter without calling the AddBuiltin method; at runtime, the error reporter will tell us which operators we need to add, and also which version of each operator.

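For completeness, the all-inclusive alternative is a one-liner. This is a sketch assuming the AllOpsResolver class from the all_ops_resolver.h header included earlier; it costs extra flash, since every operator implementation gets linked in:

// Registers every built-in operator; no AddBuiltin calls needed.
tflite::ops::micro::AllOpsResolver resolver;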

After all the setup we can finally create our interpreter and allocate tensors for it:

// Build an interpreter to run the model with.
tflite::MicroInterpreter interpreter(model,
micro_op_resolver,
tensor_arena,
tensor_arena_size,
error_reporter);
interpreter.AllocateTensors();
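
If the tensor arena defined earlier is too small, this allocation is exactly where things fail. As a sketch, the bare call above could instead be guarded by checking its return value (AllocateTensors() returns a TfLiteStatus):

// kTfLiteOk means the arena was large enough for all of the model's tensors.
if (interpreter.AllocateTensors() != kTfLiteOk) {
  TF_LITE_REPORT_ERROR(error_reporter, "AllocateTensors() failed");
}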

The next set of checks is important for verifying that our model is exactly how we expect it to be:

// Get information about the memory area to use for the model's input.
TfLiteTensor* input = interpreter.input(0);

// Make sure the input has the properties we expect.
TF_LITE_MICRO_EXPECT_NE(nullptr, input);
TF_LITE_MICRO_EXPECT_EQ(4, input->dims->size);
TF_LITE_MICRO_EXPECT_EQ(1, input->dims->data[0]);
TF_LITE_MICRO_EXPECT_EQ(kNumRows, input->dims->data[1]);
TF_LITE_MICRO_EXPECT_EQ(kNumCols, input->dims->data[2]);
TF_LITE_MICRO_EXPECT_EQ(kNumChannels, input->dims->data[3]);
TF_LITE_MICRO_EXPECT_EQ(kTfLiteInt8, input->type);

TF_LITE_MICRO_EXPECT_EQ is basically an assert macro that checks whether its two parameters are equal. We know that we are expecting an input tensor of shape [1, 32, 32, 1], which can be checked through the dims->size and dims->data members. Rows, columns and channels are defined in the model_settings.h file. We can also see that the input type has to be int8, as we set it during our conversion to the tflite format. We can do the same checks on our output tensor.

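As a sketch, the same checks for the output tensor could look like this, assuming the [1, 3] float output reported by the test run above (type 1 in the log corresponds to kTfLiteFloat32):

// Make sure the output has the shape and type we expect.
TfLiteTensor* output = interpreter.output(0);
TF_LITE_MICRO_EXPECT_NE(nullptr, output);
TF_LITE_MICRO_EXPECT_EQ(2, output->dims->size);
TF_LITE_MICRO_EXPECT_EQ(1, output->dims->data[0]);
TF_LITE_MICRO_EXPECT_EQ(3, output->dims->data[1]);
TF_LITE_MICRO_EXPECT_EQ(kTfLiteFloat32, output->type);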

After making sure that everything is as it is supposed to be, we can start executing our model.

clock_t start, end;
TfLiteTensor* output;

load_data(picture1, input);
start = clock();
interpreter.Invoke();
end = clock();
output = interpreter.output(0);
print_result("Picture 1", output, end - start);

Above we see that we load our picture into the input tensor, call the Invoke method, and print the result with a helper function (a possible shape of that helper is sketched below). We also measure how long the inference took. We can run this sort of block as many times as we want with different inputs. We then finish the program with the TF_LITE_MICRO_TESTS_END macro.

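The print_result helper itself is not shown in this walk-through; a possible shape for it, as a sketch (MicroML's actual implementation may differ), is:

void print_result(const char * title, TfLiteTensor * output, clock_t diff)
{
  printf("%s\n[[", title);
  // Walk the [1, N] output tensor and print each class score.
  for (int i = 0; i < output->dims->data[1]; ++i)
  {
    printf("%f ", output->data.f[i]);
  }
  printf("]]\nInference time: %f ms\n", 1000.0 * diff / CLOCKS_PER_SEC);
}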

5. Porting the cifar project to your microcontroller

MicroML was designed with the idea that changing platforms should not be difficult. As long as you plan to use a microcontroller supported by libopencm3, porting should not be hard. The first thing that has to be changed is the project.mk file. At the start of the file there is a DEVICE variable, which in our example is set to stm32f767zi, the microcontroller used on the Nucleo-F767ZI board. This variable can easily be changed to something else (e.g. stm32f405vg, see the sketch below). Changing the DEVICE also requires adjusting all the hardware-specific functions that deal with the clock, systick, uart and gpio setup. Datasheets and other resources available online are your best friends in this process. We then need to rebuild the microlite.a and testlite.a files, as mentioned earlier.

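For instance, the relevant line in project.mk could look like this (the variable name comes from the article; the value is just the example device mentioned above):

DEVICE = stm32f405vg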

To get a better sense of how to use MicroML for your specific needs, check the Building your projects section of MicroML’s README.

Final thoughts

This has been quite an involved process from start to finish. We first created and trained our neural network model and converted it into a microcontroller-friendly format. Then we prepared a development environment and tested our model, first on a computer and then on a microcontroller.

The field of machine learning on embedded systems is very new and rapidly changing. What is true today might be obsolete next month. What makes it even more challenging is that it combines two disciplines that used to be separate from each other.

Many machine learning engineers have no experience with constrained embedded environments, where speed and size matter most. Most embedded programmers, likewise, have never needed tools like Colaboratory or frameworks like TensorFlow for their projects.

With demand for low-power and low-bandwidth applications (e.g. IoT) rising fast, machine learning applications on embedded systems will have to keep pace with that development.

Rapidly changing fields such as this one always require engineers to dig deeper and figure out the intricacies of technology they never had to deal with before, in order to make things simpler and more efficient at a later stage. But we are not there yet. Whichever way we look at it, in both the short and the long term, the people who manage to overcome the struggle of stepping out of their comfort zone during development will benefit the most.

Translated from: https://medium.com/@institute_irnas/part-2-creating-a-simple-keras-model-for-inference-on-microcontrollers-b450102c3ea9
