Notes: TinyML for Ubiquitous Edge AI

Abstract

TinyML (tiny machine learning) is a fast-growing multidisciplinary field at the intersection of machine learning, hardware, and software that enables deep learning algorithms on embedded (microcontroller-powered) devices operating in an extremely low power range (mW and below).

Challenges

1. Designing power-efficient deep neural network models
2. Designing compact deep neural network models
3. Enabling inference applications on battery-operated, resource-constrained devices

Executive Summary

Goal

1. To understand the technology behind TinyML
2. To discuss the trends and challenges that TinyML technology currently faces

TinyML framework

Trends

1. Chip design:
a. more power-efficient chips
b. faster computations
c. novel chip architectures optimized to support specific neural network algorithms (application-specific chip design)
2. Software libraries: machine learning algorithms optimized for faster execution
3. Model compression: producing lightweight and reliable machine learning models
4. Cross-domain:
a. fitting neural network models to resource-constrained hardware (considering the diversity across algorithms and hardware platforms)
b. the power consumption vs. processing speed trade-off and its effect on the algorithm's accuracy

Challenges

1. Model training:
a. resource-constrained embedded devices are not capable of performing model training
b. model training is done using a more powerful computing device
c. the pre-trained model is then deployed to the embedded devices
2. Continuous model updates: hindered by the absence of a reliable connection to the device
3. Remote deployment of these models to embedded devices

TinyML framework

Frameworks for deploying neural network models to microcontrollers:
1. TensorFlow Lite
2. ELL
3. ARM-NN
How: the model is first converted into a common format type, then optimized internally for deployment on the particular hardware platform.
Applications: those involving computer vision, audio processing, or NLP algorithms; in healthcare, autonomous systems, and surveillance.
Finally, many companies are already working on efficient, pre-optimized solutions based on TinyML software and off-the-shelf hardware, with the goal of shortening the final production cycle significantly. They all work together toward developing standardized machine learning models, practices, and benchmarking tools that will enable systematic development and broad adoption of TinyML technology.

1.Introduction

Evolution of computing technologies

Figure 1 Evolution of computing technologies

Evolution

1. Wireless communications
2. Mobile computing platforms: laptops, smartphones, and tablets
3. Use in many fields, such as robotics, self-driving cars, or augmented reality
4. The expansion of the machine learning, AI, and IoT fields; today, the number of microcontroller (MCU)-powered devices is growing exponentially

2.Machine Learning on MCUs

Advantages: embedded devices process collected data locally, avoiding undesired data latency, data loss, and data privacy issues.
Disadvantages: fundamental challenges in deploying machine learning algorithms to embedded devices with limited resources (in terms of processing speed, memory, and energy).

2.1 Challenges of Machine Learning on Embedded Devices

Small memory size and short battery life:
Table 1 Representative devices supported by TensorFlow Lite for Microcontrollers.
1. Peak memory usage: the model must be analyzed in terms of multiply-add operations per inference
2. Limited battery life: in general, energy efficiency depends on the machine learning algorithm's computational cost and the processing duty cycle
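The analysis in point 1 can be made concrete by counting multiply-add operations (MACs) layer by layer. The model below is hypothetical, chosen only to illustrate the standard MAC formulas for convolutional and dense layers:

```python
# Estimate multiply-add operations (MACs) per inference for a tiny,
# hypothetical CNN. MACs for a conv layer:
#   out_h * out_w * out_ch * (kernel_h * kernel_w * in_ch)
# MACs for a dense layer: in_features * out_features.

def conv_macs(out_h, out_w, out_ch, k_h, k_w, in_ch):
    return out_h * out_w * out_ch * (k_h * k_w * in_ch)

def dense_macs(in_features, out_features):
    return in_features * out_features

# Hypothetical model: 32x32 grayscale input, one conv layer, one dense layer.
macs = 0
macs += conv_macs(30, 30, 8, 3, 3, 1)   # 3x3 conv, 1 -> 8 channels, 'valid' padding
macs += dense_macs(30 * 30 * 8, 10)     # flatten, then classify into 10 classes

print(f"Total MACs per inference: {macs:,}")
```

Multiplying the MAC count by the energy cost per operation of a given MCU gives a first-order estimate of energy per inference.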

3.Overcoming Challenges of Machine Learning on the Edge

1. Model reduction
To fit the neural model to the microcontroller, several methods are used to reduce its size, such as model shrinking (reducing the number of network layers), model pruning (setting low-value weights to zero), or parameter quantization.
2. Lightweight frameworks
Open-source tools; these frameworks include highly efficient inference libraries and workflow processes.

3.1 Model Compression Algorithms

Figure 3 The pipeline processing architecture introduced in [11] for model reduction. Each block introduces a significant reduction in model size with no effect on model accuracy.

3.1.1 Model Pruning

Aims to reduce the number of connections in the neural network model by pruning non-informative weights based on some loss function:
1. weight pruning based on the Hessian of the loss function [13]
2. using the percentage of zero outputs to prune unimportant connections [14]
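A simpler variant than the Hessian- and activation-based criteria above is magnitude-based pruning, which conveys the same idea: remove the least informative connections. A minimal sketch (the weights are made up for illustration):

```python
# Magnitude-based weight pruning: zero out the weights whose absolute
# value falls below a percentile threshold. (Simpler than the criteria
# in [13] and [14], but the same goal: drop non-informative connections.)

def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with (at least) the smallest-magnitude
    fraction `sparsity` of entries set to zero. Ties at the threshold
    may prune slightly more entries."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, -0.9, 0.1]
pruned = prune_by_magnitude(w, sparsity=0.5)
print(pruned)  # half of the entries become exactly 0.0
```

In practice, pruning is typically followed by fine-tuning so the remaining weights compensate for the removed connections.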

3.1.2 Parameter Quantization

1. Reducing the number of bits needed to represent each weight; the number of bits is decided based on the observed [min, max] range of the weights.
2. Variants include linear quantization, non-linear (log-based) quantization, k-means clustering, and even single-bit quantization for binary-weight neural networks.
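Linear quantization based on the observed [min, max] range can be sketched in a few lines (illustrative weight values, 8-bit by default):

```python
# Linear (affine) quantization of float weights to 8-bit integers using
# the observed [min, max] range, plus the matching dequantization step.
# The bit width and the observed range together determine the
# representable values and the reconstruction error.

def quantize_linear(weights, num_bits=8):
    lo, hi = min(weights), max(weights)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize_linear(q, scale, lo):
    return [qi * scale + lo for qi in q]

w = [-1.0, -0.2, 0.0, 0.5, 1.0]
q, scale, offset = quantize_linear(w)
restored = dequantize_linear(q, scale, offset)
# Reconstruction error is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 for a, b in zip(w, restored))
```

With 8 bits each weight shrinks from 32 to 8 bits (4x), at the cost of a bounded rounding error per weight.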

3.1.3 Network Compression

Huffman coding: lossless entropy coding of the quantized weights further reduces the stored model size.
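Huffman coding assigns short bit strings to frequent values and long ones to rare values, so a quantized (and especially a pruned, zero-dominated) weight tensor serializes to fewer bits. A minimal sketch:

```python
# Huffman coding of quantized weight values: frequent values get short
# codes, rare values get long codes, so the serialized model shrinks.
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Quantized weights dominated by zeros (typical after pruning).
weights = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]
code = huffman_code(weights)
bits = sum(len(code[w]) for w in weights)
print(f"Huffman bits: {bits} vs fixed 2-bit code: {len(weights) * 2}")
```

Because the coding is lossless, decompression recovers the quantized weights exactly; only storage, not accuracy, is affected.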

3.1.4 Knowledge Distillation

1. The set of methods used to transfer knowledge from a "teacher" model to a much smaller "student" model.
2. It is based on the fact that a smaller model does not have sufficient capacity to learn all the interdependencies of a large dataset on its own, so this knowledge can be transferred from the larger model.
3. Useful when the application involves small datasets or requires a significant efficiency improvement.
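One common form of the transfer in point 1 trains the student to match the teacher's temperature-softened output distribution in addition to the hard labels. A sketch with made-up logits (no real models involved; temperature and mixing weight are illustrative choices):

```python
# Knowledge distillation loss in one step: blend the usual hard-label
# cross-entropy with a cross-entropy against the teacher's softened
# output distribution.
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

teacher_logits = [6.0, 2.0, 1.0]   # hypothetical teacher outputs
student_logits = [4.0, 2.5, 1.5]   # hypothetical student outputs
hard_label = [1.0, 0.0, 0.0]
T, alpha = 4.0, 0.5                # temperature and loss mixing weight

soft_loss = cross_entropy(softmax(teacher_logits, T), softmax(student_logits, T))
hard_loss = cross_entropy(hard_label, softmax(student_logits))
loss = alpha * hard_loss + (1 - alpha) * soft_loss
print(f"distillation loss: {loss:.4f}")
```

The high temperature flattens the teacher's distribution, exposing the relative probabilities of the wrong classes ("dark knowledge") that the hard labels alone do not carry.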

3.2 Model Selection: Complexity-Accuracy Trade-off

Understanding the differences and trade-offs between different neural network architectures is critical for the model selection process.
Convolutional neural networks:
1. generally easier to optimize for good performance, and successful at low-level feature extraction
2. require a large number of operations to converge to a good result
3. use fixed-size input data blocks, which is not the best option for processing data with temporal dependencies, such as audio

Recurrent neural networks:
1. perform well on tasks that involve temporal sequences (e.g., audio tasks)
2. contain a large number of parameters, making them more challenging to deploy on highly constrained embedded devices
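To make the parameter-count trade-off concrete, the standard formulas give a back-of-the-envelope comparison between a small convolutional layer and an LSTM layer of comparable width (layer sizes are hypothetical):

```python
# Back-of-the-envelope parameter counts: a small conv layer vs. an LSTM
# layer of comparable width. Convolutions share weights across positions;
# an LSTM carries four gates' worth of input and recurrent weights.

def conv_params(in_ch, out_ch, k_h, k_w):
    return out_ch * (k_h * k_w * in_ch) + out_ch  # weights + biases

def lstm_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights, and a bias.
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

conv = conv_params(in_ch=16, out_ch=32, k_h=3, k_w=3)
lstm = lstm_params(input_size=16, hidden_size=32)
print(f"3x3 conv 16->32 channels: {conv:,} params")
print(f"LSTM 16->32 units:        {lstm:,} params")
```

Even at this small scale the recurrent layer carries noticeably more parameters, which is why RNNs are harder to fit into a few hundred kB of flash.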

4.TinyML Frameworks

1. MCU-powered devices cannot currently support model training; the model is first trained in the cloud or on a more powerful device, and then deployed to the embedded device.
2. A good overview of challenges and directions toward making on-device training possible is provided in [34].
3. There are three ways to deploy a model to an embedded system: coding by hand, code generation, and ML interpreters [18].

4.1 TensorFlow Lite

1. Consists of two main tools, the Converter and the Interpreter.
Converter: transforms the TensorFlow code into a particular format, reduces the model's size, and optimizes the code for minimal accuracy loss.
Interpreter: a library that executes the code on the embedded device.
2. TensorFlow Lite for Microcontrollers:
a. supports 32-bit microcontrollers with just a few kB of memory
b. successfully ported to many processors: Arm Cortex-M series, ESP32
c. available as an Arduino library; can also be ported to other C++11 projects as an open-source library
d. the limitations of TensorFlow Lite for Microcontrollers are given in [19]
e. the steps needed to build a "wake words" application:
Figure 4 An example of a wake-words application developed using the TensorFlow Micro [19] tool.
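The Converter step can be sketched in a few lines. This is a hedged sketch of the deployment workflow, not a complete recipe: "saved_model_dir" is a hypothetical path to a model already trained on a more powerful machine, and running it requires the tensorflow package.

```python
# Convert a trained TensorFlow SavedModel into a TensorFlow Lite flat
# buffer, with default optimizations (e.g. post-training quantization).
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The flat buffer is then embedded in the firmware, typically as a C
# array (e.g. via `xxd -i model.tflite`), and executed on the device
# by the Interpreter.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```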

4.2 Embedded Learning Library (ELL)

Developed by: Microsoft.
Platforms: ARM Cortex-A and Cortex-M devices, such as Arduino, Raspberry Pi, and micro:bit [20].
Usage: develop machine learning models.
Input formats:
Open Neural Network Exchange (ONNX) [30],
TensorFlow [31],
Microsoft Cognitive Toolkit (CNTK) [32].
Output: an .ell file

4.3 ARM-NN

Developed by: Arm.
Core: the CMSIS-NN [33] software library.
Platform: Cortex-M.
Usage: open-source Linux software for machine learning inference on embedded devices.
Special: model parameters are quantized to 8-bit or 16-bit integers.

5.TinyML Applications

5.1 Healthcare

5.2 Surveillance

5.3 Embedded Security

5.4 Industrial Monitoring

5.5 Autonomous Systems

5.6 Augmented/Virtual Reality

5.7 Smart Spaces

6.TinyML Challenges

Hardware and software heterogeneity
Lack of benchmarking tools
Adaptation and lifelong learning
Lack of appropriate datasets
Lack of widely accepted models
Need for other types of machine learning models
