EEoI for Efficient ML with Edge Computing

Edge Computing

A recent trend in computer networking is the use of edge devices: intermediate computing devices that bridge the gap between end devices and the cloud. End devices (e.g., phones, tablets, smart speakers) are restricted by their lack of computational resources, limiting the use of resource-intensive functions such as machine learning. Cloud computing has provided an alternative way of executing tasks that would otherwise be infeasible on an end device. Devices can offload processes to remote servers, which are capable of faster computation, and receive the results back. However, not all computing tasks benefit from the cloud. Sending data to the cloud can introduce latency caused by network unreliability. If possible, it is more efficient to perform computations locally.

An alternative to the cloud, edge devices are locally positioned servers that can communicate with end devices at lower latency. However, edge devices offer fewer available resources than the cloud. One basic strategy for edge computing is to perform tasks as close to the place of request as possible. Vertical collaboration does this by restricting communication so that only adjacent devices can offload tasks. For example, if a calculation is requested on a mobile device, the process will first be attempted on the device itself. If the device's resources are insufficient, the task will be offloaded to an edge node. If the edge node is unable to complete the job, the process will be moved to the cloud, where it can be executed and the result returned to the end device. More complex computing architectures can split a process into multiple sub-processes that are executed across each layer of the network hierarchy. Networks may also incorporate multiple edge nodes for parallel computation.
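
The escalation logic of vertical collaboration can be sketched in a few lines of Python. The tier interface below (`can_run`/`run` callables and a task "cost") is entirely hypothetical, just to make the control flow concrete:

```python
def vertical_offload(task, tiers):
    """Try each tier in order, from end device up to the cloud, and
    escalate only when the current tier cannot handle the task.
    `tiers` is an ordered list of (name, can_run, run) tuples."""
    for name, can_run, run in tiers:
        if can_run(task):
            return name, run(task)
    raise RuntimeError("no tier could run the task")

# Toy tiers: each tier handles tasks up to a certain resource cost.
tiers = [
    ("end",   lambda t: t["cost"] <= 1,  lambda t: "ran on end"),
    ("edge",  lambda t: t["cost"] <= 10, lambda t: "ran on edge"),
    ("cloud", lambda t: True,            lambda t: "ran on cloud"),
]
print(vertical_offload({"cost": 5}, tiers))  # → ('edge', 'ran on edge')
```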

Early Exit of Inference (EEoI)

EEoI is an application of vertical collaboration within an edge network that provides an efficient method for running neural network inference on end devices with scarce resources. A typical neural network uses a series of computational layers to sequentially transform input data. The output of one layer feeds into the next, with the output of the last layer producing the final model output. The number of layers in a model is determined before the model is trained and cannot be adjusted without training an entirely new model. Having more layers in a neural network allows more complex distributions to be modeled, so there need to be enough layers to accommodate the most complex portion of the training data set. However, there is usually a trade-off between model accuracy and efficiency. Models with more layers have the capacity to achieve higher inference accuracy on more complex data, but they also have a larger memory footprint and slower inference speed.

A drawback of the typical feed-forward network architecture is that the model size is fixed. Every inference call uses the entire stack of layers to produce an output. As a result, the complexity requirements of many neural network models make them incapable of running directly on end devices. Larger models are relegated to edge or cloud devices, increasing inference latency. However, most data sources contain samples of varying levels of complexity. Within a data set, some samples are well behaved, strictly adhering to a primary distribution, while others contain more unique elements or noise. We can use this fact to design a neural network architecture that only uses as many layers as it needs for inferring any given sample. To do this, we will implement Early Exit of Inference.

EEoI incorporates multiple output layers at various depths within a neural network. Each output layer learns to produce accurate classification predictions based only upon the information processed by the layers that come before it. Output layers deeper in the network have the opportunity to learn more complex distributions, leading to better inference accuracy than the preceding output layers. During inference, each successive output layer is evaluated in turn. Inference continues through the subsequent layers until an output layer makes a prediction with a sufficiently high probability, or until the final output layer is reached. The prediction probability threshold for EEoI is a pre-set hyper-parameter that trades off overall inference accuracy against inference speed: a higher threshold produces more accurate, but slower, inference on average.
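
The exit rule itself is simple to express. As a sketch (the softmax outputs below are made up for illustration), inference stops at the first output layer whose top class probability clears the threshold, falling back to the final exit otherwise:

```python
import numpy as np

def choose_exit(exit_probs, threshold=0.9):
    """Return (exit_index, predicted_class) for the first exit whose top
    softmax probability clears the threshold; the final exit always answers."""
    for i, probs in enumerate(exit_probs):
        if probs.max() >= threshold or i == len(exit_probs) - 1:
            return i, int(probs.argmax())

# A confident early exit stops inference at the first output layer:
exit_probs = [np.array([0.02, 0.95, 0.03]), np.array([0.10, 0.80, 0.10])]
print(choose_exit(exit_probs, threshold=0.9))  # → (0, 1)
```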

Applying EEoI to Edge Computing

EEoI can be applied to edge computing, allowing a model to be split across multiple devices. The initial layers can be placed on an end device, giving the device the chance to perform inference on well-behaved samples. If the desired output confidence is not achieved, the end device can offload the output of the last layer of its portion of the model to an edge device. That output is effectively an encoded version of the original input sample, which provides the added benefit of a smaller data size for more efficient offloading. The edge device then continues the inference process, offloading to the cloud if need be.

Let’s take a look at an example of EEoI on the MNIST digits data set in Keras (TF 2.1). First, we will load the data set.
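
A minimal version of the loading step might look like this (flattening and rescaling the images are preprocessing assumptions, not prescribed by the original post):

```python
import tensorflow as tf

# Load the MNIST digits data set (28x28 grayscale images, 10 classes).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel values to [0, 1] and flatten each image into a 784-vector.
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
```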

Next, we create a classification model in 3 portions, one for each device. Each device has a classification prediction output layer. The end and edge device models also need offloading output layers that supply the encoded data representations to be offloaded to the next device if needed.
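
A sketch of the three sub-models, using the Keras functional API. The layer widths (128, 64, 32) are illustrative assumptions, not values from the original post:

```python
import numpy as np
from tensorflow.keras import layers, Model

# End-device model: early layers, an early-exit classifier, and an
# offload output carrying the 64-dim encoded representation.
end_in = layers.Input(shape=(784,), name="end_input")
end_hidden = layers.Dense(128, activation="relu")(end_in)
end_enc = layers.Dense(64, activation="relu", name="end_offload")(end_hidden)
end_exit = layers.Dense(10, activation="softmax", name="end_exit")(end_enc)
end_model = Model(end_in, [end_exit, end_enc], name="end_device")

# Edge-device model: consumes the end device's encoding.
edge_in = layers.Input(shape=(64,), name="edge_input")
edge_enc = layers.Dense(32, activation="relu", name="edge_offload")(edge_in)
edge_exit = layers.Dense(10, activation="softmax", name="edge_exit")(edge_enc)
edge_model = Model(edge_in, [edge_exit, edge_enc], name="edge_device")

# Cloud model: consumes the edge encoding and makes the final prediction.
cloud_in = layers.Input(shape=(32,), name="cloud_input")
cloud_hidden = layers.Dense(32, activation="relu")(cloud_in)
cloud_exit = layers.Dense(10, activation="softmax", name="cloud_exit")(cloud_hidden)
cloud_model = Model(cloud_in, cloud_exit, name="cloud")
```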

Putting all 3 models together, we have a complete model that we can train. The target labels need to be supplied to all 3 output layers.
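
Chaining the sub-models might look like the following self-contained sketch, which rebuilds the three portions compactly (with assumed layer sizes) and compiles the combined model with one loss per exit, so the same labels supervise all three outputs:

```python
import numpy as np
from tensorflow.keras import layers, Model

def device_block(in_dim, hidden, enc_dim, name):
    """A device portion with an early-exit classifier and an offload encoding."""
    x = layers.Input(shape=(in_dim,))
    enc = layers.Dense(enc_dim, activation="relu")(
        layers.Dense(hidden, activation="relu")(x))
    exit_ = layers.Dense(10, activation="softmax", name=f"{name}_exit")(enc)
    return Model(x, [exit_, enc], name=name)

end_model = device_block(784, 128, 64, "end")
edge_model = device_block(64, 48, 32, "edge")

cloud_in = layers.Input(shape=(32,))
cloud_exit = layers.Dense(10, activation="softmax", name="cloud_exit")(
    layers.Dense(32, activation="relu")(cloud_in))
cloud_model = Model(cloud_in, cloud_exit, name="cloud")

# Chain them: each device's encoding feeds the next device's model.
full_in = layers.Input(shape=(784,))
end_pred, end_enc = end_model(full_in)
edge_pred, edge_enc = edge_model(end_enc)
cloud_pred = cloud_model(edge_enc)
full_model = Model(full_in, [end_pred, edge_pred, cloud_pred])

# One loss per exit; the same sparse labels are supplied to all 3 outputs.
full_model.compile(optimizer="adam",
                   loss=["sparse_categorical_crossentropy"] * 3,
                   metrics=["accuracy"])
```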

Now we are ready to train the full model. The EarlyStopping callback is used to terminate training before the model begins to overfit, as measured on the held-out test data.
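
A runnable sketch of the training step follows. The three-exit model is rebuilt inline so the snippet stands alone; layer sizes, epoch count, and batch size are all assumptions. Mirroring the post, the test set serves as the held-out data that EarlyStopping monitors:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model, callbacks

# Data, flattened and scaled to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Full model with three exits at increasing depth.
inp = layers.Input(shape=(784,))
end_enc = layers.Dense(64, activation="relu")(
    layers.Dense(128, activation="relu")(inp))
end_exit = layers.Dense(10, activation="softmax", name="end_exit")(end_enc)
edge_enc = layers.Dense(32, activation="relu")(end_enc)
edge_exit = layers.Dense(10, activation="softmax", name="edge_exit")(edge_enc)
cloud_exit = layers.Dense(10, activation="softmax", name="cloud_exit")(
    layers.Dense(32, activation="relu")(edge_enc))
model = Model(inp, [end_exit, edge_exit, cloud_exit])
model.compile(optimizer="adam",
              loss=["sparse_categorical_crossentropy"] * 3,
              metrics=["accuracy"])

# Halt training when the held-out loss stops improving.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=2,
                                     restore_best_weights=True)
history = model.fit(x_train, [y_train] * 3,
                    validation_data=(x_test, [y_test] * 3),
                    epochs=5, batch_size=256,
                    callbacks=[early_stop], verbose=2)
```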

With our trained model, we can test the performance of EEoI by running inference over the test data set with each device model. Using different confidence thresholds, we can observe the overall accuracy and how often each device is used for inference.
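
The evaluation loop might look like this sketch. For brevity the model here is untrained; in practice you would reuse the weights of the trained full model, so the printed accuracies are only meaningful with those weights loaded:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Rebuild the three-exit model (untrained, so the sketch runs standalone).
inp = layers.Input(shape=(784,))
end_enc = layers.Dense(64, activation="relu")(
    layers.Dense(128, activation="relu")(inp))
end_exit = layers.Dense(10, activation="softmax", name="end_exit")(end_enc)
edge_enc = layers.Dense(32, activation="relu")(end_enc)
edge_exit = layers.Dense(10, activation="softmax", name="edge_exit")(edge_enc)
cloud_exit = layers.Dense(10, activation="softmax", name="cloud_exit")(
    layers.Dense(32, activation="relu")(edge_enc))
model = Model(inp, [end_exit, edge_exit, cloud_exit])

def eeoi_predict(x, threshold):
    """Route each sample to the shallowest exit whose top softmax
    probability clears `threshold`; the cloud exit always answers."""
    end_p, edge_p, cloud_p = model.predict(x, verbose=0)
    preds = np.empty(len(x), dtype=int)
    device = np.empty(len(x), dtype=object)
    for i in range(len(x)):
        for name, p in (("end", end_p[i]), ("edge", edge_p[i]),
                        ("cloud", cloud_p[i])):
            if name == "cloud" or p.max() >= threshold:
                preds[i], device[i] = int(p.argmax()), name
                break
    return preds, device

(_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

for t in (0.5, 0.9, 0.99):
    preds, device = eeoi_predict(x_test[:1000], t)
    acc = (preds == y_test[:1000]).mean()
    usage = {d: float((device == d).mean()) for d in ("end", "edge", "cloud")}
    print(f"threshold={t}: accuracy={acc:.3f}, usage={usage}")
```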

Results

Just as we had hoped, increasing the confidence threshold improves our overall accuracy in exchange for increased utilization of edge and cloud devices.

Now just for fun, let’s take a look at some of the compressed samples that are offloaded to the edge and cloud devices.

As a sample is offloaded from the end device to the edge, and from the edge to the cloud, it is compressed and encoded.

Pixel representations of a single sample, offloaded from end device to edge and cloud devices.

Conclusion

EEoI provides an elegant method for efficiently determining when to offload neural network inference to edge and cloud devices. EEoI is one of many innovative techniques that can help us to better utilize the edge for machine learning.

Translated from: https://medium.com/mindboard/eeoi-for-efficient-ml-with-edge-computing-9e597175f080
