[Computer Science] [2017.09] Efficient Methods and Hardware for Deep Learning

This is the doctoral dissertation of Song Han at Stanford University, 125 pages in total.

The future will be populated with intelligent devices that require inexpensive, low-power hardware platforms. Deep neural networks have evolved to be the state-of-the-art technique for machine learning tasks. However, these algorithms are computationally intensive, which makes it difficult to deploy them on embedded devices with limited hardware resources and a tight power budget. Since Moore’s law and technology scaling are slowing down, hardware advances alone will not address this issue.

To solve this problem, we focus on efficient algorithms and domain-specific architectures designed specifically for those algorithms. By optimizing across the full stack, from application down to hardware, we improve the efficiency of deep learning through smaller model size, higher prediction accuracy, faster prediction speed, and lower power consumption. Our approach starts by changing the algorithm: “Deep Compression” significantly reduces the number of parameters and the computation requirements of deep learning models through pruning, trained quantization, and variable-length coding, shrinking model size by 18× to 49× without hurting prediction accuracy. We also found that pruning and the sparsity constraint apply not only to model compression but also to regularization, and we propose dense-sparse-dense (DSD) training, which improves prediction accuracy for a wide range of deep learning models. To implement Deep Compression efficiently in hardware, we developed EIE, the “Efficient Inference Engine”, a domain-specific hardware accelerator that performs inference directly on the compressed model, significantly saving memory bandwidth. By exploiting the compressed model and handling its irregular computation pattern efficiently, EIE improves speed by 13× and energy efficiency by 3,400× over a GPU.
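
For concreteness, here is a minimal NumPy sketch of the first two stages of the Deep Compression pipeline: magnitude pruning and trained quantization via a shared-weight codebook (1-D k-means). The function names, the default sparsity and cluster counts, and the omission of the retraining loops are illustrative assumptions, not the thesis implementation; the variable-length coding stage is covered in the next sketch.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights.

    Deep Compression prunes low-magnitude connections and then
    retrains the survivors; the retraining loop is omitted here.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def kmeans_quantize(weights, n_clusters=16, n_iter=20):
    """Share weights with 1-D k-means: each surviving weight is replaced
    by its cluster centroid, so only log2(n_clusters) bits per weight
    plus a tiny codebook need to be stored."""
    nonzero = weights[weights != 0]
    # Linear initialization: spread centroids evenly over the weight range.
    centroids = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every weight.
        codes = np.abs(nonzero[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            if np.any(codes == k):
                centroids[k] = nonzero[codes == k].mean()
    codes = np.abs(nonzero[:, None] - centroids[None, :]).argmin(axis=1)
    quantized = weights.copy()
    quantized[weights != 0] = centroids[codes]
    return quantized, codes, centroids
```

Running the two stages back to back, e.g. `kmeans_quantize(magnitude_prune(np.random.randn(64, 64))[0])`, leaves at most 16 distinct nonzero values in the layer, so each surviving weight can be stored as a 4-bit codebook index.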
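
The third stage, variable-length coding, stores frequent quantization codes with fewer bits. The abstract does not pin down the exact coder, so treat the following Huffman construction over the code stream (using Python's `heapq`) as one plausible instance that computes the per-symbol bit lengths:

```python
import heapq
from collections import Counter

def huffman_code_lengths(stream):
    """Return {symbol: bit length} for an optimal prefix code.

    Frequent quantization codes get short bit strings; this is the
    source of the final compression stage's extra savings.
    """
    heap = [(freq, i, {sym: 0}) for i, (sym, freq)
            in enumerate(Counter(stream).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol inside them.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

# e.g. huffman_code_lengths([0, 0, 0, 1, 1, 2]) -> {0: 1, 1: 2, 2: 2}
```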
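
The DSD schedule is likewise short to state. The sketch below assumes a user-supplied `grad_fn` that returns the loss gradient at the current weights; the 50% sparsity, the step counts, and the lowered learning rate in the final dense phase are placeholder values, not the settings used in the thesis.

```python
import numpy as np

def train(weights, mask, steps, grad_fn, lr=0.01):
    """Plain SGD; positions where mask == 0 are frozen at zero."""
    for _ in range(steps):
        weights = (weights - lr * grad_fn(weights)) * mask
    return weights

def dsd(weights, grad_fn, sparsity=0.5, steps=1000):
    """Dense -> Sparse -> Dense training.

    Prune after an ordinary dense phase, retrain under the sparsity
    constraint (which acts as a strong regularizer), then restore the
    pruned connections at zero and retrain densely with a smaller step.
    """
    dense_mask = np.ones_like(weights)
    # Dense phase: ordinary training.
    weights = train(weights, dense_mask, steps, grad_fn)
    # Sparse phase: drop the smallest-magnitude half and retrain.
    threshold = np.quantile(np.abs(weights), sparsity)
    sparse_mask = (np.abs(weights) > threshold).astype(weights.dtype)
    weights = train(weights * sparse_mask, sparse_mask, steps, grad_fn)
    # Re-dense phase: lift the constraint; pruned weights restart at 0.
    return train(weights, dense_mask, steps, grad_fn, lr=0.001)
```

With, say, `grad_fn = lambda w: 2 * (w - target)` the loop drives `w` toward a fixed `target`, which is enough to see the masking mechanics; in practice `grad_fn` would backpropagate a task loss.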
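
Finally, the computation EIE accelerates is sparse matrix-vector multiplication over the compressed layer. The sketch below fixes the arithmetic under assumed names: a CSC-style layout (`indptr`, `row_idx`), weight codes indexing a shared `codebook` (4-bit in EIE), and skipping of zero input activations. The real accelerator distributes rows across parallel processing elements; that hardware parallelism is not modeled here.

```python
import numpy as np

def eie_spmv(n_rows, indptr, row_idx, codes, codebook, x):
    """y = W @ x with W stored column-wise as codebook indices.

    Columns whose input activation is zero are skipped entirely,
    exploiting the dynamic sparsity of ReLU activations on top of
    the static sparsity of the pruned weights.
    """
    y = np.zeros(n_rows)
    for j, a in enumerate(x):
        if a == 0.0:                      # skip zero activations
            continue
        for p in range(indptr[j], indptr[j + 1]):
            # Decode the shared weight and accumulate into its row.
            y[row_idx[p]] += codebook[codes[p]] * a
    return y

# Toy 3x4 layer with four nonzeros and a two-entry codebook.
indptr   = np.array([0, 1, 1, 3, 4])    # column pointers
row_idx  = np.array([0, 2, 1, 2])       # row of each nonzero
codes    = np.array([0, 1, 0, 1])       # index into the codebook
codebook = np.array([0.5, -1.0])        # shared weight values
x        = np.array([2.0, 3.0, 0.0, 1.0])
print(eie_spmv(3, indptr, row_idx, codes, codebook, x))  # [ 1.  0. -1.]
```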


1 Introduction
2 Background
3 Pruning Deep Neural Networks
4 Trained Quantization and Deep Compression
5 DSD: Dense-Sparse-Dense Training
6 EIE: Efficient Inference Engine for Sparse Neural Networks
7 Conclusion
