1. Introduction
This paper was published at ISSCC 2017. Recently there has been increasing interest in deep learning for mobile and IoT devices to enable intelligence at the edge, which makes low power a critical design constraint. The researchers introduce a low-power, programmable deep-learning accelerator (DLA).
2. Innovation points
The top-level diagram of the proposed DLA is shown below.
2.1 Four processing elements (PEs) embedded within the weight-storage memory
The accelerator devotes almost its entire die to on-chip storage, minimizing data-movement overhead. However, I think the cost of so much on-chip memory also needs to be taken into account.
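To see why keeping weights on-chip matters so much, the sketch below compares the energy of fetching weights from off-chip DRAM versus on-chip SRAM. The per-access energies and the access count are hypothetical, chosen only to reflect the commonly cited gap of roughly two orders of magnitude; the paper does not report these specific numbers.

```python
# Hypothetical per-access energies in picojoules (illustrative only; real
# values depend on process node, bank size, and interface).
ENERGY_PJ = {"off_chip_dram": 640.0, "on_chip_sram": 5.0}

def fetch_energy_uj(num_accesses: int, source: str) -> float:
    """Total energy (microjoules) to perform num_accesses weight fetches."""
    return num_accesses * ENERGY_PJ[source] * 1e-6

# Assumed 1M weight fetches for one inference pass (hypothetical workload).
accesses = 1_000_000
dram = fetch_energy_uj(accesses, "off_chip_dram")
sram = fetch_energy_uj(accesses, "on_chip_sram")
print(f"DRAM: {dram:.0f} uJ, SRAM: {sram:.0f} uJ, ratio: {dram / sram:.0f}x")
```

Under these assumed numbers, moving all weight traffic on-chip cuts fetch energy by more than 100x, which is why the design trades die area for storage.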
2.2 Adopt a non-uniform memory hierarchy
The non-uniform memory hierarchy trades off small, low-power memory banks for frequently accessed data against larger, high-density, higher-power banks for the large volume of infrequently accessed data.
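The benefit of this trade-off depends on how skewed the access pattern is. A minimal sketch, assuming hypothetical per-access energies for the two bank types (not figures from the paper), computes the average access energy when most accesses hit the small banks:

```python
# Hypothetical bank energies in picojoules per access (illustrative only).
SMALL_BANK_PJ = 5.0    # small, low-power bank holding frequently used data
LARGE_BANK_PJ = 25.0   # large, high-density bank holding bulk data
UNIFORM_PJ = 25.0      # baseline: a uniform hierarchy of large banks only

def avg_access_energy_pj(hot_fraction: float) -> float:
    """Average energy per access when hot_fraction of accesses hit the
    small banks and the rest go to the large banks."""
    return hot_fraction * SMALL_BANK_PJ + (1 - hot_fraction) * LARGE_BANK_PJ

# If 90% of accesses are to frequently used data in the small banks:
nonuniform = avg_access_energy_pj(0.9)   # 0.9*5 + 0.1*25 = 7.0 pJ
savings = 1 - nonuniform / UNIFORM_PJ
print(f"{nonuniform:.1f} pJ per access, {savings:.0%} savings vs uniform")
```

With these assumed numbers, a 90/10 access split yields a 72% reduction in average access energy over a uniform hierarchy, illustrating why the designers accept higher energy on the rarely touched large banks.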
3. Summary
The performance of the chip and the die photo are shown below.