Stories of Neural Networks and Computer Architecture 01

A 2.2 GHz SRAM with High Temperature Variation Immunity for Deep Learning Application under 28nm (paper)

1.abstract

(1) The implementation of machine learning algorithms needs intensive memory access, and SRAM is critical for the overall performance. This paper proposes a new design of high speed SRAM for machine learning purposes.

(2) Compared with Samsung HL 152, our design has a smaller size (121 × 43 µm² vs 127 × 44 µm²), about half the number of pins (12 vs 25), and higher speed (2.2 GHz vs 0.8 GHz).
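The comparison above can be turned into ratios with a few lines of arithmetic. This is only a sketch built from the figures quoted in the abstract; the variable names are my own, and the macro area is assumed to be simply width times height.

```python
# Figures quoted in the abstract (dimensions in um, clocks in MHz).
ours_w, ours_h, ours_mhz = 121, 43, 2200
ref_w, ref_h, ref_mhz = 127, 44, 800   # Samsung HL 152

# Assumption: macro area is simply width x height.
ours_area = ours_w * ours_h            # 5203 um^2
ref_area = ref_w * ref_h               # 5588 um^2

area_saving = 1 - ours_area / ref_area  # fraction of area saved
speedup = ours_mhz / ref_mhz            # clock ratio

print(f"area: {ours_area} vs {ref_area} um^2 ({area_saving:.1%} smaller)")
print(f"clock: {speedup:.2f}x faster")
```

So the area gain is modest (about 7%), while the clock ratio is 2.75x; the speed is the main selling point.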

2.introduction

(1) In hardware implementations of machine learning algorithms, data transfer is critical to both performance and power consumption.

(2) As shown in [2,3], conventional SRAM designs can only sustain stable read/write at 0.4~0.8 GHz under normal operating conditions, while the processor clock can run at > 2 GHz. Consequently, a memory access usually takes 3~5 clock cycles to finish, which can be a bottleneck for hardware Deep Learning (DL) performance.
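The 3~5 cycle figure follows from dividing the processor clock by the SRAM clock and rounding up. A minimal sketch, assuming a 2 GHz processor (the notes only say "> 2 GHz") and a hypothetical helper name `cycles_per_access`:

```python
def cycles_per_access(cpu_mhz: int, sram_mhz: int) -> int:
    """Processor cycles spanned by one SRAM access, rounded up to whole cycles."""
    return (cpu_mhz + sram_mhz - 1) // sram_mhz  # integer ceiling division

CPU_MHZ = 2000  # assumed processor clock; the text only says "> 2 GHz"

for sram_mhz in (400, 800, 2200):
    n = cycles_per_access(CPU_MHZ, sram_mhz)
    print(f"SRAM at {sram_mhz} MHz -> {n} cycle(s) per access")
```

With these assumptions a 0.4 GHz SRAM costs 5 cycles, a 0.8 GHz SRAM costs 3, and a 2.2 GHz SRAM fits within a single cycle.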

(3) process, voltage, and temperature (PVT).

(4) The contributions of this paper are as follows:

  1. We designed a high-speed SRAM running at 2.2 GHz. With fast access time, the proposed SRAM can help accelerate hardware in machine learning applications.
  2. We proposed a temperature-variation-immune SRAM design. The proposed work is compatible with the conventional SRAM process without additional mask cost. The temperature-variation-immune SRAM design can benefit machine learning applications, since intensive memory access in machine learning can vary significantly with the temperature of the SRAM.
  3. A smaller bank size is used to gain higher configurability and faster read/write. The machine learning algorithm can turn off some neurons to avoid unnecessary leakage current and reduce power consumption.

3.access in machine learning application

(1) Convolution layer

(2) Fully connected layer

(3) Flow chart of neuron computation in neural network
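The figures are not reproduced here, so as a rough illustration of why neuron computation is memory-intensive, the sketch below computes one neuron as a weighted sum plus bias and counts the operand fetches. The ReLU activation, the function names, and the no-reuse assumption (every input and weight is fetched from memory once per multiply-accumulate) are mine, not taken from the paper.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum plus bias, then ReLU; also counts operand fetches."""
    acc = bias
    reads = 1                      # one fetch for the bias
    for x, w in zip(inputs, weights):
        acc += x * w               # one multiply-accumulate
        reads += 2                 # one input fetch + one weight fetch
    return max(0.0, acc), reads    # ReLU is an assumed activation

def fully_connected_reads(n_inputs, n_neurons):
    """Total fetches for a fully connected layer, assuming no operand reuse."""
    return n_neurons * (2 * n_inputs + 1)

activation, reads = neuron_output([1.0, 2.0], [0.5, 0.25], 0.25)
print(activation, reads)                  # 1.25 5
print(fully_connected_reads(1024, 512))   # 1049088
```

Even a modest 1024-input, 512-neuron layer needs about a million fetches under these assumptions, which is why SRAM access time dominates.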

(4) Conventional design methodology requires every part of the SRAM to pass the worst-case temperature variation condition, which forces a slower SRAM system clock.

(5) To avoid functional failure, the design needs to be pessimistic. To avoid this overly conservative design methodology, real-time temperature monitors need to be implemented for the different SRAM parts, together with the corresponding temperature compensation, so that SRAM performance does not degrade even under significant temperature variation.
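To make the cost of worst-case design concrete, the sketch below picks the fastest clock whose period still covers the slowest access delay across temperature corners. All delay values are hypothetical numbers chosen for illustration, not measurements from the paper, and `safe_clock_ghz` is an illustrative helper name.

```python
def safe_clock_ghz(delays_ps):
    """Fastest clock (GHz) whose period still covers the slowest delay (ps)."""
    return 1000.0 / max(delays_ps)   # 1000 ps period corresponds to 1 GHz

# Hypothetical access delays in ps at cold, nominal, and hot corners.
uncompensated = [420.0, 450.0, 800.0]   # the hot corner dominates
compensated = [440.0, 450.0, 480.0]     # compensation keeps the spread small

print(f"worst-case design: {safe_clock_ghz(uncompensated):.2f} GHz")
print(f"with compensation: {safe_clock_ghz(compensated):.2f} GHz")
```

The point is only qualitative: the clock is set by the slowest corner, so narrowing the delay spread across temperature raises the usable clock.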

4.system architecture

In the proposed SRAM, fast access is achieved by a folded SRAM structure and careful layout optimization. In addition, dual-loop process/temperature compensation is implemented to eliminate the effect of process and temperature on access time. Furthermore, the proposed SRAM uses a one-port design instead of the conventional dual-port design and thereby saves routing resources. With the optimized structure, the proposed fast SRAM can run reliably at 1.8~2.2 GHz.

(1) Folded Structure
(2) Dual-Loop Process/Temperature Compensation

5.conclusion

In this paper, we propose a new high-speed SRAM design for machine learning purposes. With fast access time (cycle time: 650 ps, access time: 350 ps), low sensitivity to temperature variation (less than 10% performance difference between 125_rcw_tt and 0_rcw_tt), and high configurability, the proposed SRAM is a better candidate for hardware machine learning systems than the conventional SRAM. Compared with Samsung HL 152, our design has a smaller size (121 × 43 µm² vs 127 × 44 µm²), about half the number of pins (12 vs 25), and higher speed (2.2 GHz vs 800 MHz).
