A Close Reading of the NeuroSim Manual (Part 2)

Writing everything down in one go gets messy, so I think it is better to cover one chapter per post. This one covers Chip Level Architectures.

In this framework, we consider that the on-chip memory is sufficient to store the synaptic weights of the entire
neural network, thus the only off-chip memory access is to fetch the input data. Fig. 3 shows the modeled
chip hierarchy, where the top level of the chip consists of multiple tiles, a global buffer, accumulation units,
activation units (sigmoid or ReLU), and pooling units. Fig. 3 (b) shows the structure of a tile, which contains
several processing elements (PEs), a tile buffer to load in neural activations, accumulation modules to add up
partial sums from PEs, and an output buffer. Similarly, as Fig. 3 (c) shows, a PE is built up from a group of
synaptic sub-arrays, PE buffers, accumulation modules and an output buffer. Fig. 3 (d) shows an example
of a synaptic sub-array, which is based on the one-transistor-one-resistor (1T1R) architecture for eNVMs. At
the sub-array level, the array architecture is different for SRAM or FeFET (not shown in this figure).

This passage is important.
In this architecture, we assume for now that the on-chip memory is large enough to store the synaptic weights of the neural network.
I am not sure exactly what "synaptic weights" refers to here; from the context it probably means the weight value stored in each cell, so I will go with that for now.
Therefore, the only off-chip memory access needed is to fetch the input data.
[Figure 3: modeled chip hierarchy]
The figure shows the modeled chip hierarchy.
The top level of the chip consists of multiple tiles, a global buffer, accumulation units, activation units (sigmoid or ReLU), and pooling units.
A tile consists of processing elements (PEs), a tile buffer (to load in neural activations), accumulation modules (to add up the partial sums from the PEs), and an output buffer.
A PE is built from a group of synaptic sub-arrays, PE buffers, accumulation modules, and an output buffer.
A synaptic sub-array is based on the 1T1R architecture for eNVMs.
At the sub-array level, the array architecture is different for SRAM and FeFET.
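
To make this containment relationship concrete, here is a minimal C++ sketch of the hierarchy described above. The struct and field names are my own illustration, not NeuroSim's actual classes.

```cpp
#include <vector>

// Illustrative only: these names mirror the hierarchy of Fig. 3,
// they are not NeuroSim's real data structures.
struct SynapticSubArray {            // Fig. 3 (d): e.g. a 1T1R eNVM array
    int rows, cols;                  // size is user-defined (Param.cpp)
};

struct ProcessingElement {           // Fig. 3 (c)
    std::vector<SynapticSubArray> subArrays;
    int peBufferBits;                // PE buffer
    // plus accumulation modules and an output buffer
};

struct Tile {                        // Fig. 3 (b)
    std::vector<ProcessingElement> pes;
    int tileBufferBits;              // loads neural activations
    // plus accumulation modules (sum partial sums from PEs) and an output buffer
};

struct Chip {                        // Fig. 3 (a): top level
    std::vector<Tile> tiles;
    int globalBufferBits;            // the only off-chip traffic is fetching inputs
    // plus accumulation, activation (sigmoid/ReLU) and pooling units
};
```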

1. For the interconnect, a trade-off between power consumption and latency has to be found; the adopted solution is H-tree routing.
[Figure: H-tree interconnect routing]
2. Floorplan of Neural Networks

To map various neural networks onto the defined chip architecture, it is crucial to follow a certain
rule which does not violate the hardware structure (and data flow) while guaranteeing high-enough memory utilization.
"Map" here basically means assigning the weights of each network layer onto the physical tiles/PEs/sub-arrays; the rule must respect the hardware structure (and data flow) while keeping memory utilization high.

We define an algorithm to automatically generate the floorplan based on two kinds of weight-mapping methods; it optimizes the memory utilization and determines the tile size, PE size, and number of tiles needed, based on the user-defined synaptic array size.
In other words: given the user-defined synaptic array size, the algorithm picks the tile size, the PE size, and the number of tiles so that memory utilization is maximized, and it supports two weight-mapping methods.

The floorplan starts from tile sizing, then PE sizing, while the size of the synaptic array is defined by the user in
Param.cpp.
So the flow is: fix the synaptic array size (user input in Param.cpp), then determine the tile size, then the PE size.

With a pre-defined network structure and weight-mapping method, NeuroSim automatically
calculates the weight-matrix size for each layer (especially for convolutional ones, where 3D kernels will be
unrolled into 2D matrices).
The user specifies the network structure and the weight-mapping method, and NeuroSim automatically computes the weight-matrix size of each layer.
For a convolutional layer, every 3D kernel is unrolled into one long column, so the layer's weight matrix has (kernel height × kernel width × input channels) rows and one column per output channel.
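
As a sanity check, the unrolled weight-matrix size of a convolutional layer under the conventional mapping can be computed like this (a sketch with hypothetical function and parameter names, not NeuroSim's code):

```cpp
#include <cstdio>

// Conventional mapping: each 3D kernel (kH x kW x inChannels) becomes one
// long column, and the layer has one such column per output channel.
struct MatrixSize { long rows, cols; };

MatrixSize convWeightMatrix(int kH, int kW, int inChannels, int outChannels) {
    return { static_cast<long>(kH) * kW * inChannels, outChannels };
}

int main() {
    // Example: a 3x3 convolution with 64 input and 128 output channels.
    MatrixSize m = convWeightMatrix(3, 3, 64, 128);
    std::printf("weight matrix: %ld x %ld\n", m.rows, m.cols);  // 576 x 128
    return 0;
}
```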

The tile size is first set to a maximum value that can contain the largest weight matrix among all the layers; then NeuroSim calculates the memory utilization (defined as memory mapped by synaptic weights / total memory storage on chip) and keeps decreasing the tile size until it finds a solution with optimal memory utilization.
The tile size is initially set large enough to hold the largest per-layer weight matrix.
NeuroSim then computes the memory utilization and keeps shrinking the tile size, re-evaluating the utilization at each step, until it finds the tile size with the best utilization.
In short: the larger the tile, the more storage is wasted by small layers, so NeuroSim searches over tile sizes for the one that wastes the least.
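
My reading of this search, as a rough sketch: start from the largest weight matrix, compute the utilization for a candidate tile size, shrink, and keep the best. The utilization formula and the halving step below are my own assumptions; NeuroSim's actual loop may differ.

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

struct Layer { long rows, cols; };   // unrolled weight-matrix size of one layer

// Tiles needed by one layer for a square tile of side `tile`
// (one tile maps at most one layer; one layer may span several tiles).
long tilesForLayer(const Layer& l, long tile) {
    return static_cast<long>(std::ceil(double(l.rows) / tile)) *
           static_cast<long>(std::ceil(double(l.cols) / tile));
}

// utilization = memory mapped by synaptic weights / total on-chip storage
double utilization(const std::vector<Layer>& net, long tile) {
    double mapped = 0.0, total = 0.0;
    for (const Layer& l : net) {
        mapped += double(l.rows) * double(l.cols);
        total  += double(tilesForLayer(l, tile)) * double(tile) * double(tile);
    }
    return mapped / total;
}

long bestTileSize(const std::vector<Layer>& net) {
    long maxDim = 1;
    for (const Layer& l : net) maxDim = std::max({maxDim, l.rows, l.cols});
    long best = maxDim;
    double bestUtil = utilization(net, maxDim);
    // Keep decreasing the tile size (here simply halving, an assumption)
    // and remember the size with the best utilization.
    for (long t = maxDim / 2; t >= 1; t /= 2) {
        double u = utilization(net, t);
        if (u > bestUtil) { bestUtil = u; best = t; }
    }
    return best;
}
```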

To further increase memory utilization and speed up the processing of the whole network as much as
possible, weight duplication is introduced for each layer.
To raise memory utilization and accelerate the whole network, weight duplication is applied to each layer.
"Weight duplication" literally means replicating a layer's weights: copies of the same weights are mapped into otherwise-unused storage so that several inputs to that layer can be processed in parallel, which both fills memory that would sit idle and speeds the layer up.

Since the layer structure (such as input feature size, channel depth and kernel size) varies significantly across DNNs, different layers can occupy very different amounts of synaptic arrays; it is possible that the weights of several layers cannot fully fill one PE or even one synaptic array. A naïve way to custom-design the hardware would be to mix multiple such small layers into one tile (or even one PE); however, this would make it complicated to define the tile/PE size and the number of tiles needed. Thus, in this framework, we assume one tile is the minimum computation unit for each layer, i.e., it is not allowed to map more than one layer into one tile, but multiple tiles may be used to map a single layer.
The last sentence is the key point: a tile is the minimum computation unit per layer. One tile never holds weights from more than one layer, but a large layer can be spread across several tiles; a sketch of the consequence is shown below.
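
Under the stated rule that one tile maps at most one layer, a small layer leaves part of its tile empty, and duplication can fill that space. The arithmetic below is my own illustration of that idea; the exact duplication scheme used by NeuroSim may differ.

```cpp
#include <algorithm>
#include <cstdio>

// How many complete copies of a small layer's weight matrix fit into one
// square tile, if copies are simply placed side by side and stacked.
long duplicationFactor(long layerRows, long layerCols, long tile) {
    long copiesAcross = tile / layerCols;   // copies placed side by side
    long copiesDown   = tile / layerRows;   // copies stacked vertically
    return std::max(1L, copiesAcross * copiesDown);
}

int main() {
    // Example: a 64 x 32 layer mapped onto a 256 x 256 tile can be
    // replicated 8 * 4 = 32 times instead of wasting the remaining space.
    std::printf("copies: %ld\n", duplicationFactor(64, 32, 256));
    return 0;
}
```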

Finally, weight duplication can be further utilized inside a PE, i.e. weights are duplicated among the synaptic arrays,
in a similar way as at the PE level; the only difference is that the synaptic array size is fixed. With this three-stage
floorplan, NeuroSim can guarantee high-enough memory utilization while optimizing the inference
speed.
Finally, weight duplication is also applied inside each PE, i.e. the weights are duplicated across the synaptic arrays in the same way as at the PE level; the only difference is that the synaptic array size is fixed (set by the user rather than searched).
With this three-stage floorplan (at the tile, PE, and synaptic-array levels), NeuroSim keeps the memory utilization high while also optimizing the inference speed.

3. Weight Mapping Methods

We support two mapping methods in this framework: the conventional mapping and the novel mapping method
proposed in [8].
There are two mapping methods: the conventional one and a novel one proposed in [8].

Fig. 6 shows an example of the conventional mapping for one convolutional layer, where each 3D kernel (weight) is unrolled into a long column, since the partial sums within each 3D kernel will be summed up to get the final output. Thus, all the kernels of a convolutional layer form a group of such long columns, i.e., a large weight matrix.
[Figure 6: conventional mapping of one convolutional layer]
The figure shows the conventional mapping of one convolutional layer: each 3D kernel is flattened into one long column, every column produces one output channel, and all the kernels together form the layer's large weight matrix, which is then mapped onto the synaptic arrays. A sketch of the unrolling follows.
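
Here is what "unrolling a 3D kernel into a long column" looks like in code. The tensor layout and loop order are my own choices for illustration, not necessarily the ones NeuroSim uses internally.

```cpp
#include <vector>

// weights[o][c][y][x]: outChannels x inChannels x kH x kW convolution kernels.
// Returns an (inChannels * kH * kW) x outChannels matrix stored row-major:
// column o is the o-th 3D kernel flattened into one long column.
std::vector<float> unrollKernels(
    const std::vector<std::vector<std::vector<std::vector<float>>>>& weights) {
    const int outC = static_cast<int>(weights.size());
    const int inC  = static_cast<int>(weights[0].size());
    const int kH   = static_cast<int>(weights[0][0].size());
    const int kW   = static_cast<int>(weights[0][0][0].size());
    const int rows = inC * kH * kW;

    std::vector<float> matrix(static_cast<size_t>(rows) * outC, 0.0f);
    for (int o = 0; o < outC; ++o)
        for (int c = 0; c < inC; ++c)
            for (int y = 0; y < kH; ++y)
                for (int x = 0; x < kW; ++x) {
                    const int row = (c * kH + y) * kW + x;  // position in the long column
                    matrix[static_cast<size_t>(row) * outC + o] = weights[o][c][y][x];
                }
    return matrix;
}
```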

4. Pipeline System

In this framework, we assume all the synaptic weights are mapped onto the inference chip, which means it
is possible to build up a pipeline system with acceptable global-buffer overhead (to save activations for
different images), to improve throughput and energy efficiency (less leakage during idle cycles).
The reasoning: since every layer's weights are already resident on chip, the tiles of different layers can work on different images at the same time; the only extra cost is a larger global buffer to hold the activations of the images in flight. Pipelining improves throughput and also energy efficiency, because fewer cycles are spent sitting idle and leaking.
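
A back-of-the-envelope illustration of why such a layer pipeline helps throughput. The per-layer latencies below are made-up numbers, and this is just the standard pipeline arithmetic, not NeuroSim's performance model.

```cpp
#include <vector>
#include <algorithm>
#include <numeric>
#include <cstdio>

int main() {
    // Hypothetical per-layer latencies (arbitrary time units) and image count.
    std::vector<double> layerLatency = {4.0, 6.0, 5.0, 3.0};
    long numImages = 1000;

    double sum     = std::accumulate(layerLatency.begin(), layerLatency.end(), 0.0);
    double slowest = *std::max_element(layerLatency.begin(), layerLatency.end());

    // Sequential: each image passes through all layers before the next starts.
    double sequential = numImages * sum;
    // Pipelined: once the pipeline is full, one image completes per slowest stage.
    double pipelined  = sum + (numImages - 1) * slowest;

    std::printf("sequential: %.0f  pipelined: %.0f\n", sequential, pipelined);
    return 0;
}
```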

That wraps up the chip level. The material is quite specific, and since I have not studied machine learning it is hard to follow in detail; for now the big picture is enough, and I will come back for a closer read when I have time.

The next post covers the circuit level, the longest part, which is much closer to what I have studied, so hopefully more of it will make sense.
