Estimating the Computational Cost of Network Models

Contents

Computational Cost

Memory Access


Computational Cost

Compute performance metric:

● FLOPS: floating point operations per second

Computational cost metric:

● MACCs or MADDs: multiply-accumulate operations

 

The difference between FLOPS and FLOPs:

FLOPS: all uppercase, short for "floating point operations per second", i.e. the number of floating-point operations executed per second. It describes computation speed and is a measure of hardware performance.

FLOPs: lowercase s, short for "floating point operations" (the s marks the plural), i.e. the number of operations. It describes computational cost and can be used to measure the complexity of an algorithm or model.

Note: MACCs counts multiply-accumulate operations; FLOPs counts multiplications and additions separately, i.e. their total.

 

Worked example: dot product

y = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + ... + w[n-1]*x[n-1]

Each term w[i]*x[i] together with its accumulation counts as 1 MACC, so the expression above costs n MACCs.

The expression contains n floating-point multiplications and n - 1 floating-point additions, so it costs 2n - 1 FLOPs.

One MACC is therefore roughly two FLOPs.

Note: strictly speaking, the formula above contains only n - 1 additions, one fewer than the number of multiplications. The MACC count is an approximation, much like Big-O notation is an approximation of an algorithm's complexity.
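As a quick sanity check, here is a minimal Python sketch of the two counting rules for the dot product above (the function name count_dot_product_ops is made up for illustration):

```python
def count_dot_product_ops(n):
    """Operation counts for y = w[0]*x[0] + w[1]*x[1] + ... + w[n-1]*x[n-1]."""
    maccs = n          # each w[i]*x[i] plus its accumulation counts as one MACC
    flops = 2 * n - 1  # n multiplications and n - 1 additions
    return maccs, flops

print(count_dot_product_ops(100))  # (100, 199): one MACC is roughly two FLOPs
```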

 

Computational cost of an actual convolution:

For the details, see the paper "Pruning Convolutional Neural Networks for Resource Efficient Inference" (ICLR 2017).

Assuming the convolution is implemented with a sliding window and the cost of the nonlinearity is ignored, the FLOPs of a convolution layer are:

FLOPs = 2 × H × W × (Cin × K² + 1) × Cout

where H and W are the output feature-map height and width, K is the kernel size, Cin and Cout are the input and output channel counts, and the +1 accounts for the bias.
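A minimal sketch of this formula, assuming H and W refer to the output feature-map size (conv_flops is an illustrative name, not from the paper):

```python
def conv_flops(h_out, w_out, c_in, c_out, k):
    """FLOPs = 2 * H * W * (Cin * K^2 + 1) * Cout for a sliding-window
    convolution with bias, ignoring the nonlinearity."""
    return 2 * h_out * w_out * (c_in * k * k + 1) * c_out

# Example: 28x28 output, 256 -> 512 channels, 3x3 kernel
print(conv_flops(28, 28, 256, 512, 3))  # 1,850,490,880 FLOPs
```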

Computing the FLOPs and MACCs of each neural-network layer (a small computation sketch follows the list):

● Fully Connected Layer: multiplying a vector of length I by an I × J matrix to get a vector of length J takes I × J MACCs, or (2I - 1) × J FLOPs.

● Activation Layer: we measure these in FLOPs rather than MACCs, because they are not dot products.

● Convolution Layer: K × K × Cin × Hout × Wout × Cout MACCs

● Depthwise-Separable Layer: (K × K × Cin × Hout × Wout) + (Cin × Hout × Wout × Cout) MACCs
  = Cin × Hout × Wout × (K × K + Cout) MACCs

● The resulting speedup factor over a regular convolution is K × K × Cout / (K × K + Cout).
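A minimal sketch of these per-layer MACC counts (the function names are made up for illustration):

```python
def fc_maccs(i, j):
    """Fully connected layer: length-I input times an I x J weight matrix."""
    return i * j

def conv_maccs(k, c_in, h_out, w_out, c_out):
    """Regular convolution layer."""
    return k * k * c_in * h_out * w_out * c_out

def depthwise_separable_maccs(k, c_in, h_out, w_out, c_out):
    """Depthwise (K x K per input channel) plus pointwise (1 x 1) convolution."""
    return c_in * h_out * w_out * (k * k + c_out)

# Speedup factor over a regular convolution, roughly K*K*Cout / (K*K + Cout)
print(conv_maccs(3, 256, 28, 28, 512) /
      depthwise_separable_maccs(3, 256, 28, 28, 512))  # ~8.8
```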

 

Memory Access

The computation count is only one side of runtime speed; another important factor is memory bandwidth, which can matter even more than the computation count.

On a modern computer, a single access to main memory is much slower than a single computation, by a factor of roughly 100 or more!

How many memory accesses does a network perform? For each layer, they include:

    1. reading the layer's input

    2. computing the result, which includes loading the weights

    3. writing the layer's output

 

Memory for weights

● Fully Connected: with input size I and output size J, the total is (I + 1) × J weights.

● Convolutional layers have fewer weights than fully-connected layers: K × K × Cin × Cout.
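A minimal sketch of these weight counts (fc_weights and conv_weights are illustrative names; the optional bias term matches the weights formula used in the example further down):

```python
def fc_weights(i, j):
    """Fully connected layer: I x J weights plus one bias per output."""
    return (i + 1) * j

def conv_weights(k, c_in, c_out, bias=True):
    """Convolution layer: K x K x Cin weights for each of the Cout filters."""
    return k * k * c_in * c_out + (c_out if bias else 0)

print(fc_weights(4096, 4096))     # 16,781,312
print(conv_weights(3, 256, 512))  # 1,180,160, matching the example below
```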

Because memory access is so slow, heavy memory traffic has a large impact on how fast the network runs, possibly even larger than the computation count.

 

Feature maps and intermediate results

● Convolution Layer:

 (the weight accesses here are negligible compared with the feature-map accesses)

input = Hin × Win × Cin × K × K × Cout

output = Hout × Wout × Cout

weights = K × K × Cin × Cout + Cout

Example: Cin = 256, Cout = 512, H = W = 28, K = 3, S = 1

 

1. Normal convolution layer

     input = 28 × 28 × 256 × 3 × 3 × 512 = 924,844,032

     output = 28 × 28 × 512 = 401,408

     weights = 3 × 3 × 256 × 512 + 512 = 1,180,160

     total = 926,425,600

 

2. Depthwise layer + pointwise layer

    1) Depthwise layer

        input = 28 × 28 × 256 × 3 × 3 = 1,806,336

        output = 28 × 28 × 256 = 200,704

        weights = 3 × 3 × 256 + 256 = 2,560

        total = 2,009,600

     2) Pointwise layer

        input = 28 × 28 × 256 × 1 × 1 × 512 = 102,760,448

        output = 28 × 28 × 512 = 401,408

        weights = 1 × 1 × 256 × 512 + 512 = 131,584

         total = 103,293,440

         total of both layers = 105,303,040
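A minimal sketch that reproduces the two totals above, assuming stride 1 and the access counts used in the example (input reads + weight reads + output writes; the function names are illustrative):

```python
def conv_memory_accesses(h, w, c_in, c_out, k):
    """Memory accesses of a normal convolution layer:
    input reads + weight reads + output writes."""
    reads_input   = h * w * c_in * k * k * c_out
    reads_weights = k * k * c_in * c_out + c_out
    writes_output = h * w * c_out
    return reads_input + reads_weights + writes_output

def depthwise_separable_memory_accesses(h, w, c_in, c_out, k):
    """Depthwise (K x K, one filter per channel) followed by a 1 x 1 pointwise layer."""
    depthwise = (h * w * c_in * k * k) + (k * k * c_in + c_in) + (h * w * c_in)
    pointwise = (h * w * c_in * c_out) + (c_in * c_out + c_out) + (h * w * c_out)
    return depthwise + pointwise

print(conv_memory_accesses(28, 28, 256, 512, 3))                 # 926,425,600
print(depthwise_separable_memory_accesses(28, 28, 256, 512, 3))  # 105,303,040
```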

 

Case study

● Input dimension: 126x224

MobileNet V1 parameters (multiplier = 1.0): 1.6M

MobileNet V2 parameters (multiplier = 1.0): 0.5M

MobileNet V2 parameters (multiplier = 1.4): 1.0M

MobileNet V1 MACCs (multiplier = 1.0): 255M

MobileNet V2 MACCs (multiplier = 1.0): 111M

MobileNet V2 MACCs (multiplier = 1.4): 214M

MobileNet V1 memory accesses (multiplier = 1.0): 283M

MobileNet V2 memory accesses (multiplier = 1.0): 159M

MobileNet V2 memory accesses (multiplier = 1.4): 286M

MobileNet V2 (multiplier = 1.4) is slightly slower than MobileNet V1 (multiplier = 1.0)

This provides some evidence for my hypothesis that the number of memory accesses is the primary factor determining the speed of a neural net.

 

Conclusion

“I hope this shows that all these things — number of computations, number of parameters, and number of memory accesses — are deeply related. A model that works well on mobile needs to carefully balance those factors.”
