My Machine Learning Side Quest: Model Complexity

Model Complexity

Model complexity usually refers to the amount of computation in the forward pass (which reflects the computation time the model needs) and the number of parameters (which reflects the memory the model needs).

Time Complexity

Used to evaluate how efficiently a model runs; it usually corresponds to the model's running speed.

  • Computational complexity is measured in floating-point operations (FLOPs).

  • Parallelism also affects running speed; it can be measured by the minimum number of sequential operations, the throughput (image/s), and the inference time (batch/ms).

    Note that throughput and inference time depend not only on the model but also on the hardware.

FLOPs

1. Convolution

$$
FLOPs=(2\times C_{input}\cdot S_{filter_h}\cdot S_{filter_w}-1)^*\cdot C_{output}\cdot S_{input_h}\cdot S_{input_w}
$$

$$
\begin{aligned}
e.g.\quad &C_{input}=3\quad C_{output}=4\quad S_{filter_h}=S_{filter_w}=3\quad S_{input_h}=S_{input_w}=6\\
&FLOPs=(2\times3\times3^2-1)\times4\times6^2=7632
\end{aligned}
$$
* If the convolution has a bias term, the −1 is not needed.
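As a sanity check, the formula transcribes directly into Python. A minimal sketch, assuming the spatial size in the formula is the output feature-map size (stride 1 and "same" padding, as in the example above):

```python
def conv_flops(c_in, c_out, k_h, k_w, out_h, out_w, bias=False):
    """FLOPs of one convolution layer, per the formula above.

    Each output element costs c_in * k_h * k_w multiplications plus
    c_in * k_h * k_w - 1 additions; with a bias the -1 disappears.
    """
    ops_per_output = 2 * c_in * k_h * k_w - (0 if bias else 1)
    return ops_per_output * c_out * out_h * out_w

# Worked example from the text: 3 -> 4 channels, 3x3 kernel, 6x6 map.
assert conv_flops(3, 4, 3, 3, 6, 6) == 7632
```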

2. Attention

$$
FLOPs=\begin{cases}
2D_kND_x\;+\;2D_kN^2\;+\;1\\
3D^2N\;+\;2DN^2\;+\;1\quad \text{if}\quad D_x=D_k=D_v=D_{model}=D
\end{cases}
$$
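Both branches transcribe directly into Python; this sketch just evaluates the expressions above and is not a general-purpose FLOPs counter:

```python
def attention_flops(n, d_x, d_k):
    """Self-attention FLOPs, general case of the formula above.

    n: sequence length; d_x: input dimension; d_k: key/query dimension.
    """
    return 2 * d_k * n * d_x + 2 * d_k * n**2 + 1

def attention_flops_simplified(n, d):
    """Simplified case D_x = D_k = D_v = D_model = D."""
    return 3 * d**2 * n + 2 * d * n**2 + 1
```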

3. Fully connected

Assume a fully connected network with three layers: input, hidden, and output. The input layer holds a batch of N samples with D neurons each, the hidden layer has 4D neurons per sample, and the output layer applies a nonlinear activation.

$$
\begin{aligned}
FLOPs\;&=\;(D+D-1)^*\cdot 4D\cdot N\\
&=\;8D^2N-4DN
\end{aligned}
$$
* If the fully connected layer has a bias term, the −1 is not needed.
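The same bookkeeping as a small Python sketch, with illustrative values for D and N:

```python
def fc_flops(d_in, d_out, n, bias=False):
    """FLOPs of one fully connected layer over a batch of n samples.

    Each output neuron costs d_in multiplications plus d_in - 1
    additions; with a bias the -1 disappears.
    """
    return (2 * d_in - (0 if bias else 1)) * d_out * n

# The D -> 4D hidden layer above: (D + D - 1) * 4D * N = 8D^2N - 4DN.
D, N = 64, 2
assert fc_flops(D, 4 * D, N) == 8 * D**2 * N - 4 * D * N
```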

Space Complexity

Used to evaluate how much memory a model occupies; it usually determines whether the model can run at all.

  • Number of parameters (Parameters)
  • Numeric precision (Data bits)
Parameters

$$
Parameters=Volume(Tensor_{Weight})
$$

Data bits

$$
Float32\quad or\quad Float64\quad\cdots
$$
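In PyTorch, both quantities can be read off a model directly; a minimal sketch (torchvision's resnet18 serves only as a stand-in model):

```python
from torchvision.models import resnet18  # stand-in model for illustration

model = resnet18()

# Parameters: total number of elements across all weight tensors.
n_params = sum(p.numel() for p in model.parameters())

# Data bits: element size is 4 bytes for Float32, 8 bytes for Float64.
n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

print(f"{n_params / 1e6:.2f} M parameters, {n_bytes / 1e6:.2f} MB of weights")
```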

A Survey of Deep Learning Models

0. Attention Is All You Need

Per-layer complexity, minimum number of sequential operations for different layer types and maximum path length


$n$ is the sequence length, $d$ the representation dimension, $k$ the kernel size of convolutions, and $r$ the neighborhood size of restricted self-attention.

This paper first proposed the Transformer, a natural language processing architecture built entirely on attention and fully connected layers. For the maximum path length $O(x)$, the larger $x$ is, the harder it is to pass information between nodes with long-range dependencies, and the more information is lost.

1. Densely Connected Convolutional Networks

For DenseNet-$L$ $(k=n)$ with the BottleNeck structure, $L$ is the model depth, i.e. the number of learnable layers (convolutional and fully connected layers); $k$ is the number of feature channels added each time the input features pass through one Dense Layer of a Dense Block. After each Dense Block, the following Transition Layer compresses the number of feature channels by half.

"If a dense block contains $m$ feature-maps, we let the following transition layer generate $\lfloor\theta m\rfloor$ output feature-maps, where $0<\theta\le 1$ is referred to as the compression factor."

"We refer the DenseNet with $\theta<1$ as DenseNet-C, and we set $\theta=0.5$ in our experiment. When both the bottleneck and transition layers with $\theta<1$ are used, we refer to our model as DenseNet-BC."
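To make the channel bookkeeping concrete, a small sketch with illustrative numbers (not taken from the paper):

```python
def dense_block_channels(c_in, n_layers, k):
    """Each dense layer in a block appends k feature channels."""
    return c_in + n_layers * k

def transition_channels(m, theta=0.5):
    """A transition layer compresses m feature-maps to floor(theta * m)."""
    return int(theta * m)

# Illustrative: 64 input channels, 6 dense layers, growth rate k = 32.
m = dense_block_channels(64, 6, 32)   # 64 + 6 * 32 = 256
print(m, transition_channels(m))      # 256 -> 128 with theta = 0.5
```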

2. Deep Residual Learning for Image Recognition

In the paper's table, FLOPs are mislabeled: the listed values are actually MACs, so the true FLOPs are twice as large. In L-layer, $L$ is the number of learnable layers.


Introducing the bottleneck structure clearly reduces the parameter count, which made networks of more than 1000 layers practical.
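A rough sketch of why the bottleneck helps, assuming the usual 4x width reduction and ignoring bias terms (the widths here are illustrative, not the paper's exact configuration):

```python
def basic_block_params(c):
    """Two 3x3 convolutions at width c."""
    return 2 * (3 * 3 * c * c)

def bottleneck_params(c, reduction=4):
    """1x1 reduce, 3x3 at the reduced width, 1x1 expand."""
    r = c // reduction
    return 1 * 1 * c * r + 3 * 3 * r * r + 1 * 1 * r * c

c = 256
print(basic_block_params(c), bottleneck_params(c))  # 1179648 vs. 69632
```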

3. https://github.com/sovrasov/flops-counter.pytorch

Parameters and multiply-accumulate counts (MACs) of mainstream convolutional models, computed with the external library flops-counter, together with the corresponding Top-1 and Top-5 accuracies; a usage sketch follows the table.

| Model | Input Resolution | Params (M) | MACs (G) | Acc@1 | Acc@5 |
| --- | --- | --- | --- | --- | --- |
| alexnet | 224x224 | 61.1 | 0.72 | 56.432 | 79.194 |
| densenet121 | 224x224 | 7.98 | 2.88 | 74.646 | 92.136 |
| densenet161 | 224x224 | 28.68 | 7.82 | 77.56 | 93.798 |
| densenet169 | 224x224 | 14.15 | 3.42 | 76.026 | 92.992 |
| densenet201 | 224x224 | 20.01 | 4.37 | 77.152 | 93.548 |
| dpn107 | 224x224 | 86.92 | 18.42 | 79.746 | 94.684 |
| dpn131 | 224x224 | 79.25 | 16.13 | 79.432 | 94.574 |
| dpn68 | 224x224 | 12.61 | 2.36 | 75.868 | 92.774 |
| dpn68b | 224x224 | 12.61 | 2.36 | 77.034 | 93.59 |
| dpn92 | 224x224 | 37.67 | 6.56 | 79.4 | 94.62 |
| dpn98 | 224x224 | 61.57 | 11.76 | 79.224 | 94.488 |
| inceptionv3 | 299x299 | 27.16 | 5.73 | 77.294 | 93.454 |
| inceptionv4 | 299x299 | 42.68 | 12.31 | 80.062 | 94.926 |
| resnet101 | 224x224 | 44.55 | 7.85 | 77.438 | 93.672 |
| resnet152 | 224x224 | 60.19 | 11.58 | 78.428 | 94.11 |
| resnet18 | 224x224 | 11.69 | 1.82 | 70.142 | 89.274 |
| resnet34 | 224x224 | 21.8 | 3.68 | 73.554 | 91.456 |
| resnet50 | 224x224 | 25.56 | 4.12 | 76.002 | 92.98 |
| se_resnet101 | 224x224 | 49.33 | 7.63 | 78.396 | 94.258 |
| se_resnet152 | 224x224 | 66.82 | 11.37 | 78.658 | 94.374 |
| se_resnet50 | 224x224 | 28.09 | 3.9 | 77.636 | 93.752 |
| vgg11 | 224x224 | 132.86 | 7.63 | 68.97 | 88.746 |
| vgg13 | 224x224 | 133.05 | 11.34 | 69.662 | 89.264 |
| vgg16 | 224x224 | 138.36 | 15.5 | 71.636 | 90.354 |
| vgg19 | 224x224 | 143.67 | 19.67 | 72.08 | 90.822 |
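The library is published on PyPI as ptflops; a minimal usage sketch following its README (resnet18 is just an example model):

```python
import torchvision.models as models
from ptflops import get_model_complexity_info

model = models.resnet18()
# Returns the MAC count and the parameter count, here as readable strings.
macs, params = get_model_complexity_info(
    model, (3, 224, 224), as_strings=True, print_per_layer_stat=False
)
print(f"MACs: {macs}  Params: {params}")
```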
4. AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

ViT: a vision network built entirely on the attention mechanism and fully connected layers.

5. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Swin: a vision network built entirely on shifted-window attention and fully connected layers.

Model Comparison in Scattering Imaging

For all measurements below, the batch size is fixed at 2; the timing procedure is sketched next.
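A minimal sketch of how such throughput and inference-time numbers can be measured (the model and input shape are placeholders; torch.cuda.synchronize is needed for meaningful GPU timings):

```python
import time
import torch

def benchmark(model, input_size=(2, 1, 256, 256), n_iters=100, device="cuda"):
    """Return inference time (batch/ms) and throughput (image/s)."""
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    with torch.no_grad():
        for _ in range(10):            # warm-up runs
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_iters):
            model(x)
        torch.cuda.synchronize()
        elapsed = time.time() - start
    batch_ms = elapsed / n_iters * 1000           # ms per batch
    image_s = input_size[0] * n_iters / elapsed   # images per second
    return batch_ms, image_s
```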

1. Deep speckle correlation: a deep learning approach toward scalable imaging through scattering media
  • Input Resolution: $256\times 256$

  • Parameters: $21.8505\times 10^6$

  • FLOPs: $0.0577\times 10^9$

  • Throughput: $8.9\,image/s$

  • Inference time: $223.2022\,batch/ms$

2. High-generalization deep sparse pattern reconstruction: feature extraction of speckles using self-attention armed convolutional neural networks
SA-CNN
  • Input Resolution: $256\times 256$
  • Parameters: $13.9231\times 10^6$
  • FLOPs: $17.4204\times 10^9$
  • Throughput: $40.8\,image/s$
  • Inference time: $49.0446\,batch/ms$
SA-CNN-Single
  • Input Resolution: $256\times 256$
  • Parameters: $13.5972\times 10^6$
  • FLOPs: $8.9002\times 10^9$
  • Throughput: $44.4\,image/s$
  • Inference time: $45.0413\,batch/ms$

Here, -Single denotes the variant that keeps only the single attention layer in the middle.

3. Our SpT UNet
SpT UNet
  • Input Resolution: $200\times 200$ / $224\times 224$ / $256\times 256$
  • Parameters: $6.6184\times 10^6$
  • FLOPs: $19.3602\times 10^9$ / $24.2856\times 10^9$ / $31.7197\times 10^9$
  • Throughput: $86.9\,image/s$ / $83.3\,image/s$ / $62.5\,image/s$
  • Inference time: $23.0214\,batch/ms$ / $24.0215\,batch/ms$ / $31.3427\,batch/ms$
SpT UNet-B
  • Input Resolution: $200\times 200$ / $224\times 224$ / $256\times 256$
  • Parameters: $2.4179\times 10^6$
  • FLOPs: $8.2659\times 10^9$ / $16.2256\times 10^9$ / $21.2318\times 10^9$
  • Throughput: $105.2\,image/s$ / $95.2\,image/s$ / $72.9\,image/s$
  • Inference time: $19.0217\,batch/ms$ / $21.0189\,batch/ms$ / $27.4584\,batch/ms$

Here, -B denotes the variant whose puffed downsampling and leaky upsampling use the Bottleneck structure.
