Basics
In general we adopt a mixed-layout convention: when a vector or matrix is differentiated with respect to a scalar, the numerator layout is used; when a scalar is differentiated with respect to a vector or matrix, the denominator layout is used. For vector-by-vector derivatives conventions diverge; this article mainly uses the numerator-layout Jacobian matrix.
- Derivative of a scalar with respect to a matrix (denominator layout):
$$df=\sum_{i=1}^m\sum_{j=1}^n\frac{\partial f}{\partial X_{ij}}\,dX_{ij} = \mathrm{tr}\left(\left(\frac{\partial f}{\partial \mathbf{X}}\right)^T d\mathbf{X}\right)$$
Because matrix multiplication pairs rows with columns, $\frac{\partial f}{\partial \mathbf{X}}$ has to be transposed here so that $\frac{\partial f}{\partial X_{ij}}$ meets $dX_{ij}$ with matching subscripts inside the trace.
So when computing the derivative of a scalar with respect to some matrix, first rewrite $df$ as the trace of a matrix expression.
- Common formulas:
$$d(X+Y)=dX+dY,\qquad d(X-Y)=dX-dY$$
$$d(XY)=(dX)Y+X(dY)$$
$$d(X^T)=(dX)^T$$
$$d\,\mathrm{tr}(X)=\mathrm{tr}(dX)$$
$$d(X\odot Y)=X\odot dY+dX\odot Y$$
$$d\,\sigma(X)=\sigma'(X)\odot dX\quad\text{(elementwise scalar function)}$$
$$dX^{-1}=-X^{-1}\,dX\,X^{-1}$$
$$d|X|=|X|\,\mathrm{tr}(X^{-1}dX)$$
$$\mathrm{tr}(x)=x\quad\text{(for scalar }x\text{)}$$
$$\mathrm{tr}(A^T)=\mathrm{tr}(A)$$
$$\mathrm{tr}(AB)=\mathrm{tr}(BA)$$
$$\mathrm{tr}(X+Y)=\mathrm{tr}(X)+\mathrm{tr}(Y),\qquad \mathrm{tr}(X-Y)=\mathrm{tr}(X)-\mathrm{tr}(Y)$$
$$\mathrm{tr}((A\odot B)^TC)=\mathrm{tr}(A^T(B\odot C))$$
- Example:
$$dy=\mathrm{tr}(dy)=\mathrm{tr}(\mathbf{a}^Td\exp(\mathbf{X}\mathbf{b}))=\mathrm{tr}(\mathbf{a}^T(\exp(\mathbf{X}\mathbf{b})\odot d(\mathbf{X}\mathbf{b})))=\mathrm{tr}((\mathbf{a}\odot \exp(\mathbf{X}\mathbf{b}))^T d\mathbf{X}\,\mathbf{b})=\mathrm{tr}(\mathbf{b}(\mathbf{a}\odot \exp(\mathbf{X}\mathbf{b}))^T d\mathbf{X})$$
Hence, by the trace identity above, $\frac{\partial y}{\partial \mathbf{X}}=(\mathbf{a}\odot\exp(\mathbf{X}\mathbf{b}))\,\mathbf{b}^T$.
- Chain rule for a scalar with respect to vectors (mixed layout):
$$\frac{\partial z}{\partial \mathbf{y}_1} = \left(\frac{\partial \mathbf{y}_n}{\partial \mathbf{y}_{n-1}} \frac{\partial \mathbf{y}_{n-1}}{\partial \mathbf{y}_{n-2}} \cdots\frac{\partial \mathbf{y}_2}{\partial \mathbf{y}_1}\right)^T\frac{\partial z}{\partial \mathbf{y}_n}$$
Each $\mathbf{y}_i$ here is a column vector.
- Chain rule for a scalar with respect to matrices:
$$z= f(Y),\quad Y=AX+B \;\to\; \frac{\partial z}{\partial X} = A^T\frac{\partial z}{\partial Y}$$
$$z= f(Y),\quad Y=XA+B \;\to\; \frac{\partial z}{\partial X} = \frac{\partial z}{\partial Y}A^T$$
Proof:
$$dz=\mathrm{tr}(df(\mathbf{Y}))=\mathrm{tr}(f'(\mathbf{Y})\odot(\mathbf{A}d\mathbf{X}))=\mathrm{tr}((f'(\mathbf{Y})\odot\mathbf{A}d\mathbf{X})^{T}\mathbf{E})=\mathrm{tr}(f'(\mathbf{Y})^{T}(\mathbf{A}d\mathbf{X}\odot\mathbf{E}))=\mathrm{tr}((\mathbf{A}^{T}f'(\mathbf{Y}))^{T}d\mathbf{X})$$
where $\mathbf{E}$ denotes the all-ones matrix, so that $\mathbf{A}d\mathbf{X}\odot\mathbf{E}=\mathbf{A}d\mathbf{X}$ and the Hadamard trace identity above can be applied.
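These trace manipulations are easy to sanity-check numerically. Below is a minimal C++ finite-difference sketch for the earlier example $y=\mathbf{a}^T\exp(\mathbf{X}\mathbf{b})$, comparing the analytic gradient $(\mathbf{a}\odot\exp(\mathbf{X}\mathbf{b}))\,\mathbf{b}^T$ against central differences (the $2\times2$ size and all values are made up for illustration):

```cpp
#include <cstdio>
#include <cmath>

// y = a^T exp(X b) for a 2x2 X; exp is applied elementwise
double y(const double X[2][2], const double a[2], const double b[2]) {
  double out = 0;
  for (int i = 0; i < 2; ++i)
    out += a[i] * std::exp(X[i][0] * b[0] + X[i][1] * b[1]);
  return out;
}

int main() {
  double X[2][2] = {{0.1, -0.3}, {0.7, 0.2}};
  double a[2] = {1.5, -0.5}, b[2] = {0.4, 0.9};
  const double h = 1e-6;
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j) {
      // analytic: dy/dX_ij = a_i * exp((Xb)_i) * b_j
      double Xb = X[i][0] * b[0] + X[i][1] * b[1];
      double analytic = a[i] * std::exp(Xb) * b[j];
      // numeric central difference in the (i, j) entry
      double saved = X[i][j];
      X[i][j] = saved + h; double yp = y(X, a, b);
      X[i][j] = saved - h; double ym = y(X, a, b);
      X[i][j] = saved;
      std::printf("dy/dX[%d][%d]: analytic %.6f numeric %.6f\n",
                  i, j, analytic, (yp - ym) / (2 * h));
    }
}
```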
- DNN backpropagation, derivative method (mixed layout)
$$a^l = \sigma(z^l) = \sigma(W^la^{l-1} + b^l)$$
First compute the error of the output layer, $\delta^L = \frac{\partial J}{\partial z^L} = \frac{\partial J}{\partial a^L}\odot\sigma'(z^L)$, then recurse backwards layer by layer:
$$\delta^{l} =(W^{l+1})^T\delta^{l+1}\odot \sigma'(z^l)$$
where, by the rules for vector-by-vector derivatives (numerator layout),
$$\frac{\partial z^{l+1}}{\partial z^{l}} = W^{l+1}\,\mathrm{diag}(\sigma'(z^l))$$
The derivatives with respect to the parameters are then
$$\frac{\partial J}{\partial W^l} = \delta^{l}(a^{l-1})^T,\qquad \frac{\partial J}{\partial b^l} = \delta^{l}$$
- Differential method
$$da^{l}=g'(\mathbf{z}^{l})\odot d\mathbf{z}^{l}$$
$$dz^{l+1}=\mathbf{W}^{l+1}da^{l}=\mathbf{W}^{l+1}(g'(\mathbf{z}^{l})\odot d\mathbf{z}^{l})$$
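To make the recursion and the parameter gradients concrete, here is a minimal C++ sketch of one backward step for a fully connected layer, assuming sigmoid activations; the `Vec`/`Mat` aliases and function names are illustrative, not from any library:

```cpp
#include <vector>
#include <cmath>
using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;  // row-major

// sigma'(z) for the sigmoid, evaluated elementwise
double dsigmoid(double z) {
  double s = 1.0 / (1.0 + std::exp(-z));
  return s * (1 - s);
}

// delta^l = (W^{l+1})^T delta^{l+1}  (Hadamard)  sigma'(z^l)
Vec backward_delta(const Mat& W_next, const Vec& delta_next, const Vec& z) {
  Vec delta(z.size(), 0.0);
  for (size_t j = 0; j < z.size(); ++j) {             // j indexes layer l
    for (size_t i = 0; i < delta_next.size(); ++i)    // i indexes layer l+1
      delta[j] += W_next[i][j] * delta_next[i];       // (W^T delta)_j
    delta[j] *= dsigmoid(z[j]);                       // elementwise product
  }
  return delta;
}

// dJ/dW^l = delta^l (a^{l-1})^T,  dJ/db^l = delta^l
void param_grads(const Vec& delta, const Vec& a_prev, Mat& dW, Vec& db) {
  dW.assign(delta.size(), Vec(a_prev.size()));
  for (size_t i = 0; i < delta.size(); ++i)
    for (size_t j = 0; j < a_prev.size(); ++j)
      dW[i][j] = delta[i] * a_prev[j];                // outer product
  db = delta;
}
```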
CNN backpropagation
- Error propagation through pooling (a C++ sketch follows this list)
For the error
$$\delta_k^l = \begin{pmatrix} 2 & 8 \\ 4 & 6 \end{pmatrix}$$
if max pooling was used, upsampling places each value back at the position where the forward pass recorded the maximum (here assumed to be the interior positions):
$$\begin{pmatrix} 0&0&0&0 \\ 0&2&8&0 \\ 0&4&6&0 \\ 0&0&0&0 \end{pmatrix}$$
if average pooling was used, upsampling spreads each error entry evenly over its $2\times2$ window:
$$\begin{pmatrix} 0.5&0.5&2&2 \\ 0.5&0.5&2&2 \\ 1&1&1.5&1.5 \\ 1&1&1.5&1.5 \end{pmatrix}$$
$$\delta_k^{l-1} = \left(\frac{\partial a_k^{l-1}}{\partial z_k^{l-1}}\right)^T\frac{\partial J(W,b)}{\partial a_k^{l-1}} = \mathrm{upsample}(\delta_k^l) \odot \sigma'(z_k^{l-1})$$
- Error propagation through convolution (see the sketch after this list)
$$z^l = a^{l-1}*W^l +b$$
$$\delta^{l-1} = \left(\frac{\partial z^{l}}{\partial z^{l-1}}\right)^T\delta^{l} = \delta^{l}*\mathrm{rot180}(W^{l}) \odot \sigma'(z^{l-1})$$
If the stride is not 1, zeros are inserted between the entries of $\delta^{l}$ to restore the unit-stride spacing. If the padding is not 0, the border that was added in the forward pass is cropped from the recovered gradient.
- Gradients of the convolution layer's $W$ and $b$
$$\frac{\partial J(W,b)}{\partial W^{l}}=a^{l-1} *\delta^l$$
If the stride is not 1, gaps are likewise filled into $\delta^{l}$.
$$\frac{\partial J(W,b)}{\partial b^{l}} = \sum\limits_{u,v}(\delta^l)_{u,v}$$
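As promised above, here is a minimal single-channel C++ sketch of both propagation rules (stride 1, no padding; all names are illustrative): `upsample_avg` spreads each pooled error evenly over its $k\times k$ window, and `conv_backward_delta` computes the full convolution $\delta^{l}*\mathrm{rot180}(W^{l})$, leaving the elementwise $\odot\,\sigma'(z^{l-1})$ factor to the caller:

```cpp
#include <vector>
using Mat = std::vector<std::vector<double>>;

// Average-pooling upsample: spread each error over its k x k window.
Mat upsample_avg(const Mat& delta, int k) {
  Mat out(delta.size() * k, std::vector<double>(delta[0].size() * k));
  for (size_t i = 0; i < delta.size(); ++i)
    for (size_t j = 0; j < delta[0].size(); ++j)
      for (int di = 0; di < k; ++di)
        for (int dj = 0; dj < k; ++dj)
          out[i * k + di][j * k + dj] = delta[i][j] / (k * k);
  return out;
}

// delta^{l-1} (before the sigma' factor): full convolution of delta^l with
// rot180(W), i.e. delta^l implicitly zero-padded by (kH-1, kW-1) on each side.
Mat conv_backward_delta(const Mat& delta, const Mat& W) {
  int H = delta.size(), Wd = delta[0].size();
  int kH = W.size(), kW = W[0].size();
  Mat out(H + kH - 1, std::vector<double>(Wd + kW - 1, 0.0));
  for (int i = 0; i < (int)out.size(); ++i)
    for (int j = 0; j < (int)out[0].size(); ++j)
      for (int u = 0; u < kH; ++u)
        for (int v = 0; v < kW; ++v) {
          int di = i - (kH - 1) + u, dj = j - (kW - 1) + v;  // position in delta
          // rot180(W)[u][v] == W[kH-1-u][kW-1-v]
          if (di >= 0 && di < H && dj >= 0 && dj < Wd)
            out[i][j] += delta[di][dj] * W[kH - 1 - u][kW - 1 - v];
        }
  return out;
}
```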
Activation functions
For the softmax activation used in classification, the loss function is usually the log-likelihood; for class $i$ the loss is:
$$a_i^L = \frac{e^{z_i^L}}{\sum\limits_{j=1}^{n_L}e^{z_j^L}}$$
$$J(W,b,a^L,y) = -\ln a_i^L$$
The gradient with respect to the output-layer parameters, written in vector form, is:
$$\frac{\partial L}{\partial b} = a^{L} -a^{*}$$
$$\frac{\partial L}{\partial W} =(a^{L} - a^{*}) (a^{L-1})^T$$
Note that this gradient does not involve the derivative of the activation function (here $a^{*}$ is the one-hot label vector), so the softmax-plus-log-likelihood pairing avoids the saturation effects the activation would otherwise introduce.
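This is easy to verify with finite differences. The following self-contained C++ sketch (all values illustrative) checks $\delta^L = \partial J/\partial z^L = a^L - a^{*}$, from which the $W$ and $b$ gradients above follow:

```cpp
#include <cstdio>
#include <cmath>
#include <vector>

// softmax followed by log-likelihood loss J = -ln a[target]
double loss(const std::vector<double>& z, int target) {
  double denom = 0;
  for (double zj : z) denom += std::exp(zj);
  return -(z[target] - std::log(denom));  // -ln softmax(z)[target]
}

int main() {
  std::vector<double> z = {0.3, -1.2, 2.0};
  int target = 1;  // a* is one-hot at index 1
  double denom = 0;
  for (double zj : z) denom += std::exp(zj);
  for (size_t k = 0; k < z.size(); ++k) {
    double analytic = std::exp(z[k]) / denom - (k == (size_t)target);  // a^L - a*
    std::vector<double> zp = z, zm = z;
    zp[k] += 1e-6; zm[k] -= 1e-6;
    double numeric = (loss(zp, target) - loss(zm, target)) / 2e-6;
    std::printf("dJ/dz_%zu: analytic %.6f numeric %.6f\n", k, analytic, numeric);
  }
}
```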
Derivative of tanh:
$$[\tanh(x)]'=\frac{\cosh^2(x)-\sinh^2(x)}{\cosh^2(x)} =1-\tanh^2(x)$$
Initialization
Saturated activations: according to
$$\delta^{l} =(W^{l+1})^T\delta^{l+1}\odot \sigma'(z^l),$$
when using the sigmoid function
$$\sigma(z) = \frac{1}{1+e^{-z}},$$
a $z$ that is too large or too small drives the activation's gradient to 0, which in turn drives the previous layer's error gradient to 0.
According to
$$\frac{\partial J(W,b,x,y)}{\partial W^l} = \delta^{l}(a^{l-1})^T,$$
if the previous layer's activations are 0 (i.e. $z \to -\infty$), the parameter gradients of this layer are directly driven to 0.
A good initialization should keep the variance of the activations $a$ and of the state gradients $\delta$ constant across layers during the forward and backward passes.
Xavier initialization makes the Glorot assumptions: the activation function is symmetric, and at initialization the pre-activations $z$ fall in the activation's linear region, where the local slope is 1.
So Xavier initialization is only suitable for activations such as tanh,
while He initialization was proposed specifically for ReLU.
Xavier initialization:
$$\mathcal{N}\left(0,\ \frac{2}{\mathrm{fan}_{in}+\mathrm{fan}_{out}}\right)$$
He initialization:
since ReLU passes only half of the variance on to the next layer, the variance is doubled relative to Xavier, giving $\mathcal{N}\left(0,\ \frac{2}{\mathrm{fan}_{in}}\right)$.
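Both schemes are a single draw from a normal distribution; a minimal C++ sketch using `<random>` (the helper name is illustrative):

```cpp
#include <random>
#include <vector>
#include <cmath>

// Xavier/Glorot: variance 2 / (fan_in + fan_out); He: variance 2 / fan_in.
std::vector<double> init_weights(int fan_in, int fan_out, bool relu,
                                 std::mt19937& gen) {
  double var = relu ? 2.0 / fan_in : 2.0 / (fan_in + fan_out);
  std::normal_distribution<double> dist(0.0, std::sqrt(var));  // takes stddev
  std::vector<double> w(fan_in * fan_out);
  for (double& x : w) x = dist(gen);
  return w;
}
```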
CNN implementation
im2col
The purpose of im2col is to speed up the convolution operation. To see how, first look at what the function does.
Suppose the kernel size is 2×2 and the input image is 3×3. For each window the kernel visits, im2col unrolls that window into one column (or row) of a new matrix; the number of columns (or rows) of the new matrix equals the number of convolution steps over one input image (the number of kernel placements), as illustrated below.
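Since the original figure is not reproduced here, the following worked example stands in for it, assuming a hypothetical 3×3 input with entries 1 through 9 and a 2×2 kernel at stride 1; each window becomes one column of the unrolled matrix:

$$X=\begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}\;\xrightarrow{\ \text{im2col, }2\times2\text{, stride }1\ }\;\begin{pmatrix}1&2&4&5\\2&3&5&6\\4&5&7&8\\5&6&8&9\end{pmatrix}$$

The convolution then becomes a single matrix product: the kernel flattened to a $1\times4$ row vector times this $4\times4$ matrix yields all four outputs at once.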
C++ implementation
```cpp
__global__ void im2col_h(const int n, const float *data_im, const int height,
                         const int width, const int kernel_h,
                         const int kernel_w, const int pad_h, const int pad_w,
                         const int stride_h, const int stride_w,
                         const int height_col, const int width_col,
                         float *data_col, int im_stride, int col_stride) {
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  if (index < n) {
    const int batch_idx = blockIdx.y;
    data_im += batch_idx * im_stride;    // raw image data, laid out C.H.W
    data_col += batch_idx * col_stride;  // unrolled data, laid out (C.H_k.W_k).(H_col.W_col)
    const int h_index = index / width_col;
    const int h_col = h_index % height_col;
    const int w_col = index % width_col;
    const int c_im = h_index / height_col;
    const int c_col = c_im * kernel_h * kernel_w;
    const int h_offset = h_col * stride_h - pad_h;
    const int w_offset = w_col * stride_w - pad_w;
    // channel offset
    float *data_col_ptr = data_col;
    data_col_ptr += (c_col * height_col + h_col) * width_col + w_col;
    const float *data_im_ptr = data_im;
    data_im_ptr += (c_im * height + h_offset) * width + w_offset;
    // copy to col
    for (int i = 0; i < kernel_h; ++i) {
      for (int j = 0; j < kernel_w; ++j) {
        int h_im = h_offset + i;
        int w_im = w_offset + j;
        *data_col_ptr =
            (h_im >= 0 && w_im >= 0 && h_im < height && w_im < width)
                ? data_im_ptr[i * width + j]
                : 0;
        data_col_ptr += height_col * width_col;
      }
    }
  }
}
```
Here the layout of gridDim.x · blockDim.x is one thread per element of the unrolled output, i.e. n = channels · height_col · width_col threads per image, with blockIdx.y indexing the batch.
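A plausible host-side launch under these assumptions (the 256-thread block size and the `d_im`/`d_col` buffer names are illustrative):

```cpp
// n threads: one per (channel, output row, output col) triple
const int n = channels * height_col * width_col;
const int threads = 256;
dim3 grid((n + threads - 1) / threads, batch_size);  // blockIdx.y = batch index
im2col_h<<<grid, threads>>>(n, d_im, height, width, kernel_h, kernel_w,
                            pad_h, pad_w, stride_h, stride_w,
                            height_col, width_col, d_col, im_stride, col_stride);
```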
col2im
```cpp
__global__ void col2im_h(const int n, const float *data_col, const int height,
                         const int width, const int channels,
                         const int kernel_h, const int kernel_w,
                         const int pad_h, const int pad_w, const int stride_h,
                         const int stride_w, const int height_col,
                         const int width_col, float *data_im,
                         const int im_stride, const int col_stride) {
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  if (index < n) {
    const int batch_idx = blockIdx.y;
    data_im += batch_idx * im_stride;
    data_col += batch_idx * col_stride;
    float val = 0;
    const int w_im = index % width + pad_w;
    const int h_im = (index / width) % height + pad_h;
    const int c_im = index / (width * height);
    // compute the start and end of the col
    const int w_col_start =
        (w_im < kernel_w) ? 0 : (w_im - kernel_w) / stride_w + 1;  // leftmost window that contains w_im
    const int w_col_end = fminf(w_im / stride_w + 1, width_col);   // exclusive bound: one past the rightmost window containing w_im
    const int h_col_start =
        (h_im < kernel_h) ? 0 : (h_im - kernel_h) / stride_h + 1;
    const int h_col_end = fminf(h_im / stride_h + 1, height_col);
    // copy to im
    for (int h_col = h_col_start; h_col < h_col_end; h_col += 1) {
      for (int w_col = w_col_start; w_col < w_col_end; w_col += 1) {
        int h_k = (h_im - h_col * stride_h);
        int w_k = (w_im - w_col * stride_w);  // position of w_im inside the window represented by w_col
        int data_col_index =
            (((c_im * kernel_h + h_k) * kernel_w + w_k) * height_col + h_col) *
                width_col +
            w_col;
        val += data_col[data_col_index];
      }
    }
    data_im[index] = val;
  }
}
```
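Note the `val +=` accumulation: overlapping windows all contribute to the same image pixel, which is why col2im is the adjoint (backward pass) of im2col rather than a simple inverse.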
Deformable convolution and deformable RoI pooling
First, an ordinary convolution with 72 extra output channels learns the offsets: for a 3×3 kernel, each of the 3×3 = 9 sampling points needs an offset in both the X and Y directions ((x, y) together form a displacement vector), so 18 output channels per offset group; with 4 offset groups to predict, the final output has 72 channels. The input feature maps and these offsets are then fed together into the deformable conv layer, which shifts the sampling points by the offsets before performing the convolution.
deconvolution (transposed convolution)
It can be understood through col2im and im2col: viewing the convolution as the matrix relation
$$z= f(Y),\quad Y=AX+B \;\to\; \frac{\partial z}{\partial X} = A^T\frac{\partial z}{\partial Y},$$
the im2col/col2im reshaping does not affect this result, so the backward pass (and hence the deconvolution) is just multiplication by $A^T$.