[NLP] Convolutional Neural Network Basics


The role of the ConvNet is to reduce the images into a form which is easier to process, without losing features which are critical for getting a good prediction.

Today we will look at three kinds of layers: the convolution layer, the pooling layer, and the fully connected layer.

1. Convolution Layer

1.1 Parameters:

Kernel Size, Stride, Padding, Input & Output Channels

- Kernel/Filter

In the animated example, the Kernel/Filter is a 3x3x1 matrix; a different kernel is chosen for each channel.

Kernel/Filter, K =
 1   0   0
 1  -1  -1
 1   0  -1

Kernel size:
Kernels usually have odd dimensions (3x3, 5x5, 7x7, 9x9) so that they are symmetric about a central element.
The trend is small kernels stacked deep.

- Stride

The kernel above has Stride Length = 1 (non-strided); with stride = 2, the kernel jumps two cells at a time.
(Figure: strided convolution; blue is the input, the shaded area is the kernel, cyan is the output)

- Padding

For animations of the various padding schemes, see here.

a) Same Padding

(Figure: SAME padding: the 5x5x1 image is padded with 0s to create a 7x7x1 image)

When we pad the 5x5x1 image into a 7x7x1 image and then apply the 3x3x1 kernel over it, the convolved matrix turns out to have dimensions 5x5x1, matching the original input. Hence the name: Same Padding.

b) Valid Padding

If we perform the same operation without padding, we are left with a 3x3x1 matrix (which here happens to have the dimensions of the kernel itself): Valid Padding.
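
Putting kernel size, padding, and stride together, the output size follows the standard formula o = (i + 2p - k)/s + 1. A minimal sketch, assuming square inputs and kernels (the function name is my own):

def conv_output_size(i, k, s=1, p=0):
    # i: input size, k: kernel size, s: stride, p: padding per side
    return (i + 2 * p - k) // s + 1

conv_output_size(5, 3)        # valid padding -> 3
conv_output_size(5, 3, p=1)   # same padding  -> 5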

1.2 Do the math:

- Multiply, then add up:

Simply put: each cell of the kernel (yellow) is multiplied by the corresponding cell of the image patch (green), and the products are summed; the result is one entry of the convolved feature.
(Figure: convolving a 5x5x1 image with a 3x3x1 kernel to get a 3x3x1 convolved feature)
For details, see the Convolution Matrix discussion in Section 2.2 below.
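
Here is a minimal NumPy sketch of the multiply-then-add step, using the kernel K from Section 1.1 (the image values are made up for illustration):

import numpy as np

image = np.arange(25).reshape(5, 5)    # toy 5x5 input
K = np.array([[1,  0,  0],
              [1, -1, -1],
              [1,  0, -1]])            # the 3x3 kernel from Section 1.1

out = np.zeros((3, 3))                 # (5 - 3) + 1 = 3
for i in range(3):
    for j in range(3):
        # element-wise multiply the 3x3 window by the kernel, then sum
        out[i, j] = (image[i:i+3, j:j+3] * K).sum()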

2. Transposed Convolutions

Also called deconvolutions or fractionally strided convolutions.
Plenty of people resent the name deconvolution.
For why it is called deconvolution at all, look up Zeiler.

2.1 Why use it?

The need for up-sampling: going from low resolution to high resolution, for example. Transposed convolutions are a good way to do this.
Reference

2.2 Intuition:

- Convolution Operation

A quick review of the convolution operation:

input: 4x4 
stride: 1
kernel: 3x3
padding: none
output: 2x2 

Here, 9 numbers map to 1 number (output size = (4 - 3)/1 + 1 = 2): the convolution operation is a many-to-one relationship.

- Going Backward

Suppose we instead want:

input: 2x2 
output: 4x4 

That is, 1 number becomes 9 numbers: a one-to-many relationship.
First we need to understand the Convolution Matrix and the Transposed Convolution Matrix:

- Convolution Matrix

Rearrange the 3x3 kernel into a 4x16 matrix, and flatten the 4x4 input matrix into a 16x1 column vector;
then matrix-multiply the 4x16 convolution matrix with the 16x1 input vector (a 16-dimensional column vector) to get a 4x1 matrix.
Reshaping the 4x1 result gives the 2x2 output matrix.

Why build a convolution matrix at all?
With the convolution matrix, you can go from 16 (4x4) to 4 (2x2) because the convolution matrix is 4x16. Then, if you have a 16x4 matrix, you can go from 4 (2x2) to 16 (4x4).
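
A sketch of building that 4x16 convolution matrix in NumPy (the function name and toy values are my own):

import numpy as np

def conv_matrix(kernel, n):
    # Rearrange a kxk kernel into an (o*o) x (n*n) matrix
    # for an nxn input with stride 1 and no padding.
    k = kernel.shape[0]
    o = n - k + 1
    C = np.zeros((o * o, n * n))
    for oi in range(o):
        for oj in range(o):
            for ki in range(k):
                for kj in range(k):
                    C[oi * o + oj, (oi + ki) * n + (oj + kj)] = kernel[ki, kj]
    return C

kernel = np.arange(1, 10).reshape(3, 3)
x = np.arange(16).reshape(4, 4)
C = conv_matrix(kernel, 4)                # 4x16
y = (C @ x.reshape(16, 1)).reshape(2, 2)  # 16x1 in, 4x1 out, reshaped to 2x2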

- Transposed Convolution Matrix

If we want:

input: 2x2 
output: 4x4 

we need a 16x4 matrix, and we must preserve the 1-to-9 mapping:

  • Transpose the convolution matrix C (4x16) to CT (16x4).
  • Matrix-multiply CT (16x4) with a column vector (4x1) to generate an output matrix (16x1)
  • The transposed matrix connects 1 value to 9 values in the output.
  • Reshape the 16x1 result to get the 4x4 matrix.

Note: the actual weight values in the matrix do not have to come from the original convolution matrix. What's important is that the weight layout is transposed from that of the convolution matrix. here
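
Continuing the NumPy sketch from the Convolution Matrix section above (reusing conv_matrix, kernel, and np from that snippet), going the other way is just a transpose:

C = conv_matrix(kernel, 4)        # 4x16, as before
y = np.arange(4).reshape(4, 1)    # a 2x2 'input', flattened to 4x1
up = (C.T @ y).reshape(4, 4)      # 16x4 @ 4x1 -> 16x1 -> 4x4

Each of the 4 input values spreads into up to 9 output cells, which is exactly the one-to-many relationship we wanted.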

In TensorFlow (1.x API):

tf.nn.conv2d_transpose(
    value,          # 4-D input tensor, e.g. [batch, height, width, in_channels]
    filter,         # 4-D kernel: [height, width, output_channels, in_channels]
    output_shape,   # 1-D tensor: shape of the up-sampled output
    strides,        # sliding-window stride for each input dimension
    padding='SAME',
    data_format='NHWC',
    name=None
)
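
For example, up-sampling a 2x2 feature map to 4x4 might look like this (a TensorFlow 1.x sketch; the shapes are illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 2, 2, 1])   # [batch, height, width, channels]
w = tf.get_variable('w', [3, 3, 1, 1])         # [height, width, out_channels, in_channels]
up = tf.nn.conv2d_transpose(
    x, w,
    output_shape=[1, 4, 4, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')                            # 2x2 -> 4x4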

3. Pooling Layer

In all cases, pooling helps to make the representation become approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change. — Page 342, Deep Learning, 2016.

Put simply, pooling boils things down; it amounts to downsampling.

Benefit 1: it decreases the computational power required to process the data, through dimensionality reduction.
Benefit 2: it helps extract dominant features that are rotationally and positionally invariant, which keeps training of the model effective.

Two kinds of pooling:

3.1 Max pooling

As the name implies: take the max() of each pooling window.

3.2 Average pooling

As the name implies: take the average() of each pooling window.
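
A quick NumPy sketch of both, with 2x2 windows and stride 2 (toy values):

import numpy as np

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])

# split the 4x4 map into non-overlapping 2x2 windows
windows = fmap.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3)

max_pooled = windows.max(axis=(2, 3))   # [[6, 8], [3, 4]]
avg_pooled = windows.mean(axis=(2, 3))  # [[3.75, 5.25], [2.0, 2.0]]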

4. Fully Connected Layer


Basically, a FC layer looks at what high level features most strongly correlate to a particular class and has particular weights so that when you compute the products between the weights and the previous layer, you get the correct probabilities for the different classes.
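
In code terms, an FC layer is just a weight matrix followed by softmax. A minimal NumPy sketch with made-up shapes (128 flattened features, 10 classes):

import numpy as np

features = np.random.rand(128)       # flattened high-level features
W = np.random.rand(10, 128)          # one row of weights per class
b = np.zeros(10)

logits = W @ features + b                        # one weighted sum per class
probs = np.exp(logits) / np.exp(logits).sum()    # softmax -> class probabilities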

5. CNN Architectures

Reference

5.1 Classic network architectures

- LeNet-5

See: Gradient-based learning applied to document recognition

- AlexNet

See: ImageNet Classification with Deep Convolutional Neural Networks

- VGG 16

See: Very Deep Convolutional Networks for Large-Scale Image Recognition

5.2 Modern network architectures

- Inception (GoogLeNet)

See: Going deeper with convolutions
