Introduction to Semantic Segmentation
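I. Definition
First, let us distinguish semantic segmentation from a few similar terms:
- Image classification: recognizes what is in the image, i.e. what the image shows
- Object detection: recognizes what is in the image and where it is, with positions given as **bounding boxes**
- Semantic segmentation: recognizes what is in the image and where it is, with positions given per pixel
In other words, object detection classifies an image **region by region**,
while semantic segmentation classifies it pixel by pixel.
If we think of the computer as a person, semantic segmentation is like that person looking at a picture and labeling part 1 as class 1, part 2 as class 2, and so on.
If image classification means classifying images, then semantic segmentation means classifying pixels.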
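To make the contrast concrete, here is a minimal sketch in plain Python. The score values, the class names, and the averaging rule used for the whole-image label are all assumptions made for this toy example; a real model would produce the scores.

```python
# Toy per-pixel class scores for a 2x3 image and 2 classes
# (hypothetical values, for illustration only).
CLASS_NAMES = ["background", "person"]

scores = [
    [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]],
    [[0.6, 0.4], [0.1, 0.9], [0.8, 0.2]],
]


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def classify_image(scores):
    """Image classification: one label for the whole image.

    Averaging the per-pixel scores is an assumption of this toy example.
    """
    n_classes = len(scores[0][0])
    pixels = [px for row in scores for px in row]
    mean = [sum(px[c] for px in pixels) / len(pixels) for c in range(n_classes)]
    return argmax(mean)


def segment_image(scores):
    """Semantic segmentation: one label per pixel."""
    return [[argmax(px) for px in row] for row in scores]


image_label = classify_image(scores)   # a single class index
label_map = segment_image(scores)      # a 2x3 grid of class indices
print(CLASS_NAMES[image_label], label_map)
```

Classification returns a single index, while segmentation returns a grid with the same height and width as the input.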
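II. Types
1. Standard semantic segmentation
Also called full-pixel semantic segmentation: the process of assigning every pixel to an object class.
2. Instance-aware semantic segmentation
A subtype of standard (full-pixel) semantic segmentation: every pixel is assigned an object class and also the ID of the individual instance of that class.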
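The difference between the two output formats can be sketched as follows. The helper `label_instances_1d` is hypothetical and only groups consecutive pixels of the same class in a single row; real instance-aware models learn the instance IDs, so this illustrates the output format, not the method.

```python
# Toy 1x6 "image": two separate cars (class 1) on background (class 0).
semantic_map = [[0, 1, 1, 0, 1, 1]]


def label_instances_1d(row, background=0):
    """Give each run of consecutive non-background pixels its own instance ID.

    Returns one (class_id, instance_id) pair per pixel; background pixels
    get instance_id 0.
    """
    result = []
    next_id = 0
    prev = background
    for cls in row:
        if cls == background:
            result.append((cls, 0))
        else:
            if cls != prev:
                next_id += 1          # a new object starts here
            result.append((cls, next_id))
        prev = cls
    return result


instance_map = [label_instances_1d(row) for row in semantic_map]
print(instance_map)
```

In `semantic_map` both cars share the label 1; in `instance_map` they keep class 1 but receive different instance IDs.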
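III. Applications
1. Land surveying: distinguishing land types.
2. Autonomous driving: identifying which parts of the scene are people, which are vehicles, and so on.
3. Face segmentation: distinguishing a person's facial attributes (for example, for face recognition).
4. Fashion: classifying clothing.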
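IV. How it works
The pipeline loosely resembles a GAN. First comes feature extraction: a convolutional neural network turns the image into feature maps (similar to a GAN's discriminator). Then deconvolution (transposed convolution) or upsampling expands those feature maps back toward the resolution of the original image (similar to a GAN's generator). If a GAN is G+D, semantic segmentation is roughly **"D+G"**, that is, an encoder followed by a decoder, and there is no notion of fake_data versus real_data. The analogy is purely structural: the decoder outputs a per-pixel class map rather than a copy of the input image.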
V. Code walkthrough
encoder (the encoder part)
from keras.models import *
from keras.layers import *
import keras.backend as K
import keras
IMAGE_ORDERING = 'channels_last'
def relu6(x):
    """ReLU capped at 6: returns element-wise `min(max(x, 0), 6)`.

    # Arguments
        x: A tensor or variable.
    # Returns
        A tensor.

    `K.relu` also accepts `alpha` (a scalar, the slope of the negative
    section, i.e. the left part of the curve, default `0.`) and
    `max_value` (the saturation threshold, set to 6 here).
    """
    return K.relu(x, max_value=6)
# _conv_block is the plain first convolution of MobileNet
# (ZeroPadding2D -> Conv2D -> BatchNormalization -> ReLU6).
# Unlike a ResNet block, it has no residual (skip) connection.
def _conv_block(inputs, filters, alpha, kernel=(3, 3), strides=(1, 1)):
    channel_axis = 1 if IMAGE_ORDERING == 'channels_first' else -1
    filters = int(filters * alpha)
    # From the ZeroPadding2D documentation:
    #   Zero-padding layer for 2D input (e.g. picture).
    #   This layer can add rows and columns of zeros
    #   at the top, bottom, left and right side of an image tensor.
    # In other words, it makes the image "fatter" by adding a border of
    # zeros, so the following 3x3 'valid' convolution does not shrink
    # the feature map (apart from the effect of the stride).
    # zeropadding-Conv2D-BN
    x = ZeroPadding2D(padding=(1, 1), name='conv1_pad', data_format=IMAGE_ORDERING)(inputs)
    x = Conv2D(filters, kernel, data_format=IMAGE_ORDERING, padding='valid', use_bias=False, strides=strides, name='conv1')(x)
    # BN normalizes the activations so that the distribution of each
    # layer's input stays stable during training.
    x = BatchNormalization(axis=channel_axis, name='conv1_bn')(x)
    return Activation(relu6, name='conv1_relu')(x)
def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha, depth_multiplier=1, strides=(1, 1), block_id=1):
    channel_axis = 1 if IMAGE_ORDERING == 'channels_first' else -1
    pointwise_conv_filters = int(pointwise_conv_filters * alpha)
    # Depthwise step: one 3x3 filter per input channel
    x = ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING, name='conv_pad_%d' % block_id)(inputs)
    x = DepthwiseConv2D((3, 3), data_format=IMAGE_ORDERING, padding='valid', depth_multiplier=depth_multiplier, strides=strides, use_bias=False, name='conv_dw_%d' % block_id)(x)
    x = BatchNormalization(axis=channel_axis, name='conv_dw_%d_bn' % block_id)(x)
    x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)
    # Pointwise step: a 1x1 convolution that mixes the channels
    x = Conv2D(pointwise_conv_filters, (1, 1), data_format=IMAGE_ORDERING,
               padding='same',
               use_bias=False,
               strides=(1, 1),
               name='conv_pw_%d' % block_id)(x)
    x = BatchNormalization(axis=channel_axis, name='conv_pw_%d_bn' % block_id)(x)
    return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x)
# The blocks above are the building pieces for MobileNet;
# get_mobilenet_encoder is the backbone in the architecture diagram.
# Note: `pretrained` and `dropout` are not used in this excerpt.
def get_mobilenet_encoder(input_height=224, input_width=224, pretrained='imagenet'):
    alpha = 1.0
    depth_multiplier = 1
    dropout = 1e-3
    img_input = Input(shape=(input_height, input_width, 3))
    x = _conv_block(img_input, 32, alpha, strides=(2, 2))
    x = _depthwise_conv_block(x, 64, alpha, depth_multiplier, block_id=1)
    f1 = x  # 1/2 of the input height and width
    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, strides=(2, 2), block_id=2)
    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, block_id=3)
    f2 = x  # 1/4
    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, strides=(2, 2), block_id=4)
    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, block_id=5)
    f3 = x  # 1/8
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, strides=(2, 2), block_id=6)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=7)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=8)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=9)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=10)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=11)
    f4 = x  # 1/16
    x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, strides=(2, 2), block_id=12)
    x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13)
    f5 = x  # 1/32
    # Returning f1..f5 lets the decoder choose among feature maps at different scales
    return img_input, [f1, f2, f3, f4, f5]
# The encoder is now complete: each input image has been turned into the
# feature maps that the decoder needs to produce a per-pixel class map.
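The spatial sizes of f1 to f5 can be checked by hand. The sketch below assumes a square input and only models the layers that change height and width: each block is a `ZeroPadding2D((1, 1))` followed by a 3x3 'valid' convolution with the stride used in the encoder above. The helper names are hypothetical.

```python
def padded_conv_out(size, kernel=3, stride=1, pad=1):
    """Output length of ZeroPadding2D(pad) followed by a 'valid' convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def encoder_feature_sizes(size):
    """Side lengths of f1..f5 for a square input of the given size."""
    # Strides of the stem convolution and of depthwise blocks 1..13, in order.
    strides = [2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]
    # Positions in `strides` after which f1..f5 are taken.
    taps = {1, 3, 5, 11, 13}
    sizes = []
    for i, stride in enumerate(strides):
        size = padded_conv_out(size, stride=stride)
        if i in taps:
            sizes.append(size)
    return sizes


print(encoder_feature_sizes(416))  # side lengths of f1..f5
print(encoder_feature_sizes(224))
```

Each stride-2 block halves the height and width, so f1 to f5 sit at 1/2, 1/4, 1/8, 1/16 and 1/32 of the input resolution. The 1x1 pointwise convolutions are left out because they do not change the spatial size.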
decoder (the decoder part)
from keras.models import *
from keras.layers import *
from encoder import get_mobilenet_encoder
IMAGE_ORDERING = 'channels_last'
# An assert statement does nothing when its condition is true, and the
# program continues normally; when the condition is false, it raises
# an AssertionError.
def segnet_decoder(f, n_classes, n_up=3):
    assert n_up >= 2
    o = f
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(512, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)
    # First UpSampling2D: height and width become 1/8 of the input
    # 52,52,512 (for a 416x416 input with encoder_level=3)
    # From the UpSampling2D documentation:
    #   Upsampling layer for 2D inputs.
    #   Repeats the rows and columns of the data
    #   by size[0] and size[1] respectively.
    # Like zero padding, upsampling makes the feature map larger, but it
    # enlarges the image by repeating existing values instead of adding
    # zeros. Common forms of upsampling: predefined interpolation,
    # deconvolution (transposed convolution), and sub-pixel layers.
    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(256, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)
    # Another UpSampling2D: height and width become 1/4 of the input
    # 104,104,256
    for _ in range(n_up - 2):
        o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
        o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
        o = (Conv2D(128, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
        o = (BatchNormalization())(o)
    # Another UpSampling2D: height and width become 1/2 of the input
    # 208,208,128
    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(64, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)
    # The output is now h_input/2, w_input/2, n_classes
    o = Conv2D(n_classes, (3, 3), padding='same', data_format=IMAGE_ORDERING)(o)
    return o
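def _segnet(n_classes, encoder, input_height=416, input_width=608, encoder_level=3):
    # Run the input through the backbone (encoder)
    img_input, levels = encoder(input_height=input_height, input_width=input_width)
    # Take the feature map whose height and width have been halved four times
    feat = levels[encoder_level]
    # Pass the features to the SegNet decoder
    o = segnet_decoder(feat, n_classes, n_up=3)
    # Flatten the spatial dimensions: (h/2, w/2, n_classes) -> (h/2 * w/2, n_classes)
    o = Reshape((int(input_height / 2) * int(input_width / 2), -1))(o)
    # Softmax over the class axis gives per-pixel class probabilities
    o = Softmax()(o)
    model = Model(img_input, o)
    return model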
def mobilenet_segnet(n_classes, input_height=224, input_width=224, encoder_level=3):
    model = _segnet(n_classes, get_mobilenet_encoder, input_height=input_height, input_width=input_width,
                    encoder_level=encoder_level)
    model.model_name = "mobilenet_segnet"
    return model
# How the three functions above relate: the first builds the basic
# structure of the decoder and is used by the second; the second connects
# it to the encoder and post-processes the result; the third sets the
# model name. The first function is the core.
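The decoder's shape bookkeeping can be sketched in a few lines. This assumes, as in the code above, that every padded 3x3 convolution keeps the spatial size and that only `UpSampling2D((2, 2))` changes it. The function names are hypothetical.

```python
def decoder_output_size(feat_size, n_up=3):
    """Side length after segnet_decoder.

    There is one upsampling before the loop, n_up - 2 inside it and one
    after it, so n_up doublings in total.
    """
    assert n_up >= 2
    return feat_size * 2 ** n_up


def flattened_pixels(input_height, input_width):
    """First dimension of the Reshape target used in _segnet."""
    return int(input_height / 2) * int(input_width / 2)


input_size = 416
feat_size = input_size // 16            # f4, i.e. encoder_level=3
out_size = decoder_output_size(feat_size, n_up=3)
print(out_size, flattened_pixels(input_size, input_size))
```

With `encoder_level=3` and `n_up=3` the decoder output is exactly half the input resolution, which is what the `Reshape` target in `_segnet` expects. Choosing another `encoder_level` without adjusting `n_up` would presumably break that match, so treat the two values as a pair.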
main
from decoder import mobilenet_segnet
model = mobilenet_segnet(2, input_height=416, input_width=416)
model.summary()
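The model returns one row of class probabilities per pixel of the half-resolution output, flattened in row-major order. Below is a minimal sketch of turning such a flattened prediction back into a 2D label map. The function `to_label_map` and the toy probabilities are hypothetical; with NumPy one would more likely reshape the prediction and take an argmax over the last axis.

```python
def to_label_map(probs, height, width):
    """Turn flattened per-pixel class probabilities into a 2D label map.

    probs: list of length height * width; each item is a list of class
    probabilities (row-major order, as produced by reshaping a
    (height, width, n_classes) tensor to (height * width, n_classes)).
    """
    assert len(probs) == height * width
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return [labels[r * width:(r + 1) * width] for r in range(height)]


# Toy prediction for a 2x2 output with 2 classes (hypothetical values).
flat = [[0.9, 0.1], [0.3, 0.7], [0.4, 0.6], [0.8, 0.2]]
label_map = to_label_map(flat, 2, 2)
print(label_map)
```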
summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 416, 416, 3) 0
_________________________________________________________________
conv1_pad (ZeroPadding2D) (None, 418, 418, 3) 0
_________________________________________________________________
conv1 (Conv2D) (None, 208, 208, 32) 864
_________________________________________________________________
conv1_bn (BatchNormalization (None, 208, 208, 32) 128
_________________________________________________________________
conv1_relu (Activation) (None, 208, 208, 32) 0
_________________________________________________________________
conv_pad_1 (ZeroPadding2D) (None, 210, 210, 32) 0
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D) (None, 208, 208, 32) 288
_________________________________________________________________
conv_dw_1_bn (BatchNormaliza (None, 208, 208, 32) 128
_________________________________________________________________
conv_dw_1_relu (Activation) (None, 208, 208, 32) 0
_________________________________________________________________
conv_pw_1 (Conv2D) (None, 208, 208, 64) 2048
_________________________________________________________________
conv_pw_1_bn (BatchNormaliza (None, 208, 208, 64) 256
_________________________________________________________________
conv_pw_1_relu (Activation) (None, 208, 208, 64) 0
_________________________________________________________________
conv_pad_2 (ZeroPadding2D) (None, 210, 210, 64) 0
_________________________________________________________________
conv_dw_2 (DepthwiseConv2D) (None, 104, 104, 64) 576
_________________________________________________________________
conv_dw_2_bn (BatchNormaliza (None, 104, 104, 64) 256
_________________________________________________________________
conv_dw_2_relu (Activation) (None, 104, 104, 64) 0
_________________________________________________________________
conv_pw_2 (Conv2D) (None, 104, 104, 128) 8192
_________________________________________________________________
conv_pw_2_bn (BatchNormaliza (None, 104, 104, 128) 512
_________________________________________________________________
conv_pw_2_relu (Activation) (None, 104, 104, 128) 0
_________________________________________________________________
conv_pad_3 (ZeroPadding2D) (None, 106, 106, 128) 0
_________________________________________________________________
conv_dw_3 (DepthwiseConv2D) (None, 104, 104, 128) 1152
_________________________________________________________________
conv_dw_3_bn (BatchNormaliza (None, 104, 104, 128) 512
_________________________________________________________________
conv_dw_3_relu (Activation) (None, 104, 104, 128) 0
_________________________________________________________________
conv_pw_3 (Conv2D) (None, 104, 104, 128) 16384
_________________________________________________________________
conv_pw_3_bn (BatchNormaliza (None, 104, 104, 128) 512
_________________________________________________________________
conv_pw_3_relu (Activation) (None, 104, 104, 128) 0
_________________________________________________________________
conv_pad_4 (ZeroPadding2D) (None, 106, 106, 128) 0
_________________________________________________________________
conv_dw_4 (DepthwiseConv2D) (None, 52, 52, 128) 1152
_________________________________________________________________
conv_dw_4_bn (BatchNormaliza (None, 52, 52, 128) 512
_________________________________________________________________
conv_dw_4_relu (Activation) (None, 52, 52, 128) 0
_________________________________________________________________
conv_pw_4 (Conv2D) (None, 52, 52, 256) 32768
_________________________________________________________________
conv_pw_4_bn (BatchNormaliza (None, 52, 52, 256) 1024
_________________________________________________________________
conv_pw_4_relu (Activation) (None, 52, 52, 256) 0
_________________________________________________________________
conv_pad_5 (ZeroPadding2D) (None, 54, 54, 256) 0
_________________________________________________________________
conv_dw_5 (DepthwiseConv2D) (None, 52, 52, 256) 2304
_________________________________________________________________
conv_dw_5_bn (BatchNormaliza (None, 52, 52, 256) 1024
_________________________________________________________________
conv_dw_5_relu (Activation) (None, 52, 52, 256) 0
_________________________________________________________________
conv_pw_5 (Conv2D) (None, 52, 52, 256) 65536
_________________________________________________________________
conv_pw_5_bn (BatchNormaliza (None, 52, 52, 256) 1024
_________________________________________________________________
conv_pw_5_relu (Activation) (None, 52, 52, 256) 0
_________________________________________________________________
conv_pad_6 (ZeroPadding2D) (None, 54, 54, 256) 0
_________________________________________________________________
conv_dw_6 (DepthwiseConv2D) (None, 26, 26, 256) 2304
_________________________________________________________________
conv_dw_6_bn (BatchNormaliza (None, 26, 26, 256) 1024
_________________________________________________________________
conv_dw_6_relu (Activation) (None, 26, 26, 256) 0
_________________________________________________________________
conv_pw_6 (Conv2D) (None, 26, 26, 512) 131072
_________________________________________________________________
conv_pw_6_bn (BatchNormaliza (None, 26, 26, 512) 2048
_________________________________________________________________
conv_pw_6_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pad_7 (ZeroPadding2D) (None, 28, 28, 512) 0
_________________________________________________________________
conv_dw_7 (DepthwiseConv2D) (None, 26, 26, 512) 4608
_________________________________________________________________
conv_dw_7_bn (BatchNormaliza (None, 26, 26, 512) 2048
_________________________________________________________________
conv_dw_7_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pw_7 (Conv2D) (None, 26, 26, 512) 262144
_________________________________________________________________
conv_pw_7_bn (BatchNormaliza (None, 26, 26, 512) 2048
_________________________________________________________________
conv_pw_7_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pad_8 (ZeroPadding2D) (None, 28, 28, 512) 0
_________________________________________________________________
conv_dw_8 (DepthwiseConv2D) (None, 26, 26, 512) 4608
_________________________________________________________________
conv_dw_8_bn (BatchNormaliza (None, 26, 26, 512) 2048
_________________________________________________________________
conv_dw_8_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pw_8 (Conv2D) (None, 26, 26, 512) 262144
_________________________________________________________________
conv_pw_8_bn (BatchNormaliza (None, 26, 26, 512) 2048
_________________________________________________________________
conv_pw_8_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pad_9 (ZeroPadding2D) (None, 28, 28, 512) 0
_________________________________________________________________
conv_dw_9 (DepthwiseConv2D) (None, 26, 26, 512) 4608
_________________________________________________________________
conv_dw_9_bn (BatchNormaliza (None, 26, 26, 512) 2048
_________________________________________________________________
conv_dw_9_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pw_9 (Conv2D) (None, 26, 26, 512) 262144
_________________________________________________________________
conv_pw_9_bn (BatchNormaliza (None, 26, 26, 512) 2048
_________________________________________________________________
conv_pw_9_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pad_10 (ZeroPadding2D) (None, 28, 28, 512) 0
_________________________________________________________________
conv_dw_10 (DepthwiseConv2D) (None, 26, 26, 512) 4608
_________________________________________________________________
conv_dw_10_bn (BatchNormaliz (None, 26, 26, 512) 2048
_________________________________________________________________
conv_dw_10_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pw_10 (Conv2D) (None, 26, 26, 512) 262144
_________________________________________________________________
conv_pw_10_bn (BatchNormaliz (None, 26, 26, 512) 2048
_________________________________________________________________
conv_pw_10_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pad_11 (ZeroPadding2D) (None, 28, 28, 512) 0
_________________________________________________________________
conv_dw_11 (DepthwiseConv2D) (None, 26, 26, 512) 4608
_________________________________________________________________
conv_dw_11_bn (BatchNormaliz (None, 26, 26, 512) 2048
_________________________________________________________________
conv_dw_11_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
conv_pw_11 (Conv2D) (None, 26, 26, 512) 262144
_________________________________________________________________
conv_pw_11_bn (BatchNormaliz (None, 26, 26, 512) 2048
_________________________________________________________________
conv_pw_11_relu (Activation) (None, 26, 26, 512) 0
_________________________________________________________________
zero_padding2d_1 (ZeroPaddin (None, 28, 28, 512) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 26, 26, 512) 2359808
_________________________________________________________________
batch_normalization_1 (Batch (None, 26, 26, 512) 2048
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 52, 52, 512) 0
_________________________________________________________________
zero_padding2d_2 (ZeroPaddin (None, 54, 54, 512) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 52, 52, 256) 1179904
_________________________________________________________________
batch_normalization_2 (Batch (None, 52, 52, 256) 1024
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 104, 104, 256) 0
_________________________________________________________________
zero_padding2d_3 (ZeroPaddin (None, 106, 106, 256) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 104, 104, 128) 295040
_________________________________________________________________
batch_normalization_3 (Batch (None, 104, 104, 128) 512
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 208, 208, 128) 0
_________________________________________________________________
zero_padding2d_4 (ZeroPaddin (None, 210, 210, 128) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 208, 208, 64) 73792
_________________________________________________________________
batch_normalization_4 (Batch (None, 208, 208, 64) 256
_________________________________________________________________
conv2d_5 (Conv2D) (None, 208, 208, 2) 1154
_________________________________________________________________
reshape_1 (Reshape) (None, 43264, 2) 0
_________________________________________________________________
softmax_1 (Softmax) (None, 43264, 2) 0
=================================================================
Total params: 5,541,378
Trainable params: 5,524,738
Non-trainable params: 16,640
_________________________________________________________________
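Several parameter counts in the summary can be reproduced by hand, which also shows why MobileNet's depthwise separable blocks are cheap. The sketch assumes `depth_multiplier=1` and no bias in the encoder convolutions, as in the code above. The helper names are hypothetical.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Parameters of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out):
    """Parameters of a depthwise k x k conv plus a 1x1 pointwise conv (no bias)."""
    depthwise = k * k * c_in     # one k x k filter per input channel
    pointwise = c_in * c_out     # 1x1 convolution mixing the channels
    return depthwise + pointwise


print(conv_params(3, 3, 32))                    # conv1
print(depthwise_separable_params(3, 32, 64))    # conv_dw_1 + conv_pw_1
print(conv_params(3, 32, 64))                   # a standard 3x3 conv, for comparison
print(conv_params(3, 512, 512, bias=True))      # conv2d_1 in the decoder
```

For block 1 the depthwise separable version needs 288 + 2048 = 2336 weights, where a standard 3x3 convolution with the same channel counts would need 18432. The decoder uses ordinary convolutions with bias, which is why `conv2d_1` alone accounts for over two million parameters.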
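VI. Notes on some terms
1. ReLU and other activation functions
When I first met the ReLU function, my understanding of it was simply: it works well, so use it. As for why we need ReLU, or any activation function at all, I did not find a clear explanation even after reading many articles. After writing this kind of code many times I understand it a little better (though not by much). The idea is fairly simple. Suppose we just chain network layers one after another, e.g.: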
x1_1 = Conv2D(........)(inputs)
x2_1 = MaxPooling2D(......)(x1_1)
x1_2 = Conv2D(........)(x2_1)
x2_2 = MaxPooling2D(......)(x1_2)
......
Setting the pooling aside, convolutions stacked without any activation amount to a simple linear relationship:
$$x_{i-1}=kx_i$$
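A stack of purely linear layers can only ever express a linear function, however deep it is. But what if we want to process each layer differently and then merge the results, as a residual structure does? (This is only an analogy, given the limits of my understanding.) Handling the layers one at a time risks overwriting variables and losing data, and making copies everywhere is tedious. So we introduce a function f(x) that is applied directly to the result: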
$$\text{Residual structure:}\quad y=x+H(x)$$
Here x is the input to a layer (or an intermediate result), and H(x) is the output of that layer.
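This gives us many ways to express nonlinear functions. Researchers also noticed that such functions happen to resemble how biological neurons activate (ReLU in particular is often described as biologically inspired). That is where activation functions come from.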
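The point about linearity can be checked numerically. In this sketch, one-dimensional "layers" of the form k * x + b stand in for convolutions; the coefficients are arbitrary assumptions. Two stacked linear layers collapse into a single linear layer, and putting a ReLU between them breaks that equivalence. The last helper shows the residual form y = x + H(x).

```python
def linear(k, b):
    """A 1-D 'layer' computing k * x + b."""
    return lambda x: k * x + b


def relu(x):
    return max(0.0, x)


layer1 = linear(2.0, 1.0)
layer2 = linear(-3.0, 0.5)


def stacked(x):
    """Two linear layers with no activation in between."""
    return layer2(layer1(x))


def collapsed(x):
    """The single linear layer equivalent to `stacked`: -3(2x + 1) + 0.5."""
    return -6.0 * x - 2.5


def stacked_relu(x):
    """The same two layers with a ReLU in between."""
    return layer2(relu(layer1(x)))


def residual(h):
    """Wrap a function H into the residual form y = x + H(x)."""
    return lambda x: x + h(x)


for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, stacked(x), collapsed(x), stacked_relu(x))
```

For inputs where the first layer's output is negative, `stacked_relu` differs from `collapsed`, so the network is no longer equivalent to one linear layer.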
Here I will briefly go through the common activation functions (so that I do not forget them later):
Sigmoid (commonly used for binary classification)
$$f(x)=\frac{1}{1+e^{-x}}$$
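Advantages:
- Simple derivative: $f'(x)=f(x)(1-f(x))$
- Differentiable everywhere on its domain
- Not a "pseudo-nonlinear" function, i.e. it does not behave like $f(x)\approx x$
- Saturating activation function: $\lim_{x\to\infty} f(x)=1$ and $\lim_{x\to-\infty}f(x)=0$
- Monotonic
Disadvantages:
- Computationally expensive (it involves an exponential)
- The output is not zero-centered, which makes weight updates less efficient. The next layer also receives the previous layer's output as an input with non-zero mean, and as the network gets deeper this shifts the original data distribution (which is what BN is used to address)
- The derivative lies in (0, 0.25], which is very small. Backpropagation multiplies by this small value at every layer, so in a very deep network the gradient vanishes ($f'(x)\to0$) and the parameters of the shallow layers can no longer be updated.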
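The derivative bound and the vanishing-gradient effect are easy to verify. This is a minimal sketch using only the standard library. `max_gradient_through` is a hypothetical helper that ignores the weights and simply multiplies the largest possible sigmoid derivative once per layer.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def max_gradient_through(n_layers):
    """Upper bound on the product of n sigmoid derivatives (weights ignored)."""
    return 0.25 ** n_layers


print(sigmoid(0.0), sigmoid_grad(0.0))
print(sigmoid_grad(5.0))
print(max_gradient_through(10))
```

The derivative peaks at 0.25 for x = 0 and shrinks quickly as |x| grows, so after ten layers the bound is already below one millionth.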
Tanh (hyperbolic tangent)
$$f(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}},\qquad f'(x)=1-(f(x))^2$$
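Advantages:
- Saturating activation function
- Not a "pseudo-nonlinear" activation function
- Monotonic
- Defined on all real numbers, with outputs in (-1, 1)
Disadvantages:
- Computationally expensive
- Does not fundamentally solve the vanishing-gradient problem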
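A short check of the formulas above, using the standard library's `math.tanh`. The identity tanh(x) = 2 * sigmoid(2x) - 1 used in `tanh_from_sigmoid` is a standard fact rather than something stated in this article; it is included to show that tanh is a rescaled, zero-centered sigmoid.

```python
import math


def tanh_grad(x):
    """Derivative f'(x) = 1 - f(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t


def tanh_from_sigmoid(x):
    """tanh expressed through the logistic function: 2 * sigmoid(2x) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


print(math.tanh(0.7), tanh_from_sigmoid(0.7))
print(tanh_grad(0.0), tanh_grad(5.0))
```

The derivative is 1 at the origin, larger than sigmoid's 0.25, but it still tends to 0 for large |x|, which is why tanh only eases the vanishing-gradient problem instead of removing it.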
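ReLU
ReLU stands for "rectified linear unit". It takes the element-wise maximum of the input x (for example a convolved feature map) and 0: every negative value in x is set to zero and all other values are left unchanged.
$$f(x)=\max(0,x)$$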
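Advantages:
- Nonlinear (linear on one side only)
- Very cheap to compute
- Not a "pseudo-nonlinear" function
- Monotonic (strictly increasing on the right side)
Disadvantages:
- Can cause neurons to "die" (I have not looked into this yet): a neuron whose input stays negative outputs 0 and receives zero gradient, so it stops updating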
Variant: Leaky ReLU
$$f(x)=\max(\alpha x,x)$$
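The ReLU family is simple enough to write out directly. This sketch works on scalars; the default `alpha=0.01` is an assumed, commonly used value (the formula above only makes sense for 0 < alpha < 1), and `relu6` mirrors the capped version used in the encoder code.

```python
def relu(x):
    """max(0, x)"""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """max(alpha * x, x), assuming 0 < alpha < 1."""
    return max(alpha * x, x)


def relu6(x):
    """ReLU capped at 6, as used in MobileNet."""
    return min(max(0.0, x), 6.0)


def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


for x in (-2.0, 0.5, 10.0):
    print(x, relu(x), leaky_relu(x), relu6(x), relu_grad(x))
```

For negative inputs ReLU has zero output and zero gradient, which is the source of "dead" neurons; Leaky ReLU keeps a small slope there so some gradient still flows.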
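Softmax
Softmax maps a set of inputs to real numbers between 0 and 1 and normalizes them so that they sum to 1.
Suppose we have an array V, and $V_i$ is its i-th element. The softmax value $S_i$ of that element is:
$$S_i=\frac{e^{V_i}}{\sum_j e^{V_j}}$$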
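A minimal softmax in plain Python. Subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result, because the common factor cancels between numerator and denominator.

```python
import math


def softmax(values):
    """Softmax of a list of numbers: exp(v_i) / sum_j exp(v_j)."""
    m = max(values)                            # shift for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
print(softmax([1000.0, 1000.0]))  # does not overflow thanks to the shift
```

In the segmentation model above, this function is applied independently to each pixel's vector of class scores.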
2. Zero padding
"""
Zero-padding layer for 2D input (e.g. picture).
This layer can add rows and columns of zeros
at the top, bottom, left and right side of an image tensor.
"""
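Taken literally, zero padding means "filling with zeros": the image to be processed is surrounded by a border of zeros, which increases its height and width (not its number of dimensions).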
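What `ZeroPadding2D((1, 1))` does to a single-channel image can be sketched with nested lists. The function `zero_pad_2d` is a hypothetical stand-in, not the Keras implementation.

```python
def zero_pad_2d(image, pad=1):
    """Add `pad` rows and columns of zeros on every side of a 2D list."""
    width = len(image[0])
    zero_row = [0] * (width + 2 * pad)
    padded = [list(zero_row) for _ in range(pad)]           # top border
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)    # left and right borders
    padded.extend(list(zero_row) for _ in range(pad))       # bottom border
    return padded


image = [[1, 2],
         [3, 4]]
for row in zero_pad_2d(image):
    print(row)
```

A 2x2 image becomes 4x4. With this border in place, a 3x3 'valid' convolution with stride 1 returns an output of the original 2x2 size, which is how the encoder and decoder above use it.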
3. Upsampling
"""
Upsampling layer for 2D inputs.
Repeats the rows and columns of the data
by size[0] and size[1] respectively.
"""
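Like zero padding, upsampling makes each feature map "fatter", but it enlarges the image by repeating existing values instead of adding zeros. Common forms of upsampling are predefined interpolation, deconvolution (transposed convolution), and sub-pixel layers.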
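The behavior described in the docstring above, repeating rows and columns, can be sketched with nested lists. The function `upsample_2d` is a hypothetical illustration of this nearest-neighbor repetition, not the Keras implementation.

```python
def upsample_2d(image, size=(2, 2)):
    """Repeat the rows of a 2D list size[0] times and the columns size[1] times."""
    row_factor, col_factor = size
    out = []
    for row in image:
        wide = [v for v in row for _ in range(col_factor)]   # repeat columns
        out.extend(list(wide) for _ in range(row_factor))    # repeat rows
    return out


image = [[1, 2],
         [3, 4]]
for row in upsample_2d(image):
    print(row)
```

Each pixel becomes a 2x2 block, so height and width double. No new information is created, which is why the decoder follows every upsampling with a convolution to refine the enlarged feature map.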