1. Pre-training
2. Random initialization
3. Xavier initialization
Xavier's derivation rests on two assumptions. One is that the activation function is linear, which does not hold for ReLU. The other is that the activations are symmetric about zero, which holds for neither sigmoid nor ReLU; in practice, however, it can still give good results with sigmoid.
Source paper: 《Understanding the difficulty of training deep feedforward neural networks》.
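A minimal NumPy sketch of what Xavier/Glorot (uniform) initialization does; the function name is my own, and the check at the end only illustrates the derivation's goal of keeping the pre-activation variance near the input variance:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    # Glorot/Xavier uniform: Var(W) = 2 / (fan_in + fan_out),
    # i.e. samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Rough check: with unit-variance inputs and a (near-)linear activation,
# the pre-activation variance stays close to 1 across the layer.
x = rng.standard_normal((1000, 256))
w = xavier_uniform(256, 256)
y = x @ w
print(round(float(x.var()), 2), round(float(y.var()), 2))
```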
4. He initialization
A variant of Xavier initialization designed for ReLU. The idea behind He initialization: in a ReLU network, roughly half of each layer's neurons are active and the other half output 0, so to keep the variance constant it suffices to divide Xavier's scaling by an extra factor of 2. When using ReLU (without BN), He initialization is usually the best choice; the method was designed primarily for ReLU activations.
Paper: Kaiming He et al., Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
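The "divide by 2" idea can be sketched in NumPy (function name is my own); pushing data through several ReLU layers shows the mean squared activation staying roughly constant instead of halving per layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def he_normal(fan_in, fan_out):
    # He/Kaiming normal: Var(W) = 2 / fan_in -- Xavier's variance doubled,
    # because ReLU zeroes out roughly half of each layer's activations
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

relu = lambda z: np.maximum(z, 0.0)

# Push unit-second-moment data through several ReLU layers: with He init
# the mean squared activation stays near 1 rather than shrinking each layer.
x = rng.standard_normal((1000, 512))
for _ in range(5):
    x = relu(x @ he_normal(512, 512))
ms = float((x ** 2).mean())
print(round(ms, 2))
```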
5. Batch Normalization
Goal: keep the distribution of each layer's inputs the same across training steps.
Method: computed via the BN formula.
Keras test code:
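The inference-time formula can be checked by hand with NumPy before running the Keras test; `batch_norm_inference` is a hypothetical helper of mine, the constants match the initializers used in the Keras model below, and 1e-3 is Keras's default `epsilon`:

```python
import numpy as np

def batch_norm_inference(x, mean, var, gamma, beta, epsilon=1e-3):
    # Keras BatchNormalization at inference time:
    # output = (x - moving_mean) / sqrt(moving_var + epsilon) * gamma + beta
    return (x - mean) / np.sqrt(var + epsilon) * gamma + beta

# Same constants the Keras test initializes the BN layer with:
x = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 3.1]])
gamma, beta = 0.2, 0.1
moving_mean, moving_var = 0.3, 0.4
out = batch_norm_inference(x, moving_mean, moving_var, gamma, beta)
print(out)
```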
import numpy as np
import keras
from keras.layers import Dense, BatchNormalization, PReLU
from keras.models import Model

def ObtainLayerOutput(input_model, input_layer_name, input_data):
    # Build a sub-model that ends at the named layer and run inference on it
    target_layer = Model(inputs=input_model.input,
                         outputs=input_model.get_layer(input_layer_name).output)
    layer_output = target_layer.predict(input_data)
    return layer_output

def ObtainLayerWeightsAndBias(model, input_layer_name):
    # Fetch the weights (and bias, if present) of the named layer
    weights = model.get_layer(input_layer_name).get_weights()
    return weights

def My_Initializer():
    # Candidate initializers; only the last one is returned
    m_Zeros = keras.initializers.Zeros()
    m_Ones = keras.initializers.Ones()
    m_Constant = keras.initializers.Constant(value=1.1)
    m_RandomNormal = keras.initializers.RandomNormal(mean=0, stddev=2.0, seed=0)
    return m_RandomNormal

def func_model():
    # Define a 3-5-2 perceptron with a BN layer right after the input
    IN = keras.layers.Input(shape=(3,))
    m_kernel_initial = My_Initializer()
    m_beta_initializer = keras.initializers.Constant(value=0.1)
    m_gamma_initializer = keras.initializers.Constant(value=0.2)
    m_moving_mean_initializer = keras.initializers.Constant(value=0.3)
    m_moving_variance_initializer = keras.initializers.Constant(value=0.4)
    m_bn = BatchNormalization(beta_initializer=m_beta_initializer,
                              gamma_initializer=m_gamma_initializer,
                              moving_mean_initializer=m_moving_mean_initializer,
                              moving_variance_initializer=m_moving_variance_initializer)(IN)
    m_bn = PReLU()(m_bn)
    # use_bias=False, so no bias initializer is needed on this layer
    HIDDEN = keras.layers.Dense(5, use_bias=False, activation='relu',
                                kernel_initializer=m_kernel_initial)(m_bn)
    OUT = keras.layers.Dense(2, activation='sigmoid',
                             kernel_initializer='ones',
                             bias_initializer='zeros')(HIDDEN)
    model1 = keras.models.Model(inputs=IN, outputs=OUT)
    model1.summary()
    return model1

"""
Test the BatchNormalization layer; at inference time its output follows
    output = (x - moving_mean) / sqrt(moving_var + epsilon) * gamma + beta
"""
def TestBatchNormalization():
    model = func_model()
    input_data = np.ones((2, 3), dtype=np.float32)
    input_data[0][0] = 1.0
    input_data[0][1] = 2.0
    input_data[0][2] = 3.0
    input_data[1][0] = 1.1
    input_data[1][1] = 2.1
    input_data[1][2] = 3.1
    print("input_data", input_data, np.shape(input_data))
    m_batch_normalization_1 = ObtainLayerWeightsAndBias(model, 'batch_normalization_1')
    print("m_batch_normalization_1", m_batch_normalization_1,
          np.shape(m_batch_normalization_1))
    m_output = ObtainLayerOutput(model, "batch_normalization_1", input_data)
    print("m_output", m_output)

if __name__ == '__main__':
    # 2. Verify the batch_normalization_1 layer
    TestBatchNormalization()
References
(1) How to apply BN in a network:
https://blog.csdn.net/appleml/article/details/79166695
(2) Good explanation of the underlying principle:
https://blog.csdn.net/hjimce/article/details/50866313
(3) Also well explained:
https://www.cnblogs.com/hellcat/articles/7220040.html
6. Zero initialization
Initializing all parameters to zero is generally only used when training linear regression / logistic regression models.
In a neural network it makes every neuron in a layer compute the same output (and receive the same gradient), so all neurons stay identical.
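A minimal NumPy sketch of why zero initialization is acceptable for logistic regression but breaks a multi-layer network (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))             # small input batch
t = np.array([[1.0], [0.0], [1.0], [0.0]])  # binary targets
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Logistic regression: zero init is fine -- the gradient depends on x,
# so the (single) weight vector still gets a useful first update.
w = np.zeros((3, 1))
grad_w = x.T @ (sigmoid(x @ w) - t)         # non-zero
print(np.any(grad_w != 0))                  # True

# Two-layer network: every hidden neuron gets the same pre-activation,
# so outputs and gradients are identical and symmetry is never broken
# (here the gradients are exactly zero, so nothing learns at all).
W1, W2 = np.zeros((3, 5)), np.zeros((5, 1))
h = np.tanh(x @ W1)                         # all zeros
err = sigmoid(h @ W2) - t
grad_W2 = h.T @ err                         # all zeros
grad_h = err @ W2.T                         # all zeros -> grad_W1 zero too
print(np.all(grad_W2 == 0), np.all(grad_h == 0))  # True True
```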
References:
Why zero initialization is used:
(1) https://www.cnblogs.com/lky-learning/p/10830223.html
Why logistic regression can be initialized with zeros:
(2) https://www.jianshu.com/p/02b529634868
(3)
1. Pros and cons of each initialization method:
https://blog.csdn.net/mzpmzk/article/details/79839047
https://www.cnblogs.com/WayneZeng/p/9290701.html
2. Which initialization strategy to use with each activation function:
https://blog.csdn.net/shuibuzhaodeshiren/article/details/88697890
3. Clear walkthrough of the various initialization methods