Deep Learning - Parameter Initialization

1、pre-training

2、random initialization

3、Xavier initialization

The Xavier derivation rests on two assumptions. One is that the activation function is linear (or operating in its linear regime), which does not hold for ReLU. The other is that the activations are symmetric about 0, which holds for neither sigmoid nor ReLU; in practice, however, Xavier initialization can still give reasonably good results with sigmoid.

The method comes from the paper "Understanding the difficulty of training deep feedforward neural networks".
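
A minimal sketch (my own illustration, not code from the paper): the Xavier/Glorot rule keeps Var(W) = 2 / (fan_in + fan_out), for example by sampling uniformly from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). In Keras this is the default 'glorot_uniform' kernel initializer.

import numpy as np
from keras.layers import Dense

# Hypothetical layer sizes, chosen only for illustration
fan_in, fan_out = 8, 16
limit = np.sqrt(6.0 / (fan_in + fan_out))
W = np.random.uniform(-limit, limit, size=(fan_in, fan_out))
print(W.var(), 2.0 / (fan_in + fan_out))  # sample variance is close to the 2/(fan_in+fan_out) target

# Keras equivalent: 'glorot_uniform' is the default kernel_initializer of Dense
layer = Dense(fan_out, kernel_initializer='glorot_uniform')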

4、He initialization

A variant of Xavier initialization tailored to ReLU. The idea behind He initialization: in a ReLU network, roughly half of the neurons in each layer are activated and the other half output 0, so to keep the variance of the activations unchanged, the fan-in in Xavier's formula is divided by 2, giving Var(W) = 2 / fan_in. When using ReLU (without BN), He initialization is the preferred choice; the method was designed primarily for the ReLU activation function.

Paper: Kaiming He et al., Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
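
A minimal sketch of the same idea (my own illustration, assuming the fan-in variant): with roughly half of the ReLU units zeroed, the weight variance is set to 2 / fan_in, i.e. a zero-mean normal with stddev = sqrt(2 / fan_in); Keras ships this as 'he_normal'.

import numpy as np
from keras.layers import Dense

# Hypothetical layer sizes, chosen only for illustration
fan_in, fan_out = 8, 16
W = np.random.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
print(W.var(), 2.0 / fan_in)  # sample variance is close to the 2/fan_in target

# Keras built-in equivalent for ReLU layers
layer = Dense(fan_out, activation='relu', kernel_initializer='he_normal')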

5、Batch Normalization

Goal: keep the distribution of each layer's inputs consistent from batch to batch.

Method: computed by the BN formula, output = (x - mean) / sqrt(var + epsilon) * gamma + beta.

Keras code test

import numpy as np

import keras
from keras.models import Model
from keras.layers import BatchNormalization, PReLU

def ObtainLayerOutput(input_model, input_layer_name, input_data):
    # Get the output of a given layer by building a sub-model that ends at that layer
    target_layer = Model(inputs=input_model.input, outputs=input_model.get_layer(input_layer_name).output)
    layer_output = target_layer.predict(input_data)
    return layer_output

def ObtainLayerWeightsAndBias(model, input_layer_name):
    # Get the weights and bias of a given layer
    weights = model.get_layer(input_layer_name).get_weights()
    return weights

def My_Initializer():
    # Several built-in initializers; only the RandomNormal one is returned and used
    m_Zeros = keras.initializers.Zeros()
    m_Ones = keras.initializers.Ones()
    m_Constant = keras.initializers.Constant(value=1.1)
    m_RandomNormal = keras.initializers.RandomNormal(mean=0, stddev=2.0, seed=0)
    return m_RandomNormal

def func_model():
    # Define a small 3-5-2 network: Input(3) -> BN -> PReLU -> Dense(5) -> Dense(2)
    IN = keras.layers.Input(shape=(3,))

    m_kernel_initializer = My_Initializer()
    m_beta_initializer = keras.initializers.Constant(value=0.1)
    m_gamma_initializer = keras.initializers.Constant(value=0.2)
    m_moving_mean_initializer = keras.initializers.Constant(value=0.3)
    m_moving_variance_initializer = keras.initializers.Constant(value=0.4)


    m_bn = BatchNormalization(beta_initializer=m_beta_initializer,
                              gamma_initializer=m_gamma_initializer,
                              moving_mean_initializer=m_moving_mean_initializer,
                              moving_variance_initializer=m_moving_variance_initializer)(IN)
    m_bn = PReLU()(m_bn)
    HIDDEN = keras.layers.Dense(5, use_bias=False, activation='relu',
                                kernel_initializer=m_kernel_initializer)(m_bn)
    OUT = keras.layers.Dense(2, activation='sigmoid',
                                kernel_initializer='ones',
                                bias_initializer='zeros')(HIDDEN)
    model1 = keras.models.Model(inputs=IN, outputs=OUT)
    model1.summary()

    return model1

"""
对BatchNormalization层进行测试
计算符合如下公式
output = (x - mean) / sqrt(var + epsilon) * gamma + beta`
"""
def TestBatchNormalization():
    model = func_model()
    input_data = np.array([[1.0, 2.0, 3.0],
                           [1.1, 2.1, 3.1]], dtype=np.float32)

    print("input_data", input_data, np.shape(input_data))
    # Weights of the BN layer: [gamma, beta, moving_mean, moving_variance]
    m_batch_normalization_1 = ObtainLayerWeightsAndBias(model, 'batch_normalization_1')
    print("m_batch_normalization_1", m_batch_normalization_1, np.shape(m_batch_normalization_1))
    m_output = ObtainLayerOutput(model, "batch_normalization_1", input_data)
    print("m_output", m_output)




if __name__ == '__main__':
    # 2. Verify batch_normalization_1
    TestBatchNormalization()
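
As a quick sanity check (my own addition, assuming predict() runs the BN layer in inference mode), the expected output can be reproduced in NumPy from the initializer values set in func_model() (gamma=0.2, beta=0.1, moving_mean=0.3, moving_variance=0.4) and Keras' default epsilon of 1e-3:

import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 3.1]], dtype=np.float32)
gamma, beta = 0.2, 0.1
moving_mean, moving_var, eps = 0.3, 0.4, 1e-3
# output = (x - mean) / sqrt(var + epsilon) * gamma + beta
expected = (x - moving_mean) / np.sqrt(moving_var + eps) * gamma + beta
print(expected)  # should match m_output printed by TestBatchNormalization()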

References

(1) How to apply BN in a network

https://blog.csdn.net/appleml/article/details/79166695

(2) Analysis of the underlying principle, well explained

https://blog.csdn.net/hjimce/article/details/50866313

(3) Well explained

https://www.cnblogs.com/hellcat/articles/7220040.html


6、Zero initialization

Initializing all parameters to 0 is generally only used when training linear regression / logistic regression models.

In a multi-layer network, all-zero initialization makes every unit in a layer produce the same output and receive the same gradient update, so the units never learn different features.
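
A minimal NumPy sketch (my own, using a hypothetical 3-5-2 layout like func_model above) of why this happens: with all-zero weights every hidden unit computes the same value and receives the same gradient, so the units never differentiate.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # a small random input batch
W1 = np.zeros((3, 5)); b1 = np.zeros(5)
W2 = np.zeros((5, 2)); b2 = np.zeros(2)
h = np.maximum(x @ W1 + b1, 0.0)   # every hidden activation is identical (all zero here)
y = h @ W2 + b2                    # so every output is identical as well
print(h)
print(y)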

References:

Why zero initialization is used:

(1) https://www.cnblogs.com/lky-learning/p/10830223.html

Why zero initialization works for logistic regression:

(2) https://www.jianshu.com/p/02b529634868


1、Pros and cons of the initialization methods: https://blog.csdn.net/mzpmzk/article/details/79839047

https://www.cnblogs.com/WayneZeng/p/9290701.html

2、Which initialization strategy to use with different activation functions

https://blog.csdn.net/shuibuzhaodeshiren/article/details/88697890

3、Clear explanations of the various initialization methods
