Methods for Handling Overfitting

Data level

Increase the amount of training data.

Model level

Use a simpler model, or change the type of the input data.

Dropout

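Notes:

  1. When building the placeholders, add a bool placeholder named training; pass both the dropout rate and this training placeholder to the dropout layer.
  2. Feed True during training and False during testing.
  3. Dropout is mostly applied after fully connected (FC) layers and usually not after convolutional layers, where it tends to make results worse.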
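To make the role of the training flag concrete, here is a minimal NumPy sketch of inverted dropout: during training each unit is zeroed with probability rate and the surviving units are scaled by 1 / (1 - rate); at test time the input passes through unchanged. The helper name dropout and its arguments are illustrative assumptions, not a TensorFlow API.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout (hypothetical helper, not a TensorFlow function)."""
    if not training:
        return x  # inference: pass the input through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True for units that are kept
    return x * mask / keep_prob             # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones((1000, 100))
train_out = dropout(x, rate=0.5, training=True, rng=rng)
test_out = dropout(x, rate=0.5, training=False, rng=rng)
print(train_out.mean(), test_out.mean())
```

Because of the rescaling, the mean activation is roughly the same in both modes, which is why no extra correction is needed at test time.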

Code implementation

Pay attention to the comments in the reference code below.
Reference: https://mofanpy.com/tutorials/machine-learning/tensorflow/dropout/

"""
Know more, visit my Python tutorial page: https://morvanzhou.github.io/tutorials/
My Youtube Channel: https://www.youtube.com/user/MorvanZhou
Dependencies:
tensorflow: 1.1.0
matplotlib
numpy
"""
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

tf.set_random_seed(1)
np.random.seed(1)

# Hyper parameters
N_SAMPLES = 20
N_HIDDEN = 300
LR = 0.01

# training data
x = np.linspace(-1, 1, N_SAMPLES)[:, np.newaxis]
y = x + 0.3*np.random.randn(N_SAMPLES)[:, np.newaxis]

# test data
test_x = x.copy()
test_y = test_x + 0.3*np.random.randn(N_SAMPLES)[:, np.newaxis]

# show data
plt.scatter(x, y, c='magenta', s=50, alpha=0.5, label='train')
plt.scatter(test_x, test_y, c='cyan', s=50, alpha=0.5, label='test')
plt.legend(loc='upper left')
plt.ylim((-2.5, 2.5))
plt.show()

# tf placeholders
tf_x = tf.placeholder(tf.float32, [None, 1])
tf_y = tf.placeholder(tf.float32, [None, 1])
tf_is_training = tf.placeholder(tf.bool, None)  # to control dropout when training and testing

# overfitting net
o1 = tf.layers.dense(tf_x, N_HIDDEN, tf.nn.relu)
o2 = tf.layers.dense(o1, N_HIDDEN, tf.nn.relu)
o_out = tf.layers.dense(o2, 1)
o_loss = tf.losses.mean_squared_error(tf_y, o_out)
o_train = tf.train.AdamOptimizer(LR).minimize(o_loss)

# dropout net
d1 = tf.layers.dense(tf_x, N_HIDDEN, tf.nn.relu)
d1 = tf.layers.dropout(d1, rate=0.5, training=tf_is_training)   # drop out 50% of inputs
d2 = tf.layers.dense(d1, N_HIDDEN, tf.nn.relu)
d2 = tf.layers.dropout(d2, rate=0.5, training=tf_is_training)   # drop out 50% of inputs
d_out = tf.layers.dense(d2, 1)
d_loss = tf.losses.mean_squared_error(tf_y, d_out)
d_train = tf.train.AdamOptimizer(LR).minimize(d_loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

plt.ion()   # interactive mode, so the figure can be redrawn while training

for t in range(500):
    sess.run([o_train, d_train], {tf_x: x, tf_y: y, tf_is_training: True})  # train, set is_training=True

    if t % 10 == 0:
        # plotting
        plt.cla()
        o_loss_, d_loss_, o_out_, d_out_ = sess.run(
            [o_loss, d_loss, o_out, d_out], {tf_x: test_x, tf_y: test_y, tf_is_training: False} # test, set is_training=False
        )
        plt.scatter(x, y, c='magenta', s=50, alpha=0.3, label='train'); plt.scatter(test_x, test_y, c='cyan', s=50, alpha=0.3, label='test')
        plt.plot(test_x, o_out_, 'r-', lw=3, label='overfitting'); plt.plot(test_x, d_out_, 'b--', lw=3, label='dropout(50%)')
        plt.text(0, -1.2, 'overfitting loss=%.4f' % o_loss_, fontdict={'size': 20, 'color':  'red'}); plt.text(0, -1.5, 'dropout loss=%.4f' % d_loss_, fontdict={'size': 20, 'color': 'blue'})
        plt.legend(loc='upper left'); plt.ylim((-2.5, 2.5)); plt.pause(0.1)

plt.ioff()
plt.show()

Batch Normalization

References:

These two are good for a first understanding:
https://mofanpy.com/tutorials/machine-learning/tensorflow/intro-batch-normalization/
https://mofanpy.com/tutorials/machine-learning/tensorflow/BN/
For a deeper understanding, focus on the first and second answers:
https://www.zhihu.com/question/38102762

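When to use it

BN helps when convergence is slow, when training depends heavily on the initial solution, or when gradients explode or vanish. It improves generalization, speeds up training, and allows a higher learning rate.
(1) Ordinary image-processing CNN models should use Batch Normalization, provided the batch size is fairly large (at least 32) and the input samples are shuffled. If the batch is too small, prefer Group Normalization instead.

(2) For sequence models such as RNNs, the training examples within a batch sometimes have different lengths (sentences of different lengths), so different time steps would need separate statistics. BN cannot be applied correctly there; use Layer Normalization instead.

(3) For image generation and style transfer, Instance Normalization is more suitable.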
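The four normalization variants mentioned above differ mainly in which axes the mean and variance are computed over. The following NumPy sketch illustrates this for a channels-last tensor of shape [N, H, W, C]. It leaves out the learnable scale and shift parameters, and the helper name normalize and the tensor sizes are assumptions made for illustration.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the std computed over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 6))  # [N, H, W, C]

bn = normalize(x, axes=(0, 1, 2))   # Batch Norm: per channel, across the whole batch
ln = normalize(x, axes=(1, 2, 3))   # Layer Norm: per sample, across all features
inn = normalize(x, axes=(1, 2))     # Instance Norm: per sample and per channel

# Group Norm: split the channels into groups and normalize within each group per sample
groups = 2
n, h, w, c = x.shape
xg = x.reshape(n, h, w, groups, c // groups)
gn = normalize(xg, axes=(1, 2, 4)).reshape(x.shape)
```

Only Batch Norm uses statistics that span the batch axis, which is why it is the one that degrades with small batches or variable-length sequences.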

Usage

  • For input data and ordinary FC layers
tf.layers.batch_normalization(tf_x, training=tf_is_train)
# momentum plays an important role: the default of 0.99 is too high in this case!
x = tf.layers.batch_normalization(x, momentum=0.4, training=tf_is_train)

https://github.com/MorvanZhou/Tensorflow-Tutorial/blob/master/tutorial-contents/502_batch_normalization.py
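Why the momentum matters: the moving mean and variance that BN uses at test time are updated as moving = momentum * moving + (1 - momentum) * batch_value. The pure-Python sketch below shows how slowly a momentum of 0.99 catches up over a short training run compared with 0.4. The constant batch mean of 5.0 and the 100 steps are made-up values for illustration.

```python
def moving_average(batch_means, momentum, init=0.0):
    """Exponential moving average with the same update rule BN uses for its statistics."""
    value = init
    for m in batch_means:
        value = momentum * value + (1.0 - momentum) * m
    return value

batch_means = [5.0] * 100  # hypothetical: every batch has mean 5.0
slow = moving_average(batch_means, momentum=0.99)
fast = moving_average(batch_means, momentum=0.4)
print(slow, fast)
```

After 100 steps the estimate with momentum 0.99 is still well below 5.0, so a short training run would test with inaccurate statistics.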

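  • For CNN layers
    See how resnet50 is built
from tensorflow.keras import layers
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D

IMAGE_ORDERING = 'channels_last'  # assumed value; the original snippet does not show its definition
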
def identity_block(input_tensor, kernel_size, filters, stage, block):
    """The identity block is the block that has no conv layer at shortcut.
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
    # Returns
        Output tensor for the block.
    """
    filters1, filters2, filters3 = filters
    
    if IMAGE_ORDERING == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1

    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1) , data_format=IMAGE_ORDERING , name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size , data_format=IMAGE_ORDERING ,
               padding='same', name=conv_name_base + '2b')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3 , (1, 1), data_format=IMAGE_ORDERING , name=conv_name_base + '2c')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x

resnet18 / resnet34


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Sequential



class BasicBlock(layers.Layer):

    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()

        self.conv1 = layers.Conv2D(filter_num, (3, 3), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')

        self.conv2 = layers.Conv2D(filter_num, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()

        if stride != 1:
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, (1, 1), strides=stride))
        else:
            self.downsample = lambda x:x



    def call(self, inputs, training=None):

        # [b, h, w, c]
        out = self.conv1(inputs)
        out = self.bn1(out,training=training)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out,training=training)

        identity = self.downsample(inputs)

        output = layers.add([out, identity])
        output = tf.nn.relu(output)

        return output


class ResNet(keras.Model):


    def __init__(self, layer_dims, num_classes=100): # [2, 2, 2, 2]
        super(ResNet, self).__init__()

        self.stem = Sequential([layers.Conv2D(64, (3, 3), strides=(1, 1)),
                                layers.BatchNormalization(),
                                layers.Activation('relu'),
                                layers.MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding='same')
                                ])

        self.layer1 = self.build_resblock(64,  layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)

        # output: [b, h, w, 512]
        self.avgpool = layers.GlobalAveragePooling2D()
        self.fc = layers.Dense(num_classes)





    def call(self, inputs, training=None):

        x = self.stem(inputs,training=training)

        x = self.layer1(x,training=training)
        x = self.layer2(x,training=training)
        x = self.layer3(x,training=training)
        x = self.layer4(x,training=training)

        # [b, c]
        x = self.avgpool(x)
        # [b, 100]
        x = self.fc(x)

        return x



    def build_resblock(self, filter_num, blocks, stride=1):

        res_blocks = Sequential()
        # may down sample
        res_blocks.add(BasicBlock(filter_num, stride))

        for _ in range(1, blocks):
            res_blocks.add(BasicBlock(filter_num, stride=1))

        return res_blocks


def resnet18():
    return ResNet([2, 2, 2, 2])


def resnet34():
    return ResNet([3, 4, 6, 3])

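Weight Decay / L2 regularization

Principle

Purpose: weight decay (L2 regularization) helps keep the model from overfitting.
Question: the L2 term pushes w toward smaller values, but why do smaller weights prevent overfitting?
Explanation: (1) In terms of model complexity: smaller weights w mean, in a sense, a lower-complexity network that fits the data well enough without chasing noise (this principle is also known as Occam's razor). Practice confirms this too: a model with L2 regularization often performs better than one without. (2) In mathematical terms: when a model overfits, the coefficients of the fitted function tend to be very large. Why? As the figure in the original post shows, an overfitted function has to accommodate every single point, so the resulting curve fluctuates a lot and its value changes sharply within some small intervals. That means the derivative (in absolute value) is very large on those intervals, and since the inputs themselves can be large or small, only sufficiently large coefficients can produce such large derivatives. Regularization constrains the norm of the parameters so that it does not grow too large, which reduces overfitting to some extent.
Derivation: see reference links 1 and 2.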
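The claim that overfitted functions have large coefficients, and that an L2 penalty shrinks them, can be checked on a small example. The sketch below fits a degree-9 polynomial to 12 noisy points with and without an L2 penalty, using the closed-form least-squares solution. The data, the degree, and the penalty strength 1e-2 are arbitrary choices for illustration, and the helper name fit is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 12)
y = x + 0.3 * rng.standard_normal(x.size)  # noisy samples of y = x

degree = 9
X = np.vander(x, degree + 1)  # polynomial features x^9, ..., x, 1

def fit(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 in closed form."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

w_plain = fit(X, y, lam=0.0)   # no regularization
w_l2 = fit(X, y, lam=1e-2)     # with L2 penalty
print(np.linalg.norm(w_plain), np.linalg.norm(w_l2))
```

The unregularized fit needs a much larger coefficient norm to bend through the noisy points, while the penalized fit keeps the coefficients small.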

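Code and implementation

Core idea: 1. create a regularizer; 2. apply this regularizer to the variables.
See reference link 2.

Reference links:

https://blog.csdn.net/program_developer/article/details/80867468
https://zhuanlan.zhihu.com/p/95883569

My own implementation

Idea: collect the trainable parameters, compute their L2 loss, and add it to the original loss.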

# original loss
loss = tf.losses.mean_squared_error(y, out)
# L2 loss
loss_regularization = []
for p in net.trainable_variables:  # net is a model defined with custom Keras layers
    loss_regularization.append(tf.nn.l2_loss(p))  # L2 loss of each variable: sum(p ** 2) / 2
loss_regularization = tf.reduce_sum(tf.stack(loss_regularization))
loss = loss + 1e-4 * loss_regularization  # 1e-4 is the regularization scale
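Why adding this term is called weight decay: the gradient of scale * l2_loss(w) is scale * w, so a plain gradient descent step with the penalty is the same as first shrinking w by the factor (1 - lr * scale) and then taking the ordinary step. The NumPy sketch below checks this numerically. The random vector grad_data stands in for the gradient of the data loss, and all names are illustrative. The equivalence holds for plain SGD; with adaptive optimizers such as Adam the two are generally not identical.

```python
import numpy as np

def l2_loss(p):
    """Same definition as tf.nn.l2_loss: sum(p ** 2) / 2."""
    return np.sum(p ** 2) / 2.0

rng = np.random.default_rng(0)
w = rng.standard_normal(5)
grad_data = rng.standard_normal(5)  # stand-in for d(data loss)/dw
lr, scale = 0.1, 1e-4

# SGD step on data loss + scale * l2_loss(w); the penalty's gradient is scale * w
step_penalty = w - lr * (grad_data + scale * w)
# the same step written as "decay the weights, then apply the data gradient"
step_decay = (1.0 - lr * scale) * w - lr * grad_data
print(np.allclose(step_penalty, step_decay))
```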

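Learning rate decay

Principle

When training a model we often run into this situation: after balancing training speed against loss we pick a reasonably suitable learning rate, but once the training loss has dropped to a certain level it stops decreasing. For example, the training loss keeps oscillating between 0.7 and 0.9 and cannot go any lower.
[Figure: training loss oscillating without decreasing further]

Learning rate decay is one way to resolve the tension between the two. The basic idea: the learning rate gradually decreases as training progresses.
There are two basic ways to implement it:
Step decay. Example: halve the learning rate every 5 epochs.
Exponential decay. Example: the learning rate decays automatically as the number of iterations grows, e.g. it is multiplied by 0.9998 every 5 epochs. The formula is:
decayed_learning_rate = learning_rate * decay_rate^(global_step / decay_steps)
Here decayed_learning_rate is the learning rate used at each optimization step, learning_rate is the preset initial learning rate, decay_rate is the decay coefficient, and decay_steps is the number of steps over which the rate is multiplied by decay_rate once (it controls how fast the decay is).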
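Both schedules fit in a few lines of plain Python. The sketch below implements the "halve every 5 epochs" rule and the exponential formula above; the function names and the sample values (initial rate 0.1, decay_steps 10, decay_rate 0.5) are assumptions chosen for illustration.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=5):
    """Multiply the learning rate by `drop` once every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

def exponential_decay(learning_rate, global_step, decay_steps, decay_rate):
    """decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)"""
    return learning_rate * decay_rate ** (global_step / decay_steps)

for step in (0, 1, 5, 10, 50):
    print(step, step_decay(0.1, step), exponential_decay(0.1, step, 10, 0.5))
```

The step schedule stays flat and then drops, while the exponential schedule shrinks a little at every step and reaches the same factor of 0.5 after decay_steps steps.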

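Code implementation

  • Reference link:
    https://www.cnblogs.com/baby-lily/p/10962574.html
  • Key parameters of tf.train.exponential_decay()
learning_rate: the initial learning rate

global_step: the variable that counts training iterations

decay_steps: how many iterations make up one decay period

decay_rate: the factor applied to the learning rate every decay_steps iterations

staircase=False: defaults to False, which decays smoothly at every step. When True, global_step / decay_steps is an integer division, so the learning rate stays constant within each period and drops in discrete steps.

tf.train.exponential_decay(initial_learning_rate, global_step=global_step, decay_steps=1000, decay_rate=0.9) means that every 1000 iterations the learning rate becomes 0.9 times its previous value.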
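The effect of the staircase flag is easiest to see side by side. The function below is a pure-Python stand-in that mirrors the parameters of tf.train.exponential_decay; it is a sketch for illustration, not the TensorFlow implementation, and the name decayed_lr is an assumption.

```python
def decayed_lr(initial_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Hypothetical stand-in mirroring the parameters of tf.train.exponential_decay."""
    if staircase:
        exponent = global_step // decay_steps  # integer division: constant within a period
    else:
        exponent = global_step / decay_steps   # smooth decay at every step
    return initial_lr * decay_rate ** exponent

steps = (0, 500, 1000, 1500, 2000)
smooth = [decayed_lr(0.1, s, 1000, 0.9) for s in steps]
stairs = [decayed_lr(0.1, s, 1000, 0.9, staircase=True) for s in steps]
print(smooth)
print(stairs)
```

With staircase=True the rate at step 500 is still the initial 0.1 and only drops to 0.09 at step 1000; with staircase=False it is already lower at step 500.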

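Increasing the batch size can also have an effect similar to decaying the learning rate.

  • Demo code

Approach:

Introduce learning_rate = tf.train.exponential_decay(...)

to control the actual learning rate used by the optimizer: opt = tf.train.GradientDescentOptimizer(learning_rate)

global_step serves as the counter, and add_global is the op that increments it during training.

Together these produce learning rate decay over the course of training.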


import tensorflow as tf
import numpy as np

global_step = tf.Variable(tf.constant(0))
initial_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step=global_step,
                                           decay_steps=10,
                                           decay_rate=0.5)

opt = tf.train.GradientDescentOptimizer(learning_rate)
add_global = global_step.assign_add(1)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(learning_rate))

    for i in range(50):
        g, rate = sess.run([add_global, learning_rate])
        print(g, rate)

Output


0.1
1 0.0933033
2 0.08705506
3 0.08122524
4 0.07578582
5 0.070710674
...
46 0.004123463
47 0.0038473257
48 0.003589682
49 0.0033492916
50 0.003125

  • Combined with training code

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
import os, sys
#os.chdir('../')

#----------constants ---------------
R1, R2 = 0.6, 0.4 # loss for lm and llc
LR = 1e-4
ILR = 1e-3
NTrain = 5*int(1e4)

format_print_sub_real = lambda x: "{:^10}".format('%0.2f'%(x))
format_print_sub_str = lambda x: "{:^10}".format(x)
format_print = lambda x, ifReal: ''.join(list(map(format_print_sub_real, x))) if ifReal else ''.join(list(map(format_print_sub_str, x)))
#-------------------------
from db import train_batch, test_batch
from fcnet import CNNSplit

myNetwork = CNNSplit()


inputs = tf.placeholder(tf.float32, shape=[None, 2 * 5 * 240+5], name='x')
outputs = tf.placeholder(tf.float32, shape=[None, 2], name='y')
y1, y2 = outputs[:, :1], outputs[:, 1:]

out1, out2 = myNetwork(inputs)

#loss
loss1 = tf.losses.mean_squared_error(y1, out1)
loss2 = tf.losses.mean_squared_error(y2, out2)
Joint_Loss = R1 * loss1 + R2 * loss2

#optimizer
global_step = tf.Variable(tf.constant(0)) # learning rate counter
add_global = global_step.assign_add(1) # learning rate counter add operator
learning_rate = tf.train.exponential_decay(ILR,
                                           global_step=global_step,
                                           decay_steps=1000,
                                           decay_rate=0.95) # learning rate decay

JL_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(Joint_Loss) # introduce learning rate
L1_op = tf.train.AdamOptimizer().minimize(loss1)
L2_op = tf.train.AdamOptimizer().minimize(loss2)

# accuracy estimator
MAPE1 = tf.reduce_mean(tf.abs((y1 - out1) / y1))
MAPE2 = tf.reduce_mean(tf.abs((y2 - out2) / y2))

# training

record = []
record_acc_best = [sys.maxsize, sys.maxsize, sys.maxsize, sys.maxsize]

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for i in range(NTrain+1):
        bx, by = session.run([train_batch])[0]
        # one training step; add_global increments the learning rate counter
        _, l1, l2, Jl, _ = session.run([JL_op, loss1, loss2, Joint_Loss, add_global],
                                       feed_dict={inputs: bx, outputs: by})

        if i % 100 == 0:
            #bx, by = session.run([train_batch])[0]
            mape1, mape2, l1, l2, Jl = session.run([MAPE1, MAPE2, loss1, loss2, Joint_Loss],
                                                   feed_dict={inputs: bx, outputs: by})
            bx_, by_ = session.run([test_batch])[0]
            mape1_, mape2_, l1_, l2_, Jl_ = session.run([MAPE1, MAPE2, loss1, loss2, Joint_Loss],
                                                        feed_dict={inputs: bx_, outputs: by_})
            if mape1 <= record_acc_best[0] and mape2 <= record_acc_best[1]:
                record_acc_best[0] = mape1
                record_acc_best[1] = mape2
            if mape1_ <= record_acc_best[2] and mape2_ <= record_acc_best[3]:
                record_acc_best[2] = mape1_
                record_acc_best[3] = mape2_
            results = [mape1, mape2, l1, l2, Jl, mape1_, mape2_, l1_, l2_, Jl_]
            record.append(np.array(results))
            step_i, lri = session.run([global_step, learning_rate])  # read back the current learning rate
            print('step %d, %d learning rate %0.4f \n %s \n training results: best %0.2f, %0.2f \n %s \n testing results: best %0.2f %0.2f \n %s \n'
                  % (i, step_i, lri,
                     format_print(['mape1', 'mape2', 'l1', 'l2', 'Joint Loss'], False),
                     record_acc_best[0], record_acc_best[1],
                     format_print(results[:5], True),
                     record_acc_best[2], record_acc_best[3],
                     format_print(results[5:], True)))

# save the recorded metrics once training finishes
pd.DataFrame(record, columns=['mape1', 'mape2', 'l1', 'l2', 'Joint Loss', 'mape1_', 'mape2_', 'l1_', 'l2_', 'Joint Loss_']).to_csv('input/results_tfonly.csv')

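Early stopping

The number of training iterations is usually set very high. If the model's performance stops improving, training can stop, because continuing is likely to cause overfitting. keras.callbacks.EarlyStopping is used to end training early.

Code implementation

This seems to require Keras, either standalone or the Keras bundled with TensorFlow (tf.keras).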

https://blog.csdn.net/zhangpeterx/article/details/90897439
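The logic behind early stopping does not depend on Keras and can be sketched in plain Python: track the best validation loss so far and stop once it has failed to improve for patience consecutive epochs. The parameter names patience and min_delta follow keras.callbacks.EarlyStopping, but the function itself and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: train to the end

val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74, 0.6]  # made-up validation losses
stop = early_stopping_epoch(val_losses, patience=3)
print(stop)
```

Note the trade-off in this example: training stops before the later improvement to 0.6 is ever seen, so a larger patience tolerates longer plateaus at the cost of more training time.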
