1. Cross-validation (checking for overfitting)
Set aside part of the training data (e.g. 10%) as a validation set, used to evaluate the model during training and to tune the model and its hyperparameters.
(1) Manual implementation
import tensorflow as tf
import os
from tensorflow import keras
from tensorflow.keras import datasets,layers,optimizers,Sequential,metrics
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # suppress most of TensorFlow's log output
batch_size = 128
(x,y),(x_test,y_test) = datasets.cifar10.load_data()
def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y
for epoch in range(500):
    # re-shuffle and re-split train/val every epoch
    # (CIFAR-10 has 50000 training samples; hold out 10000 of them for validation)
    idx = tf.range(50000)
    idx = tf.random.shuffle(idx)
    x_train, y_train = tf.gather(x, idx[:40000]), tf.gather(y, idx[:40000])
    x_val, y_val = tf.gather(x, idx[40000:]), tf.gather(y, idx[40000:])
    db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    db_train = db_train.map(preprocess).shuffle(40000).batch(batch_size)
    db_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
    db_val = db_val.map(preprocess).shuffle(10000).batch(batch_size)
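The loop above only rebuilds the split; nothing is trained yet. Below is a minimal sketch of one epoch of training plus validation using db_train and db_val (in practice this body would sit inside the for-epoch loop). The network architecture, loss, and optimizer are illustrative assumptions, not part of the original code:
# assumed model and optimizer, for illustration only
network = Sequential([
    layers.Flatten(input_shape=(32, 32, 3)),
    layers.Dense(256, activation='relu'),
    layers.Dense(10)
])
optimizer = optimizers.Adam(learning_rate=1e-3)
# one epoch of training on the training split
for x_batch, y_batch in db_train:
    with tf.GradientTape() as tape:
        logits = network(x_batch, training=True)
        y_onehot = tf.one_hot(tf.squeeze(y_batch, axis=1), depth=10)
        loss = tf.reduce_mean(tf.losses.categorical_crossentropy(
            y_onehot, logits, from_logits=True))
    grads = tape.gradient(loss, network.trainable_variables)
    optimizer.apply_gradients(zip(grads, network.trainable_variables))
# validation accuracy after the epoch
correct, total = 0, 0
for x_batch, y_batch in db_val:
    pred = tf.argmax(network(x_batch, training=False), axis=1, output_type=tf.int32)
    correct += int(tf.reduce_sum(tf.cast(
        tf.equal(pred, tf.squeeze(y_batch, axis=1)), tf.int32)))
    total += x_batch.shape[0]
print('val acc:', correct / total)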
(2) Built-in dynamic splitting in tf.keras
# validation_split (here 10%) only works with NumPy array inputs; validation_freq=2 validates every 2 epochs
network.fit(x, y, epochs=6, validation_split=0.1, validation_freq=2)
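If the data is already wrapped in a tf.data.Dataset (as in the manual version above), validation_split cannot be used; pass the validation set explicitly via validation_data instead. A minimal sketch reusing db_train and db_val from section (1):
network.fit(db_train, epochs=6, validation_data=db_val, validation_freq=2)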
2. Ways to prevent overfitting
(1) Via a constraint (weight decay):
Weight decay adds a penalty on the magnitude of the weights w to the loss, reducing the model's effective complexity: unnecessary weights are pushed toward zero, so a high-degree model degenerates to a simpler, lower-degree one plus a small correction.
Types of regularization
L1-regularization
    penalizes the 1-norm of the weights: loss' = loss + λ·Σ|wᵢ|
L2-regularization
    penalizes the squared 2-norm of the weights: loss' = loss + λ·Σwᵢ²
Implementing regularization
Add it when the model is defined:
l2_model = keras.models.Sequential([
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu, input_shape=(NUM_WORDS,)),  # apply an L2 penalty to this layer's weights w
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu),  # apply an L2 penalty to this layer's weights w
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
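The same L2 penalty can also be added by hand in a custom training loop. A sketch, assuming network, x, and y are defined as in section 1 (the 0.001 factor mirrors l2(0.001) above):
with tf.GradientTape() as tape:
    logits = network(x, training=True)
    loss = tf.reduce_mean(tf.losses.sparse_categorical_crossentropy(
        y, logits, from_logits=True))
    # tf.nn.l2_loss(w) computes sum(w**2) / 2; sum it over all trainable weights
    loss_reg = tf.add_n([tf.nn.l2_loss(w) for w in network.trainable_variables])
    loss = loss + 0.001 * loss_reg
grads = tape.gradient(loss, network.trainable_variables)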
(2) Speeding up convergence with momentum
Setting the optimizer's momentum helps the loss converge faster.
# import lib
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers
# momentum speeds up convergence
optimizer = keras.optimizers.SGD(learning_rate=0.02, momentum=0.9)
optimizer = keras.optimizers.RMSprop(learning_rate=0.02, momentum=0.9)
# some optimizers handle momentum internally, e.g. Adam (via beta_1 and beta_2)
optimizer = keras.optimizers.Adam(learning_rate=0.02, beta_1=0.9, beta_2=0.999)
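Any of these optimizers is then handed to the model at compile time; a minimal usage sketch (the network name is assumed):
network.compile(optimizer=optimizer,
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])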
(3) Dynamically adjusting learning_rate to help reach the optimum
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers
# momentum speeds up convergence
optimizer = keras.optimizers.SGD(learning_rate=0.02, momentum=0.9)
# method 1: decay the learning rate linearly every epoch
for epoch in range(100):
    # change learning rate
    optimizer.learning_rate = 0.2 * (100 - epoch) / 100
    # method 2: staircase decay of the learning rate
    # if epoch < 50:
    #     optimizer.learning_rate = 0.1
    # else:
    #     optimizer.learning_rate = 0.01
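tf.keras also ships built-in decay schedules that replace the manual loop; a sketch using ExponentialDecay (the decay numbers here are illustrative; staircase=True gives the step-shaped decay of method 2):
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.2, decay_steps=1000, decay_rate=0.96, staircase=True)
optimizer = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)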
(4) Stopping training early when overfitting appears: early stopping
When the model starts to overfit (training keeps improving while validation performance stalls for a while), stop training and keep the best model seen so far.
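tf.keras provides this as the EarlyStopping callback; a minimal sketch (the monitor and patience values are illustrative):
from tensorflow import keras
# stop once val_loss has not improved for 5 epochs, then roll back to the best weights
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                           restore_best_weights=True)
network.fit(x, y, epochs=500, validation_split=0.1, callbacks=[early_stop])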
(5) Reducing the complexity of w: dropout
Here the complexity of w is reduced by randomly severing some of the connections between one layer and the next during training,
which works like removing some noise points.
Add Dropout layers:
import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential
model = Sequential([
    layers.Dense(256, activation='relu'),  # [b, 784] => [b, 256]
    layers.Dropout(0.5),  # each output of the previous layer is dropped with probability 0.5; rate=0.2 would drop 20%
    layers.Dense(128, activation=tf.nn.relu),  # [b, 256] => [b, 128]
    layers.Dropout(0.5),
    layers.Dense(64, activation=tf.nn.relu),   # [b, 128] => [b, 64]
    layers.Dense(32, activation=tf.nn.relu),   # [b, 64] => [b, 32]
    layers.Dense(10)  # [b, 32] => [b, 10], 330 = 32*10 + 10 params
])
Dropout behaves differently at training time and at test time
# dropout must be active during training and disabled at test time
for step, (x, y) in enumerate(db):
    with tf.GradientTape() as tape:
        x = tf.reshape(x, (-1, 28*28))
        out = network(x, training=True)  # dropout active
# test
out = network(x, training=False)  # dropout disabled
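Note: with the high-level API, Keras sets this flag automatically: model.fit() runs with training=True and model.evaluate()/model.predict() with training=False; passing it by hand is only needed when calling the model directly, as above.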