TensorFlow 2 Notes: Faster Optimizers and Learning Rate Scheduling

This article introduces several optimizers used in deep learning, including momentum optimization, Nesterov accelerated gradient, AdaGrad, RMSProp, Adam, and Nadam, and demonstrates through examples how they affect model training on the California housing dataset. It also covers two learning rate scheduling strategies, power scheduling and exponential scheduling, explaining how they adjust the learning rate to improve model performance. Finally, it introduces the performance scheduling method ReduceLROnPlateau, which adjusts the learning rate dynamically based on the validation error.

Load the data first

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf

# fetch the California housing data, standardize the features,
# then split into train / validation / test sets
housing=fetch_california_housing()
scaler=StandardScaler()
x_data=scaler.fit_transform(housing.data)
x_train_full,x_test,y_train_full,y_test=train_test_split(x_data,housing.target)
x_train,x_valid,y_train,y_valid=train_test_split(x_train_full,y_train_full)

Faster Optimizers

Momentum Optimization

Overview: momentum optimization cares a great deal about what the previous gradients were. At each iteration it subtracts the local gradient (multiplied by the learning rate) from the momentum vector m, and updates the weights by adding this momentum vector.
In other words, the gradient is used as an acceleration, not as a velocity. There is one hyperparameter β, called the momentum, which ranges from 0 (high friction) to 1 (no friction); a typical value is 0.9.
Code form: optimizer=tf.keras.optimizers.SGD(learning_rate=0.001,momentum=0.9)
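To make the update rule concrete, here is a minimal sketch of one momentum step (illustrative only; the function name and variables are made up for this example and are not the Keras internals):

def momentum_step(theta, m, grad, lr=0.001, beta=0.9):
    m = beta * m - lr * grad    # the gradient acts as an acceleration on m
    theta = theta + m           # the weights move by the momentum vector
    return theta, m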

# Wide & Deep style model: the raw input is concatenated with the second
# hidden layer's output before the final regression layer
input_=tf.keras.layers.Input(shape=x_train.shape[1:])
hidden1=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(input_)
hidden2=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(hidden1)
concat=tf.keras.layers.Concatenate()([input_,hidden2])
output=tf.keras.layers.Dense(1)(concat)
model=tf.keras.Model(inputs=[input_],outputs=[output])
model.compile(loss=tf.keras.losses.mean_squared_error,optimizer=tf.keras.optimizers.SGD(learning_rate=0.001,momentum=0.9))
earlystop=tf.keras.callbacks.EarlyStopping(patience=5,restore_best_weights=True)
model.fit(x_train,y_train,epochs=20,validation_data=(x_valid,y_valid),callbacks=[earlystop])
model.evaluate(x_test,y_test)
Epoch 1/20
363/363 [==============================] - 4s 4ms/step - loss: 0.7202 - val_loss: 0.4987
Epoch 2/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4806 - val_loss: 0.5406
Epoch 3/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4525 - val_loss: 0.6880
Epoch 4/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4374 - val_loss: 0.8081
Epoch 5/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4272 - val_loss: 0.9074
Epoch 6/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4191 - val_loss: 1.0105
162/162 [==============================] - 0s 2ms/step - loss: 0.5232

0.5231648087501526

Nesterov Accelerated Gradient

Overview: the momentum vector generally points in roughly the right direction (toward the optimum), so it is slightly more accurate to measure the gradient a bit further ahead in that direction, rather than at the original position.
Code form: optimizer=tf.keras.optimizers.SGD(learning_rate=0.001,momentum=0.9,nesterov=True)
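A minimal sketch of the Nesterov update, assuming a hypothetical grad_fn that returns the gradient at a given point; the only change from plain momentum is where the gradient is measured:

def nag_step(theta, m, grad_fn, lr=0.001, beta=0.9):
    grad = grad_fn(theta + beta * m)   # gradient measured slightly ahead, at theta + beta*m
    m = beta * m - lr * grad
    theta = theta + m
    return theta, m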

input_=tf.keras.layers.Input(shape=x_train.shape[1:])
hidden1=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(input_)
hidden2=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(hidden1)
concat=tf.keras.layers.Concatenate()([input_,hidden2])
output=tf.keras.layers.Dense(1)(concat)
model=tf.keras.Model(inputs=[input_],outputs=[output])
model.compile(loss=tf.keras.losses.mean_squared_error,optimizer=tf.keras.optimizers.SGD(learning_rate=0.001,momentum=0.9,nesterov=True))
earlystop=tf.keras.callbacks.EarlyStopping(patience=5,restore_best_weights=True)
model.fit(x_train,y_train,epochs=20,validation_data=(x_valid,y_valid),callbacks=[earlystop])
model.evaluate(x_test,y_test)
Epoch 1/20
363/363 [==============================] - 2s 4ms/step - loss: 0.8130 - val_loss: 0.6193
Epoch 2/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4884 - val_loss: 0.9071
Epoch 3/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4568 - val_loss: 1.0054
Epoch 4/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4414 - val_loss: 0.8095
Epoch 5/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4278 - val_loss: 0.9270
Epoch 6/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4181 - val_loss: 0.8231
162/162 [==============================] - 0s 2ms/step - loss: 0.7288

0.7287701964378357

AdaGrad

Overview: AdaGrad corrects its direction earlier so that it points more toward the global optimum. The algorithm decays the learning rate, and it does so faster for steep dimensions than for dimensions with gentler slopes.
This is called an adaptive learning rate.
However, when training neural networks it tends to stop too early.
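Code form (following the pattern above; this is a standard Keras class): optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.001). As a rough illustration of why it stops too early, here is a sketch of the accumulation (hypothetical helper, not the exact implementation): the squared gradients pile up forever, so the effective learning rate only ever shrinks.

def adagrad_step(theta, s, grad, lr=0.001, eps=1e-7):
    s = s + grad ** 2                             # accumulate ALL past squared gradients
    theta = theta - lr * grad / (s + eps) ** 0.5  # effective step shrinks as s grows
    return theta, s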

RMSProp

Overview: RMSProp fixes AdaGrad by accumulating only the gradients from the most recent iterations (through an exponentially decaying average), instead of all gradients since the beginning of training.
Code form: optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001,rho=0.9), where rho is the decay rate
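Compared with the AdaGrad sketch above, only the accumulation line changes: s becomes a decaying average, so older gradients are gradually forgotten (again a hypothetical sketch, not the Keras internals):

def rmsprop_step(theta, s, grad, lr=0.001, rho=0.9, eps=1e-7):
    s = rho * s + (1 - rho) * grad ** 2           # keep mostly the recent squared gradients
    theta = theta - lr * grad / (s + eps) ** 0.5
    return theta, s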

input_=tf.keras.layers.Input(shape=x_train.shape[1:])
hidden1=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(input_)
hidden2=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(hidden1)
concat=tf.keras.layers.Concatenate()([input_,hidden2])
output=tf.keras.layers.Dense(1)(concat)
model=tf.keras.Model(inputs=[input_],outputs=[output])
model.compile(loss=tf.keras.losses.mean_squared_error,optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001,rho=0.9))
earlystop=tf.keras.callbacks.EarlyStopping(patience=5,restore_best_weights=True)
model.fit(x_train,y_train,epochs=20,validation_data=(x_valid,y_valid),callbacks=[earlystop])
model.evaluate(x_test,y_test)
Epoch 1/20
363/363 [==============================] - 3s 6ms/step - loss: 1.3624 - val_loss: 0.5928
Epoch 2/20
363/363 [==============================] - 2s 5ms/step - loss: 0.4587 - val_loss: 1.0408
Epoch 3/20
363/363 [==============================] - 2s 5ms/step - loss: 0.4327 - val_loss: 1.3618
Epoch 4/20
363/363 [==============================] - 2s 5ms/step - loss: 0.4210 - val_loss: 1.1774
Epoch 5/20
363/363 [==============================] - 2s 5ms/step - loss: 0.4092 - val_loss: 0.9459
Epoch 6/20
363/363 [==============================] - 2s 5ms/step - loss: 0.3976 - val_loss: 0.7450
162/162 [==============================] - 0s 2ms/step - loss: 0.6693

0.6692954301834106

Adam and Nadam

Overview: Adam combines momentum optimization and RMSProp; Nadam additionally incorporates the Nesterov trick.
It has a momentum hyperparameter β1 and a decay hyperparameter β2.
Code form: optimizer=tf.keras.optimizers.Adam(learning_rate=0.001,beta_1=0.9,beta_2=0.999), or optimizer='nadam'
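Nadam can also be instantiated explicitly instead of via the 'nadam' string; the values below are the Keras defaults:

optimizer=tf.keras.optimizers.Nadam(learning_rate=0.001,beta_1=0.9,beta_2=0.999)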

input_=tf.keras.layers.Input(shape=x_train.shape[1:])
hidden1=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(input_)
hidden2=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(hidden1)
concat=tf.keras.layers.Concatenate()([input_,hidden2])
output=tf.keras.layers.Dense(1)(concat)
model=tf.keras.Model(inputs=[input_],outputs=[output])
model.compile(loss=tf.keras.losses.mean_squared_error,optimizer=tf.keras.optimizers.Adam(learning_rate=0.001,beta_1=0.9,beta_2=0.999))
earlystop=tf.keras.callbacks.EarlyStopping(patience=5,restore_best_weights=True)
model.fit(x_train,y_train,epochs=20,validation_data=(x_valid,y_valid),callbacks=[earlystop])
model.evaluate(x_test,y_test)
Epoch 1/20
363/363 [==============================] - 2s 5ms/step - loss: 1.0718 - val_loss: 0.5356
Epoch 2/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4623 - val_loss: 0.6578
Epoch 3/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4236 - val_loss: 0.8912
Epoch 4/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4103 - val_loss: 0.8751
Epoch 5/20
363/363 [==============================] - 1s 4ms/step - loss: 0.4002 - val_loss: 0.7164
Epoch 6/20
363/363 [==============================] - 1s 3ms/step - loss: 0.3950 - val_loss: 0.6637
162/162 [==============================] - 0s 2ms/step - loss: 0.6390

0.6390238404273987

Learning Rate Scheduling

Power Scheduling

Overview: η(t) = η0 / (1 + t/s)^c, where t is the iteration number; s controls how many steps it takes to divide the learning rate by one more unit (after s steps it drops to η0/2, after 2s steps to η0/3, and so on)
Code form: tf.keras.optimizers.SGD(learning_rate=0.01,decay=1e-4), where decay is the inverse of s (Keras assumes c=1)
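As a sanity check, the schedule can be computed directly (hypothetical helper; with c=1 the legacy decay argument is simply 1/s applied per training step):

def power_schedule(step, eta0=0.01, decay=1e-4, c=1.0):
    # eta(t) = eta0 / (1 + t/s)**c, with decay = 1/s
    return eta0 / (1.0 + decay * step) ** c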

input_=tf.keras.layers.Input(shape=x_train.shape[1:])
hidden1=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(input_)
hidden2=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(hidden1)
concat=tf.keras.layers.Concatenate()([input_,hidden2])
output=tf.keras.layers.Dense(1)(concat)
model=tf.keras.Model(inputs=[input_],outputs=[output])
model.compile(loss=tf.keras.losses.mean_squared_error,optimizer=tf.keras.optimizers.SGD(learning_rate=0.1,decay=1.0/5))
earlystop=tf.keras.callbacks.EarlyStopping(patience=5,restore_best_weights=True)
model.fit(x_train,y_train,epochs=100,validation_data=(x_valid,y_valid),callbacks=[earlystop])
model.evaluate(x_test,y_test)
Epoch 1/100
363/363 [==============================] - 2s 4ms/step - loss: 0.7297 - val_loss: 0.5184
Epoch 2/100
363/363 [==============================] - 1s 3ms/step - loss: 0.5085 - val_loss: 0.5207
Epoch 3/100
363/363 [==============================] - 1s 3ms/step - loss: 0.5014 - val_loss: 0.5247
Epoch 4/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4976 - val_loss: 0.5301
Epoch 5/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4951 - val_loss: 0.5346
Epoch 6/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4932 - val_loss: 0.5385
162/162 [==============================] - 0s 2ms/step - loss: 0.5323

0.5323387384414673

Exponential Scheduling

Overview: set the learning rate to η(t) = η0 * 0.1^(t/s), so it drops by a factor of 10 every s epochs
Code form: define a function whose first argument is the epoch, for example the one below. (Note: the model does not save the epoch number, so if training is interrupted and fit is called again, the epoch counter restarts at 0; that brings back a very high learning rate, which can damage the model's weights. See the resume sketch after the wrapper below.)

def exponential_decay_fn(epoch):
    return 0.01 * 0.1**(epoch/20)

If you instead add a second argument lr (the learning rate of the current epoch), the schedule becomes relative to it (the model does save the current learning rate):

def exponential_decay_fn(epoch,lr):
    return lr * 0.1**(1/20)

Once the function is defined, wrap it in a LearningRateScheduler callback and add it to fit's callbacks:
lr_scheduler=tf.keras.callbacks.LearningRateScheduler(exponential_decay_fn)
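To resume interrupted training without the learning rate jumping back up, one option is fit's initial_epoch argument, so the scheduler sees the correct epoch (a sketch; it assumes you tracked that 20 epochs already ran):

model.fit(x_train,y_train,epochs=100,initial_epoch=20,
          validation_data=(x_valid,y_valid),callbacks=[earlystop,lr_scheduler])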

def exponential_decay_fn(epoch,lr):
    return lr * 0.1**(1/20)

input_=tf.keras.layers.Input(shape=x_train.shape[1:])
hidden1=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(input_)
hidden2=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(hidden1)
concat=tf.keras.layers.Concatenate()([input_,hidden2])
output=tf.keras.layers.Dense(1)(concat)
model=tf.keras.Model(inputs=[input_],outputs=[output])
model.compile(loss=tf.keras.losses.mean_squared_error,optimizer=tf.keras.optimizers.SGD(learning_rate=0.1))
earlystop=tf.keras.callbacks.EarlyStopping(patience=5,restore_best_weights=True)
lr_scheduler=tf.keras.callbacks.LearningRateScheduler(exponential_decay_fn)
model.fit(x_train,y_train,epochs=100,validation_data=(x_valid,y_valid),callbacks=[earlystop,lr_scheduler])
model.evaluate(x_test,y_test)
Epoch 1/100
363/363 [==============================] - 2s 4ms/step - loss: 0.7013 - val_loss: 0.5680 - lr: 0.0891
Epoch 2/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4494 - val_loss: 0.8012 - lr: 0.0794
Epoch 3/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4455 - val_loss: 0.6980 - lr: 0.0708
Epoch 4/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3911 - val_loss: 0.5272 - lr: 0.0631
Epoch 5/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4049 - val_loss: 0.7173 - lr: 0.0562
Epoch 6/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3757 - val_loss: 0.4580 - lr: 0.0501
Epoch 7/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3707 - val_loss: 0.4423 - lr: 0.0447
Epoch 8/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3581 - val_loss: 0.3817 - lr: 0.0398
Epoch 9/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3564 - val_loss: 0.3636 - lr: 0.0355
Epoch 10/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3478 - val_loss: 0.3511 - lr: 0.0316
Epoch 11/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3465 - val_loss: 0.3424 - lr: 0.0282
Epoch 12/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3419 - val_loss: 0.3430 - lr: 0.0251
Epoch 13/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3406 - val_loss: 0.3661 - lr: 0.0224
Epoch 14/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3387 - val_loss: 0.3295 - lr: 0.0200
Epoch 15/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3362 - val_loss: 0.3353 - lr: 0.0178
Epoch 16/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3345 - val_loss: 0.3349 - lr: 0.0158
Epoch 17/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3322 - val_loss: 0.3305 - lr: 0.0141
Epoch 18/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3302 - val_loss: 0.3289 - lr: 0.0126
Epoch 19/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3298 - val_loss: 0.3300 - lr: 0.0112
Epoch 20/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3285 - val_loss: 0.3325 - lr: 0.0100
Epoch 21/100
363/363 [==============================] - 1s 4ms/step - loss: 0.3278 - val_loss: 0.3313 - lr: 0.0089
Epoch 22/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3265 - val_loss: 0.3277 - lr: 0.0079
Epoch 23/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3254 - val_loss: 0.3280 - lr: 0.0071
Epoch 24/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3249 - val_loss: 0.3301 - lr: 0.0063
Epoch 25/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3244 - val_loss: 0.3331 - lr: 0.0056
Epoch 26/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3238 - val_loss: 0.3294 - lr: 0.0050
Epoch 27/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3232 - val_loss: 0.3353 - lr: 0.0045
162/162 [==============================] - 0s 2ms/step - loss: 0.3525

0.352518230676651

Performance Scheduling

Overview: measure the validation error every N steps (just like early stopping) and reduce the learning rate by a factor of λ when the error stops dropping
Code form: lr_scheduler=tf.keras.callbacks.ReduceLROnPlateau(patience=5,factor=0.5); when the validation loss stops improving, the learning rate is multiplied by factor
Note: when combined with EarlyStopping, the EarlyStopping patience should be larger than the patience used here, otherwise training may stop before the reduced learning rate has a chance to help

input_=tf.keras.layers.Input(shape=x_train.shape[1:])
hidden1=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(input_)
hidden2=tf.keras.layers.Dense(30,activation='elu',kernel_initializer='he_normal')(hidden1)
concat=tf.keras.layers.Concatenate()([input_,hidden2])
output=tf.keras.layers.Dense(1)(concat)
model=tf.keras.Model(inputs=[input_],outputs=[output])
model.compile(loss=tf.keras.losses.mean_squared_error,optimizer=tf.keras.optimizers.SGD(learning_rate=0.01))
earlystop=tf.keras.callbacks.EarlyStopping(patience=10,restore_best_weights=True)
lr_scheduler=tf.keras.callbacks.ReduceLROnPlateau(patience=5,factor=0.5)
model.fit(x_train,y_train,epochs=100,validation_data=(x_valid,y_valid),callbacks=[earlystop,lr_scheduler])
model.evaluate(x_test,y_test)
Epoch 1/100
363/363 [==============================] - 2s 4ms/step - loss: 0.6987 - val_loss: 0.5213 - lr: 0.0100
Epoch 2/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4792 - val_loss: 0.4761 - lr: 0.0100
Epoch 3/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4501 - val_loss: 0.7067 - lr: 0.0100
Epoch 4/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4379 - val_loss: 0.4819 - lr: 0.0100
Epoch 5/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4179 - val_loss: 0.4650 - lr: 0.0100
Epoch 6/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4107 - val_loss: 0.4540 - lr: 0.0100
Epoch 7/100
363/363 [==============================] - 1s 3ms/step - loss: 0.4021 - val_loss: 0.4185 - lr: 0.0100
Epoch 8/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3979 - val_loss: 0.4117 - lr: 0.0100
Epoch 9/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3901 - val_loss: 0.4172 - lr: 0.0100
Epoch 10/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3876 - val_loss: 0.4162 - lr: 0.0100
Epoch 11/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3829 - val_loss: 0.4365 - lr: 0.0100
Epoch 12/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3787 - val_loss: 0.4338 - lr: 0.0100
Epoch 13/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3780 - val_loss: 0.4382 - lr: 0.0100
Epoch 14/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3674 - val_loss: 0.4577 - lr: 0.0050
Epoch 15/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3639 - val_loss: 0.4702 - lr: 0.0050
Epoch 16/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3639 - val_loss: 0.4746 - lr: 0.0050
Epoch 17/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3620 - val_loss: 0.4996 - lr: 0.0050
Epoch 18/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3604 - val_loss: 0.5001 - lr: 0.0050
162/162 [==============================] - 0s 2ms/step - loss: 0.4385

0.4384787380695343

Besides wrapping a function in a callback as above, there is another way to implement learning rate scheduling:

use one of the classes in tf.keras.optimizers.schedules and pass an instance as the optimizer's learning_rate argument before calling compile.
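For example, a sketch with ExponentialDecay reproducing the schedule above (decay_steps must be in training steps, not epochs; the value below assumes batch_size=32, so 20 epochs is roughly 20*len(x_train)//32 steps):

lr_schedule=tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,
    decay_steps=20*len(x_train)//32,   # assumed batch size of 32
    decay_rate=0.1)
optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule)
model.compile(loss=tf.keras.losses.mean_squared_error,optimizer=optimizer)

One advantage of this approach: the schedule is part of the optimizer configuration, so it is saved and restored along with the model.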
