Choosing the Learning Rate in Keras

Time-Based Learning Rate Schedule

Keras has a time-based learning rate schedule built in.

The stochastic gradient descent optimization algorithm implementation in the SGD class has an argument called decay. This argument is used in the time-based learning rate decay schedule equation as follows:

 

LearningRate = LearningRate * 1 / (1 + decay * epoch)

When the decay argument is zero (the default), this has no effect on the learning rate.

 

LearningRate = 0.1 * 1 / (1 + 0.0 * 1)
LearningRate = 0.1

When the decay argument is specified, the learning rate is reduced each epoch from the previous epoch's value according to the equation above.

For example, if we use the initial learning rate value of 0.1 and the decay of 0.001, the first 5 epochs will adapt the learning rate as follows:

 

Epoch    Learning Rate
1        0.1
2        0.0999000999
3        0.0997006985
4        0.09940249103
5        0.09900646517
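A minimal sketch (not part of the original post) that reproduces these values, assuming the update is applied to the previous epoch's learning rate:

```python
# Reproduce the table above: the time-based update is applied once per epoch
# to the learning rate carried over from the previous epoch.
learning_rate = 0.1
decay = 0.001
for epoch in range(1, 6):
    print(epoch, learning_rate)
    learning_rate = learning_rate * 1.0 / (1.0 + decay * epoch)
```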

Extending this out to 100 epochs will produce the following graph of learning rate (y axis) versus epoch (x axis):

[Figure: Time-Based Learning Rate Schedule]

You can create a nice default schedule by setting the decay value as follows:

 

Decay = LearningRate / Epochs
Decay = 0.1 / 100
Decay = 0.001

The example below demonstrates using the time-based learning rate adaptation schedule in Keras.

It is demonstrated on the Ionosphere binary classification problem. This is a small dataset that you can download from the UCI Machine Learning repository. Place the data file in your working directory with the filename ionosphere.csv.

The ionosphere dataset is good for practicing with neural networks because all of the input values are small numerical values of the same scale.

A small neural network model is constructed with a single hidden layer with 34 neurons and using the rectifier activation function. The output layer has a single neuron and uses the sigmoid activation function in order to output probability-like values.

The learning rate for stochastic gradient descent has been set to a higher value of 0.1. The model is trained for 50 epochs and the decay argument has been set to 0.002, calculated as 0.1/50. Additionally, it can be a good idea to use momentum when using an adaptive learning rate. In this case we use a momentum value of 0.8.

The complete example is listed below.

 

# Time Based Learning Rate Decay
from pandas import read_csv
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from sklearn.preprocessing import LabelEncoder
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = read_csv("ionosphere.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:34].astype(float)
Y = dataset[:,34]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
Y = encoder.transform(Y)
# create model
model = Sequential()
model.add(Dense(34, input_dim=34, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
# Compile model
epochs = 50
learning_rate = 0.1
decay_rate = learning_rate / epochs
momentum = 0.8
sgd = SGD(lr=learning_rate, momentum=momentum, decay=decay_rate, nesterov=False)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Fit the model
model.fit(X, Y, validation_split=0.33, epochs=epochs, batch_size=28, verbose=2)

The model is trained on 67% of the dataset and evaluated using a 33% validation dataset.

Running the example shows a classification accuracy of 99.14%. This is higher than the baseline of 95.69% without the learning rate decay or momentum.

 

...
Epoch 45/50
0s - loss: 0.0622 - acc: 0.9830 - val_loss: 0.0929 - val_acc: 0.9914
Epoch 46/50
0s - loss: 0.0695 - acc: 0.9830 - val_loss: 0.0693 - val_acc: 0.9828
Epoch 47/50
0s - loss: 0.0669 - acc: 0.9872 - val_loss: 0.0616 - val_acc: 0.9828
Epoch 48/50
0s - loss: 0.0632 - acc: 0.9830 - val_loss: 0.0824 - val_acc: 0.9914
Epoch 49/50
0s - loss: 0.0590 - acc: 0.9830 - val_loss: 0.0772 - val_acc: 0.9828
Epoch 50/50
0s - loss: 0.0592 - acc: 0.9872 - val_loss: 0.0639 - val_acc: 0.9828

 

Drop-Based Learning Rate Schedule

Another popular learning rate schedule used with deep learning models is to systematically drop the learning rate at specific times during training.

Often this method is implemented by halving the learning rate every fixed number of epochs. For example, we may have an initial learning rate of 0.1 and drop it by a factor of 0.5 every 10 epochs: the first 10 epochs of training would use a value of 0.1, the next 10 epochs a learning rate of 0.05, and so on.

If we plot the learning rates for this example out to 100 epochs, we get the graph below showing learning rate (y axis) versus epoch (x axis).

[Figure: Drop-Based Learning Rate Schedule]
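A minimal sketch (not part of the original post) that regenerates this curve with Matplotlib, using the same formula as the step_decay() function in the example further below:

```python
# Plot the drop-based schedule: initial rate 0.1, halved every 10 epochs,
# following the same formula as the step_decay() function shown below.
import math
import matplotlib.pyplot as plt

initial_lrate, drop, epochs_drop = 0.1, 0.5, 10.0
lrates = [initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
          for epoch in range(100)]

plt.plot(range(1, 101), lrates)
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.title('Drop-Based Learning Rate Schedule')
plt.show()
```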

We can implement this in Keras using the LearningRateScheduler callback when fitting the model.

The LearningRateScheduler callback allows us to define a function to call that takes the epoch number as an argument and returns the learning rate to use in stochastic gradient descent. When used, the learning rate specified by stochastic gradient descent is ignored.

In the code below, we use the same single hidden layer network on the Ionosphere dataset as before. A new step_decay() function is defined that implements the equation:

 

LearningRate = InitialLearningRate * DropRate^floor(Epoch / EpochDrop)

Here, InitialLearningRate is the initial learning rate (such as 0.1), DropRate is the factor the learning rate is multiplied by each time it is changed (such as 0.5), Epoch is the current epoch number, and EpochDrop is how often to change the learning rate (such as every 10 epochs).
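For example, plugging the values used in the example below into this equation at epoch 25 gives:

LearningRate = 0.1 * 0.5^floor(25 / 10)
LearningRate = 0.1 * 0.5^2
LearningRate = 0.025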

Notice that we set the learning rate in the SGD class to 0 to clearly indicate that it is not used. Nevertheless, you can set a momentum term in SGD if you want to use momentum with this learning rate schedule.

 

# Drop-Based Learning Rate Decay
import pandas
from pandas import read_csv
import numpy
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from sklearn.preprocessing import LabelEncoder
from keras.callbacks import LearningRateScheduler

# learning rate schedule
def step_decay(epoch):
    initial_lrate = 0.1
    drop = 0.5
    epochs_drop = 10.0
    lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop))
    return lrate

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = read_csv("ionosphere.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:34].astype(float)
Y = dataset[:,34]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
Y = encoder.transform(Y)
# create model
model = Sequential()
model.add(Dense(34, input_dim=34, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
# Compile model
sgd = SGD(lr=0.0, momentum=0.9, decay=0.0, nesterov=False)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
# learning schedule callback
lrate = LearningRateScheduler(step_decay)
callbacks_list = [lrate]
# Fit the model
model.fit(X, Y, validation_split=0.33, epochs=50, batch_size=28, callbacks=callbacks_list, verbose=2)

Running the example results in a classification accuracy of 99.14% on the validation dataset, again an improvement over the baseline for the model on the problem.

 

...
Epoch 45/50
0s - loss: 0.0546 - acc: 0.9830 - val_loss: 0.0634 - val_acc: 0.9914
Epoch 46/50
0s - loss: 0.0544 - acc: 0.9872 - val_loss: 0.0638 - val_acc: 0.9914
Epoch 47/50
0s - loss: 0.0553 - acc: 0.9872 - val_loss: 0.0696 - val_acc: 0.9914
Epoch 48/50
0s - loss: 0.0537 - acc: 0.9872 - val_loss: 0.0675 - val_acc: 0.9914
Epoch 49/50
0s - loss: 0.0537 - acc: 0.9872 - val_loss: 0.0636 - val_acc: 0.9914
Epoch 50/50
0s - loss: 0.0534 - acc: 0.9872 - val_loss: 0.0679 - val_acc: 0.9914

 

Tips for Using Learning Rate Schedules

This section lists some tips and tricks to consider when using learning rate schedules with neural networks.

  • Increase the initial learning rate. Because the learning rate will very likely decrease, start with a larger value to decrease from. A larger learning rate will result in much larger changes to the weights, at least at the beginning, allowing you to benefit from fine tuning later.
  • Use a large momentum. Using a larger momentum value will help the optimization algorithm to continue to make updates in the right direction when your learning rate shrinks to small values.
  • Experiment with different schedules. It will not be obvious which learning rate schedule to use, so try a few with different configuration options and see what works best on your problem. Also try schedules that change exponentially and even schedules that respond to the accuracy of your model on the training or test datasets; a sketch of both ideas follows this list.
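As a hedged illustration of the last tip (not from the original post), the sketch below defines an exponential schedule for the LearningRateScheduler callback and also sets up Keras' built-in ReduceLROnPlateau callback, which reacts to a monitored validation metric. The decay constant k, the patience, and the minimum learning rate are arbitrary values chosen for the example.

```python
# Sketch: an exponential schedule plus a metric-driven schedule (illustrative values).
import math
from keras.callbacks import LearningRateScheduler, ReduceLROnPlateau

# Exponential decay: lr = initial_lr * exp(-k * epoch), with an assumed k of 0.1.
def exp_decay(epoch):
    initial_lrate = 0.1
    k = 0.1
    return initial_lrate * math.exp(-k * epoch)

exp_schedule = LearningRateScheduler(exp_decay)

# Halve the learning rate whenever val_loss has not improved for 5 epochs.
plateau_schedule = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-4)

# Use one or the other when fitting, e.g.:
# model.fit(X, Y, validation_split=0.33, epochs=50, batch_size=28,
#           callbacks=[exp_schedule], verbose=2)
```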

Summary

In this post you discovered learning rate schedules for training neural network models.

After reading this post you learned:

  • How to configure and use a time-based learning rate schedule in Keras.
  • How to develop your own drop-based learning rate schedule in Keras.

Do you have any questions about learning rate schedules for neural networks or about this post? Ask your question in the comments and I will do my best to answer.

 

Reference: https://machinelearningmastery.com/using-learning-rate-schedules-deep-learning-models-python-keras/

You can use a Keras callback to change the Adam optimizer's learning_rate at the end of each epoch. The steps are:

1. Create a callback class that inherits from `tf.keras.callbacks.Callback` and implements the `on_epoch_end` method, which is called at the end of every epoch.
2. Inside `on_epoch_end`, update the Adam optimizer's learning_rate as needed.

Here is an example:

```python
import tensorflow as tf
from tensorflow.keras.callbacks import Callback

class AdamLearningRateScheduler(Callback):
    def __init__(self, initial_lr, epoch_decay):
        super().__init__()
        self.initial_lr = initial_lr
        self.epoch_decay = epoch_decay

    def on_epoch_end(self, epoch, logs=None):
        lr = self.initial_lr / (1 + self.epoch_decay * epoch)
        tf.keras.backend.set_value(self.model.optimizer.lr, lr)
        print("Learning rate for epoch {} is {}".format(epoch + 1, lr))
```

In the code above, `AdamLearningRateScheduler` inherits from `tf.keras.callbacks.Callback` and takes two parameters, `initial_lr` and `epoch_decay`, the initial learning rate and the per-epoch decay rate. In `on_epoch_end`, we first compute the learning rate for the current epoch, then use `tf.keras.backend.set_value` to assign the new value to the Adam optimizer's learning rate, and finally print the learning rate for that epoch.

We can then use this callback with a Keras model, for example:

```python
from tensorflow.keras.optimizers import Adam

model = ...  # define the model

initial_lr = 0.001
epoch_decay = 0.001

adam = Adam(learning_rate=initial_lr)
model.compile(optimizer=adam, ...)

lr_scheduler = AdamLearningRateScheduler(initial_lr, epoch_decay)
model.fit(..., callbacks=[lr_scheduler])
```

Here we first define an Adam optimizer and use it as the model's optimizer, then create an `AdamLearningRateScheduler` instance and pass it to `fit` as a callback. During training, at the end of each epoch, `AdamLearningRateScheduler` updates the Adam optimizer's learning rate based on the current epoch.