Experiments with the Commonly Used Optimizers in Keras


Preface

Up to now I had only ever used the SGD optimizer, simply because that is what the example code used, without knowing what the alternatives offer in accuracy or speed. So here I list the optimizers commonly used in Keras and compare how they perform.
The experiments cover SGD, Adagrad, Adam, RMSprop, Adadelta, Adamax, and Nadam.
This post is only meant for beginners; I won't go into any of the formulas.

Data and network architecture

To make the comparison fair, the experiments follow the control-variable method (sound familiar, physics majors?): everything is held fixed except the optimizer.
Data: the MNIST dataset bundled with Keras
Network architecture: a basic LeNet-style structure
Keras version:
Python version:
Shared code: (note: there are many ways to build a network in Keras, and this is just one of them; for other approaches, see the article "Keras添加网络层的N种方法".)

from keras.datasets import mnist
from keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense, Activation, Input
from keras.models import Model
from keras.losses import categorical_crossentropy
from keras.utils import to_categorical
from keras.optimizers import SGD, Adagrad, Adam, RMSprop, Adadelta, Adamax, Nadam
import numpy as np

# Load MNIST and add a channel axis: (60000, 28, 28) -> (60000, 28, 28, 1)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = np.expand_dims(X_train, 3)
X_test = np.expand_dims(X_test, 3)
# One-hot encode the labels for categorical_crossentropy
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

def net(input_shape, output_class):
    # Basic LeNet-style network: two conv/pool/tanh blocks, then two dense layers
    zi = Input(input_shape)
    z = Conv2D(20, (3, 3))(zi)
    z = MaxPool2D((2, 2))(z)
    z = Activation('tanh')(z)
    z = Conv2D(30, (3, 3))(z)
    z = MaxPool2D((2, 2))(z)
    z = Activation('tanh')(z)
    z = Dropout(0.5)(z)
    z = Flatten()(z)
    z = Dense(1000)(z)
    z = Dropout(0.5)(z)
    z = Dense(output_class)(z)
    zo = Activation('softmax')(z)
    model = Model(inputs=zi, outputs=zo)
    # Change the optimizer here for each experiment
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
                  loss=categorical_crossentropy,
                  metrics=['accuracy'])
    return model

model = net(X_train.shape[1:], y_train.shape[1])
model.fit(X_train, y_train, batch_size=1000, epochs=2, validation_data=(X_test, y_test))
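
Since the whole point is the control-variable method, every experiment below is just this shared code with a different optimizer passed to compile. For convenience, here is a minimal sketch of driving all the runs in one loop; the optimizers_to_try dictionary and the extra re-compile step are my own additions, not part of the original script:

# Sketch: rerun the same network with each optimizer (assumes the shared code above has run).
optimizers_to_try = {
    'SGD (default)': SGD(),
    'SGD (momentum + Nesterov)': SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
    'Adagrad': Adagrad(),
    'Adam': Adam(),
    'RMSprop': RMSprop(),
    'Adadelta': Adadelta(),
    'Adamax': Adamax(),
    'Nadam': Nadam(),
}
for name, optimizer in optimizers_to_try.items():
    model = net(X_train.shape[1:], y_train.shape[1])   # rebuild from scratch for a fair comparison
    # Re-compiling overrides the SGD settings hard-coded inside net()
    model.compile(optimizer=optimizer, loss=categorical_crossentropy, metrics=['accuracy'])
    history = model.fit(X_train, y_train, batch_size=1000, epochs=2,
                        validation_data=(X_test, y_test), verbose=2)
    print(name, 'final val_accuracy:', history.history['val_accuracy'][-1])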

Optimizer: SGD

  1. SGD with default parameters: after two epochs of fit, train accuracy is 0.6777 and test accuracy is 0.8754; after two more epochs, train reaches 0.8293 and test 0.9181.
Epoch 1/2
60000/60000 [==============================] - 35s 583us/step - loss: 1.9640 - accuracy: 0.3472 - val_loss: 0.9992 - val_accuracy: 0.7992
Epoch 2/2
60000/60000 [==============================] - 40s 667us/step - loss: 1.0139 - accuracy: 0.6777 - val_loss: 0.5757 - val_accuracy: 0.8754
Epoch 1/2
60000/60000 [==============================] - 35s 589us/step - loss: 0.7026 - accuracy: 0.7824 - val_loss: 0.4138 - val_accuracy: 0.9007
Epoch 2/2
60000/60000 [==============================] - 39s 648us/step - loss: 0.5542 - accuracy: 0.8293 - val_loss: 0.3311 - val_accuracy: 0.9181
  2. SGD with momentum=0.9 and nesterov=True: after two epochs, train accuracy is 0.9107 and test accuracy 0.9573.
Epoch 1/2
60000/60000 [==============================] - 36s 606us/step - loss: 0.8629 - accuracy: 0.7203 - val_loss: 0.2172 - val_accuracy: 0.9370
Epoch 2/2
60000/60000 [==============================] - 37s 623us/step - loss: 0.2851 - accuracy: 0.9107 - val_loss: 0.1457 - val_accuracy: 0.9573

Conclusion: SGD does train slowly, but as long as it does not get stuck at a saddle point, it will still converge if you train a bit longer.
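
For reference, the two runs above only differ in the optimizer passed to compile in the shared code (a sketch; argument names follow the Keras 2.3 signatures, and model and categorical_crossentropy come from that code):

from keras.optimizers import SGD
# Run 1: all defaults (learning_rate=0.01, momentum=0.0, nesterov=False)
model.compile(optimizer=SGD(), loss=categorical_crossentropy, metrics=['accuracy'])
# Run 2: momentum plus Nesterov acceleration
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
              loss=categorical_crossentropy, metrics=['accuracy'])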

Optimizer: Adagrad

  1. Adagrad with default values: after two epochs, train accuracy is 0.9127 and test accuracy 0.9569.
Epoch 1/2
60000/60000 [==============================] - 37s 617us/step - loss: 1.7784 - accuracy: 0.8039 - val_loss: 0.1961 - val_accuracy: 0.9422
Epoch 2/2
60000/60000 [==============================] - 38s 631us/step - loss: 0.2839 - accuracy: 0.9127 - val_loss: 0.1501 - val_accuracy: 0.9569

Conclusion: Adagrad has essentially nothing to tune; it only exposes learning_rate, and everything else should be left alone, as its source shows:

In [27]: Adagrad??
Init signature: Adagrad(learning_rate=0.01, **kwargs)
Source:
class Adagrad(Optimizer):
    """Adagrad optimizer.

    Adagrad is an optimizer with parameter-specific learning rates,
    which are adapted relative to how frequently a parameter gets
    updated during training. The more updates a parameter receives,
    the smaller the learning rate.

    It is recommended to leave the parameters of this optimizer
    at their default values.

    # Arguments
        learning_rate: float >= 0. Initial learning rate.

    # References
        - [Adaptive Subgradient Methods for Online Learning and Stochastic
           Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
    """

Optimizer: Adam

  1. Adam with default values: after two epochs, train accuracy is 0.9348 and test accuracy 0.9691; in fact, test accuracy already reaches 0.9548 after the first epoch.
Epoch 1/2
60000/60000 [==============================] - 36s 599us/step - loss: 0.6294 - accuracy: 0.8043 - val_loss: 0.1628 - val_accuracy: 0.9548
Epoch 2/2
60000/60000 [==============================] - 46s 764us/step - loss: 0.2073 - accuracy: 0.9348 - val_loss: 0.1057 - val_accuracy: 0.9691
  2. Adam with learning_rate=0.01 and everything else at defaults: test accuracy reaches 0.9592 after one epoch and 0.9725 after two.
Epoch 1/2
60000/60000 [==============================] - 36s 593us/step - loss: 2.3847 - accuracy: 0.7794 - val_loss: 0.1803 - val_accuracy: 0.9592
Epoch 2/2
60000/60000 [==============================] - 38s 629us/step - loss: 0.2549 - accuracy: 0.9329 - val_loss: 0.0984 - val_accuracy: 0.9725
  3. Adam with learning_rate=0.01 and amsgrad=True: essentially the same result as above.
Epoch 1/2
60000/60000 [==============================] - 37s 621us/step - loss: 4.3511 - accuracy: 0.7465 - val_loss: 0.2714 - val_accuracy: 0.9545
Epoch 2/2
60000/60000 [==============================] - 44s 733us/step - loss: 0.3458 - accuracy: 0.9337 - val_loss: 0.1064 - val_accuracy: 0.9719

Conclusion: this optimizer converges very quickly.
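
The three runs above again only change the compile call in the shared code; a sketch, with the learning_rate and amsgrad arguments as in the Keras 2.3 Adam signature:

from keras.optimizers import Adam
# Run 1: all defaults (learning_rate=0.001)
model.compile(optimizer=Adam(), loss=categorical_crossentropy, metrics=['accuracy'])
# Run 2: a 10x larger learning rate
model.compile(optimizer=Adam(learning_rate=0.01), loss=categorical_crossentropy, metrics=['accuracy'])
# Run 3: same learning rate plus the AMSGrad variant
model.compile(optimizer=Adam(learning_rate=0.01, amsgrad=True), loss=categorical_crossentropy, metrics=['accuracy'])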

Optimizer: RMSprop

  1. RMSprop with default values: after two epochs, train accuracy is 0.9333 and test accuracy 0.9673; test accuracy already reaches 0.9579 after one epoch.
Epoch 1/2
60000/60000 [==============================] - 36s 594us/step - loss: 0.7515 - accuracy: 0.8130 - val_loss: 0.1454 - val_accuracy: 0.9579
Epoch 2/2
60000/60000 [==============================] - 37s 625us/step - loss: 0.2154 - accuracy: 0.9333 - val_loss: 0.1007 - val_accuracy: 0.9673
  2. RMSprop with learning_rate=0.01: it takes two epochs for test accuracy just to reach 0.9563.
Epoch 1/2
60000/60000 [==============================] - 36s 594us/step - loss: 21.2518 - accuracy: 0.6875 - val_loss: 8.6292 - val_accuracy: 0.5309
Epoch 2/2
60000/60000 [==============================] - 42s 699us/step - loss: 6.1957 - accuracy: 0.8132 - val_loss: 0.8259 - val_accuracy: 0.9563
  3. RMSprop with the default learning_rate and rho=0.5: after two epochs, train accuracy is 0.9364 and test accuracy 0.9674; test accuracy already reaches 0.9569 after one epoch.
Epoch 1/2
60000/60000 [==============================] - 35s 590us/step - loss: 0.5887 - accuracy: 0.8204 - val_loss: 0.1453 - val_accuracy: 0.9569
Epoch 2/2
60000/60000 [==============================] - 38s 627us/step - loss: 0.2050 - accuracy: 0.9364 - val_loss: 0.1077 - val_accuracy: 0.9674

Conclusion: its default parameters are already close to ideal; raising the learning rate only hurts.
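
Likewise for the three RMSprop runs (a sketch; rho is the decay factor of the squared-gradient moving average in the Keras 2.3 signature):

from keras.optimizers import RMSprop
# Run 1: all defaults (learning_rate=0.001, rho=0.9)
model.compile(optimizer=RMSprop(), loss=categorical_crossentropy, metrics=['accuracy'])
# Run 2: larger learning rate -- the run whose loss blew up at first
model.compile(optimizer=RMSprop(learning_rate=0.01), loss=categorical_crossentropy, metrics=['accuracy'])
# Run 3: default learning rate, smaller rho
model.compile(optimizer=RMSprop(rho=0.5), loss=categorical_crossentropy, metrics=['accuracy'])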

Optimizer: Adadelta

  1. Adadelta with default values: after two epochs, train accuracy is 0.9351 and test accuracy 0.9694; test accuracy already reaches 0.9566 after one epoch.
Epoch 1/2
60000/60000 [==============================] - 37s 618us/step - loss: 0.6233 - accuracy: 0.8176 - val_loss: 0.1516 - val_accuracy: 0.9566
Epoch 2/2
60000/60000 [==============================] - 38s 639us/step - loss: 0.2077 - accuracy: 0.9351 - val_loss: 0.1007 - val_accuracy: 0.9694

Conclusion: its defaults are also essentially perfect.
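
Only the default configuration was tried here. One detail worth noting (per the Keras 2.3 signature, so treat it as an assumption about your version): Adadelta's default learning_rate is 1.0 rather than the 0.001-0.01 range of the other optimizers, since the algorithm rescales its updates internally.

from keras.optimizers import Adadelta
# The run above: all defaults (learning_rate=1.0, rho=0.95 in the Keras 2.3 signature)
model.compile(optimizer=Adadelta(), loss=categorical_crossentropy, metrics=['accuracy'])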

Optimizer: Adamax

  1. Adamax with default values: after two epochs, train accuracy is 0.9251 and test accuracy 0.9638; test accuracy already reaches 0.9488 after one epoch.
Epoch 1/2
60000/60000 [==============================] - 36s 600us/step - loss: 0.6072 - accuracy: 0.8148 - val_loss: 0.1794 - val_accuracy: 0.9488
Epoch 2/2
60000/60000 [==============================] - 37s 619us/step - loss: 0.2410 - accuracy: 0.9251 - val_loss: 0.1234 - val_accuracy: 0.9638
  2. Adamax with learning_rate=0.01: test accuracy reaches 0.9482 after one epoch and 0.9616 after two.
Epoch 1/2
60000/60000 [==============================] - 36s 598us/step - loss: 1.9103 - accuracy: 0.7729 - val_loss: 0.2179 - val_accuracy: 0.9482
Epoch 2/2
60000/60000 [==============================] - 36s 606us/step - loss: 0.3143 - accuracy: 0.9196 - val_loss: 0.1345 - val_accuracy: 0.9616

Conclusion: it converges quickly.
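
The two Adamax runs, again as compile-call changes on the shared code (a sketch; the default learning_rate is 0.002 per the Keras 2.3 signature):

from keras.optimizers import Adamax
# Run 1: all defaults (learning_rate=0.002)
model.compile(optimizer=Adamax(), loss=categorical_crossentropy, metrics=['accuracy'])
# Run 2: larger learning rate
model.compile(optimizer=Adamax(learning_rate=0.01), loss=categorical_crossentropy, metrics=['accuracy'])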

Optimizer: Nadam

  1. Nadam with default values: test accuracy reaches 0.9621 after one epoch and 0.9692 after two.
Epoch 1/2
60000/60000 [==============================] - 36s 606us/step - loss: 0.5319 - accuracy: 0.8454 - val_loss: 0.1266 - val_accuracy: 0.9621
Epoch 2/2
60000/60000 [==============================] - 39s 651us/step - loss: 0.1791 - accuracy: 0.9439 - val_loss: 0.0949 - val_accuracy: 0.9692

Conclusion: the defaults are spot on.
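
Nadam is essentially Adam with Nesterov momentum; the single run above is just the default optimizer dropped into the shared compile call (a sketch):

from keras.optimizers import Nadam
# The run above: all defaults
model.compile(optimizer=Nadam(), loss=categorical_crossentropy, metrics=['accuracy'])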

Afterword

If I write another neural network in the future, I won't foolishly default to plain SGD any more...
