T11: Optimizer Comparison Experiment

Task:

Investigate how different optimizers and different hyperparameter configurations affect the model.

My environment:

  • Language: Python 3.8
  • Editor: Jupyter Notebook
  • Deep-learning framework: TensorFlow (CPU)

I. Data Processing

1. Importing the data

from tensorflow          import keras
import tensorflow        as tf
import matplotlib.pyplot as plt
import pandas            as pd
import numpy             as np
import warnings,os,PIL,pathlib

warnings.filterwarnings("ignore")             # suppress warning messages
plt.rcParams['font.sans-serif']    = ['SimHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # display minus signs correctly

data_dir = "D:/BaiduNetdiskDownload/datasets/t6"
data_dir = pathlib.Path(data_dir)

2. Inspecting the data

image_count = len(list(data_dir.glob('*/*.jpg')))
print('Total number of images:', image_count)
Total number of images: 1800
im = list(data_dir.glob('Angelina Jolie/*.jpg'))
PIL.Image.open(str(im[0]))

[Image: a sample photo from the Angelina Jolie class]

3. Loading the data

Use the image_dataset_from_directory method to load the data from disk into a tf.data.Dataset.

batch_size = 16
img_height = 336
img_width = 336

An overview of tf.keras.preprocessing.image_dataset_from_directory:

  • directory: the directory containing the data. If labels is inferred (the default), it should contain subdirectories, each holding the images of one class. Otherwise the directory structure is ignored.

  • labels: inferred (labels are generated from the directory structure), or a list/tuple of integer labels with the same size as the number of image files found in the directory. Labels should be sorted according to the alphanumeric order of the image file paths (as obtained by os.walk(directory) in Python).

  • label_mode:
    int: labels are encoded as integers (the loss function should be sparse_categorical_crossentropy).
    categorical: labels are encoded as categorical vectors (the loss function should be categorical_crossentropy).
    binary: labels (there may only be 2 classes) are encoded as float32 scalars of value 0 or 1 (e.g. for binary_crossentropy).
    None: no labels.

  • class_names: only valid when labels is inferred. An explicit list of class names (must match the subdirectory names). Used to control the order of the classes (otherwise alphanumeric order is used).

  • color_mode: one of grayscale, rgb, rgba. Default: rgb. Images are converted to 1, 3, or 4 channels accordingly.

  • batch_size: size of the data batches. Default: 32.

  • image_size: the size images are resized to after being read from disk. Default: (256, 256). This argument is required, because all images in a pipeline batch must have the same size.

  • shuffle: whether to shuffle the data. Default: True. If set to False, the data is sorted in alphanumeric order.

  • seed: optional random seed for shuffling and transformations.

  • validation_split: optional float between 0 and 1, the fraction of the data to reserve for validation.

  • subset: one of training or validation. Only used when validation_split is set.

  • interpolation: string, the interpolation method used when resizing images. Default: bilinear. Supports bilinear, nearest, bicubic, area, lanczos3, lanczos5, gaussian, mitchellcubic.

  • follow_links: whether to visit subdirectories pointed to by symbolic links. Default: False.

Note: the code below keeps the default label_mode='int', so the labels are integer class indices and the matching loss function is sparse_categorical_crossentropy (used when compiling the model in Part II).
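Before building the training/validation split, here is a quick sketch (illustrative only, reusing the data_dir and sizes defined above) of how label_mode changes the label tensors:

# With the default label_mode='int', labels come out as a (batch,) tensor of
# integer class indices; with label_mode='categorical' they come out as
# (batch, num_classes) one-hot vectors.
ds_int = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, image_size=(img_height, img_width), batch_size=batch_size)
ds_cat = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, label_mode='categorical',
    image_size=(img_height, img_width), batch_size=batch_size)

for _, y in ds_int.take(1):
    print(y.shape)   # (16,)    -> pair with sparse_categorical_crossentropy
for _, y in ds_cat.take(1):
    print(y.shape)   # (16, 17) -> pair with categorical_crossentropy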

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=12,
    image_size=(img_height, img_width),
    batch_size=batch_size)
Found 1800 files belonging to 17 classes.
Using 1440 files for training.
# class_names maps the directory names, in alphabetical order, to the dataset labels
class_names = train_ds.class_names
print(class_names)
['Angelina Jolie', 'Brad Pitt', 'Denzel Washington', 'Hugh Jackman', 'Jennifer Lawrence', 'Johnny Depp', 'Kate Winslet', 'Leonardo DiCaprio', 'Megan Fox', 'Natalie Portman', 'Nicole Kidman', 'Robert Downey Jr', 'Sandra Bullock', 'Scarlett Johansson', 'Tom Cruise', 'Tom Hanks', 'Will Smith']
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=12,
    image_size=(img_height, img_width),
    batch_size=batch_size)
Found 1800 files belonging to 17 classes.
Using 360 files for validation.

4. Checking batch shapes

for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break
(16, 336, 336, 3)
(16,)

image_batch: a batch of 16 images, each of shape 336 × 336 × 3 (3 = RGB color channels).

labels_batch: the 16 integer labels corresponding to those images.

5. Configuring the dataset

shuffle(): shuffles the data.

prefetch(): prefetches upcoming batches so data preparation overlaps with training, speeding up execution.

cache(): caches the dataset in memory after the first pass, speeding up later epochs.

AUTOTUNE = tf.data.AUTOTUNE

def train_preprocessing(image,label):
    return (image/255.0,label)

train_ds = (
    train_ds.cache()
    .shuffle(1000)
    .map(train_preprocessing)    # the preprocessing function can be set here
#     .batch(batch_size)           # batch_size was already set in image_dataset_from_directory
    .prefetch(buffer_size=AUTOTUNE)
)

val_ds = (
    val_ds.cache()
    .shuffle(1000)
    .map(train_preprocessing)    # the preprocessing function can be set here
#     .batch(batch_size)         # batch_size was already set in image_dataset_from_directory
    .prefetch(buffer_size=AUTOTUNE)
)

6. Data visualization

plt.figure(figsize=(10, 8))  # figure width 10, height 8
plt.suptitle("数据展示")

for images, labels in train_ds.take(1):
    for i in range(15):
        plt.subplot(4, 5, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)

        # show the image
        plt.imshow(images[i])
        # show the label (labels are 0-based class indices, so no offset is needed)
        plt.xlabel(class_names[labels[i]])

plt.show()

[Figure: 15 sample training images with their class labels]

II. CNN Network Configuration, Compilation, and Training

from tensorflow.keras.layers import Dropout,Dense,BatchNormalization
from tensorflow.keras.models import Model

def create_model(optimizer='adam'):
    # load the pre-trained VGG16 model
    vgg16_base_model = tf.keras.applications.vgg16.VGG16(weights='imagenet',
                                                         include_top=False,
                                                         input_shape=(img_width, img_height, 3),
                                                         pooling='avg')
    for layer in vgg16_base_model.layers:
        layer.trainable = False

    X = vgg16_base_model.output
    
    X = Dense(170, activation='relu')(X)
    X = BatchNormalization()(X)
    X = Dropout(0.5)(X)

    output = Dense(len(class_names), activation='softmax')(X)
    vgg16_model = Model(inputs=vgg16_base_model.input, outputs=output)

    vgg16_model.compile(optimizer=optimizer,
                        loss='sparse_categorical_crossentropy',
                        metrics=['accuracy'])
    return vgg16_model

model1 = create_model(optimizer=tf.keras.optimizers.Adam())
model2 = create_model(optimizer=tf.keras.optimizers.SGD())
model2.summary()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58889256/58889256 [==============================] - 29s 0us/step
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 336, 336, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 336, 336, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 336, 336, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 168, 168, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 168, 168, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 168, 168, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 84, 84, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 84, 84, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 84, 84, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 84, 84, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 42, 42, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 42, 42, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 42, 42, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 42, 42, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 21, 21, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 21, 21, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 21, 21, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 21, 21, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 10, 10, 512)       0         
                                                                 
 global_average_pooling2d_1  (None, 512)               0         
  (GlobalAveragePooling2D)                                       
                                                                 
 dense_2 (Dense)             (None, 170)               87210     
                                                                 
 batch_normalization_1 (Bat  (None, 170)               680       
 chNormalization)                                                
                                                                 
 dropout_1 (Dropout)         (None, 170)               0         
                                                                 
 dense_3 (Dense)             (None, 17)                2907      
                                                                 
=================================================================
Total params: 14805485 (56.48 MB)
Trainable params: 90457 (353.35 KB)
Non-trainable params: 14715028 (56.13 MB)
_________________________________________________________________

3. Training the model: model.fit()

NO_EPOCHS = 50

history_model1  = model1.fit(train_ds, epochs=NO_EPOCHS, verbose=1, validation_data=val_ds)
history_model2  = model2.fit(train_ds, epochs=NO_EPOCHS, verbose=1, validation_data=val_ds)
Epoch 1/50
90/90 [==============================] - 560s 6s/step - loss: 2.8445 - accuracy: 0.1368 - val_loss: 2.6800 - val_accuracy: 0.1083
Epoch 2/50
90/90 [==============================] - 581s 6s/step - loss: 2.0754 - accuracy: 0.3549 - val_loss: 2.4468 - val_accuracy: 0.2389
Epoch 3/50
90/90 [==============================] - 583s 6s/step - loss: 1.7755 - accuracy: 0.4313 - val_loss: 2.1898 - val_accuracy: 0.3889
Epoch 4/50
90/90 [==============================] - 577s 6s/step - loss: 1.5729 - accuracy: 0.5250 - val_loss: 2.0081 - val_accuracy: 0.3750
Epoch 5/50
90/90 [==============================] - 579s 6s/step - loss: 1.3976 - accuracy: 0.5639 - val_loss: 1.9737 - val_accuracy: 0.3722
Epoch 6/50
90/90 [==============================] - 578s 6s/step - loss: 1.2460 - accuracy: 0.6111 - val_loss: 1.9333 - val_accuracy: 0.3806
Epoch 7/50
90/90 [==============================] - 576s 6s/step - loss: 1.1514 - accuracy: 0.6458 - val_loss: 1.5494 - val_accuracy: 0.5028
Epoch 8/50
90/90 [==============================] - 576s 6s/step - loss: 1.0301 - accuracy: 0.6875 - val_loss: 1.4428 - val_accuracy: 0.5306
Epoch 9/50
90/90 [==============================] - 578s 6s/step - loss: 0.9374 - accuracy: 0.7215 - val_loss: 1.5923 - val_accuracy: 0.5000
Epoch 10/50
90/90 [==============================] - 616s 7s/step - loss: 0.8519 - accuracy: 0.7417 - val_loss: 1.8340 - val_accuracy: 0.4250
Epoch 11/50
90/90 [==============================] - 577s 6s/step - loss: 0.8224 - accuracy: 0.7403 - val_loss: 1.6645 - val_accuracy: 0.4778
Epoch 12/50
90/90 [==============================] - 575s 6s/step - loss: 0.7395 - accuracy: 0.7847 - val_loss: 1.5388 - val_accuracy: 0.5139
Epoch 13/50
90/90 [==============================] - 579s 6s/step - loss: 0.6929 - accuracy: 0.7861 - val_loss: 1.5925 - val_accuracy: 0.5389
Epoch 14/50
90/90 [==============================] - 577s 6s/step - loss: 0.6850 - accuracy: 0.7889 - val_loss: 2.0463 - val_accuracy: 0.4278
Epoch 15/50
90/90 [==============================] - 577s 6s/step - loss: 0.5949 - accuracy: 0.8118 - val_loss: 1.3914 - val_accuracy: 0.5833
Epoch 16/50
90/90 [==============================] - 576s 6s/step - loss: 0.5520 - accuracy: 0.8313 - val_loss: 2.3603 - val_accuracy: 0.4028
Epoch 17/50
90/90 [==============================] - 575s 6s/step - loss: 0.5604 - accuracy: 0.8278 - val_loss: 1.6669 - val_accuracy: 0.5056
Epoch 18/50
90/90 [==============================] - 579s 6s/step - loss: 0.5160 - accuracy: 0.8354 - val_loss: 1.9264 - val_accuracy: 0.4833
Epoch 19/50
90/90 [==============================] - 576s 6s/step - loss: 0.4712 - accuracy: 0.8618 - val_loss: 1.8838 - val_accuracy: 0.4917
Epoch 20/50
90/90 [==============================] - 615s 7s/step - loss: 0.4372 - accuracy: 0.8687 - val_loss: 1.5653 - val_accuracy: 0.5528
Epoch 21/50
90/90 [==============================] - 612s 7s/step - loss: 0.4237 - accuracy: 0.8778 - val_loss: 1.7245 - val_accuracy: 0.5056
Epoch 22/50
90/90 [==============================] - 625s 7s/step - loss: 0.4192 - accuracy: 0.8660 - val_loss: 1.5073 - val_accuracy: 0.5667
Epoch 23/50
90/90 [==============================] - 610s 7s/step - loss: 0.3823 - accuracy: 0.8903 - val_loss: 1.7911 - val_accuracy: 0.5194
Epoch 24/50
90/90 [==============================] - 599s 7s/step - loss: 0.3811 - accuracy: 0.8757 - val_loss: 1.8001 - val_accuracy: 0.5111
Epoch 25/50
90/90 [==============================] - 603s 7s/step - loss: 0.3615 - accuracy: 0.8986 - val_loss: 1.7550 - val_accuracy: 0.5389
Epoch 26/50
90/90 [==============================] - 611s 7s/step - loss: 0.3507 - accuracy: 0.8965 - val_loss: 2.0230 - val_accuracy: 0.5028
Epoch 27/50
90/90 [==============================] - 605s 7s/step - loss: 0.3462 - accuracy: 0.8951 - val_loss: 1.7636 - val_accuracy: 0.5444
Epoch 28/50
90/90 [==============================] - 594s 7s/step - loss: 0.3291 - accuracy: 0.9083 - val_loss: 2.1416 - val_accuracy: 0.5167
Epoch 29/50
90/90 [==============================] - 593s 7s/step - loss: 0.2895 - accuracy: 0.9194 - val_loss: 2.1473 - val_accuracy: 0.5056
Epoch 30/50
90/90 [==============================] - 594s 7s/step - loss: 0.3023 - accuracy: 0.9028 - val_loss: 2.7570 - val_accuracy: 0.4611
Epoch 31/50
90/90 [==============================] - 579s 6s/step - loss: 0.3208 - accuracy: 0.9021 - val_loss: 2.1075 - val_accuracy: 0.4972
Epoch 32/50
90/90 [==============================] - 588s 7s/step - loss: 0.2587 - accuracy: 0.9271 - val_loss: 1.7806 - val_accuracy: 0.5250
Epoch 33/50
90/90 [==============================] - 607s 7s/step - loss: 0.2794 - accuracy: 0.9111 - val_loss: 1.7796 - val_accuracy: 0.5889
Epoch 34/50
90/90 [==============================] - 604s 7s/step - loss: 0.2635 - accuracy: 0.9194 - val_loss: 2.0160 - val_accuracy: 0.5444
Epoch 35/50
90/90 [==============================] - 604s 7s/step - loss: 0.2761 - accuracy: 0.9083 - val_loss: 2.3916 - val_accuracy: 0.4750
Epoch 36/50
90/90 [==============================] - 613s 7s/step - loss: 0.2486 - accuracy: 0.9222 - val_loss: 3.0425 - val_accuracy: 0.4250
Epoch 37/50
90/90 [==============================] - 625s 7s/step - loss: 0.2234 - accuracy: 0.9278 - val_loss: 2.2691 - val_accuracy: 0.5083
Epoch 38/50
90/90 [==============================] - 681s 8s/step - loss: 0.2369 - accuracy: 0.9264 - val_loss: 2.1688 - val_accuracy: 0.5306
Epoch 39/50
90/90 [==============================] - 678s 8s/step - loss: 0.2253 - accuracy: 0.9292 - val_loss: 3.9337 - val_accuracy: 0.4194
Epoch 40/50
90/90 [==============================] - 56849s 639s/step - loss: 0.2516 - accuracy: 0.9153 - val_loss: 3.0538 - val_accuracy: 0.4611
Epoch 41/50
90/90 [==============================] - 598s 7s/step - loss: 0.2461 - accuracy: 0.9139 - val_loss: 2.2922 - val_accuracy: 0.5250
Epoch 42/50
90/90 [==============================] - 615s 7s/step - loss: 0.2183 - accuracy: 0.9340 - val_loss: 2.7955 - val_accuracy: 0.4778
Epoch 43/50
90/90 [==============================] - 598s 7s/step - loss: 0.2336 - accuracy: 0.9306 - val_loss: 2.4670 - val_accuracy: 0.5222
Epoch 44/50
90/90 [==============================] - 605s 7s/step - loss: 0.2315 - accuracy: 0.9319 - val_loss: 2.1869 - val_accuracy: 0.5472
Epoch 45/50
90/90 [==============================] - 596s 7s/step - loss: 0.2037 - accuracy: 0.9340 - val_loss: 2.6641 - val_accuracy: 0.5111
Epoch 46/50
90/90 [==============================] - 588s 7s/step - loss: 0.2270 - accuracy: 0.9271 - val_loss: 2.7298 - val_accuracy: 0.5028
Epoch 47/50
90/90 [==============================] - 636s 7s/step - loss: 0.2165 - accuracy: 0.9326 - val_loss: 1.9565 - val_accuracy: 0.5722
Epoch 48/50
90/90 [==============================] - 604s 7s/step - loss: 0.2006 - accuracy: 0.9354 - val_loss: 2.2244 - val_accuracy: 0.5583
Epoch 49/50
90/90 [==============================] - 579s 6s/step - loss: 0.1724 - accuracy: 0.9563 - val_loss: 1.8513 - val_accuracy: 0.5722
Epoch 50/50
90/90 [==============================] - 574s 6s/step - loss: 0.1927 - accuracy: 0.9417 - val_loss: 2.1781 - val_accuracy: 0.5500
Epoch 1/50
90/90 [==============================] - 570s 6s/step - loss: 3.1400 - accuracy: 0.0924 - val_loss: 2.8241 - val_accuracy: 0.0833
Epoch 2/50
90/90 [==============================] - 577s 6s/step - loss: 2.5398 - accuracy: 0.2167 - val_loss: 2.5999 - val_accuracy: 0.1417
Epoch 3/50
90/90 [==============================] - 578s 6s/step - loss: 2.2445 - accuracy: 0.2875 - val_loss: 2.3869 - val_accuracy: 0.2694
Epoch 4/50
90/90 [==============================] - 580s 6s/step - loss: 2.0507 - accuracy: 0.3382 - val_loss: 2.1916 - val_accuracy: 0.3417
Epoch 5/50
90/90 [==============================] - 592s 7s/step - loss: 1.9049 - accuracy: 0.3722 - val_loss: 2.0505 - val_accuracy: 0.3333
Epoch 6/50
90/90 [==============================] - 603s 7s/step - loss: 1.7958 - accuracy: 0.4097 - val_loss: 1.8496 - val_accuracy: 0.4056
Epoch 7/50
90/90 [==============================] - 574s 6s/step - loss: 1.7238 - accuracy: 0.4465 - val_loss: 1.8490 - val_accuracy: 0.3972
Epoch 8/50
90/90 [==============================] - 573s 6s/step - loss: 1.6183 - accuracy: 0.4819 - val_loss: 1.7628 - val_accuracy: 0.4500
Epoch 9/50
90/90 [==============================] - 570s 6s/step - loss: 1.5404 - accuracy: 0.4896 - val_loss: 1.7146 - val_accuracy: 0.4389
Epoch 10/50
90/90 [==============================] - 589s 7s/step - loss: 1.4720 - accuracy: 0.5222 - val_loss: 1.7327 - val_accuracy: 0.4361
Epoch 11/50
90/90 [==============================] - 597s 7s/step - loss: 1.4247 - accuracy: 0.5389 - val_loss: 1.6533 - val_accuracy: 0.4667
Epoch 12/50
90/90 [==============================] - 574s 6s/step - loss: 1.3573 - accuracy: 0.5722 - val_loss: 1.5535 - val_accuracy: 0.5056
Epoch 13/50
90/90 [==============================] - 573s 6s/step - loss: 1.3106 - accuracy: 0.5667 - val_loss: 1.5113 - val_accuracy: 0.5167
Epoch 14/50
90/90 [==============================] - 577s 6s/step - loss: 1.2598 - accuracy: 0.5903 - val_loss: 1.5496 - val_accuracy: 0.4972
Epoch 15/50
90/90 [==============================] - 575s 6s/step - loss: 1.2133 - accuracy: 0.6097 - val_loss: 1.4764 - val_accuracy: 0.5167
Epoch 16/50
90/90 [==============================] - 573s 6s/step - loss: 1.2073 - accuracy: 0.6028 - val_loss: 1.5214 - val_accuracy: 0.5111
Epoch 17/50
90/90 [==============================] - 574s 6s/step - loss: 1.1413 - accuracy: 0.6444 - val_loss: 1.4551 - val_accuracy: 0.5389
Epoch 18/50
90/90 [==============================] - 577s 6s/step - loss: 1.0776 - accuracy: 0.6451 - val_loss: 1.4343 - val_accuracy: 0.5500
Epoch 19/50
90/90 [==============================] - 597s 7s/step - loss: 1.0901 - accuracy: 0.6521 - val_loss: 1.4370 - val_accuracy: 0.5528
Epoch 20/50
90/90 [==============================] - 578s 6s/step - loss: 1.0289 - accuracy: 0.6674 - val_loss: 1.4498 - val_accuracy: 0.5333
Epoch 21/50
90/90 [==============================] - 574s 6s/step - loss: 1.0286 - accuracy: 0.6764 - val_loss: 1.4535 - val_accuracy: 0.5444
Epoch 22/50
90/90 [==============================] - 584s 6s/step - loss: 1.0231 - accuracy: 0.6743 - val_loss: 1.5206 - val_accuracy: 0.5000
Epoch 23/50
90/90 [==============================] - 619s 7s/step - loss: 0.9692 - accuracy: 0.6924 - val_loss: 1.6515 - val_accuracy: 0.4889
Epoch 24/50
90/90 [==============================] - 621s 7s/step - loss: 1.0004 - accuracy: 0.6764 - val_loss: 1.8658 - val_accuracy: 0.4528
Epoch 25/50
90/90 [==============================] - 587s 7s/step - loss: 0.9440 - accuracy: 0.6972 - val_loss: 1.5016 - val_accuracy: 0.5250
Epoch 26/50
90/90 [==============================] - 597s 7s/step - loss: 0.9007 - accuracy: 0.7063 - val_loss: 1.5148 - val_accuracy: 0.5556
Epoch 27/50
90/90 [==============================] - 590s 7s/step - loss: 0.9023 - accuracy: 0.7007 - val_loss: 1.6740 - val_accuracy: 0.4917
Epoch 28/50
90/90 [==============================] - 578s 6s/step - loss: 0.8649 - accuracy: 0.7319 - val_loss: 1.6031 - val_accuracy: 0.5000
Epoch 29/50
90/90 [==============================] - 596s 7s/step - loss: 0.8253 - accuracy: 0.7444 - val_loss: 1.6145 - val_accuracy: 0.5000
Epoch 30/50
90/90 [==============================] - 601s 7s/step - loss: 0.8269 - accuracy: 0.7431 - val_loss: 1.6707 - val_accuracy: 0.4917
Epoch 31/50
90/90 [==============================] - 587s 7s/step - loss: 0.8244 - accuracy: 0.7458 - val_loss: 1.4919 - val_accuracy: 0.5167
Epoch 32/50
90/90 [==============================] - 610s 7s/step - loss: 0.7788 - accuracy: 0.7583 - val_loss: 1.6788 - val_accuracy: 0.5083
Epoch 33/50
90/90 [==============================] - 601s 7s/step - loss: 0.7897 - accuracy: 0.7514 - val_loss: 1.5903 - val_accuracy: 0.5361
Epoch 34/50
90/90 [==============================] - 609s 7s/step - loss: 0.7520 - accuracy: 0.7542 - val_loss: 1.3669 - val_accuracy: 0.5500
Epoch 35/50
90/90 [==============================] - 591s 7s/step - loss: 0.7194 - accuracy: 0.7778 - val_loss: 1.4753 - val_accuracy: 0.5278
Epoch 36/50
90/90 [==============================] - 589s 7s/step - loss: 0.7018 - accuracy: 0.7812 - val_loss: 1.4999 - val_accuracy: 0.5389
Epoch 37/50
90/90 [==============================] - 604s 7s/step - loss: 0.6837 - accuracy: 0.7792 - val_loss: 1.3384 - val_accuracy: 0.5806
Epoch 38/50
90/90 [==============================] - 596s 7s/step - loss: 0.6799 - accuracy: 0.7979 - val_loss: 1.4455 - val_accuracy: 0.5750
Epoch 39/50
90/90 [==============================] - 626s 7s/step - loss: 0.6773 - accuracy: 0.7868 - val_loss: 1.5575 - val_accuracy: 0.5500
Epoch 40/50
90/90 [==============================] - 602s 7s/step - loss: 0.6369 - accuracy: 0.7993 - val_loss: 1.5304 - val_accuracy: 0.5250
Epoch 41/50
90/90 [==============================] - 577s 6s/step - loss: 0.6305 - accuracy: 0.8132 - val_loss: 1.4477 - val_accuracy: 0.5611
Epoch 42/50
90/90 [==============================] - 565s 6s/step - loss: 0.6462 - accuracy: 0.7972 - val_loss: 1.3822 - val_accuracy: 0.5694
Epoch 43/50
90/90 [==============================] - 566s 6s/step - loss: 0.6043 - accuracy: 0.8069 - val_loss: 1.5929 - val_accuracy: 0.5306
Epoch 44/50
90/90 [==============================] - 562s 6s/step - loss: 0.6051 - accuracy: 0.8139 - val_loss: 1.6310 - val_accuracy: 0.5250
Epoch 45/50
90/90 [==============================] - 554s 6s/step - loss: 0.6047 - accuracy: 0.8076 - val_loss: 1.4956 - val_accuracy: 0.5417
Epoch 46/50
90/90 [==============================] - 557s 6s/step - loss: 0.6049 - accuracy: 0.8208 - val_loss: 1.5802 - val_accuracy: 0.5333
Epoch 47/50
90/90 [==============================] - 557s 6s/step - loss: 0.5727 - accuracy: 0.8236 - val_loss: 1.5956 - val_accuracy: 0.5250
Epoch 48/50
90/90 [==============================] - 566s 6s/step - loss: 0.5756 - accuracy: 0.8285 - val_loss: 1.5809 - val_accuracy: 0.5222
Epoch 49/50
90/90 [==============================] - 573s 6s/step - loss: 0.5251 - accuracy: 0.8438 - val_loss: 1.5287 - val_accuracy: 0.5528
Epoch 50/50
90/90 [==============================] - 564s 6s/step - loss: 0.5463 - accuracy: 0.8299 - val_loss: 1.5915 - val_accuracy: 0.5389

III. Model Evaluation

  1. Use history.history['...'] to access the metrics recorded during training, namely the losses and accuracies:
  • loss: training-set loss
  • accuracy: training-set accuracy
  • val_loss: validation-set loss
  • val_accuracy: validation-set accuracy
  2. Use model.evaluate() to evaluate the model:
  • Inputs: x, y, batch_size (an integer, the batch size to use), and verbose.
  • verbose: an integer controlling output during evaluation: 0 = silent, 1 = progress bar, 2 = a single line.
  • Returns loss and accuracy.

This step is the same as in T3: plot how loss and accuracy evolve over the training epochs.

from matplotlib.ticker import MultipleLocator
plt.rcParams['savefig.dpi'] = 300  # DPI for saved figures
plt.rcParams['figure.dpi']  = 300  # on-screen figure resolution

acc1     = history_model1.history['accuracy']
acc2     = history_model2.history['accuracy']
val_acc1 = history_model1.history['val_accuracy']
val_acc2 = history_model2.history['val_accuracy']

loss1     = history_model1.history['loss']
loss2     = history_model2.history['loss']
val_loss1 = history_model1.history['val_loss']
val_loss2 = history_model2.history['val_loss']

epochs_range = range(len(acc1))

plt.figure(figsize=(16, 4))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, acc1, label='Training Accuracy-Adam')
plt.plot(epochs_range, acc2, label='Training Accuracy-SGD')
plt.plot(epochs_range, val_acc1, label='Validation Accuracy-Adam')
plt.plot(epochs_range, val_acc2, label='Validation Accuracy-SGD')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
# set the x-axis major tick interval to 1
ax = plt.gca()
ax.xaxis.set_major_locator(MultipleLocator(1))

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss1, label='Training Loss-Adam')
plt.plot(epochs_range, loss2, label='Training Loss-SGD')
plt.plot(epochs_range, val_loss1, label='Validation Loss-Adam')
plt.plot(epochs_range, val_loss2, label='Validation Loss-SGD')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
   
# set the x-axis major tick interval to 1
ax = plt.gca()
ax.xaxis.set_major_locator(MultipleLocator(1))

plt.show()

[Figure: training/validation accuracy (left) and loss (right) curves for Adam vs. SGD]

The plots show that Adam fits the training set better: higher accuracy and lower loss. On the validation set, however, while Adam's accuracy is somewhat higher, its loss is larger and its curves fluctuate far more violently than SGD's.

def test_accuracy_report(model):
    score = model.evaluate(val_ds, verbose=0)
    print('Loss function: %s, accuracy:' % score[0], score[1])
    
test_accuracy_report(model1)
Loss function: 2.178075075149536, accuracy: 0.550000011920929
test_accuracy_report(model2)
Loss function: 1.5914756059646606, accuracy: 0.5388888716697693

IV. Summary

An optimizer is an algorithm that dynamically adjusts the magnitude and direction of parameter updates so that the model converges better or faster.
Reference: 《30min读懂优化器》

1. Gradient descent

[Figure: gradient descent illustration]

  1. Idea: keep searching for the steepest, fastest path down the hill.
  2. Drawbacks: training is slow (every iteration feeds in all samples), and it easily gets stuck in local optima.
  3. Improved variants, which change how many samples feed each update (a code sketch follows this list):
    • Batch gradient descent (BGD): steps along the average gradient over all samples; slow, but may find the global optimum.
    • Stochastic gradient descent (SGD): steps along the gradient of a single sample; very fast and, despite large fluctuations, converges well, though possibly to a local optimum.
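A minimal NumPy sketch of the SGD update on a toy one-parameter regression (the data, learning rate, and step count are illustrative assumptions, not values from this experiment):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)   # toy data, true slope = 3

w, lr = 0.0, 0.05
for step in range(200):
    # BGD would average the gradient over all 100 samples each step;
    # SGD draws a single sample, so each step is cheap but noisy.
    i = rng.integers(len(x))
    grad = 2 * (w * x[i] - y[i]) * x[i]   # d/dw of the squared error (w*x_i - y_i)^2
    w -= lr * grad                        # w <- w - lr * grad
print(w)   # converges to roughly 3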

2. Momentum optimization

  1. Momentum: p = mv (the larger the velocity, the larger the momentum).
  2. Idea: plain gradient descent tends to oscillate back and forth around local optima; with momentum, update direction = accumulation of previous update directions + a fine adjustment from the current gradient (sketched below).
  3. Advantages: accelerates learning when successive gradients agree, suppresses oscillation when they disagree, and can escape local optima.
  4. Drawbacks: adds one more hyperparameter and extra computation.
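On the same toy problem, a sketch of the momentum update (mu is the extra hyperparameter; all values are illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

w, v = 0.0, 0.0
lr, mu = 0.01, 0.9
for step in range(200):
    i = rng.integers(len(x))
    grad = 2 * (w * x[i] - y[i]) * x[i]
    v = mu * v - lr * grad   # accumulated previous direction + current-gradient adjustment
    w += v                   # consistent gradients accelerate; sign flips damp oscillation
print(w)   # converges to roughly 3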

3. Adaptive learning-rate algorithms

  1. Design idea: in traditional algorithms the learning rate is a constant, or is adjusted only by the iteration count. Adapting it to how training is actually going can improve both training speed and accuracy.
  2. Approach: give every trainable parameter its own learning rate. If the partial derivative of the loss with respect to a parameter keeps the same sign, that parameter's learning rate should increase; if the sign flips, it should decrease.
  3. Algorithms (a configuration sketch follows this list):
    • AdaGrad: adapts the learning rate of every parameter independently, scaling each one inversely to the square root of the accumulated sum of all its historical squared gradients. Drawback: in the middle and late stages of training the updates shrink toward zero, so the method is rarely used.
    • Adadelta: does not accumulate the entire gradient history, only the gradients in a recent time window (which fixes AdaGrad's sharply decaying learning rate). Drawback: late in training it jitters around local minima.
    • RMSprop: replaces the accumulated sum of squared gradients with an exponentially weighted moving average, fading out the influence of the distant past. Well suited to non-stationary objectives and works well for RNNs.
    • Adam: combines the methods above. Advantages: smooth parameter updates; merges AdaGrad's strength on sparse gradients with RMSprop's strength on non-stationary objectives; saves cost and time. Drawback: because of the moving averages of the momentum terms, it can oscillate sharply as the training data shifts.
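Tying this back to the experiment: every optimizer above is available in tf.keras.optimizers, and its hyperparameters can be passed straight into the create_model() helper from Part II. A sketch of further configurations one might compare (the learning rates and momentum/decay values below are illustrative choices, not tuned results):

opt_sgd_momentum = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)
opt_rmsprop      = tf.keras.optimizers.RMSprop(learning_rate=1e-3, rho=0.9)
opt_adam         = tf.keras.optimizers.Adam(learning_rate=1e-3,
                                            beta_1=0.9,     # momentum-style moving average
                                            beta_2=0.999)   # RMSprop-style squared-gradient average

model3 = create_model(optimizer=opt_sgd_momentum)
# model3.fit(train_ds, epochs=NO_EPOCHS, verbose=1, validation_data=val_ds)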