TensorFlow 2.0: Multi-Class Classification with Softmax Activation

The softmax activation function

  • Softmax regression is the generalization of logistic regression to multi-class problems, where the class label y can take more than two values. In machine learning, and especially in deep learning, softmax is a very common and important function, used widely in multi-class settings. It maps its inputs to real numbers between 0 and 1 and normalizes them so they sum to 1, so the class probabilities also sum to exactly 1; see the short sketch below.
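As a concrete illustration, here is a minimal NumPy sketch of the softmax computation (the logits values are made up for the example):

import numpy as np

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for 3 classes
print(softmax(logits))               # ~[0.659, 0.242, 0.099]
print(softmax(logits).sum())         # 1.0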

  • Keras is a high-level neural network API written in Python that can run on top of TensorFlow, CNTK, or Theano as a backend.
    Keras makes the concept of a layer explicit, while the parameters between layers are handled for the user, so for ordinary developers Keras is easier to get started with than raw TensorFlow when building neural networks.
    Keras is also the API that TensorFlow has officially and strongly recommended since TensorFlow 2.0.

  • Below we use tf.keras to implement a multilayer perceptron for multi-class classification, then train it on the fashion_mnist dataset and use it for prediction.

  • The fashion_mnist dataset is similar to the handwritten-digit dataset: 60k training images and 10k test images across ten classes (clothes, shoes, T-shirts, and so on). Each image is 28×28 pixels, with each pixel a grayscale value from 0 to 255; the label-to-class mapping is listed in the sketch below.
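For reference, the ten Fashion-MNIST class names in label order 0-9, as documented for the dataset (the class_names list itself is a convenience we define here):

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(class_names[9])                # 'Ankle boot' -- the label of the first training image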

  • First, import the packages we will need.

import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline
  • Split the dataset: 60k images and their labels for training, the other 10k images and labels for testing.
## Load and split the fashion_mnist dataset
(train_image,train_label),(test_image,test_label) = tf.keras.datasets.fashion_mnist.load_data()
print(train_image.shape,train_label.shape)
print(test_image.shape,test_label.shape)

>>
(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)
  • Take a look at the data: the first image in the training set is an ankle boot, with label 9.
plt.imshow(train_image[0])
train_label[0]
>> 9

[Figure: the first training image, an ankle boot, displayed with plt.imshow]

train_image[0]              # grayscale values in 0-255
>> array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
         ...
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   1,   1,   1,   0,
        200, 232, 232, 233, 229, 223, 223, 215, 213, 164, 127, 123, 196,
        229,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
        183, 225, 216, 223, 228, 235, 227, 224, 222, 224, 221, 223, 245,
        173,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
        193, 228, 218, 213, 198, 180, 212, 210, 211, 213, 223, 220, 243,
        202,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   1,   3,   0,  12,
        219, 220, 212, 218, 192, 169, 227, 208, 218, 224, 212, 226, 197,
        209,  52],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   6,   0,  99,
        244, 222, 220, 218, 203, 198, 221, 215, 213, 222, 220, 245, 119,
        167,  56],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   4,   0,   0,  55,
        236, 228, 230, 228, 240, 232, 213, 218, 223, 234, 217, 217, 209,
         92,   0],
       [  0,   0,   1,   4,   6,   7,   2,   0,   0,   0,   0,   0, 237,
        226, 217, 223, 222, 219, 222, 221, 216, 223, 229, 215, 218, 255,
         77,   0],
       [  0,   3,   0,   0,   0,   0,   0,   0,   0,  62, 145, 204, 228,
        207, 213, 221, 218, 208, 211, 218, 224, 223, 219, 215, 224, 244,
        159,   0],
       [  0,   0,   0,   0,  18,  44,  82, 107, 189, 228, 220, 222, 217,
        226, 200, 205, 211, 230, 224, 234, 176, 188, 250, 248, 233, 238,
        215,   0],
       [  0,  57, 187, 208, 224, 221, 224, 208, 204, 214, 208, 209, 200,
        159, 245, 193, 206, 223, 255, 255, 221, 234, 221, 211, 220, 232,
        246,   0],
       [  3, 202, 228, 224, 221, 211, 211, 214, 205, 205, 205, 220, 240,
         80, 150, 255, 229, 221, 188, 154, 191, 210, 204, 209, 222, 228,
        225,   0],
       ...
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0]], dtype=uint8)
  • Since pixel values range from 0 to 255, we normalize each image simply by dividing by 255, then build the model on the normalized data. Because the model must output ten classes, the last layer uses softmax activation.
# Normalize the data
train_image = train_image/255
test_image = test_image/255


# Build the model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape = (28,28)))                # flatten the 2-D image: (28,28) → (784,)
model.add(tf.keras.layers.Dense(128, activation = "relu"))
# model.add(tf.keras.layers.Dense(64, activation = "relu"))
model.add(tf.keras.layers.Dense(10, activation = "softmax")) 
model.summary()

>> 
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 128)               100480    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
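
As a sanity check on the summary, each Dense layer's parameter count is simply inputs × units plus the biases:

print(784 * 128 + 128)   # 100480 params in the first Dense layer
print(128 * 10 + 10)     # 1290 params in the output layer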
  • Next, compile the model and train it.
# When labels are integer-encoded, use sparse_categorical_crossentropy for the loss; with one-hot labels, use categorical_crossentropy
model.compile(
              optimizer="adam",
              loss = 'sparse_categorical_crossentropy',  
              metrics = ['acc'])
              
# Train the model
history = model.fit(train_image,train_label,epochs=10)

>> 
Epoch 1/10
60000/60000 [==============================] - 3s 49us/sample - loss: 0.4967 - acc: 0.8255
Epoch 2/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.3736 - acc: 0.8658
Epoch 3/10
60000/60000 [==============================] - 3s 43us/sample - loss: 0.3374 - acc: 0.8773
Epoch 4/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.3118 - acc: 0.8858
Epoch 5/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.2958 - acc: 0.8913
Epoch 6/10
60000/60000 [==============================] - 2s 39us/sample - loss: 0.2806 - acc: 0.8963
Epoch 7/10
60000/60000 [==============================] - 2s 36us/sample - loss: 0.2691 - acc: 0.9012
Epoch 8/10
60000/60000 [==============================] - 2s 39us/sample - loss: 0.2567 - acc: 0.9047
Epoch 9/10
60000/60000 [==============================] - 2s 33us/sample - loss: 0.2440 - acc: 0.9092
Epoch 10/10
60000/60000 [==============================] - 2s 32us/sample - loss: 0.2370 - acc: 0.9116
  • Plot the training curves to see how training went.
history.history.keys()            # the training loss and accuracy are stored as a dict
y_loss = history.history.get('loss')
y_acc = history.history.get('acc')

# plt.figure(figsize= (20,8), dpi = 80)
plt.plot(history.epoch, y_acc)
plt.show()

plt.plot(history.epoch, y_loss)
plt.show()

[Figures: training accuracy per epoch and training loss per epoch]

  • Next, evaluate the model's performance with the evaluate method.
# Evaluate the model
print(model.evaluate(test_image,test_label))
model.evaluate(train_image,train_label)

>> 
10000/10000 [==============================] - 0s 36us/sample - loss: 0.3444 - acc: 0.8922
[0.3443740872859955, 0.8922]
60000/60000 [==============================] - 2s 31us/sample - loss: 0.1544 - acc: 0.9440
[0.15438867097496986, 0.9440333]

Next we retrain the model using one-hot encoded labels. Note the gap above between training accuracy (0.944) and test accuracy (0.892): the model is overfitting somewhat, which is why the next model adds Dropout layers.

  • With one-hot labels, training uses the categorical_crossentropy loss.
    Call tf.keras.utils.to_categorical(train_label) to one-hot encode the labels.
# One-hot encoding and model improvements
train_label_onehot = tf.keras.utils.to_categorical(train_label)
test_label_onehot = tf.keras.utils.to_categorical(test_label)
print(train_label)
print(train_label_onehot[-1],'\n')
train_label_onehot

>> 
[9 0 0 ... 3 0 5]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] 

array([[0., 0., 0., ..., 0., 0., 1.],
       [1., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
  • Build and train a new model, this time with Dropout layers to counter the overfitting observed above.
# Build the new model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape = (28,28)))                # flatten the 2-D image: (28,28) → (784,)
model.add(tf.keras.layers.Dense(128, activation = "relu"))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation = "relu"))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation = "relu"))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10, activation = "softmax")) 

# When labels are integer-encoded, use sparse_categorical_crossentropy; with one-hot labels, use categorical_crossentropy
model.compile(
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss = 'categorical_crossentropy',  
              metrics = ['acc'])
              
model.summary()

>> 
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_2 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 128)               100480    
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 64)                8256      
_________________________________________________________________
dropout_4 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_9 (Dense)              (None, 64)                4160      
_________________________________________________________________
dropout_5 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_10 (Dense)             (None, 10)                650       
=================================================================
Total params: 113,546
Trainable params: 113,546
Non-trainable params: 0
_________________________________________________________________
  • Train the compiled model.
# Train the model
history = model.fit(train_image,  train_label_onehot, 
                    batch_size= 32, epochs=10,
                   validation_data=(test_image,test_label_onehot)) 

>>
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 4s 66us/sample - loss: 1.0157 - acc: 0.6175 - val_loss: 0.5866 - val_acc: 0.7851
Epoch 2/10
60000/60000 [==============================] - 4s 61us/sample - loss: 0.6970 - acc: 0.7470 - val_loss: 0.5170 - val_acc: 0.7965
Epoch 3/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.6337 - acc: 0.7724 - val_loss: 0.4967 - val_acc: 0.8118
Epoch 4/10
60000/60000 [==============================] - 4s 63us/sample - loss: 0.6070 - acc: 0.7844 - val_loss: 0.4941 - val_acc: 0.8302
Epoch 5/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.5929 - acc: 0.7926 - val_loss: 0.4668 - val_acc: 0.8285
Epoch 6/10
60000/60000 [==============================] - 4s 63us/sample - loss: 0.5782 - acc: 0.7977 - val_loss: 0.4591 - val_acc: 0.8297
Epoch 7/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.5650 - acc: 0.8035 - val_loss: 0.4814 - val_acc: 0.8290
Epoch 8/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.5580 - acc: 0.8100 - val_loss: 0.4490 - val_acc: 0.8410
Epoch 9/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.5463 - acc: 0.8142 - val_loss: 0.4525 - val_acc: 0.8334
Epoch 10/10
60000/60000 [==============================] - 4s 63us/sample - loss: 0.5355 - acc: 0.8159 - val_loss: 0.4362 - val_acc: 0.8410
  • Use the predict method to inspect a prediction; the predicted value matches the true value, so this sample is classified correctly.
predict = model.predict(test_image)
print(np.argmax(predict[0]))              # the largest probability is at index 9, so the prediction is 9
print(test_label[0])                      # matches the true label: prediction correct
>> 9
   9
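
Beyond a single sample, here is a short sketch of computing accuracy over the whole test set from the predicted probabilities (this simply reproduces the acc that evaluate reports):

pred_labels = np.argmax(predict, axis=1)       # most probable class per image
test_acc = np.mean(pred_labels == test_label)  # fraction of correct predictions
print(test_acc)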
  • Likewise, plot the accuracy curves to inspect how training went.
# Plot acc and val_acc
plt.plot(history.epoch, history.history.get('acc'), label = "acc")
plt.plot(history.epoch, history.history.get('val_acc'), label = "val_acc")
plt.legend()                

>> 
<matplotlib.legend.Legend at 0x1a52b9a5148>

[Figure: acc and val_acc per epoch]
Both acc and val_acc are still rising, which suggests the model is undertrained and has room to improve; we could train for more epochs or deepen the network. A sketch of one option follows.
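
As one way to act on that, here is a minimal sketch (not from the original post) that raises the epoch cap while letting an EarlyStopping callback halt training once val_loss stops improving; the patience value and epoch cap are hypothetical choices:

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',          # stop when validation loss stops improving
    patience=3,                  # tolerate 3 epochs without improvement
    restore_best_weights=True)   # roll back to the best epoch's weights

history = model.fit(train_image, train_label_onehot,
                    batch_size=32, epochs=50,
                    validation_data=(test_image, test_label_onehot),
                    callbacks=[early_stop])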
