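0. Environment
Ubuntu 18.04, 64-bit, i3-6100, 8 GB RAM
Python 3.6 + TensorFlow + Keras
To inspect variable values on Ubuntu, I installed IDLE. After installing it I found that it only supports Python 3, so I reinstalled all the required packages with pip3.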
1. Code
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
import argparse
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-o", "--output", required=True,
    help="path to the output loss/accuracy plot")
args = vars(ap.parse_args())
# load the training and testing data, scale it into the range [0, 1],
# then reshape the design matrix
print("[INFO] loading CIFAR-10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0
trainX = trainX.reshape((trainX.shape[0], 3072))
testX = testX.reshape((testX.shape[0], 3072))
# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
# initialize the label names for the CIFAR-10 dataset
labelNames = ["airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck"]
# define the 3072-1024-512-10 architecture using Keras
model = Sequential()
model.add(Dense(1024, input_shape=(3072,), activation="relu"))
model.add(Dense(512, activation="relu"))
model.add(Dense(10, activation="softmax"))
# train the model using SGD
print("[INFO] training network...")
sgd = SGD(0.01)
model.compile(loss="categorical_crossentropy", optimizer=sgd,
    metrics=["accuracy"])
H = model.fit(trainX, trainY, validation_data=(testX, testY),
    epochs=100, batch_size=32)
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
    predictions.argmax(axis=1), target_names=labelNames))
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 100), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 100), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, 100), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, 100), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["output"])
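The listing flattens each 32x32x3 image into a 3072-element vector and feeds it to a 3072-1024-512-10 network. As a rough sanity check of those numbers, here is a minimal sketch that runs the same scaling and reshape on random stand-in data (not the real dataset) and counts the trainable parameters of the dense layers by hand. The names fake_images and dense_param_count are my own and do not come from the book.

```python
import numpy as np

# Hypothetical stand-in for a small CIFAR-10 batch: 4 RGB images of 32x32 pixels.
fake_images = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Same preprocessing as the listing: scale to [0, 1], then flatten each image.
flat = fake_images.astype("float") / 255.0
flat = flat.reshape((flat.shape[0], 32 * 32 * 3))


def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


total_params = dense_param_count([3072, 1024, 512, 10])

print(flat.shape)     # (4, 3072)
print(total_params)   # 3676682
```

Almost all of the roughly 3.7 million parameters sit in the first layer (3072 x 1024 weights), which is part of why a fully connected network on raw pixels is slow to train.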
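2. Dataset
The dataset contains 60K samples in total (50,000 for training and 10,000 for testing). While the code runs on Ubuntu, the dataset is downloaded automatically to ~/.keras/datasets/. Once the download finishes, you can check the size of the archive with ls -hl; it is over 100 MB.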
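If you would rather check the download from Python than from the shell, something like the sketch below should work. It only assumes the default cache directory mentioned above; the helper name list_cached_files is made up for this example, and the function simply returns an empty list when the directory does not exist yet.

```python
import os


def list_cached_files(cache_dir):
    """Return (file name, size in MB) for each regular file in cache_dir."""
    if not os.path.isdir(cache_dir):
        return []
    entries = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path) / (1024 * 1024)))
    return entries


# Default Keras cache location, as described in the text above.
cache_dir = os.path.expanduser("~/.keras/datasets")
for name, size_mb in list_cached_files(cache_dir):
    print("{}: {:.1f} MB".format(name, size_mb))
```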
3. Results
[INFO] evaluating network...
             precision    recall  f1-score   support

   airplane       0.65      0.63      0.64      1000
 automobile       0.69      0.65      0.67      1000
       bird       0.45      0.47      0.46      1000
        cat       0.38      0.39      0.39      1000
       deer       0.54      0.46      0.49      1000
        dog       0.44      0.55      0.49      1000
       frog       0.66      0.60      0.63      1000
      horse       0.67      0.61      0.64      1000
       ship       0.67      0.71      0.69      1000
      truck       0.60      0.63      0.61      1000

avg / total       0.58      0.57      0.57     10000
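The "avg / total" row is a support-weighted average of the per-class rows. Because every class has exactly 1000 test images, the weighted average of recall is also the overall test accuracy. The sketch below recomputes it from the rounded per-class recall values in the report, so the result is only approximate (about 0.57).

```python
# Per-class recall and support, copied from the report above (rounded values).
recall = [0.63, 0.65, 0.47, 0.39, 0.46, 0.55, 0.60, 0.61, 0.71, 0.63]
support = [1000] * 10

# Support-weighted average of recall.
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)

# recall * support approximates the number of correct predictions per class,
# so the same ratio is the overall accuracy on the 10000 test images.
approx_correct = sum(r * s for r, s in zip(recall, support))
approx_accuracy = approx_correct / sum(support)

print(round(weighted_recall, 2))
print(round(approx_accuracy, 2))
```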
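Loss curve:
Training used 50,000 samples. Each epoch took nearly a minute, and training ran for 100 epochs. The average precision was only 58%, so this is a relatively difficult dataset.
The code comes from Deep Learning for Computer Vision with Python, Starter Bundle.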
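To put the training cost in numbers, here is a small back-of-the-envelope sketch based on the settings in the listing (50,000 training samples, batch_size=32, epochs=100). The one-minute-per-epoch figure is my rounding of "nearly a minute", so the total time is only an upper estimate.

```python
import math

train_samples = 50000
batch_size = 32
epochs = 100

# The last batch of each epoch is partial, hence the ceiling.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_updates = steps_per_epoch * epochs

# Assumption: "nearly a minute" per epoch, rounded up to 1.0 minute.
minutes_per_epoch = 1.0
total_minutes = epochs * minutes_per_epoch

print(steps_per_epoch)   # 1563
print(total_updates)     # 156300
print(total_minutes)     # 100.0
```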