This article is my solution to the programming assignment from Andrew Ng's Neural Networks and Deep Learning course.
While working through it, I found that some parts of the provided assignment code contain minor errors, and some library functions it uses have since been deprecated.
This article updates and improves the code provided with the course accordingly.
I also found, when finally testing on my own pictures, that the real-world accuracy is genuinely poor: although the model reaches 70% accuracy on the test set, many unrelated (non-cat) photos I collected were all classified as cats, which shows how limited this model is in practice.
0. Preliminaries
0.1 Training and test sets
https://download.csdn.net/download/Hubert321/12661358
0.2 Required libraries
The libraries used in this article:
import numpy as np
import matplotlib.pyplot as plt
import h5py
import cv2
1. Reading the Dataset
First, you need to call functions from the h5py library to read the data stored in the .h5 files (the .h5 files hold the image data).
Place train_catvnoncat.h5 and test_catvnoncat.h5 in the same directory as the project files.
# load the dataset
def load_dataset():
    train_dataset = h5py.File('train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels
    test_dataset = h5py.File('test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

# load the dataset
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
The training set read here contains 209 images and the test set contains 50 images; each is a 64×64 RGB image.
train_set_x_orig holds the pixel data for each training image, and train_set_y holds the label ("cat or not") for each training image; test_set_x_orig and test_set_y are analogous. classes holds [b'non-cat' b'cat'].
Accordingly, train_set_x_orig.shape = (209, 64, 64, 3), train_set_y.shape = (1, 209), test_set_x_orig.shape = (50, 64, 64, 3), test_set_y.shape = (1, 50), and classes.shape = (2,).
2. Image Preprocessing
First, flatten each image's pixel array into a vector, i.e. reshape each (64, 64, 3) image into a vector of length 64 * 64 * 3. For n images, this turns the (n, 64, 64, 3) array into an (n, 64 * 64 * 3) array, which is then transposed so that each column holds one image.
Then standardize the images (divide the pixel values by 255).
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.
X.reshape(n, -1) means: reshape X into a matrix of shape (n, m), where numpy computes m automatically from the total number of elements.
Here, train_set_x_flatten.shape = (209, 64 * 64 * 3).T = (12288, 209), and likewise test_set_x_flatten.shape = (12288, 50).
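To see concretely what reshape(n, -1) followed by .T does, here is a small sketch on a toy array standing in for a batch of two tiny 2×2×3 "images" (the shapes are made up for illustration; the mechanics are identical for the real (209, 64, 64, 3) data):

```python
import numpy as np

# a stand-in batch: 2 "images" of shape (2, 2, 3), 12 values each
batch = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

flat = batch.reshape(batch.shape[0], -1)  # (2, 12): one row per image
cols = flat.T                             # (12, 2): one column per image

print(flat.shape)  # (2, 12)
print(cols.shape)  # (12, 2)
```

Note that reshape(n, -1) keeps each image's pixels together in one row, which is why the transpose gives the "one column per example" layout the rest of the code expects.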
3. Prediction with Logistic Regression
1) X and Y are the input image data and labels.
2) w and b are the parameters to be trained once X and Y are fed in.
3) learning_rate and num_iterations (the number of gradient-descent iterations used to fit w and b) are hyperparameters you set yourself; you need to find values that give the model its best accuracy.
1 Logistic regression formulas
1.1 The sigmoid function
Predicted probability: A = σ(z)
The sigmoid function: σ(z) = 1 / (1 + e ^ (-z))
where z = np.dot(w.T, X) + b
1.2 The cost function
Cost function: J(w, b) = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
dw = dJ/dw = 1 / m * np.dot(X, (A - Y).T)
db = dJ/db = 1 / m * np.sum(A - Y)
where m is the number of training examples (m = 209 in this article).
1.3 Gradient descent for w and b
for i in range(num_iterations):
    w = w - learning_rate * dw
    b = b - learning_rate * db
2 Code
2.1 The sigmoid function
where z = np.dot(w.T, X) + b
def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s
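A quick sanity check for sigmoid (a sketch): it should map 0 to exactly 0.5, saturate toward 0 and 1 for large negative and positive inputs, and work elementwise on arrays:

```python
import numpy as np

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s

print(sigmoid(0))                       # 0.5
print(sigmoid(np.array([-10, 0, 10])))  # values near 0, exactly 0.5, near 1
```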
2.2 Initializing the parameters w and b
def initialize_paraments(dim):
    w = np.zeros((dim, 1))
    b = 0
    return w, b
dim is the size of the w you are initializing, i.e. the number of features per image (64 * 64 * 3 = 12288 here): w needs one weight per input feature, not one per image.
2.3 Computing dw and db (in preparation for updating w and b)
def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    grads = {"dw": dw,
             "db": db}
    return grads, cost
w and b are the parameters returned by initialize_paraments(dim); X and Y are the training images and labels.
m = X.shape[1] in the function is the number of training examples (m = 209 in this article), not the number of pixels per image.
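One way to gain confidence in the dw and db formulas is a finite-difference gradient check on a tiny random problem. The sketch below repeats sigmoid and propagate so it is self-contained, and compares dw[0] against a centered-difference estimate of the cost's derivative (the shapes and random values here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    return {"dw": dw, "db": db}, cost

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 1))
b = 0.3
X = rng.normal(size=(4, 5))   # 4 features, 5 examples
Y = np.array([[1, 0, 1, 1, 0]])

grads, cost = propagate(w, b, X, Y)

# numerically estimate dJ/dw[0] with a centered difference
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
_, cost_plus = propagate(w_plus, b, X, Y)
_, cost_minus = propagate(w_minus, b, X, Y)
numeric = (cost_plus - cost_minus) / (2 * eps)

print(abs(grads["dw"][0, 0] - numeric))  # should be tiny (on the order of 1e-9)
```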
2.4 Training the best w and b (gradient descent)
w = w - learning_rate * dw
b = b - learning_rate * db
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        w = w - learning_rate * dw
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
The w and b passed in are the freshly initialized parameters; X and Y are the image data and labels.
num_iterations and learning_rate are hyperparameters you choose (for this article, setting them to 2000 and 0.005 respectively is recommended).
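As a small sketch of optimize at work, the loop can be run on a tiny linearly separable toy problem (sigmoid and propagate are repeated, and print_cost omitted, so the snippet is self-contained; the toy data is invented for illustration). The recorded costs should steadily decrease:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    return {"dw": dw, "db": db}, cost

def optimize(w, b, X, Y, num_iterations, learning_rate):
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
        if i % 100 == 0:
            costs.append(cost)
    return w, b, costs

# toy data: the label is 1 exactly when the single feature is positive
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0, 0, 1, 1]])
w, b = np.zeros((1, 1)), 0

w, b, costs = optimize(w, b, X, Y, 1000, 0.1)
print(costs[0] > costs[-1])  # True: the cost went down
```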
2.5 Computing probabilities and making predictions
Compute the probability with A = sigmoid(np.dot(w.T, X) + b).
If the probability is greater than 0.5, predict 1.
If it is 0.5 or less, predict 0.
2.5.1 Predicting on the test set
def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        if A[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
    assert(Y_prediction.shape == (1, m))
    return Y_prediction
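A minimal usage example for predict (a sketch using hand-picked w and b rather than trained ones, with a slightly condensed copy of the function so it runs standalone):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
    return Y_prediction

w = np.array([[1.0], [-1.0]])
b = 0.0
X = np.array([[2.0, -1.0],
              [0.5, 3.0]])  # column 0: z = 1.5 > 0; column 1: z = -4 < 0

print(predict(w, b, X))  # [[1. 0.]]
```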
2.5.2 Predicting on an arbitrary JPEG image
def isCat(w, b, img):
    # cv2.imread returns images in BGR channel order, while the
    # training data is RGB, so convert before predicting
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (train_set_x_orig.shape[1], train_set_x_orig.shape[2]))
    img = img.reshape(train_set_x_orig.shape[1] * train_set_x_orig.shape[2] * 3, 1)
    img = img / 255.
    A = sigmoid(np.dot(w.T, img) + b)
    if A <= 0.5:
        A = 0
    else:
        A = 1
    return A
2.6 Measuring accuracy on the training and test sets
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.005, print_cost = False):
    w, b = initialize_paraments(X_train.shape[0])
    paraments, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = paraments["w"]
    b = paraments["b"]
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)
    # print train/test accuracy
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
4. Usage
d = model(train_set_x, train_set_y, test_set_x, test_set_y, 2000, 0.005, True)
w = d["w"]
b = d["b"]
print("Place the image you want to test in this project folder.")
filename = input("Enter the image filename: ")
img = cv2.imread(filename)
flag = isCat(w, b, img)
if flag:
    print("It's a cat")
else:
    print("Not a cat")
5. Summary
On the training and test sets provided here, logistic regression reaches 70% accuracy at recognizing cats.
After running the test set, however, I collected many more pictures of my own to try the model, and every non-cat photo I gathered was classified as a cat. The real-world accuracy is poor enough that at one point I suspected a bug in my image loading or in the code.
This model is fine as a hands-on exercise for getting started with neural networks, but do not put it into real use.
Full code for this model: https://download.csdn.net/download/Hubert321/12664195
If you spot any errors in this article, please leave a comment. Much appreciated!