[Deep Learning 1] Logistic Regression
Formula:
y_p=\sigma(w^Tx+b)
The sigmoid function:
\sigma(z)=\frac{1}{1+e^{-z}}
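The formula maps directly to numpy; for instance, σ(0) = 1/(1+e⁰) = 0.5, and σ saturates toward 0 and 1 for large negative and positive inputs. A minimal sketch:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z)), works on scalars and arrays
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                       # 0.5
print(sigmoid(np.array([-10.0, 10.0])))   # close to [0, 1]
```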
Derivative of the sigmoid function:
d\sigma(z)/dz=\sigma(z)(1-\sigma(z))
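The derivative identity can be verified numerically with a central difference; a quick sketch (the test point z = 0.7 is arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, eps = 0.7, 1e-6
# central-difference approximation of d(sigma)/dz
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
# analytic derivative: sigma(z) * (1 - sigma(z))
analytic = sigmoid(z) * (1 - sigmoid(z))
print(abs(numeric - analytic))  # tiny
```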
Loss function
Loss for a single training example:
L(y_p,y)=-y\log(y_p)-(1-y)\log(1-y_p)
Cost over m training examples:
J(w,b)=\frac{1}{m}\sum_{i=1}^{m}\left(-y^{(i)}\log(y_p^{(i)})-(1-y^{(i)})\log(1-y_p^{(i)})\right)
Gradient descent
Parameter updates:
w=w-\alpha\frac{\partial J(w,b)}{\partial w}
b=b-\alpha\frac{\partial J(w,b)}{\partial b}
Computing dJ/dw and dJ/db for logistic regression
(in the formulas below, J and L both denote the loss function):
By the chain rule:
\frac{dL}{dz}=\frac{dL}{da}*\frac{da}{dz}
Compute the two factors separately:
\frac{dL}{da}=-y/a+(1-y)/(1-a)
(derivative of the logarithm)
\frac{da}{dz}=a*(1-a)
(derivative of the sigmoid)
Multiplying the two factors:
dz=\frac{dL}{dz}=\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)*a(1-a)=a-y
The key result to remember:
dz=a-y
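This result can also be checked numerically: perturb z, recompute the loss, and compare the finite-difference slope against a − y (a sketch with arbitrary test values z = 0.3, y = 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(z, y):
    # per-example loss with a = sigmoid(z)
    a = sigmoid(z)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

z, y, eps = 0.3, 1.0, 1e-6
numeric = (loss(z + eps, y) - loss(z - eps, y)) / (2 * eps)
analytic = sigmoid(z) - y   # dz = a - y
print(abs(numeric - analytic))  # tiny
```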
When updating the parameters with m samples at a time:
dw_1=\frac{1}{m}\sum_{i=1}^{m}x_1^{(i)}dz^{(i)}=\frac{1}{m}\sum_{i=1}^{m}x_1^{(i)}(a^{(i)}-y^{(i)})
dw_2=\frac{1}{m}\sum_{i=1}^{m}x_2^{(i)}dz^{(i)}=\frac{1}{m}\sum_{i=1}^{m}x_2^{(i)}(a^{(i)}-y^{(i)})
Each component of w is updated according to the formulas above. The derivation below explains where they come from (I could not find a reference for this part, but it should be correct):
dw_1=\frac{dL}{dw_1}
Applying the chain rule:
dw_1=\frac{dL}{dw_1}=\frac{dL}{dz}*\frac{dz}{dw_1}
And since
z=w^Tx+b=w_1x_1+w_2x_2+...+b
we get
\frac{dz}{dw_1}=x_1
Updating b:
db=\frac{1}{m}\sum_{i=1}^{m}dz^{(i)}=\frac{1}{m}\sum_{i=1}^{m}(a^{(i)}-y^{(i)})
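The per-sample formulas above can be sketched as an explicit loop over the m samples (the toy data below is made up; one sample per column, w and b initialized to zero so every a^(i) is 0.5):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical toy data: n = 2 features, m = 3 samples, one sample per column
X = np.array([[1.0, 2.0, -1.0],
              [0.5, -0.5, 1.5]])
Y = np.array([1.0, 0.0, 1.0])
w = np.zeros((2, 1))
b = 0.0
n, m = X.shape

dw = np.zeros((n, 1))
db = 0.0
for i in range(m):                                  # accumulate over the m samples
    z_i = (np.dot(w.T, X[:, i:i+1]) + b).item()    # z^(i) = w^T x^(i) + b
    a_i = sigmoid(z_i)
    dz_i = a_i - Y[i]                               # dz^(i) = a^(i) - y^(i)
    dw += X[:, i:i+1] * dz_i
    db += dz_i
dw /= m
db /= m
print(dw.ravel(), db)
```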
Vectorization
Updating the parameters for m samples with a for loop is very inefficient, so the computation is vectorized to speed up the program.
Z=w^TX+b=np.dot(w.T,X)+b
A=\sigma(Z)
dZ=A-Y
dw=\frac{1}{m}*X*dZ^T
db=\frac{1}{m}*np.sum(dZ,axis=1)
Because each input x is a column of X, Z and A are row vectors, so the sum is taken along axis=1.
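A quick check that the vectorized formulas agree with the per-sample loop (random toy data; the shapes follow the one-sample-per-column convention above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)            # made-up toy data
n, m = 4, 100
X = rng.standard_normal((n, m))           # one sample per column
Y = (rng.random((1, m)) > 0.5).astype(float)
w = rng.standard_normal((n, 1))
b = 0.1

# vectorized gradients, as in the formulas above
Z = np.dot(w.T, X) + b
A = sigmoid(Z)
dZ = A - Y
dw_vec = np.dot(X, dZ.T) / m
db_vec = np.sum(dZ, axis=1) / m

# the same gradients with an explicit loop over samples
dw_loop = np.zeros((n, 1))
db_loop = 0.0
for i in range(m):
    a_i = sigmoid((np.dot(w.T, X[:, i:i+1]) + b).item())
    dz_i = a_i - Y[0, i]
    dw_loop += X[:, i:i+1] * dz_i
    db_loop += dz_i
dw_loop /= m
db_loop /= m

print(np.allclose(dw_vec, dw_loop))       # the two agree
```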
w=w-\alpha\frac{\partial J(w,b)}{\partial w}
b=b-\alpha\frac{\partial J(w,b)}{\partial b}
Implementation
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from scipy import ndimage
from lr_utils import load_dataset

# Function that loads the data; the data files live in the datasets folder
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # training set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # training set labels
    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # test set labels
    classes = np.array(test_dataset["list_classes"][:])  # class names
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
# Load the data with the function above
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
m_train = train_set_x_orig.shape[0]  # number of training samples
m_test = test_set_x_orig.shape[0]  # number of test samples
num_px = train_set_x_orig.shape[1]  # width/height of each image (the images are square)
# Flatten each image in the training and test sets into a single column
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
# Each pixel value is at most 255, so normalize the data as follows
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
# The sigmoid function
def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s

# Initialize w and b to zeros; w is a column whose length matches a flattened sample
def initialize_with_zeros(dim):
    w = np.zeros((dim, 1))
    b = 0
    return w, b
# GRADED FUNCTION: propagate
def propagate(w, b, X, Y):
    """
    Run forward propagation and return dw, db and the cost
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)  # forward propagation
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # cost function
    # Backward propagation: compute dw and db
    # dZ = A - Y holds when the activation is the sigmoid function
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    grads = {"dw": dw,
             "db": db}
    return grads, cost
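A good habit is to gradient-check a propagate-style function against finite differences; a sketch with a minimal re-definition so the snippet runs on its own (the toy data is made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# minimal re-definition of propagate for a self-contained check
def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    grads = {"dw": np.dot(X, (A - Y).T) / m,
             "db": np.sum(A - Y) / m}
    return grads, cost

rng = np.random.default_rng(1)              # made-up toy data
X = rng.standard_normal((3, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
w = 0.1 * rng.standard_normal((3, 1))
b = 0.0

grads, _ = propagate(w, b, X, Y)
eps = 1e-6

# central difference on b
_, c_plus = propagate(w, b + eps, X, Y)
_, c_minus = propagate(w, b - eps, X, Y)
numeric_db = (c_plus - c_minus) / (2 * eps)

# central difference on w[0]
w_p, w_m = w.copy(), w.copy()
w_p[0, 0] += eps
w_m[0, 0] -= eps
_, c_p = propagate(w_p, b, X, Y)
_, c_m = propagate(w_m, b, X, Y)
numeric_dw0 = (c_p - c_m) / (2 * eps)

print(np.isclose(numeric_db, grads["db"]),
      np.isclose(numeric_dw0, grads["dw"][0, 0]))
```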
# Parameter optimization
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    costs = []  # record of costs, used to plot the learning curve
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        # Parameter update
        w = w - learning_rate * dw
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(cost)
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
# Prediction
def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        if A[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
    return Y_prediction
# Combine the functions above into a single model
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    w, b = initialize_with_zeros(X_train.shape[0])
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = parameters["w"]
    b = parameters["b"]
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_cost=True)
# Plot the learning curve
costs = np.squeeze(d['costs'])
plt.figure()
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate = " + str(d["learning_rate"]))
plt.show()
References
- Andrew Ng's Deep Learning course