Notes
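- Input image format
An image has three channels (RGB). To use it as input, flatten it into a single column vector: for example, a 64 * 64 RGB image becomes a column vector x of dimension 64 * 64 * 3 = 12288.
The matrix X represents the dataset and is built from the m training examples x, one per column. Its shape is n * m, where n is the dimension of x and m is the number of examples.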
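As a minimal sketch of this layout, the snippet below flattens a small batch of random stand-in images (not real data) into the n * m matrix X. The name `images` and the batch size of 5 are illustrative assumptions.

```python
import numpy as np

# Hypothetical stand-in data: m = 5 random "images" of size 64 x 64 with 3 channels.
m = 5
images = np.random.rand(m, 64, 64, 3)

# Flatten each image into one row, then transpose so that each column is one example.
X = images.reshape(m, -1).T

print(X.shape)  # (12288, 5): n = 64 * 64 * 3 rows, m columns
```

Reshaping to (m, -1) first and transposing afterwards keeps the pixels of each image together in one column.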
The label matrix Y has shape 1 * m.
- Logistic function
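The loss function measures the error on a single training example.
The cost function measures the error over the entire training set. The goal is to train w and b so that the cost function is as small as possible.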
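For reference, the usual cross-entropy forms of these two functions for logistic regression can be written as follows. They match the cost computed by the code later in these notes. The superscript notation, with a^{(i)} as the prediction for example i and y^{(i)} as its label, is the common convention rather than something these notes define.

```latex
L(a, y) = -\bigl( y \log a + (1 - y) \log(1 - a) \bigr)

J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L\bigl(a^{(i)}, y^{(i)}\bigr)
```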
Gradient descent is used to find the w and b that minimize the cost function. With learning rate α (lr in the code later in these notes), each step updates:
$$w := w - \alpha \frac{\partial J}{\partial w} \qquad b := b - \alpha \frac{\partial J}{\partial b}$$
- Vectorization
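Use np.dot(a,b) to compute the dot product.
The code below compares the running time of the vectorized version with that of an explicit loop.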
import numpy as np
import time
a = np.random.rand(1000000)
b = np.random.rand(1000000)
#for loop
c = 0
tic = time.time()
for i in range(1000000):
    c += a[i]*b[i]
toc = time.time()
print(c) #250050.84975607513
print("time of for loop is " + str((toc-tic)*1000) + "ms") #time of for loop is :467.257022857666ms
#vectorize
c = 0
tic = time.time()
c = np.dot(a,b)
toc = time.time()
print(c) #250050.84975607155
print("time of vectorized is " + str((toc-tic)*1000) + "ms") #time of vectorized is :0.8158683776855469ms
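In this run the vectorized version is more than 500 times faster (about 467 ms versus about 0.8 ms).
Both CPUs and GPUs provide parallel instructions, known as SIMD (single instruction, multiple data) instructions.
Replacing explicit loops with NumPy's built-in functions lets Python code take advantage of this parallelism and run much faster.
GPUs are better at parallel computation than CPUs.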
Whenever possible, avoid explicit for-loops.
In other words, whenever a computation can be parallelized through vectorization, do not write it as an explicit loop.
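The same idea applies to element-wise functions, not only to dot products. As a small illustration (the input values are arbitrary), the sketch below computes a sigmoid once with a loop and once with a single array expression, then checks that the results agree.

```python
import numpy as np

z = np.linspace(-5.0, 5.0, 11)

# Explicit loop: apply the sigmoid to one element at a time.
s_loop = np.zeros_like(z)
for i in range(z.shape[0]):
    s_loop[i] = 1.0 / (1.0 + np.exp(-z[i]))

# Vectorized: np.exp operates on the whole array at once.
s_vec = 1.0 / (1.0 + np.exp(-z))

print(np.allclose(s_loop, s_vec))  # True
```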
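- Broadcasting
If you add, subtract, multiply, or divide an m * n matrix by a 1 * n matrix, the 1 * n matrix is copied m times to form an m * n matrix, and the operation is then applied element-wise. The same holds for an m * 1 matrix, which is copied n times along the columns.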
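A short sketch of both cases, using small hand-picked matrices (the names `A`, `row`, and `col` are illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # shape (2, 3)
row = np.array([[10.0, 20.0, 30.0]])   # shape (1, 3)
col = np.array([[100.0], [200.0]])     # shape (2, 1)

# The (1, 3) row is stretched across both rows of A.
print(A + row)
# [[11. 22. 33.]
#  [14. 25. 36.]]

# The (2, 1) column is stretched across all three columns of A.
print(A + col)
# [[101. 102. 103.]
#  [204. 205. 206.]]
```

The result is the same as explicitly copying the smaller matrix with np.tile, but NumPy does not need to materialize the copies.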
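- Learn to use reshape; it prevents many dimension bugs
When defining variables, avoid arrays of shape (5,) (rank-1 arrays). For example, np.dot of a (5,) array with a (1,5) array raises a shape error. When you are not sure whether an array currently has that rank-1 form, call reshape to make its shape explicit.
Also use assert liberally; it makes shape bugs easy to catch.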
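The pitfall and the fix can be sketched as follows, with random values and illustrative variable names:

```python
import numpy as np

a = np.random.rand(5)          # rank-1 array, shape (5,)
print(a.shape)                 # (5,)
print(a.T.shape)               # (5,): transposing a rank-1 array changes nothing

b = np.random.rand(1, 5)       # row vector, shape (1, 5)
try:
    np.dot(a, b)               # inner dimensions 5 and 1 do not match
except ValueError as err:
    print("shape error:", err)

a_col = a.reshape(5, 1)        # explicit column vector
assert a_col.shape == (5, 1)   # fail fast if the shape is not what we expect
print(np.dot(a_col, b).shape)  # (5, 5): outer product
```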
- Derivation of the logistic regression formulas
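The derivation itself is not reproduced in these notes. As a hedged summary of the standard argument, the model treats a as the probability that y = 1, and the loss is the negative log-likelihood of one example. The symbols z, a, and the superscript (i) follow the usual convention and are assumptions about the notation intended here.

```latex
z = w^{T} x + b, \qquad a = \sigma(z) = \frac{1}{1 + e^{-z}}

p(y \mid x) = a^{y} (1 - a)^{1 - y}, \qquad y \in \{0, 1\}

\log p(y \mid x) = y \log a + (1 - y) \log(1 - a) = -L(a, y)

J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L\bigl(a^{(i)}, y^{(i)}\bigr)
        = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y^{(i)} \log a^{(i)} + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - a^{(i)}\bigr) \Bigr]
```

Maximizing the likelihood over all m examples is therefore the same as minimizing J.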
- Derivation of the gradient descent formulas
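This derivation is also missing from the notes. A sketch of the usual chain-rule argument is given here, written so that the final matrix forms match the dw and db expressions in the code later in these notes. X is the n * m data matrix, and A and Y are the 1 * m rows of predictions and labels.

```latex
\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}, \qquad
\frac{\partial a}{\partial z} = a (1 - a)

\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} = a - y

\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \bigl(a^{(i)} - y^{(i)}\bigr) = \frac{1}{m} X (A - Y)^{T}

\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \bigl(a^{(i)} - y^{(i)}\bigr)
```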
- Reimplementing the logistic regression code
import numpy as np
import matplotlib.pyplot as plt
import h5py
from PIL import Image
from lr_utils import load_dataset
# Check the size of the dataset
X_train,Y_train,X_test,Y_test,classes = load_dataset()
print("the size of X_train is: "+ str(X_train.shape))
print("the size of X_test is: "+ str(X_test.shape))
print("the size of Y_train is: "+ str(Y_train.shape))
print("the size of Y_test is: "+ str(Y_test.shape))
#————————————————————————————————————————————————————
#the size of X_train is: (209, 64, 64, 3)
#the size of X_test is: (50, 64, 64, 3)
#the size of Y_train is: (1, 209)
#the size of Y_test is: (1, 50)
# View one of the images
plt.imshow(X_train[2])
# Convert the images to column vectors; the result has shape (h*w*c, m)
def img2vec(X):
    m = X.shape[0]
    h = X.shape[1]
    w = X.shape[2]
    c = X.shape[3]
    # The first axis of X is m, so reshape with m first and transpose afterwards
    vec = X.reshape((m,h*w*c)).T
    vec = vec/255 # Scale pixel values to [0, 1]
    return vec
# Define the sigmoid function
def sigmoid(x):
    s = 1/(1+np.exp(-x))
    return s
# Compute the cost function and its gradients
def cost_grad_function(w,b,X,Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T,X)+b).reshape(1,-1) # predictions, shape (1, m)
    J = -1/m*(np.dot(np.log(A),Y.T)+np.dot(np.log(1-A),(1-Y).T)) # cost, shape (1, 1)
    dw = 1/m*np.dot(X,(A-Y).T)
    db = 1/m*np.sum(A-Y)
    return A,J,dw,db
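# Training
def train(X,Y,num_iter=1000,lr=0.001):
    X = img2vec(X)
    w = np.zeros((X.shape[0],1))
    b = 0.0
    for i in range(num_iter):
        A,J,dw,db = cost_grad_function(w,b,X,Y)
        w = w - lr*dw
        b = b - lr*db
        if i%100 == 0:
            print("iter=%d, cost=%s" %(i,str(J)))
    return w,b
# The original notes omit the call that produced w and b. The log below runs to
# iter=1900, so num_iter=2000 is inferred; lr=0.005 is an assumption.
w,b = train(X_train,Y_train,num_iter=2000,lr=0.005)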
#————————————————————————————————————————————————
#iter=0, cost=[[0.69314718]]
#iter=100, cost=[[0.58450836]]
#iter=200, cost=[[0.46694904]]
#iter=300, cost=[[0.37600687]]
#iter=400, cost=[[0.33146329]]
#iter=500, cost=[[0.30327307]]
#iter=600, cost=[[0.27987959]]
#iter=700, cost=[[0.26004214]]
#iter=800, cost=[[0.24294068]]
#iter=900, cost=[[0.22800422]]
#iter=1000, cost=[[0.21481951]]
#iter=1100, cost=[[0.20307819]]
#iter=1200, cost=[[0.19254428]]
#iter=1300, cost=[[0.18303334]]
#iter=1400, cost=[[0.17439859]]
#iter=1500, cost=[[0.1665214]]
#iter=1600, cost=[[0.15930452]]
#iter=1700, cost=[[0.15266732]]
#iter=1800, cost=[[0.14654224]]
#iter=1900, cost=[[0.14087208]]
# Compute the predictions and the accuracy
def predict(X_test,Y_test,w,b):
    X_test = img2vec(X_test)
    A = sigmoid(np.dot(w.T,X_test)+b).reshape(1,-1)
    y_pre = (A > 0.5).astype(float) # threshold the probabilities at 0.5
    accuracy = 1.-np.sum(abs(y_pre-Y_test))/float(Y_test.shape[1])
    return accuracy,y_pre
accuracy,y_pre = predict(X_test,Y_test,w,b)
print(accuracy)
#————————————————————
#0.7
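One way to sanity-check the gradient formulas used in cost_grad_function is to compare them with central finite differences. The sketch below is not part of the original notes: it re-declares the same formulas under the hypothetical name `cost_and_grads` so that it is self-contained, and it runs on small random data rather than the images loaded above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cost_and_grads(w, b, X, Y):
    # Same formulas as cost_grad_function in the notes, with the cost as a scalar.
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return J, dw, db

# Small synthetic problem: n features, m examples, random binary labels.
rng = np.random.default_rng(0)
n, m = 4, 20
X = rng.normal(size=(n, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
w = rng.normal(size=(n, 1)) * 0.1
b = 0.05

_, dw, db = cost_and_grads(w, b, X, Y)

# Central differences: perturb one parameter at a time by +/- eps.
eps = 1e-6
dw_num = np.zeros_like(w)
for k in range(n):
    step = np.zeros_like(w)
    step[k, 0] = eps
    J_plus, _, _ = cost_and_grads(w + step, b, X, Y)
    J_minus, _, _ = cost_and_grads(w - step, b, X, Y)
    dw_num[k, 0] = (J_plus - J_minus) / (2 * eps)

J_plus, _, _ = cost_and_grads(w, b + eps, X, Y)
J_minus, _, _ = cost_and_grads(w, b - eps, X, Y)
db_num = (J_plus - J_minus) / (2 * eps)

# Both differences should be tiny if the analytic gradients are right.
print(np.max(np.abs(dw - dw_num)), abs(db - db_num))
```

If the analytic and numerical gradients disagree by much more than rounding error, the dw or db formula (or a shape mix-up) is the first place to look.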