DeepLearning.ai Deep Learning Course Series Notes: 4. Logistic Regression in Code

During reposting, the images were lost and the code formatting was garbled.

For a better reading experience, please visit the original version:

http://www.missshi.cn/api/view/blog/59aa08fee519f50d04000170

Ps: the first visit may take a while (about 8 s) because a large js file has to load.



In this lesson, we will learn how to implement logistic regression in Python.
This is the first coding session of the series, so we will start from some basic Python programming.


Building basic functions with numpy

numpy is the most widely used library for scientific computing in Python. In this section we will go through some of its most common functions.


Exercise 1: Implement the sigmoid function with np.exp()

Before turning to np.exp(), we first implement the sigmoid function with math.exp(), and then compare the two to highlight the advantages of np.exp().

Here,

$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$


  
  
import math

def basic_sigmoid(x):
    """
    Compute the sigmoid of a scalar x using math.exp().
    """
    s = 1.0 / (1 + math.exp(-x))
    return s

print(basic_sigmoid(3))
# 0.9525741268224334

The code above applies the sigmoid function to a scalar, whereas in deep learning we usually apply the sigmoid operation to vectors or matrices.

If we call this function on a vector or matrix, an exception is thrown:


  
  
print(basic_sigmoid([3, 2, 1]))
# raises a TypeError: math.exp() only accepts a scalar, not a list

With np.exp, by contrast, if the input is a vector or matrix, the output is a vector or matrix of the same shape: the exponential is computed for every element.


  
  
import numpy as np

x = np.array([1, 2, 3])
print(np.exp(x))
# [ 2.71828183  7.3890561  20.08553692]

In addition, the arithmetic operators (addition, subtraction, multiplication, division) are also overloaded for numpy arrays and operate element-wise.

Take the following example:


  
  
x = np.array([1, 2, 3])
print(x + 3)
# [4 5 6]

Next, let's implement a real sigmoid function, one that works on vectors and matrices.

The requirement is as follows:


  
  
import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid of x, where x is a scalar or a numpy array.
    """
    s = 1.0 / (1 + np.exp(-x))
    return s

x = np.array([1, 2, 3])
print(sigmoid(x))
# [0.73105858 0.88079708 0.95257413]


Exercise 2: Compute the derivative of the sigmoid function

In the earlier theory lessons, we derived the following formula for the derivative of the sigmoid function:

$\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$

Next, we implement it in Python:


  
  
def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the
    sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into a variable
    and then use it to calculate the gradient.

    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    s = 1.0 / (1 + np.exp(-x))
    ds = s * (1 - s)
    return ds

x = np.array([1, 2, 3])
print("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))
# sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]


Exercise 3: Convert an image into a vector

Two numpy operations come up constantly: X.shape and X.reshape().

X.shape returns the dimensions of the array X.

X.reshape() changes the dimensions or shape of X.

For example, a color image is usually stored as a three-dimensional array (one dimension for each RGB channel). For deep learning applications, however, we usually need to flatten it into a vector of length length*height*3.

In other words, we need to turn a three-dimensional array into a one-dimensional vector.

Next, we implement an image2vector function that takes a three-dimensional array of shape (length, height, 3) and returns a vector.


  
  
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)

    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    v = image.reshape((image.shape[0] * image.shape[1] * image.shape[2], 1))
    return v

image = np.array([[[0.67826139, 0.29380381],
                   [0.90714982, 0.52835647],
                   [0.4215251 , 0.45017551]],
                  [[0.92814219, 0.96677647],
                   [0.85304703, 0.52351845],
                   [0.19981397, 0.27417313]],
                  [[0.60659855, 0.00533165],
                   [0.10820313, 0.49978937],
                   [0.34144279, 0.94630077]]])
print("image2vector(image) = " + str(image2vector(image)))
# image2vector(image) = [[0.67826139] [0.29380381] [0.90714982] [0.52835647]
#  [0.4215251 ] [0.45017551] [0.92814219] [0.96677647] [0.85304703]
#  [0.52351845] [0.19981397] [0.27417313] [0.60659855] [0.00533165]
#  [0.10820313] [0.49978937] [0.34144279] [0.94630077]]


Exercise 4: Normalize rows

A common trick in deep learning is to normalize our data.

Usually, gradient descent converges noticeably faster after the data has been normalized.

Here, we normalize a matrix row by row, so that after normalization every row has unit length.

For example, for

$x = \begin{bmatrix} 0 & 3 & 4 \\ 1 & 6 & 4 \end{bmatrix}$

the row norms are $\begin{bmatrix} 5 \\ \sqrt{53} \end{bmatrix}$, and dividing each row by its norm yields the normalized matrix.

  
  
def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x
    (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)  # norm of each row, shape (n, 1)
    x = x / x_norm  # divide each row by its norm via numpy broadcasting
    return x

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
# normalizeRows(x) = [[0.         0.6        0.8       ]
#  [0.13736056 0.82416338 0.54944226]]

In the code above, we relied on numpy's broadcasting feature; next, let's look at how broadcasting is used.


Exercise 5: Broadcasting and the softmax function

Broadcasting is a very powerful numpy feature that lets us compute quickly with matrices, vectors, and scalars of different shapes.
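As a quick, minimal sketch of the rule (the values below are made up for illustration): dimensions of size 1 are stretched to match the other operand, so a (2, 1) column can divide a (2, 3) matrix element-wise, and a scalar combines with an array of any shape:

import numpy as np

x = np.array([[0., 3., 4.],
              [1., 6., 4.]])
col = np.array([[2.],
                [4.]])
print(x / col)  # the (2, 1) column is stretched across all 3 columns
# [[0.   1.5  2.  ]
#  [0.25 1.5  1.  ]]
print(x + 1)  # a scalar broadcasts against any shape
# [[1. 4. 5.]
#  [2. 7. 5.]]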

Next, we implement a softmax function, defined row by row as:

$\mathrm{softmax}(x)_{ij} = \frac{e^{x_{ij}}}{\sum_k e^{x_{ik}}}$


  
  
def softmax(x):
    """
    Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n, m)
    """
    x_exp = np.exp(x)  # element-wise exponential, shape (n, m)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)  # row sums, shape (n, 1)
    s = x_exp / x_sum  # (n, m) divided by (n, 1) via broadcasting
    return s

x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))
# softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04 1.21052389e-04]
#  [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04 8.01252314e-04]]


Vectorization

In deep learning, we usually work with very large datasets.

Computation speed can therefore become the bottleneck of the whole training process.

To keep our computations efficient, we need to vectorize them.

Below, we compare the computational efficiency of non-vectorized and vectorized implementations of the dot product, the outer product, and element-wise multiplication.

First, the naive implementations:


  
  
import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print("dot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1), len(x2)))  # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i, j] = x1[i] * x2[j]
toc = time.process_time()
print("outer ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
print("elementwise multiplication ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3, len(x1))  # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i, j] * x1[j]
toc = time.process_time()
print("gdot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

# dot ----- Computation time = 0.17002099999974263ms
# outer ----- Computation time = 0.34057500000006513ms
# elementwise multiplication ----- Computation time = 0.1940779999998199ms
# gdot ----- Computation time = 0.2362039999999066ms

Next, the vectorized implementations:


  
  
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1, x2)
toc = time.process_time()
print("dot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1, x2)
toc = time.process_time()
print("outer ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1, x2)
toc = time.process_time()
print("elementwise multiplication ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
dot = np.dot(W, x1)  # W is the random matrix created in the previous snippet
toc = time.process_time()
print("gdot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

# dot ----- Computation time = 0.16546899999991815ms
# outer ----- Computation time = 0.14168100000011563ms
# elementwise multiplication ----- Computation time = 0.10738799999998605ms
# gdot ----- Computation time = 0.38393900000022185ms

From the results above, the vectorized code is clearly much simpler.

The running time also drops somewhat. The improvement is modest here only because the data is tiny; as the amount of data grows, the gap becomes more and more pronounced.
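To make the gap visible at a more realistic scale, here is a small sketch, not from the original post, that repeats the dot-product comparison on arrays of one million elements (an arbitrary size); on typical hardware the vectorized version runs hundreds of times faster:

import time
import numpy as np

n = 1000000  # one million elements, an arbitrary size for this demo
a = np.random.rand(n)
b = np.random.rand(n)

tic = time.process_time()
dot = 0
for i in range(n):  # naive Python loop
    dot += a[i] * b[i]
toc = time.process_time()
print("loop ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

tic = time.process_time()
dot = np.dot(a, b)  # vectorized
toc = time.process_time()
print("np.dot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")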


Exercise 1: Implement the L1 loss function

We now use numpy functions to implement the L1 loss function.

The L1 loss is defined as:

$L_1(\hat{y}, y) = \sum_{i=0}^{m-1} \lvert y^{(i)} - \hat{y}^{(i)} \rvert$

where $\hat{y}$ denotes the predicted value and $y$ the true value.


  
  
import numpy as np

def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L1 loss function defined above
    """
    loss = np.sum(np.abs(y - yhat))
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat, y)))
# L1 = 1.1


Exercise 2: Implement the L2 loss function

The L2 loss is defined as:

$L_2(\hat{y}, y) = \sum_{i=0}^{m-1} ( y^{(i)} - \hat{y}^{(i)} )^2$


  
  
import numpy as np

def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L2 loss function defined above
    """
    loss = np.sum(np.power((y - yhat), 2))
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat, y)))
# L2 = 0.43


Implementing Logistic Regression

In the remainder of this lesson, we will implement a complete logistic regression model: initializing the parameters, computing the cost function and its gradients, optimizing with gradient descent, and finally integrating all of this into one model function.

The task is to train a classifier that decides whether an image shows a cat.

We will use the following libraries:

numpy: the fundamental library for scientific computing in Python

h5py: a library for interacting with H5 files

matplotlib: Python's standard plotting library

PIL: an image-processing library

scipy: a library for scientific computing

At the top of the program, we first import them:


  
  
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
%matplotlib inline  # IPython magic: display matplotlib figures inline in the notebook

Before training, we first need to load the data:


  
  
def load_dataset():
    """
    Load the cat/non-cat training and test sets from H5 files.
    """
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")  # open the H5 file
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels
    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))  # reshape labels to (1, m)
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

Notes on the data:

In the label vectors, a cat image is labeled 1 and anything else 0.

Every image has shape (num_px, num_px, 3): height and width are equal, and the 3 stands for the RGB channels.

train_set_x_orig and test_set_x_orig carry the suffix _orig because we will preprocess the images shortly; the preprocessed variables will be named train_set_x and test_set_x.

Each element of train_set_x_orig corresponds to one image, which we can display with the following code:


  
  
index = 25
plt.imshow(train_set_x_orig[index])
print("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
# y = [1], it's a 'cat' picture.

Next, we read off the training set size, the test set size, and the image size from the data:


  
  
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
print(m_train, m_test, num_px)
# 209, 50, 64

Next, we flatten each image into a vector, which becomes one column of a matrix.

The whole training set thus becomes a single matrix with num_px*num_px*3 rows and m_train columns.


  
  
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

Ps: the trick X_flatten = X.reshape(X.shape[0], -1).T converts an array of shape (a, b, c, d) into one of shape (b*c*d, a).
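A minimal sketch of this trick on a toy array (the shape (2, 3, 4, 5) is an arbitrary choice for illustration):

import numpy as np

X = np.random.rand(2, 3, 4, 5)  # shape (a, b, c, d) = (2, 3, 4, 5)
X_flatten = X.reshape(X.shape[0], -1).T  # -1 lets numpy infer b*c*d
print(X_flatten.shape)
# (60, 2), i.e. (b*c*d, a)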

Next, we normalize the pixel values.

Since the raw values lie between 0 and 255, the simplest approach is to divide by 255:


  
  
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.

Now let's look at the structure of logistic regression.

For a single training example $x^{(i)}$, the activation and the loss are computed as:

$a^{(i)} = \sigma(w^T x^{(i)} + b)$

$\mathcal{L}(a^{(i)}, y^{(i)}) = -\,y^{(i)} \log a^{(i)} - (1 - y^{(i)}) \log(1 - a^{(i)})$

and the overall cost is:

$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)})$

We will build logistic regression in the following steps:

1. Define the model structure

2. Initialize the model parameters

3. Loop:

    3.1 Forward propagation

    3.2 Backward propagation

    3.3 Parameter update

4. Integrate everything into one complete model


Step 1: Implement the sigmoid function


  
  
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    s = 1.0 / (1 + np.exp(-z))
    return s

Step 2: Initialize the parameters


  
  
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    w = np.zeros((dim, 1))
    b = 0
    return w, b

Step 3: Forward and backward propagation

Ps: the formulas are as follows (see the earlier theory lessons for the derivation):

$A = \sigma(w^T X + b)$

$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log(1 - a^{(i)}) \right]$

$dw = \frac{1}{m} X (A - Y)^T$

$db = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)})$


  
  
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]
    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)  # compute activation
    cost = -1.0 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # compute cost
    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = 1.0 / m * np.dot(X, (A - Y).T)
    db = 1.0 / m * np.sum(A - Y)
    cost = np.squeeze(cost)
    grads = {"dw": dw,
             "db": db}
    return grads, cost

Step 4: Update the parameters

The update rule is:

$w := w - \alpha \, dw$

$b := b - \alpha \, db$

where $\alpha$ is the learning rate. The complete code is as follows:


  
  
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
    1) Calculate the cost and the gradient for the current parameters. Use propagate().
    2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []
    for i in range(num_iterations):  # run num_iterations gradient descent steps
        # Cost and gradient calculation
        grads, cost = propagate(w, b, X, Y)
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        # update rule
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs

Step 5: Use the trained model to predict on the test set

The prediction is computed as:

$\hat{Y} = A = \sigma(w^T X + b)$

When an entry of the output is greater than 0.5, we predict that the image is a cat; otherwise, we predict that it is not.


  
  
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities A[0, i] to actual predictions Y_prediction[0, i]
        if A[0][i] > 0.5:
            Y_prediction[0][i] = 1
        else:
            Y_prediction[0][i] = 0
    return Y_prediction

Step 6: Integrate all of the above into one model:


  
  
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    # initialize parameters with zeros
    w, b = initialize_with_zeros(X_train.shape[0])
    # Gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    # Predict test/train set examples
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d

Let's test the model:


  
  
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_cost=True)

Looking at the printed results, we can see that the test accuracy reaches 70.0%.

On the training set, the accuracy reaches 99%. This indicates that our model overfits to some degree; don't worry, we will address this problem in later lessons.


With the following code, we can pick out individual images and look at our predictions:


  
  
# Example of a picture that was wrongly classified.
index = 14
plt.imshow(test_set_x[:, index].reshape((num_px, num_px, 3)))
print("y = " + str(test_set_y[0, index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0, index])].decode("utf-8") + "\" picture.")

We can also plot the learning curve of the cost function:


  
  
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate = " + str(d["learning_rate"]))
plt.show()

In the earlier theory lessons, we mentioned that the learning rate has a large influence on the final result; let's run an experiment to build some intuition.


  
  
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=1500, learning_rate=i, print_cost=False)
    print('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label=str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations')
legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

Analysis: different learning rates lead to different results. A learning rate that is too small converges slowly, while one that is too large can make the cost oscillate or even fail to converge.


What if you want to use an image of your own, rather than one from the training or test set? Here is how:


  
  
## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "my_image.jpg"  # change this to the name of your image file
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))  # read the image into a numpy array
my_image = scipy.misc.imresize(image, size=(num_px, num_px)).reshape((1, num_px * num_px * 3)).T  # resize and flatten into a column
my_predicted_image = predict(d["w"], d["b"], my_image)  # classify with the trained parameters
plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image))].decode("utf-8") + "\" picture.")
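Note that ndimage.imread and scipy.misc.imresize have been removed from recent SciPy releases. As an alternative, here is a minimal sketch of the same preprocessing using PIL alone; it assumes the same images/ folder, num_px, and trained dictionary d from above, and it additionally divides by 255 so the input matches the normalized training data:

# A sketch of the preprocessing with PIL only, for newer SciPy versions
# where ndimage.imread / scipy.misc.imresize no longer exist.
from PIL import Image
import numpy as np

img = Image.open("images/my_image.jpg").resize((num_px, num_px))  # load and resize
my_image = np.array(img).reshape((1, num_px * num_px * 3)).T / 255.  # flatten and normalize
my_predicted_image = predict(d["w"], d["b"], my_image)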


For more details, please visit the original site:

http://www.missshi.cn/api/view/blog/59aa08fee519f50d04000170

Ps: the first visit may take a while (about 8 s) because a large js file has to load.

