In this lesson, we will learn how to implement logistic regression in Python.
This is the first coding session of the course, so we will start with some basic Python programming.
Building basic functions with numpy
numpy is the most commonly used library for scientific computing in Python. Below we will learn some of the functions it provides.
Exercise 1: implement the sigmoid function using np.exp()
Before using np.exp(), we first implement the sigmoid function with math.exp(), and compare the two to highlight the advantage of np.exp().
Here, sigmoid(x) = 1 / (1 + e^(-x)).
import math

def basic_sigmoid(x):
    """Compute the sigmoid of a single scalar."""
    s = 1.0 / (1 + 1 / math.exp(x))
    return s

print(basic_sigmoid(3))
# 0.9525741268224334
The above shows how to apply the sigmoid function to a scalar, but in deep learning applications we usually apply it to vectors or matrices.
If we call this function on a vector or matrix, an exception is thrown:
print(basic_sigmoid([3, 2, 1]))
# raises a TypeError
With np.exp, on the other hand, a vector or matrix input produces a vector or matrix output: the exponential is applied to each element.
import numpy as np

x = np.array([1, 2, 3])
print(np.exp(x))
# [ 2.71828183  7.3890561  20.08553692]
In addition, for variables of type numpy array, the arithmetic operators are overloaded to work element-wise.
For example:
x = np.array([1, 2, 3])
print(x + 3)
# [4 5 6]
Next, let's implement a real sigmoid function that works on vectors and matrices.
The requirement: for an input array x, compute sigmoid(x) = 1 / (1 + e^(-x)) element-wise.
import numpy as np

def sigmoid(x):
    """Sigmoid function, works for scalars, vectors, and matrices."""
    s = 1.0 / (1 + 1 / np.exp(x))
    return s

x = np.array([1, 2, 3])
print(sigmoid(x))
# [0.73105858 0.88079708 0.95257413]
Exercise 2: compute the derivative of the sigmoid function
From the earlier theory lessons, the derivative of the sigmoid function is sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)).
Let's implement it in Python:
def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.

    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    s = 1.0 / (1 + 1 / np.exp(x))
    ds = s * (1 - s)
    return ds

x = np.array([1, 2, 3])
print("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))
# sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]
Exercise 3: convert an image to a vector
numpy arrays offer two commonly used tools: shape and reshape().
X.shape lets you inspect the dimensions of the current array.
X.reshape() changes the array's dimensions or shape.
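As a minimal sketch of how the two work together (the values here are made up for illustration):

import numpy as np

x = np.arange(6)        # array([0, 1, 2, 3, 4, 5])
print(x.shape)          # (6,)
y = x.reshape((2, 3))   # reinterpret the same 6 elements as 2 rows of 3
print(y.shape)          # (2, 3)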
For example, a color image is usually stored as a three-dimensional array (one channel each for R, G, and B). In deep learning applications, however, we usually need to convert it into a vector of length 3 * length * width.
That is, we need to turn a three-dimensional array into a one-dimensional vector.
Next, we implement an image2vector function whose input is a three-dimensional array of shape (length, height, 3) and whose output is a vector.
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)

    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    v = image.reshape((image.shape[0] * image.shape[1] * image.shape[2], 1))
    return v

image = np.array([[[0.67826139, 0.29380381],
                   [0.90714982, 0.52835647],
                   [0.4215251 , 0.45017551]],
                  [[0.92814219, 0.96677647],
                   [0.85304703, 0.52351845],
                   [0.19981397, 0.27417313]],
                  [[0.60659855, 0.00533165],
                   [0.10820313, 0.49978937],
                   [0.34144279, 0.94630077]]])

print("image2vector(image) = " + str(image2vector(image)))
# [[0.67826139] [0.29380381] [0.90714982] [0.52835647] [0.4215251 ] [0.45017551]
#  [0.92814219] [0.96677647] [0.85304703] [0.52351845] [0.19981397] [0.27417313]
#  [0.60659855] [0.00533165] [0.10820313] [0.49978937] [0.34144279] [0.94630077]]
Exercise 4: normalize by row
A common trick in deep learning is to normalize our data.
Usually, gradient descent converges noticeably faster after the data has been normalized.
Here we normalize a matrix row by row, so that each row ends up with unit length.
For example, we divide each row by its L2 norm: x_normalized = x / ||x||, where the norm is computed per row.
def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)  # length of each row, as a column vector
    x = x / x_norm  # divide the matrix by the column vector via numpy broadcasting
    return x

x = np.array([[0, 3, 4],
              [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
# normalizeRows(x) = [[0.         0.6        0.8       ]
#                     [0.13736056 0.82416338 0.54944226]]
The code above relies on broadcasting, so let's focus on how broadcasting is used.
Exercise 5: broadcasting and the softmax function
Broadcasting is a very powerful feature of numpy: it lets us compute quickly between matrices, vectors, and scalars of different dimensions.
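As a minimal sketch of the rules (the array values are made up for illustration): a row vector is stretched down across the rows, and a column vector is stretched across the columns:

import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([[10, 20, 30]])     # shape (1, 3)
print(m + row)                     # the row is broadcast to both rows
# [[11 22 33]
#  [14 25 36]]

col = np.array([[1],
                [2]])              # shape (2, 1)
print(m / col)                     # the column is broadcast across the columns
# [[1.  2.  3. ]
#  [2.  2.5 3. ]]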
Next, we implement a softmax function, defined row-wise as softmax(x)_i = e^(x_i) / sum_j e^(x_j):
def softmax(x):
    """
    Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n, m)
    """
    x_exp = np.exp(x)                             # (n, m)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)  # (n, 1)
    s = x_exp / x_sum                             # (n, m), thanks to broadcasting
    return s

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))
# softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04 1.21052389e-04]
#               [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04 8.01252314e-04]]
Vectorization
In deep learning we usually work with very large datasets.
Computation speed can therefore become the bottleneck of the whole training process.
To keep our computations efficient, we need to vectorize them.
Let's compare the efficiency of the dot product, the outer product, and element-wise multiplication with and without vectorization.
First, the naive loop-based implementations:
import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print("dot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1), len(x2)))  # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i, j] = x1[i] * x2[j]
toc = time.process_time()
print("outer ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
print("elementwise multiplication ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3, len(x1))  # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i, j] * x1[j]
toc = time.process_time()
print("gdot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

# dot ----- Computation time = 0.17002099999974263ms
# outer ----- Computation time = 0.34057500000006513ms
# elementwise multiplication ----- Computation time = 0.1940779999998199ms
# gdot ----- Computation time = 0.2362039999999066ms
Next, the vectorized implementations:
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1, x2)
toc = time.process_time()
print("dot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1, x2)
toc = time.process_time()
print("outer ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1, x2)
toc = time.process_time()
print("elementwise multiplication ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
dot = np.dot(W, x1)
toc = time.process_time()
print("gdot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

# dot ----- Computation time = 0.16546899999991815ms
# outer ----- Computation time = 0.14168100000011563ms
# elementwise multiplication ----- Computation time = 0.10738799999998605ms
# gdot ----- Computation time = 0.38393900000022185ms
From the results above, the vectorized code is clearly much simpler.
The running time also drops somewhat. The reduction is small here only because the data is small; as the amount of data grows, the speedup becomes more and more pronounced.
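To see this, here is a quick sketch at a larger scale (one million elements; the exact timings depend on your machine):

import time
import numpy as np

n = 10 ** 6
a = np.random.rand(n)
b = np.random.rand(n)

tic = time.process_time()
dot_loop = 0
for i in range(n):       # explicit Python loop
    dot_loop += a[i] * b[i]
toc = time.process_time()
print("loop dot: " + str(1000 * (toc - tic)) + "ms")

tic = time.process_time()
dot_vec = np.dot(a, b)   # vectorized
toc = time.process_time()
print("np.dot:   " + str(1000 * (toc - tic)) + "ms")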
Exercise 1: implement the L1 loss function
We implement the L1 loss function using numpy.
The L1 loss is defined as L1(yhat, y) = sum(|y - yhat|),
where yhat denotes the predicted value and y the true value.
import numpy as np

def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L1 loss function defined above
    """
    loss = np.sum(np.abs(y - yhat))
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat, y)))
# L1 = 1.1
Exercise 2: implement the L2 loss function
The L2 loss is defined as L2(yhat, y) = sum((y - yhat)^2):
import numpy as np

def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L2 loss function defined above
    """
    loss = np.sum(np.power((y - yhat), 2))
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat, y)))
# L2 = 0.43
Implementing logistic regression
In what follows we implement a complete logistic regression model: initializing the parameters, computing the cost function and its gradients, and optimizing with gradient descent, then assembling the pieces into a single function.
The task is to train a classifier that decides whether an image contains a cat.
We will use the following libraries:
numpy: the fundamental library for scientific computing in Python
h5py: a library for interacting with H5 files
matplotlib: Python's plotting library
PIL: an imaging library for Python
scipy: a library for scientific computing
At the top of the program, we first import them:
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage

# display matplotlib figures inline (Jupyter magic)
%matplotlib inline
Before training, we first need to load the data:
def load_dataset():
    """Load the cat/non-cat dataset."""
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")  # read the H5 file
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])    # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])    # your train set labels
    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])       # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])       # your test set labels
    classes = np.array(test_dataset["list_classes"][:])             # the list of classes
    # reshape the train and test labels to row vectors
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
Notes on the data:
In the training labels, a cat is marked 1, everything else 0.
Every image has shape (num_px, num_px, 3): the width and height are equal, and 3 means RGB.
train_set_x_orig and test_set_x_orig carry the _orig suffix because we will preprocess the images shortly; the preprocessed variables will be named train_set_x and test_set_x.
Each element of train_set_x_orig corresponds to one image, which we can display with the following code:
index = 25
plt.imshow(train_set_x_orig[index])
print("y = " + str(train_set_y[:, index]) + ", it's a '" +
      classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
# y = [1], it's a 'cat' picture.
Next, we compute the sizes of the training set and the test set, and the image size, from the data:
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
print(m_train, m_test, num_px)
# 209, 50, 64
Next, we convert each image into a vector, i.e. one column of a matrix.
The whole training set then becomes a single matrix with num_px * num_px * 3 rows and m_train columns.
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
Ps: X_flatten = X.reshape(X.shape[0], -1).T converts a matrix of shape (a, b, c, d) into a matrix of shape (b*c*d, a).
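A quick sketch to verify the shapes, using this dataset's dimensions (209 images of 64x64x3):

import numpy as np

X = np.random.rand(209, 64, 64, 3)       # (a, b, c, d)
X_flatten = X.reshape(X.shape[0], -1).T  # -1 lets numpy infer b*c*d
print(X_flatten.shape)                   # (12288, 209), i.e. (b*c*d, a)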
Next, we normalize the pixel values.
Since raw pixel values lie between 0 and 255, the simplest approach is to divide by 255:
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
Now let's look at the structure of logistic regression:
For each training example x, the activation is a = sigmoid(w^T x + b), and the loss is L(a, y) = -(y * log(a) + (1 - y) * log(1 - a)).
The overall cost is the average over all m examples: J = (1/m) * sum_i L(a_i, y_i).
We will implement logistic regression in the following steps:
1. Define the model structure
2. Initialize the model parameters
3. Loop:
3.1 forward propagation
3.2 backward propagation
3.3 parameter update
4. Assemble everything into a complete model
Step 1: implement the sigmoid function
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    s = 1.0 / (1 + 1 / np.exp(z))
    return s
Step 2: initialize the parameters
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    w = np.zeros((dim, 1))
    b = 0
    return w, b
Step 3: forward and backward propagation
Ps: the formulas used below are A = sigmoid(w^T X + b), cost = -(1/m) * sum(Y * log(A) + (1 - Y) * log(1 - A)), dw = (1/m) * X (A - Y)^T, and db = (1/m) * sum(A - Y). (See the earlier theory lessons for their derivation.)
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]
    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)  # compute activation
    cost = -1.0 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # compute cost
    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = 1.0 / m * np.dot(X, (A - Y).T)
    db = 1.0 / m * np.sum(A - Y)
    cost = np.squeeze(cost)
    grads = {"dw": dw,
             "db": db}
    return grads, cost
Step 4: update the parameters
The update rule is w := w - learning_rate * dw and b := b - learning_rate * db.
The complete code:
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
    1) Calculate the cost and the gradient for the current parameters. Use propagate().
    2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []
    for i in range(num_iterations):  # one gradient step per pass; num_iterations is the iteration count
        # Cost and gradient calculation
        grads, cost = propagate(w, b, X, Y)
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        # update rule
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost every 100 training examples
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
Step 5: use the trained model to predict on the test set
The prediction is computed as a = sigmoid(w^T X + b).
When an activation is greater than 0.5, we predict that the image is a cat; otherwise, not a cat.
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        if A[0][i] > 0.5:
            Y_prediction[0][i] = 1
        else:
            Y_prediction[0][i] = 0
    return Y_prediction
Step 6: assemble the pieces above into one model:
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    # initialize parameters with zeros
    w, b = initialize_with_zeros(X_train.shape[0])
    # Gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    # Predict test/train set examples
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
Let's test the model:
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_cost=True)
Looking at the printed output, we can see that the test accuracy reaches 70.0%,
while the training accuracy reaches 99%. This indicates that our model is overfitting somewhat; don't worry, we will address this in later lessons.
With the following code we can pick out individual images and inspect our predictions:
# Example of a picture that was wrongly classified.
index = 14
plt.imshow(test_set_x[:, index].reshape((num_px, num_px, 3)))
print("y = " + str(test_set_y[0, index]) + ", you predicted that it is a \"" +
      classes[int(d["Y_prediction_test"][0, index])].decode("utf-8") + "\" picture.")
We can also plot how the cost evolves during training:
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate = " + str(d["learning_rate"]))
plt.show()
In the earlier theory lessons we mentioned that the learning rate has a large impact on the final result. Let's run an experiment to get an intuitive feel for this.
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y,
                           num_iterations=1500, learning_rate=i, print_cost=False)
    print('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label=str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
Analysis: different learning rates lead to different results. A learning rate that is too small converges slowly, while one that is too large may oscillate or fail to converge.
What if you want to try an image of your own, rather than one from the training or test set? Here is how:
## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "my_image.jpg"  # change this to the name of your image file
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))  # read the image
my_image = scipy.misc.imresize(image, size=(num_px, num_px)).reshape((1, num_px * num_px * 3)).T  # resize the image
my_predicted_image = predict(d["w"], d["b"], my_image)  # predict
plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" +
      classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")
For more details, see the original article: http://www.missshi.cn/api/view/blog/59aa08fee519f50d04000170