This post walks through the code for logistic regression, then a two-layer neural network, and finally an L-layer neural network. For reasons of space, the underlying theory and derivations are not covered here.
I. Logistic regression
- The overall flow of logistic regression is shown in the figure (omitted here; not repeated in this post).
- The main pieces of the logistic regression implementation are as follows:
- Forward propagation:
The forward-propagation formulas are:
$Z = W^T \cdot X + b$, where $Z.shape = 1 \times m$, $W.shape = n \times 1$, $X.shape = n \times m$, $b.shape = 1 \times 1$
$A = sigmoid(Z)$, where $A.shape = 1 \times m$
In code:
A = sigmoid(np.dot(w.T,X)+b)
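The `sigmoid` helper called here is assumed rather than shown in the post; a minimal NumPy sketch (the name matches the call above, the body is an assumption):

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function: 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))
```

It works element-wise, so it accepts scalars or arrays of any shape.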
- Computing the cost:
The cross-entropy cost function is:
$cost = -\frac{1}{m}\sum_{i=1}^m \left( y^{(i)}\log a^{(i)} + (1-y^{(i)})\log(1-a^{(i)}) \right)$
In code:
cost = (-1/m) * np.sum(Y * np.log(A) + (1-Y) * np.log(1-A))
- Backward propagation:
The backward-propagation formulas are:
$dA = -\frac{Y}{A} + \frac{1-Y}{1-A}$
$dZ = A - Y$
$dW = \frac{1}{m} X \cdot dZ^T$
$db = \frac{1}{m}\sum_{i=1}^m dZ^{(i)}$
In code:
dw = (1/m) * np.dot(X,(A-Y).T)
db = (1/m) * np.sum(A-Y)
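As a sanity check on these formulas, the analytic gradient can be compared against a finite difference on a tiny random problem. Everything below is illustrative scaffolding, not part of the original post:

```python
import numpy as np

def propagate_cost(w, b, X, Y):
    # Forward pass and cross-entropy cost only (for the finite-difference check)
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))
    return (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

np.random.seed(0)
n, m = 3, 5
w = np.random.randn(n, 1) * 0.1
b = 0.2
X = np.random.randn(n, m)
Y = (np.random.rand(1, m) > 0.5).astype(float)

# Analytic gradients from the formulas above
A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))
dw = (1 / m) * np.dot(X, (A - Y).T)
db = (1 / m) * np.sum(A - Y)

# Central finite difference on the first weight
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
dw0_num = (propagate_cost(w_plus, b, X, Y) - propagate_cost(w_minus, b, X, Y)) / (2 * eps)
```

The numerical estimate `dw0_num` should agree with `dw[0, 0]` to several decimal places.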
- Putting it together, one full pass of forward propagation, cost computation, and backward propagation looks like this:
def propagate(w,b,X,Y):
    m = X.shape[1]
    # Forward propagation
    A = sigmoid(np.dot(w.T,X)+b)  # A: (1, m)
    cost = (-1/m) * np.sum(Y * np.log(A) + (1-Y) * np.log(1-A))  # sum the per-example losses, then average
    # Backward propagation
    dw = (1/m) * np.dot(X,(A-Y).T)
    db = (1/m) * np.sum(A-Y)
    assert(dw.shape==w.shape)
    assert(db.dtype==float)
    assert(cost.shape==())
    grads = {
        'dw':dw,
        'db':db
    }
    return (grads,cost)
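A quick way to sanity-check `propagate` is to run it on toy data and verify the returned shapes. The snippet below inlines a `sigmoid` helper so it runs standalone; the toy numbers are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = (-1/m) * np.sum(Y * np.log(A) + (1-Y) * np.log(1-A))
    dw = (1/m) * np.dot(X, (A-Y).T)
    db = (1/m) * np.sum(A-Y)
    return {'dw': dw, 'db': db}, cost

# Toy problem: n = 2 features, m = 3 examples
w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0], [3.0, 4.0, -3.2]])
Y = np.array([[1.0, 0.0, 1.0]])
grads, cost = propagate(w, b, X, Y)
```

`dw` comes back with the same shape as `w`, and the cost is a positive scalar.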
The gradient-descent loop over num_iterations iterations is implemented as follows:
def optimize(w,b,X,Y,num_iterations,learning_rate,print_cost=False):
    costs = []
    for i in range(num_iterations):
        grads,cost = propagate(w,b,X,Y)
        dw = grads['dw']
        db = grads['db']
        w = w - learning_rate * dw
        b = b - learning_rate * db
        if i%100 == 0:
            costs.append(cost)
            if print_cost:
                print("iteration %d, cost: %f" %(i,cost))
    params = {
        'w':w,
        'b':b
    }
    grads = {
        'dw':dw,
        'db':db
    }
    return (params,grads,costs)
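Testing the trained model needs a prediction helper that the post does not show; a plausible sketch, thresholding the sigmoid activation at 0.5:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(w, b, X):
    # Forward pass, then threshold the activations at 0.5 to get 0/1 labels
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(int)

# Hand-picked weights: the first example scores positive, the second negative
w = np.array([[0.5], [-0.5]])
b = 0.0
X = np.array([[10.0, -10.0], [0.0, 0.0]])
pred = predict(w, b, X)  # → [[1, 0]]
```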
- Testing:
II. Two-layer neural network
Note: "two-layer" here means one hidden layer plus the output layer; the input layer is not counted.
- To make this easier to follow, vectorization stacks the per-neuron parameters $w^{i}, b^{i}$ of each layer into parameter matrices $W, b$, so the network can be pictured as in the figure (omitted here).
First, logistic regression can be viewed as a simple neural network with no hidden layer; adding one hidden layer on top of it gives a two-layer neural network. Note that the hidden layer uses the $ReLU$ activation, while the output layer still uses $sigmoid$.
The right-hand side of that figure is thus the two-layer network; its per-layer forward propagation is shown in the figure (omitted here).
Note: linear_activation_cache is kept to simplify the backward-pass computation.
Its per-layer backward propagation is shown in the figure (omitted here).
- The two-layer network implementation:
- Forward propagation:
The formulas closely mirror logistic regression:
$Z^{[l]} = W^{[l]} \cdot A^{[l-1]} + b^{[l]}$, where $Z.shape = n^{[l]} \times m$, $W.shape = n^{[l]} \times n^{[l-1]}$, $A^{[l-1]}.shape = n^{[l-1]} \times m$, $b.shape = n^{[l]} \times 1$
$A^{[l]} = g(Z^{[l]})$, where $A.shape = n^{[l]} \times m$
Combining the forward-propagation figure with the logistic regression code already understood, the forward pass is straightforward to write:
# One forward step through one layer
def linear_activation_forward(A_prev,W,b,activation):
    if activation == "sigmoid":
        Z,linear_cache = linear_forward(A_prev,W,b)
        A,activation_cache = sigmoid(Z)
    elif activation == "relu":
        Z,linear_cache = linear_forward(A_prev,W,b)
        A,activation_cache = relu(Z)
    cache = (linear_cache,activation_cache)
    return (A,cache)
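`linear_forward`, and the cache-returning `sigmoid`/`relu` variants used above, are assumed but not shown in the post. One possible sketch of these helpers; the exact cache layout is an assumption, chosen to match what the backward-pass code expects later:

```python
import numpy as np

def linear_forward(A_prev, W, b):
    # Z = W·A_prev + b; cache the inputs for the backward pass
    Z = np.dot(W, A_prev) + b
    return Z, (A_prev, W, b)

def sigmoid(Z):
    A = 1 / (1 + np.exp(-Z))
    return A, Z  # the activation cache is just Z

def relu(Z):
    A = np.maximum(0, Z)
    return A, Z

# Shape check for one layer: 3 inputs, 2 units, m = 4 examples
np.random.seed(1)
A_prev = np.random.randn(3, 4)
W = np.random.randn(2, 3)
b = np.zeros((2, 1))
Z, cache = linear_forward(A_prev, W, b)
A, _ = relu(Z)
```

With $W.shape = 2 \times 3$ and $A^{[l-1]}.shape = 3 \times 4$, both `Z` and `A` come out $2 \times 4$, matching the shape rules above.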
- Computing the cost:
The formula is the same as for logistic regression:
$cost = -\frac{1}{m}\sum_{i=1}^m \left( y^{(i)}\log a^{(i)} + (1-y^{(i)})\log(1-a^{(i)}) \right)$
In code:
def compute_cost(AL,Y):
    m = Y.shape[1]
    cost = (-1/m) * np.sum(Y * np.log(AL) + (1-Y) * np.log(1-AL))
    return cost
- Backward propagation:
The input to the output layer's backward pass is: $dA^{[2]} = -\frac{Y}{A^{[2]}} + \frac{1-Y}{1-A^{[2]}}$
From the activation back to the linear part of the output layer: $dZ^{[2]} = dA^{[2]} * g^{[2]\prime}(Z^{[2]}) = A^{[2]} - Y$
The outputs of the output layer's backward pass are:
$dW^{[2]} = \frac{1}{m} dZ^{[2]} \cdot A^{[1]T}$
$db^{[2]} = \frac{1}{m}\sum_{i=1}^m dZ^{[2](i)}$
$dA^{[1]} = W^{[2]T} \cdot dZ^{[2]}$
The input to the hidden layer's backward pass is: $dA^{[1]} = W^{[2]T} \cdot dZ^{[2]}$
From the activation back to the linear part of the hidden layer: $dZ^{[1]} = dA^{[1]} * g^{[1]\prime}(Z^{[1]})$
The outputs of the hidden layer's backward pass are:
$dW^{[1]} = \frac{1}{m} dZ^{[1]} \cdot X^{T}$
$db^{[1]} = \frac{1}{m}\sum_{i=1}^m dZ^{[1](i)}$
$dA^{[0]}$ is not needed
Note: the hidden layer's backward formulas differ from the output layer's only because the activation function differs.
The code for a single backward pass:
# Linear part of one backward step
def linear_backward(dZ,cache):
    A_prev,W,b = cache
    m = dZ.shape[1]
    dW = (1/m) * np.dot(dZ,A_prev.T)
    db = (1/m) * np.sum(dZ,axis=1,keepdims=True)
    dA_prev = np.dot(W.T,dZ)
    return dA_prev,dW,db

# Activation part of one backward step
def sigmoid_backward(dA,cache):
    Z = cache
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    return dZ

def relu_backward(dA,cache):
    Z = cache
    dZ = np.array(dA,copy=True)
    dZ[Z<0] = 0
    dZ = dZ.reshape(Z.shape)
    return dZ

# One backward step through one layer
def linear_activation_backward(dA,cache,activation="relu"):
    linear_cache,activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA,activation_cache)
        dA_prev,dW,db = linear_backward(dZ,linear_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA,activation_cache)
        dA_prev,dW,db = linear_backward(dZ,linear_cache)
    return (dA_prev,dW,db)
- Putting it together, training the two-layer network for num_iterations iterations of gradient descent looks like this:
# Build a two-layer neural network
def two_layer_model(X,Y,layers_dims,learning_rate=0.0075,num_iterations=3000,print_cost=False,isPlot=True):
    np.random.seed(1)
    grads = {}
    costs = []
    (n_x,n_h,n_y) = layers_dims
    # Initialize the parameters
    param = initialize_param(n_x, n_h, n_y)
    W1 = param["W1"]
    b1 = param["b1"]
    W2 = param["W2"]
    b2 = param["b2"]
    # Start iterating
    for i in range(0,num_iterations):
        # Forward propagation
        A1, cache1 = linear_activation_forward(X, W1, b1, "relu")
        A2, cache2 = linear_activation_forward(A1, W2, b2, "sigmoid")
        # Compute the cost
        cost = compute_cost(A2,Y)
        # Backward propagation
        ## Initialize the backward pass
        dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
        ## Inputs: dA2, cache2, cache1. Outputs: dA1, dW2, db2; then dA0 (unused), dW1, db1
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, "sigmoid")
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, "relu")
        ## Store the gradients in grads
        grads["dW1"] = dW1
        grads["db1"] = db1
        grads["dW2"] = dW2
        grads["db2"] = db2
        # Update the parameters
        param = update_param(param,grads,learning_rate)
        W1 = param["W1"]
        b1 = param["b1"]
        W2 = param["W2"]
        b2 = param["b2"]
        # Record (and optionally print) the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("cost after iteration", i, ":", np.squeeze(cost))
    # After training, optionally plot the cost curve
    if isPlot:
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per hundreds)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
    # Return the parameters
    return param
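`two_layer_model` calls `initialize_param`, which the post does not define. A common sketch: small random weights to break symmetry and zero biases, where the 0.01 scaling factor is an assumption:

```python
import numpy as np

def initialize_param(n_x, n_h, n_y):
    # Small random weights break symmetry between units; biases start at zero
    np.random.seed(1)
    return {
        'W1': np.random.randn(n_h, n_x) * 0.01,
        'b1': np.zeros((n_h, 1)),
        'W2': np.random.randn(n_y, n_h) * 0.01,
        'b2': np.zeros((n_y, 1)),
    }

param = initialize_param(4, 7, 1)
```

The shapes follow the rule $W^{[l]}.shape = n^{[l]} \times n^{[l-1]}$, $b^{[l]}.shape = n^{[l]} \times 1$ from the forward-propagation section.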
- Testing:
III. L-layer neural network
- The forward-propagation process of an L-layer network is shown in the figure (omitted here).
The backward-propagation process is shown in the figure (omitted here).
Note: pay attention to the inputs and outputs of the whole process, and of each layer.
- The L-layer network implementation:
- Forward propagation:
The code for a single layer's forward pass was given in the two-layer section above; here we just iterate it over the layers:
def L_model_forward(X,param):
    caches = []
    A = X
    L = len(param)//2
    for i in range(1,L):
        A_prev = A
        A,cache = linear_activation_forward(A_prev,param["W"+str(i)],param["b"+str(i)],"relu")
        caches.append(cache)
    AL,cache = linear_activation_forward(A,param["W"+str(L)],param["b"+str(L)],"sigmoid")
    caches.append(cache)
    return AL,caches
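To see the layer iteration working end to end, the snippet below inlines minimal versions of the assumed helpers and runs `L_model_forward` through a hypothetical 5→4→3→1 network, checking only the output shapes:

```python
import numpy as np

def linear_forward(A_prev, W, b):
    Z = np.dot(W, A_prev) + b
    return Z, (A_prev, W, b)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z)), Z

def relu(Z):
    return np.maximum(0, Z), Z

def linear_activation_forward(A_prev, W, b, activation):
    Z, linear_cache = linear_forward(A_prev, W, b)
    A, activation_cache = sigmoid(Z) if activation == "sigmoid" else relu(Z)
    return A, (linear_cache, activation_cache)

def L_model_forward(X, param):
    caches = []
    A = X
    L = len(param) // 2
    for i in range(1, L):  # ReLU for hidden layers 1..L-1
        A, cache = linear_activation_forward(A, param["W" + str(i)], param["b" + str(i)], "relu")
        caches.append(cache)
    # Sigmoid for the output layer L
    AL, cache = linear_activation_forward(A, param["W" + str(L)], param["b" + str(L)], "sigmoid")
    caches.append(cache)
    return AL, caches

# Hypothetical 3-layer net: 5 -> 4 -> 3 -> 1, with m = 6 examples
np.random.seed(2)
dims = [5, 4, 3, 1]
param = {}
for l in range(1, len(dims)):
    param["W" + str(l)] = np.random.randn(dims[l], dims[l - 1]) * 0.01
    param["b" + str(l)] = np.zeros((dims[l], 1))
X = np.random.randn(5, 6)
AL, caches = L_model_forward(X, param)
```

`AL` has shape 1×m with all values strictly between 0 and 1 (sigmoid output), and one cache is stored per layer.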
- Computing the cost: identical to the two-layer network, so it is not repeated here.
- Backward propagation:
The code for a single layer's backward pass was given in the two-layer section above; here we just iterate it over the layers:
def L_model_backward(AL,Y,caches):
    grads = {}
    L = len(caches)
    Y = Y.reshape(AL.shape)
    dAL = - Y/AL + (1-Y)/(1-AL)
    current_cache = caches[L-1]
    # Note: grads["dA"+str(l)] stores the gradient passed back to layer l-1
    grads["dA"+str(L)],grads["dW"+str(L)],grads["db"+str(L)] = linear_activation_backward(dAL,current_cache,"sigmoid")
    for i in reversed(range(1,L)):
        current_cache = caches[i-1]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(i + 1)], current_cache, "relu")
        grads["dA" + str(i)] = dA_prev_temp
        grads["dW" + str(i)] = dW_temp
        grads["db" + str(i)] = db_temp
    return grads
- Updating the parameters:
The gradient-descent parameter update:
def update_param(param, grads, learning_rate):
    L = len(param) // 2  # integer division: two entries (W, b) per layer
    for i in range(1,L+1):
        param["W" + str(i)] = param["W" + str(i)] - learning_rate * grads["dW" + str(i)]
        param["b" + str(i)] = param["b" + str(i)] - learning_rate * grads["db" + str(i)]
    return param
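A tiny worked example of the update rule, using a hypothetical one-layer parameter set with hand-picked gradients:

```python
import numpy as np

def update_param(param, grads, learning_rate):
    # W := W - lr * dW, b := b - lr * db, for every layer
    L = len(param) // 2
    for i in range(1, L + 1):
        param["W" + str(i)] = param["W" + str(i)] - learning_rate * grads["dW" + str(i)]
        param["b" + str(i)] = param["b" + str(i)] - learning_rate * grads["db" + str(i)]
    return param

param = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads = {"dW1": np.array([[0.1, -0.2]]), "db1": np.array([[0.5]])}
param = update_param(param, grads, learning_rate=0.1)
# W1 → [[0.99, 2.02]], b1 → [[0.45]]
```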
- Putting it together, training the L-layer network for num_iterations iterations looks like this:
# L-layer neural network
def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False, isPlot=True):
    np.random.seed(1)
    costs = []
    param = initialize_param_deep(layers_dims)
    for i in range(0,num_iterations):
        AL , caches = L_model_forward(X,param)
        cost = compute_cost(AL,Y)
        grads = L_model_backward(AL,Y,caches)
        param = update_param(param,grads,learning_rate)
        # Record (and optionally print) the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("cost after iteration", i, ":", np.squeeze(cost))
    # After training, optionally plot the cost curve
    if isPlot:
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per hundreds)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
    return param
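`L_layer_model` relies on `initialize_param_deep`, which the post does not define. A plausible sketch using a 1/√n weight scaling (the exact initialization scheme is an assumption; it is a common choice for ReLU networks like this one):

```python
import numpy as np

def initialize_param_deep(layers_dims):
    # layers_dims = [n_x, n_h1, ..., n_y]; scale each W by 1/sqrt(fan-in)
    np.random.seed(1)
    param = {}
    for l in range(1, len(layers_dims)):
        param["W" + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) / np.sqrt(layers_dims[l - 1])
        param["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return param

param = initialize_param_deep([12288, 20, 7, 5, 1])
```

A 5-entry `layers_dims` produces 4 weight/bias pairs, so `len(param) // 2` in the other functions correctly recovers the layer count L = 4.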
- Testing:
On the test set, accuracy ranks as: L-layer network (78%) > two-layer network (72%) > logistic regression (70%).