1. Building the Neural Network
To solve the XOR problem, we build the neural network shown in the figure. The network has 3 layers in total, where $z^{(l)}$ denotes the intermediate variable of layer $l$ before the activation function is applied, $a^{(l)}$ denotes the output of layer $l$, $\theta_{1}$ denotes the weight matrix between the input layer and the hidden layer, and $\theta_{2}$ denotes the weight matrix between the hidden layer and the output layer.
2. Code
import numpy as np
# input data: the four XOR patterns and their labels
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = np.array([[0], [1], [1], [0]])
print(y.shape)
(4, 1)
The sigmoid function is:
$g(z)=\frac{1}{1+e^{-z}}$
Its derivative is:
$g'(z)=\frac{e^{-z}}{(1+e^{-z})^{2}}=\frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}}=\frac{1}{1+e^{-z}}\left(\frac{1+e^{-z}}{1+e^{-z}}-\frac{1}{1+e^{-z}}\right)=g(z)(1-g(z))$
# sigmoid function
def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

# derivative of the sigmoid function
def dsigmoid(x):
    return np.multiply(sigmoid(x), 1 - sigmoid(x))
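As a quick sanity check (an added sketch, not part of the original code), the analytic derivative can be compared against a central finite difference:

# Illustrative check: dsigmoid should match a numerical derivative
z = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(dsigmoid(z), numeric))  # expected: True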
In the network we built, $\theta_{1}$ is a 4×3 matrix and $\theta_{2}$ is a 1×5 matrix.
Forward propagation:
$a^{(1)}=x$ (with the bias term $x_{0}=1$ added)
$z^{(2)}=a^{(1)}\theta_{1}$
$a^{(2)}=\mathrm{sigmoid}(z^{(2)})$
$z^{(3)}=a^{(2)}\theta_{2}$ (with the bias term $a^{(2)}_{0}=1$ added to $a^{(2)}$)
$result=a^{(3)}=\mathrm{sigmoid}(z^{(3)})$
(In the code below the samples are stored as the rows of $X$, so the implementation multiplies by the transposed weight matrices, e.g. z2 = a1 * theta1.T.)
def forward_propagate(X, theta1, theta2):
    # convert everything to matrices so * means matrix multiplication
    X = np.matrix(X)
    theta1 = np.matrix(theta1)
    theta2 = np.matrix(theta2)
    m = X.shape[0]
    # add the bias term to the input layer
    a1 = np.concatenate((np.ones((m, 1)), X), axis=1)
    # compute z2, then add the bias term to the hidden-layer output
    z2 = a1 * theta1.T
    a2 = np.concatenate((np.ones((m, 1)), sigmoid(z2)), axis=1)
    z3 = a2 * theta2.T
    res = sigmoid(z3)  # output layer
    return a1, z2, a2, z3, res
When setting initial values for the model parameters: in linear regression and logistic regression we initialized every parameter to 0, but that approach does not work for a neural network. If all parameters are set to 0, every activation unit in the second layer receives the same value from the input layer, so all hidden units compute identical outputs (and identical gradients, so they can never become different). Setting all parameters to the same nonzero value fails for the same reason; the weights must therefore be initialized randomly, as in the sketch below.
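A minimal sketch of the symmetry problem (illustrative, with hypothetical names theta1_sym/theta2_sym, not from the original code):

# Illustrative: identical initial weights leave all hidden units identical
theta1_sym = np.full((4, 3), 0.5)
theta2_sym = np.full((1, 5), 0.5)
_, _, a2_sym, _, _ = forward_propagate(X, theta1_sym, theta2_sym)
print(a2_sym)  # every non-bias column of the hidden layer is identical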
# parameter initialization: random values break the symmetry
theta1 = np.random.rand(4, 3)
theta2 = np.random.rand(1, 5)
print(theta1.shape, ' ', theta2.shape)
a1, z2, a2, z3, res = forward_propagate(X, theta1, theta2)
res
(4, 3) (1, 5)
matrix([[0.76925913],
[0.77569001],
[0.79155468],
[0.7967637 ]])
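To confirm the dimension bookkeeping from above (a small added check), the intermediate shapes can be printed:

# Illustrative shape check for the 4 XOR samples
print(a1.shape, z2.shape, a2.shape, z3.shape)  # (4, 3) (4, 4) (4, 5) (4, 1)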
# cost function: cross-entropy averaged over the samples
def cost(X, y, theta1, theta2):
    X = np.matrix(X)
    y = np.matrix(y)
    theta1 = np.matrix(theta1)
    theta2 = np.matrix(theta2)
    a1, z2, a2, z3, res = forward_propagate(X, theta1, theta2)
    J = 0
    for i in range(y.shape[0]):
        first = np.multiply(y[i, :], np.log(res[i, :]))
        second = np.multiply(1 - y[i, :], np.log(1 - res[i, :]))
        print(first, ' ', second)
        J += -np.sum(first + second)
    return J / y.shape[0]
cost_value = cost(X, y, theta1, theta2)  # note: don't rebind the name cost, or the function is lost
print(cost_value)
[[-0.]] [[-1.46645997]]
[[-0.2540023]] [[-0.]]
[[-0.23375632]] [[-0.]]
[[-0.]] [[-1.59338592]]
0.886901128052618
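The per-sample loop in cost is easy to follow but not necessary; a vectorized variant (an added sketch, cost_vec is a hypothetical name) computes the same average cross-entropy:

# Illustrative vectorized cost: same value as the loop version, no Python loop
def cost_vec(X, y, theta1, theta2):
    y = np.matrix(y)
    _, _, _, _, res = forward_propagate(X, theta1, theta2)
    J = -np.multiply(y, np.log(res)) - np.multiply(1 - y, np.log(1 - res))
    return np.sum(J) / y.shape[0]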
Backpropagation:
$\delta^{(3)}=a^{(3)}-y$
$\delta^{(2)}=(\theta^{(2)})^{T}\delta^{(3)}.*g'(z^{(2)})$ (where $.*$ denotes the element-wise product)
$\frac{\partial}{\partial\theta^{(2)}}J(\theta)=a^{(2)}\delta^{(3)}$
$\frac{\partial}{\partial\theta^{(1)}}J(\theta)=a^{(1)}\delta^{(2)}$
Backpropagation algorithm steps:
1. Randomly initialize the weights of the neural network.
2. Loop over all samples:
   1. Run forward propagation to obtain the prediction $a^{(L)}=h_{\theta}(x)$.
   2. Run backpropagation, computing each layer's error starting from the output layer and using it to obtain the partial derivatives. The output layer's error is simply the difference between the prediction and the true value: $\delta^{(L)}=a^{(L)}-y$. The error of every hidden layer is computed from the error of the layer after it: $\delta^{(l)}=(\theta^{(l)})^{T}\delta^{(l+1)}.*a^{(l)}.*(1-a^{(l)})$, with the bias term $a^{(l)}_{0}=1$.
   Solve layer by layer and accumulate the errors: $\Delta^{(l)}_{i,j}:=\Delta^{(l)}_{i,j}+a^{(l)}_{j}\delta^{(l+1)}_{i}$, or, in vectorized form, $\Delta^{(l)}:=\Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^{T}$.
3. After looping over all samples, the partial derivatives are $\frac{\partial}{\partial\theta^{(l)}_{i,j}}J(\theta)=D^{(l)}_{i,j}$, where $D^{(l)}_{i,j}$ is the accumulated $\Delta^{(l)}_{i,j}$ divided by the number of samples $m$ (the code below skips the division, which simply folds the $\frac{1}{m}$ factor into the learning rate).
def back_propagate(X, y, theta1, theta2):
    X = np.matrix(X)
    y = np.matrix(y)
    theta1 = np.matrix(theta1)
    theta2 = np.matrix(theta2)
    a1, z2, a2, z3, res = forward_propagate(X, theta1, theta2)
    Deleta_1 = Deleta_2 = 0
    for i in range(y.shape[0]):
        deleta3 = res[i] - y[i]  # output-layer error, shape 1x1
        # prepend the bias term so the shape matches theta2, then apply g'(z)
        z2_i = np.concatenate((np.ones((1, 1)), z2[i, :]), axis=1)
        deleta2 = np.multiply((theta2.T * deleta3.T).T, dsigmoid(z2_i))
        # accumulate the gradients, dropping the bias component of deleta2
        Deleta_1 += (deleta2[:, 1:]).T * a1[i]
        Deleta_2 += (deleta3.T) * a2[i]
    return Deleta_1, Deleta_2
Deleta_1, Deleta_2 = back_propagate(X, y, theta1, theta2)
Deleta_1
Deleta_2
matrix([[1.13326752, 0.82401803, 0.77403078, 0.76294633, 0.84984931]])
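Before training, the analytic gradients can be verified numerically (an added gradient-check sketch, using the hypothetical cost_vec helper defined above). Since cost averages over the m = 4 samples while Deleta_1 is an un-averaged sum, the comparison divides by 4:

# Illustrative gradient check on a single entry of theta1
eps = 1e-4
t1_plus = theta1.copy();  t1_plus[0, 0] += eps
t1_minus = theta1.copy(); t1_minus[0, 0] -= eps
numeric = (cost_vec(X, y, t1_plus, theta2) - cost_vec(X, y, t1_minus, theta2)) / (2 * eps)
print(numeric, Deleta_1[0, 0] / 4)  # the two numbers should agree closely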
lr = 0.1       # learning rate
epochs = 2000  # number of iterations
for i in range(epochs + 1):
    Deleta_1, Deleta_2 = back_propagate(X, y, theta1, theta2)
    a1, z2, a2, z3, res = forward_propagate(X, theta1, theta2)
    if i % 50 == 0:
        print('error:', np.mean(np.abs(res - y)))  # report the mean absolute error
    # gradient descent: adjust the weights
    theta1 = theta1 - lr * Deleta_1
    theta2 = theta2 - lr * Deleta_2
a1, z2, a2, z3, res = forward_propagate(X, theta1, theta2)
print('theta1:', theta1)
print('theta2:', theta2)
res
error: 0.040183617427798815
error: 0.037792867515279736
error: 0.03565109495819674
error: 0.03372308789986214
error: 0.031979741909257675
error: 0.030396830901489855
error: 0.028954055102851477
error: 0.027634297830494314
error: 0.026423040868398265
error: 0.025307901217641898
error: 0.024278261430987456
error: 0.023324972631312165
error: 0.022440114380234615
error: 0.021616799315726496
error: 0.02084901327611314
error: 0.0201314837296932
error: 0.019459570918586792
error: 0.018829177335373613
error: 0.018236672078206877
error: 0.017678827345008367
error: 0.017152764882015652
error: 0.016655910634840936
error: 0.016185956189981573
error: 0.01574082586288858
error: 0.015318648501479008
error: 0.014917733243664221
error: 0.014536548603465642
error: 0.014173704369793188
error: 0.013827935890556024
error: 0.013498090386740928
error: 0.01318311499983097
error: 0.012882046324045687
error: 0.012594001214464417
error: 0.012318168694768972
error: 0.012053802815420003
error: 0.011800216335601915
error: 0.011556775121061647
error: 0.011322893165701568
error: 0.011098028157998123
error: 0.010881677524456151
error: 0.01067337489171166
theta1: [[-1.68701807 6.3911878 6.1133159 ]
[ 1.76099424 7.3679309 -4.98657228]
[-0.56086751 -1.58570626 5.23497725]
[-0.69598197 -1.56579716 5.46369586]]
theta2: [[ 3.62659633 11.38297911 -9.38942738 -4.88871387 -5.35806465]]
matrix([[0.00209311],
[0.98738079],
[0.98903377],
[0.0169986 ]])
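Rounding the final outputs at 0.5 recovers the XOR truth table (a short added check):

# Illustrative: threshold the trained outputs
print(np.round(res))  # expected: [[0.] [1.] [1.] [0.]]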
3. np.insert and np.concatenate
b = np.array([[3, 4],
              [3, 4]])
e = np.array([[6, 5],
              [6, 5]])
x = np.ones((4, 1))
f = np.concatenate((b, b))
a = np.concatenate((b, b), axis=0)  # with no axis given, the default is axis=0 (stack rows vertically)
c = np.concatenate((b, e), axis=1)  # concatenate along columns (horizontally)
d = np.concatenate((b, e), axis=-1) # axis=-1 is the last axis, here the same as axis=1
print(f)
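For reference (added here for illustration): f and a both evaluate to the 4×2 array [[3 4], [3 4], [3 4], [3 4]], while c and d both evaluate to [[3 4 6 5], [3 4 6 5]].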
b = np.array([[3, 4],
              [3, 4]])
e = np.array([[6, 5],
              [6, 5]])
x = np.ones((4, 1))
# a = np.insert(arr, obj, values, axis)
# arr: the original array (one or more dimensions)
# obj: the index at which to insert
# values: the content to insert
# axis: insert along rows or columns (0 = rows, 1 = columns)
#f = np.insert(b, 0, e, axis=0)
'''[[6 5]
[6 5]
[3 4]
[3 4]]'''
f = np.insert(b, 1, e, axis=0)
'''[[3 4]
[6 5]
[6 5]
[3 4]]'''
f = np.insert(b, 0, e, axis=1)
'''[[6 6 3 4]
[5 5 3 4]]'''
f = np.insert(b, 1, e, axis=1)
'''[[3 6 6 4]
[3 5 5 4]]'''
print(f)
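As a tie-back to the network code above, np.insert offers a more compact way to add the bias column than np.concatenate; a small illustrative sketch (a1_alt is a hypothetical name, not from the original code):

# Illustrative: add the bias column with np.insert instead of np.concatenate
a1_alt = np.insert(X, 0, 1, axis=1)  # insert a column of ones at index 0
print(a1_alt)
'''[[1 0 0]
[1 0 1]
[1 1 0]
[1 1 1]]'''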