I. Prerequisite Math Formulas
A neural network is made up of individual neurons (the original post shows the basic structure of a single neuron in a figure). Countless neurons linked together in a chain form a neural network, and with suitable $w$ and $b$ such a network can fit almost any mathematical model; training a neural network is essentially using gradient descent to solve for the best $w$ and $b$. Because the network is rather complex, the gradient of any single parameter cannot be solved for directly, so the mainstream approach is backpropagation (BP), which pushes the gradients back one step at a time; for details see 深度学习笔记(1)——神经网络详解及改进 (Deep Learning Notes (1): Neural Networks Explained and Improved).
1. Cross Entropy
$$L(p,q)=-\sum_{i=1}^{n} p(x_i)*\log q(x_i)$$
Cross entropy is generally used as the loss function for classification problems: the smaller it is, the closer the distributions $p(x)$ and $q(x)$ are to each other.
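As a quick illustration (a minimal sketch of my own, not from the original post), the loss shrinks as the prediction $q$ moves toward a one-hot $p$:

import numpy as np

def cross_entropy(p, q):
    # L(p, q) = -sum(p * log q), with p the true distribution and q the prediction
    return -np.sum(p * np.log(q))

p = np.array([1.0, 0.0, 0.0])                          # one-hot ground truth
print(cross_entropy(p, np.array([0.7, 0.2, 0.1])))     # ~0.357
print(cross_entropy(p, np.array([0.98, 0.01, 0.01])))  # ~0.020, q is closer to p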
2. Sigmoid
$$a=\sigma(x)=\frac{1}{1+e^{-x}}$$
Sigmoid is the simplest activation function: as $x$ approaches negative infinity, $y$ approaches 0; as $x$ approaches positive infinity, $y$ approaches 1.
- Derivative of sigmoid: $a'=\sigma(x)(1-\sigma(x))$
- Derivative of the loss $l$ with respect to $z$, i.e. $\partial l/\partial z$; each unit gathers, through the weights, the gradients of the units it feeds in the next layer:

$$\frac{\partial l}{\partial z_i}=\sigma'(z_i)\sum w_i\frac{\partial l}{\partial z_{i+1}}$$
import numpy as np

def sigmoid(z, driv=False):   # driv: return the derivative instead
    if driv:                  # derivative (takes z, returns da/dz)
        return sigmoid(z)*(1-sigmoid(z))
    return 1/(1+np.exp(-z))
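A quick sanity check of the derivative against a centered finite difference (my own sketch, reusing the sigmoid above):

z0 = 0.5
eps = 1e-6
numeric = (sigmoid(z0 + eps) - sigmoid(z0 - eps)) / (2 * eps)
print(numeric, sigmoid(z0, driv=True))   # both print ~0.235004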
3. Softmax
$$a_i=\frac{e^{z_i}}{\sum_{j}^{n}e^{z_j}}$$
Softmax divides each exponential by the sum of the exponentials. It is generally used as the output layer of a classification task, and its outputs can be read as the probability of choosing each class.
For example, in a three-class problem with $z_1=3,z_2=1,z_3=-3$, softmax gives $a_1=0.88,a_2=0.12,a_3=0$ (to two decimals). Loosely speaking, softmax amplifies the largest output and normalizes all outputs into $[0,1]$ (they sum to 1).
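Verifying those numbers directly (a small sketch of my own):

import numpy as np

z = np.array([3.0, 1.0, -3.0])
a = np.exp(z) / np.sum(np.exp(z))
print(a.round(2))   # [0.88 0.12 0.  ]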
- Derivative of softmax:
  If $i=j$: $\frac{\partial a_i}{\partial z_i}=a_i(1-a_i)$
  If $i\neq j$: $\frac{\partial a_j}{\partial z_i}=-a_ia_j$
- Derivative of the loss $l$ with respect to $z$, i.e. $\partial l/\partial z$; summing the chain rule over every output $j$ and combining with the cross-entropy loss:

$$\frac{\partial l}{\partial z_i}=\sum_j\frac{\partial l}{\partial a_j}\frac{\partial a_j}{\partial z_i}=a_i-y_i$$
def softmax(z=None, a=None, y=None, driv=False):   # softmax layer
    if driv:                  # derivative of the loss w.r.t. z (takes activations a and labels y)
        return a-y
    size = z.shape[0]
    exp_out = np.exp(z)       # e^z, element-wise
    exp_sum = np.sum(exp_out,axis=1).reshape(size,1)   # row-wise sum of e^z
    return exp_out/exp_sum
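One caveat worth noting (my addition, not in the original post): np.exp(z) overflows for large logits. A common fix subtracts each row's max before exponentiating, which leaves the softmax value unchanged:

def softmax_stable(z):
    # softmax is shift-invariant: exp(z - c)/sum(exp(z - c)) == softmax(z) for any c
    shifted = z - z.max(axis=1, keepdims=True)   # largest logit per row becomes 0
    exp_out = np.exp(shifted)
    return exp_out / exp_out.sum(axis=1, keepdims=True)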
II. Writing the Code
1. Data Layout
- Training set $x$: a $size*dimension$ 2-D array, where $size$ is the number of training samples and $dimension$ is the input dimension.
$$x = \begin{bmatrix} x_1^1 & x_{2}^1 & ... & x_{d}^1 \\ x_{1}^2 & x_{2}^2 & ... & x_{d}^2 \\ ...&...&&...\\x_{1}^s&x_{2}^s&...&x_{d}^s \end{bmatrix}_{size*dimension}$$
  Here $x_j^i$ is the $j$-th input value of the $i$-th sample.
- Neuron values $a$: a $layers*size*units$ 3-D array, where $layers$ is the number of network layers, $size$ the number of samples, and $units$ the number of neurons in the layer.
$$a[i] =\begin{bmatrix} a_1^1 & a_{2}^1 & ... & a_u^1 \\ a_{1}^2 & a_{2}^2 & ... & a_u^2 \\ ...&...&&...\\a_{1}^s&a_{2}^s&...&a_u^s \end{bmatrix}_{size*units}$$
  $a[i][j][k]$ is the value of the $k$-th neuron for the $j$-th sample at layer $i$. For example, $a[0]=x$: the first layer (index 0 in the code) is simply the input layer.
- Pre-activation neuron values $z$: same layout as $a$.
- Derivative $dz$ of the loss $l$ with respect to $z$: same layout as $a$.
$$dz[i] = \begin{bmatrix} dz_{1}^1 & dz^1_{2} & ... & dz^1_u \\ dz^2_{1} & dz^2_{2} & ... & dz^2_{u} \\ ...&...&&...\\dz^s_{1}&dz^s_{2}&...&dz^s_{u} \end{bmatrix}_{size*units}$$
- Bias $b$: a $(layers-1)*units$ 2-D array; in the code, $b[i][j]$ is the bias added to the $j$-th neuron of layer $i+1$ (the input layer has no bias).
- Weights $w$: a $(layers-1)*units1*units2$ 3-D array; $w[i]$ holds the weights between layer $i$ and layer $i+1$, where $units1$ is the number of neurons in layer $i$ and $units2$ the number in layer $i+1$.
$$w[i] = \begin{bmatrix} w_{11} & w_{12} & ... & w_{1u_{2}} \\ w_{21} & w_{22} & ... & w_{2u_{2}} \\ ...&...&&...\\w_{u_{1}1}&w_{u_{1}2}&...&w_{u_1u_2} \end{bmatrix}_{units1*units2}$$
  $w[i][j][k]$ is the weight between the $j$-th neuron of layer $i$ and the $k$-th neuron of layer $i+1$.
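To make these layouts concrete, here is what the shapes look like for the 2-5-5-3 network trained at the end of this post (my own illustration, mirroring the initialization in the full code below; layer_sizes is an illustrative name):

import numpy as np

layer_sizes = [2, 5, 5, 3]   # input dim, two hidden layers, output dim
w = {i: np.random.rand(layer_sizes[i], layer_sizes[i+1]) for i in range(len(layer_sizes)-1)}
b = {i: np.zeros((1, layer_sizes[i+1])) for i in range(len(layer_sizes)-1)}
print([w[i].shape for i in w])   # [(2, 5), (5, 5), (5, 3)]
print([b[i].shape for i in b])   # [(1, 5), (1, 5), (1, 3)]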
2. Forward Propagation
This part is straightforward: each layer's values equal the previous layer's values matrix-multiplied by the weights, plus the bias. Take a small three-layer network (input layer + hidden layer + output layer, pictured in the original post) as an example.
$$z=\begin{bmatrix} x^1_1 \\ x^2_1 \\ ... \\ x^s_1\end{bmatrix}*w_1+\begin{bmatrix} x^1_2 \\ x^2_2 \\ ... \\ x^s_2\end{bmatrix}*w_2+b$$
In matrix form:

$$\begin{bmatrix} z^1 \\ z^2 \\ ... \\ z^s\end{bmatrix}_{size*1}=\begin{bmatrix} x^1_1(a^1_1) & x^1_2(a^1_2) \\x^2_1(a^2_1) & x^2_2(a^2_2) \\... & ...\\ x^s_1(a^s_1) & x^s_2(a^s_2) \end{bmatrix}_{size*2}\begin{bmatrix} w_1 \\ w_2\end{bmatrix}+b$$
Generalizing the formula to every layer:

$$z_{i+1}=a_i*w_i+b_i,\quad a_{i+1}=\sigma(z_{i+1})$$

- $i$: the layer index
- $a_{i+1}$: the neuron values of layer $i+1$
- $w_i$: the weights between layers $i$ and $i+1$
def ForwardPropagation(x_train, num_layers):   # run the network forward
    '''
    @param num_layers: number of network layers
    '''
    z = {}   # pre-activation neurons  [layer][sample, neuron]
    a = {}   # post-activation neurons [layer][sample, neuron]
    a[0] = x_train
    for i in range(num_layers-2):   # hidden layers (uses the global w and b)
        z[i+1] = a[i]@w[i]+b[i]
        a[i+1] = sigmoid(z[i+1])
    # softmax output layer
    z[num_layers-1] = a[num_layers-2]@w[num_layers-2]+b[num_layers-2]
    a[num_layers-1] = softmax(z[num_layers-1])
    return z,a
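Tracing a forward pass for the 2-5-5-3 example (my own sketch; it assumes the standalone sigmoid/softmax defined earlier and the w/b dictionaries from the shape sketch in section 1):

x = np.random.rand(100, 2)      # 100 samples, 2 features
z, a = ForwardPropagation(x, num_layers=4)
for k in sorted(a):
    print(k, a[k].shape)        # 0 (100, 2), 1 (100, 5), 2 (100, 5), 3 (100, 3)
print(a[3].sum(axis=1)[:3])     # softmax rows sum to 1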
3. Backpropagation
Still working from the same figure as above.
$$\frac{dl}{dz}=\frac{dl}{da}\frac{da}{dz}=(w_3*dz'+w_4*dz'')*\frac{da}{dz}$$
In matrix form:

$$\begin{bmatrix}dz^1\\dz^2\\...\\ dz^s\end{bmatrix}_{size*1}=\begin{bmatrix}dz'^1 & dz''^1\\dz'^2 & dz''^2\\... & ...\\ dz'^s & dz''^s\end{bmatrix}_{size*2}\begin{bmatrix}w_3 & w_4\end{bmatrix}.T*\begin{bmatrix}\frac{da}{dz}^1\\\frac{da}{dz}^2\\...\\ \frac{da}{dz}^s\end{bmatrix}_{size*1}$$
This is equivalent to:

$$dz_{2}=dz_{3}*w_2.T*\frac{da_2}{dz_2}$$
Extended to every layer:

$$dz_i=dz_{i+1}*w_i.T*\frac{da_i}{dz_i}$$
def BackPropagation(z, a, y_train):   # BP: partial derivatives of the loss w.r.t. z
    dz = {}   # dl/dz [layer][sample, neuron]
    # softmax layer (dividing by the batch size averages the gradient)
    dz[num_layers-1] = softmax(a=a[num_layers-1], y=y_train, driv=True)/y_train.shape[0]
    # hidden layers
    for i in range(num_layers-2,0,-1):
        dz[i] = dz[i+1]@w[i].T*sigmoid(z[i],driv=True)
    return dz
4. Computing $dw$ and $db$
Each training sample $x$ has to be handled separately, and the results are averaged at the end. Referring to the same figure again:
$$dw_3=\frac{dl}{dw_3}=mean(\begin{bmatrix}dw^1_3\\dw^2_3\\...\\ dw^s_3\end{bmatrix}_{size*1})=a*\begin{bmatrix}dz'^1\\dz'^2\\...\\ dz'^s\end{bmatrix}_{size*1}$$
In matrix form:

$$a[i] = \begin{bmatrix} a_1^1 & a_{2}^1 & ... & a_{u1}^1 \\ a_{1}^2 & a_{2}^2 & ... & a_{u1}^2 \\ ...&...&&...\\a_{1}^s&a_{2}^s&...&a_{u1}^s \end{bmatrix}_{size*units1},\quad a[i][j]=\begin{bmatrix} a^j_1 &a^j_2&...&a^j_{u1}\end{bmatrix}$$
$$dz[i]=\begin{bmatrix} dz_{1}^1 & dz^1_{2} & ... & dz^1_{u2} \\ dz^2_{1} & dz^2_{2} & ... & dz^2_{u2} \\ ...&...&&...\\dz^s_{1}&dz^s_{2}&...&dz^s_{u2} \end{bmatrix}_{size*units2},\quad dz[i][j]=\begin{bmatrix} dz^j_1 &dz^j_2&...&dz^j_{u2}\end{bmatrix}$$
For a single sample $x$, the matrix product of the previous layer's $a$ (as a column) and the next layer's $dz$ (as a row):

$$a[i][j].T*dz[i+1][j]=\begin{bmatrix} a^j_1 \\a^j_2\\...\\a^j_{u1}\end{bmatrix}\begin{bmatrix} dz_{1}^j & dz^j_{2} & ... & dz^j_{u2} \end{bmatrix}$$

yields a $units1*units2$ matrix (matching the shape of $w[i]$). Repeat this $s$ times and take the average.
def cal_driv(dz,a):   # compute dw and db
    dw = 0
    db = np.mean(dz,axis=0)   # average dz over the batch
    for i in range(dz.shape[0]):
        dw = dw+a[[i]].T@dz[[i]]   # a[[i]] keeps a 2-D row, so this is an outer product
    dw = dw/a.shape[0]   # average over the batch
    return dw,db
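Since the loop just averages the per-sample outer products $a[j].T*dz[j]$, the same result comes out of a single matrix product (my equivalent sketch, not the original code):

def cal_driv_vectorized(dz, a):
    db = np.mean(dz, axis=0)
    dw = a.T @ dz / a.shape[0]   # a.T @ dz sums the outer products over the batch
    return dw, db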
5. Gradient Descent
Nothing much to say here:

$$w = w-dw*learning\_rate,\quad b = b-db*learning\_rate$$
def gradient_decent(x_train, y_train):   # one gradient-descent step
    size = x_train.shape[0]
    z,a = ForwardPropagation(x_train)    # one forward pass
    dz = BackPropagation(z,a,y_train)
    # gradient-descent update of w and b
    for i in range(self.num_layers-1):
        dw,db = cal_driv(dz[i+1],a[i])
        self.w[i] = self.w[i] - dw*learning_rate
        self.b[i] = self.b[i] - db*learning_rate
6. Full Code
import numpy as np
class BP_network():
    def __init__(self,input_dim, hidden_layer, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.w = {}                            # weights stored in a dict
        self.b = {}                            # biases stored in a dict
        self.num_layers = len(hidden_layer)+2  # number of network layers
        hidden_layer.insert(0,input_dim)
        hidden_layer.append(output_dim)
        for i in range(len(hidden_layer)-1):
            self.w[i] = np.random.rand(hidden_layer[i],hidden_layer[i+1])
            self.b[i] = np.zeros((1,hidden_layer[i+1]))
    def sigmoid(self, z, driv=False):          # driv: return the derivative instead
        if driv:                               # derivative (takes z, returns da/dz)
            return self.sigmoid(z)*(1-self.sigmoid(z))
        return 1/(1+np.exp(-z))
    def softmax(self, z=None, a=None, y=None, driv=False):   # softmax layer
        if driv:                               # derivative of the loss w.r.t. z
            return a-y
        size = z.shape[0]
        exp_out = np.exp(z)                    # e^z, element-wise
        exp_sum = np.sum(exp_out,axis=1).reshape(size,1)     # row-wise sum of e^z
        return exp_out/exp_sum
    def fit(self, x_train, y_train, batch_size, learning_rate=0.01, epochs=20):
        def Loss(output, y_train):             # cross entropy
            return -np.sum(y_train*np.log(output))/output.shape[0]
        def ForwardPropagation(x_train):       # run the network forward
            z = {}                             # pre-activation neurons  [layer][sample, neuron]
            a = {}                             # post-activation neurons [layer][sample, neuron]
            a[0] = x_train
            for i in range(self.num_layers-2):
                z[i+1] = a[i]@self.w[i]+self.b[i]
                a[i+1] = self.sigmoid(z[i+1])
            # softmax output layer
            z[self.num_layers-1] = a[self.num_layers-2]@self.w[self.num_layers-2]+self.b[self.num_layers-2]
            a[self.num_layers-1] = self.softmax(z[self.num_layers-1])
            return z,a
        def BackPropagation(z, a, y_train):    # BP: partial derivatives of the loss w.r.t. z
            dz = {}                            # dl/dz [layer][sample, neuron]
            # softmax output layer
            dz[self.num_layers-1] = self.softmax(a=a[self.num_layers-1],y=y_train,driv=True)/y_train.shape[0]
            # hidden layers
            for i in range(self.num_layers-2,0,-1):
                dz[i] = dz[i+1]@self.w[i].T*self.sigmoid(z[i],driv=True)
            return dz
        def cal_driv(dz,a):                    # compute dw and db
            dw = 0
            db = np.mean(dz,axis=0)
            for i in range(dz.shape[0]):       # per-sample outer products, then average
                dw = dw+a[[i]].T@dz[[i]]
            dw = dw/a.shape[0]
            return dw,db
        def gradient_decent(x_train, y_train): # one gradient-descent step
            size = x_train.shape[0]
            z,a = ForwardPropagation(x_train)  # one forward pass
            dz = BackPropagation(z,a,y_train)
            # gradient-descent update of w and b
            for i in range(self.num_layers-1):
                dw,db = cal_driv(dz[i+1],a[i])
                self.w[i] = self.w[i] - dw*learning_rate
                self.b[i] = self.b[i] - db*learning_rate
        size = x_train.shape[0]                # number of training samples
        for i in range(epochs):
            for j in range(0,size,batch_size): # mini-batch updates
                gradient_decent(x_train[j:j+batch_size],y_train[j:j+batch_size])
            z,a = ForwardPropagation(x_train)
            loss = Loss(a[self.num_layers-1], y_train)   # loss over the full set
            accuracy = self.accuracy(self.predict(x_train),y_train)
            print('Epoch %d: Loss=%f, accuracy=%f' %(i+1,loss,accuracy))
    def predict(self, x_train):
        z = {}                                 # pre-activation neurons  [layer][sample, neuron]
        a = {}                                 # post-activation neurons [layer][sample, neuron]
        a[0] = x_train
        for i in range(self.num_layers-2):
            z[i+1] = a[i]@self.w[i]+self.b[i]
            a[i+1] = self.sigmoid(z[i+1])
        # softmax output layer
        z[self.num_layers-1] = a[self.num_layers-2]@self.w[self.num_layers-2]+self.b[self.num_layers-2]
        a[self.num_layers-1] = self.softmax(z[self.num_layers-1])
        # one-hot encode the argmax of each output row
        y = np.zeros(a[self.num_layers-1].shape)
        y[a[self.num_layers-1]-a[self.num_layers-1].max(axis=1).reshape(a[self.num_layers-1].shape[0],1)>=0]=1
        return y
    def accuracy(self,y,y_true):               # fraction of matching one-hot predictions
        return np.sum(np.where(y==1)[1]==np.where(y_true==1)[1])/y.shape[0]
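As an optional correctness check (my own sketch, not part of the original post), any single weight's gradient can be estimated with a centered finite difference of the loss and compared against the dw computed inside fit; the helper names here are illustrative:

def full_loss(net, x, y):   # cross-entropy loss of the whole network (mirrors predict)
    act = x
    for k in range(net.num_layers - 2):
        act = net.sigmoid(act @ net.w[k] + net.b[k])
    out = net.softmax(act @ net.w[net.num_layers - 2] + net.b[net.num_layers - 2])
    return -np.sum(y * np.log(out)) / out.shape[0]

def numeric_dw(net, x, y, layer, i, j, eps=1e-5):
    net.w[layer][i, j] += eps
    l_plus = full_loss(net, x, y)
    net.w[layer][i, j] -= 2 * eps
    l_minus = full_loss(net, x, y)
    net.w[layer][i, j] += eps              # restore the original weight
    return (l_plus - l_minus) / (2 * eps)  # should match dw[layer][i, j]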
7. Testing the Code
Prepare a dataset (a simple three-class dataset):
x_train = np.random.randint(0,100,size=(100,2))
temp = (2*x_train[:,0]+x_train[:,1])          # class depends on 2*x0 + x1
y_train = np.zeros((100,3))
y_train[temp>150,0] = 1                       # class 0: temp > 150
y_train[temp<50,1] = 1                        # class 1: temp < 50
y_train[((temp<=150) & (temp>=50)),2] = 1     # class 2: everything in between
x_train = (x_train-x_train.min(axis=0))/(x_train.max(axis=0)-x_train.min(axis=0))   # min-max normalize
Train the network:
bp = BP_network(input_dim=2, hidden_layer=[5,5], output_dim=3)
bp.fit(x_train,y_train,batch_size=10,learning_rate=0.1,epochs=2000)
Results:
Epoch 1: Loss=1.053458, accuracy=0.390000
Epoch 2: Loss=1.005622, accuracy=0.390000
Epoch 3: Loss=0.968565, accuracy=0.390000
Epoch 4: Loss=0.939970, accuracy=0.390000
Epoch 5: Loss=0.917880, accuracy=0.500000
Epoch 6: Loss=0.900729, accuracy=0.570000
Epoch 7: Loss=0.887299, accuracy=0.570000
Epoch 8: Loss=0.876668, accuracy=0.570000
Epoch 9: Loss=0.868147, accuracy=0.570000
Epoch 10: Loss=0.861226, accuracy=0.570000
...
Epoch 1997: Loss=0.561577, accuracy=0.830000
Epoch 1998: Loss=0.561012, accuracy=0.830000
Epoch 1999: Loss=0.560447, accuracy=0.830000
Epoch 2000: Loss=0.559881, accuracy=0.840000
The code runs correctly.
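As a further sanity check (my addition, not in the original post), a held-out set can be generated the same way and scored with the trained network:

x_test = np.random.randint(0,100,size=(50,2))
temp = 2*x_test[:,0]+x_test[:,1]
y_test = np.zeros((50,3))
y_test[temp>150,0] = 1
y_test[temp<50,1] = 1
y_test[((temp<=150) & (temp>=50)),2] = 1
# note: strictly, the training set's min/max should be reused for normalization
x_test = (x_test-x_test.min(axis=0))/(x_test.max(axis=0)-x_test.min(axis=0))
print(bp.accuracy(bp.predict(x_test), y_test))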