目录
1.写在前面
BP神经网络算法作为作为机器学习最基础的算法,非常适合入门。透彻掌握其原理将对于今后的机器学习有很大的帮助。
2.BP神经网络推导
2.1前向传播
前向传播过程可以表示为:
O [ l ] = σ ( w [ l ] I [ l − 1 ] + b [ l ] ) O^{[l]}=\sigma\left(w^{[l]} I^{[l-1]}+b^{[l]}\right) O[l]=σ(w[l]I[l−1]+b[l])
2.2反向传播
2.2.1求解梯度矩阵
假设函数 f : R n × 1 → R f:R^{n \times 1} \rightarrow R f:Rn×1→R 将输入的列向量(shape: n × 1 n \times 1 n×1 )映射为一个实数。那么,函数 f f f 的梯度定义为:
∇ x f ( x ) = [ ∂ f ( x ) ∂ x 1 ∂ f ( x ) ∂ x 2 ⋮ ∂ f ( x ) ∂ x n ] \nabla_{x} f(x)=\left[\begin{array}{c}\frac{\partial f(x)}{\partial x_{1}} \\ \frac{\partial f(x)}{\partial x_{2}} \\ \vdots \\ \frac{\partial f(x)}{\partial x_{n}}\end{array}\right] ∇xf(x)=⎣⎢⎢⎢⎢⎡∂x1∂f(x)∂x2∂f(x)⋮∂xn∂f(x)⎦⎥⎥⎥⎥⎤
同理,假设函数 f : R m × n → R f: R^{m \times n} \rightarrow R f:Rm×n→R 将输入的矩阵(shape: m × n m \times n m×n )映射为一个实数。函数 f f f 的梯度定义为:
∇ A f ( A ) = [ ∂ f ( A ) ∂ A 11 ∂ f ( A ) ∂ A 12 … ∂ f ( A ) ∂ A 13 ∂ f ( A ) ∂ A 21 ∂ f ( A ) ∂ A 22 … ∂ f ( A ) ∂ A 2 n ⋮ ⋮ ⋱ ⋮ ∂ f ( A ) ∂ A m 1 ∂ f ( A ) ∂ A m 2 … ∂ f ( A ) ∂ A m n ] \nabla_{A} f(A)=\left[\begin{array}{cccc}\frac{\partial f(A)}{\partial A_{11}} & \frac{\partial f(A)}{\partial A_{12}} & \dots & \frac{\partial f(A)}{\partial A_{13}} \\ \frac{\partial f(A)}{\partial A_{21}} & \frac{\partial f(A)}{\partial A_{22}} & \dots & \frac{\partial f(A)}{\partial A_{2 n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f(A)}{\partial A_{m 1}} & \frac{\partial f(A)}{\partial A_{m 2}} & \dots & \frac{\partial f(A)}{\partial A_{m n}}\end{array}\right] ∇Af(A)=⎣⎢⎢⎢⎢⎡∂A11∂f(A)∂A21∂f(A)⋮∂Am1∂f(A)∂A12∂f(A)∂A22∂f(A)⋮∂Am2∂f(A)……⋱…∂A13∂f(A)∂A2n