Neural Networks

wf4csdn

Article link: https://blog.csdn.net/wf4csdn/article/details/76206554

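1. Hypothesis function

1. Binary classification:

Take the following neural network as an example:

$\begin{bmatrix}x_0 \newline x_1 \newline x_2 \newline x_3\end{bmatrix}\rightarrow\begin{bmatrix}a_1^{(2)} \newline a_2^{(2)} \newline a_3^{(2)}\end{bmatrix}\rightarrow h_\Theta(x)$

Its hypothesis function is as follows, where $g$ is the activation function and $x_0$ and $a_0^{(2)}$ are bias units fixed at 1: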

$\begin{align*} a_1^{(2)} = g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3) \newline a_2^{(2)} = g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3) \newline a_3^{(2)} = g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3) \newline h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)}) \newline \end{align*}$
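The four equations above can be sketched in a few lines of plain Python. This is only an illustration: it assumes $g$ is the logistic sigmoid (consistent with the $a^{(l)}\ .*\ (1 - a^{(l)})$ factor used in the backpropagation section), and the weight values and the helper names `sigmoid` and `layer` are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)) (assumed activation)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(theta, a_prev):
    """Compute one layer: prepend the bias unit (value 1), then apply g to each row's weighted sum."""
    a_with_bias = [1.0] + a_prev
    return [sigmoid(sum(w * a for w, a in zip(row, a_with_bias))) for row in theta]

# Hypothetical weights, chosen only for illustration.
# Row i of theta1 holds Theta^(1)_{i0}, ..., Theta^(1)_{i3}.
theta1 = [[0.1, 0.2, -0.3, 0.4],
          [-0.5, 0.1, 0.2, 0.0],
          [0.3, -0.2, 0.1, 0.1]]
theta2 = [[0.2, -0.4, 0.6, 0.1]]   # Theta^(2)_{10}, ..., Theta^(2)_{13}

x = [1.0, 2.0, 3.0]                # x_1, x_2, x_3; x_0 = 1 is added inside layer()
a2 = layer(theta1, x)              # a_1^(2), a_2^(2), a_3^(2)
h = layer(theta2, a2)[0]           # h_Theta(x) = a_1^(3)
print(a2, h)
```

With a sigmoid output, `h` always lies strictly between 0 and 1, so it can be read as the predicted probability of the positive class.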

2. Multiclass classification:

For the following neural network:

$\begin{bmatrix}x_0 \newline x_1 \newline x_2 \newline \vdots \newline x_n\end{bmatrix}\rightarrow\begin{bmatrix}a_0^{(2)} \newline a_1^{(2)} \newline a_2^{(2)} \newline \vdots \end{bmatrix}\rightarrow \begin{bmatrix}a_0^{(3)} \newline a_1^{(3)} \newline a_2^{(3)} \newline \vdots \end{bmatrix}\rightarrow \dots \rightarrow \begin{bmatrix}h_\Theta(x)_1 \newline h_\Theta(x)_2 \newline h_\Theta(x)_3 \newline h_\Theta(x)_4 \end{bmatrix}$

we have, for the $i$-th training example:

$h_\Theta(x^{(i)})=\begin{bmatrix}(h_\Theta(x^{(i)}))_1 \newline (h_\Theta(x^{(i)}))_2 \newline (h_\Theta(x^{(i)}))_3 \newline (h_\Theta(x^{(i)}))_4 \end{bmatrix},\quad y^{(i)}\in\left\{\begin{bmatrix}1 \newline 0 \newline 0 \newline 0\end{bmatrix},\begin{bmatrix}0 \newline 1 \newline 0 \newline 0\end{bmatrix},\begin{bmatrix}0 \newline 0 \newline 1 \newline 0\end{bmatrix},\begin{bmatrix}0 \newline 0 \newline 0 \newline 1\end{bmatrix}\right\}$

Here $h_\Theta(x^{(i)})$ is computed layer by layer, in the same way as in the binary case.
This computation is called forward propagation.
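As a rough sketch of forward propagation for the multiclass case, the loop below applies the same per-layer rule to a list of weight matrices and then turns the largest output into a one-hot label. The sigmoid activation, the network sizes, the weight values, and the helper names `forward` and `one_hot` are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Logistic function (assumed activation g)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(thetas, x):
    """Return the activations [a^(1), ..., a^(L)], each stored without its bias unit."""
    activations = [list(x)]
    for theta in thetas:
        a_with_bias = [1.0] + activations[-1]
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(row, a_with_bias))) for row in theta]
        )
    return activations

def one_hot(k, num_classes):
    """Encode class index k (0-based) as a one-hot list."""
    return [1 if i == k else 0 for i in range(num_classes)]

# Hypothetical network: 2 inputs -> 3 hidden units -> 4 output classes.
thetas = [
    [[0.1, 0.3, -0.2], [0.0, -0.1, 0.4], [0.2, 0.2, 0.2]],
    [[0.1, 0.5, -0.5, 0.2], [-0.3, 0.1, 0.1, 0.4],
     [0.0, -0.2, 0.3, 0.1], [0.2, 0.2, -0.1, -0.4]],
]

h = forward(thetas, [0.5, -1.0])[-1]               # output layer, one value per class
predicted = max(range(len(h)), key=lambda k: h[k])  # index of the largest output
print(h, one_hot(predicted, len(h)))
```

Each output unit is an independent sigmoid here, so the four values need not sum to 1; the prediction is simply the class whose unit is largest.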

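2. Cost function

1. Definition

For a multiclass classification neural network, define:
$L$ = total number of layers in the network
$s_l$ = number of units in layer $l$ (not counting bias units)
$K$ = number of output classes

The cost function, including the regularization term, is then: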
$\begin{gather*} J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2\end{gather*}$
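The formula can be checked numerically with a small sketch. The function below takes the network outputs as given rather than computing them, and skips column 0 of every weight matrix so that bias weights are not regularized, matching the sums that start at $i=1$. The function name `cost`, the argument layout, and the sample numbers are assumptions made for illustration.

```python
import math

def cost(H, Y, thetas, lam):
    """Regularized cross-entropy cost J(Theta).

    H[i][k]      -- (h_Theta(x^(i)))_k, the k-th network output for example i
    Y[i][k]      -- y^(i)_k, the one-hot label
    thetas[l][j] -- row j of a weight matrix; column 0 holds the bias weight
    lam          -- regularization strength lambda
    """
    m = len(Y)
    data_term = 0.0
    for h, y in zip(H, Y):
        for hk, yk in zip(h, y):
            data_term += yk * math.log(hk) + (1 - yk) * math.log(1 - hk)
    # Sum of squared weights, excluding the bias column (index 0).
    reg = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return -data_term / m + lam / (2 * m) * reg

# Hypothetical values: one example, two classes, a single 2x2 weight matrix.
H = [[0.5, 0.5]]
Y = [[1, 0]]
thetas = [[[1.0, 2.0], [3.0, 4.0]]]
J = cost(H, Y, thetas, lam=1.0)
print(J)
```

In this example the data term contributes $2\log 2$ and the regularization term contributes $\frac{1}{2}(2^2 + 4^2) = 10$.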

To find the $\Theta$ that minimizes the cost function, we need the partial derivatives $\dfrac{\partial}{\partial \Theta_{i,j}^{(l)}}J(\Theta)$ .
These can be computed with the backpropagation algorithm:

2. Backpropagation algorithm

Given the training set $\lbrace (x^{(1)}, y^{(1)}) \cdots (x^{(m)}, y^{(m)})\rbrace$ ,
initialize $\Delta^{(l)}_{i,j}:=0$ for all $(l,i,j)$ .

For each training example, run the following loop ( $t=1:m$ ):
1. Set $a^{(1)} := x^{(t)}$
2. Use forward propagation to compute $a^{(l)}$ for $l=2,3,\dots,L$
3. Using $y^{(t)}$ , compute $\delta^{(L)} = a^{(L)} - y^{(t)}$
4. Use the formula $\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)})\ .*\ a^{(l)}\ .*\ (1 - a^{(l)})$ to obtain $\delta^{(L-1)}, \delta^{(L-2)},\dots,\delta^{(2)}$
5. $\Delta^{(l)}_{i,j} := \Delta^{(l)}_{i,j} + a_j^{(l)} \delta_i^{(l+1)}$
End of loop.

From the accumulated values, compute:
$D^{(l)}_{i,j} := \dfrac{1}{m}\left(\Delta^{(l)}_{i,j} + \lambda\Theta^{(l)}_{i,j}\right),\ \text{if }j\ne0$
$D^{(l)}_{i,j} := \dfrac{1}{m}\Delta^{(l)}_{i,j},\ \text{if }j=0$

Finally, we have:
$\dfrac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta)=D_{i,j}^{(l)}$
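The whole procedure, from the zero-initialized accumulators to the final $D^{(l)}_{i,j}$ , can be sketched in plain Python as below. It is a minimal illustration and not an optimized implementation: it assumes a sigmoid activation, uses 0-based list indices (so `thetas[l]` stands for $\Theta^{(l+1)}$ ), and the names `forward` and `backprop` as well as the sample data are hypothetical. In practice this would usually be written with a vectorized array library.

```python
import math

def sigmoid(z):
    """Logistic function (assumed activation g)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(thetas, x):
    """Return activations [a^(1), ..., a^(L)], each stored without its bias unit."""
    acts = [list(x)]
    for theta in thetas:
        a_with_bias = [1.0] + acts[-1]
        acts.append([sigmoid(sum(w * a for w, a in zip(row, a_with_bias))) for row in theta])
    return acts

def backprop(thetas, X, Y, lam):
    """Return D, where D[l][i][j] is the partial derivative of J w.r.t. thetas[l][i][j]."""
    m = len(X)
    # Accumulators Delta, all zeros, with the same shapes as the weight matrices.
    Delta = [[[0.0] * len(row) for row in theta] for theta in thetas]
    for x, y in zip(X, Y):
        acts = forward(thetas, x)                            # steps 1 and 2
        delta = [a - yk for a, yk in zip(acts[-1], y)]       # step 3: output-layer error
        for l in range(len(thetas) - 1, -1, -1):
            a_with_bias = [1.0] + acts[l]
            for i, d in enumerate(delta):                    # step 5: accumulate a_j * delta_i
                for j, a in enumerate(a_with_bias):
                    Delta[l][i][j] += a * d
            if l > 0:                                        # step 4: propagate the error back
                # Column 0 (bias weights) is skipped: bias units receive no error term.
                delta = [
                    sum(thetas[l][i][j + 1] * delta[i] for i in range(len(delta))) * a * (1 - a)
                    for j, a in enumerate(acts[l])
                ]
    # Average, and add the regularization term for every non-bias weight (j != 0).
    return [
        [[(Delta[l][i][j] + (lam * thetas[l][i][j] if j != 0 else 0.0)) / m
          for j in range(len(thetas[l][i]))]
         for i in range(len(thetas[l]))]
        for l in range(len(thetas))
    ]

# Hypothetical network (2 inputs -> 2 hidden units -> 2 classes) and data.
thetas = [
    [[0.1, 0.3, -0.2], [0.0, -0.1, 0.4]],
    [[0.2, -0.4, 0.6], [-0.3, 0.1, 0.5]],
]
X = [[0.5, -1.0], [1.5, 0.3]]
Y = [[1, 0], [0, 1]]
D = backprop(thetas, X, Y, lam=0.7)
print(D)
```

The returned `D` has the same shape as `thetas`, so it can be passed directly to a gradient-based optimizer; comparing it against finite differences of the cost is a common way to check such code.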
