[Study Notes] [Coursera] [Machine Learning] Neural Networks

维恩囧尼

Article link: https://blog.csdn.net/q449560173/article/details/53141066

Course URL: https://www.coursera.org/learn/machine-learning/home/week/4

Representation

Use case

Neural networks handle non-linear classification (non-linear hypotheses) when the number of features runs into the hundreds of thousands.
In these notes they are applied to classification problems.

Model Representation

1. Neuron model: Logistic unit (no hidden layer)
  - input vector: $x= \begin{bmatrix} x_0\\x_1\\x_2\\x_3 \end{bmatrix}$ weights/parameters: $\theta= \begin{bmatrix} \theta_0\\ \theta_1\\ \theta_2\\ \theta_3 \end{bmatrix}$
  - bias unit: $x_0=1$
  - $h_\theta(x)=\frac{1}{1+e^{-z}},\ z=\theta^Tx$ : sigmoid (logistic) activation function
2. Neural Network (input layer 1; hidden layer 2; output layer 3)
  - $a_i^{(l)}$ = “activation” of unit $i$ in layer $l$
  - $L$ = total number of layers in the network
  - $s_l$ = number of units (not counting the bias unit) in layer $l$
  - bias units: $x_0=1; a_0^{(2)}=1$ (not drawn in the diagram)
  - $a_1^{(2)}=g(\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3)$
  - $h_\Theta(x)=a_1^{(3)}=g(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)})$
  - $\Theta^{(l)}$ = matrix of weights controlling the function mapping from layer $l$ to layer $l+1$; its dimension is $s_{l+1} \times (s_l + 1)$
  - $e.g.\Theta^{(1)}=\begin{bmatrix} \Theta_{10}^{(1)}&\Theta_{11}^{(1)}&\Theta_{12}^{(1)}&\Theta_{13}^{(1)}\\\Theta_{20}^{(1)}&\Theta_{21}^{(1)}&\Theta_{22}^{(1)}&\Theta_{23}^{(1)}\\\Theta_{30}^{(1)}&\Theta_{31}^{(1)}&\Theta_{32}^{(1)}&\Theta_{33}^{(1)} \end{bmatrix};size=3 \times 4$
  - $\{x^{(i)}, y^{(i)} \}$ = $i^{th}$ training example (input $x^{(i)}$ and its target $y^{(i)}$)
1. Multi-class classification ($K$ classes, $K \ge 3$)
  $y \in \Bbb R^K$ , $h_\Theta(x) \in \Bbb R^K$ , $s_L = K$
  $y_k^{(i)}$ = $k^{th}$ element of the $i^{th}$ target vector
  $(h_\Theta(x^{(i)}))_k$ = $k^{th}$ element of the $i^{th}$ output vector
  e.g. $y^{(1)}= \begin{bmatrix} 1\\0\\0 \end{bmatrix},\ y^{(2)}= \begin{bmatrix} 0\\1\\0 \end{bmatrix},\ y^{(3)}= \begin{bmatrix} 0\\0\\1 \end{bmatrix}$, so $y_1^{(1)}=1$
2. Binary classification (two classes, a single output unit)
  $y \in \{0, 1\}$ , $h_\Theta(x) \in \Bbb R$ , $s_L = 1$
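
The definitions above can be made concrete with a short NumPy sketch. It is only an illustration, not code from the course: the function names `logistic_unit`, `weight_shapes` and `one_hot` are made up here, and class labels are assumed to be integers from 0 to K-1.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(theta, x):
    """Single neuron h(x) = g(theta^T x); x must already contain x_0 = 1."""
    return sigmoid(np.dot(theta, x))

def weight_shapes(layer_sizes):
    """Shape of each Theta^(l): s_{l+1} x (s_l + 1), bias column included.

    layer_sizes lists s_1 .. s_L (units per layer, bias units not counted).
    """
    return [(s_next, s_cur + 1)
            for s_cur, s_next in zip(layer_sizes[:-1], layer_sizes[1:])]

def one_hot(labels, num_classes):
    """Turn class indices 0..K-1 into K-dimensional target vectors (one per row)."""
    labels = np.asarray(labels)
    targets = np.zeros((labels.size, num_classes))
    targets[np.arange(labels.size), labels] = 1.0
    return targets
```

For the 3-3-1 network in these notes, `weight_shapes([3, 3, 1])` gives a 3 × 4 matrix for $\Theta^{(1)}$ and a 1 × 4 matrix for $\Theta^{(2)}$, matching the dimension rule $s_{l+1} \times (s_l + 1)$.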

Vectorization

$z_1^{(2)}=\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3;\ a_1^{(2)}=g(z_1^{(2)})$
$z^{(2)}=\Theta^{(1)}x;\ a^{(2)}=g(z^{(2)})$ => $a^{(2)}=\bigl( \begin{smallmatrix} a_1^{(2)}\\a_2^{(2)}\\a_3^{(2)} \end{smallmatrix} \bigr)$
Add the bias unit $a_0^{(2)}=1$ to $a^{(2)}$
$z^{(3)}=\Theta^{(2)}a^{(2)};\ h_\Theta(x)=a^{(3)}=g(z^{(3)})$
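
The vectorized steps can be sketched as a small forward-propagation loop. This is a minimal sketch under stated assumptions: the name `forward` is hypothetical, the input `x` is passed without its bias unit (the function prepends $a_0 = 1$ before each layer), and `thetas` is a list of weight matrices shaped $s_{l+1} \times (s_l + 1)$.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(thetas, x):
    """Forward propagation for one example.

    thetas: list of weight matrices [Theta^(1), Theta^(2), ...]
    x:      input features WITHOUT the bias unit (assumption of this sketch)
    Returns the output-layer activation a^(L) = h_Theta(x).
    """
    a = np.asarray(x, dtype=float)
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # add bias unit a_0 = 1
        z = theta @ a                   # z^(l+1) = Theta^(l) a^(l)
        a = sigmoid(z)                  # a^(l+1) = g(z^(l+1))
    return a
```

With all-zero weights every unit outputs $g(0) = 0.5$, which is a quick sanity check for the shapes: `forward([np.zeros((3, 4)), np.zeros((1, 4))], [1.0, 2.0, 3.0])` returns a one-element array.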

Cost Function

$J(\Theta) = -\frac 1m \left[ \sum_{i=1}^m \sum_{k=1}^K y_k^{(i)} \log(h_\Theta (x^{(i)}))_k + (1-y_k^{(i)}) \log(1-(h_\Theta (x^{(i)}))_k) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} (\Theta_{ji}^{(l)})^2$

Take one pair of corresponding elements from the output vector and the target vector ( $(h_\Theta(x^{(i)}))_k$ and $y_k^{(i)}$ ) and evaluate
$C = y_k^{(i)} \log(h_\Theta (x^{(i)}))_k + (1-y_k^{(i)}) \log(1-(h_\Theta (x^{(i)}))_k)$
Sum $C$ over every element of every output vector to obtain the unregularized cost
$J(\Theta) = -\frac 1m \sum_{i=1}^m \sum_{k=1}^K C$
Add the regularization term: the sum of the squares of all elements of the $\Theta$ matrices, multiplied by $\frac{\lambda}{2m}$, where $\lambda$ is the penalty rate ( $\Theta_{j0}$ corresponds to the bias term and is usually left out of the sum)
$+\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} (\Theta_{ji}^{(l)})^2$
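
The cost can be computed in a few lines of NumPy. The sketch below is an illustration only: the name `nn_cost` and its argument layout are assumptions, `H` is taken to hold the network outputs for all $m$ examples (one row per example, values strictly between 0 and 1), and `Y` holds the matching one-hot targets.

```python
import numpy as np

def nn_cost(H, Y, thetas, lam):
    """Regularized cross-entropy cost J(Theta).

    H:      (m, K) array of outputs h_Theta(x^(i)), each value in (0, 1)
    Y:      (m, K) array of target vectors y^(i)
    thetas: list of weight matrices, bias weights in column 0
    lam:    regularization strength lambda
    """
    m = Y.shape[0]
    # Element-wise term C for every example i and output unit k
    C = Y * np.log(H) + (1.0 - Y) * np.log(1.0 - H)
    data_term = -C.sum() / m
    # Sum of squared weights, skipping the bias column Theta_{j0}
    reg_term = lam / (2.0 * m) * sum((theta[:, 1:] ** 2).sum() for theta in thetas)
    return data_term + reg_term
```

Slicing with `theta[:, 1:]` is what keeps the bias weights out of the penalty, mirroring the sum over $i$ starting at 1 in the formula.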
