Machine Learning Notes ---- Neural Networks

Neural Network

1. Model Summary

At a very simple level, neurons are basically computational units that take inputs (dendrites) as electrical signals (called "spikes") that are channeled to outputs (axons). In our model, the dendrites are like the input features $x_1, \cdots, x_n$, and the output is the result of our hypothesis function. In this model the $x_0 = 1$ input node is sometimes called the "bias unit"; it is always equal to 1. In neural networks, we use the same logistic function as in classification, $\frac{1}{1 + e^{-\theta^T x}}$, yet we sometimes call it a sigmoid (logistic) activation function. In this situation, our "theta" parameters are sometimes called "weights".
Visually, a simplistic representation looks like:

$$\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} \rightarrow [\qquad] \rightarrow h_\theta(x)$$
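As a quick illustration of the single-neuron hypothesis above, here is a minimal sketch; the feature values and weights are made up:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) activation: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.5, -1.2])       # x_0 = 1 is the bias unit
theta = np.array([0.1, 0.8, -0.3])   # hypothetical weights
print(sigmoid(theta @ x))            # h_theta(x) = g(theta^T x)
```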

Three layers: input layer / hidden layer / output layer
$a_i^{(j)}$ : activation of unit $i$ in layer $j$
$\Theta^{(j)}$ : matrix of weights controlling the function mapping from layer $j$ to layer $j+1$
If layer $j$ has $s_j$ units and layer $j+1$ has $s_{j+1}$ units, then $\Theta^{(j)}$ has size $s_{j+1} \times (s_j + 1)$
$L$ : number of layers
$s_l$ : number of units in layer $l$
Number of inputs: the dimension of the features $x^{(i)}$
Binary classification: 1 output unit
K-class classification: K output units
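To make the sizes of the $\Theta^{(j)}$ matrices concrete, here is a small sketch; the layer sizes are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical layer sizes: 3 inputs, 5 hidden units, 4 outputs.
s = [3, 5, 4]

# Theta^(j) maps layer j (plus its bias unit) to layer j+1,
# so its shape is s_{j+1} x (s_j + 1).
thetas = [np.zeros((s[j + 1], s[j] + 1)) for j in range(len(s) - 1)]
for j, Theta in enumerate(thetas, start=1):
    print(f"Theta^({j}) shape: {Theta.shape}")
# Theta^(1) shape: (5, 4)
# Theta^(2) shape: (4, 6)
```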

2. Forward Propagation

  1. Add the bias unit $a_0^{(j)} = 1$ first
  2. $z^{(j+1)} = \Theta^{(j)} a^{(j)}$
  3. $a^{(j+1)} = g(z^{(j+1)})$, where $g$ is the sigmoid function (a vectorized sketch follows this list)
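A minimal vectorized sketch of these three steps, assuming sigmoid activations and a list of weight matrices `thetas`; the function and variable names are mine, not from the original notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Run forward propagation through every layer.

    x      : input feature vector (without the bias unit)
    thetas : weight matrices, Theta^(j) of shape s_{j+1} x (s_j + 1)
    Returns the output activation a^(L) = h_Theta(x).
    """
    a = x
    for Theta in thetas:
        a = np.concatenate(([1.0], a))  # step 1: add the bias unit a_0 = 1
        z = Theta @ a                   # step 2: z^(j+1) = Theta^(j) a^(j)
        a = sigmoid(z)                  # step 3: a^(j+1) = g(z^(j+1))
    return a
```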

3. Cost Function

For a network with $K$ output units, the cost function generalizes the regularized logistic-regression cost:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log \left( h_\Theta(x^{(i)}) \right)_k + \left( 1 - y_k^{(i)} \right) \log \left( 1 - \left( h_\Theta(x^{(i)}) \right)_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{j,i}^{(l)} \right)^2$$

The regularization term excludes the bias weights $\Theta_{j,0}^{(l)}$: the inner sum over $i$ starts at 1.
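A sketch of this cost in code, assuming the `forward_propagate` function from the previous section and one-hot label rows in `Y` (the names and layout are mine, for illustration):

```python
import numpy as np

def nn_cost(thetas, X, Y, lam):
    """Regularized cross-entropy cost J(Theta).

    X : m x n matrix of training inputs (no bias column)
    Y : m x K matrix of one-hot labels
    """
    m = X.shape[0]
    J = 0.0
    for x, y in zip(X, Y):
        h = forward_propagate(x, thetas)               # K output activations
        J -= y @ np.log(h) + (1 - y) @ np.log(1 - h)
    J /= m
    # Regularization skips column 0, the bias weights Theta_{j,0}.
    J += lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in thetas)
    return J
```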

4. Backpropagation Algorithm

Let $\delta_j^{(l)}$ denote the error of node $j$ in layer $l$. Then

$$\delta^{(L)} = a^{(L)} - y$$
$$\delta^{(i)} = (\Theta^{(i)})^T \delta^{(i+1)} \mathbin{.*} g'(z^{(i)}) \qquad (i \neq L,\ i \neq 1)$$

where $.*$ denotes the element-wise product and $g'(z^{(i)}) = a^{(i)} \mathbin{.*} (1 - a^{(i)})$.
One thing to note: forward and backward propagation are run on one training example at a time, accumulating the gradient over the whole training set!
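A sketch of one backward pass for a single training example, under the same assumptions as above (sigmoid activations; the variable names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backpropagate(x, y, thetas):
    """One forward + backward pass for a single example (x, y).

    Returns the list of gradient matrices, one per Theta^(l).
    """
    # Forward pass, keeping each layer's activation (with its bias unit).
    activations = []
    a = x
    for Theta in thetas:
        a = np.concatenate(([1.0], a))   # add the bias unit a_0 = 1
        activations.append(a)
        a = sigmoid(Theta @ a)
    delta = a - y                        # delta^(L) = a^(L) - y
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        grads[l] = np.outer(delta, activations[l])
        if l > 0:
            a_l = activations[l]
            # delta^(l) = (Theta^(l))^T delta^(l+1) .* a^(l) .* (1 - a^(l))
            delta = (thetas[l].T @ delta) * a_l * (1 - a_l)
            delta = delta[1:]            # drop the bias component
    return grads
```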

5. Unrolling Parameters

Unroll the matrices into a single vector, and reshape to get them back:
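A minimal NumPy sketch of the idea; the matrix shapes here are examples only:

```python
import numpy as np

Theta1 = np.random.rand(10, 11)   # example shapes only
Theta2 = np.random.rand(1, 11)

# Unroll all parameter matrices into one long vector (useful for
# optimizers that expect a flat parameter vector).
theta_vec = np.concatenate([Theta1.ravel(), Theta2.ravel()])

# Get the matrices back by reshaping slices of the vector.
Theta1_back = theta_vec[:110].reshape(10, 11)
Theta2_back = theta_vec[110:].reshape(1, 11)
```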

6. Gradient Checking

Approximate each partial derivative numerically with a two-sided difference and compare it against the gradient from backpropagation (here $e_i$ is the $i$-th standard basis vector):

$$\frac{\partial}{\partial \theta_i} J(\theta) \approx \frac{J(\theta + \epsilon\, e_i) - J(\theta - \epsilon\, e_i)}{2\epsilon}, \qquad \epsilon \approx 10^{-4}$$

When actually training, turn off gradient checking, because it is very slow!!!
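A sketch of this check, assuming a `cost` function that takes the unrolled parameter vector from section 5 (the names are mine):

```python
import numpy as np

def numerical_gradient(cost, theta_vec, eps=1e-4):
    """Two-sided finite-difference approximation of the gradient."""
    grad = np.zeros_like(theta_vec)
    for i in range(theta_vec.size):
        e = np.zeros_like(theta_vec)
        e[i] = eps
        grad[i] = (cost(theta_vec + e) - cost(theta_vec - e)) / (2 * eps)
    return grad

# Usage: compare against the backpropagation gradient, then disable.
# assert np.allclose(numerical_gradient(cost, theta_vec), backprop_grad)
```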

7. Random Initialization

Initializing all weights to zero makes every hidden unit in a layer compute the same function (symmetry). Instead, initialize each $\Theta_{ij}^{(l)}$ to a random value in $[-\epsilon_{\text{init}}, \epsilon_{\text{init}}]$.
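A NumPy sketch of this initialization; the matrix shape and `EPS_INIT` value are illustrative:

```python
import numpy as np

EPS_INIT = 0.12  # illustrative value

# Random values in [-EPS_INIT, EPS_INIT] break the symmetry that an
# all-zeros initialization would leave in place.
Theta1 = np.random.rand(10, 11) * (2 * EPS_INIT) - EPS_INIT
```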

8. Network Architecture

A reasonable default is a single hidden layer;
if you use more than one hidden layer, give each hidden layer the same number of units.
