Machine Learning Series: Coursera Week 5, Neural Networks: Learning

Contents

1.  Cost Function and Back Propagation

1.1 Cost Function

1.2 Back propagation algorithm

1.3 Back propagation intuition

2. Back Propagation in practice

2.1 Implementation note: unrolling parameters

2.2 Gradient checking

2.3 Random Initialization

2.4 Putting it together (summary)

3. Mathematical Derivation of the Gradient


1.  Cost Function and Back Propagation

1.1 Cost Function

Neural Network (Classification)

L = total number of layers in the network

s_l = number of units (not counting the bias unit) in layer l

K = number of output units

E.g.:

L = 4

s_1 = 3, s_2 = s_3 = 5, s_4 = 4

Figure 1 (from Coursera Week 5, Cost Function)

Cost Function:
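With the notation above, the regularized cost function is the K-output generalization of the logistic regression cost: a cross-entropy term summed over all m examples and all K output units, plus a regularization term over every non-bias weight.

J(Θ) = -(1/m) Σ_{i=1..m} Σ_{k=1..K} [ y_k^(i) log( (h_Θ(x^(i)))_k ) + (1 - y_k^(i)) log( 1 - (h_Θ(x^(i)))_k ) ] + (λ / (2m)) Σ_{l=1..L-1} Σ_{i=1..s_l} Σ_{j=1..s_{l+1}} ( Θ_{ji}^(l) )^2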

Note: the weights for the bias units are not regularized.

1.2 Back propagation algorithm

Gradient computation:

Given J(Θ), we want to minimize it, so we need code to compute:

- J(Θ)

- the partial derivatives ∂J(Θ)/∂Θ_ij^(l)

Given one training example (x, y):

L = 4

s_1 = 3, s_2 = s_3 = 5, s_4 = 4

Forward propagation:
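With g the sigmoid activation and a bias unit a_0 = 1 prepended to each layer's activations, the activations are computed layer by layer:

a^(1) = x

z^(2) = Θ^(1) a^(1),  a^(2) = g(z^(2))  (add a_0^(2))

z^(3) = Θ^(2) a^(2),  a^(3) = g(z^(3))  (add a_0^(3))

z^(4) = Θ^(3) a^(3),  a^(4) = h_Θ(x) = g(z^(4))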

Back propagation algorithm:

Intuition: δ_j^(l) = "error" of node j in layer l.

For each output unit (output layer, L = 4):
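δ_j^(4) = a_j^(4) - y_j,   or in vector form δ^(4) = a^(4) - y

For the hidden layers, the errors are propagated backwards:

δ^(3) = ((Θ^(3))^T δ^(4)) .* g'(z^(3))

δ^(2) = ((Θ^(2))^T δ^(3)) .* g'(z^(2))

where g'(z^(l)) = a^(l) .* (1 - a^(l)). There is no δ^(1), since the input layer is not associated with any error.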

Algorithm:

(from Coursera Week 5, Backpropagation Algorithm)
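The algorithm sets accumulators Δ^(l) = 0, loops over the m training examples (forward propagation, then back propagation of the δ terms), accumulates Δ^(l) := Δ^(l) + δ^(l+1) (a^(l))^T, and finally computes D^(l) = (1/m) Δ^(l) (plus (λ/m) Θ^(l) for the non-bias weights). A minimal Octave sketch for the 4-layer example network, assuming sigmoid activations and that Theta1, Theta2, Theta3, an m x 3 input matrix X, and an m x 4 one-hot label matrix Y are already defined (the variable names are illustrative):

% Backpropagation sketch for the 4-layer example (assumed inputs:
% Theta1 (5x4), Theta2 (5x6), Theta3 (4x6), X (m x 3), Y (m x 4)).
sigmoid = @(z) 1 ./ (1 + exp(-z));
m = size(X, 1);
Delta1 = zeros(size(Theta1));
Delta2 = zeros(size(Theta2));
Delta3 = zeros(size(Theta3));
for i = 1:m
  % forward propagation for example i (prepend the bias unit at each layer)
  a1 = [1; X(i, :)'];
  z2 = Theta1 * a1;  a2 = [1; sigmoid(z2)];
  z3 = Theta2 * a2;  a3 = [1; sigmoid(z3)];
  z4 = Theta3 * a3;  a4 = sigmoid(z4);          % = h_Theta(x^(i))
  % back propagation of the errors
  d4 = a4 - Y(i, :)';
  d3 = (Theta3' * d4) .* (a3 .* (1 - a3));  d3 = d3(2:end);  % drop bias entry
  d2 = (Theta2' * d3) .* (a2 .* (1 - a2));  d2 = d2(2:end);
  % accumulate Delta^(l) = Delta^(l) + delta^(l+1) * (a^(l))'
  Delta3 = Delta3 + d4 * a3';
  Delta2 = Delta2 + d3 * a2';
  Delta1 = Delta1 + d2 * a1';
end
% unregularized partial derivatives; add (lambda/m)*Theta to the
% non-bias columns for the regularized version
D1 = Delta1 / m;  D2 = Delta2 / m;  D3 = Delta3 / m;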

1.3 Back propagation intuition

 

2. Back Propagation in practice

2.1 Implementation note: unrolling parameters

Unroll the weight matrices into a single vector.

E.g.

thetaVec = [Theta1(:); Theta2(:); Theta3(:)];   % unroll all weight matrices into one long vector

fminunc(@costFunction, initialTheta, options);
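Inside the cost function the vector is reshaped back into matrices. A minimal Octave sketch, assuming the weight-matrix sizes of the example network (Theta1 is 5 x 4, Theta2 is 5 x 6, Theta3 is 4 x 6) and gradient matrices D1, D2, D3 of the same sizes:

% Unroll the gradients the same way as the parameters
DVec = [D1(:); D2(:); D3(:)];
% Recover the matrices from the unrolled vector (20 + 30 + 24 = 74 elements)
Theta1 = reshape(thetaVec(1:20),  5, 4);
Theta2 = reshape(thetaVec(21:50), 5, 6);
Theta3 = reshape(thetaVec(51:74), 4, 6);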

2.2 Gradient checking

Numerical estimation of gradients: use the two-sided difference (J(θ + ε) - J(θ - ε)) / (2ε) with a small ε (e.g. ε = 10^-4).

(from Coursera Week 5, Gradient Checking)

For a parameter vector θ (the unrolled parameters):

Code:

thetaPlus = theta + EPSILON;

thetaMinus = theta - EPSILON;

gradApprox = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);

Check that gradApprox ≈ DVec (the gradient computed by back propagation).
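In practice the check is done component by component on the unrolled parameter vector. A minimal Octave sketch, assuming theta is the unrolled vector and J is a function handle that returns the cost for a given parameter vector:

% Component-wise numerical gradient (theta = unrolled parameters,
% J = handle returning the scalar cost)
EPSILON = 1e-4;
n = numel(theta);
gradApprox = zeros(n, 1);
for i = 1:n
  thetaPlus  = theta;  thetaPlus(i)  = thetaPlus(i)  + EPSILON;
  thetaMinus = theta;  thetaMinus(i) = thetaMinus(i) - EPSILON;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);
end
% gradApprox should agree with DVec from back propagation to several
% decimal places; disable this check before training (it is very slow)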

2.3 Random Initialization

Initial value of Θ

Zero initialization: after each update, the parameters corresponding to the inputs going into each of the hidden units stay identical, so every hidden unit computes the same function and the network learns nothing useful.

Random initialization (symmetry breaking): initialize each Θ_ij^(l) to a random value in [-ε, ε].

E.g. Theta1 = rand(5, 4) * (2 * INIT_EPSILON) - INIT_EPSILON;   % rand(5, 4) returns a 5 x 4 matrix of uniform values in [0, 1]
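A minimal Octave sketch for all three weight matrices of the example network (layer sizes 3, 5, 5, 4), assuming INIT_EPSILON is a small constant such as 0.12 (an illustrative choice):

% Each Theta^(l) is s_(l+1) x (s_l + 1), initialized in [-INIT_EPSILON, INIT_EPSILON]
INIT_EPSILON = 0.12;
Theta1 = rand(5, 4) * 2 * INIT_EPSILON - INIT_EPSILON;
Theta2 = rand(5, 6) * 2 * INIT_EPSILON - INIT_EPSILON;
Theta3 = rand(4, 6) * 2 * INIT_EPSILON - INIT_EPSILON;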

2.4 Putting it together (summary)

Training a neural network:

Pick a network architecture:

Number of input units: the dimension of the features x^(i).

Number of output units: the number of classes (for binary classification, a single output unit is enough).

Reasonable default: 1 hidden layer; if there is more than 1 hidden layer, use the same number of hidden units in every layer (usually, the more units the better). Note: a hidden-layer size of roughly 1x, 2x, 3x, or 4x the number of input features is acceptable.

(1) Randomly initialize the weights.

(2) Implement forward propagation to get h_Θ(x^(i)) for any x^(i).

(3) Implement code to compute the cost function J(Θ).

(4) Implement back propagation to compute the partial derivatives ∂J(Θ)/∂Θ_jk^(l).

(5) Use gradient checking to compare the partial derivatives computed by back propagation with the numerical estimates, then disable the gradient checking code.

(6) Use gradient descent or an advanced optimization method together with back propagation to minimize J(Θ) as a function of the parameters Θ (see the Octave sketch after the note below).

Note: the neural network cost function J(Θ) is non-convex, so the optimizer may settle in a local minimum; in practice this is usually not a serious problem.
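Putting the six steps together, a minimal Octave sketch, assuming X, Y, lambda and the randomly initialized Theta matrices from above, and a hypothetical costFunction(thetaVec, X, Y, lambda) that does forward propagation, computes J(Θ), runs back propagation, and returns the cost together with the unrolled gradient:

% initialThetaVec: randomly initialized weights, unrolled as in section 2.1
initialThetaVec = [Theta1(:); Theta2(:); Theta3(:)];
options = optimset('MaxIter', 400, 'GradObj', 'on');   % costFunction returns [J, gradVec]
[optThetaVec, cost] = fminunc(@(t) costFunction(t, X, Y, lambda), ...
                              initialThetaVec, options);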

 


3. Mathematical Derivation of the Gradient
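A brief sketch of the derivation, assuming sigmoid activations g(z) = 1 / (1 + e^(-z)) and the cross-entropy cost from section 1.1; define δ^(l) = ∂J/∂z^(l), the "error" at layer l.

Output layer: combining the derivative of the sigmoid with the derivative of the cross-entropy term, the product simplifies to

δ^(L) = ∂J/∂z^(L) = a^(L) - y

Hidden layers: since z^(l+1) = Θ^(l) a^(l) and a^(l) = g(z^(l)), the chain rule gives

δ^(l) = ((Θ^(l))^T δ^(l+1)) .* g'(z^(l)),   with g'(z^(l)) = a^(l) .* (1 - a^(l))

(dropping the component that corresponds to the bias unit).

Weights: z^(l+1) is linear in Θ^(l), so

∂J/∂Θ^(l) = δ^(l+1) (a^(l))^T

which is exactly the quantity accumulated in Δ^(l) by the algorithm of section 1.2.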
