Machine Learning Series: Coursera Week 5, Neural Networks: Learning

Contents

1.  Cost Function and Back Propagation

1.1 Cost Function

1.2 Back propagation algorithm

1.3 Back propagation intuition

2. Back Propagation in practice

2.1 Implementation note: unrolling parameters

2.2 Gradient checking

2.3 Random Initialization

2.4 Putting it together (summary)

3. Mathematical Derivation of the Gradient


1.  Cost Function and Back Propagation

1.1 Cost Function

Neural Network (Classification)

L = total number of layers in the network

s_l = number of units (not counting the bias unit) in layer l

K = number of output units

E.g.:

L = 4

s_1 = 3, s_2 = s_3 = 5, s_4 = 4

Figure 1 (from Coursera Week 5, Cost Function)

Cost Function:
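With the notation above, the regularized cost function is the K-output generalization of the logistic regression cost: a cross-entropy term summed over all m examples and all K output units, plus a regularization term over every non-bias weight.

J(Θ) = -(1/m) Σ_{i=1..m} Σ_{k=1..K} [ y_k^(i) log( (h_Θ(x^(i)))_k ) + (1 - y_k^(i)) log( 1 - (h_Θ(x^(i)))_k ) ] + (λ / (2m)) Σ_{l=1..L-1} Σ_{i=1..s_l} Σ_{j=1..s_{l+1}} ( Θ_{ji}^(l) )^2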

Note: the weights for the bias units are not regularized.

1.2 Back propagation algorithm

Gradient computation:

Given J(Θ), we want to minimize it, so we need code to compute:

- J(Θ)

- the partial derivatives ∂J(Θ)/∂Θ_ij^(l)

Given one training example (x, y):

L = 4

s_1 = 3, s_2 = s_3 = 5, s_4 = 4

Forward propagation:
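With g the sigmoid activation and a bias unit a_0 = 1 prepended to each layer's activations, the activations are computed layer by layer:

a^(1) = x

z^(2) = Θ^(1) a^(1),  a^(2) = g(z^(2))  (add a_0^(2))

z^(3) = Θ^(2) a^(2),  a^(3) = g(z^(3))  (add a_0^(3))

z^(4) = Θ^(3) a^(3),  a^(4) = h_Θ(x) = g(z^(4))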

Back propagation algorithm:

Intuition: δ_j^(l) = "error" of node j in layer l.

For each output unit (output layer, L = 4):
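δ_j^(4) = a_j^(4) - y_j,   or in vector form δ^(4) = a^(4) - y

For the hidden layers, the errors are propagated backwards:

δ^(3) = ((Θ^(3))^T δ^(4)) .* g'(z^(3))

δ^(2) = ((Θ^(2))^T δ^(3)) .* g'(z^(2))

where g'(z^(l)) = a^(l) .* (1 - a^(l)). There is no δ^(1), since the input layer is not associated with any error.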

Algorithm:

(from Coursera Week 5, Backpropagation Algorithm)
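The algorithm sets accumulators Δ^(l) = 0, loops over the m training examples (forward propagation, then back propagation of the δ terms), accumulates Δ^(l) := Δ^(l) + δ^(l+1) (a^(l))^T, and finally computes D^(l) = (1/m) Δ^(l) (plus (λ/m) Θ^(l) for the non-bias weights). A minimal Octave sketch for the 4-layer example network, assuming sigmoid activations and that Theta1, Theta2, Theta3, an m x 3 input matrix X, and an m x 4 one-hot label matrix Y are already defined (the variable names are illustrative):

% Backpropagation sketch for the 4-layer example (assumed inputs:
% Theta1 (5x4), Theta2 (5x6), Theta3 (4x6), X (m x 3), Y (m x 4)).
sigmoid = @(z) 1 ./ (1 + exp(-z));
m = size(X, 1);
Delta1 = zeros(size(Theta1));
Delta2 = zeros(size(Theta2));
Delta3 = zeros(size(Theta3));
for i = 1:m
  % forward propagation for example i (prepend the bias unit at each layer)
  a1 = [1; X(i, :)'];
  z2 = Theta1 * a1;  a2 = [1; sigmoid(z2)];
  z3 = Theta2 * a2;  a3 = [1; sigmoid(z3)];
  z4 = Theta3 * a3;  a4 = sigmoid(z4);          % = h_Theta(x^(i))
  % back propagation of the errors
  d4 = a4 - Y(i, :)';
  d3 = (Theta3' * d4) .* (a3 .* (1 - a3));  d3 = d3(2:end);  % drop bias entry
  d2 = (Theta2' * d3) .* (a2 .* (1 - a2));  d2 = d2(2:end);
  % accumulate Delta^(l) = Delta^(l) + delta^(l+1) * (a^(l))'
  Delta3 = Delta3 + d4 * a3';
  Delta2 = Delta2 + d3 * a2';
  Delta1 = Delta1 + d2 * a1';
end
% unregularized partial derivatives; add (lambda/m)*Theta to the
% non-bias columns for the regularized version
D1 = Delta1 / m;  D2 = Delta2 / m;  D3 = Delta3 / m;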

1.3 Back propagation intuition

 

2. Back Propagation in practice

2.1 Implementation note: unrolling parameters

Unroll the weight matrices into a single vector.

E.g.

thetaVec = [Theta1(:); Theta2(:); Theta3(:)];   % unroll all weight matrices into one long vector

fminunc(@costFunction, initialTheta, options);
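Inside the cost function the vector is reshaped back into matrices. A minimal Octave sketch, assuming the weight-matrix sizes of the example network (Theta1 is 5 x 4, Theta2 is 5 x 6, Theta3 is 4 x 6) and gradient matrices D1, D2, D3 of the same sizes:

% Unroll the gradients the same way as the parameters
DVec = [D1(:); D2(:); D3(:)];
% Recover the matrices from the unrolled vector (20 + 30 + 24 = 74 elements)
Theta1 = reshape(thetaVec(1:20),  5, 4);
Theta2 = reshape(thetaVec(21:50), 5, 6);
Theta3 = reshape(thetaVec(51:74), 4, 6);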

2.2 Gradient checking

Numerical estimation of gradients: use the two-sided difference (J(θ + ε) - J(θ - ε)) / (2ε) with a small ε (e.g. ε = 10^-4).

(from Coursera Week 5, Gradient Checking)

For a parameter vector θ (the unrolled parameters):

Code:

thetaPlus = theta + EPSILON;

thetaMinus = theta - EPSILON;

gradApprox = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);

Check that gradApprox ≈ DVec (the gradient computed by back propagation).
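In practice the check is done component by component on the unrolled parameter vector. A minimal Octave sketch, assuming theta is the unrolled vector and J is a function handle that returns the cost for a given parameter vector:

% Component-wise numerical gradient (theta = unrolled parameters,
% J = handle returning the scalar cost)
EPSILON = 1e-4;
n = numel(theta);
gradApprox = zeros(n, 1);
for i = 1:n
  thetaPlus  = theta;  thetaPlus(i)  = thetaPlus(i)  + EPSILON;
  thetaMinus = theta;  thetaMinus(i) = thetaMinus(i) - EPSILON;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);
end
% gradApprox should agree with DVec from back propagation to several
% decimal places; disable this check before training (it is very slow)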

2.3 Random Initialization

Initial value of Θ

Zero initialization: after each update, the parameters corresponding to the inputs going into each of the hidden units stay identical, so every hidden unit computes the same function and the network learns nothing useful.

Random initialization (symmetry breaking): initialize each Θ_ij^(l) to a random value in [-ε, ε].

E.g. Theta1 = rand(5, 4) * (2 * INIT_EPSILON) - INIT_EPSILON;   % rand(5, 4) returns a 5 x 4 matrix of uniform values in [0, 1]
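A minimal Octave sketch for all three weight matrices of the example network (layer sizes 3, 5, 5, 4), assuming INIT_EPSILON is a small constant such as 0.12 (an illustrative choice):

% Each Theta^(l) is s_(l+1) x (s_l + 1), initialized in [-INIT_EPSILON, INIT_EPSILON]
INIT_EPSILON = 0.12;
Theta1 = rand(5, 4) * 2 * INIT_EPSILON - INIT_EPSILON;
Theta2 = rand(5, 6) * 2 * INIT_EPSILON - INIT_EPSILON;
Theta3 = rand(4, 6) * 2 * INIT_EPSILON - INIT_EPSILON;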

2.4 Putting it together (summary)

Training a neural network:

Pick a network architecture:

Number of input units: the dimension of the features x^(i).

Number of output units: the number of classes (for binary classification, a single output unit is enough).

Reasonable default: 1 hidden layer; if there is more than 1 hidden layer, use the same number of hidden units in every layer (usually, the more units the better). Note: a hidden-layer size of roughly 1x, 2x, 3x, or 4x the number of input features is acceptable.

(1) Randomly initialize the weights.

(2) Implement forward propagation to get h_Θ(x^(i)) for any x^(i).

(3) Implement code to compute the cost function J(Θ).

(4) Implement back propagation to compute the partial derivatives ∂J(Θ)/∂Θ_jk^(l).

(5) Use gradient checking to compare the partial derivatives computed by back propagation with the numerical estimates, then disable the gradient checking code.

(6) Use gradient descent or an advanced optimization method together with back propagation to minimize J(Θ) as a function of the parameters Θ (see the Octave sketch after the note below).

Note: the neural network cost function J(Θ) is non-convex, so the optimizer may settle in a local minimum; in practice this is usually not a serious problem.
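Putting the six steps together, a minimal Octave sketch, assuming X, Y, lambda and the randomly initialized Theta matrices from above, and a hypothetical costFunction(thetaVec, X, Y, lambda) that does forward propagation, computes J(Θ), runs back propagation, and returns the cost together with the unrolled gradient:

% initialThetaVec: randomly initialized weights, unrolled as in section 2.1
initialThetaVec = [Theta1(:); Theta2(:); Theta3(:)];
options = optimset('MaxIter', 400, 'GradObj', 'on');   % costFunction returns [J, gradVec]
[optThetaVec, cost] = fminunc(@(t) costFunction(t, X, Y, lambda), ...
                              initialThetaVec, options);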

 


3. Mathematical Derivation of the Gradient
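A brief sketch of the derivation, assuming sigmoid activations g(z) = 1 / (1 + e^(-z)) and the cross-entropy cost from section 1.1; define δ^(l) = ∂J/∂z^(l), the "error" at layer l.

Output layer: combining the derivative of the sigmoid with the derivative of the cross-entropy term, the product simplifies to

δ^(L) = ∂J/∂z^(L) = a^(L) - y

Hidden layers: since z^(l+1) = Θ^(l) a^(l) and a^(l) = g(z^(l)), the chain rule gives

δ^(l) = ((Θ^(l))^T δ^(l+1)) .* g'(z^(l)),   with g'(z^(l)) = a^(l) .* (1 - a^(l))

(dropping the component that corresponds to the bias unit).

Weights: z^(l+1) is linear in Θ^(l), so

∂J/∂Θ^(l) = δ^(l+1) (a^(l))^T

which is exactly the quantity accumulated in Δ^(l) by the algorithm of section 1.2.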
