[note] Deep Learning (TensorFlow) Lecture 2 Notes: Preventing Overfitting

1. Linear model complexity



The logistic classifier is defined as: y = X*W + b (a softmax is then applied to turn y into class probabilities).

The parameters W and b are determined by an optimization method (e.g. gradient descent).

X is 1 by 784 (784 = 28*28 pixels).

W is 784 by 10.

b is 1 by 10.

So the number of parameters is 784*10 + 10 = 7850.
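
A minimal NumPy sketch of this model and its parameter count (variable names such as image_size are illustrative, not taken from the lecture code):

```python
import numpy as np

image_size = 28
num_labels = 10
num_features = image_size * image_size          # 784 input pixels per image

W = np.zeros((num_features, num_labels))        # 784 x 10 weight matrix
b = np.zeros((1, num_labels))                   # 1 x 10 bias vector

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def predict(X):
    # X: batch x 784 -> batch x 10 class probabilities
    return softmax(X @ W + b)

print(W.size + b.size)                          # 784*10 + 10 = 7850
```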

2. Rectified Linear Unit (ReLU) and neural networks

ReLU is another activation function; its response resembles the activation signal of a biological neuron more closely than the sigmoid does.


The picture below shows a two-layer neural network.



1. The first layer effectively consists of the set of weights and biases applied to X and passed through ReLUs. The output of this layer is fed to the next one, but is not observable outside the network, hence it is known as a hidden layer.
2. The second layer consists of the weights and biases applied to these intermediate outputs, followed by the softmax function to generate probabilities.
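
A minimal NumPy sketch of this two-layer forward pass, assuming an illustrative hidden-layer size of 1024 (the lecture's actual sizes may differ):

```python
import numpy as np

n_input, n_hidden, n_output = 784, 1024, 10

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(n_input, n_hidden));  b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_output)); b2 = np.zeros(n_output)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    hidden = relu(X @ W1 + b1)        # layer 1: hidden, not observable outside the net
    logits = hidden @ W2 + b2         # layer 2: weights/biases on the hidden outputs
    return softmax(logits)            # class probabilities

probs = forward(rng.normal(size=(5, n_input)))   # 5 fake "images"
print(probs.shape)                               # (5, 10), each row sums to 1
```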


3. Chain rule

The chain rule is a concept from calculus: it gives the derivative of a composite function, i.e. a function whose input is itself the output of another function. If y = g(f(x)), then dy/dx = g'(f(x)) * f'(x).



Computing this derivative as a graph of simple operations gives an efficient data pipeline with lots of data reuse, since intermediate results from the forward pass are reused when computing the derivatives.
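
A tiny worked example of the chain rule, using made-up functions (a sigmoid applied to w*x) and checking the analytic derivative against a finite difference:

```python
import numpy as np

def f(x, w=3.0):
    return w * x                       # inner function f(x) = w*x

def df(x, w=3.0):
    return w                           # f'(x) = w

def g(z):
    return 1.0 / (1.0 + np.exp(-z))    # outer function: sigmoid

def dg(z):
    s = g(z)
    return s * (1.0 - s)               # sigmoid'(z) = g(z) * (1 - g(z))

x = 0.7
analytic = dg(f(x)) * df(x)            # chain rule: (g(f(x)))' = g'(f(x)) * f'(x)

eps = 1e-6                             # finite-difference check of the same derivative
numeric = (g(f(x + eps)) - g(f(x - eps))) / (2 * eps)
print(analytic, numeric)               # the two values agree closely
```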

4. Back propagation


Forward propagation computes the output y.

Back propagation computes the derivatives of the loss with respect to all weight matrices.

Then we update each weight with: new_weight = weight - alpha * derivative_weight, where alpha is the learning rate.

Back propagation needs roughly twice the memory and computation of forward propagation.
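
A minimal sketch of one forward/backward/update step for the softmax classifier from section 1, using random data purely for illustration (this is not the lecture's training code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 784))                 # a batch of 32 fake "images"
labels = rng.integers(0, 10, size=32)
Y = np.eye(10)[labels]                         # one-hot targets

W = rng.normal(scale=0.01, size=(784, 10))
b = np.zeros(10)
alpha = 0.1                                    # learning rate

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# forward propagation: compute the output y
probs = softmax(X @ W + b)

# back propagation: derivatives of the cross-entropy loss w.r.t. W and b
dlogits = (probs - Y) / len(X)                 # softmax + cross-entropy gradient
dW = X.T @ dlogits
db = dlogits.sum(axis=0)

# gradient-descent update: new_weight = weight - alpha * derivative_weight
W -= alpha * dW
b -= alpha * db
```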



5. Deep learning networks


In hands-on exercise (2) we implemented a neural network with only one hidden layer.

It is similar to the figure below.



Of course, we can also build deeper or wider neural networks.
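
A rough sketch of how the same linear-plus-ReLU pattern stacks into a deeper network; the layer sizes here are arbitrary illustrations, not the lecture's configuration:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def init_layers(sizes, seed=0):
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(X, layers):
    h = X
    for W, b in layers[:-1]:
        h = relu(h @ W + b)           # every hidden layer: linear transform + ReLU
    W, b = layers[-1]
    return h @ W + b                  # final logits (softmax is applied in the loss)

layers = init_layers([784, 512, 256, 128, 10])   # three hidden layers: a "deeper" net
logits = forward(np.zeros((1, 784)), layers)
print(logits.shape)                               # (1, 10)
```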


6. Early termination (early stopping)

When accuracy on the validation data reaches its peak, stop training promptly to avoid overfitting.
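
A rough sketch of this early-stopping logic; the training/validation callbacks and the accuracy curve below are fake stand-ins just to make the loop runnable:

```python
import copy

def early_stopping_loop(train_one_epoch, validation_accuracy, model,
                        patience=3, max_epochs=100):
    best_acc, best_model, bad_epochs = 0.0, copy.deepcopy(model), 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        acc = validation_accuracy(model)
        if acc > best_acc:
            best_acc, best_model, bad_epochs = acc, copy.deepcopy(model), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                      # validation accuracy has peaked: stop
    return best_model, best_acc

# Fake validation-accuracy curve that rises and then falls (as when overfitting).
fake_curve = [0.80, 0.85, 0.88, 0.90, 0.89, 0.87, 0.86]
state = {"epoch": -1}
best, acc = early_stopping_loop(
    train_one_epoch=lambda m: state.update(epoch=state["epoch"] + 1),
    validation_accuracy=lambda m: fake_curve[min(state["epoch"], len(fake_curve) - 1)],
    model={"weights": None},
    patience=2)
print(acc)                                  # 0.90, the peak before overfitting
```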


7. Regularization

Introduce the L2 norm of the weight vector into the loss as a penalty term.
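
A small sketch of the penalized loss, using the squared L2 norm (the form usually used in practice) and an assumed small coefficient beta:

```python
import numpy as np

def cross_entropy(probs, Y):
    return -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))

def regularized_loss(probs, Y, weights, beta=1e-3):
    data_loss = cross_entropy(probs, Y)
    l2_penalty = sum(np.sum(W ** 2) for W in weights)   # ||W||^2 summed over layers
    return data_loss + beta * l2_penalty

# toy check: the penalty grows with the weight magnitudes
Y = np.eye(3)
probs = np.full((3, 3), 1.0 / 3.0)
small_W = [np.ones((4, 3)) * 0.1]
large_W = [np.ones((4, 3)) * 1.0]
print(regularized_loss(probs, Y, small_W) < regularized_loss(probs, Y, large_W))  # True
```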


8. Dropout

In a multi-layer neural network, the output of one layer is used as the input of the next layer.

Dropout means randomly discarding half (or some other fraction) of the nodes in the previous layer's output, and feeding only the remaining nodes to the next layer.


If dropout does not help, we probably need a bigger neural network.

There are a few small tricks for using dropout (see the sketch after this list):

(1) During training, apply dropout and scale the surviving outputs up by a factor of two (when half the nodes are dropped).

(2) During evaluation, do not apply dropout.
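
A small NumPy sketch of this "inverted dropout" trick, assuming a 50% drop rate (keep_prob = 0.5):

```python
import numpy as np

def dropout(activations, keep_prob=0.5, training=True, rng=np.random.default_rng(0)):
    if not training:
        return activations                       # (2) no dropout at evaluation time
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob        # (1) drop nodes, scale the rest by 1/keep_prob (2x)

h = np.ones((1, 8))
print(dropout(h, training=True))                 # roughly half zeros, the survivors become 2.0
print(dropout(h, training=False))                # unchanged at evaluation time
```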




