Coursera ML Notes 5

Classification

  • Binary classification: K = 1 output unit
  • Multi-class classification: K output units (K ≥ 3)
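
With K ≥ 3 output units, each label is represented as a K-dimensional target vector containing a single 1. A minimal Python sketch, assuming labels are numbered 1..K (the function name `one_hot` is hypothetical):

```python
def one_hot(label: int, num_classes: int) -> list[int]:
    """Encode a class label in 1..K as a K-dimensional 0/1 target vector y."""
    if not 1 <= label <= num_classes:
        raise ValueError("label must be in 1..num_classes")
    y = [0] * num_classes
    y[label - 1] = 1  # labels are assumed to be 1-based here
    return y

# Binary classification needs only one output unit, so y is simply 0 or 1.
# Multi-class with K = 4: class 3 becomes [0, 0, 1, 0].
print(one_hot(3, 4))  # [0, 0, 1, 0]
```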

Cost function

  • L = total number of layers in the network
  • $s_l$ = number of units (not counting the bias unit) in layer l
  • K = number of output units/classes

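Logistic regression:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$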
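The regularized logistic regression cost can be sketched in plain Python. This assumes each feature row already contains the bias feature $x_0 = 1$ and that $\theta_0$ is not regularized, which is why the penalty sum starts at j = 1; `sigmoid` and `logistic_cost` are illustrative names.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X: list of m feature rows, each starting with the bias feature x0 = 1.
    theta[0] (the bias weight) is excluded from the penalty, which sums j = 1..n.
    """
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, x_i)))  # h_theta(x^(i))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1 - h)
    penalty = lam / (2 * m) * sum(t * t for t in theta[1:])
    return -total / m + penalty

print(logistic_cost([0.0, 0.0], [[1.0, 2.0], [1.0, -1.0]], [1, 0], lam=1.0))  # log(2) ≈ 0.6931
```

With θ = 0 every prediction is 0.5, so J = log 2 regardless of the labels, and the penalty is zero.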
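Neural network:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left((h_\Theta(x^{(i)}))_k\right) + (1 - y_k^{(i)})\log\left(1 - (h_\Theta(x^{(i)}))_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{j,i}^{(l)}\right)^2$$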
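
A sketch of the neural-network cost, assuming the outputs H have already been computed by forward propagation and the labels are one-hot rows Y. Weight matrices are plain nested lists whose column 0 multiplies the bias unit; that column is skipped in the penalty, matching the inner sum over i = 1..$s_l$. The names and numbers are hypothetical.

```python
import math

def nn_cost(H, Y, thetas, lam):
    """Regularized neural-network cost J(Theta).

    H: m rows of K network outputs (h_Theta(x^(i)))_k, each strictly in (0, 1).
    Y: m rows of K one-hot targets y_k^(i).
    thetas: list of weight matrices Theta^(l) (lists of rows); column 0 of each
            row multiplies the bias unit and is excluded from the penalty.
    """
    m = len(H)
    data_term = 0.0
    for h_row, y_row in zip(H, Y):
        for h_k, y_k in zip(h_row, y_row):  # inner sum over k = 1..K
            data_term += y_k * math.log(h_k) + (1 - y_k) * math.log(1 - h_k)
    penalty = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return -data_term / m + lam / (2 * m) * penalty

# Made-up numbers: one example, K = 2, one weight matrix (1 input + bias -> 2 outputs).
# The bias weights 5.0 and -3.0 are not penalized.
print(nn_cost([[0.5, 0.5]], [[1, 0]], [[[5.0, 1.0], [-3.0, 2.0]]], lam=1.0))  # 2*log(2) + 2.5 ≈ 3.886
```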

Forward Propagation Algorithm
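
Forward propagation computes $z^{(l+1)} = \Theta^{(l)} a^{(l)}$ and $a^{(l+1)} = g(z^{(l+1)})$ layer by layer, adding a bias unit $a_0 = 1$ to the input and hidden layers. A minimal sketch, assuming sigmoid activations and weight matrices stored as nested Python lists with $s_{l+1}$ rows and $s_l + 1$ columns (`forward` is an illustrative name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(thetas, x):
    """Forward propagation: return the activations [a^(1), ..., a^(L)].

    thetas[l-1] is Theta^(l), shape s_(l+1) x (s_l + 1); the first entry of
    each row multiplies the bias unit. Input and hidden activations include
    the bias unit; the output activation a^(L) does not.
    """
    a = [1.0] + list(x)  # a^(1) = x with the bias unit prepended
    activations = [a]
    for idx, theta in enumerate(thetas):
        z = [sum(w * v for w, v in zip(row, a)) for row in theta]  # z^(l+1) = Theta^(l) a^(l)
        a = [sigmoid(v) for v in z]                                 # a^(l+1) = g(z^(l+1))
        if idx < len(thetas) - 1:
            a = [1.0] + a  # every hidden layer gets a bias unit
        activations.append(a)
    return activations

# Hypothetical 2-2-1 network with all weights zero: every unit outputs g(0) = 0.5.
thetas = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]
print(forward(thetas, [1.0, 2.0]))  # [[1.0, 1.0, 2.0], [1.0, 0.5, 0.5], [0.5]]
```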

Backpropagation Algorithm

Calculation

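Set $\Delta^{(l)}_{i,j} := 0$ for all l, i, j. Then, for each training example t = 1 to m:
1. Set $a^{(1)} := x^{(t)}$
2. Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
3. Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$
4. Compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$ using
$$\delta^{(l)} = \left((\Theta^{(l)})^T \delta^{(l+1)}\right) \odot g'(z^{(l)}) = \left((\Theta^{(l)})^T \delta^{(l+1)}\right) \odot a^{(l)} \odot (1 - a^{(l)})$$
where $\odot$ is element-wise multiplication (`.*` in Octave); the component for the bias unit is discarded.
5. $\Delta^{(l)}_{i,j} := \Delta^{(l)}_{i,j} + a^{(l)}_j \delta^{(l+1)}_i$, or with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$
This accumulates the Δ matrices. After the loop over all examples, compute:
- $D^{(l)}_{i,j} := \frac{1}{m}\left(\Delta^{(l)}_{i,j} + \lambda \Theta^{(l)}_{i,j}\right)$ if $j \neq 0$
- $D^{(l)}_{i,j} := \frac{1}{m}\Delta^{(l)}_{i,j}$ if $j = 0$
so that $\frac{\partial}{\partial \Theta^{(l)}_{i,j}} J(\Theta) = D^{(l)}_{i,j}$.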
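
The steps above can be sketched in plain Python for a fully connected sigmoid network. This is an illustrative, unvectorized version (loops instead of matrix products), assuming a list-of-lists weight layout with the bias in column 0; `forward` and `backprop` are hypothetical names. Python indices are 0-based, so `thetas[0]` holds $\Theta^{(1)}$ and `acts[0]` holds $a^{(1)}$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(thetas, x):
    """Return [a^(1), ..., a^(L)]; every layer except the output gets a bias unit."""
    a = [1.0] + list(x)
    acts = [a]
    for idx, theta in enumerate(thetas):
        a = [sigmoid(sum(w * v for w, v in zip(row, a))) for row in theta]
        if idx < len(thetas) - 1:
            a = [1.0] + a
        acts.append(a)
    return acts

def backprop(thetas, X, Y, lam):
    """Return the gradient matrices D^(l) = dJ/dTheta^(l), one per weight matrix."""
    m = len(X)
    # Delta^(l)_{i,j} := 0 for every layer and every weight
    Deltas = [[[0.0] * len(row) for row in theta] for theta in thetas]
    for x_t, y_t in zip(X, Y):
        acts = forward(thetas, x_t)                      # steps 1-2
        delta = [a - y for a, y in zip(acts[-1], y_t)]   # step 3: delta^(L) = a^(L) - y^(t)
        for l in range(len(thetas) - 1, -1, -1):         # thetas[l] maps acts[l] -> acts[l+1]
            a_l = acts[l]
            # step 5: Delta += delta^(next) (a^(this))^T, written out as an outer product
            for i, d in enumerate(delta):
                for j, a in enumerate(a_l):
                    Deltas[l][i][j] += d * a
            if l > 0:
                # step 4: send the error back through Theta, then multiply by g'(z) = a(1 - a);
                # index 0 (the bias unit) is dropped because it has no incoming weights
                back = [sum(thetas[l][i][j] * delta[i] for i in range(len(delta)))
                        for j in range(len(a_l))]
                delta = [back[j] * a_l[j] * (1 - a_l[j]) for j in range(1, len(a_l))]
    # D^(l)_{i,j} = (Delta + lam * Theta) / m for j != 0, and Delta / m for the bias column j = 0
    return [[[(Deltas[l][i][j] + (lam * thetas[l][i][j] if j > 0 else 0.0)) / m
              for j in range(len(thetas[l][i]))]
             for i in range(len(thetas[l]))]
            for l in range(len(thetas))]

# Hypothetical 2-2-1 network and two training examples
thetas = [[[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]], [[-0.3, 0.2, 0.7]]]
D = backprop(thetas, [[0.5, -1.0], [1.5, 2.0]], [[1.0], [0.0]], lam=0.5)
print([[len(row) for row in d] for d in D])  # [[3, 3], [3]]  (same shapes as thetas)
```

The result can be compared with the two-sided numerical approximation described under Gradient Checking.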

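Cost function (single example)

For a single example $(x^{(t)}, y^{(t)})$ with one output unit and no regularization:
$$\text{cost}(t) = -\left[y^{(t)}\log\left(h_\Theta(x^{(t)})\right) + (1 - y^{(t)})\log\left(1 - h_\Theta(x^{(t)})\right)\right]$$
$\delta^{(l)}_j$ is the "error" of unit j in layer l, defined as the derivative of this cost with respect to that unit's input $z^{(l)}_j$:
$$\delta^{(l)}_j = \frac{\partial}{\partial z^{(l)}_j}\,\text{cost}(t)$$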
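
For the output layer, this definition yields the starting point of backpropagation, $\delta^{(L)} = a^{(L)} - y^{(t)}$. A short derivation, assuming a sigmoid output $a = g(z)$ (so $g'(z) = a(1 - a)$) and taking cost(t) with a leading minus sign, i.e. the negative log-likelihood used in $J(\Theta)$; superscripts are dropped for readability:

$$
\begin{aligned}
\text{cost} &= -\left[y \log a + (1 - y)\log(1 - a)\right], \qquad a = g(z) \\
\frac{\partial\,\text{cost}}{\partial a} &= -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)} \\
\delta^{(L)} = \frac{\partial\,\text{cost}}{\partial z} &= \frac{\partial\,\text{cost}}{\partial a}\, g'(z) = \frac{a - y}{a(1 - a)} \cdot a(1 - a) = a - y
\end{aligned}
$$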

Gradient Checking

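$$\frac{\partial}{\partial\Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}$$
with a small $\epsilon$, e.g. $\epsilon = 10^{-4}$. With multiple parameters, perturb one $\Theta_j$ at a time:
$$\frac{\partial}{\partial\Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \dots, \Theta_j + \epsilon, \dots, \Theta_n) - J(\Theta_1, \dots, \Theta_j - \epsilon, \dots, \Theta_n)}{2\epsilon}$$
and check that these approximations are close to the gradients D computed by backpropagation.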
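
A minimal sketch of the two-sided approximation, assuming the parameters have been unrolled into a flat list and `J` is any Python function of that list (the names are illustrative). Each parameter costs two extra evaluations of J, which is why gradient checking is disabled once backpropagation has been verified.

```python
def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided difference approximation of dJ/dtheta_j for every j.

    J: function of a flat list of parameters (Theta "unrolled" into a vector).
    """
    grad = []
    for j in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[j] += eps   # Theta_j + epsilon, other entries unchanged
        minus[j] -= eps  # Theta_j - epsilon
        grad.append((J(plus) - J(minus)) / (2 * eps))
    return grad

# Example: J(theta) = theta_1^2 + 3*theta_2 has gradient [2*theta_1, 3].
approx = numerical_gradient(lambda t: t[0] ** 2 + 3 * t[1], [1.5, -2.0])
print(approx)  # close to [3.0, 3.0]
```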

Random Initialization

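Initializing all weights to zero (or to any single shared value) does not work: every hidden unit in a layer would compute the same activation and receive the same gradient, so the units would stay identical after every update. To break this symmetry, we initialize each $\Theta^{(l)}_{ij}$ to a random value in $[-\epsilon, \epsilon]$.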

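For example, if Theta1 is 10x11, Theta2 is 10x11 and Theta3 is 1x11 (Octave):
INIT_EPSILON = 0.12;  % example value: a small constant, unrelated to the epsilon used in gradient checking
% rand(r, c) is uniform on (0, 1); scaling and shifting maps each entry into (-INIT_EPSILON, INIT_EPSILON)
Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;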
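
The symmetry problem can be observed in a small Python sketch with one hidden layer of 3 units and one sigmoid output: when every weight starts at the same value, all hidden units produce the same activation, and the gradient rows for their incoming weights ($\delta^{(2)}_j a^{(1)}$) are identical, so gradient descent would keep them identical. The network size, the inputs, and the value 0.12 are arbitrary choices for illustration; `hidden_layer_grads` is a hypothetical name.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer_grads(theta1, theta2, x, y):
    """One hidden layer, one sigmoid output, one example.

    Returns the hidden activations and the per-example gradient rows of
    Theta^(1) (row j = delta^(2)_j * a^(1)).
    """
    a1 = [1.0] + list(x)                                               # input plus bias unit
    a2 = [1.0] + [sigmoid(sum(w * v for w, v in zip(row, a1))) for row in theta1]
    d3 = sigmoid(sum(w * v for w, v in zip(theta2[0], a2))) - y        # delta^(3) = a^(3) - y
    # delta^(2)_j for the non-bias hidden units j = 1..3
    d2 = [theta2[0][j] * d3 * a2[j] * (1 - a2[j]) for j in range(1, len(a2))]
    grads = [[d * a for a in a1] for d in d2]
    return a2[1:], grads

x, y = [0.3, -0.7], 1.0

# Every weight set to the same value: the hidden units are interchangeable.
same_a, same_g = hidden_layer_grads([[0.5] * 3 for _ in range(3)], [[0.5] * 4], x, y)
print(len(set(same_a)), all(row == same_g[0] for row in same_g))  # 1 True

# Random initialization in [-eps, eps] breaks the symmetry.
rng = random.Random(0)
eps = 0.12
theta1 = [[rng.uniform(-eps, eps) for _ in range(3)] for _ in range(3)]
theta2 = [[rng.uniform(-eps, eps) for _ in range(4)]]
rand_a, rand_g = hidden_layer_grads(theta1, theta2, x, y)
print(len(set(rand_a)))  # 3 distinct hidden activations
```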

Training a neural network

First, pick a network architecture:

  • Number of input units = dimension of features $x^{(i)}$
  • Number of output units = number of classes
  • Number of hidden units per layer = usually, the more the better (balanced against the cost of computation, which grows with more hidden units)
  • Default: 1 hidden layer. If you use more than 1 hidden layer, it is recommended to use the same number of units in every hidden layer.
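
The chosen layer sizes fix the shape of every weight matrix: $\Theta^{(l)}$ maps layer l (plus its bias unit) to layer l + 1, so it has $s_{l+1}$ rows and $s_l + 1$ columns. A small helper (hypothetical name) reproduces the 10x11, 10x11, 1x11 shapes of the random-initialization example, assuming that example's network has 10 input units, two hidden layers of 10 units, and 1 output unit:

```python
def theta_shapes(layer_sizes):
    """Shape of each Theta^(l): s_(l+1) rows by s_l + 1 columns (the +1 is the bias unit)."""
    return [(layer_sizes[l + 1], layer_sizes[l] + 1) for l in range(len(layer_sizes) - 1)]

print(theta_shapes([10, 10, 10, 1]))  # [(10, 11), (10, 11), (1, 11)]
```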

Then train the network:

    1. Randomly initialize the weights
    2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$
    3. Implement the cost function
    4. Implement backpropagation to compute partial derivatives
    5. Use gradient checking to confirm that your backpropagation works. Then disable gradient checking, because it is very slow.
    6. Use gradient descent or a built-in optimization function to minimize the cost function $J(\Theta)$ with respect to the weights in $\Theta$.
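
Putting these steps together, a compact, unvectorized Python sketch of training with plain batch gradient descent (step 6) on a hypothetical toy dataset, the logical OR of two binary inputs. The learning rate, iteration count, `eps = 0.12`, and all function names are illustrative assumptions; gradient checking (step 5) is omitted for brevity.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(thetas, x):                        # step 2: forward propagation
    a = [1.0] + list(x)                        # a^(1) with bias unit
    acts = [a]
    for idx, theta in enumerate(thetas):
        a = [sigmoid(sum(w * v for w, v in zip(row, a))) for row in theta]
        if idx < len(thetas) - 1:
            a = [1.0] + a                      # bias unit for hidden layers
        acts.append(a)
    return acts

def cost(thetas, X, Y, lam):                   # step 3: regularized J(Theta)
    s = 0.0
    for x, y in zip(X, Y):
        for h, t in zip(forward(thetas, x)[-1], y):
            s += t * math.log(h) + (1 - t) * math.log(1 - h)
    reg = sum(w * w for th in thetas for row in th for w in row[1:])
    return -s / len(X) + lam / (2 * len(X)) * reg

def gradients(thetas, X, Y, lam):              # step 4: backpropagation
    G = [[[0.0] * len(row) for row in th] for th in thetas]
    for x, y in zip(X, Y):
        acts = forward(thetas, x)
        delta = [a - t for a, t in zip(acts[-1], y)]
        for l in range(len(thetas) - 1, -1, -1):
            a_l = acts[l]
            for i, d in enumerate(delta):
                for j, a in enumerate(a_l):
                    G[l][i][j] += d * a
            if l > 0:                          # error of hidden layer, bias component dropped
                delta = [sum(thetas[l][i][j] * delta[i] for i in range(len(delta)))
                         * a_l[j] * (1 - a_l[j]) for j in range(1, len(a_l))]
    m = len(X)
    return [[[(G[l][i][j] + (lam * thetas[l][i][j] if j > 0 else 0.0)) / m
              for j in range(len(G[l][i]))] for i in range(len(G[l]))]
            for l in range(len(G))]

def train(layer_sizes, X, Y, lam=0.0, alpha=1.0, iters=3000, eps=0.12, seed=0):
    rng = random.Random(seed)
    # step 1: random initialization; Theta^(l) is s_(l+1) x (s_l + 1)
    thetas = [[[rng.uniform(-eps, eps) for _ in range(layer_sizes[l] + 1)]
               for _ in range(layer_sizes[l + 1])]
              for l in range(len(layer_sizes) - 1)]
    history = [cost(thetas, X, Y, lam)]
    for _ in range(iters):                     # step 6: batch gradient descent
        D = gradients(thetas, X, Y, lam)
        thetas = [[[w - alpha * g for w, g in zip(row, grow)]
                   for row, grow in zip(th, gth)]
                  for th, gth in zip(thetas, D)]
        history.append(cost(thetas, X, Y, lam))
    return thetas, history

# Hypothetical toy data: logical OR of two binary inputs, one output unit.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0], [1], [1], [1]]
thetas, history = train([2, 2, 1], X, Y)
print(f"cost: {history[0]:.3f} -> {history[-1]:.3f}")
print([round(forward(thetas, x)[-1][0], 2) for x in X])
```

The cost history is kept so that one can check that J(Θ) decreases during training, a common sanity check alongside gradient checking.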