Perceptrons and single layer neural nets

From the University of Waterloo
https://www.youtube.com/watch?v=dXxuCARJ1CY&list=PLdAoL1zKcqTW-uzoSVBNEecKHsnug_M0k&index=16
Circumstances have pushed me to properly brush up on Machine Learning. Since I like this professor, I am listening to his course from the beginning. These notes are mostly transcriptions of his lectures plus the University of Utah Machine Learning slides; I will reorganize them later to make the content more systematic.

1 Perceptron

The perceptron is an online learning algorithm that is widely used and easy to implement.

Idea: mimic the brain to do computation

  • The brain is made up of nuclei, synapses, dendrites, and axons… and it looks a lot like a computer: neurons play the role of gates, its signals are electrical signals, and it computes in parallel (whereas a computer computes sequentially and in parallel).

  • However, the brain is robust while computers are fragile: if a gate stops working, the computer will crash.

Artificial neural network

  • Nodes: neurons
  • Links: synapses

ANN Unit

  • For each unit $j$
    • Weight $W_{ji}$
      • Strength of the link from unit $i$ to unit $j$
      • Input signals $x_i$ weighted by $W_{ji}$ are linearly combined to produce a new signal $a_j$:
        $a_j = \sum_i W_{ji} x_i + w_0 = W_j \bar{x}$
    • Activation function $h$ produces the output signal $y_j$ in a non-linear way:
      $y_j = h(a_j)$
      • Should be non-linear, or the network will just compute a linear function

      • Often chosen to mimic firing in neurons: a unit should be ‘active’ (output near 1) when fed the ‘right’ inputs and ‘inactive’ (output near 0) when fed the ‘wrong’ inputs

      • Common activation functions

        1. Threshold activation function
        2. Sigmoid

        (Figure: common activation functions)

      • Can we design Boolean functions (AND, OR, NOT) using the threshold activation function? Yes; one choice of weights is sketched below.
        (Figure: threshold units implementing AND, OR, and NOT)
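
To make this concrete, a single threshold unit with hand-picked weights can implement each of these gates. The weights below are one standard choice, not taken from the lecture; a minimal Python sketch:

```python
import numpy as np

def threshold_unit(w, w0, x):
    """Single unit: weighted sum plus bias, passed through a step (threshold) activation."""
    a = np.dot(w, x) + w0          # a = sum_i w_i * x_i + w0
    return 1 if a > 0 else 0       # threshold activation

# One standard choice of weights/biases (an assumption, not from the lecture):
AND = lambda x: threshold_unit(np.array([1.0, 1.0]), -1.5, x)
OR  = lambda x: threshold_unit(np.array([1.0, 1.0]), -0.5, x)
NOT = lambda x: threshold_unit(np.array([-1.0]), 0.5, x)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "AND:", AND(np.array(x)), "OR:", OR(np.array(x)))
print("NOT 0:", NOT(np.array([0])), "NOT 1:", NOT(np.array([1])))
```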

Network Structures

  • Feed-forward network

    • Directed acyclic graph
    • No internal state
    • Simply computes outputs from inputs

  • Recurrent network

    • Directed cyclic graph
      Popular in NLP, where inputs have varying lengths; the cyclic part lets the network adapt to different lengths.
    • Dynamical system with internal states
    • Can memorize information

2 Feed-forward Network

Perceptron: single layer feed-forward network

(Figure: a single-layer feed-forward network; shades/colors indicate different values/magnitudes, lines indicate higher weights)


3 Supervised learning algorithms for neural networks

  • Given a list of $(x, y)$ pairs
  • Train a feed-forward ANN
    • To compute the proper outputs $y$ when fed with inputs $x$
    • Consists of adjusting the weights $W_{ji}$

Threshold Perceptron Learning

  • Learning is done separately for each unit $j$ since units do not share weights
  • Perceptron learning for unit $j$ (a minimal code sketch follows this list):
    1. For each $(x, y)$ pair do:
      • Case 1: correct output produced: $\forall i \; W_{ji} \leftarrow W_{ji}$ (no change)
      • Case 2: output produced is 0 instead of 1: add the input, $\forall i \; W_{ji} \leftarrow W_{ji} + x_i$
      • Case 3: output produced is 1 instead of 0: subtract the input, $\forall i \; W_{ji} \leftarrow W_{ji} - x_i$
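
A minimal sketch of the three cases above for a single unit, assuming labels $y \in \{0, 1\}$ and the bias folded into an augmented input; the function name and toy data are illustrative, not from the lecture:

```python
import numpy as np

def train_threshold_perceptron(X, y, epochs=10):
    """Per-unit perceptron learning: weights change only when the output is wrong."""
    w = np.zeros(X.shape[1] + 1)                 # last entry is the bias w0
    for _ in range(epochs):
        for x_n, y_n in zip(X, y):
            x_bar = np.append(x_n, 1.0)          # augmented input [x, 1]
            output = 1 if np.dot(w, x_bar) > 0 else 0
            if output == y_n:                    # Case 1: correct -> no change
                continue
            elif output == 0 and y_n == 1:       # Case 2: add the input
                w += x_bar
            else:                                # Case 3: subtract the input
                w -= x_bar
    return w

# Tiny illustrative example: learn OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])
w = train_threshold_perceptron(X, y)
print(w, [1 if np.dot(w, np.append(x, 1.0)) > 0 else 0 for x in X])
```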

Sigmoid Perceptron Learning

  • Represents ‘soft’ linear separators
  • Same hypothesis space as logistic regression
  • Possible objectives
    • Minimum squared error
      $E(w)=\frac{1}{2}\sum_n E_n(w)^2=\frac{1}{2}\sum_n \left(y_n-\sigma(w^T\bar{x}_n)\right)^2$
    • Maximum likelihood (same algorithm as for logistic regression)
    • Maximum a posteriori hypothesis
    • Bayesian learning
  • Gradient (a numerical check follows below)
    $\frac{\partial E}{\partial w_i} = \sum_n E_n(w)\frac{\partial E_n}{\partial w_i} = -\sum_n E_n(w)\,\sigma'(w^T\bar{x}_n)\,x_{n,i} = -\sum_n E_n(w)\,\sigma(w^T\bar{x}_n)\left(1-\sigma(w^T\bar{x}_n)\right)x_{n,i}$
    For the sigmoid function, $\sigma'=\sigma(1-\sigma)$.
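
As a sanity check on this derivation, the analytic gradient can be compared against a finite-difference estimate. The code below is a hypothetical check on random data, not part of the lecture:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def squared_error(w, X_bar, y):
    """E(w) = 1/2 * sum_n (y_n - sigma(w^T x_bar_n))^2"""
    return 0.5 * np.sum((y - sigmoid(X_bar @ w)) ** 2)

def analytic_grad(w, X_bar, y):
    """dE/dw = -sum_n E_n * sigma'(w^T x_bar_n) * x_bar_n, using sigma' = sigma(1 - sigma)."""
    s = sigmoid(X_bar @ w)
    E_n = y - s
    return -(E_n * s * (1 - s)) @ X_bar

rng = np.random.default_rng(0)
X_bar = np.hstack([rng.normal(size=(5, 3)), np.ones((5, 1))])   # augmented inputs
y = rng.integers(0, 2, size=5).astype(float)
w = rng.normal(size=4)

# Finite-difference estimate of the gradient, one coordinate at a time
eps, num = 1e-6, np.zeros_like(w)
for i in range(len(w)):
    e = np.zeros_like(w); e[i] = eps
    num[i] = (squared_error(w + e, X_bar, y) - squared_error(w - e, X_bar, y)) / (2 * eps)

print(np.max(np.abs(num - analytic_grad(w, X_bar, y))))   # should be tiny (~1e-9)
```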

Sequential Gradient Descent in perceptron learning

  1. Repeat
    For each $(x_n, y_n)$ in the examples do:
    $E_n \leftarrow y_n - \sigma(w^T \bar{x}_n)$
    $w \leftarrow w + \eta\, E_n\, \sigma(w^T \bar{x}_n)\left(1-\sigma(w^T \bar{x}_n)\right)\bar{x}_n$
    where $\eta$ is the learning rate
  2. Until some stopping criterion is satisfied
  3. Return the learnt network
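
A minimal Python sketch of this loop, with the learning rate, epoch count, and toy data chosen for illustration:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def sequential_gradient_descent(X, y, eta=0.5, epochs=100):
    """Sigmoid perceptron trained one example at a time (sequential gradient descent)."""
    X_bar = np.hstack([X, np.ones((X.shape[0], 1))])   # augment with a constant 1 for w0
    w = np.zeros(X_bar.shape[1])
    for _ in range(epochs):                            # 1. Repeat ...
        for x_n, y_n in zip(X_bar, y):                 #    for each (x_n, y_n)
            s = sigmoid(np.dot(w, x_n))
            E_n = y_n - s                              #    E_n <- y_n - sigma(w^T x_bar_n)
            w += eta * E_n * s * (1 - s) * x_n         #    gradient-descent weight update
    return w                                           # 3. Return the learnt weights

# Illustrative usage: learn OR with a sigmoid unit
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])
w = sequential_gradient_descent(X, y, eta=1.0, epochs=2000)
print(sigmoid(np.hstack([X, np.ones((4, 1))]) @ w).round(2))   # close to [0, 1, 1, 1]
```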

Notes:

  • Prediction $= \mathrm{sgn}(w^T x)$
  • Weights are updated only on an error, so this is a mistake-driven algorithm

Geometric Representation (from the University of Utah)

Positive update: a misclassified positive example is added to $w$, rotating $w$ toward $x$.
Negative update: a misclassified negative example is subtracted from $w$, rotating $w$ away from $x$.

Convergence theorem:

If there exists a set of weights consistent with the data (i.e., the data is linearly separable), the perceptron algorithm will converge.


Cycling theorem

If the training data is not separable, then the learning algorithm will eventually repeat the same set of weights and enter an infinite loop (it never converges).


Mistake Bound Theorem

If the data is separable with margin $\gamma$, i.e. there is a unit vector $u$ with $y_i\, u^T x_i \ge \gamma$ for every example, and every example satisfies $||x_i|| \le R$, then the perceptron makes at most $R^2/\gamma^2$ mistakes. (A small numerical demo follows this list.)

  • $R$: the distance of the farthest data point from the origin
  • $u$ and $\gamma$: the data has margin $\gamma$ with respect to $u$, so the data is separable; $\gamma$ is the complexity parameter that quantifies how separable the data is
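
A small numerical demo of the bound on synthetic separable data, using the $y \in \{-1, +1\}$ convention; the data generation and the separator $u$ are assumptions made for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)
u = np.array([1.0, -2.0]); u /= np.linalg.norm(u)        # known unit separator (assumed)

# Generate linearly separable data with labels y in {-1, +1} and a guaranteed margin
X = rng.uniform(-5, 5, size=(500, 2))
y = np.sign(X @ u)
keep = np.abs(X @ u) > 0.5
X, y = X[keep], y[keep]

R = np.max(np.linalg.norm(X, axis=1))                    # farthest point from the origin
gamma = np.min(y * (X @ u))                              # margin of the data w.r.t. u

# Run the threshold perceptron and count mistakes
w, mistakes = np.zeros(2), 0
for _ in range(50):                                      # epochs
    for x_i, y_i in zip(X, y):
        if y_i * np.dot(w, x_i) <= 0:                    # mistake -> update
            w += y_i * x_i
            mistakes += 1

print(f"mistakes = {mistakes}, bound R^2/gamma^2 = {R**2 / gamma**2:.1f}")
```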

Variants of Perceptron

  • Hyperparameter: number of training epochs $T$
  • Margin Perceptron: pick a positive $\eta$ and update $w$ when
    $\frac{y_i w^T x_i}{||w||} < \eta$
  • Voted Perceptron: keep every intermediate weight vector $w_i$ together with a count $c_i$ of the examples it survived, and return all $(w_i, c_i)$ pairs. The prediction is
    $\mathrm{sgn}\left(\sum^k_{i=1} c_i \cdot \mathrm{sgn}(w_i^T x)\right)$
  • Average Perceptron: after processing each example, accumulate $a \leftarrow a + w$ ($a$ is initialized to $0$), and return $a$. The prediction is
    $\mathrm{sgn}(a^T x) = \mathrm{sgn}\left(\sum^k_{i=1} c_i w_i^T x\right)$
    (a minimal sketch of the averaged variant follows this list)
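
A minimal sketch of the averaged variant, again with $y \in \{-1, +1\}$ labels; the function name and toy data are illustrative:

```python
import numpy as np

def averaged_perceptron(X, y, epochs=10):
    """Averaged perceptron: accumulate the weight vector after every example, predict with the sum."""
    w = np.zeros(X.shape[1])
    a = np.zeros(X.shape[1])                 # running sum of weight vectors, initialized to 0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:    # mistake-driven update
                w += y_i * x_i
            a += w                           # a <- a + w after every example
    return a

# Prediction is sgn(a^T x)
X = np.array([[1, 2], [2, 1], [-1, -2], [-2, -1]], dtype=float)
y = np.array([1, 1, -1, -1])
a = averaged_perceptron(X, y)
print(np.sign(X @ a))                        # expected: [ 1.  1. -1. -1.]
```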

