From the University of Waterloo
https://www.youtube.com/watch?v=dXxuCARJ1CY&list=PLdAoL1zKcqTW-uzoSVBNEecKHsnug_M0k&index=16
Circumstances forced me to properly brush up on Machine Learning. Since I like this professor, I am going through his course from the beginning. These notes are essentially transcriptions of his lectures, plus the University of Utah Machine Learning slides; I will reorganize them later to make the content more systematic.
1 Perceptron
The perceptron is an online learning algorithm that is very widely used and easy to implement.
Idea: mimic the brain to do computation
- The brain is made up of nuclei, synapses, dendrites, and axons. In some ways it looks like a computer: neurons play the role of gates, and signals are electrical. The brain computes in parallel, whereas a computer computes sequentially (or with a mix of sequential and parallel computation).
- However, the brain is robust while computers are fragile: if a single gate stops working, the computer crashes.
Artificial neural network
- Nodes: neurons
- Links: synapses
ANN Unit
- For each unit $j$:
  - Weight $W_{ji}$: the strength of the link from unit $i$ to unit $j$
  - Input signals $x_i$ weighted by $W_{ji}$ are linearly combined to produce a new signal $a_j$:
    $$a_j = \sum_i W_{ji} x_i + w_0 = W_j \bar{x}$$
- Activation function $h$ produces the output signal $y_j$ in a non-linear way:
  $$y_j = h(a_j)$$
  - Should be non-linear, otherwise the network is just a linear function
  - Often chosen to mimic firing in neurons: a unit should be 'active' (output near 1) when fed the 'right' inputs and 'inactive' (output near 0) when fed the 'wrong' inputs
- Common activation functions:
  - Threshold activation function
  - Sigmoid
- Can we design Boolean functions (AND, OR, NOT) using the threshold activation function? Yes; see the sketch below.
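A minimal sketch of such threshold units, assuming inputs in {0, 1}; the specific weights and biases below are one of many valid choices, not taken from the lecture:

```python
import numpy as np

def threshold_unit(weights, bias):
    """Build a unit computing h(a) where a = w.x + bias and
    h is the threshold activation: output 1 if a > 0, else 0."""
    return lambda x: int(np.dot(weights, x) + bias > 0)

# Hypothetical weight choices that realize the Boolean gates:
AND = threshold_unit([1.0, 1.0], -1.5)   # fires only when both inputs are 1
OR  = threshold_unit([1.0, 1.0], -0.5)   # fires when at least one input is 1
NOT = threshold_unit([-1.0], 0.5)        # inverts its single input

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND([a, b]), "OR:", OR([a, b]))
print("NOT 0:", NOT([0]), "NOT 1:", NOT([1]))
```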
Network Structures
- Feed-forward network
  - Directed acyclic graph
  - No internal state
  - Simply computes outputs from inputs
- Recurrent network
  - Directed cyclic graph
  - Dynamical system with internal states
  - Can memorize information
  - Popular in NLP, where inputs have varied lengths: the cyclic part lets the network adapt to different lengths (see the sketch below)
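A minimal sketch of that idea, assuming a plain tanh recurrence (the weight shapes and names are illustrative, not from the lecture): the same cell is applied at every step, so sequences of any length reuse one set of weights.

```python
import numpy as np

def rnn_state(xs, W_h, W_x, h0):
    """Run a simple recurrent cell h_t = tanh(W_h h_{t-1} + W_x x_t)
    over a variable-length sequence xs and return the final state."""
    h = h0
    for x_t in xs:
        h = np.tanh(W_h @ h + W_x @ x_t)  # same weights at every step
    return h

# Works unchanged for sequences of length 3 or 30:
rng = np.random.default_rng(0)
W_h, W_x, h0 = rng.normal(size=(4, 4)), rng.normal(size=(4, 2)), np.zeros(4)
print(rnn_state(rng.normal(size=(3, 2)), W_h, W_x, h0))
print(rnn_state(rng.normal(size=(30, 2)), W_h, W_x, h0))
```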
2 Feed-forward Network
Perceptron: a single-layer feed-forward network.
(Figure: shades/colors indicate different values/magnitudes; lines indicate higher weights.)
3 Supervised learning algorithms for neural networks
- Given a list of $(x, y)$ pairs
- Train a feed-forward ANN
  - to compute the proper outputs $y$ when fed with inputs $x$
  - Training consists of adjusting the weights $W_{ji}$
Threshold Perceptron Learning
- Learning is done separately for each unit $j$, since units do not share weights
- Perceptron learning for unit $j$ (a code sketch follows below):
  - For each $(x, y)$ pair, do:
    - Case 1: correct output produced: $\forall_i\ W_{ji} \leftarrow W_{ji}$
    - Case 2: output produced is 0 instead of 1: add $x_i$, i.e. $\forall_i\ W_{ji} \leftarrow W_{ji} + x_i$
    - Case 3: output produced is 1 instead of 0: subtract $x_i$, i.e. $\forall_i\ W_{ji} \leftarrow W_{ji} - x_i$
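A minimal sketch of this mistake-driven loop for a single unit, assuming binary labels in {0, 1} and a bias feature folded into each input vector (both assumptions mine):

```python
import numpy as np

def train_threshold_perceptron(X, y, epochs=10):
    """Threshold perceptron learning for one unit: add x when the unit
    outputs 0 but the target is 1, subtract x when it outputs 1 but the
    target is 0, and leave the weights unchanged when it is correct."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_n, y_n in zip(X, y):
            out = int(w @ x_n > 0)        # threshold activation
            if out == 0 and y_n == 1:     # case 2: add the input
                w += x_n
            elif out == 1 and y_n == 0:   # case 3: subtract the input
                w -= x_n
    return w

# Learn OR from its truth table (bias feature of 1 appended):
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])
w = train_threshold_perceptron(X, y)
print([int(w @ x > 0) for x in X])  # expected [0, 1, 1, 1]
```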
Sigmoid Perceptron Learning
- Represent ‘soft’ linear separators
- Same hypothesis space as logistic regression
- Possible objectives:
  - Minimum squared error:
    $$E(w) = \frac{1}{2}\sum_n E_n(w)^2 = \frac{1}{2}\sum_n \left(y_n - \sigma(w^T \bar{x}_n)\right)^2$$
  - Maximum likelihood (same algorithm as for logistic regression)
  - Maximum a posteriori hypothesis
  - Bayesian learning
- Gradient (for minimum squared error):
  $$\frac{\partial E}{\partial w_i} = \sum_n E_n(w)\frac{\partial E_n}{\partial w_i} = -\sum_n E_n(w)\,\sigma'(w^T \bar{x}_n)\,x_i = -\sum_n E_n(w)\,\sigma(w^T \bar{x}_n)\left(1-\sigma(w^T \bar{x}_n)\right)x_i$$
  since for the sigmoid function, $\sigma' = \sigma(1-\sigma)$.
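This identity can be verified in one line from the definition of the sigmoid:

$$\sigma(z) = \frac{1}{1+e^{-z}} \;\Longrightarrow\; \sigma'(z) = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}} = \sigma(z)\left(1-\sigma(z)\right)$$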
Sequential Gradient Descent in perceptron learning
- Repeat
  - For each $(x_n, y_n)$ in the examples, do:
    $$E_n \leftarrow y_n - \sigma(w^T \bar{x}_n)$$
    $$w \leftarrow w + \eta\, E_n\, \sigma(w^T \bar{x}_n)\left(1-\sigma(w^T \bar{x}_n)\right)\bar{x}_n$$
    where $\eta$ is the learning rate
- Until some stopping criterion is satisfied
- Return the learnt network
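A minimal sketch of this sequential update, assuming targets in [0, 1], a fixed number of epochs as the stopping criterion, and a bias feature folded into $\bar{x}_n$ (assumptions mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_perceptron(X, y, eta=0.5, epochs=1000):
    """Sequential gradient descent on the squared error of a sigmoid
    unit: w <- w + eta * E_n * sigma * (1 - sigma) * x_n."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_n, y_n in zip(X, y):
            out = sigmoid(w @ x_n)
            err = y_n - out                        # E_n
            w += eta * err * out * (1 - out) * x_n  # gradient step
    return w
```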
Notes:
- Prediction $= \mathrm{sgn}(w^T x)$
- The weights are updated only on errors, so this is a mistake-driven algorithm
Geometric Representation (from the University of Utah slides)
Convergence theorem:
If there exists a set of weights consistent with the data (i.e., the data is linearly separable), then the perceptron algorithm will converge.
Cycling theorem
If the training data is not separable, then the learning algorithm will eventually repeat the same set of weights and enter an infinite loop (it never converges).
Mistake Bound Theorem
- $R$: the distance from the origin to the farthest data point
- $u$ and $\gamma$: the data has margin $\gamma$ with respect to a unit weight vector $u$, so the data is separable; $\gamma$ is the complexity parameter that measures how separable the data is
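With these symbols, the classical statement of the bound (supplied here for completeness; it is the standard Novikoff result) is that the perceptron makes at most

$$\left(\frac{R}{\gamma}\right)^2$$

mistakes on linearly separable data.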
Variants of Perceptron
- Hyperparameter: the number of training epochs $T$
- Margin Perceptron: pick a positive $\eta$ and update $w$ whenever
  $$\frac{y_i w^T x_i}{||w||} < \eta$$
- Voted Perceptron: after each weight update, update the count $c_i$ of the current weight vector. Return the pairs $(w_i, c_i)$. The prediction is
  $$\mathrm{sgn}\left(\sum_{i=1}^k c_i \cdot \mathrm{sgn}(w_i^T x)\right)$$
- Average Perceptron: after each weight update, accumulate $a \leftarrow a + w$ ($a$ is initialized to $0$). Return $a$. The prediction is
  $$\mathrm{sgn}(a^T x) = \mathrm{sgn}\left(\sum_{i=1}^k c_i w_i^T x\right)$$
  (a code sketch follows below)
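A minimal sketch of the average perceptron, assuming labels in {-1, +1} and a bias feature folded into each input (assumptions mine); here $a$ is accumulated after every example, one common formulation:

```python
import numpy as np

def train_average_perceptron(X, y, T=10):
    """Average perceptron: run the usual mistake-driven updates on w,
    accumulate a <- a + w as training proceeds, and return a.
    The prediction is sgn(a^T x)."""
    w = np.zeros(X.shape[1])
    a = np.zeros(X.shape[1])
    for _ in range(T):                    # T training epochs
        for x_n, y_n in zip(X, y):
            if y_n * (w @ x_n) <= 0:      # mistake: standard update
                w += y_n * x_n
            a += w                        # running sum of weight vectors
    return a

def predict(a, x):
    return np.sign(a @ x)
```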