CMU 11-785 L02 What can a network represent

Latest recommended article published on 2021-08-10 16:52:37

zealscott


Category: CMU 11-785. Tags: deep learning, neural networks, machine learning

Article link: https://blog.csdn.net/crazy_scott/article/details/104649218


Perceptron

Threshold unit
- “Fires” if the weighted sum of inputs exceeds a threshold
Soft perceptron
- Uses a sigmoid function instead of a hard threshold at the output
- Activation: The function that acts on the weighted combination of inputs (and threshold)
Affine combination
- Differs from a linear combination: it includes a bias term, so zero does not have to map to zero.
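
The two units described above can be sketched in a few lines. This is a minimal illustration, not the lecture's code: the function names are made up here, and the unit is assumed to fire when the weighted sum is greater than or equal to the threshold.

```python
import math

def affine(x, w, b):
    """Affine combination: weighted sum of the inputs plus a bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def threshold_unit(x, w, threshold):
    """Hard perceptron: outputs 1 when the weighted sum reaches the threshold.

    Assumption: ">=" is used as the firing condition.
    """
    return 1 if affine(x, w, -threshold) >= 0 else 0

def soft_perceptron(x, w, threshold):
    """Soft perceptron: a sigmoid replaces the hard threshold."""
    z = affine(x, w, -threshold)
    return 1.0 / (1.0 + math.exp(-z))
```

Note that `affine([0, 0], [1, 1], 3)` returns the bias, not zero, which is exactly what separates an affine map from a linear one.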

Multi-layer perceptron

Depth
- Is the length of the longest path from a source to a sink
- Deep: Depth greater than 2
Inputs/Outputs are real or Boolean stimuli
What can this network compute?

A perceptron can model any simple binary Boolean gate
- Use weights of 1 or -1 (with a suitable threshold) to model the gate
- The universal AND gate: $(\bigwedge_{i=1}^{L} X_{i}) \wedge(\bigwedge_{i=L+1}^{N} \bar{X}_{i})$
- The universal OR gate: $(\bigvee_{i=1}^{L} X_{i}) \vee(\bigvee_{i=L+1}^{N} \bar{X}_{i})$
- A single perceptron cannot compute an XOR
MLPs can compute the XOR
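
As a sketch of the gates above, the following assumes a threshold unit that fires when the weighted sum is at least the threshold. The thresholds `L` and `L - N + 1` and the three-unit XOR (OR, NAND, then AND) are one possible choice of weights, and the function names are illustrative only.

```python
def perceptron(x, w, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else 0

def universal_and(x, L):
    """AND of x[0..L-1] together with the complements of x[L..N-1]."""
    w = [1] * L + [-1] * (len(x) - L)
    return perceptron(x, w, L)           # the sum reaches L only on an exact match

def universal_or(x, L):
    """OR of x[0..L-1] together with the complements of x[L..N-1]."""
    n = len(x)
    w = [1] * L + [-1] * (n - L)
    return perceptron(x, w, L - n + 1)   # fails only when every literal is false

def xor(x, y):
    """XOR built from three perceptrons: two hidden units and one output."""
    h1 = perceptron([x, y], [1, 1], 1)      # x OR y
    h2 = perceptron([x, y], [-1, -1], -1)   # NOT (x AND y)
    return perceptron([h1, h2], [1, 1], 2)  # h1 AND h2
```

The XOR needs the hidden layer because no single weighted sum separates the inputs (0,1) and (1,0) from (0,0) and (1,1).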

MLPs are universal Boolean functions
- Can compute any Boolean function
A Boolean function is just a truth table
- So it can be expressed in disjunctive normal form, for example:
- $\begin{aligned} Y=& \bar{X}_{1} \bar{X}_{2} X_{3} X_{4} \bar{X}_{5}+\bar{X}_{1} X_{2} \bar{X}_{3} X_{4} X_{5}+\bar{X}_{1} X_{2} X_{3} \bar{X}_{4} \bar{X}_{5}+\\ & X_{1} \bar{X}_{2} \bar{X}_{3} \bar{X}_{4} X_{5}+X_{1} \bar{X}_{2} X_{3} X_{4} X_{5}+X_{1} X_{2} \bar{X}_{3} \bar{X}_{4} X_{5} \end{aligned}$
- In this case, the hidden layer needs one neuron per term of the formula, i.e. 6 neurons, followed by an OR output unit.
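
The DNF construction can be checked mechanically. The sketch below encodes each term of the formula above as a bit pattern over $(X_1, \dots, X_5)$, gives each term one hidden threshold unit, and ORs the hidden outputs. The names `TERMS` and `dnf_mlp` are hypothetical, and the firing rule (sum at least the threshold) is an assumption.

```python
def perceptron(x, w, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else 0

# One entry per DNF term: 1 for a plain literal, 0 for a complemented one.
TERMS = [
    (0, 0, 1, 1, 0),
    (0, 1, 0, 1, 1),
    (0, 1, 1, 0, 0),
    (1, 0, 0, 0, 1),
    (1, 0, 1, 1, 1),
    (1, 1, 0, 0, 1),
]

def dnf_mlp(x, terms=TERMS):
    """One-hidden-layer MLP: one AND unit per term, then an OR output unit."""
    hidden = []
    for term in terms:
        w = [1 if bit else -1 for bit in term]
        # The weighted sum equals sum(term) only when x matches the term exactly.
        hidden.append(perceptron(x, w, sum(term)))
    return perceptron(hidden, [1] * len(hidden), 1)
```

Because every hidden unit recognizes exactly one row of the truth table, the hidden layer grows with the number of rows whose output is 1.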

A one-hidden-layer MLP is a Universal Boolean Function
- But the number of perceptrons required can be exponential in $N$: $2^N$ in the worst case
How about depth?
- A deep network will require only $3 (N - 1)$ perceptrons, linear in $N$, to express the same function
- Because XOR is associative, the units can be arranged in $2\log_2 N$ layers
- e.g. to model $X \oplus Y \oplus Z$
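
To make the counts concrete, here is a rough sketch that reduces $N$ bits with a tree of two-input XORs, each built from 3 perceptrons in 2 layers, and tallies units and layers along the way. The helper names are made up for illustration, and the layer count matches $2\log_2 N$ exactly only when $N$ is a power of two.

```python
def perceptron(x, w, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else 0

def xor_gate(x, y):
    """Two-input XOR from 3 perceptrons arranged in 2 layers."""
    h1 = perceptron([x, y], [1, 1], 1)      # x OR y
    h2 = perceptron([x, y], [-1, -1], -1)   # NOT (x AND y)
    return perceptron([h1, h2], [1, 1], 2)  # h1 AND h2

def parity_tree(bits):
    """Pairwise XOR reduction. Returns (output, perceptrons used, layers used)."""
    layer, units, depth = list(bits), 0, 0
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            nxt.append(xor_gate(layer[i], layer[i + 1]))
            units += 3
        if len(layer) % 2:          # an odd element passes through unchanged
            nxt.append(layer[-1])
        layer = nxt
        depth += 2                  # each round of XORs adds two perceptron layers
    return layer[0], units, depth
```

For 8 inputs this uses 7 XOR gates, so 21 perceptrons in 6 layers, compared with the exponential width of the one-hidden-layer version.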

The challenge of depth
- Using only $K$ hidden layers will require $O(2^{CN})$ neurons in the $K$-th layer, where $C = 2^{-(K-1)/2}$
- A network with fewer than the minimum required number of neurons cannot model the function
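
As a quick worked example, plugging a few depths into the bound above (pure arithmetic on the stated formula, nothing beyond it) shows how the exponent shrinks as layers are added:
- $K = 1$: $C = 2^{0} = 1$, giving $O(2^{N})$
- $K = 3$: $C = 2^{-1} = 1/2$, giving $O(2^{N/2})$
- $K = 5$: $C = 2^{-2} = 1/4$, giving $O(2^{N/4})$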

Using OR to create more decision boundaries
- ORing convex regions can compose arbitrarily complex decision boundaries
- Even with a one-hidden-layer MLP
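
A small sketch of the OR composition: each convex region is an AND of half-plane perceptrons, and an OR unit merges the regions. Note the assumption here: this uses two hidden layers (half-planes, then ANDs) and does not reproduce the one-hidden-layer construction mentioned above. The helpers `box`, `in_convex_region`, and `in_union` are illustrative names.

```python
def perceptron(x, w, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else 0

def in_convex_region(point, half_planes):
    """AND of half-plane units; each half-plane is (w, t), meaning w . x >= t."""
    hidden = [perceptron(point, w, t) for w, t in half_planes]
    return perceptron(hidden, [1] * len(hidden), len(hidden))

def box(x0, x1, y0, y1):
    """Axis-aligned rectangle expressed as four half-planes."""
    return [((1, 0), x0), ((-1, 0), -x1), ((0, 1), y0), ((0, -1), -y1)]

def in_union(point, regions):
    """OR over convex regions, giving a non-convex, disconnected decision region."""
    hits = [in_convex_region(point, r) for r in regions]
    return perceptron(hits, [1] * len(hits), 1)

# Two disjoint unit squares form a single disconnected decision region.
REGIONS = [box(0, 1, 0, 1), box(2, 3, 0, 1)]
```

Calling `in_union((0.5, 0.5), REGIONS)` fires, while a point in the gap such as `(1.5, 0.5)` does not.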

A naïve one-hidden-layer neural network will require infinitely many hidden neurons
Construct a basic unit and add more layers to decrease the number of neurons
The number of neurons required in a shallow network is potentially exponential in the dimensionality of the input

A one-hidden-layer MLP can model an arbitrary function of a single input
MLPs can actually compose arbitrary functions in any number of dimensions
- Even without “activation”
Activation
- A universal map from the entire domain of input values to the entire range of the output activation
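
One way to see the single-input case is to build the function out of narrow pulses: two threshold units with opposite output weights form a pulse, and a plain sum (no output activation) stacks the pulses. The sketch below is an assumption-laden illustration, not the lecture's construction verbatim: the names `build_pulse_mlp` and `mlp` are made up, and each pulse height is taken as the target value at the pulse midpoint.

```python
def step(x, t):
    """Hidden threshold unit on a scalar input: 1.0 if x >= t, else 0.0."""
    return 1.0 if x >= t else 0.0

def build_pulse_mlp(f, lo, hi, n_pulses):
    """Return hidden units as (threshold, output_weight) pairs.

    Each pulse on [a, b) uses two units: +height at a and -height at b.
    """
    width = (hi - lo) / n_pulses
    units = []
    for i in range(n_pulses):
        a = lo + i * width
        b = a + width
        height = f((a + b) / 2)     # assumption: sample f at the pulse midpoint
        units.append((a, height))
        units.append((b, -height))
    return units

def mlp(x, units):
    """Output unit: plain weighted sum of the hidden threshold outputs."""
    return sum(w * step(x, t) for t, w in units)

# Approximate f(x) = x^2 on [0, 1) with 100 pulses, i.e. 200 hidden units.
UNITS = build_pulse_mlp(lambda x: x * x, 0.0, 1.0, 100)
```

Making the pulses narrower reduces the error but increases the number of hidden units, which is the trade-off the notes point at.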

Deeper networks will require far fewer neurons for the same approximation error
Sufficiency of architecture
- Not every architecture can represent every function
Continuous activation functions result in graded output at the layer
- To capture information “missed” by the lower layer

Narrow layers can still pass information to subsequent layers if the activation function is sufficiently graded
- But will require greater depth, to permit later layers to capture patterns
Capacity of the network
- Information or Storage: how many patterns can it remember
- VC dimension: bounded by the square of the number of …weights… in the network
- A straightforward measure: the largest number of disconnected convex regions it can represent
A network with insufficient capacity cannot exactly model a function that requires a greater minimal number of convex hulls than the capacity of the network