1. What is wrong with back-propagation
- It requires labeled training data. - Almost all data is unlabeled
- Unless the weights are highly redundant, labels cannot possibly provide enough information
- The learning time does not scale well. - It’s very slow in networks with multiple hidden layers
- The neurons need to send two different types of signal (Forward pass: activation = a | Backward pass: δ = ∂Cost/∂z)
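The two signal types in the last bullet can be made concrete with a minimal sketch: a single sigmoid neuron under a squared-error cost C = (a − t)²/2. All numeric values (w, x, t) are hypothetical toy inputs, not from the notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy values: one weight, one input, one target.
w, x, bias, t = 0.5, 1.0, 0.0, 1.0

# Forward pass: the neuron sends its activation a.
z = w * x + bias
a = sigmoid(z)

# Backward pass: the neuron sends a different quantity,
# delta = dC/dz = (a - t) * sigmoid'(z), with sigmoid'(z) = a * (1 - a).
delta = (a - t) * a * (1 - a)
```

The point of the bullet is exactly this asymmetry: `a` and `delta` are computed by different rules and flow in opposite directions.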
2. How to overcome the limitations of back-propagation
One promising approach is to keep the efficiency of using a gradient method for adjusting the weights, but to use the gradient for modeling the structure of the sensory input rather than for predicting labels.
Adjust the weights to maximize the probability that a generative model would have produced the sensory input: try to learn p(image), not p(label | image)
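The contrast between the two objectives can be written as a standard maximum-likelihood formulation (my notation, not from the notes): the generative objective fits the inputs themselves, the discriminative one only the conditional of labels given inputs.

```latex
\underbrace{\max_{W} \sum_{n} \log p\!\left(v^{(n)}\right)}_{\text{generative: learn } p(\text{image})}
\qquad\text{vs.}\qquad
\underbrace{\max_{W} \sum_{n} \log p\!\left(y^{(n)} \mid v^{(n)}\right)}_{\text{discriminative: learn } p(\text{label}\mid\text{image})}
```

The generative objective uses every bit of the input vector as a training signal, which is why it sidesteps the "labels provide too little information" problem above.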
Weights → Energies → Probabilities
Each possible joint configuration of the visible and hidden units has a Hopfield “energy”. The energy is determined by the weights and biases.
The energy of a joint configuration of the visible and hidden units determines the probability that the network will choose that configuration.
- Restricted Boltzmann Machine