Neural Networks
A much better way to learn complex hypotheses when the number of features n is large, compared to the algorithms above.
Neuron model
Logistic unit
x0 is called the bias unit (always equal to 1)
Sigmoid (logistic) activation function
θ (the parameters) are called weights
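The logistic unit computes the same hypothesis as logistic regression:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$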
Neural Network
- **input layer**
- **hidden layer**
- **output layer**
We apply each row of the parameters to our inputs to obtain the value for one activation node.
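For example, with three inputs and one hidden layer, the value of the first activation node in layer 2 is:

$$a_1^{(2)} = g\left(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\right)$$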
If a network has s_j units in layer j and s_{j+1} units in layer j+1, then Θ(j) has dimension s_{j+1} × (s_j + 1). The +1 comes from the addition in Θ(j) of the “bias nodes”, x0 and Θ0(j). In other words, the output nodes do not include the bias node, while the inputs do.
Forward propagation
We can then add a bias unit (equal to 1) to layer j after we have computed a(j).
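A minimal NumPy sketch of vectorized forward propagation for one example, assuming a single hidden layer; the dimensions and weight matrices (Theta1, Theta2) here are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """Forward propagation for a 3-layer network (one hidden layer)."""
    a1 = np.concatenate(([1.0], x))            # add bias unit x0 = 1
    z2 = Theta1 @ a1                           # z(2) = Theta(1) a(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # add bias unit to layer 2
    z3 = Theta2 @ a2                           # z(3) = Theta(2) a(2)
    return sigmoid(z3)                         # h_Theta(x) = a(3)

# Hypothetical dimensions: 3 inputs, 5 hidden units, 4 output classes
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((5, 4))  # 5 x (3+1); +1 for the bias column
Theta2 = rng.standard_normal((4, 6))  # 4 x (5+1)
print(forward_propagate(np.array([1.0, 2.0, 3.0]), Theta1, Theta2))
```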
Each additional layer computes more and more complex features of the input.
Multiclass Classification
To classify data into multiple classes, we let our hypothesis function return a vector of values.
We can define our set of resulting classes as y:
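For example, with four classes, each y is a one-hot vector:

$$y \in \left\{ \begin{bmatrix}1\\0\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\0\\1\end{bmatrix} \right\}$$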
- L = total number of layers in the network
- s_l = number of units (not counting the bias unit) in layer l
- K = number of output units/classes
**Binary classification:** y = 0 or 1; 1 output unit.
**Multi-class classification (K classes):** y ∈ ℝ^K; K output units (K ≥ 3).
Cost Function
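The regularized cost function that the sums below refer to, a generalization of the logistic regression cost:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + (1 - y_k^{(i)}) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big(\Theta_{j,i}^{(l)}\big)^2$$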
- the double sum simply adds up the logistic regression costs calculated for each cell in the output layer
- the triple sum simply adds up the squares of all the individual Θs in the entire network.
- the i in the triple sum does not refer to training example i
Backpropagation algorithm
Used to compute the partial derivatives ∂J(Θ)/∂Θ_ij^(l) of the cost function, which gradient descent (or an advanced optimizer) needs to minimize J(Θ).
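A rough NumPy sketch of backpropagation for the same hypothetical 3-layer network, computing the (unregularized) gradients for a single training example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_single(x, y, Theta1, Theta2):
    """Gradients of the (unregularized) cost for one example (x, y)."""
    # Forward pass, keeping the intermediate activations
    a1 = np.concatenate(([1.0], x))
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))
    a3 = sigmoid(Theta2 @ a2)

    # Backward pass: "error" terms delta for each layer
    delta3 = a3 - y                               # output-layer error
    delta2 = (Theta2.T @ delta3)[1:] * sigmoid(z2) * (1 - sigmoid(z2))
    # (drop the bias component, multiply by g'(z2))

    # Gradients have the same shapes as Theta1 and Theta2
    grad1 = np.outer(delta2, a1)
    grad2 = np.outer(delta3, a2)
    return grad1, grad2

# Hypothetical usage with random weights and a one-hot label
rng = np.random.default_rng(0)
g1, g2 = backprop_single(np.array([1.0, 2.0, 3.0]),
                         np.array([0.0, 1.0, 0.0, 0.0]),
                         rng.standard_normal((5, 4)),
                         rng.standard_normal((4, 6)))
```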
Unrolling parameters
Advanced optimizers expect the parameters as a single vector, so we unroll the matrices Θ(1), Θ(2), … into one long vector and reshape them back into matrices inside the cost function.
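A small sketch of unrolling and reshaping with NumPy, using the hypothetical shapes from the earlier sketches:

```python
import numpy as np

# Hypothetical shapes matching the earlier sketches
Theta1 = np.zeros((5, 4))
Theta2 = np.zeros((4, 6))

# Unroll: flatten both matrices into a single parameter vector
theta_vec = np.concatenate([Theta1.ravel(), Theta2.ravel()])

# Reshape back into matrices inside the cost function
Theta1_back = theta_vec[:5 * 4].reshape(5, 4)
Theta2_back = theta_vec[5 * 4:].reshape(4, 6)
```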
Put It Together
Pick a network architecture, then: randomly initialize the weights, implement forward propagation and the cost function, use backpropagation to compute the partial derivatives, and minimize J(Θ) with gradient descent or an advanced optimizer.
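Putting the pieces together, an end-to-end sketch in NumPy (batch gradient descent, one hidden layer; the names and hyperparameters here are hypothetical, not a definitive implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, hidden_units=5, alpha=0.5, iters=1000):
    """Train a 3-layer network on X (m x n) and one-hot labels Y (m x K)."""
    m, n = X.shape
    K = Y.shape[1]
    rng = np.random.default_rng(0)
    # 1. Randomly initialize weights (small values, to break symmetry)
    Theta1 = rng.uniform(-0.12, 0.12, (hidden_units, n + 1))
    Theta2 = rng.uniform(-0.12, 0.12, (K, hidden_units + 1))

    for _ in range(iters):
        # 2. Forward propagation on all m examples at once
        A1 = np.hstack([np.ones((m, 1)), X])            # add bias column
        Z2 = A1 @ Theta1.T
        A2 = np.hstack([np.ones((m, 1)), sigmoid(Z2)])
        A3 = sigmoid(A2 @ Theta2.T)                     # h_Theta(x), m x K

        # 3. Backpropagation: error terms and averaged gradients
        D3 = A3 - Y
        D2 = (D3 @ Theta2)[:, 1:] * sigmoid(Z2) * (1 - sigmoid(Z2))
        grad1 = D2.T @ A1 / m
        grad2 = D3.T @ A2 / m

        # 4. Gradient descent step
        Theta1 -= alpha * grad1
        Theta2 -= alpha * grad2
    return Theta1, Theta2
```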