Normal Equation:
Given the feature matrix X (with a leading column of ones) and the result vector y,
and the hypothesis h(x) = θ0 + θ1*x1 + ... + θn*xn,
compute pinv(X'*X)*X'*y to get the θ vector that minimizes J(θ) directly, with no iteration.
Theory: set the partial derivative of J(θ) with respect to every θj equal to 0 and solve for θ.
(Feature scaling is not necessary with this method.)
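As a sketch of where the formula comes from, assuming the usual squared-error cost (the cost is not written out in these notes, so its exact form here is an assumption):

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,(X\theta - y)^{T}(X\theta - y) \\
\nabla_{\theta} J(\theta) &= \frac{1}{m}\,X^{T}(X\theta - y) = 0 \\
X^{T}X\,\theta &= X^{T}y \\
\theta &= (X^{T}X)^{-1}X^{T}y \quad \text{(when } X^{T}X \text{ is invertible)}
\end{aligned}
```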
Example (data generated from h = 0 + 5*X1 + 7*X2):
X = [1 2 4; 1 8 6; 1 3 7]
y = [38;82;64]
pinv(X' * X) * X' * y :
ans =
-1.2221e-012
5.0000e+000
7.0000e+000
(The first entry, -1.2221e-012, is 0 up to rounding error, so θ = [0; 5; 7] as expected.)
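A quick way to check the result, as a minimal sketch: multiply X by the computed θ and compare with y. The variable names (theta, residual, theta_backslash) are illustrative and not part of the original notes.

```octave
% Data from the example: y was generated by h = 0 + 5*X1 + 7*X2
X = [1 2 4; 1 8 6; 1 3 7];
y = [38; 82; 64];

% Normal equation
theta = pinv(X' * X) * X' * y;

% Predictions should reproduce y up to rounding error
residual = X * theta - y;
disp(max(abs(residual)));                  % a very small number, close to 0

% For comparison: the backslash operator solves the same least-squares
% problem without forming X'*X explicitly
theta_backslash = X \ y;
disp(max(abs(theta - theta_backslash)));   % also close to 0
```

The backslash form is usually the better habit in practice because it avoids squaring the condition number of X, but it gives the same θ here.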
Disadvantage:
Need to compute pinv(X'*X), which costs about O(n^3) for n features.
Slow if n is very large.
Notice:
Even if X'*X cannot be inverted (it is singular), using pinv() instead of inv() still returns a valid least-squares θ (the one with the smallest norm).
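To illustrate, here is a small sketch on a made-up data set with a deliberately redundant feature (the third column is exactly twice the second). rank() shows that X'*X is singular, yet pinv() still returns a θ that fits the data.

```octave
% Hypothetical data: column 3 is exactly 2 * column 2, so X'*X is singular
X = [1 1 2; 1 2 4; 1 3 6; 1 4 8];
y = [3; 5; 7; 9];            % generated by y = 1 + 2*X1

A = X' * X;
disp(rank(A));               % 2, although A is 3x3

% inv(A) would warn that the matrix is singular and its output is not usable;
% pinv(A) returns the pseudo-inverse instead
theta = pinv(A) * X' * y;

% The predictions still match y up to rounding error
disp(max(abs(X * theta - y)));
```

Among all θ that fit this data, pinv() picks the one with the smallest norm, which here is [1; 0.4; 0.8].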
It is uncommon for X'*X to be non-invertible, but it can happen.
The likely causes are:
1. Some features are identical or linearly dependent. For example, meters and feet measure the same length (1 m ≈ 3.28 ft), so with x in meters and y in feet the data only tells us that θx + 3.28θy equals some exact number; we can't determine θx and θy individually, because infinitely many pairs satisfy it.
2. The number of training examples (m) is smaller than the number of features (n). For example, we have 10 examples and 100 features. However, there is a technique (regularization) that solves this problem and lets us fit an h(x) with many variables from a small data set.
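The second case can be sketched with made-up numbers: 2 examples and 3 features give a 4x4 matrix X'*X whose rank cannot exceed 2. Adding any null-space direction of X to θ leaves the predictions unchanged, which is why θ is not uniquely determined. The data and the names theta_a, theta_b and N are illustrative.

```octave
% Hypothetical data: m = 2 examples, n = 3 features (plus the column of ones)
X = [1 2 4 1; 1 8 6 3];
y = [10; 20];

disp(rank(X' * X));          % 2, but X'*X is 4x4, so it is not invertible

% pinv picks the solution with the smallest norm
theta_a = pinv(X' * X) * X' * y;

% Any vector in the null space of X can be added without changing X*theta
N = null(X);                 % orthonormal basis of the null space (4x2 here)
theta_b = theta_a + 5 * N(:, 1);

disp(max(abs(X * theta_a - y)));       % close to 0
disp(max(abs(X * theta_b - y)));       % close to 0 as well
disp(norm(theta_a) < norm(theta_b));   % 1: theta_a has the smaller norm
```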
Technique Explanation:
This technique also reduces overfitting by adding a penalty on the size of θ to J(θ).
(NOTICE! We only penalize θ1 to θn; we don't penalize θ0.)
So J(θ) looks like this:
J(θ) = (1/(2m)) * [ Σ_{i=1..m} (h(x(i)) - y(i))^2 + λ * Σ_{j=1..n} θj^2 ]
And the normal equation changes to:
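newM = eye(n + 1);    % (n+1)x(n+1) identity, n = number of features
newM(1,1) = 0;        % do not penalize θ0
pinv(X'*X + lambda*newM)*X'*y    % lambda holds the value of λ (> 0)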
It can be proved that X'*X + λ*newM is always invertible when λ > 0 (given that the first column of X is the column of ones).
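A sketch of the argument, writing M for newM and v = (v_0, v_1, ..., v_n) for an arbitrary vector, and assuming the first column of X is the column of ones:

```latex
\begin{aligned}
v^{T}(X^{T}X + \lambda M)\,v &= \lVert Xv \rVert^{2} + \lambda \sum_{j=1}^{n} v_j^{2} \;\ge\; 0 \\
\text{If this equals } 0 &: \quad v_1 = \dots = v_n = 0 \ \text{ and } \ Xv = v_0 \cdot \mathbf{1} = 0 \;\Rightarrow\; v_0 = 0 \\
\Rightarrow\quad X^{T}X + \lambda M &\ \text{ is positive definite, hence invertible, for } \lambda > 0
\end{aligned}
```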
So, with λ > 0, we can now get a well-defined θ even from a small data set!
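Putting this together, a minimal sketch on a made-up data set with fewer examples than features. The value lambda = 0.1 is arbitrary and only for illustration, and the names A and theta are not from the original notes.

```octave
% Hypothetical data: m = 2 examples, n = 3 features (plus the column of ones)
X = [1 2 4 1; 1 8 6 3];
y = [10; 20];
n = size(X, 2) - 1;          % number of features
lambda = 0.1;                % illustrative value; must be > 0

newM = eye(n + 1);
newM(1, 1) = 0;              % theta0 is not penalized

A = X' * X + lambda * newM;
disp(rank(X' * X));          % 2: singular without the penalty
disp(rank(A));               % 4: full rank with lambda > 0

% A is invertible, so a plain linear solve gives the same theta as pinv
theta = A \ (X' * y);
disp(max(abs(theta - pinv(A) * X' * y)));   % close to 0
```

A larger lambda shrinks θ1 to θn more strongly, so in practice its value is something to tune rather than fix in advance.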