The line of thinking is:
(1) Modify the cost function of logistic regression, then write out the optimization objective.
(2) Discuss the nature of this new optimization objective.
(3) Kernel functions.
1 optimization objective
$$\min_\theta\; C\sum_{i=1}^{m}\Big[y_i\,\mathrm{cost}_1(\theta^{T}x_i)+(1-y_i)\,\mathrm{cost}_0(\theta^{T}x_i)\Big]+\frac{1}{2}\sum_{j=1}^{n}\theta_j^{2}$$
where:
C plays the role of $\frac{1}{\lambda}$ in regularization.
$\mathrm{cost}_1(z)$ and $\mathrm{cost}_0(z)$ are hinge-shaped replacements for the logistic costs, shown below: $\mathrm{cost}_1(z)$ is zero for $z\ge 1$, and $\mathrm{cost}_0(z)$ is zero for $z\le -1$.
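A minimal sketch of this objective in code, assuming the usual piecewise-linear hinge shapes for the two cost curves (the exact slopes are an assumption; the text only plots them):

```python
import numpy as np

def cost1(z):
    # Cost for y = 1: zero once z >= 1, linear penalty otherwise.
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Cost for y = 0: zero once z <= -1, linear penalty otherwise.
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    # C plays the role of 1/lambda from regularization: a large C weights
    # the data-fit term heavily, a small C weights the theta^2 term.
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)  # skip the bias theta_0
    return data_term + reg_term
```

A correctly classified point with margin (e.g. $z=2$ for $y=1$) contributes zero to the data term, so only the regularizer remains.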
2 nature of this new optimization objective
In this objective, if C is very large (effectively infinite), the data term must be driven to zero, so the objective reduces to minimizing $\frac{1}{2}\sum_{j=1}^{n}\theta_j^{2}$ subject to the conditions below.
We hope that:
For a positive point $y=1$: $\mathrm{cost}_1(\theta^{T}x_i)=0$, which requires $\theta^{T}x_i\ge 1$.
For a negative point $y=0$: $\mathrm{cost}_0(\theta^{T}x_i)=0$, which requires $\theta^{T}x_i\le -1$.
In this situation (C very large), the machine will choose the black line as the dividing line, because we require $\theta^{T}x_i\ge 1$ for positive points and $\theta^{T}x_i\le -1$ for negative points, instead of merely requiring $\theta^{T}x_i$ to be above or below 0.
This maximum-margin classifier provides a more reliable dividing line. We can imagine that if the dividing line were the green line or the pink line, it might misclassify points when there is a little noise.
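The margin claim can be checked numerically: once the closest points are pinned at $\theta^{T}x=\pm 1$, the geometric margin on each side of the boundary $\theta^{T}x=0$ is $1/\lVert\theta\rVert$, so minimizing $\sum_j\theta_j^{2}$ widens the margin. A minimal sketch (the two $\theta$ vectors are made up for illustration):

```python
import numpy as np

def margin(theta):
    # Distance from the boundary theta^T x = 0 to the line theta^T x = 1.
    return 1.0 / np.linalg.norm(theta)

theta_small_norm = np.array([0.5, 0.5])  # small norm -> wide margin
theta_large_norm = np.array([2.0, 2.0])  # large norm -> narrow margin

wide = margin(theta_small_norm)
narrow = margin(theta_large_norm)
```

The lower-norm $\theta$ yields the larger margin, which is exactly why the reduced objective $\frac{1}{2}\sum_j\theta_j^{2}$ prefers the black line.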
3 kernel
Kernel functions are used to deal with non-linear situations. The most commonly used kernels are the Gaussian kernel function and the linear kernel function. For example:
The Gaussian kernel function is:
$$f_i=\exp\left(-\frac{\lVert x-x_i\rVert^{2}}{2\sigma^{2}}\right)$$
Taking the landmark $x_1=(3,5)$, the function is a bump centered at that point. We find that if a point is close to $x_1=(3,5)$, the function value is close to 1; far from it, the value is close to 0. Using this function, we define a new optimization objective:
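As a quick check of the bump-shaped behaviour described above, here is a minimal sketch of the Gaussian similarity (the sample points are made up):

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma=1.0):
    # f = exp(-||x - landmark||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - landmark) ** 2) / (2 * sigma ** 2))

x1 = np.array([3.0, 5.0])  # the landmark from the text
near = gaussian_kernel(np.array([3.1, 5.0]), x1)  # close to x1 -> near 1
far = gaussian_kernel(np.array([9.0, 0.0]), x1)   # far from x1 -> near 0
```

At the landmark itself the similarity is exactly 1, and it decays toward 0 with squared distance.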
$$\min_\theta\; C\sum_{i=1}^{m}\Big[y_i\,\mathrm{cost}_1(\theta^{T}f^{(i)})+(1-y_i)\,\mathrm{cost}_0(\theta^{T}f^{(i)})\Big]+\frac{1}{2}\sum_{j}\theta_j^{2}$$
where:
$f^{(i)}$ is the vector of Gaussian-kernel similarities to the landmarks $x_i$. (One idea is to choose all the training samples as the landmarks $x_i$.)
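The "all samples as landmarks" idea can be sketched as follows: each input $x$ is mapped to a feature vector $f$ whose $i$-th entry is its Gaussian similarity to training sample $x_i$ (toy data, assumed $\sigma=1$):

```python
import numpy as np

def features(x, X_train, sigma=1.0):
    # Map x to f, where f[i] is the Gaussian similarity of x to landmark X_train[i].
    diffs = X_train - x                    # shape (m, n)
    sq_dists = np.sum(diffs ** 2, axis=1)  # shape (m,)
    return np.exp(-sq_dists / (2 * sigma ** 2))

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 5.0]])
f = features(np.array([3.0, 5.0]), X_train)
# The entry for the matching landmark is exactly 1; distant landmarks give
# values near 0, so f encodes "which landmarks is x near".
```

With m training samples there are m features, and the classifier predicts $y=1$ when $\theta^{T}f\ge 0$.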
Minimizing this optimization objective, we obtain $\theta$.
When $\sigma$ is high, the kernel varies smoothly and it may lead to underfitting; when $\sigma$ is low, the kernel is sharply peaked and it may lead to overfitting.
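The $\sigma$ trade-off is easy to see numerically: at a fixed distance from a landmark, a large $\sigma$ keeps the similarity high (a broad, smooth bump), while a small $\sigma$ drives it toward zero (a narrow spike). A hedged illustration with made-up values:

```python
import numpy as np

def similarity(dist, sigma):
    # Gaussian similarity as a function of distance to the landmark.
    return np.exp(-dist ** 2 / (2 * sigma ** 2))

d = 2.0                             # a fixed distance from the landmark
high = similarity(d, sigma=5.0)     # broad bump: still quite similar
low = similarity(d, sigma=0.5)      # narrow bump: essentially dissimilar
```

Broad bumps make many points look alike (risking underfitting); narrow bumps make the classifier respond only to points almost on top of a landmark (risking overfitting).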
If $f_i=x^{T}x_i$ (the plain inner product, with no non-linear mapping), we call it a linear kernel.
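For contrast with the Gaussian case, the linear kernel's similarity is just the dot product (toy vectors for illustration):

```python
import numpy as np

def linear_kernel(x, x_i):
    # Similarity to landmark x_i is the plain inner product x^T x_i.
    return float(x @ x_i)

s = linear_kernel(np.array([1.0, 2.0]), np.array([3.0, 4.0]))
```

This is equivalent to using the raw features directly, which is why it is sometimes described as using "no kernel".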
There are several other kernel functions, but they are used less commonly.