Let's use an example to motivate logistic regression.
Logistic Regression
(1) Introduction
Logistic regression is generally used to solve classification problems.
Classification
A linear model struggles with classification: its decision boundary shifts to the right as more positive examples are added on the right, so the fit is unstable.
The curve used for logistic regression is usually S-shaped.
(2) Sigmoid function
The steps:
The output can be interpreted as a probability.
When w·x + b >= 0, the probability is greater than 0.5,
so we might as well predict that y is 1 at this point.
And vice versa: when w·x + b < 0, we predict y = 0.
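The rule above can be sanity-checked with a minimal sketch (plain NumPy; the sample values are mine): the sigmoid crosses 0.5 exactly where w·x + b = 0.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# sigmoid(0) is exactly 0.5, so w.x + b >= 0  <=>  probability >= 0.5
print(sigmoid(0.0))          # 0.5
print(sigmoid(2.0) > 0.5)    # True  -> predict y = 1
print(sigmoid(-2.0) > 0.5)   # False -> predict y = 0
```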
For example:
The blue circles represent 0 and the red crosses represent 1.
Now suppose w1 is 1, w2 is 1, and b is -3:
w1 = 1, w2 = 1, b = -3
The equation w·x + b = 0 gives the decision boundary.
Substituting the parameters gives x1 + x2 = 3, the straight line in the diagram.
Now, 0 and 1 are nicely separated:
On the left side of the line, f is less than 0.5, so we predict 0.
On the right side of the line, f is greater than 0.5, so we predict 1.
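This worked example can be written out directly (a minimal sketch; the test points are mine):

```python
import numpy as np

w = np.array([1.0, 1.0])   # w1 = 1, w2 = 1
b = -3.0

def predict(x):
    z = np.dot(w, x) + b   # boundary is x1 + x2 = 3
    return 1 if z >= 0 else 0

print(predict(np.array([1.0, 1.0])))  # left of the line  -> 0
print(predict(np.array([3.0, 2.0])))  # right of the line -> 1
```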
Non-linear decision boundaries
Now we can use a non-linear decision boundary.
Just replace z in the original expression with w1*x1^2 + w2*x2^2 + b.
Suppose w1 is 1, w2 is 1, and b is -1:
w1 = 1, w2 = 1, b = -1
Substituting the parameters gives x1^2 + x2^2 = 1, which is the equation of a circle.
Now, 0 and 1 are nicely separated, too:
Inside the circle, f is less than 0.5, so we predict 0.
Outside the circle, f is greater than 0.5, so we predict 1.
So we can use even more complex non-linear decision boundaries:
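The circular boundary can be sketched the same way (test points are mine):

```python
w1, w2, b = 1.0, 1.0, -1.0

def predict(x1, x2):
    z = w1 * x1**2 + w2 * x2**2 + b   # boundary: x1^2 + x2^2 = 1
    return 1 if z >= 0 else 0

print(predict(0.0, 0.0))   # inside the circle  -> 0
print(predict(1.0, 1.0))   # outside the circle -> 1
```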
Cost function
First, let's take a look at the loss function.
How do we choose w and b?
(3) Cost function
For a positive example (true label y = 1):
if the model predicts f = 1, the loss is 0;
if f is 0.5, i.e. the predicted probability of the label being 1 is 50%, the loss is in the middle;
if f is 0.1, i.e. the predicted probability is 10%, the loss is higher;
as f tends to 0, the loss tends to infinity.
In other words, the closer the predicted probability is to 1, the smaller the loss and the more accurate the prediction;
and vice versa for y = 0.
Let's simplify the loss function first, and then compute the cost function,
where y is the label and f(x) is the predicted probability.
This function has a name:
binary cross-entropy
(the binary-classification cross-entropy loss)
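A minimal sketch of the binary cross-entropy cost (the sample labels and probabilities are mine; the clipping is only a numerical guard against log(0)):

```python
import numpy as np

def binary_cross_entropy(y, f):
    """y: true labels (0/1), f: predicted probabilities."""
    eps = 1e-12                      # avoid log(0)
    f = np.clip(f, eps, 1 - eps)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

y = np.array([1, 0, 1])
f = np.array([0.9, 0.1, 0.8])
loss = binary_cross_entropy(y, f)
print(loss)   # small, since the predictions match the labels well
```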
(4) Gradient descent
The update rule looks very similar in form to linear regression, but here f(x) is defined differently: it is the sigmoid of the linear model.
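A minimal sketch of these updates in plain NumPy (the toy data, learning rate, and iteration count are made up); note the update has the same algebraic form as linear regression, except that f is the sigmoid output:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# toy, linearly separable data (values are mine)
X = np.array([[0.5, 1.5], [1.0, 1.0], [3.0, 0.5], [2.0, 2.0]])
y = np.array([0, 0, 1, 1])
w, b, alpha = np.zeros(2), 0.0, 0.1

for _ in range(1000):
    f = sigmoid(X @ w + b)              # predictions
    err = f - y
    w -= alpha * (X.T @ err) / len(y)   # same form as linear regression,
    b -= alpha * err.mean()             # but f is the sigmoid output

preds = sigmoid(X @ w + b)
print(preds.round(2))
```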
(5) Regularization
Now we apply the regularization method to the logistic model.
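A minimal sketch of the regularized gradients, assuming an L2 penalty on w (and, as usual, not on b); the data and parameter values are mine:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def regularized_gradients(X, y, w, b, lam):
    """Gradients of the L2-regularized logistic cost (penalty on w only)."""
    m = len(y)
    err = sigmoid(X @ w + b) - y
    dw = X.T @ err / m + (lam / m) * w   # extra (lambda/m)*w term
    db = err.mean()                      # b is not regularized
    return dw, db

X = np.array([[1.0, 2.0], [2.0, 1.0]])
y = np.array([0, 1])
dw, db = regularized_gradients(X, y, np.array([1.0, -1.0]), 0.0, lam=1.0)
print(dw, db)
```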
(6) Sigmoid function code
import numpy as np

def sigmoid(z):
    """
    Compute the sigmoid of z

    Args:
        z (ndarray): A scalar, numpy array of any size.

    Returns:
        g (ndarray): sigmoid(z), with the same shape as z
    """
    g = 1 / (1 + np.exp(-z))
    return g