DS Wannabe Prep(2): how to code logistic regression in Python for ML Interviews day23

Introduction 

Logistic regression is NOT a regression model. It is a binary classification model.

For a data point (x_i, y_i), the model predicts the probability that the label is 1:

 P(y_{i}= 1|x_{i})

How does the model classify data?

1. Compute the probability of class '1' using the logistic function (sigmoid function):

\sigma (w^{T}x_{i}) = \frac{1}{1+e^{-w^{T}x_{i}}}

The sigmoid function maps any real-valued score into the valid probability interval [0, 1].

2. Predict the class by comparing that probability to a threshold (commonly 0.5).

The weights w (often written as \beta) are estimated from the training data.
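As a minimal sketch of these two steps in NumPy (the function names, the vectorized form, and the 0.5 default threshold are my own choices; the cost function in the next section assumes a predict helper along these lines):

import numpy as np

def sigmoid(z):
    # Map any real-valued score to a probability in (0, 1)
    return 1 / (1 + np.exp(-z))

def predict(features, weights):
    # Step 1: probability of class '1' for each row, sigmoid(X w)
    return sigmoid(np.dot(features, weights))

def classify(features, weights, threshold=0.5):
    # Step 2: turn probabilities into hard 0/1 class labels
    return (predict(features, weights) >= threshold).astype(int)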

Cost Function

The cost for logistic regression is the binary cross-entropy (log loss), averaged over the training set:

def cost_function(features, labels, weights):
    '''
    Binary cross-entropy (log loss)

    Features: (100, 3)
    Labels:   (100, 1)
    Weights:  (3, 1)
    Returns the average cost over all observations.
    Cost = -(labels*log(predictions) + (1-labels)*log(1-predictions)).sum() / len(labels)
    '''
    observations = len(labels)

    # Predicted probabilities of class 1: sigmoid(features . weights)
    predictions = predict(features, weights)

    # Error term when label = 1
    class1_cost = -labels*np.log(predictions)

    # Error term when label = 0
    class2_cost = (1-labels)*np.log(1-predictions)

    # Combine both costs
    cost = class1_cost - class2_cost

    # Average over all observations
    cost = cost.sum() / observations

    return cost
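A quick sanity check on synthetic data shaped like the docstring (the random data is purely illustrative; with all-zero weights every prediction is 0.5, so the cost should come out to roughly ln 2 ≈ 0.693):

np.random.seed(0)
demo_features = np.random.randn(100, 3)                # (100, 3)
demo_labels = np.random.randint(0, 2, size=(100, 1))   # (100, 1)
demo_weights = np.zeros((3, 1))                        # (3, 1)

print(cost_function(demo_features, demo_labels, demo_weights))   # ~0.6931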

Coding the logistic regression algorithm by hand

In this section, we see how to code the logistic trick and the logistic regression algorithm by hand. More generally, we’ll code the logistic regression algorithm for a dataset with n weights. The notation we use follows:

  • Features: x1, x2, … , xn
  • Label: y
  • Weights: w1, w2, … , wn
  • Bias: b

The score for a particular sentence is the sigmoid of the sum of the weight of each word (wi) times the number of times that word appears (xi), plus the bias (b). Notice that we use summation notation for the sum w1x1 + w2x2 + … + wnxn.

  • Prediction: ŷ = σ(w1x1 + w2x2 + … + wnxn + b) = σ(Σi wixi + b).

For our current problem, we’ll refer to xaack and xbeep as x1 and x2, respectively. Their corresponding weights are w1 and w2, and the bias is b.

We start by coding the sigmoid function, the score, and the prediction. Recall that the formula for the sigmoid function is

σ(x) = 1 / (1 + e^(-x)) = e^x / (1 + e^x).

import numpy as np
import random

def sigmoid(x):
    return np.exp(x)/(1+np.exp(x))
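One caveat (my own aside, not part of the original code): np.exp(x) overflows for large positive x, so a numerically stable variant branches on the sign of the input:

def sigmoid_stable(x):
    # For x >= 0 use 1 / (1 + e^(-x)); for x < 0 use e^x / (1 + e^x).
    # Both branches only ever exponentiate a non-positive number.
    if x >= 0:
        return 1 / (1 + np.exp(-x))
    exp_x = np.exp(x)
    return exp_x / (1 + exp_x)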

For the score function, we use the dot product between the features and the weights. Recall that the dot product between vectors (x1, x2, … , xn) and (w1, w2, … , wn) is w1x1 + w2x2 + … + wnxn.

def score(weights, bias, features):
    return np.dot(weights, features) + bias

Finally, recall that the prediction is the sigmoid activation function applied to the score.

def prediction(weights, bias, features):
    return sigmoid(score(weights, bias, features))
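For example, with made-up numbers (not from the original text), a sentence with word counts [2, 1], weights [1.0, 2.0], and bias -3.5 gets a prediction of about 0.62:

print(prediction([1.0, 2.0], -3.5, [2, 1]))   # sigmoid(1.0*2 + 2.0*1 - 3.5) = sigmoid(0.5) ≈ 0.62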

Now that we have the prediction, we can proceed to the log loss. Recall that the formula for the log loss is

log loss = –y ln(ŷ) – (1 – y) ln(1 – ŷ).

Let’s code that formula as follows:

def log_loss(weights, bias, features, label):
    pred = prediction(weights, bias, features)
    return -label*np.log(pred) - (1-label)*np.log(1-pred)
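Continuing the made-up example above, a point of class 1 predicted at about 0.62 contributes roughly -ln(0.62) ≈ 0.47 to the loss, and the same prediction is penalized much more heavily if the true label is 0:

print(log_loss([1.0, 2.0], -3.5, [2, 1], 1))   # ≈ 0.47
print(log_loss([1.0, 2.0], -3.5, [2, 1], 0))   # ≈ 0.97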

We need the log loss over the whole dataset, so we can add over all the data points as shown here:

def total_log_loss(weights, bias, features, labels):
    total_error = 0
    for i in range(len(features)):
        total_error += log_loss(weights, bias, features[i], labels[i])
    return total_error
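For larger datasets, the same sum can be computed without a Python loop. The vectorized variant below is my own sketch and assumes the sigmoid function defined above:

def total_log_loss_vectorized(weights, bias, features, labels):
    # Same quantity as the loop above, computed with NumPy array operations
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    preds = sigmoid(np.dot(features, weights) + bias)
    return float(np.sum(-labels*np.log(preds) - (1-labels)*np.log(1-preds)))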

Now we are ready to code the logistic regression trick, and the logistic regression algorithm. In more than two variables, recall that the logistic regression step for the i-th weight is the following formula, where η is the learning rate:

  • wi → wi + η(y – ŷ)xi for i = 1, 2, … , n
  • b → b + η(y – ŷ).

def logistic_trick(weights, bias, features, label, learning_rate = 0.01):
    pred = prediction(weights, bias, features)
    # Update every weight using the logistic trick
    for i in range(len(weights)):
        weights[i] += (label-pred)*features[i]*learning_rate
    # Update the bias once per step (not once per weight)
    bias += (label-pred)*learning_rate
    return weights, bias

def logistic_regression_algorithm(features, labels, learning_rate = 0.01, epochs = 1000):
    utils.plot_points(features, labels)   # plotting helper referenced by the original post, not defined here
    weights = [1.0 for i in range(len(features[0]))]
    bias = 0.0
    errors = []   # log loss per epoch, handy for plotting convergence
    for i in range(epochs):
        errors.append(total_log_loss(weights, bias, features, labels))
        # Pick a random point and apply the logistic trick to it
        j = random.randint(0, len(features)-1)
        weights, bias = logistic_trick(weights, bias, features[j], labels[j], learning_rate)
    return weights, bias

logistic_regression_algorithm(features, labels)
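The call above assumes that features, labels, and the utils.plot_points helper already exist. For a self-contained run, here is a toy example (the counts and labels below are made up for illustration; drop or replace the utils.plot_points line if you do not have that helper):

features = np.array([[1, 0], [0, 2], [1, 1], [2, 1],
                     [2, 2], [3, 2], [2, 3], [3, 3]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

weights, bias = logistic_regression_algorithm(features, labels)

# Training accuracy with a 0.5 threshold
preds = [1 if prediction(weights, bias, x) >= 0.5 else 0 for x in features]
print(sum(int(p == y) for p, y in zip(preds, labels)) / len(labels))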

In figure 6.10, we can see the plot of the classifiers corresponding to all the epochs (left) and the plot of the log loss (right). On the plot of the intermediate classifiers, the final one corresponds to the dark line. Notice from the log loss plot that, as we run the algorithm for more epochs, the log loss decreases drastically, which is exactly what we want. Furthermore, the log loss never reaches zero, even when all the points are correctly classified, because every point contributes a small but strictly positive log loss no matter how confidently it is classified. Contrast this with figure 5.26 in chapter 5, where the perceptron loss does reach zero once every point is correctly classified.
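To reproduce a log-loss curve like the right-hand plot, one option (my own variation, not in the original code) is to also return the errors list from logistic_regression_algorithm and plot it:

import matplotlib.pyplot as plt

# assumes the last line of logistic_regression_algorithm is changed to: return weights, bias, errors
weights, bias, errors = logistic_regression_algorithm(features, labels)

plt.plot(errors)
plt.xlabel('epoch')
plt.ylabel('log loss')
plt.show()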
