DS Wannabe Prep(2): how to code logistic regression in Python for ML Interviews day23

Introduction 

Logistic regression is NOT a regression model. It is a binary classification model.

For a data point (x_i, y_i), the model predicts the probability that the label is 1:

 P(y_{i}= 1|x_{i})

How does the model classify data?

1. Compute the probability of class '1' using the logistic function (sigmoid function):

\sigma (w^{T}x_{i}) = \frac{1}{1+e^{-w^{T}x_{i}}}

The sigmoid function maps any real-valued score into the valid probability interval [0, 1].

2. Predict the class by comparing that probability to a threshold (commonly 0.5).

The weights w (often written as \beta) are estimated from the training data.
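As a minimal sketch of these two steps in NumPy (the function names, the vectorized form, and the 0.5 default threshold are my own choices; the cost function in the next section assumes a predict helper along these lines):

import numpy as np

def sigmoid(z):
    # Map any real-valued score to a probability in (0, 1)
    return 1 / (1 + np.exp(-z))

def predict(features, weights):
    # Step 1: probability of class '1' for each row, sigmoid(X w)
    return sigmoid(np.dot(features, weights))

def classify(features, weights, threshold=0.5):
    # Step 2: turn probabilities into hard 0/1 class labels
    return (predict(features, weights) >= threshold).astype(int)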

Cost Function

The cost for logistic regression is the binary cross-entropy (log loss), averaged over the training set:

def cost_function(features, labels, weights):
    '''
    Binary cross-entropy (log loss)

    Features: (100, 3)
    Labels:   (100, 1)
    Weights:  (3, 1)
    Returns the average cost over all observations.
    Cost = -(labels*log(predictions) + (1-labels)*log(1-predictions)).sum() / len(labels)
    '''
    observations = len(labels)

    # Predicted probabilities of class 1: sigmoid(features . weights)
    predictions = predict(features, weights)

    # Error term when label = 1
    class1_cost = -labels*np.log(predictions)

    # Error term when label = 0
    class2_cost = (1-labels)*np.log(1-predictions)

    # Combine both costs
    cost = class1_cost - class2_cost

    # Average over all observations
    cost = cost.sum() / observations

    return cost
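A quick sanity check on synthetic data shaped like the docstring (the random data is purely illustrative; with all-zero weights every prediction is 0.5, so the cost should come out to roughly ln 2 ≈ 0.693):

np.random.seed(0)
demo_features = np.random.randn(100, 3)                # (100, 3)
demo_labels = np.random.randint(0, 2, size=(100, 1))   # (100, 1)
demo_weights = np.zeros((3, 1))                        # (3, 1)

print(cost_function(demo_features, demo_labels, demo_weights))   # ~0.6931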

Coding the logistic regression algorithm by hand

In this section, we see how to code the logistic trick and the logistic regression algorithm by hand. More generally, we’ll code the logistic regression algorithm for a dataset with n weights. The notation we use follows:

  • Features: x1, x2, … , xn
  • Label: y
  • Weights: w1, w2, … , wn
  • Bias: b

The score for a particular sentence is the sigmoid of the sum of the weight of each word (wi) times the number of times that word appears (xi), plus the bias (b). Notice that we use summation notation for the sum w1x1 + w2x2 + … + wnxn.

  • Prediction: ŷ = σ(w1x1 + w2x2 + … + wnxn + b) = σ(Σi wixi + b).

For our current problem, we’ll refer to xaack and xbeep as x1 and x2, respectively. Their corresponding weights are w1 and w2, and the bias is b.

We start by coding the sigmoid function, the score, and the prediction. Recall that the formula for the sigmoid function is

σ(x) = 1 / (1 + e^(-x)) = e^x / (1 + e^x).

import numpy as np
import random

def sigmoid(x):
    return np.exp(x)/(1+np.exp(x))
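One caveat (my own aside, not part of the original code): np.exp(x) overflows for large positive x, so a numerically stable variant branches on the sign of the input:

def sigmoid_stable(x):
    # For x >= 0 use 1 / (1 + e^(-x)); for x < 0 use e^x / (1 + e^x).
    # Both branches only ever exponentiate a non-positive number.
    if x >= 0:
        return 1 / (1 + np.exp(-x))
    exp_x = np.exp(x)
    return exp_x / (1 + exp_x)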

For the score function, we use the dot product between the features and the weights. Recall that the dot product between vectors (x1, x2, … , xn) and (w1, w2, … , wn) is w1x1 + w2x2 + … + wnxn.

def score(weights, bias, features):
    return np.dot(weights, features) + bias

Finally, recall that the prediction is the sigmoid activation function applied to the score.

def prediction(weights, bias, features):
    return sigmoid(score(weights, bias, features))
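For example, with made-up numbers (not from the original text), a sentence with word counts [2, 1], weights [1.0, 2.0], and bias -3.5 gets a prediction of about 0.62:

print(prediction([1.0, 2.0], -3.5, [2, 1]))   # sigmoid(1.0*2 + 2.0*1 - 3.5) = sigmoid(0.5) ≈ 0.62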

Now that we have the prediction, we can proceed to the log loss. Recall that the formula for the log loss is

log loss = –y ln(ŷ) – (1 – y) ln(1 – ŷ).

Let’s code that formula as follows:

def log_loss(weights, bias, features, label):
    pred = prediction(weights, bias, features)
    return -label*np.log(pred) - (1-label)*np.log(1-pred)
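Continuing the made-up example above, a point of class 1 predicted at about 0.62 contributes roughly -ln(0.62) ≈ 0.47 to the loss, and the same prediction is penalized much more heavily if the true label is 0:

print(log_loss([1.0, 2.0], -3.5, [2, 1], 1))   # ≈ 0.47
print(log_loss([1.0, 2.0], -3.5, [2, 1], 0))   # ≈ 0.97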

We need the log loss over the whole dataset, so we can add over all the data points as shown here:

def total_log_loss(weights, bias, features, labels):
    total_error = 0
    for i in range(len(features)):
        total_error += log_loss(weights, bias, features[i], labels[i])
    return total_error
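For larger datasets, the same sum can be computed without a Python loop. The vectorized variant below is my own sketch and assumes the sigmoid function defined above:

def total_log_loss_vectorized(weights, bias, features, labels):
    # Same quantity as the loop above, computed with NumPy array operations
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    preds = sigmoid(np.dot(features, weights) + bias)
    return float(np.sum(-labels*np.log(preds) - (1-labels)*np.log(1-preds)))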

Now we are ready to code the logistic regression trick, and the logistic regression algorithm. In more than two variables, recall that the logistic regression step for the i-th weight is the following formula, where η is the learning rate:

  • wi → wi + η(y – ŷ)xi for i = 1, 2, … , n
  • b → b + η(y – ŷ).

def logistic_trick(weights, bias, features, label, learning_rate = 0.01):
    pred = prediction(weights, bias, features)
    # Update every weight using the logistic trick
    for i in range(len(weights)):
        weights[i] += (label-pred)*features[i]*learning_rate
    # Update the bias once per step (not once per weight)
    bias += (label-pred)*learning_rate
    return weights, bias

def logistic_regression_algorithm(features, labels, learning_rate = 0.01, epochs = 1000):
    utils.plot_points(features, labels)   # plotting helper referenced by the original post, not defined here
    weights = [1.0 for i in range(len(features[0]))]
    bias = 0.0
    errors = []   # log loss per epoch, handy for plotting convergence
    for i in range(epochs):
        errors.append(total_log_loss(weights, bias, features, labels))
        # Pick a random point and apply the logistic trick to it
        j = random.randint(0, len(features)-1)
        weights, bias = logistic_trick(weights, bias, features[j], labels[j], learning_rate)
    return weights, bias

logistic_regression_algorithm(features, labels)
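The call above assumes that features, labels, and the utils.plot_points helper already exist. For a self-contained run, here is a toy example (the counts and labels below are made up for illustration; drop or replace the utils.plot_points line if you do not have that helper):

features = np.array([[1, 0], [0, 2], [1, 1], [2, 1],
                     [2, 2], [3, 2], [2, 3], [3, 3]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

weights, bias = logistic_regression_algorithm(features, labels)

# Training accuracy with a 0.5 threshold
preds = [1 if prediction(weights, bias, x) >= 0.5 else 0 for x in features]
print(sum(int(p == y) for p, y in zip(preds, labels)) / len(labels))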

In figure 6.10, we can see the plot of the classifiers corresponding to all the epochs (left) and the plot of the log loss (right). On the plot of the intermediate classifiers, the final one corresponds to the dark line. Notice from the log loss plot that, as we run the algorithm for more epochs, the log loss decreases drastically, which is exactly what we want. Furthermore, the log loss never reaches zero, even when all the points are correctly classified, because every point contributes a small but strictly positive log loss no matter how confidently it is classified. Contrast this with figure 5.26 in chapter 5, where the perceptron loss does reach zero once every point is correctly classified.
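To reproduce a log-loss curve like the right-hand plot, one option (my own variation, not in the original code) is to also return the errors list from logistic_regression_algorithm and plot it:

import matplotlib.pyplot as plt

# assumes the last line of logistic_regression_algorithm is changed to: return weights, bias, errors
weights, bias, errors = logistic_regression_algorithm(features, labels)

plt.plot(errors)
plt.xlabel('epoch')
plt.ylabel('log loss')
plt.show()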
