Unlike linear regression, logistic regression has no closed-form solution. However, because the loss function is convex, we can train the model with gradient descent.
We want the model to output a probability for the target that lies between 0 and 1. During training, we therefore adjust the parameters so that large model outputs correspond to positive examples (true label 1) and small outputs correspond to negative examples (true label 0). In the loss function this takes the following form:
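With predicted probability $\hat{y}_i = \sigma(w^\top x_i + b)$ for sample $x_i$ and true label $y_i \in \{0, 1\}$, this is the binary cross-entropy loss over $n$ samples (the same quantity the code below computes):

$$J(w, b) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$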
Compute the gradients of the loss function with respect to the weight vector and the bias.
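Differentiating the loss above gives, in vectorized form (as implemented in the code below):

$$\frac{\partial J}{\partial w} = \frac{1}{n} X^\top(\hat{y} - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$$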
Update the weights and bias:
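$$w \leftarrow w - \eta\,\frac{\partial J}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial J}{\partial b}$$

where $\eta$ is the learning rate (l_r in the code below).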
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
np.random.seed(123)
x, y_true = make_blobs(n_samples=1000, centers=2)  # n_samples: total number of samples to generate, centers: number of classes, n_features: features per sample
# print(x.shape)  # (1000, 2)
# print(y_true.shape)  # 1-D: (1000,)
# Dataset
fig = plt.figure(figsize=(8, 6))
# plt.scatter draws a scatter plot; the first two arguments are arrays of shape (n,) giving the point coordinates, c sets the colors
plt.scatter(x[:, 0], x[:, 1], c=y_true)  # x[:, 0] is the first column (all rows), x[:, 1] is the second column
plt.title('Dataset')
plt.xlabel('First feature')
plt.ylabel('Second feature')
plt.show()
# Reshape targets to get column vector with shape (n_samples, 1)
y_true = y_true[:, np.newaxis]
# print(y_true.shape)  # 2-D: (1000, 1)
x_train, x_test, y_train, y_test = train_test_split(x, y_true)
print('Shape of x_train: ', x_train.shape)
print('Shape of y_train: ', y_train.shape)
print('Shape of x_test: ', x_test.shape)
print('Shape of y_test: ', y_test.shape)
class logisticRegression:
    def __init__(self):
        pass

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def train(self, x, y_true, n_iters, l_r):
        n_samples, n_features = x.shape
        self.weight = np.zeros((n_features, 1))
        self.bias = 0
        costs = []
        for i in range(n_iters):
            # Forward pass: predicted probabilities
            y_predict = self.sigmoid(np.dot(x, self.weight) + self.bias)
            # Binary cross-entropy loss
            cost = (-1 / n_samples) * np.sum(y_true * np.log(y_predict) +
                                             (1 - y_true) * np.log(1 - y_predict))
            # Gradients of the loss w.r.t. weights and bias
            dw = (1 / n_samples) * np.dot(x.T, (y_predict - y_true))
            db = (1 / n_samples) * np.sum(y_predict - y_true)
            # Gradient descent update
            self.weight = self.weight - l_r * dw
            self.bias = self.bias - l_r * db
            costs.append(cost)
            if i % 100 == 0:
                print('Cost after iteration {}: {}'.format(i, cost))
        return self.weight, self.bias, costs

    def predict(self, x):
        y_predict = self.sigmoid(np.dot(x, self.weight) + self.bias)
        # Threshold the probabilities at 0.5 to get class labels
        y_predict_labels = [1 if elem > 0.5 else 0 for elem in y_predict]
        return np.array(y_predict_labels)[:, np.newaxis]
regressor = logisticRegression()
w_trained, b_trained, costs = regressor.train(x_train, y_train, n_iters=600, l_r=0.009)
fig = plt.figure(figsize=(8,6))
plt.plot(np.arange(600), costs)
plt.title('Development of cost over training')
plt.xlabel('Number of iterations')
plt.ylabel('Cost')
plt.show()
y_p_train = regressor.predict(x_train)
y_p_test = regressor.predict(x_test)
print('Train accuracy: ',
      100 - np.mean(np.abs(y_p_train - y_train)) * 100, '%')
print('Test accuracy: ',
      100 - np.mean(np.abs(y_p_test - y_test)) * 100, '%')
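Since the trained model predicts 1 whenever w^T x + b >= 0, the decision boundary is the line w0*x0 + w1*x1 + b = 0. A minimal sketch of plotting it over the training data (assuming two features, as in this dataset):

# Plot the training points and the learned decision boundary
fig = plt.figure(figsize=(8, 6))
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train.ravel())
# w0*x0 + w1*x1 + b = 0  =>  x1 = -(w0*x0 + b) / w1
x0_vals = np.linspace(x_train[:, 0].min(), x_train[:, 0].max(), 100)
x1_vals = -(w_trained[0, 0] * x0_vals + b_trained) / w_trained[1, 0]
plt.plot(x0_vals, x1_vals, 'r-')
plt.title('Decision boundary')
plt.xlabel('First feature')
plt.ylabel('Second feature')
plt.show()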
(2) Using sklearn's API directly
from sklearn.linear_model import LogisticRegression  # logistic regression
module = LogisticRegression()
module.fit(x, y)        # fit the model on training data
module.score(x, y)      # mean accuracy on (x, y)
module.predict(test)    # predict class labels for new samples
Full code:
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
np.random.seed(123)
x, y_true = make_blobs(n_samples=1000, centers=2)
# Dataset
fig = plt.figure(figsize=(8, 6))
plt.scatter(x[:, 0], x[:, 1], c=y_true)
plt.title('Dataset')
plt.xlabel('First feature')
plt.ylabel('Second feature')
plt.show()
y_true = y_true[:, np.newaxis]
# print(y_true.shape)  # 2-D: (1000, 1)
x_train, x_test, y_train, y_test = train_test_split(x, y_true)
print('Shape of x_train: ', x_train.shape)
print('Shape of y_train: ', y_train.shape)
print('Shape of x_test: ', x_test.shape)
print('Shape of y_test: ', y_test.shape)
module = LogisticRegression()
module.fit(x_train, y_train.ravel())  # fit on the training set (ravel to pass a 1-D target)
y_p_train = module.predict(x_train)
y_p_test = module.predict(x_test)
print('Train accuracy: ',
      100 - np.mean(np.abs(y_p_train - y_train.ravel())) * 100, '%')
print('Test accuracy: ',
      100 - np.mean(np.abs(y_p_test - y_test.ravel())) * 100, '%')
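As an alternative to the manual accuracy computation above, LogisticRegression.score returns the mean accuracy directly; a minimal check, assuming the same variables as in the full code:

print('Train accuracy (score): ', module.score(x_train, y_train.ravel()) * 100, '%')
print('Test accuracy (score): ', module.score(x_test, y_test.ravel()) * 100, '%')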