[Meng the Beginner Reads Code with You, Line by Line] A Logistic Regression Classifier with Python and sklearn

Soochow_NJU_Smile

Tags: python, machine learning

Article link: https://blog.csdn.net/mengfanze1/article/details/106163809


Based on: "Implementing a Neural Network from Scratch in Python" (用Python从头开始实现一个神经网络)
I'm still a beginner, but I'll keep working hard! <&.&>

import numpy as np
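import sklearn.datasets      # make_moons lives in this submodule
import sklearn.linear_model  # LogisticRegressionCV lives in this submodule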
import matplotlib.pyplot as plt
np.random.seed(5)
x,y=sklearn.datasets.make_moons(300,noise=0.30)
#print(x,y)
plt.scatter(x[:,0],x[:,1],s=40,c=y,cmap=plt.cm.Spectral)
#plt.show()
clf=sklearn.linear_model.LogisticRegressionCV()
clf.fit(x,y)
def plot_decision_boundary(pred_func):
    # Set min and max values of the plot area and give them some padding
    x_min, x_max = x[:, 0].min() - .5, x[:, 0].max() + .5
    y_min, y_max = x[:, 1].min() - .5, x[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contour(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(x[:, 0], x[:, 1], c=y, cmap=plt.cm.Spectral)
# Plot the decision boundary
plot_decision_boundary(lambda x: clf.predict(x))
plt.title("Logistic Regression")
plt.show()

1. Generating the dataset

First we need a dataset to work with. scikit-learn provides several useful dataset generators; here the make_moons function is all we need.

np.random.seed(5)
x,y=sklearn.datasets.make_moons(300,noise=0.30)
#print(x,y)
plt.scatter(x[:,0],x[:,1],s=40,c=y,cmap=plt.cm.Spectral)
#plt.show()

np.random.seed(5)

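This is like setting up a "treasure bowl" filled with random numbers: each seed value identifies one particular bowl.

If you use the same seed() value, the same random numbers are generated on every run.
If you do not set it, NumPy picks a starting value itself from a source that changes between runs (such as operating-system entropy or the clock), so the random numbers differ from run to run.
A call to seed() fixes the starting point only once: later random calls keep advancing the same sequence, so call seed() again with the same value if you want the numbers to repeat.
References: here and here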
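The behaviour described above can be checked with a minimal sketch. The variable names (first, second, replay) are made up for this illustration and do not appear in the original code.

```python
import numpy as np

np.random.seed(5)
first = np.random.rand(3)    # first three numbers after seeding
second = np.random.rand(3)   # the sequence keeps advancing, so these differ

np.random.seed(5)            # seeding again with the same value restarts the sequence
replay = np.random.rand(3)

print(np.array_equal(first, replay))   # True
print(np.array_equal(first, second))   # False
```

This is why the script calls np.random.seed(5) right before make_moons: the generated dataset is then the same on every run.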

x,y=sklearn.datasets.make_moons(300,noise=0.30)

def make_moons(n_samples=100, shuffle=True, noise=None, random_state=None):
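300: the number of points to generate (n_samples)
noise: the standard deviation of the Gaussian noise added to the data; the larger it is, the more scattered the points become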
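To see what make_moons returns, here is a small sketch. It passes random_state=5 directly instead of calling np.random.seed(5); that is a variation chosen for this illustration, not what the original script does, so the points will not be identical to the ones in the post.

```python
import numpy as np
from sklearn.datasets import make_moons

# random_state is used here (an assumption of this sketch) to make the call reproducible
x, y = make_moons(300, noise=0.30, random_state=5)

print(x.shape)         # (300, 2): one row per point, columns are the two coordinates
print(y.shape)         # (300,): one label per point
print(np.unique(y))    # [0 1]
print(np.bincount(y))  # [150 150]: the points are split evenly between the two moons
```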

plt.scatter(x[:,0],x[:,1],s=40,c=y,cmap=plt.cm.Spectral)
plt.scatter() is the function that draws a scatter plot.

X = np.array([[0,1],[2,3],[4,5],[6,7],[8,9],[10,11],[12,13],[14,15],[16,17],[18,19]])
print(X[:,0] )
Output:[ 0 2 4 6 8 10 12 14 16 18]
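x[:,0] takes the first column, i.e. the x-coordinate of every point
x[:,1] takes the second column, i.e. the y-coordinate of every point
s=40 sets the size of the points
c=y
cmap=plt.cm.Spectral
Together these two arguments give the points with label 1 one color and the points with label 0 another color.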

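2. Training the Logistic regression classifier

We train a Logistic regression classifier whose input is a point's coordinates (the two columns of x) and whose output is the predicted class (0 or 1). For convenience we use scikit-learn's LogisticRegressionCV class, which also picks the regularization strength by cross-validation.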

clf=sklearn.linear_model.LogisticRegressionCV()
clf.fit(x,y)
def plot_decision_boundary(pred_func):
    # Set min and max values of the plot area and give them some padding
    x_min, x_max = x[:, 0].min() - .5, x[:, 0].max() + .5
    y_min, y_max = x[:, 1].min() - .5, x[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contour(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(x[:, 0], x[:, 1], c=y, cmap=plt.cm.Spectral)
# Plot the decision boundary
plot_decision_boundary(lambda x: clf.predict(x))
plt.title("Logistic Regression")
plt.show()

clf=sklearn.linear_model.LogisticRegressionCV()
Create clf, an instance of the LogisticRegressionCV class.

clf.fit(x,y) is a method of the class; it trains the model.
def fit(self, X, y, sample_weight=None):
You can use your editor's "Go to definition" to read its source directly.
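As a rough sketch of what is available after fit, the fitted classifier can be asked for predictions and for its training accuracy. The names pred and acc are introduced only for this illustration. No accuracy value is quoted, because it depends on the scikit-learn version and the data.

```python
import numpy as np
import sklearn.datasets
import sklearn.linear_model

np.random.seed(5)
x, y = sklearn.datasets.make_moons(300, noise=0.30)

clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(x, y)                 # learn the weights from the 300 labelled points

pred = clf.predict(x[:5])     # predicted class (0 or 1) for the first five points
acc = clf.score(x, y)         # fraction of training points classified correctly

print(pred.shape)             # (5,)
print(clf.coef_.shape)        # (1, 2): one weight per input coordinate
print(clf.intercept_.shape)   # (1,)
```

Because the model has a single weight per coordinate plus an intercept, its decision boundary is a straight line, which is what the plot at the end of this section shows.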

x_min, x_max = x[:, 0].min() - .5, x[:, 0].max() + .5
y_min, y_max = x[:, 1].min() - .5, x[:, 1].max() + .5
h = 0.01
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
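These lines set the ranges of the X and Y axes (the data range padded by 0.5 on each side) and then build the matrices of grid points, spaced h apart.
np.meshgrid: builds coordinate matrices from two coordinate vectors. Look here
np.arange: returns evenly spaced values in a half-open interval, so the stop value is excluded. Look here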
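A tiny grid makes the output of np.meshgrid easier to read than the 0.01-spaced grid used in the script. The integer ranges below are chosen only for this illustration.

```python
import numpy as np

xs = np.arange(0, 3)   # [0 1 2], the stop value 3 is excluded
ys = np.arange(0, 2)   # [0 1]

xx, yy = np.meshgrid(xs, ys)
print(xx)
# [[0 1 2]
#  [0 1 2]]
print(yy)
# [[0 0 0]
#  [1 1 1]]
```

Reading xx and yy at the same position gives the coordinates of one grid point, for example (xx[1, 2], yy[1, 2]) is (2, 1).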

Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
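Note that a function is passed in as an argument: pred_func is whatever function the caller supplies, and it is called here on all grid points.
xx.ravel(): flattens the array to one dimension. Look here
np.c_ joins arrays as columns; np.r_ joins them along the first axis (as rows, for 2-D arrays). Look here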
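The following sketch shows, on a small grid chosen for illustration, how ravel and np.c_ turn two coordinate matrices into one array with a row per grid point, which is the input shape a predict function expects.

```python
import numpy as np

xx, yy = np.meshgrid(np.arange(0, 3), np.arange(0, 2))

print(xx.ravel())   # [0 1 2 0 1 2]
print(yy.ravel())   # [0 0 0 1 1 1]

# np.c_ places the two flattened arrays side by side as columns
points = np.c_[xx.ravel(), yy.ravel()]
print(points.shape)  # (6, 2)
print(points[:3])
# [[0 0]
#  [1 0]
#  [2 0]]

# np.r_ stacks 2-D arrays on top of each other as rows
rows = np.r_[np.array([[1, 2]]), np.array([[3, 4]])]
print(rows)
# [[1 2]
#  [3 4]]
```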

Z = Z.reshape(xx.shape)
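Reshapes the flat array of predictions back into the shape of xx, so that each prediction lines up with its grid point. shape/reshape: here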
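To illustrate the reshape step without training a model, the sketch below uses a hypothetical stand-in for clf.predict that labels a point 1 when its x-coordinate is larger than its y-coordinate.

```python
import numpy as np

xx, yy = np.meshgrid(np.arange(0, 3), np.arange(0, 2))
flat = np.c_[xx.ravel(), yy.ravel()]          # one row per grid point

# Hypothetical stand-in for clf.predict: label 1 where x > y
Z = (flat[:, 0] > flat[:, 1]).astype(int)
print(Z)         # [0 1 1 0 0 1]

Z = Z.reshape(xx.shape)                        # back to the grid layout
print(Z)
# [[0 1 1]
#  [0 0 1]]
```

After the reshape, Z[i, j] is the prediction for the point (xx[i, j], yy[i, j]), which is the layout plt.contour needs.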

plt.contour(xx, yy, Z, cmap=plt.cm.Spectral)
Difference between plt.contour and plt.contourf: contour draws only the contour lines, while contourf fills the regions between them.
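A side-by-side sketch of the two functions follows. The Agg backend, the toy Z (1 where x + y > 0) and the in-memory PNG buffer are assumptions made so that the example runs without a display or a trained model.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

xx, yy = np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-2, 2, 100))
Z = (xx + yy > 0).astype(int)  # toy 0/1 "prediction" on the grid

fig, (ax_lines, ax_filled) = plt.subplots(1, 2, figsize=(8, 4))

# contour: only the boundary lines between the 0 and 1 regions
ax_lines.contour(xx, yy, Z, cmap=plt.cm.Spectral)
ax_lines.set_title("contour")

# contourf: the two regions are filled with color
ax_filled.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
ax_filled.set_title("contourf")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write the figure to memory instead of a file
```

Swapping plt.contour for plt.contourf in plot_decision_boundary would shade each predicted class region instead of drawing only the boundary.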

plot_decision_boundary(lambda x: clf.predict(x))
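lambda: defines a small anonymous function. Here lambda x: clf.predict(x) is the pred_func that plot_decision_boundary calls on the grid points. See here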
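As a minimal sketch of how a lambda is passed to another function, the example below uses made-up names (add_one, add_one_lambda, apply) that are not part of the original code.

```python
def add_one(v):
    return v + 1

# The same function written as a lambda expression
add_one_lambda = lambda v: v + 1

def apply(func, value):
    # func is received as an argument and called here,
    # just as pred_func is called inside plot_decision_boundary
    return func(value)

print(apply(add_one, 1))          # 2
print(apply(add_one_lambda, 1))   # 2
print(apply(lambda v: v * 10, 1)) # 10
```

By the same logic, passing clf.predict itself instead of lambda x: clf.predict(x) should work equally well, since both are callables that take the grid points and return predictions.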