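In logistic regression, the response variable describes something like the probability that a coin toss comes up heads. If the response variable equals or exceeds a chosen threshold, the prediction is heads; otherwise it is tails. As in linear regression, the response is modeled from a linear combination of the explanatory variables, but here that combination is passed through a function called the logistic function. The logistic function, whose values lie in the open interval (0, 1), is computed as follows: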
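import numpy as np
x = np.arange(-6, 6, 0.1)     # inputs from -6 to 6 in steps of 0.1
y = 1 / (1 + np.exp(-x))      # logistic function: every value lies in (0, 1)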
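The threshold rule described at the start of this section can be sketched from scratch in NumPy. This is a minimal illustration, not the scikit-learn implementation. It assumes a binary label y in {0, 1}, a linear score z = Xw + b passed through the logistic function, a threshold of 0.5, and plain batch gradient descent on the mean log-loss without regularization. The helper names `sigmoid`, `fit_logistic`, and `predict` and the toy data are hypothetical, chosen only for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Hypothetical helper: fit weights w and bias b by batch gradient
    descent on the mean log-loss (no regularization)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)        # predicted P(y = 1 | x)
        error = p - y                 # derivative of the log-loss w.r.t. z
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def predict(X, w, b, threshold=0.5):
    """Predict 1 ("heads") when P(y = 1 | x) >= threshold, else 0 ("tails")."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Toy 1-D data: negative x values are class 0, positive x values are class 1.
X = np.array([[-3.0], [-2.0], [-1.5], [-0.5], [0.5], [1.5], [2.0], [3.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

w, b = fit_logistic(X, y)
print(predict(X, w, b))           # [0 0 0 0 1 1 1 1]
print(sigmoid(np.array([0.0])))   # [0.5]
```

Because the logistic function is monotonic and equals 0.5 exactly at z = 0, thresholding the probability at 0.5 is the same as checking whether the linear score Xw + b is non-negative.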
Now train a classifier on the iris dataset, using two of its features: sepal length and petal width.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import numpy as np
iris = load_iris()
target = iris['target']
target_name = iris['target_names']
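data = iris['data'][:, [0, 3]]   # two features: sepal length (col 0) and petal width (col 3)
feature = iris['feature_names']
clf = LogisticRegression()
clf.fit(data, target)
print(clf.score(data, target))   # mean accuracy on the training data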
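def plot_decision_visual(data, labels, classifier, resolution=0.1):
    # Extend the plotting range 1 unit beyond the data on each axis
    x_min, x_max = data[:, 0].min() - 1, data[:, 0].max() + 1
    y_min, y_max = data[:, 1].min() - 1, data[:, 1].max() + 1
    # Build a grid of points covering that range
    xx, yy = np.meshgrid(np.arange(x_min, x_max, resolution),
                         np.arange(y_min, y_max, resolution))
    # Predict the class of every grid point, then reshape back to the grid
    z = classifier.predict(np.array([xx.ravel(), yy.ravel()]).T)
    z = z.reshape(xx.shape)
    # Shade each region with the color of its predicted class
    plt.contourf(xx, yy, z, alpha=0.5)
    # Overlay the samples, one marker and color per class
    for i, m, color in zip(range(3), '>ox', 'rgb'):
        plt.scatter(data[labels == i, 0],
                    data[labels == i, 1],
                    marker=m,
                    c=color,
                    label=target_name[i])
    # Axis labels match the selected columns 0 and 3
    plt.xlabel(feature[0])
    plt.ylabel(feature[3])
    plt.legend()
    plt.show()
plot_decision_visual(data, target, clf)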
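The accuracy is 0.907. Note that this is measured on the same samples used for fitting, and the exact value can differ across scikit-learn versions, because the default solver of LogisticRegression changed in version 0.22.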
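The fitted model can also be inspected to connect it back to the probability interpretation at the start of this section. The sketch below assumes a recent scikit-learn (0.22 or later) and the same two iris features, and uses the documented `predict_proba`, `decision_function`, `coef_`, `intercept_`, and `classes_` attributes; `max_iter=1000` is an added setting, not part of the original code. With three classes, each row of `predict_proba` is a probability distribution, and `predict` returns the class with the largest linear score `X @ coef_.T + intercept_`, which generalizes the single 0.5 threshold of the two-class case.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Same setup as above: sepal length (column 0) and petal width (column 3).
iris = load_iris()
X = iris['data'][:, [0, 3]]
y = iris['target']

# max_iter raised only as a precaution against convergence warnings.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# Class probabilities: shape (n_samples, n_classes), each row sums to 1.
proba = clf.predict_proba(X)
print(proba.shape)                                   # (150, 3)
print(np.allclose(proba.sum(axis=1), 1.0))           # True

# Linear scores computed by hand from the fitted parameters.
scores = X @ clf.coef_.T + clf.intercept_
print(np.allclose(scores, clf.decision_function(X)))  # True

# predict() picks the class with the largest score.
manual_pred = clf.classes_[np.argmax(scores, axis=1)]
print((manual_pred == clf.predict(X)).all())         # True
```

Since the probabilities are a monotonic transformation of these scores, taking the argmax of `proba` gives the same predicted classes.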