Machine Learning with sklearn: Logistic Regression

1. Basic Concepts

The sigmoid function: in a binary classification problem, we ultimately need to output a class label in {0, 1}, while regression produces a score in $(-\infty, +\infty)$. We therefore pass the linear regression score through the sigmoid function to squash it into a probability. It is defined as:

$f(x)=\dfrac{1}{1+e^{-\left(c_0 x_0 + c_1 x_1 + \dots + c_n x_n\right)}}$
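To make the squashing behavior concrete, here is a minimal numeric sketch of the sigmoid (using only the standard library; the specific test scores are made up for illustration):

```python
import math

def sigmoid(z):
    """Map a real-valued score z from (-inf, +inf) into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# A score of 0 lands exactly on the decision boundary:
print(sigmoid(0))    # 0.5
# Large positive scores approach 1, large negative scores approach 0:
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```

Because the output is in (0, 1), it can be read as the probability of the positive class, with 0.5 as the natural classification threshold.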

2. sklearn in Practice

from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Read the data: each line has 3 tab-separated features and a label
X = []
Y = []
fr = open("datingTestSet.txt")
for line in fr.readlines():
    line = line.strip()
    line = line.split('\t')
    X.append([float(v) for v in line[:3]])  # features must be numeric
    Y.append(line[-1])
fr.close()

# Normalize each feature to [0, 1]
scaler = MinMaxScaler()
X = scaler.fit_transform(X)

# Split into training and test sets
train_X, test_X, train_y, test_y = train_test_split(X,
                                                    Y,
                                                    test_size=0.2)  # test_size: hold out 20% for testing

# Logistic regression model
model = LogisticRegression()
model.fit(train_X, train_y)
print(model)

expected = test_y
predicted = model.predict(test_X)
print(metrics.classification_report(expected, predicted))       # per-class precision/recall/F1
label = list(set(Y))    # deduplicate to get the class labels
print(metrics.confusion_matrix(expected, predicted, labels=label))  # confusion matrix
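To connect the sklearn model back to the sigmoid from section 1: `predict_proba` returns the sigmoid-transformed scores, one probability per class. A minimal sketch on made-up toy data (the feature values and the query points are illustrative assumptions, not part of the dating dataset above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature, two classes separable around x = 0
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Each row of predict_proba holds [P(class 0), P(class 1)];
# the two probabilities sum to 1, and predict() picks the larger one.
proba = model.predict_proba(np.array([[0.0], [3.0]]))
print(proba)
```

A query near the boundary (x = 0) yields probabilities near 0.5, while a point deep in the positive region (x = 3) yields a class-1 probability close to 1, mirroring the shape of the sigmoid.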