Day 6: Logistic Regression in Practice
GitHub: 100 Days Of ML Code
Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Import the data
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:,[2,3]].values
y = dataset.iloc[:,4].values
Split the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
Standardize the data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
# Reuse the mean and standard deviation learned from the training set; do not refit on the test set
X_test = sc.transform(X_test)
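To make the standardization step concrete, here is a minimal sketch of what StandardScaler computes: each feature is shifted by its mean and divided by its standard deviation, with both statistics taken from the training set. The arrays X_train_demo and X_test_demo are synthetic stand-ins invented for illustration, not values from Social_Network_Ads.csv.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two numeric features (hypothetical values, not the real dataset)
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=[40.0, 70000.0], scale=[10.0, 30000.0], size=(30, 2))
X_test_demo = rng.normal(loc=[40.0, 70000.0], scale=[10.0, 30000.0], size=(10, 2))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learns mean and std from the training data
X_test_scaled = scaler.transform(X_test_demo)        # applies the training mean and std

# Manual equivalent: z = (x - mean_train) / std_train, using the population std (ddof=0)
mean_train = X_train_demo.mean(axis=0)
std_train = X_train_demo.std(axis=0)
manual_test_scaled = (X_test_demo - mean_train) / std_train
```

After scaling, each training feature has mean 0 and standard deviation 1. The test features generally do not, because they are scaled with the training statistics.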
Fit the logistic regression model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
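As a rough illustration of what the fitted model does when it predicts, the sketch below reproduces the probabilities of a binary LogisticRegression by hand: a linear score passed through the sigmoid function. The data X_demo and y_demo are synthetic and made up for this example; only coef_, intercept_, predict_proba and predict are scikit-learn attributes and methods.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature binary problem (hypothetical data for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

model = LogisticRegression()
model.fit(X_demo, y_demo)

# Linear score z = w . x + b, then the sigmoid maps it to a probability for class 1
z = X_demo @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = model.predict_proba(X_demo)[:, 1]

# Predicting class 1 when z > 0 is the same as thresholding the probability at 0.5
pred_manual = (z > 0).astype(int)
```

In this binary setting, p_manual agrees with predict_proba and pred_manual agrees with predict.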
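Evaluate the model
Confusion matrix:
In machine learning, a confusion matrix is an error matrix commonly used to visualize and evaluate the performance of a supervised learning algorithm. It is a square matrix of shape (n_classes, n_classes), where n_classes is the number of classes. Each row represents the instances of a true class, and each column represents the instances of a predicted class.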
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
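A small worked example may help with reading the matrix. The labels y_true_demo and y_pred_demo below are made up for illustration; for a binary problem with labels 0 and 1, confusion_matrix returns the counts in the order [[TN, FP], [FN, TP]], from which accuracy can be derived.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, chosen only to illustrate the layout
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
# Rows are true classes, columns are predicted classes:
# [[5 1]
#  [1 3]]

# Flatten in row-major order: true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = cm_demo.ravel()

# Accuracy is the share of correct predictions, i.e. the diagonal over the total
accuracy = (tp + tn) / cm_demo.sum()  # 0.8
```

The diagonal entries count correct predictions, and the off-diagonal entries count the two kinds of error.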
Visualization
# Color each training point by its class: red for 1, blue for 0
c = []
for i in range(len(X_train)):
    if y_train[i]:
        c.append('r')
    else:
        c.append('b')
fig, ax = plt.subplots()
ax.scatter(X_train[:,0], X_train[:,1], c=c, s=20)
plt.show()