Train the same dataset with several different models and visualize the results, so the classification boundaries can be compared side by side.
Read the data
import pandas as pd
data = pd.read_csv(r'G:\Machine Learning\data\Social_Network_Ads.csv')
Use the Age and EstimatedSalary columns as x and the Purchased column as y
x = data.loc[:,'Age':'EstimatedSalary'].values
y = data.loc[:,'Purchased'].values
Split the dataset
from sklearn.model_selection import train_test_split
x_train ,x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 1)
Feature scaling (standardization)
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train_std = sc_x.fit_transform(x_train)  # fit the scaler on the training set only
x_test_std = sc_x.transform(x_test)        # reuse the training-set mean and std on the test set
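What `StandardScaler` does can be checked by hand: each column is shifted to mean 0 and rescaled to unit variance. A minimal sketch on toy values (the numbers below are placeholders standing in for Age/EstimatedSalary rows, not the real CSV):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix standing in for the Age / EstimatedSalary columns
x_demo = np.array([[19.0, 19000.0],
                   [35.0, 20000.0],
                   [26.0, 43000.0],
                   [47.0, 25000.0]])

sc_demo = StandardScaler()
x_std = sc_demo.fit_transform(x_demo)  # per column: z = (x - mean) / std

# After scaling, every column has mean 0 and standard deviation 1
print(np.allclose(x_std.mean(axis=0), 0.0))  # True
print(np.allclose(x_std.std(axis=0), 1.0))   # True
```

This matters here because KNN and the RBF-kernel SVM are distance-based: without scaling, EstimatedSalary (tens of thousands) would completely dominate Age.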
Logistic regression
Logistic regression fits a continuous probability to the discrete class labels: a linear combination of the features is passed through a sigmoid, and the 0.5 probability threshold gives a linear decision boundary.
from sklearn.linear_model import LogisticRegression as LR
lr = LR()  # build the logistic regression model
lr.fit(x_train_std, y_train)  # train the model
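The "sigmoid over a linear combination" claim can be verified directly: the model's predictions are exactly sigmoid(w·x + b) > 0.5. A sketch on synthetic data (the dataset CSV isn't available here, so `x_demo`/`y_demo` are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic 2-D data standing in for the standardized Age/Salary features
rng = np.random.RandomState(0)
x_demo = rng.randn(100, 2)
y_demo = (x_demo[:, 0] + x_demo[:, 1] > 0).astype(int)

lr_demo = LogisticRegression().fit(x_demo, y_demo)

# The fitted boundary is the line w·x + b = 0; the sigmoid turns the
# signed distance to that line into a probability, with 0.5 at the line
w, b = lr_demo.coef_[0], lr_demo.intercept_[0]
proba = 1.0 / (1.0 + np.exp(-(x_demo @ w + b)))
manual_pred = (proba > 0.5).astype(int)
print((manual_pred == lr_demo.predict(x_demo)).all())  # True
```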
Support vector machine
A kernel function implicitly lifts the original data into a higher-dimensional space, finds a separating hyperplane there, and that hyperplane maps back to a (possibly curved) decision boundary in the original low-dimensional space.
# Use an RBF kernel to separate the classes in the lifted space.
# Note: for classification the class is SVC; SVR is its regression counterpart.
from sklearn.svm import SVC
rbf_classifier = SVC(kernel='rbf')
rbf_classifier.fit(x_train_std, y_train)
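The payoff of the kernel trick shows up on data no straight line can separate. A sketch on a ring-shaped toy problem (synthetic data, not the ads CSV), comparing a linear kernel against the RBF kernel:

```python
import numpy as np
from sklearn.svm import SVC

# Ring-shaped toy data: the inner cluster (label 0) is surrounded by the
# outer ring (label 1), so no straight line in 2-D separates them, but
# the RBF kernel implicitly lifts the points into a space where a
# separating hyperplane exists
rng = np.random.RandomState(0)
x_demo = rng.randn(200, 2)
y_demo = (np.linalg.norm(x_demo, axis=1) > 1.0).astype(int)

linear_acc = SVC(kernel='linear').fit(x_demo, y_demo).score(x_demo, y_demo)
rbf_acc = SVC(kernel='rbf').fit(x_demo, y_demo).score(x_demo, y_demo)
print(f'linear: {linear_acc:.2f}  rbf: {rbf_acc:.2f}')
```

The linear kernel can do no better than predicting the majority class here, while the RBF kernel recovers the circular boundary.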
KNN
1) Compute the distance between the test point and every training point;
2) Sort the training points by increasing distance;
3) Take the K points with the smallest distances;
4) Count how often each class appears among those K points;
5) Return the most frequent class among the K points as the prediction for the test point.
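The five steps above can be sketched in a few lines of plain NumPy (toy data and the helper name `knn_predict` are illustrative, not part of sklearn):

```python
import numpy as np
from collections import Counter

def knn_predict(x_train, y_train, x_query, k=5):
    # 1) distance from the query to every training point
    #    (Euclidean, i.e. Minkowski with p=2, as used below)
    dists = np.linalg.norm(x_train - x_query, axis=1)
    # 2) + 3) sort by distance and keep the k nearest
    nearest = np.argsort(dists)[:k]
    # 4) + 5) majority vote among the k neighbours' labels
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

x_demo = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(x_demo, y_demo, np.array([0.5, 0.5]), k=3))  # → 0
print(knn_predict(x_demo, y_demo, np.array([5.5, 5.5]), k=3))  # → 1
```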
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)  # Minkowski with p=2 is the Euclidean distance
classifier.fit(x_train_std, y_train)
Naive Bayes
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()  # Gaussian naive Bayes: assumes each feature is normally distributed within each class
classifier.fit(x_train_std,y_train)
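To close the loop on the goal stated at the top, the four models can be trained and scored on one shared dataset. A minimal sketch on synthetic two-blob data (the real CSV isn't available here, so the data and variable names below are placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Two Gaussian blobs standing in for the standardized Age/Salary features
rng = np.random.RandomState(1)
x_demo = np.vstack([rng.randn(100, 2) - 1, rng.randn(100, 2) + 1])
y_demo = np.array([0] * 100 + [1] * 100)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=1)

models = {
    'LogisticRegression': LogisticRegression(),
    'SVC (rbf)': SVC(kernel='rbf'),
    'KNN (k=5)': KNeighborsClassifier(n_neighbors=5),
    'GaussianNB': GaussianNB(),
}
scores = {name: m.fit(x_tr, y_tr).score(x_te, y_te)
          for name, m in models.items()}
for name, acc in scores.items():
    print(f'{name}: {acc:.2f}')
```

On such well-separated blobs all four models land close together; the decision-boundary plots on the real data are where their differences (linear vs. curved boundaries) become visible.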