Machine Learning: Classification Algorithms
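I. Objectives
1. Become proficient with the logistic regression linear classification algorithm.
2. Understand and master the naive Bayes classification algorithm.
3. Understand and master the k-nearest neighbors (KNN) classification algorithm.
4. Become proficient with the decision tree classification algorithm.
5. Understand and master ensemble classification models such as random forests and boosted decision trees.
6. Become proficient with methods for evaluating classifiers.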
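Objectives 2 to 5 name several other classifier families. The following is a minimal sketch, assuming scikit-learn is installed, that fits each of them and reports test-set accuracy. It uses scikit-learn's bundled load_breast_cancer data (the Wisconsin Diagnostic dataset, 569 samples and 30 features) as a stand-in. This is not the same as the breast-cancer-wisconsin.data file used in the task. The hyperparameters (n_neighbors=5, n_estimators=100, random_state=0) are illustrative assumptions, not tuned values.

```python
# Minimal sketch: compare the classifier families listed in the objectives.
# Assumption: scikit-learn's bundled load_breast_cancer data stands in for
# the lab's own data file; hyperparameters are illustrative, not tuned.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Naive Bayes": GaussianNB(),
    # KNN is distance-based, so features are standardized first.
    "K-nearest neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient boosted trees": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns the mean accuracy on the held-out test set.
    scores[name] = model.score(X_test, y_test)
    print(f"{name:<24s} accuracy = {scores[name]:.3f}")
```

Standardization is applied only inside the KNN pipeline. Tree-based models are insensitive to per-feature scaling, and Gaussian naive Bayes models each feature with its own mean and variance.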
II. Task and Analysis
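Reproduce all results of the example “Malignant tumor classification prediction using a logistic linear classifier” in a Jupyter environment. You should clearly understand the syntax and usage of every Python statement, with particular focus on how classification algorithms are evaluated. The code is as follows: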
import pandas as pd
import numpy as np
# Create the list of column names (sample ID, nine features, class label)
column_names=['Sample code number','Clump Thickness','Uniformity of Cell Size',
'Uniformity of Cell Shape','Marginal Adhesion','Single Epithelial Cell Size',
'Bare Nuclei','Bland Chromatin','Normal Nucleoli','Mitoses','Class']
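# Read the raw data file; it has no header row, so the column names are supplied. Adjust the path to your own copy
data=pd.read_csv(r'C:\Users\Administrator\Desktop\breast-cancer-wisconsin.data',names=column_names)
# Missing values are stored as '?': convert them to NaN, then drop every row that contains a NaN
data=data.replace(to_replace='?',value=np.nan)
data=data.dropna(how='any')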
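from sklearn.model_selection import train_test_split
# Columns 1-9 are the features and column 10 (Class) is the label; hold out 25% of the rows as the test set
X_train,X_test,y_train,y_test=train_test_split(data[column_names[1:10]],data[column_names[10]],test_size=0.25,random_state=0)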
# Check the class distribution in each split (2 = benign, 4 = malignant)
print(y_train.value_counts())
print(y_test.value_counts())
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import SGDClassifier
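# Standardize features to zero mean and unit variance: fit on the training set only, then apply the same transform to the test set
ss=StandardScaler()
X_train=ss.fit_transform(X_train)
X_test=ss.transform(X_test)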
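# Initialize logistic regression and a linear classifier trained by stochastic gradient descent
lr=LogisticRegression()
sgdc=SGDClassifier()
# Train logistic regression on the standardized training set, then predict the test set
lr.fit(X_train,y_train)
lr_y_predict=lr.predict(X_test)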
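# Train the SGD classifier the same way, then predict the test set
sgdc.fit(X_train,y_train)
sgdc_y_predict=sgdc.predict(X_test)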
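The task stresses evaluation methods. The following is a minimal sketch of one way to evaluate both linear models. It assumes scikit-learn's bundled load_breast_cancer data as a stand-in for the local file, so it runs without the C:\ path. Note that this dataset encodes 0 = malignant and 1 = benign, instead of the file's 2/4 codes. For each model, the sketch prints accuracy, the confusion matrix, and classification_report. It then recomputes precision and recall for the malignant class by hand from the confusion matrix. Passing random_state=0 to SGDClassifier is an added assumption for reproducibility.

```python
# Minimal sketch: evaluate logistic regression and SGDClassifier on a test set.
# Assumption: load_breast_cancer replaces the lab's breast-cancer-wisconsin.data
# file; here label 0 = malignant and 1 = benign.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

lr = LogisticRegression()
sgdc = SGDClassifier(random_state=0)
lr.fit(X_train, y_train)
sgdc.fit(X_train, y_train)
lr_y_predict = lr.predict(X_test)
sgdc_y_predict = sgdc.predict(X_test)

for name, y_pred in [("LogisticRegression", lr_y_predict),
                     ("SGDClassifier", sgdc_y_predict)]:
    print(name, "accuracy:", round(accuracy_score(y_test, y_pred), 3))
    # Rows are true labels, columns are predicted labels, ordered [0, 1].
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred,
                                target_names=["malignant", "benign"]))

# Precision and recall by hand, treating malignant (label 0) as positive.
cm = confusion_matrix(y_test, lr_y_predict, labels=[0, 1])
tp, fn = cm[0, 0], cm[0, 1]  # true malignant: predicted malignant / benign
fp = cm[1, 0]                # true benign predicted as malignant
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print("malignant precision:", round(precision, 3),
      "recall:", round(recall, 3))
```

For tumor screening, recall on the malignant class is usually the figure to watch, because a false negative (a missed malignant case) is more costly than a false positive.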