Predicting Breast Cancer with an SVM

For the theory behind SVMs, see https://zhuanlan.zhihu.com/p/24638007

For how the KKT conditions and strong duality can be derived from each other, see my earlier blog post: https://blog.csdn.net/qq_35985044/article/details/85324714
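
As a quick reminder of what those two references cover, here is a sketch of the standard hard-margin formulation, assuming linearly separable data and labels $y_i \in \{-1, +1\}$. The notation ($w$, $b$, $\alpha_i$) is the usual textbook one and is not taken from the linked posts. Note that `svm.SVC()` as used later in this post solves the soft-margin, kernelized variant by default, so this is background rather than a description of that exact model.

```latex
% Primal problem
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n

% Lagrangian with multipliers \alpha_i \ge 0
L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2
  - \sum_{i=1}^{n} \alpha_i \bigl[\, y_i\,(w^\top x_i + b) - 1 \,\bigr]

% KKT conditions
w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
  \quad \text{(stationarity)}
y_i\,(w^\top x_i + b) - 1 \ge 0, \qquad \alpha_i \ge 0
  \quad \text{(primal and dual feasibility)}
\alpha_i \bigl[\, y_i\,(w^\top x_i + b) - 1 \,\bigr] = 0
  \quad \text{(complementary slackness)}

% Dual problem, obtained by substituting the stationarity conditions into L
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
\quad \text{s.t.} \quad \alpha_i \ge 0,\quad \sum_{i=1}^{n} \alpha_i y_i = 0
```

Complementary slackness is what makes the solution sparse: only points lying exactly on the margin can have $\alpha_i > 0$, and those are the support vectors.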

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv('breast_cancer_01/data.csv')
print(data.columns)
print(data.head(5))
print(data.describe())

Partial output:

Index(['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',
       'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
       'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean',
       'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
       'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se',
       'fractal_dimension_se', 'radius_worst', 'texture_worst',
       'perimeter_worst', 'area_worst', 'smoothness_worst',
       'compactness_worst', 'concavity_worst', 'concave points_worst',
       'symmetry_worst', 'fractal_dimension_worst'],
      dtype='object')
         id diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  \
0    842302         M        17.99         10.38          122.80     1001.0   
1    842517         M        20.57         17.77          132.90     1326.0   
2  84300903         M        19.69         21.25          130.00     1203.0   
3  84348301         M        11.42         20.38           77.58      386.1   
4  84358402         M        20.29         14.34          135.10     1297.0   
# columns holds the column labels; index holds the row labels
features_mean = list(data.columns[2:12])
features_se = list(data.columns[12:22])
features_worst = list(data.columns[22:32])
features_worst
['radius_worst',
 'texture_worst',
 'perimeter_worst',
 'area_worst',
 'smoothness_worst',
 'compactness_worst',
 'concavity_worst',
 'concave points_worst',
 'symmetry_worst',
 'fractal_dimension_worst']
# Drop the id column
data.drop("id", axis=1, inplace=True)
# Encode the label: B (benign) becomes 0, M (malignant) becomes 1
data['diagnosis'] = data['diagnosis'].map({'B': 0, 'M': 1})
corr = data[features_mean].corr()
plt.figure(figsize=(10,10))

sns.heatmap(corr, annot=True)
plt.show()
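
A heatmap like this is typically read to spot groups of near-duplicate features; the `features_remain` list used later in this post leaves out `perimeter_mean` and `area_mean`, presumably because they track `radius_mean` closely. The sketch below shows one way such pruning could be automated. The helper `drop_correlated`, the 0.9 threshold, and the synthetic `demo` frame are all assumptions made for illustration, not the author's procedure.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Greedily keep columns whose absolute correlation with every
    already-kept column stays below `threshold` (hypothetical helper)."""
    corr = df.corr().abs()
    kept = []
    for col in df.columns:
        if all(corr.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return kept


# Synthetic stand-in for the real data: perimeter and area are derived
# from radius, so they should be flagged as redundant.
rng = np.random.default_rng(0)
radius = rng.normal(14, 3.5, 200)
demo = pd.DataFrame({
    'radius_mean': radius,
    'perimeter_mean': 2 * np.pi * radius + rng.normal(0, 0.5, 200),
    'area_mean': np.pi * radius ** 2 + rng.normal(0, 5, 200),
    'texture_mean': rng.normal(19, 4, 200),
})

selected = drop_correlated(demo)
print(selected)
```

On the real data the same call would be `drop_correlated(data[features_mean])`; the result depends on column order, since the first column of each correlated group is the one that survives.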

 

from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import metrics
from sklearn.preprocessing import StandardScaler

# Feature selection
features_remain = ['radius_mean', 'texture_mean', 'smoothness_mean', 'compactness_mean', 'symmetry_mean', 'fractal_dimension_mean']

# Hold out 30% of the data as the test set; the rest is the training set
train, test = train_test_split(data, test_size=0.3)
# Take the selected feature columns as the training and test inputs
train_X = train[features_remain]
train_y=train['diagnosis']
test_X= test[features_remain]
test_y =test['diagnosis']

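# Apply Z-score standardization so that every feature has mean 0 and variance 1
ss = StandardScaler()
# Compute the mean and variance on the training set and use them to standardize it
train_X = ss.fit_transform(train_X)
# Standardize the test set with the mean and variance learned from the training set
test_X = ss.transform(test_X)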
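
To make the standardization step concrete, here is a small sketch that reproduces `StandardScaler` by hand. The arrays `train_demo` and `test_demo` are synthetic data invented for the example; the point is only that the test set is scaled with the training set's mean and standard deviation, never with its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data standing in for train_X / test_X
rng = np.random.default_rng(0)
train_demo = rng.normal(loc=[14.0, 19.0], scale=[3.5, 4.0], size=(100, 2))
test_demo = rng.normal(loc=[14.0, 19.0], scale=[3.5, 4.0], size=(30, 2))

# Statistics come from the training data only
mu = train_demo.mean(axis=0)
sigma = train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

train_manual = (train_demo - mu) / sigma
test_manual = (test_demo - mu) / sigma  # reuse the training mu and sigma

# The same thing via scikit-learn
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_demo)
test_scaled = scaler.transform(test_demo)

print(np.allclose(train_manual, train_scaled))
print(np.allclose(test_manual, test_scaled))
```

After this transform the training columns have mean 0 and variance 1 exactly, while the test columns are only approximately so, which is expected.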

model = svm.SVC()
model.fit(train_X,train_y)
prediction = model.predict(test_X)
print('Accuracy: ', metrics.accuracy_score(test_y, prediction))
Accuracy:  0.9415204678362573
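
Because `train_test_split` is called without a fixed `random_state`, the accuracy above changes from run to run. A k-fold cross-validated score is one way to get a steadier estimate. The sketch below is an assumption-laden variant, not the code from this post: it loads scikit-learn's bundled copy of the Wisconsin diagnostic data through `load_breast_cancer` instead of `data.csv`, so the column names differ (`'mean radius'` rather than `radius_mean`) and the label encoding is reversed (0 is malignant, 1 is benign).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled dataset; as_frame=True returns pandas objects
bunch = load_breast_cancer(as_frame=True)
cols = ['mean radius', 'mean texture', 'mean smoothness',
        'mean compactness', 'mean symmetry', 'mean fractal dimension']
X = bunch.data[cols]
y = bunch.target

# Putting the scaler inside the pipeline means it is refit on the
# training part of every fold, so no test-fold statistics leak in.
pipe = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipe, X, y, cv=5)

print('Fold accuracies:', scores)
print('Mean accuracy:', scores.mean())
```

The mean over the five folds is a more stable number to report than a single split, and the spread between folds gives a rough sense of how much a single-split result can vary.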

PCA can also be used for dimensionality reduction here; see https://blog.csdn.net/Vincent_Chu/article/details/90046985
