Machine Learning: SVM

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
sns.set_style('whitegrid')
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
col_names = list(cancer.feature_names)
col_names.append('target')

df = pd.DataFrame(np.c_[cancer.data, cancer.target], columns=col_names)
df.head()
(shown transposed: one row per DataFrame column)

                                0        1        2        3        4
mean radius                 17.99    20.57    19.69    11.42    20.29
mean texture                10.38    17.77    21.25    20.38    14.34
mean perimeter             122.80   132.90   130.00    77.58   135.10
mean area                  1001.0   1326.0   1203.0    386.1   1297.0
mean smoothness           0.11840  0.08474  0.10960  0.14250  0.10030
mean compactness          0.27760  0.07864  0.15990  0.28390  0.13280
mean concavity             0.3001   0.0869   0.1974   0.2414   0.1980
mean concave points       0.14710  0.07017  0.12790  0.10520  0.10430
mean symmetry              0.2419   0.1812   0.2069   0.2597   0.1809
mean fractal dimension    0.07871  0.05667  0.05999  0.09744  0.05883
...
worst texture               17.33    23.41    25.53    26.50    16.67
worst perimeter            184.60   158.80   152.50    98.87   152.20
worst area                 2019.0   1956.0   1709.0    567.7   1575.0
worst smoothness           0.1622   0.1238   0.1444   0.2098   0.1374
worst compactness          0.6656   0.1866   0.4245   0.8663   0.2050
worst concavity            0.7119   0.2416   0.4504   0.6869   0.4000
worst concave points       0.2654   0.1860   0.2430   0.2575   0.1625
worst symmetry             0.4601   0.2750   0.3613   0.6638   0.2364
worst fractal dimension   0.11890  0.08902  0.08758  0.17300  0.07678
target                        0.0      0.0      0.0      0.0      0.0

5 rows × 31 columns

df.describe()
(shown transposed: one row per DataFrame column)

                                count         mean          std          min          25%          50%          75%          max
mean radius                569.000000    14.127292     3.524049     6.981000    11.700000    13.370000    15.780000    28.110000
mean texture               569.000000    19.289649     4.301036     9.710000    16.170000    18.840000    21.800000    39.280000
mean perimeter             569.000000    91.969033    24.298981    43.790000    75.170000    86.240000   104.100000   188.500000
mean area                  569.000000   654.889104   351.914129   143.500000   420.300000   551.100000   782.700000  2501.000000
mean smoothness            569.000000     0.096360     0.014064     0.052630     0.086370     0.095870     0.105300     0.163400
mean compactness           569.000000     0.104341     0.052813     0.019380     0.064920     0.092630     0.130400     0.345400
mean concavity             569.000000     0.088799     0.079720     0.000000     0.029560     0.061540     0.130700     0.426800
mean concave points        569.000000     0.048919     0.038803     0.000000     0.020310     0.033500     0.074000     0.201200
mean symmetry              569.000000     0.181162     0.027414     0.106000     0.161900     0.179200     0.195700     0.304000
mean fractal dimension     569.000000     0.062798     0.007060     0.049960     0.057700     0.061540     0.066120     0.097440
...
worst texture              569.000000    25.677223     6.146258    12.020000    21.080000    25.410000    29.720000    49.540000
worst perimeter            569.000000   107.261213    33.602542    50.410000    84.110000    97.660000   125.400000   251.200000
worst area                 569.000000   880.583128   569.356993   185.200000   515.300000   686.500000  1084.000000  4254.000000
worst smoothness           569.000000     0.132369     0.022832     0.071170     0.116600     0.131300     0.146000     0.222600
worst compactness          569.000000     0.254265     0.157336     0.027290     0.147200     0.211900     0.339100     1.058000
worst concavity            569.000000     0.272188     0.208624     0.000000     0.114500     0.226700     0.382900     1.252000
worst concave points       569.000000     0.114606     0.065732     0.000000     0.064930     0.099930     0.161400     0.291000
worst symmetry             569.000000     0.290076     0.061867     0.156500     0.250400     0.282200     0.317900     0.663800
worst fractal dimension    569.000000     0.083946     0.018061     0.055040     0.071460     0.080040     0.092080     0.207500
target                     569.000000     0.627417     0.483918     0.000000     0.000000     1.000000     1.000000     1.000000

8 rows × 31 columns

Feature selection

df.columns
Index(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error', 'fractal dimension error',
       'worst radius', 'worst texture', 'worst perimeter', 'worst area',
       'worst smoothness', 'worst compactness', 'worst concavity',
       'worst concave points', 'worst symmetry', 'worst fractal dimension',
       'target'],
      dtype='object')
sns.countplot(x = 'target', label = "Count",data=df)
<matplotlib.axes._subplots.AxesSubplot at 0x7f8baae71f60>

[Figure: count plot of the target classes (svm_files/svm_5_1.png)]

plt.figure(figsize=(10, 8))
sns.scatterplot(x = 'mean area', y = 'mean smoothness', hue = 'target', data = df)
<matplotlib.axes._subplots.AxesSubplot at 0x7f8b882cbe10>

[Figure: scatter plot of mean area vs. mean smoothness, colored by target (svm_files/svm_6_1.png)]

# Pearson correlation coefficients
plt.figure(figsize=(20,10)) 
sns.heatmap(df.corr(), annot=True) 
<matplotlib.axes._subplots.AxesSubplot at 0x7f8b882b2128>

[Figure: heatmap of pairwise Pearson correlations (svm_files/svm_7_1.png)]
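A 31 x 31 heatmap is hard to read when the goal is to pick features. As a rough sketch, the same Pearson coefficients can be reduced to a ranking of each feature's absolute correlation with `target`. The names `target_corr`, `top_k` and `selected_features` are illustrative and not part of the original notebook, and the cutoff of 10 is an arbitrary assumption.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Rebuild the DataFrame used above: 30 features plus the 'target' column.
cancer = load_breast_cancer()
df = pd.DataFrame(cancer.data, columns=[str(n) for n in cancer.feature_names])
df['target'] = cancer.target

# Absolute Pearson correlation of every feature with the target, strongest first.
target_corr = (
    df.corr()['target']
    .drop('target')
    .abs()
    .sort_values(ascending=False)
)

top_k = 10  # hypothetical cutoff, chosen only for illustration
selected_features = list(target_corr.head(top_k).index)
print(target_corr.head(top_k))
```

Pearson correlation only captures linear association, so a ranking like this is a first filter rather than a complete feature-selection method.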

2. Model training

from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = df.drop('target', axis=1)
y = df.target

print(f"'X' shape: {X.shape}")
print(f"'y' shape: {y.shape}")

pipeline = Pipeline([
    ('min_max_scaler', MinMaxScaler()),
    ('std_scaler', StandardScaler())
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
'X' shape: (569, 30)
'y' shape: (569,)
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    """Print accuracy, classification report and confusion matrix for one split."""
    if train:  # training set
        pred = clf.predict(X_train)
        clf_report = pd.DataFrame(classification_report(y_train, pred, output_dict=True))
        print("Train Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_train, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_train, pred)}\n")

    else:  # test set
        pred = clf.predict(X_test)
        clf_report = pd.DataFrame(classification_report(y_test, pred, output_dict=True))
        print("Test Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_test, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_test, pred)}\n")

Linear and polynomial kernels

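C: the penalty parameter C of C-SVC. Default: 1.0.

A larger C penalizes the slack variables more heavily and pushes them toward 0. Misclassified points cost more, so the model tends to classify every training sample correctly: training accuracy is high, but generalization is weaker. A smaller C lowers the penalty for misclassification and tolerates some errors, treating them as noise, which usually generalizes better.

kernel: the kernel function. Default: 'rbf'. Accepted values: 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'.

0 – linear: u'*v
1 – polynomial: (gamma*u'*v + coef0)^degree
2 – RBF: exp(-gamma*|u-v|^2)
3 – sigmoid: tanh(gamma*u'*v + coef0)

degree: the degree of the polynomial ('poly') kernel. Default: 3. Ignored by all other kernels.

gamma: the kernel coefficient for 'rbf', 'poly' and 'sigmoid'. In scikit-learn 0.22 and later the default is 'scale', which uses 1 / (n_features * X.var()); 'auto' uses 1 / n_features.

coef0: the constant term of the kernel function. Only used by 'poly' and 'sigmoid'.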
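To make the four kernel formulas concrete, the sketch below evaluates each one by hand with NumPy and compares the result with the matching function in `sklearn.metrics.pairwise`. The random vectors and the values of `gamma`, `coef0` and `degree` are arbitrary assumptions chosen only for the check.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))  # four sample vectors u
V = rng.normal(size=(5, 3))  # five sample vectors v
gamma, coef0, degree = 0.5, 1.0, 2  # illustrative values, not tuned

dot = U @ V.T                                                   # u'v for every pair
sq_dist = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)   # |u - v|^2 for every pair

manual = {
    'linear': dot,
    'poly': (gamma * dot + coef0) ** degree,
    'rbf': np.exp(-gamma * sq_dist),
    'sigmoid': np.tanh(gamma * dot + coef0),
}
library = {
    'linear': linear_kernel(U, V),
    'poly': polynomial_kernel(U, V, degree=degree, gamma=gamma, coef0=coef0),
    'rbf': rbf_kernel(U, V, gamma=gamma),
    'sigmoid': sigmoid_kernel(U, V, gamma=gamma, coef0=coef0),
}

for name in manual:
    print(name, np.allclose(manual[name], library[name]))
```

The check also shows which parameters each kernel reads: `degree` matters only for 'poly', and `coef0` only for 'poly' and 'sigmoid'.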

from sklearn.svm import SVC

linear_model = SVC(kernel='linear')
linear_model.fit(X_train, y_train)

print_score(linear_model, X_train, y_train, X_test, y_test, train=True)
print_score(linear_model, X_train, y_train, X_test, y_test, train=False)
/Users/gaoguli/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py:1692: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['str_']. An error will be raised in 1.2.
  FutureWarning,


Train Result:
================================================
Accuracy Score: 96.92%
_______________________________________________
CLASSIFICATION REPORT:
                  0.0         1.0  accuracy   macro avg  weighted avg
precision    0.975460    0.965753  0.969231    0.970607      0.969359
recall       0.940828    0.986014  0.969231    0.963421      0.969231
f1-score     0.957831    0.975779  0.969231    0.966805      0.969112
support    169.000000  286.000000  0.969231  455.000000    455.000000
_______________________________________________
Confusion Matrix: 
 [[159  10]
 [  4 282]]

Test Result:
================================================
Accuracy Score: 95.61%
_______________________________________________
CLASSIFICATION REPORT:
                 0.0        1.0  accuracy   macro avg  weighted avg
precision   0.975000   0.945946   0.95614    0.960473      0.956905
recall      0.906977   0.985915   0.95614    0.946446      0.956140
f1-score    0.939759   0.965517   0.95614    0.952638      0.955801
support    43.000000  71.000000   0.95614  114.000000    114.000000
_______________________________________________
Confusion Matrix: 
 [[39  4]
 [ 1 70]]
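The trade-off controlled by C in the parameter notes can be probed by comparing training accuracy with cross-validated accuracy over a few values of C. This is a minimal sketch, not the notebook's own procedure: the grid of C values and the 5-fold setting are assumptions, and a `StandardScaler` step is added (unlike the run above) so that the linear kernel converges quickly at large C.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
for C in [0.01, 0.1, 1, 10, 100]:  # illustrative grid
    model = make_pipeline(StandardScaler(), SVC(kernel='linear', C=C))
    # Mean accuracy over 5 folds of the training set (held-out estimate).
    cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()
    # Accuracy on the same data the model was fitted on.
    train_acc = model.fit(X_train, y_train).score(X_train, y_train)
    results[C] = (train_acc, cv_acc)
    print(f"C={C:<6} train={train_acc:.3f}  cv={cv_acc:.3f}")
```

A gap that widens between the two columns as C grows would be the overfitting pattern the text describes; the test set stays untouched so it can still serve as a final check.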



from sklearn.svm import SVC

poly_model = SVC(kernel='poly', degree=2, gamma='auto', coef0=1, C=5)
poly_model.fit(X_train, y_train)

print_score(poly_model, X_train, y_train, X_test, y_test, train=True)
print_score(poly_model, X_train, y_train, X_test, y_test, train=False)


Train Result:
================================================
Accuracy Score: 97.14%
_______________________________________________
CLASSIFICATION REPORT:
                  0.0         1.0  accuracy   macro avg  weighted avg
precision    0.987500    0.962712  0.971429    0.975106      0.971919
recall       0.934911    0.993007  0.971429    0.963959      0.971429
f1-score     0.960486    0.977625  0.971429    0.969056      0.971259
support    169.000000  286.000000  0.971429  455.000000    455.000000
_______________________________________________
Confusion Matrix: 
 [[158  11]
 [  2 284]]

Test Result:
================================================
Accuracy Score: 94.74%
_______________________________________________
CLASSIFICATION REPORT:
                 0.0        1.0  accuracy   macro avg  weighted avg
precision   0.974359   0.933333  0.947368    0.953846      0.948808
recall      0.883721   0.985915  0.947368    0.934818      0.947368
f1-score    0.926829   0.958904  0.947368    0.942867      0.946806
support    43.000000  71.000000  0.947368  114.000000    114.000000
_______________________________________________
Confusion Matrix: 
 [[38  5]
 [ 1 70]]




Gaussian (RBF) kernel

rbf_model = SVC(kernel='rbf', gamma=0.1, C= 1)
rbf_model.fit(X_train, y_train)

print_score(rbf_model, X_train, y_train, X_test, y_test, train=True)
print_score(rbf_model, X_train, y_train, X_test, y_test, train=False)
Train Result:
================================================
Accuracy Score: 100.00%
_______________________________________________
CLASSIFICATION REPORT:
             0.0    1.0  accuracy  macro avg  weighted avg
precision    1.0    1.0       1.0        1.0           1.0
recall       1.0    1.0       1.0        1.0           1.0
f1-score     1.0    1.0       1.0        1.0           1.0
support    169.0  286.0       1.0      455.0         455.0
_______________________________________________
Confusion Matrix: 
 [[169   0]
 [  0 286]]

Test Result:
================================================
Accuracy Score: 62.28%
_______________________________________________
CLASSIFICATION REPORT:
            0.0        1.0  accuracy   macro avg  weighted avg
precision   0.0   0.622807  0.622807    0.311404      0.387889
recall      0.0   1.000000  0.622807    0.500000      0.622807
f1-score    0.0   0.767568  0.622807    0.383784      0.478046
support    43.0  71.000000  0.622807  114.000000    114.000000
_______________________________________________
Confusion Matrix: 
 [[ 0 43]
 [ 0 71]]
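The result above, 100% on the training set while every test sample is predicted as class 1, is the pattern one would expect if the RBF kernel sees almost no similarity between distinct samples. A likely cause is that the features are unscaled: columns such as mean area span hundreds of units, so exp(-gamma*|u-v|^2) with gamma=0.1 collapses toward 0. The sketch below checks this on the same split; it is a diagnostic written under that assumption, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Kernel values between distinct training samples (diagonal excluded).
K = rbf_kernel(X_train, gamma=0.1)
off_diag = K[~np.eye(len(K), dtype=bool)]
print(f"median off-diagonal kernel value: {np.median(off_diag):.3e}")

# Share of test samples with no training sample at similarity >= 0.01.
K_test = rbf_kernel(X_test, X_train, gamma=0.1)
isolated = (K_test.max(axis=1) < 0.01).mean()
print(f"test samples far from every training sample: {isolated:.2%}")

# What gamma='scale' and gamma='auto' would resolve to on this unscaled data.
n_features = X_train.shape[1]
gamma_scale = 1.0 / (n_features * X_train.var())
gamma_auto = 1.0 / n_features
print(f"gamma='scale' -> {gamma_scale:.3e}, gamma='auto' -> {gamma_auto:.3e}")
```

When the kernel matrix is close to the identity, each training point is effectively its own island: the model memorizes the training set, and a test point far from every training sample is classified by the intercept term alone.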



/Users/gaoguli/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
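The `pipeline` defined in the model-training section is never fitted or applied, so all three models above are trained on raw features. One way to test whether scaling explains the poor RBF result is to put the scaler and the classifier in a single `Pipeline`. This is a sketch under assumptions: only `StandardScaler` is used (applying `MinMaxScaler` first, as in the unused pipeline, is an affine rescaling that standardization would undo anyway), and `gamma='scale'` replaces the hand-picked 0.1.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fitted on the training split only, then reused for the test split.
scaled_rbf = Pipeline([
    ('std_scaler', StandardScaler()),
    ('svc', SVC(kernel='rbf', gamma='scale', C=1)),
])
scaled_rbf.fit(X_train, y_train)

train_acc = accuracy_score(y_train, scaled_rbf.predict(X_train))
test_acc = accuracy_score(y_test, scaled_rbf.predict(X_test))
print(f"train accuracy: {train_acc * 100:.2f}%")
print(f"test accuracy:  {test_acc * 100:.2f}%")
print(confusion_matrix(y_test, scaled_rbf.predict(X_test)))
```

Because the pipeline exposes `predict`, it can be passed to `print_score` in place of a bare `SVC` to get the same report format as the runs above.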