贪心科技 (Greedy Tech) Machine Learning Bootcamp (Part 7)

小刘要努力。

liurunsen

Article link: https://blog.csdn.net/weixin_44510615/article/details/95739798

First, credit to the source.

Source: 贪心学院 (Greedy Academy), https://www.zhihu.com/people/tan-xin-xue-yuan/activities

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# use seaborn plotting defaults
import seaborn as sns; sns.set()

Simulated dataset

# make_blobs lives in sklearn.datasets; the old samples_generator module was removed
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=50, centers=2,
                  random_state=0, cluster_std=0.60)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn');

[Figure: scatter plot of the two simulated clusters]

xfit = np.linspace(-1, 3.5)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')

for m, b in [(1, 0.65), (0.5, 1.6), (-0.2, 2.9)]:
    plt.plot(xfit, m * xfit + b, '-k')

plt.xlim(-1, 3.5);

[Figure: the same scatter plot with three candidate separating lines]

Now imagine that each separating line has a width

xfit = np.linspace(-1, 3.5)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')

for m, b, d in [(1, 0.65, 0.33), (0.5, 1.6, 0.55), (-0.2, 2.9, 0.2)]:
    yfit = m * xfit + b
    plt.plot(xfit, yfit, '-k')
    plt.fill_between(xfit, yfit - d, yfit + d, edgecolor='none',
                     color='#AAAAAA', alpha=0.4)

plt.xlim(-1, 3.5);

[Figure: the three candidate lines, each drawn with a shaded band of a different width]

In the SVM framework, the line with the widest margin is considered the optimal separating line.
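The bands above were drawn by hand. As a minimal sketch of how the widest band can be found automatically, the snippet below fits a linear SVM to the same simulated blobs and computes the margin width as 2 / ||w||. The very large `C` is an assumption made here to approximate a hard margin on separable data; it does not come from the original notebook.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same simulated data as above
X, y = make_blobs(n_samples=50, centers=2,
                  random_state=0, cluster_std=0.60)

# A very large C (assumption) approximates a hard-margin SVM
clf = SVC(kernel='linear', C=1e10)
clf.fit(X, y)

# For a linear SVM the decision function is w . x + b,
# and the full width of the margin band is 2 / ||w||
w = clf.coef_[0]
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("number of support vectors =", len(clf.support_vectors_))
```

Only the support vectors, the points lying on the edge of the band, determine the fitted line; moving any other point without crossing the margin leaves the result unchanged.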

Face recognition

Face recognition with an SVM

So this part is skipped here, and we move on to the homework.

Homework: using an SVM to detect whether a mushroom is poisonous

Dataset used: https://archive.ics.uci.edu/ml/datasets/mushroom

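Each record in the dataset contains the features listed below, which describe the mushroom's shape, texture, color, and so on. From these we need to decide whether a mushroom is poisonous (p) or edible (e), which makes this a two-class classification problem.

There are 22 features.

[Figure: list of the 22 features]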
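The code that follows uses a DataFrame named `mush_df_encoded`, but the post does not show how it was built. One plausible way, shown here only as a sketch, is one-hot encoding with `pandas.get_dummies`, which would be consistent with the later code reading the label from column 1 and the features from column 2 onward. The three-row `sample` frame and its two feature columns are illustrative stand-ins, not the real file.

```python
import pandas as pd

# Illustrative three-row sample in the dataset's single-letter coding
# (assumption: the real data has a class column plus 22 feature columns)
sample = pd.DataFrame({
    'class': ['p', 'e', 'e'],
    'cap-shape': ['x', 'x', 'b'],
    'odor': ['p', 'a', 'l'],
})

# One-hot encode every categorical column, including the label
sample_encoded = pd.get_dummies(sample)
print(list(sample_encoded.columns))

# Column 0 is class_e and column 1 is class_p, so column 1 works as the
# "is poisonous" label and columns 2 onward are the encoded features
X_sample = sample_encoded.iloc[:, 2:]
y_sample = sample_encoded.iloc[:, 1]
```

With this layout, `iloc[:, 2:]` leaves both label columns out of the features, which matters because keeping either one would leak the answer into the model.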

Building the pipeline

# Assign the features and the class labels to X and y, respectively
X_mush = mush_df_encoded.iloc[:,2:]
y_mush = mush_df_encoded.iloc[:,1]
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline


pca = PCA(n_components=14, whiten=True, random_state=42)
svc = SVC(kernel='linear', class_weight='balanced')
model = make_pipeline(pca, svc)

Splitting the data into training and test sets

from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X_mush, y_mush,
                                                random_state=41)

Tuning: use cross-validation to find the best C (which controls the width of the margin)

from sklearn.model_selection import GridSearchCV

param_grid = {'svc__C': [1, 5, 10, 50]}
grid = GridSearchCV(model, param_grid)

%time grid.fit(Xtrain, ytrain)
print(grid.best_params_)

Wall time: 17.3 s
{'svc__C': 50}
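`best_params_` reports only the winner. To see how close the other candidates came, the per-candidate scores in `cv_results_` can be printed. The sketch below does this on synthetic stand-in data from `make_classification` (an assumption, so that it runs without the mushroom file); the names `X_demo`, `y_demo`, `demo_model` and `demo_grid` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), not the mushroom dataset
X_demo, y_demo = make_classification(n_samples=200, n_features=20,
                                     n_informative=5, random_state=0)

# Same pipeline shape as in the post, with fewer PCA components
demo_model = make_pipeline(
    PCA(n_components=5, random_state=42),
    SVC(kernel='linear', class_weight='balanced'),
)

# make_pipeline names each step after its lowercased class name,
# which is why the parameter key is 'svc__C'
demo_grid = GridSearchCV(demo_model, {'svc__C': [1, 5, 10, 50]}, cv=5)
demo_grid.fit(X_demo, y_demo)

# Mean cross-validated accuracy for every candidate value of C
for C, score in zip(demo_grid.cv_results_['param_svc__C'],
                    demo_grid.cv_results_['mean_test_score']):
    print(f"C={C}: mean CV accuracy = {score:.3f}")
print("best:", demo_grid.best_params_)
```

When the best value sits at the edge of the grid, as `C = 50` does in the run above, it may be worth extending the grid in that direction to check whether the score keeps improving.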

Predicting with the trained SVM

model = grid.best_estimator_
yfit = model.predict(Xtest)

Generating the performance report

[Figure: classification report for the model's test-set predictions]
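The report survives only as an image, so the exact call is not known. A common way to produce such a report is `sklearn.metrics.classification_report`, sketched below on synthetic stand-in data (an assumption) so that the snippet is self-contained. With the variables from this post, the corresponding call would presumably be `classification_report(ytest, yfit)`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), not the mushroom dataset
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, random_state=41)

demo_clf = SVC(kernel='linear').fit(Xtr, ytr)
y_pred = demo_clf.predict(Xte)

# Per-class precision, recall and F1, plus overall accuracy
report = classification_report(yte, y_pred,
                               target_names=['class 0', 'class 1'])
print(report)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(yte, y_pred)
print(cm)
```

For a poisonous-versus-edible task, the recall on the poisonous class is probably the number to watch most closely, since a missed poisonous mushroom is the costly error.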

For comparison, here is the instructor's result:

[Figure: the instructor's classification report]

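Surprisingly, one of my numbers is actually higher than the instructor's.

Hahaha

Finally, you are welcome to follow the WeChat official account 毛利学python.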
