Learning sklearn Together, the Fun Way (Part 1)

爱跑步的george

Category: sklearn  Tags: sklearn

Article link: https://blog.csdn.net/weixin_38246633/article/details/85332171

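This article picks highlights from the official sklearn tutorials and walks through core machine learning concepts, including the difference between supervised and unsupervised learning. Using handwritten digit recognition and iris classification as examples, it covers model persistence and predicting one-dimensional versus multi-dimensional labels, and shows how matplotlib is used to draw the data visualizations.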

1. This is the tutorial index page (five parts in total)
https://scikit-learn.org/stable/tutorial/index.html
Haha, open the link above and let's learn together.
2. This is the first module
https://scikit-learn.org/stable/tutorial/basic/tutorial.html#machine-learning-the-problem-setting
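This is the first page of the official sklearn tutorial.
It splits machine learning problems into two big groups according to whether labels are available: supervised learning and unsupervised learning.
Supervised learning problems fall into two kinds: classification (handwritten digit recognition and iris) and regression (for example, predicting house prices).
Unsupervised learning problems: clustering and density estimation.
Personally, I think supervised problems are the ones we run into most often in competitions and the like.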
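To make the split concrete, here is a minimal sketch on the iris data. The choice of KNeighborsClassifier and KMeans is my own assumption for illustration, not something the tutorial page prescribes: the supervised estimator is fitted with labels, the unsupervised one only ever sees the features.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Supervised: the estimator sees both the features X and the labels y.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)
predicted_labels = clf.predict(X[:5])

# Unsupervised: the estimator only sees X and groups the samples on its own.
# The cluster ids it returns are arbitrary and need not match the species ids.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

print(predicted_labels)
print(cluster_ids[:10])
```

Note that `fit(X, y)` versus `fit_predict(X)` is the whole difference at the API level: with labels or without.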
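The page works through two examples, handwritten digit recognition and iris, both of which are classification problems within supervised learning. It also covers how to save a model (or, to sound fancier, model persistence) and how to predict 1D labels versus multi-dimensional labels.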
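The three topics above can be sketched roughly as follows. This is a condensed sketch modeled on the official tutorial, not a copy of it: the toy `X`/`y` lists and the variable names are illustrative assumptions, and `pickle` is just one way to persist a model.

```python
import pickle

from sklearn import datasets, svm
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer

# --- Handwritten digit recognition ---
digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.0)
clf.fit(digits.data[:-1], digits.target[:-1])  # hold out the last image
last_digit_pred = clf.predict(digits.data[-1:])

# --- Model persistence: serialize the fitted model and load it back ---
blob = pickle.dumps(clf)
restored = pickle.loads(blob)
same_pred = restored.predict(digits.data[-1:])

# --- 1D labels vs. 2D (binarized) labels on a toy dataset ---
X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]
ovr = OneVsRestClassifier(estimator=svm.SVC(random_state=0))
pred_1d = ovr.fit(X, y).predict(X)         # one class id per sample
y_2d = LabelBinarizer().fit_transform(y)   # one indicator column per class
pred_2d = ovr.fit(X, y_2d).predict(X)      # one indicator row per sample

print(last_digit_pred, same_pred)
print(pred_1d.shape, pred_2d.shape)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. With 2D labels a prediction row may come back all zeros, meaning none of the per-class classifiers claimed that sample.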
3. This is the second module
https://scikit-learn.org/stable/tutorial/statistical_inference/index.html

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import datasets
from sklearn.decomposition import PCA

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = iris.target

x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

plt.figure(2, figsize=(8, 6))
plt.clf()

# Plot the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1,
            edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())

# To get a better understanding of the interaction of the dimensions,
# plot the first three PCA dimensions
fig = plt.figure(1, figsize=(8, 6))
# Create the 3D axes through the figure so they are attached to it;
# recent matplotlib no longer adds a bare Axes3D(fig) to the figure
ax = fig.add_subplot(111, projection="3d", elev=-150, azim=110)
X_reduced = PCA(n_components=3).fit_transform(iris.data)
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=y,
           cmap=plt.cm.Set1, edgecolor='k', s=40)
ax.set_title("First three PCA directions")
# Hide tick labels via ax.xaxis etc.; the w_xaxis aliases were removed
ax.set_xlabel("1st eigenvector")
ax.xaxis.set_ticklabels([])
ax.set_ylabel("2nd eigenvector")
ax.yaxis.set_ticklabels([])
ax.set_zlabel("3rd eigenvector")
ax.zaxis.set_ticklabels([])

plt.show()
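
The 3D plot shows the data projected onto the first three PCA directions. As a small follow-up of my own (not part of the original example), the sketch below checks how much of the variance those three components keep, using the `explained_variance_ratio_` attribute of a fitted `PCA`.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# Fit PCA on all four iris features and keep three components.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(iris.data)

# Fraction of the total variance captured by each component.
ratios = pca.explained_variance_ratio_

print(X_reduced.shape)
print(ratios, ratios.sum())
```

On iris the first component alone carries most of the variance, which is why the classes already separate fairly well along the first axis of the plot.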

# k-nearest-neighbors classification on the first two iris features

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets

n_neighbors = 15

# import some data to play with
iris = datasets.load_iris()

# we only take the first two features. We could avoid this ugly
# slicing by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target

h = .02  # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold,
                edgecolor='k', s=20)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()
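
The plots show the decision regions, but not how well each weighting scheme predicts unseen samples. Here is a minimal sketch of one way to check that. The 70/30 split, `random_state=0`, and the `scores` dictionary are my own assumptions for illustration and are not in the original example.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target  # same two features as in the plots

# Hold out 30% of the samples, keeping class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

scores = {}
for weights in ["uniform", "distance"]:
    clf = KNeighborsClassifier(n_neighbors=15, weights=weights)
    clf.fit(X_train, y_train)
    # score() returns the mean accuracy on the held-out samples.
    scores[weights] = clf.score(X_test, y_test)

print(scores)
```

With only the two sepal features the accuracy stays well below what all four features would give, which matches the overlapping regions in the plots.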

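I noticed that all of these plots are drawn with matplotlib, so I might as well tackle that tough nut too. I kept saying I would learn it properly and never did, so now it's time to catch up.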