[Machine Learning] Implementing a Perceptron with the Scikit-Learn Library


ChenVast



Article link: https://blog.csdn.net/ChenVast/article/details/79196206


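Choosing a classification algorithm: no single classifier performs well in every possible application. Only by comparing the performance of several learning algorithms can you pick the most suitable model for a particular problem.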

A classifier's performance, computational cost, and predictive power depend to a large extent on the data used to train the model.

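Training a machine learning algorithm involves five main steps:

1. Selecting features

2. Choosing a performance metric

3. Choosing a classifier and its optimization algorithm

4. Evaluating the model's performance

5. Tuning the algorithm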

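Scikit-learn not only provides a large number of learning algorithms but also includes many utilities for data preprocessing, hyperparameter tuning, and model evaluation.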
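The five training steps map onto scikit-learn utilities roughly as follows. This is a minimal sketch built on assumptions that are not in the original post: the petal features, accuracy as the metric, `GridSearchCV` with 5-fold cross-validation for tuning, and an illustrative grid of `eta0` values.

```python
# A minimal sketch of the five-step workflow with scikit-learn utilities.
# Assumptions: petal features, accuracy as the metric, and an illustrative
# eta0 grid chosen only for demonstration.
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
# Step 1: feature selection (petal length and petal width).
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 3: the classifier, preceded by standardization, in one pipeline.
# make_pipeline names the steps 'standardscaler' and 'perceptron'.
pipe = make_pipeline(StandardScaler(), Perceptron(random_state=0))

# Steps 2 and 5: pick the metric ('accuracy') and tune eta0 with 5-fold CV.
param_grid = {"perceptron__eta0": [0.01, 0.1, 1.0]}
search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)

# Step 4: evaluate the tuned model once on the held-out test set.
test_acc = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("test accuracy: %.2f" % test_acc)
```

Wrapping the scaler in the pipeline means each cross-validation fold fits its own scaler, so the validation folds never leak into the standardization statistics.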

Modeling with the perceptron algorithm:

Import all required libraries in one place. This block can be reused in later experiments.

from sklearn import datasets
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz
from sklearn.neighbors import KNeighborsClassifier

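Load the Iris dataset from scikit-learn. Here, the third column is the petal length and the fourth column is the petal width of each flower sample. The class labels have already been converted to integers: 0 = Iris-Setosa, 1 = Iris-Versicolor, 2 = Iris-Virginica.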
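To double-check the column indices and the label mapping described above, you can inspect the dataset object directly. A minimal sketch; note that scikit-learn stores the species names in lowercase ("setosa" and so on).

```python
# Inspect the Iris dataset: which columns are selected, how integer labels
# map to species names, and how many samples each class has.
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
selected = [iris.feature_names[i] for i in (2, 3)]
print(selected)                       # ['petal length (cm)', 'petal width (cm)']
print(iris.target_names.tolist())     # index i is the species for label i
print(np.bincount(iris.target).tolist())  # [50, 50, 50]
```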

Load the dataset (this block can also be reused in later experiments):

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]  # petal length and petal width
y = iris.target
# print('Class labels:', np.unique(y))

# Split the data into 70% training and 30% test data:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Standardize the features (the scaler is fitted on the training data only):
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
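`StandardScaler` rescales each feature to z = (x - μ) / σ, where μ and σ are the per-feature mean and (population) standard deviation estimated on the training set; the same μ and σ are then reused for the test set. A small check with illustrative arrays (not the Iris data):

```python
# Check that StandardScaler applies z = (x - mean) / std column by column,
# using statistics estimated on the training data only.
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small illustrative arrays, not the Iris data.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test = np.array([[2.5, 25.0], [5.0, 50.0]])

sc = StandardScaler().fit(X_train)
mu = X_train.mean(axis=0)     # per-feature mean of the training set
sigma = X_train.std(axis=0)   # population std (ddof=0), as StandardScaler uses

manual_test = (X_test - mu) / sigma
print(np.allclose(sc.transform(X_test), manual_test))  # True
print(np.allclose(sc.mean_, mu), np.allclose(sc.scale_, sigma))  # True True
```

Fitting on the training split only keeps information from the test set out of the preprocessing step.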

Now train and test the perceptron model:

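# max_iter: maximum number of passes over the training data; eta0: learning rate
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0)

ppn.fit(X_train_std, y_train)
print(y_test.shape)  # (45,): 30% of the 150 samples

y_pred = ppn.predict(X_test_std)
print('Misclassified samples: %d' % (y_test != y_pred).sum())
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))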
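For intuition about what the perceptron learns, here is a minimal from-scratch sketch of Rosenblatt's learning rule for a binary problem with labels in {-1, +1}: predict ŷ = +1 if w·x + b ≥ 0 (else -1), then update w ← w + η(y - ŷ)x and b ← b + η(y - ŷ). This is illustrative only: scikit-learn's `Perceptron` uses its own SGD-based implementation and handles the three Iris classes with a one-vs-rest scheme. The class name `SimplePerceptron` and its parameters are hypothetical.

```python
# A minimal from-scratch sketch of Rosenblatt's perceptron learning rule.
# Hypothetical class for illustration; not the scikit-learn implementation.
import numpy as np


class SimplePerceptron:
    def __init__(self, eta=0.1, n_epochs=40, seed=0):
        self.eta = eta            # learning rate (the role eta0 plays above)
        self.n_epochs = n_epochs  # number of passes over the training set
        self.seed = seed

    def net_input(self, X):
        return X @ self.w_ + self.b_

    def predict(self, X):
        # Unit step: +1 if w.x + b >= 0, otherwise -1.
        return np.where(self.net_input(X) >= 0.0, 1, -1)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.w_ = rng.normal(scale=0.01, size=X.shape[1])
        self.b_ = 0.0
        self.errors_ = []  # number of weight updates in each epoch
        for _ in range(self.n_epochs):
            errors = 0
            for xi, target in zip(X, y):
                # (target - prediction) is 0 when correct and +/-2 when wrong.
                update = self.eta * (target - self.predict(xi))
                self.w_ += update * xi
                self.b_ += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self


# Toy linearly separable data: class +1 lies above the line x1 + x2 = 0.
X = np.array([[2.0, 1.0], [1.0, 2.0], [3.0, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

model = SimplePerceptron(eta=0.1, n_epochs=10).fit(X, y)
print(model.errors_)             # drops to 0 once the classes are separated
print(model.predict(X).tolist())  # [1, 1, 1, -1, -1, -1]
```

Because the toy data is linearly separable, the perceptron convergence theorem guarantees that the number of mistakes is finite, so the later epochs make no updates.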

Define a function that plots the decision regions (reusable in later experiments):

# Let matplotlib render Chinese characters in labels (needs the SimHei font; harmless for English labels)
plt.rcParams['font.sans-serif'] = ['SimHei']

def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.02):
    # Set up the marker generator and the color map.
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # Plot the decision surface by classifying every point of a grid.
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # Plot all samples, one marker and color per class.
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, color=cmap(idx),
                    marker=markers[idx], label=cl)

    # Highlight the test samples with hollow black circles
    # (c='' is rejected by recent matplotlib versions).
    if test_idx is not None:
        X_test = X[test_idx, :]
        plt.scatter(X_test[:, 0], X_test[:, 1], facecolors='none',
                    edgecolors='black', alpha=1.0, linewidth=1,
                    marker='o', s=55, label='test set')
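The decision surface in `plot_decision_regions` comes from classifying every point of a regular grid and reshaping the predicted labels back onto that grid. A minimal sketch of that step with a coarse grid; `ThresholdClassifier` is a hypothetical stand-in for a fitted model with a `predict` method.

```python
# How the decision surface is built: classify each grid point, then reshape
# the labels to the grid's shape so contourf can color the regions.
import numpy as np


class ThresholdClassifier:
    """Hypothetical stand-in classifier: label 1 if x1 + x2 > 0, else 0."""

    def predict(self, X):
        return (X[:, 0] + X[:, 1] > 0).astype(int)


resolution = 0.5
xx1, xx2 = np.meshgrid(np.arange(-1.0, 1.0, resolution),
                       np.arange(-1.0, 1.0, resolution))
# Flatten the two coordinate grids into an (n_points, 2) feature matrix.
grid_points = np.array([xx1.ravel(), xx2.ravel()]).T
Z = ThresholdClassifier().predict(grid_points).reshape(xx1.shape)
print(xx1.shape, grid_points.shape)  # (4, 4) (16, 2)
print(Z)
```

A smaller `resolution` gives smoother region boundaries at the cost of more `predict` calls, since the number of grid points grows quadratically.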


# Stack the standardized training and test sets for plotting:
X_combined_std = np.vstack((X_train_std, X_test_std))
y_combined = np.hstack((y_train, y_test))
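Because `np.vstack` places the 105 training rows first and the 45 test rows after them, the test samples occupy rows 105 to 149 of the stacked array, which is why the plot call uses `test_idx=range(105, 150)`. A small check with placeholder features (not the real petal measurements) that derives the range instead of hard-coding it:

```python
# Confirm which rows of the stacked array hold the test samples.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(150 * 2, dtype=float).reshape(150, 2)  # placeholder features
y = np.repeat([0, 1, 2], 50)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

X_combined = np.vstack((X_train, X_test))
test_idx = range(len(X_train), len(X_combined))
print(len(X_train), len(X_test), test_idx)  # 105 45 range(105, 150)
print(np.array_equal(X_combined[test_idx, :], X_test))  # True
```

Deriving the range from `len(X_train)` keeps the highlighting correct if the split ratio changes.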

Plot the decision regions:

# Rows 105 to 149 of the stacked arrays are the 45 test samples.
plot_decision_regions(X=X_combined_std, y=y_combined,
                      classifier=ppn, test_idx=range(105, 150))
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')

plt.tight_layout()
# plt.savefig('./figures/iris_perceptron_scikit.png', dpi=300)
plt.show()
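To see where the plotted boundaries come from, you can inspect the fitted model. For the three Iris classes, scikit-learn's `Perceptron` learns one weight vector and intercept per class (one-vs-rest), and `predict` returns the class whose linear score w_k·x + b_k is largest, so each boundary in the plot is where two class scores tie. A minimal sketch that repeats the setup above so it runs on its own:

```python
# Inspect the one-vs-rest linear scores behind the perceptron's predictions.
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data[:, [2, 3]], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
sc = StandardScaler().fit(X_train)
X_train_std, X_test_std = sc.transform(X_train), sc.transform(X_test)

ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0).fit(X_train_std, y_train)
print(ppn.coef_.shape, ppn.intercept_.shape)  # (3, 2) (3,)

# One linear score per class: w_k . x + b_k
scores = X_test_std @ ppn.coef_.T + ppn.intercept_
print(np.allclose(scores, ppn.decision_function(X_test_std)))  # True
# The predicted class is the one with the highest score.
print(np.array_equal(ppn.classes_[scores.argmax(axis=1)],
                     ppn.predict(X_test_std)))  # True
```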
