The previous post covered the fundamentals of neural networks; if anything here is unclear, it may help to read that first:
Today we take on a small hands-on neural network task: implementing the Adaline algorithm.
Adaline Algorithm Derivation
The adaptive linear neuron (Adaline) is a supervised learning algorithm built on a single neuron. It compares the prediction computed from the inputs with the true value, then uses gradient descent to adjust the weights automatically so that the error shrinks on the next pass; repeating this iteration drives the output toward the true value.
The main steps are as follows:
A: Weighted sum of the inputs
Input signal vector: X = (x_0, x_1, ..., x_n)
Weight vector: W = (w_0, w_1, ..., w_n)
where x_0 = 1 and w_0 plays the role of the threshold, so the threshold is treated as just another weighted input signal.
Y is the weighted-sum output: Y = W·X = Σ_i w_i·x_i
B: Linear activation
The activation is simply the identity function y = x, so the output equals the weighted sum: y = Y.
C: Adjusting the weights with gradient descent
Define the error e = y_true - Y, where y_true is the true target value.
The cost is the squared error E = e^2 / 2.
Adjusting the weights aims to make E smaller. The update takes the form
w(i+1) = w(i) + Δw, so the key is finding an expression for Δw that always moves E toward smaller values.
Gradient descent is exactly such a method:
Treat E as a function of w. When w lies above the minimizer, dE/dw is positive; when w lies below it, dE/dw is negative. This property leads to the update
Δw = -η · ∂E/∂w
where η is the learning rate, normally kept small; if it is too large, every step overshoots the minimum and the iteration never converges.
With this choice, every iteration moves toward the minimum.
Expanding with E = e^2 / 2 and e = y_true - Y gives ∂E/∂w_i = e · ∂e/∂w_i = -(y_true - Y)·x_i, and therefore
Δw_i = η·(y_true - Y)·x_i = η·e·x_i
This completes one full iteration; we can stop after a fixed number of iterations or once the error falls below a chosen threshold.
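As a concrete illustration of the update rule, here is a minimal numeric sketch of a single iteration with made-up toy numbers (unrelated to the dataset used later):

eta = 0.1                                    # learning rate
x = [1.0, 2.0, 0.5]                          # x_0 = 1 carries the threshold term
w = [0.0, 0.3, -0.2]
Y = sum(wi * xi for wi, xi in zip(w, x))     # weighted sum: 0.5
e = 1.0 - Y                                  # with target y_true = 1.0, e = 0.5
w = [wi + eta * e * xi for wi, xi in zip(w, x)]
print(w)                                     # approximately [0.05, 0.4, -0.175]

Repeating this step keeps shrinking E = e**2 / 2, which is exactly what the derivation promises.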
Practical Walkthrough
A: The dataset
This post uses the first 100 data rows of an Iris CSV file, formatted as follows:
Columns 1-4 hold the numeric flower features and the last column holds the species name; the classifier's task is to separate the setosa and versicolor varieties.
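To make the iloc indices below easier to follow, here is roughly what the top of such a file looks like. The exact layout of the local CSV is an assumption (an R-style export with a leading row-index column and a header line), inferred from the column indices used in the code and from the feature values printed further down:

"","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
"1",5.1,3.5,1.4,0.2,"setosa"
"2",4.9,3,1.4,0.2,"setosa"
...
"51",7,3.2,4.7,1.4,"versicolor"

Because the file is read with header=None, the header line becomes row 0 of the DataFrame, which is why the data rows are sliced starting at index 1.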
B: Writing the code
Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
- numpy provides the matrix operations that neural network training cannot do without
- pandas provides functions for working with tabular data, a natural fit for our CSV file
- matplotlib provides plotting tools that make the training process easier to follow visually
Reading the data
# Read in the Iris dataset
df = pd.read_csv('D:\\桌面\\Iris数据集\\iris.csv', header=None)
# Print the first rows to get a feel for the file format
print(df.head())
# Keep only the setosa and versicolor rows; note the iloc start positions:
# with header=None the header line becomes row 0, so the data rows are 1..100,
# and the species name sits in column 5 (column 0 is the row index)
y = df.iloc[1:101, 5].values
# np.where acts like a ternary expression, replacing the species names with numbers
y = np.where(y == "setosa", -1, 1)
# Extract the two feature columns: Sepal.length and Petal.length
X = df.iloc[1:101, [1, 3]].values.astype(np.float32)
read_csv is the pandas function for reading CSV files; passing the file path is all it needs.
iloc is the pandas slicing function; here it slices out the rows and columns we need and assigns them to X and y.
np.where works like a ternary operator; here it converts the species names into numeric class labels.
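As a tiny illustration of how this np.where call behaves (a toy example, not part of the script):

labels = np.array(["setosa", "versicolor", "setosa"])
print(np.where(labels == "setosa", -1, 1))   # [-1  1 -1]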
X is a 100 × 2 array: each iteration uses all 100 samples, and each row holds the two feature values of one sample:
[[5.1 1.4]
[4.9 1.4]
[4.7 1.3]
[4.6 1.5]
[5. 1.4]
[5.4 1.7]
[4.6 1.4]
[5. 1.5]
[4.4 1.4]
[4.9 1.5]
[5.4 1.5]
[4.8 1.6]
[4.8 1.4]
[4.3 1.1]
[5.8 1.2]
[5.7 1.5]
[5.4 1.3]
[5.1 1.4]
[5.7 1.7]
[5.1 1.5]
[5.4 1.7]
[5.1 1.5]
[4.6 1. ]
[5.1 1.7]
[4.8 1.9]
[5. 1.6]
[5. 1.6]
[5.2 1.5]
[5.2 1.4]
[4.7 1.6]
[4.8 1.6]
[5.4 1.5]
[5.2 1.5]
[5.5 1.4]
[4.9 1.5]
[5. 1.2]
[5.5 1.3]
[4.9 1.4]
[4.4 1.3]
[5.1 1.5]
[5. 1.3]
[4.5 1.3]
[4.4 1.3]
[5. 1.6]
[5.1 1.9]
[4.8 1.4]
[5.1 1.6]
[4.6 1.4]
[5.3 1.5]
[5. 1.4]
[7. 4.7]
[6.4 4.5]
[6.9 4.9]
[5.5 4. ]
[6.5 4.6]
[5.7 4.5]
[6.3 4.7]
[4.9 3.3]
[6.6 4.6]
[5.2 3.9]
[5. 3.5]
[5.9 4.2]
[6. 4. ]
[6.1 4.7]
[5.6 3.6]
[6.7 4.4]
[5.6 4.5]
[5.8 4.1]
[6.2 4.5]
[5.6 3.9]
[5.9 4.8]
[6.1 4. ]
[6.3 4.9]
[6.1 4.7]
[6.4 4.3]
[6.6 4.4]
[6.8 4.8]
[6.7 5. ]
[6. 4.5]
[5.7 3.5]
[5.5 3.8]
[5.5 3.7]
[5.8 3.9]
[6. 5.1]
[5.4 4.5]
[6. 4.5]
[6.7 4.7]
[6.3 4.4]
[5.6 4.1]
[5.5 4. ]
[5.5 4.4]
[6.1 4.6]
[5.8 4. ]
[5. 3.3]
[5.6 4.2]
[5.7 4.2]
[5.7 4.2]
[6.2 4.3]
[5.1 3. ]
[5.7 4.1]]
Variable descriptions
"""
ADAptive LInear NEuron classifier.

Parameters
------------
eta : float
    Learning rate (between 0.0 and 1.0)
n_iter : int
    Passes over the training dataset.
random_state : int
    Random number generator seed for random weight
    initialization.

Attributes
-----------
w_ : 1d-array
    Weights after fitting.
cost_ : list
    Sum-of-squares cost function value in each epoch.
"""
This documents the meaning of eta, n_iter, random_state, w_, cost_ and the other names used below.
Defining the classifier
class AdalineGD(object):
    def __init__(self, eta=0.01, n_iter=100, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state
This is the standard Python class initializer; it is familiar ground, so no further explanation is needed.
Classifier components
    def net_input(self, X):
        """Calculate net input (weighted sum of the inputs plus bias)."""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation (identity function)."""
        return X
w_ holds the weights; an example of its format:
[1.04858019 5.64824219 2.96379755]
This is a three-element vector: the first entry is the threshold (bias) and the remaining two are the feature weights.
np.dot computes the weighted sum of each sample's features; the resulting net_input looks like this:
net_input: [34.00393133 32.87428397 31.44825409 31.47619006 33.43910765 36.58754454
31.17981023 33.73548747 30.05016287 33.17066379 35.99478489 32.90221994
32.30946029 28.59620007 37.36494318 37.68925593 35.40202524 34.00393133
38.28201558 34.30031115 36.58754454 34.30031115 29.99429128 34.89307081
33.79135906 34.0318673 34.0318673 34.86513483 34.56875501 32.33739357
32.90221994 35.99478489 34.86513483 36.26322874 33.17066379 32.84634835
35.96684892 32.87428397 29.75378305 34.30031115 33.14272782 30.31860673
29.75378305 34.0318673 35.48583011 32.30946029 34.59669098 31.17981023
35.42996121 33.43910765 54.51612346 50.53441974 54.54406014 43.96910245
51.3956229 46.58064859 50.56235501 38.50549925 51.96044658 41.97824925
39.66308258 46.82115882 46.79322355 49.43270495 43.34840683 51.93251131
46.01582491 45.95995566 49.40476969 44.23754666 48.59943848 47.35804723
51.15511537 49.43270495 49.9416608 51.36768763 53.68285699 53.71078956
48.27512233 43.61685104 43.3763428 43.07996333 45.36719672 50.05340058
44.88617755 48.27512233 52.82164973 49.67321659 44.83030561 43.96910245
45.15462176 49.13632548 45.66357619 39.07032293 45.12668508 45.69150876
45.69150876 48.81201074 38.74600749 45.39512929]
Each sample corresponds to one output.
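As a quick sanity check: for the first sample [5.1, 1.4] and the example weights above, 1.04858019 + 5.1 × 5.64824219 + 1.4 × 2.96379755 ≈ 34.0039, which matches the first entry of the printed net_input.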
Training the network
    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object
        """
        # rgen is a NumPy random number generator whose seed is chosen by the user,
        # so earlier results can be reproduced when needed.
        rgen = np.random.RandomState(self.random_state)
        # Draw the initial weights from a normal distribution with standard deviation 0.01
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        # Initialize the list of per-epoch costs
        self.cost_ = []
        for i in range(self.n_iter):
            net_input = self.net_input(X)
            print("net_input:", net_input)
            # Please note that the "activation" method has no effect
            # in the code since it is simply an identity function. We
            # could write `output = self.net_input(X)` directly instead.
            # The purpose of the activation is more conceptual, i.e.,
            # in the case of logistic regression (as we will see later),
            # we could change it to
            # a sigmoid function to implement a logistic regression classifier.
            output = self.activation(net_input)
            errors = (y - output)
            print("errors:", errors)
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors ** 2).sum() / 2.0
            self.cost_.append(cost)
            print("self.w_[:]:", self.w_[:])
        return self
First the weights are initialized from the random seed, then the iterations begin.
Each epoch computes the weighted sum, applies the linear activation, and computes the error values.
The weights are then updated according to the Δw_i = η·e·x_i rule derived above.
Here X.T is the transpose of X:
X.T:
[[5.1 4.9 4.7 4.6 5. 5.4 4.6 5. 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
5.7 5.1 5.4 5.1 4.6 5.1 4.8 5. 5. 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.
5.5 4.9 4.4 5.1 5. 4.5 4.4 5. 5.1 4.8 5.1 4.6 5.3 5. 7. 6.4 6.9 5.5
6.5 5.7 6.3 4.9 6.6 5.2 5. 5.9 6. 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
6.3 6.1 6.4 6.6 6.8 6.7 6. 5.7 5.5 5.5 5.8 6. 5.4 6. 6.7 6.3 5.6 5.5
5.5 6.1 5.8 5. 5.6 5.7 5.7 6.2 5.1 5.7]
[1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4
1.7 1.5 1.7 1.5 1. 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4 1.5 1.2
1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7 4.5 4.9 4.
4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4. 4.7 3.6 4.4 4.5 4.1 4.5 3.9 4.8 4.
4.9 4.7 4.3 4.4 4.8 5. 4.5 3.5 3.8 3.7 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.
4.4 4.6 4. 3.3 4.2 4.2 4.2 4.3 3. 4.1]]
Each sample contributes one update term per feature; summing these contributions feature by feature over all samples yields the Δw for each weight and hence the new weights.
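The line self.w_[1:] += self.eta * X.T.dot(errors) carries out exactly this accumulation as a single matrix product. A minimal sketch with made-up toy numbers (not taken from the dataset) showing that the vectorized form equals the per-sample sum:

eta = 0.01
X_toy = np.array([[5.1, 1.4], [7.0, 4.7]])      # two samples, two features
errors_toy = np.array([-1.2, 0.8])              # one error term per sample
vectorized = eta * X_toy.T.dot(errors_toy)      # one delta per feature
looped = eta * sum(errors_toy[i] * X_toy[i] for i in range(len(X_toy)))
print(np.allclose(vectorized, looped))          # True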
Creating classifier instances and visualizing the training process
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
ada1 = AdalineGD(n_iter=10, eta=0.01).fit(X, y)
ax[0].plot(range(1, len(ada1.cost_) + 1), np.log10(ada1.cost_), marker='o')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('log(Sum-squared-error)')
ax[0].set_title('Adaline - Learning rate 0.01')
ada2 = AdalineGD(n_iter=10, eta=0.0001).fit(X, y)
ax[1].plot(range(1, len(ada2.cost_) + 1), ada2.cost_, marker='o')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Sum-squared-error')
ax[1].set_title('Adaline - Learning rate 0.0001')
plt.savefig('02_11.png', dpi=300)
We instantiate two classifiers and find that with a learning rate of 0.0001 the error keeps shrinking, while with 0.01 it keeps growing, which nicely confirms the earlier remark: the learning rate is usually small, and one that is too large makes every step overshoot the minimum so the iteration never converges.
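If you prefer to see the trend as numbers rather than in the plot, the recorded costs can be printed directly (a small optional addition, not part of the original script):

print(ada1.cost_[0], ada1.cost_[-1])   # grows across the epochs with eta = 0.01
print(ada2.cost_[0], ada2.cost_[-1])   # shrinks across the epochs with eta = 0.0001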
This post focuses on the training process; the prediction step is not analyzed here (it appears only as the short predict method in the complete code).
Finally, the complete code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Read in the Iris dataset
df = pd.read_csv('D:\\桌面\\Iris数据集\\iris.csv', header=None)
# Print the first rows to get a feel for the file format
print(df.head())
# Keep only the setosa and versicolor rows; note the iloc start positions:
# with header=None the header line becomes row 0, so the data rows are 1..100,
# and the species name sits in column 5 (column 0 is the row index)
y = df.iloc[1:101, 5].values
# np.where acts like a ternary expression, replacing the species names with numbers
y = np.where(y == "setosa", -1, 1)
# Extract the two feature columns: Sepal.length and Petal.length
X = df.iloc[1:101, [1, 3]].values.astype(np.float32)
print(X)
print(X.T)
class AdalineGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.
    random_state : int
        Random number generator seed for random weight
        initialization.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    cost_ : list
        Sum-of-squares cost function value in each epoch.
    """

    def __init__(self, eta=0.01, n_iter=100, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def net_input(self, X):
        """Calculate net input (weighted sum of the inputs plus bias)."""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation (identity function)."""
        return X

    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object
        """
        # rgen is a NumPy random number generator whose seed is chosen by the user,
        # so earlier results can be reproduced when needed.
        rgen = np.random.RandomState(self.random_state)
        # Draw the initial weights from a normal distribution with standard deviation 0.01
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        # Initialize the list of per-epoch costs
        self.cost_ = []
        for i in range(self.n_iter):
            net_input = self.net_input(X)
            print("net_input:", net_input)
            # Please note that the "activation" method has no effect
            # in the code since it is simply an identity function. We
            # could write `output = self.net_input(X)` directly instead.
            # The purpose of the activation is more conceptual, i.e.,
            # in the case of logistic regression (as we will see later),
            # we could change it to
            # a sigmoid function to implement a logistic regression classifier.
            output = self.activation(net_input)
            errors = (y - output)
            print("errors:", errors)
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors ** 2).sum() / 2.0
            self.cost_.append(cost)
            print("self.w_[:]:", self.w_[:])
        return self

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(self.net_input(X)) >= 0.0, 1, -1)
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
ada1 = AdalineGD(n_iter=20, eta=0.01).fit(X, y)
ax[0].plot(range(1, len(ada1.cost_) + 1), np.log10(ada1.cost_), marker='o')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('log(Sum-squared-error)')
ax[0].set_title('Adaline - Learning rate 0.01')
ada2 = AdalineGD(n_iter=20, eta=0.0001).fit(X, y)
ax[1].plot(range(1, len(ada2.cost_) + 1), ada2.cost_, marker='o')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Sum-squared-error')
ax[1].set_title('Adaline - Learning rate 0.0001')
plt.savefig('02_11.png', dpi=300)
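If you run the script interactively and also want the figure shown on screen (a small optional addition, not in the original script), append:

plt.show()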