Perceptron(mlpack)


胧月夜い

Tags: machine learning

Article link: https://blog.csdn.net/qq_46013251/article/details/118758198

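This article walks through the Perceptron implementation in the mlpack library, including the ZeroInitialization weight initialization policy, the SimpleWeightUpdate rule used during training, and the classification function. It then tests the implementation on an example from Statistical Learning Methods (《统计学习方法》) to show how the perceptron works and what it produces.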

Perceptron

Source code

/**
 * This class implements a simple perceptron (i.e., a single layer neural
 * network).  It converges if the supplied training dataset is linearly
 * separable.
 *
 * @tparam LearnPolicy Options of SimpleWeightUpdate and GradientDescent.
 * @tparam WeightInitializationPolicy Option of ZeroInitialization and
 *      RandomInitialization.
 */
template<typename LearnPolicy = SimpleWeightUpdate,
         typename WeightInitializationPolicy = ZeroInitialization,
         typename MatType = arma::mat>
class Perceptron
{
 public:
  /**
   * Constructor: create the perceptron with the given number of classes and
   * initialize the weight matrix, but do not perform any training.  (Call the
   * Train() function to perform training.)
   *
   * @param numClasses Number of classes in the dataset.
   * @param dimensionality Dimensionality of the dataset.
   * @param maxIterations Maximum number of iterations for the perceptron
   *      learning algorithm.
   */
  Perceptron(const size_t numClasses = 0,
             const size_t dimensionality = 0,
             const size_t maxIterations = 1000);

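First come the template parameters that configure the perceptron as a whole: the learning policy defaults to SimpleWeightUpdate, and the weight initialization policy defaults to ZeroInitialization.

Next is the first constructor, which takes three numbers: the number of classes in the dataset, the dimensionality of the dataset, and the maximum number of iterations.

Here is its implementation: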

/**
 * Construct the perceptron with the given number of classes and maximum number
 * of iterations.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Perceptron(
    const size_t numClasses,
    const size_t dimensionality,
    const size_t maxIterations) :
    maxIterations(maxIterations)
{
  WeightInitializationPolicy wip;
  wip.Initialize(weights, biases, dimensionality, numClasses);
}

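As we can see, it constructs an instance of WeightInitializationPolicy and then calls that instance's Initialize method, passing in the weight matrix, the bias vector, the data dimensionality, and the number of classes.

According to the header, WeightInitializationPolicy defaults to ZeroInitialization, so let us look at how ZeroInitialization is implemented: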

/**
 * This class is used to initialize the matrix weightVectors to zero.
 */
class ZeroInitialization
{
 public:
  ZeroInitialization() { }

  inline static void Initialize(arma::mat& weights,
                                arma::vec& biases,
                                const size_t numFeatures,
                                const size_t numClasses)
  {
    weights.zeros(numFeatures, numClasses);
    biases.zeros(numClasses);
  }
}; // class ZeroInitialization

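Following the Armadillo documentation for .zeros(), weights becomes a numFeatures × numClasses matrix (data dimensionality by number of classes), so each column is the weight vector of one class, and biases becomes a column vector with one element per class. Every element starts at zero. The reason for these shapes lies in how SimpleWeightUpdate works in this implementation.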
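
To make these shapes concrete, here is a minimal standard-library sketch, not mlpack code. `LinearModel`, `ZeroInit`, and `Scores` are hypothetical names introduced for illustration, and `std::vector` stands in for the Armadillo types. It stores one weight vector per class, which corresponds to one column of `weights`, and computes the per-class scores that the perceptron later compares.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the (weights, biases) pair; not an mlpack type.
struct LinearModel
{
  // weights[c] plays the role of column c of the numFeatures x numClasses
  // matrix: one weight vector of length numFeatures per class.
  std::vector<std::vector<double>> weights;
  // biases[c] is the bias of class c.
  std::vector<double> biases;
};

// Mirrors what ZeroInitialization::Initialize does: every entry starts at 0.
LinearModel ZeroInit(const std::size_t numFeatures,
                     const std::size_t numClasses)
{
  LinearModel m;
  m.weights.assign(numClasses, std::vector<double>(numFeatures, 0.0));
  m.biases.assign(numClasses, 0.0);
  return m;
}

// Computes the analogue of weights.t() * x + biases: one score per class.
std::vector<double> Scores(const LinearModel& m, const std::vector<double>& x)
{
  std::vector<double> scores(m.biases);
  for (std::size_t c = 0; c < m.weights.size(); ++c)
  {
    assert(m.weights[c].size() == x.size());
    for (std::size_t f = 0; f < x.size(); ++f)
      scores[c] += m.weights[c][f] * x[f];
  }
  return scores;
}
```

With `ZeroInit(2, 2)` every score is 0 for any input, so the very first predictions depend only on how ties between equal scores are broken.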

Now for the remaining two constructors:

/**
   * Constructor: constructs the perceptron by building the weights matrix,
   * which is later used in classification.  The number of classes should be
   * specified separately, and the labels vector should contain values in the
   * range [0, numClasses - 1].  The data::NormalizeLabels() function can be
   * used if the labels vector does not contain values in the required range.
   *
   * @param data Input, training data.
   * @param labels Labels of dataset.
   * @param numClasses Number of classes in the dataset.
   * @param maxIterations Maximum number of iterations for the perceptron
   *      learning algorithm.
   */
  Perceptron(const MatType& data,
             const arma::Row<size_t>& labels,
             const size_t numClasses,
             const size_t maxIterations = 1000);

Implementation:

/**
 * Constructor - constructs the perceptron. Or rather, builds the weights
 * matrix, which is later used in classification.  It adds a bias input vector
 * of 1 to the input data to take care of the bias weights.
 *
 * @param data Input, training data.
 * @param labels Labels of dataset.
 * @param maxIterations Maximum number of iterations for the perceptron learning
 *      algorithm.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Perceptron(
    const MatType& data,
    const arma::Row<size_t>& labels,
    const size_t numClasses,
    const size_t maxIterations) :
    maxIterations(maxIterations)
{
  // Start training.
  Train(data, labels, numClasses);
}

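This constructor takes the training data, the labels, and the number of classes directly, and starts training on them right away.

There is also a constructor that copies parameters from an existing perceptron: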

/**
   * Alternate constructor which copies parameters from an already initiated
   * perceptron.
   *
   * @param other The other initiated Perceptron object from which we copy the
   *       values from.
   * @param data The data on which to train this Perceptron object on.
   * @param labels The labels of data.
   * @param numClasses Number of classes in the data.
   * @param instanceWeights Weight vector to use while training. For boosting
   *      purposes.
   */
  Perceptron(const Perceptron& other,
             const MatType& data,
             const arma::Row<size_t>& labels,
             const size_t numClasses,
             const arma::rowvec& instanceWeights);

Implementation:

/**
 * Alternate constructor which copies parameters from an already initiated
 * perceptron.
 *
 * @param other The other initiated Perceptron object from which we copy the
 *      values from.
 * @param data The data on which to train this Perceptron object on.
 * @param instanceWeights Weight vector to use while training. For boosting
 *      purposes.
 * @param labels The labels of data.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Perceptron(
    const Perceptron& other,
    const MatType& data,
    const arma::Row<size_t>& labels,
    const size_t numClasses,
    const arma::rowvec& instanceWeights) :
    maxIterations(other.maxIterations)
{
  Train(data, labels, numClasses, instanceWeights);
}

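It works much like the previous one: it copies maxIterations from other and then trains, this time with an extra instanceWeights argument. As the comment notes, this vector is used during training, so let us look at the training function next: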

/**
 * Training function.  It trains on trainData using the cost matrix
 * instanceWeights.
 *
 * @param data Data to train on.
 * @param labels Labels of data.
 * @param instanceWeights Cost matrix. Stores the cost of mispredicting
 *      instances.  This is useful for boosting.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
void Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Train(
    const MatType& data,
    const arma::Row<size_t>& labels,
    const size_t numClasses,
    const arma::rowvec& instanceWeights)
{
  // Do we need to resize the weights?
  if (weights.n_elem != numClasses)
  {
    WeightInitializationPolicy wip;
    wip.Initialize(weights, biases, data.n_rows, numClasses);
  }

  size_t j, i = 0;
  bool converged = false;
  size_t tempLabel;
  arma::uword maxIndexRow = 0, maxIndexCol = 0;
  arma::mat tempLabelMat;

  LearnPolicy LP;

  const bool hasWeights = (instanceWeights.n_elem > 0);

  while ((i < maxIterations) && (!converged))
  {
    // This outer loop is for each iteration, and we use the 'converged'
    // variable for noting whether or not convergence has been reached.
    ++i;
    converged = true;

    // Now this inner loop is for going through the dataset in each iteration.
    for (j = 0; j < data.n_cols; ++j)
    {
      // Multiply for each variable and check whether the current weight vector
      // correctly classifies this.
      tempLabelMat = weights.t() * data.col(j) + biases;

      tempLabelMat.max(maxIndexRow, maxIndexCol);

      // Check whether prediction is correct.
      if (maxIndexRow != labels(0, j))
      {
        // Due to incorrect prediction, convergence set to false.
        converged = false;
        tempLabel = labels(0, j);

        // Send maxIndexRow for knowing which weight to update, send j to know
        // the value of the vector to update it with.  Send tempLabel to know
        // the correct class.
        if (hasWeights)
          LP.UpdateWeights(data.col(j), weights, biases, maxIndexRow, tempLabel,
              instanceWeights(j));
        else
          LP.UpdateWeights(data.col(j), weights, biases, maxIndexRow,
              tempLabel);
      }
    }
  }
}

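After some setup, the function iterates over every column of the data (note that mlpack's Load function transposes the matrix it reads by default, so each column is one data point) and uses the weights and biases to predict its class. If the point is misclassified, the point, the weights, the biases, the index of the wrongly predicted class, and the index of the correct class are passed to the weight update function.

Here that function is SimpleWeightUpdate: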
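
The training loop can be sketched without Armadillo as follows. This is an illustrative re-implementation under stated assumptions, not the library's code: `Model`, `ArgMax`, and `TrainPerceptron` are hypothetical names, points are stored as `std::vector<double>` instead of matrix columns, instance weights are fixed at 1, the weights are always zero-initialized, and ties between equal scores are assumed to resolve to the lowest class index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical container for the trained parameters; not an mlpack type.
struct Model
{
  std::vector<Vec> weights;    // weights[c]: weight vector of class c.
  Vec biases;                  // biases[c]: bias of class c.
  std::size_t iterations = 0;  // Passes over the data actually performed.
};

// Index of the largest score; ties resolve to the lowest index (assumption).
std::size_t ArgMax(const Vec& scores)
{
  std::size_t best = 0;
  for (std::size_t c = 1; c < scores.size(); ++c)
    if (scores[c] > scores[best])
      best = c;
  return best;
}

// points[j] is one data point (one column of the mlpack data matrix).
Model TrainPerceptron(const std::vector<Vec>& points,
                      const std::vector<std::size_t>& labels,
                      const std::size_t numClasses,
                      const std::size_t maxIterations)
{
  assert(!points.empty() && points.size() == labels.size());
  const std::size_t dim = points[0].size();

  Model m;
  m.weights.assign(numClasses, Vec(dim, 0.0));  // Zero initialization.
  m.biases.assign(numClasses, 0.0);

  bool converged = false;
  while (m.iterations < maxIterations && !converged)
  {
    ++m.iterations;
    converged = true;
    for (std::size_t j = 0; j < points.size(); ++j)
    {
      // scores = weights^T * x + biases
      Vec scores(m.biases);
      for (std::size_t c = 0; c < numClasses; ++c)
        for (std::size_t f = 0; f < dim; ++f)
          scores[c] += m.weights[c][f] * points[j][f];

      const std::size_t predicted = ArgMax(scores);
      if (predicted != labels[j])
      {
        converged = false;
        // Simple update with instance weight 1: push the wrongly predicted
        // class away from the point and pull the correct class towards it.
        for (std::size_t f = 0; f < dim; ++f)
        {
          m.weights[predicted][f] -= points[j][f];
          m.weights[labels[j]][f] += points[j][f];
        }
        m.biases[predicted] -= 1.0;
        m.biases[labels[j]] += 1.0;
      }
    }
  }
  return m;
}
```

On the three points of Example 2.1 used in the test section, (3,3), (4,3), (1,1) with labels 1, 1, 0 and `maxIterations = 10`, this sketch stops after 6 passes with class weights (-1, -1) and (1, 1) and biases 3 and -3.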

/**
 * This class is used to update the weightVectors matrix according to the simple
 * update rule as discussed by Rosenblatt:
 *
 *  if a vector x has been incorrectly classified by a weight w,
 *  then w = w - x
 *  and  w'= w'+ x
 *
 *  where w' is the weight vector which correctly classifies x.
 */
namespace mlpack {
namespace perceptron {

class SimpleWeightUpdate
{
 public:
  /**
   * This function is called to update the weightVectors matrix.  It decreases
   * the weights of the incorrectly classified class while increasing the weight
   * of the correct class it should have been classified to.
   *
   * @tparam Type of vector (should be an Armadillo vector like arma::vec or
   *      arma::sp_vec or something similar).
   * @param trainingPoint Point that was misclassified.
   * @param weights Matrix of weights.
   * @param biases Vector of biases.
   * @param incorrectClass Index of class that the point was incorrectly
   *      classified as.
   * @param correctClass Index of the true class of the point.
   * @param instanceWeight Weight to be given to this particular point during
   *      training (this is useful for boosting).
   */
  template<typename VecType>
  void UpdateWeights(const VecType& trainingPoint,
                     arma::mat& weights,
                     arma::vec& biases,
                     const size_t incorrectClass,
                     const size_t correctClass,
                     const double instanceWeight = 1.0)
  {
    weights.col(incorrectClass) -= instanceWeight * trainingPoint;
    biases(incorrectClass) -= instanceWeight;

    weights.col(correctClass) += instanceWeight * trainingPoint;
    biases(correctClass) += instanceWeight;
  }
};

} // namespace perceptron
} // namespace mlpack

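The update follows the rule stated in the class comment: for a misclassified vector $x$, the weight vector of the wrongly predicted class becomes $w = w - x$, and the weight vector of the correct class becomes $w^{'} = w^{'} + x$.
The instanceWeights vector mentioned earlier supplies one entry per training point, and that entry acts as the step size (learning rate) of the update for that point; when no instance weights are given, the default is 1.0.
Both the weight vectors and the biases are updated in proportion to this instanceWeight.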
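
Written out as equations, the body of UpdateWeights reads as follows. The symbols are notation introduced here for illustration: $\hat{y}$ is incorrectClass, $y$ is correctClass, $\alpha$ is instanceWeight, and $w_c$, $b_c$ are column $c$ of weights and entry $c$ of biases.

$$
\begin{aligned}
w_{\hat{y}} &\leftarrow w_{\hat{y}} - \alpha x, & b_{\hat{y}} &\leftarrow b_{\hat{y}} - \alpha, \\
w_{y} &\leftarrow w_{y} + \alpha x, & b_{y} &\leftarrow b_{y} + \alpha .
\end{aligned}
$$

As a one-step example with $\alpha = 1$: starting from all-zero weights and biases, suppose the point $x = (3,3)^{\mathsf{T}}$ with true class $y = 1$ is predicted as $\hat{y} = 0$. The update then gives $w_0 = (-3,-3)^{\mathsf{T}}$, $b_0 = -1$, $w_1 = (3,3)^{\mathsf{T}}$, $b_1 = 1$.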

Finally, there is the classification function:

/**
 * Classification function. After training, use the weights matrix to classify
 * test, and put the predicted classes in predictedLabels.
 *
 * @param test Testing data or data to classify.
 * @param predictedLabels Vector to store the predicted classes after
 *      classifying test.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
void Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Classify(
    const MatType& test,
    arma::Row<size_t>& predictedLabels)
{
  arma::vec tempLabelMat;
  arma::uword maxIndex = 0;

  // Could probably be faster if done in batch.
  for (size_t i = 0; i < test.n_cols; ++i)
  {
    tempLabelMat = weights.t() * test.col(i) + biases;
    tempLabelMat.max(maxIndex);
    predictedLabels(0, i) = maxIndex;
  }
}

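This is much like training: multiply the transpose of the weights by each column of the test set (each data point), add the biases, and take the index of the largest element as the predicted class.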
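
The same argmax step can be sketched with the standard library alone. `ClassifyAll` is a hypothetical name, not an mlpack function, and breaking ties towards the lowest class index is an assumption of this sketch. Unlike the Classify listing above, which writes into predictedLabels without resizing it, the sketch allocates its own output vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// weights[c] is the weight vector of class c and biases[c] its bias;
// points[i] is one test point (one column of the test matrix).
std::vector<std::size_t> ClassifyAll(const std::vector<Vec>& weights,
                                     const Vec& biases,
                                     const std::vector<Vec>& points)
{
  assert(!weights.empty() && weights.size() == biases.size());
  std::vector<std::size_t> predicted(points.size(), 0);
  for (std::size_t i = 0; i < points.size(); ++i)
  {
    assert(points[i].size() == weights[0].size());
    double bestScore = 0.0;
    for (std::size_t c = 0; c < weights.size(); ++c)
    {
      // score_c = w_c . x + b_c
      double score = biases[c];
      for (std::size_t f = 0; f < points[i].size(); ++f)
        score += weights[c][f] * points[i][f];

      // Strict '>' keeps the lowest class index when scores tie.
      if (c == 0 || score > bestScore)
      {
        bestScore = score;
        predicted[i] = c;
      }
    }
  }
  return predicted;
}
```

Because the result is returned by value, the caller does not need to size a label vector in advance, which is a deliberate difference from the listing above.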

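Test

We use Example 2.1 from Statistical Learning Methods (《统计学习方法》): the positive instances are $x_1=(3,3)^{\mathsf{T}}$ and $x_2=(4,3)^{\mathsf{T}}$, and the negative instance is $x_3=(1,1)^{\mathsf{T}}$.

The dataset is tiny, so 10 iterations are plenty: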

#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/perceptron/perceptron.hpp>

using namespace mlpack;
using namespace mlpack::perceptron;
using namespace arma;
using namespace std;

int main()
{
    // Load() transposes the CSV, so each column is one point; the last row
    // of the loaded matrix holds the labels.
    mat dataset;
    mlpack::data::Load("../ml_test/data/my_data.csv", dataset);
    Row<size_t> labels;
    labels = conv_to<decltype (labels)>::from(dataset.row(dataset.n_rows-1));
    dataset.shed_row(dataset.n_rows-1);

    cout << "dataset:\n" << dataset << "labels:\n" << labels << endl;
    // 2 classes, 2 features, at most 10 iterations.
    Perceptron p(2, 2, 10);
    p.Train(dataset, labels, 2);
    cout << "weights:\n" << p.Weights() << endl;
    cout << "bias:\n" <<  p.Biases() << endl;
}

Output:
dataset:
3.0000 4.0000 1.0000
3.0000 3.0000 1.0000
labels:
1 1 0

weights:
-1.0000 1.0000
-1.0000 1.0000

bias:
3.0000
-3.0000

Following the explanation above, the prediction for a point with coordinates $(x_1, x_2)$ is:
$\begin{cases} 0, & x_1 + x_2 - 3 \leqslant 0 \\ 1, & \text{otherwise} \end{cases}$
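
This rule can be read off the printed output. Column $c$ of the weights matrix and entry $c$ of the bias vector belong to class $c$, so the two class scores for a point with coordinates $(x_1, x_2)$ are (the names $s_0$ and $s_1$ are introduced here for illustration):

$$
\begin{aligned}
s_0 &= -x_1 - x_2 + 3, \\
s_1 &= x_1 + x_2 - 3, \\
s_1 - s_0 &= 2\,(x_1 + x_2 - 3).
\end{aligned}
$$

Class 1 is predicted exactly when $s_1 > s_0$, that is, when $x_1 + x_2 - 3 > 0$. When the two scores are equal, the first maximum is taken, which is class 0; this tie-breaking is consistent with the weights printed above, which result from a run that starts with all scores equal to zero.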

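For example, continuing the program above with the point $(5, 2)^{\mathsf{T}}$, for which $5 + 2 - 3 = 4 > 0$, so the predicted label should be 1: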

// A single test point (5, 2), stored as one column.
mat test(2, 1);
test(0, 0) = 5;
test(1, 0) = 2;
Row<size_t> pred_labels(1);
p.Classify(test, pred_labels);
cout << "predicted labels: " << pred_labels[0] << endl;