Statistical Learning Methods (《统计学习方法》) study notes (1): perceptron

spring-again

Article link: https://blog.csdn.net/NICHUNQUAN/article/details/44537941

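The perceptron is the foundation for later methods such as the support vector machine, so it is worth studying thoroughly. It can only perform linear classification in the feature space, and it is a discriminative model. The model is $f(x)=sign(w\cdot x+b)$, and its separating hyperplane is $w\cdot x+b=0$. Here $w\cdot x$ denotes the dot product of two vectors; treating $w$ and $x$ as row vectors, it can also be written as $w x^{T}$, where $x^{T}$ is the transpose of $x$. Consider a linearly separable data set $T=\{(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{N},y_{N})\}$, where $x_{i}\in \mathcal{X} =\mathbf{R}^{n}$ and $y_{i}\in\{+1,-1\}$. For simplicity, Li Hang's book does not use boldface to distinguish vectors from scalars, whereas textbooks published outside China usually set vectors in bold and scalars in regular type.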
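
To make the model concrete, here is a minimal C++ sketch of the decision function $f(x)=sign(w\cdot x+b)$. The helper names `dot` and `predict` are my own and do not come from the book, and the sketch assumes the convention that $sign(0)=+1$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inner product w . x of two vectors of equal length.
double dot(const std::vector<double>& w, const std::vector<double>& x) {
    assert(w.size() == x.size());
    double s = 0.0;
    for (std::size_t k = 0; k < w.size(); ++k) s += w[k] * x[k];
    return s;
}

// Perceptron decision function f(x) = sign(w . x + b).
// Assumption: a point exactly on the hyperplane is labelled +1.
int predict(const std::vector<double>& w, double b, const std::vector<double>& x) {
    return dot(w, x) + b >= 0.0 ? +1 : -1;
}
```

For instance, with $w=(1,1)$ and $b=-3$, `predict` returns +1 for the point (3,3) and -1 for the point (1,1).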

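The perceptron's learning strategy starts from an initial hyperplane and, driven by the misclassified points, adjusts the parameters $w$ and $b$ step by step until those points end up on the correct side of the hyperplane. A few aspects deserve attention: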
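
The adjustment loop described above can be sketched as follows. This is only an illustration under my own assumptions: the names `Perceptron` and `train_primal` are hypothetical, the points are visited cyclically (the algorithm allows any misclassified point to be picked), and `max_epochs` is a safety cap I added for data that is not linearly separable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Perceptron {
    std::vector<double> w;  // weight vector
    double b;               // bias
};

// Primal-form perceptron training: whenever y_i * (w . x_i + b) <= 0,
// move the hyperplane toward x_i with w += eta * y_i * x_i, b += eta * y_i.
Perceptron train_primal(const std::vector<std::vector<double>>& X,
                        const std::vector<int>& y,
                        double eta = 1.0, int max_epochs = 1000) {
    assert(!X.empty() && X.size() == y.size());
    Perceptron p{std::vector<double>(X[0].size(), 0.0), 0.0};
    for (int epoch = 0; epoch < max_epochs; ++epoch) {
        bool mistake = false;
        for (std::size_t i = 0; i < X.size(); ++i) {
            double s = p.b;
            for (std::size_t k = 0; k < p.w.size(); ++k) s += p.w[k] * X[i][k];
            if (y[i] * s <= 0.0) {  // misclassified or on the hyperplane
                for (std::size_t k = 0; k < p.w.size(); ++k)
                    p.w[k] += eta * y[i] * X[i][k];
                p.b += eta * y[i];
                mistake = true;
            }
        }
        if (!mistake) break;  // a full pass without errors: done
    }
    return p;
}
```

Called on the book's example (positive points (3,3) and (4,3), negative point (1,1)) with `eta = 1`, this visiting order ends at $w=(1,1)$, $b=-3$ by my hand trace. A different visiting order may end at a different separating hyperplane.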

1. The difference between stochastic gradient descent and gradient descent

The adjustment strategy is stochastic gradient descent, which differs from gradient descent as follows:

Gradient descent: you need to run over every training example before doing a single update, which means that on a large dataset you may spend a long time before getting something that works.

Stochastic gradient descent: on the other hand, it performs an update every time it sees a training example. However, since each update uses only one example, it may never converge exactly, although it can still get pretty close to the minimum.

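In short, gradient descent plugs all the samples into each update, while stochastic gradient descent uses only one sample per update. When the training set is large, a single stochastic gradient descent update is far cheaper than a single gradient descent update.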

Gradient descent (for a hypothesis $h_{\theta}$ with parameters $\theta_{j}$, learning rate $\alpha$, and $m$ training examples $(x^{(i)},y^{(i)})$):

$Repeat\: until\: convergence \:\{$

$\;\;\;\theta _{j}:=\theta _{j}+\alpha\sum _{i=1}^{m}(y^{(i)}-h_{\theta}(x^{(i)}))x_{j}^{(i)}\; \;\;\;\;\;\;(for\;every\;j)$

$\}$

Stochastic gradient descent:

$loop\:\{$

$\;\;\;for \;i=1 \;to\; m\;\{$

$\;\;\;\;\;\;\theta_{j}:=\theta_{j}+\alpha(y^{(i)}-h_{\theta}(x^{(i)}))x_{j}^{(i)}\; \;\;\;\;\;\;(for\;every\;j)$

$\;\;\;\}$

$\}$
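
The two update rules can be compared side by side in code. The sketch below assumes a linear hypothesis $h_{\theta}(x)=\theta\cdot x$, which is my choice for illustration only; the names `Vec`, `h`, `batch_step` and `sgd_epoch` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Linear hypothesis h_theta(x) = theta . x (an assumption of this sketch).
double h(const Vec& theta, const Vec& x) {
    assert(theta.size() == x.size());
    double s = 0.0;
    for (std::size_t j = 0; j < theta.size(); ++j) s += theta[j] * x[j];
    return s;
}

// One gradient descent update: all m examples are summed before theta changes.
void batch_step(Vec& theta, const std::vector<Vec>& X, const Vec& y, double alpha) {
    Vec sum(theta.size(), 0.0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        double r = y[i] - h(theta, X[i]);  // residual on example i
        for (std::size_t j = 0; j < theta.size(); ++j) sum[j] += r * X[i][j];
    }
    for (std::size_t j = 0; j < theta.size(); ++j) theta[j] += alpha * sum[j];
}

// One pass of stochastic gradient descent: theta changes after every example.
void sgd_epoch(Vec& theta, const std::vector<Vec>& X, const Vec& y, double alpha) {
    for (std::size_t i = 0; i < X.size(); ++i) {
        double r = y[i] - h(theta, X[i]);  // residual uses the current theta
        for (std::size_t j = 0; j < theta.size(); ++j) theta[j] += alpha * r * X[i][j];
    }
}
```

`batch_step` touches every example for one parameter change, while `sgd_epoch` makes $m$ parameter changes in the same amount of work. That is the cost difference described above.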

2. The primal form and the dual form of the problem

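In linear programming, every problem comes with a companion linear programming problem called its dual, and the original one is called the primal problem. The perceptron's dual form follows the same idea: express the parameters of the primal problem as linear combinations of the training instances and their labels, then recover the original parameters by solving for the combination coefficients. The book notes that the more often an instance point triggers an update, the closer it lies to the separating hyperplane, that is, the harder it is to classify.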
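
Written out, the dual form looks as follows. This is my summary of the book's formulation, assuming $w$ and $b$ start at zero, the learning rate is $\eta$, and $n_{i}$ counts how many times point $i$ has triggered an update:

```latex
\begin{align*}
w &= \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}, \qquad
b = \sum_{i=1}^{N} \alpha_{i} y_{i}, \qquad
\alpha_{i} = n_{i}\eta \\
\text{misclassified if:}\quad & y_{i}\Big(\sum_{j=1}^{N} \alpha_{j} y_{j}\, x_{j}\cdot x_{i} + b\Big) \le 0 \\
\text{update:}\quad & \alpha_{i} \leftarrow \alpha_{i} + \eta, \qquad b \leftarrow b + \eta y_{i}
\end{align*}
```

The training instances appear only through the inner products $x_{j}\cdot x_{i}$, so these can be computed once and stored in a Gram matrix.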
Here is the dual-form algorithm for the book's example: positive sample points (3,3) and (4,3), negative sample point (1,1).

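#include <iostream>
using namespace std;

int main()
{
    // Training set from the book's example: two positive points, one negative.
    int x[3][2] = {
        {3, 3},
        {4, 3},
        {1, 1}
    };
    int y[3] = {1, 1, -1};
    int alp[3] = {0};   // alp[i]: number of updates triggered by point i (learning rate 1)
    int g[3][3];        // Gram matrix: g[i][j] = x_i . x_j
    int b = 0;          // bias
    int m;              // y_i * (sum_j alp_j * y_j * (x_j . x_i) + b)
    bool flag = true;   // true while the last pass still found a misclassified point

    // Precompute the Gram matrix.
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            g[i][j] = x[i][0] * x[j][0] + x[i][1] * x[j][1];

    while (flag)
    {
        flag = false;
        for (int i = 0; i < 3; i++)
        {
            // Keep updating on point i until it is classified correctly.
            while (true)
            {
                m = 0;
                for (int j = 0; j < 3; j++)
                    m += alp[j] * y[j] * g[j][i];
                m += b;
                m *= y[i];
                if (m <= 0)
                {
                    // Misclassified (or on the hyperplane): update alp_i and b.
                    flag = true;
                    alp[i]++;
                    b = b + y[i];
                }
                else
                    break;
            }
        }
    }

    for (int i = 0; i < 3; i++)
        cout << "alpha" << i + 1 << ": " << alp[i] << endl;
    cout << "b: " << b << endl;
    return 0;
}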
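
The program above yields only the coefficients and the bias. To get back to the primal parameters, one can apply $w=\sum_{i}\alpha_{i}y_{i}x_{i}$. The sketch below does that; the function name `recover_w` is hypothetical and not part of the book.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Recover the primal weight vector from dual coefficients:
// w = sum_i alpha_i * y_i * x_i.
std::vector<double> recover_w(const std::vector<std::vector<double>>& X,
                              const std::vector<int>& y,
                              const std::vector<double>& alpha) {
    assert(!X.empty() && X.size() == y.size() && X.size() == alpha.size());
    std::vector<double> w(X[0].size(), 0.0);
    for (std::size_t i = 0; i < X.size(); ++i)
        for (std::size_t k = 0; k < w.size(); ++k)
            w[k] += alpha[i] * y[i] * X[i][k];
    return w;
}
```

By my hand trace, the dual-form program ends with coefficients 2, 0, 5 and $b=-3$ for this example. Feeding those coefficients to `recover_w` gives $w=(1,1)$, so the separating hyperplane is $x^{(1)}+x^{(2)}-3=0$.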