# The Perceptron Model

## Basic Concepts of the Perceptron Model

The perceptron is a linear classification model:

$$f(x)=\mathrm{sign}(w\cdot x+b)$$

Here $w$ is called the weight vector, $b$ the bias, and $\mathrm{sign}(x)$ is the sign function.
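As a minimal sketch, the decision function can be written directly in Python (the convention that $\mathrm{sign}(0)=+1$, so the output is always a valid label, is an assumption):

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Predict the class label in {-1, +1} as sign(w·x + b)."""
    # Map a zero activation to +1 so the output is always a valid label.
    return 1 if np.dot(w, x) + b >= 0 else -1

# Example: the hyperplane w·x + b = 0 with w = (1, 1), b = -1
print(perceptron_predict(np.array([2.0, 2.0]), np.array([1.0, 1.0]), -1.0))   # → 1
print(perceptron_predict(np.array([-1.0, -1.0]), np.array([1.0, 1.0]), -1.0)) # → -1
```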

## The Perceptron Learning Strategy

### The Perceptron Loss Function

For a misclassified point $(x_i, y_i)$, its distance to the separating hyperplane is

$$-\frac{1}{\|w\|}\,y_i(w\cdot x_i+b)$$

Summing over the set $M$ of misclassified points gives the total distance

$$-\frac{1}{\|w\|}\sum_{x_i\in M}y_i(w\cdot x_i+b)$$

### The Perceptron Learning Algorithm

Dropping the constant factor $\frac{1}{\|w\|}$, the perceptron minimizes the loss

$$\min_{w,b}L(w,b)=-\sum_{x_i\in M}y_i(w\cdot x_i+b)$$
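This loss can be computed directly; a sketch follows (the toy data set and the convention that a zero margin counts as misclassified are assumptions):

```python
import numpy as np

def perceptron_loss(X, y, w, b):
    """Total loss -sum_{x_i in M} y_i (w·x_i + b) over the misclassified set M."""
    margins = y * (X @ w + b)      # y_i (w·x_i + b) for every point
    misclassified = margins <= 0   # the set M (zero margin counts as an error)
    return -np.sum(margins[misclassified])

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
# With w = (-1, -1), b = 0 the first two points are misclassified:
# margins are (-6, -7, 2), so the loss is -(-6 - 7) = 13
print(perceptron_loss(X, y, np.array([-1.0, -1.0]), 0.0))  # → 13.0
```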

Input: training data set $T$ and learning rate $\eta$;

Output: $w$ and $b$.

(1) Choose initial values $w_0$ and $b_0$;

(2) Select a data point $(x_i, y_i)$;

(3) If it is misclassified, update $w$ and $b$:

$$w \leftarrow w+\eta y_i x_i,\qquad b \leftarrow b+\eta y_i$$

(4) Return to step (2) until no data point is misclassified.
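Steps (1)–(4) can be sketched as a short NumPy implementation (the zero initialization, the toy data set, and the epoch cap are illustrative choices):

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=1000):
    """Primal perceptron on (X, y) with labels y_i in {-1, +1}."""
    w = np.zeros(X.shape[1])  # step (1): initial values w0 = 0, b0 = 0
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):               # step (2): select a data point
            if yi * (np.dot(w, xi) + b) <= 0:  # step (3): misclassified?
                w += eta * yi * xi             #   w <- w + eta * y_i * x_i
                b += eta * yi                  #   b <- b + eta * y_i
                errors += 1
        if errors == 0:                        # step (4): stop when no errors remain
            break
    return w, b

# Toy separable set: positives (3,3), (4,3); negative (1,1)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
w, b = train_perceptron(X, y)
print(w, b)  # → [1. 1.] -3.0 with eta = 1 and this visiting order
```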

### Convergence Proof of the Learning Algorithm

(1) There exists a hyperplane with $\|\hat{w}_{opt}\|=1$ that separates the data set completely, and there exists $\gamma>0$ such that for all $i=1,2,\dots,N$,

$$y_i(\hat{w}_{opt}\cdot \hat{x}_i)=y_i(w_{opt}\cdot x_i+b_{opt})\ge\gamma$$

(2) Let

$$R=\max_{1\le i\le N}\|\hat{x}_i\|$$

Then the number of misclassifications $k$ satisfies

$$k\le\left(\frac{R}{\gamma}\right)^2$$

Proof of (1): Since the data set is linearly separable, there exists a hyperplane that separates it completely. Take this hyperplane and normalize its norm to 1. For the finitely many points $i=1,2,\dots,N$,

$$y_i(\hat{w}_{opt}\cdot \hat{x}_i)=y_i(w_{opt}\cdot x_i+b_{opt})>0$$

Taking

$$\gamma=\min_i\{y_i(w_{opt}\cdot x_i+b_{opt})\}$$

we obtain

$$y_i(\hat{w}_{opt}\cdot \hat{x}_i)=y_i(w_{opt}\cdot x_i+b_{opt})\ge\gamma$$
Proof of (2): Without loss of generality, suppose the perceptron algorithm starts from $\hat{w}_0=0$, and write the augmented weight vector before the $k$-th update as

$$\hat{w}_{k-1}=(w_{k-1}^T,b_{k-1})^T$$

The $k$-th misclassified point $(x_i,y_i)$ satisfies

$$y_i(\hat{w}_{k-1}\cdot \hat{x}_i)=y_i(w_{k-1}\cdot x_i+b_{k-1})\le 0$$

and triggers the update

$$w_k \leftarrow w_{k-1}+\eta y_i x_i,\qquad b_k \leftarrow b_{k-1}+\eta y_i$$

that is, in augmented form,

$$\hat{w}_k \leftarrow \hat{w}_{k-1}+\eta y_i \hat{x}_i$$

First, by the result of (1),

$$\hat{w}_k\cdot\hat{w}_{opt}=\hat{w}_{k-1}\cdot\hat{w}_{opt}+\eta y_i\,\hat{x}_i\cdot\hat{w}_{opt}\ge\hat{w}_{k-1}\cdot\hat{w}_{opt}+\eta\gamma$$

and applying this inequality recursively,

$$\hat{w}_k\cdot\hat{w}_{opt}\ge\hat{w}_{k-1}\cdot\hat{w}_{opt}+\eta\gamma\ge\cdots\ge k\eta\gamma$$

Second, since $y_i(\hat{w}_{k-1}\cdot\hat{x}_i)\le 0$,

$$\|\hat{w}_k\|^2=\|\hat{w}_{k-1}\|^2+2\eta y_i\,\hat{w}_{k-1}\cdot\hat{x}_i+\eta^2\|\hat{x}_i\|^2\le\|\hat{w}_{k-1}\|^2+\eta^2\|\hat{x}_i\|^2\le\cdots\le k\eta^2R^2$$

Combining the two inequalities and using $\|\hat{w}_{opt}\|=1$,

$$k\eta\gamma\le\hat{w}_k\cdot\hat{w}_{opt}\le\|\hat{w}_k\|\,\|\hat{w}_{opt}\|\le\sqrt{k}\,\eta R,\qquad k^2\gamma^2\le kR^2$$

and therefore

$$k\le\left(\frac{R}{\gamma}\right)^2$$
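The bound can be checked empirically. The sketch below counts the updates $k$ on a small separable set and compares against $(R/\gamma)^2$; the choice of $\hat{w}_{opt}$ (the hyperplane $w=(1,1)$, $b=-3$, normalized) is illustrative, since the proof works for any separating hyperplane of unit norm:

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
X_hat = np.hstack([X, np.ones((3, 1))])  # augmented inputs x_hat = (x, 1)

# Count updates k made by the perceptron (eta = 1, w_hat_0 = 0),
# always correcting the first misclassified point.
w_hat, k = np.zeros(3), 0
while True:
    margins = y * (X_hat @ w_hat)
    bad = np.where(margins <= 0)[0]
    if bad.size == 0:
        break
    i = bad[0]
    w_hat += y[i] * X_hat[i]
    k += 1

# R = max ||x_hat_i||; gamma from a unit-norm separating hyperplane w_hat_opt
R = np.max(np.linalg.norm(X_hat, axis=1))
w_opt = np.array([1.0, 1.0, -3.0])   # illustrative separating hyperplane
w_opt /= np.linalg.norm(w_opt)       # normalize so ||w_hat_opt|| = 1
gamma = np.min(y * (X_hat @ w_opt))
print(k, (R / gamma) ** 2)           # k stays below the bound
```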

### The Dual Form of the Learning Algorithm

Input: training data set $T$ and learning rate $\eta$;

Output: $\alpha$ and $b$; the perceptron model is

$$f(x)=\mathrm{sign}\!\left(\sum_{j=1}^N\alpha_j y_j x_j\cdot x+b\right)$$

(1) Choose initial values $\alpha=0$ and $b=0$;

(2) Select a data point $(x_i, y_i)$;

(3) If it is misclassified, i.e.

$$y_i\left(\sum_{j=1}^N\alpha_j y_j x_j\cdot x_i+b\right)\le 0$$

then update $\alpha_i$ and $b$:

$$\alpha_i \leftarrow \alpha_i+\eta,\qquad b \leftarrow b+\eta y_i$$
(4) Return to step (2) until no data point is misclassified.
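The dual steps can be sketched in the same style; the Gram matrix of pairwise inner products is precomputed once, so the misclassification test never touches $w$ directly (the toy data set is an illustrative choice):

```python
import numpy as np

def train_perceptron_dual(X, y, eta=1.0, max_epochs=1000):
    """Dual perceptron: learn alpha and b; w is recovered as sum_j alpha_j y_j x_j."""
    n = X.shape[0]
    gram = X @ X.T              # Gram matrix: all inner products x_j·x_i, computed once
    alpha = np.zeros(n)         # step (1): alpha = 0, b = 0
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for i in range(n):      # step (2): select a data point (x_i, y_i)
            # step (3): misclassified iff y_i (sum_j alpha_j y_j x_j·x_i + b) <= 0
            if y[i] * (np.sum(alpha * y * gram[:, i]) + b) <= 0:
                alpha[i] += eta  #   alpha_i <- alpha_i + eta
                b += eta * y[i]  #   b <- b + eta * y_i
                errors += 1
        if errors == 0:          # step (4): stop when no errors remain
            break
    return alpha, b

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
alpha, b = train_perceptron_dual(X, y)
w = (alpha * y) @ X              # recover the primal weights
print(alpha, w, b)               # → [2. 0. 5.] [1. 1.] -3.0 with this visiting order
```

Since each $\alpha_i$ just counts how many times point $i$ triggered an update, the dual run mirrors the primal one exactly; the payoff is that $x_j\cdot x_i$ can be replaced by a kernel.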
