Linearly Separable Perceptron
Data model: each sample is $\mathbf{x} = [x_1, x_2, \dots, x_n]$. For a binary classification problem, the corresponding label is $y \in \{-1, 1\}$, and we construct a linear classifier:

$$y = \text{sign}\left(\mathbf{w} \cdot \mathbf{x}^T + b\right)$$
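As a quick sanity check, the decision rule is just a dot product followed by a sign test; a minimal sketch with made-up values for $\mathbf{w}$, $b$, and $\mathbf{x}$:

```python
import numpy as np

w = np.array([0.5, -1.0])  # hypothetical weights
b = 0.2                    # hypothetical bias
x = np.array([1.0, 0.3])   # one sample

score = w.dot(x) + b       # w . x^T + b = 0.4
y_pred = 1 if score > 0 else -1
print(y_pred)              # 1
```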
For any sample pair $(\mathbf{x}_i, y_i)$, a correct classification satisfies:

$$y_i\left(\mathbf{w} \cdot \mathbf{x}_i^T + b\right) > 0$$
and a misclassification satisfies:

$$f(\mathbf{w}, b) = y_i\left(\mathbf{w} \cdot \mathbf{x}_i^T + b\right) \leq 0$$
To update the parameters $(\mathbf{w}, b)$ so that this quantity becomes positive, we move them in the direction of gradient ascent on $f$:

$$\frac{\partial f(\mathbf{w}, b)}{\partial \mathbf{w}} = \mathbf{x}_i \cdot y_i, \qquad \frac{\partial f(\mathbf{w}, b)}{\partial b} = y_i$$
The corresponding update rule is:

$$\mathbf{w} \leftarrow \mathbf{w} + \eta \cdot \mathbf{x}_i \cdot y_i, \qquad b \leftarrow b + \eta \cdot y_i$$
Optimization can then be carried out with SGD (stochastic gradient descent): repeatedly pick a sample and apply the update above whenever it is misclassified.
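A minimal sketch of this primal-form SGD loop (the function name `fit_primal` and the `eta`/`max_iter` defaults are my own choices, mirroring the class at the end of this post):

```python
import numpy as np
import random

def fit_primal(X, y, eta=1.0, max_iter=5000):
    """X: [k, n]; y: [k,] with labels in {-1, 1}; returns (w, b)."""
    k, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(max_iter):
        i = random.randint(0, k - 1)           # SGD: pick one random sample
        if y[i] * (w.dot(X[i]) + b) <= 0:      # misclassified (or on the boundary)
            w += eta * y[i] * X[i]             # w <- w + eta * x_i * y_i
            b += eta * y[i]                    # b <- b + eta * y_i
    return w, b
```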
Augmenting the Weight Vector
By appending a 1 to $\mathbf{x}$, i.e. $\hat{\mathbf{x}} = [\mathbf{x}, 1]$, the bias $b$ can be folded into $\mathbf{w}$, so that:

$$f(\hat{\mathbf{x}}) = \text{sign}\left(\mathbf{w} \cdot \hat{\mathbf{x}}\right)$$
The corresponding update rule becomes:

$$\mathbf{w} \leftarrow \mathbf{w} + \eta \cdot \hat{\mathbf{x}}_i \cdot y_i$$
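With the augmentation the loop needs only a single parameter vector; a sketch under the same assumptions as the primal version above (the helper name `fit_augmented` is hypothetical):

```python
import numpy as np
import random

def fit_augmented(X, y, eta=1.0, max_iter=5000):
    """X: [k, n]; y: [k,]; returns the augmented weight vector of shape [n + 1,]."""
    X_hat = np.hstack([X, np.ones([X.shape[0], 1])])  # x_hat = [x, 1]
    w = np.zeros(X_hat.shape[1])                      # b is now folded into w
    for _ in range(max_iter):
        i = random.randint(0, X_hat.shape[0] - 1)
        if y[i] * w.dot(X_hat[i]) <= 0:               # misclassified
            w += eta * y[i] * X_hat[i]                # w <- w + eta * x_hat_i * y_i
    return w
```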
Dual Form
This differs slightly from the treatment in *Statistical Learning Methods* (《统计学习方法》); here we derive the dual form for the augmented weight vector. Start from the update rule:

$$\mathbf{w} \leftarrow \mathbf{w} + \eta \cdot \hat{\mathbf{x}}_i \cdot y_i$$
Initializing $\mathbf{w} = \vec{0}$, observe that the final $\mathbf{w}$ depends only on the number of times $n_i$ that each pair $(\hat{\mathbf{x}}_i, y_i)$ was counted as misclassified. Thus $\mathbf{w}$ can be expressed as:
$$\mathbf{w} = \sum_{i=1}^{k} \eta \cdot n_i \cdot \hat{\mathbf{x}}_i \cdot y_i = \sum_{i=1}^{k} \alpha_i \cdot y_i \cdot \hat{\mathbf{x}}_i$$
where $k$ is the number of samples and $\alpha_i = \eta \cdot n_i$. The decision function then becomes:

$$f(\hat{\mathbf{x}}) = \text{sign}\left(\sum_{i=1}^{k} \alpha_i \cdot y_i \cdot \hat{\mathbf{x}}_i \cdot \hat{\mathbf{x}}^T\right)$$
A misclassification now satisfies:

$$\left(\sum_{i=1}^{k} \alpha_i \cdot y_i \cdot \hat{\mathbf{x}}_i \cdot \hat{\mathbf{x}}_j^T\right) \cdot y_j \leq 0$$
Here the $j$-th sample is at fault; as in the primal form, we simply increment $n_j$:

$$n_j \leftarrow n_j + 1, \quad \text{i.e.} \quad \alpha_j \leftarrow \alpha_j + \eta$$
Why use the dual form?
Training in the dual form only ever touches the samples through the inner products $\hat{\mathbf{x}}_i \cdot \hat{\mathbf{x}}_j$, so these can be precomputed once as a Gram matrix to speed up training:
```python
# [k, n + 1]
Extend_X = np.hstack([X, np.ones([X.shape[0], 1])])
# [k, k]
Gram = Extend_X.dot(Extend_X.T)
```
Below is the Python code for the dual form with the augmented weight vector:
```python
import numpy as np
import random


class Perceptron(object):
    def __init__(self, max_iter=5000, eta=1):
        self.eta = eta
        self.max_iter_ = max_iter
        self.w = 0

    def fit(self, X, y):
        """
        X: [k, n]
        y: [k, ]
        compute w: [n + 1, ]
        """
        # [1, k]
        self.alpha = np.zeros([1, X.shape[0]])
        n_iter_ = 0
        # [k, n + 1]
        Extend_X = np.hstack([X, np.ones([X.shape[0], 1])])
        # [k, k] Gram matrix, precomputed once
        self.Gram = Extend_X.dot(Extend_X.T)
        while n_iter_ < self.max_iter_:
            index = random.randint(0, y.shape[0] - 1)
            # \sum_i alpha_i * y_i * (x_hat_i . x_hat_index)
            pred = self.alpha.dot(np.multiply(y, self.Gram[index, :]))
            # misclassified when y_index * pred <= 0
            if y[index] * pred <= 0:
                # n_index <- n_index + 1, i.e. alpha_index <- alpha_index + eta
                self.alpha[0, index] += self.eta
            n_iter_ += 1
        # recover the augmented weight vector: w = \sum_i alpha_i * y_i * x_hat_i
        self.w = self.alpha.dot(np.multiply(y, Extend_X.T).T)

    def predict(self, X):
        # augment the inputs with a trailing 1 to match the training setup
        X = np.hstack([X, np.ones(X.shape[0]).reshape((-1, 1))])
        return np.array([1 if hit else -1 for hit in np.dot(X, self.w.T) > 0])
```
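A quick usage check on a toy linearly separable dataset (the four points below are made up for illustration; on separable data the learned classifier should reproduce the labels):

```python
if __name__ == "__main__":
    X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.5, 2.0]])
    y = np.array([1, 1, -1, -1])
    clf = Perceptron()
    clf.fit(X, y)
    print(clf.w)           # augmented weight vector, shape [1, n + 1]
    print(clf.predict(X))  # expected: [ 1  1 -1 -1]
```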