Algorithm
Input: a linearly separable dataset $T=\left\{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\right\}$, where $x_i\in\chi=\mathbf{R}^n$, $y_i\in Y=\left\{-1,+1\right\}$, $i=1,2,\cdots,N$; learning rate $\eta$ $(0<\eta\le 1)$;
Output: $\alpha, b$; the perceptron model
$$f(x)=\operatorname{sign}\left(\sum_{j=1}^N\alpha_j y_j x_j\cdot x+b\right)$$
where $\alpha=(\alpha_1,\alpha_2,\cdots,\alpha_N)^T$.
- Procedure:
(1) $\alpha\gets 0$, $b\gets 0$;
(2) Select a data point $(x_i,y_i)$ from the training set;
(3) If $y_i(w\cdot x_i+b)=y_i\left(\sum_{j=1}^N\alpha_j y_j x_j\cdot x_i+b\right)\le 0$,
then $\alpha_i\gets\alpha_i+\eta$, $b\gets b+\eta y_i$;
(4) Go to (2) until no data point is misclassified.
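The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the fixed index-order sweep over the training set, and the `max_epochs` safety cap are assumptions made for the sketch, since the procedure itself does not prescribe how the next point is selected.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_dual_perceptron(xs, ys, eta=1.0, max_epochs=1000):
    """Return (alpha, b) for a linearly separable training set.

    xs: list of feature vectors, ys: list of labels in {-1, +1}.
    max_epochs is an assumed safety cap, not part of the algorithm.
    """
    n = len(xs)
    # Precompute all pairwise inner products once.
    gram = [[dot(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n  # step (1): alpha <- 0
    b = 0.0            # step (1): b <- 0
    for _ in range(max_epochs):
        updated = False
        for i in range(n):  # step (2): visit the points in index order
            score = sum(alpha[j] * ys[j] * gram[j][i] for j in range(n)) + b
            if ys[i] * score <= 0:  # step (3): x_i is misclassified
                alpha[i] += eta
                b += eta * ys[i]
                updated = True
        if not updated:  # step (4): a full pass with no mistakes
            return alpha, b
    raise RuntimeError("no separating hyperplane found within max_epochs")
```

Only $\alpha_i$ and $b$ change during training; the training instances enter the computation solely through the precomputed inner products.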
- Note:
In the dual form, training instances appear only through inner products. For convenience, the inner products between all pairs of training instances can be computed in advance and stored as a matrix, the so-called Gram matrix
$$G=[x_i\cdot x_j]_{N\times N}=\begin{bmatrix} x_1\cdot x_1 & x_1\cdot x_2 & \cdots & x_1\cdot x_N \\ x_2\cdot x_1 & x_2\cdot x_2 & \cdots & x_2\cdot x_N \\ \vdots & \vdots & \ddots & \vdots \\ x_N\cdot x_1 & x_N\cdot x_2 & \cdots & x_N\cdot x_N \end{bmatrix}$$
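As a small illustration of the precomputation, the sketch below builds a Gram matrix with the standard library only. The helper name `gram_matrix` is an assumption of this sketch; the sample points are the three training instances of Example 2.2.

```python
def gram_matrix(xs):
    """Return G with G[i][j] = x_i . x_j for a list of vectors xs."""
    return [[sum(a * b for a, b in zip(u, v)) for v in xs] for u in xs]


# The three training points of Example 2.2.
points = [(3, 3), (4, 3), (1, 1)]
G = gram_matrix(points)
for row in G:
    print(row)
# prints:
# [18, 21, 6]
# [21, 25, 7]
# [6, 7, 2]
```

The matrix is symmetric, so in principle only the upper triangle needs to be computed; the sketch computes every entry for simplicity.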
Worked example
Example 2.2: The data are the same as in Example 2.1. The positive instances are $x_1=(3,3)^T$ and $x_2=(4,3)^T$, and the negative instance is $x_3=(1,1)^T$. Use the dual form of the perceptron learning algorithm to find the perceptron model.
- Solution
(1) Take $\alpha_i=0$, $i=1,2,3$, $b=0$, $\eta=1$;
(2) Compute the Gram matrix:
$$G=[x_i\cdot x_j]_{3\times 3}=\begin{bmatrix} x_1\cdot x_1 & x_1\cdot x_2 & x_1\cdot x_3 \\ x_2\cdot x_1 & x_2\cdot x_2 & x_2\cdot x_3 \\ x_3\cdot x_1 & x_3\cdot x_2 & x_3\cdot x_3 \end{bmatrix}=\begin{bmatrix} 3\cdot 3+3\cdot 3 & 3\cdot 4+3\cdot 3 & 3\cdot 1+3\cdot 1 \\ 4\cdot 3+3\cdot 3 & 4\cdot 4+3\cdot 3 & 4\cdot 1+3\cdot 1 \\ 1\cdot 3+1\cdot 3 & 1\cdot 4+1\cdot 3 & 1\cdot 1+1\cdot 1 \end{bmatrix}=\begin{bmatrix} 18 & 21 & 6 \\ 21 & 25 & 7 \\ 6 & 7 & 2 \end{bmatrix}$$
(3) Misclassification condition: $y_i(w\cdot x_i+b)=y_i\left(\sum_{j=1}^N\alpha_j y_j x_j\cdot x_i+b\right)\le 0$
Parameter update: $\alpha_i\gets\alpha_i+1$, $b\gets b+y_i$
(4) Iterate. The intermediate calculations are omitted; the results are listed in the table, where the second row gives the misclassified point selected at step $k$.
$k$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
Misclassified point | | $x_1$ | $x_3$ | $x_3$ | $x_3$ | $x_1$ | $x_3$ | $x_3$ |
$\alpha_1$ | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 2 |
$\alpha_2$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
$\alpha_3$ | 0 | 0 | 1 | 2 | 3 | 3 | 4 | 5 |
$b$ | 0 | 1 | 0 | -1 | -2 | -1 | -2 | -3 |
(5) $w=2x_1+0x_2-5x_3=(1,1)^T$, $b=-3$
Separating hyperplane: $x^{(1)}+x^{(2)}-3=0$
Perceptron model: $f(x)=\operatorname{sign}(x^{(1)}+x^{(2)}-3)$
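The iteration of Example 2.2 can be replayed with a short script. This is a sketch under one stated assumption: at every step the misclassified point with the lowest index is selected. The example does not state its selection rule, but this choice reproduces the sequence $x_1, x_3, x_3, x_3, x_1, x_3, x_3$ in the table. The variable names are illustrative.

```python
xs = [(3, 3), (4, 3), (1, 1)]
ys = [1, 1, -1]
n = len(xs)
gram = [[sum(a * b for a, b in zip(u, v)) for v in xs] for u in xs]

alpha, b = [0] * n, 0
picked = []  # 1-based indices of the points chosen at k = 1, 2, ...
while True:
    # Assumed selection rule: the lowest-index misclassified point.
    wrong = [
        i for i in range(n)
        if ys[i] * (sum(alpha[j] * ys[j] * gram[j][i] for j in range(n)) + b) <= 0
    ]
    if not wrong:
        break
    i = wrong[0]
    alpha[i] += 1  # eta = 1
    b += ys[i]
    picked.append(i + 1)
    print(len(picked), f"x_{i + 1}", alpha, b)

# Recover the primal weights: w = sum_i alpha_i * y_i * x_i.
w = [sum(alpha[i] * ys[i] * xs[i][d] for i in range(n)) for d in range(2)]
print("w =", w, "b =", b)  # w = [1, 1] b = -3
```

Each printed line corresponds to one column $k\ge 1$ of the table: the step number, the selected point, the current $\alpha$, and the current $b$.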
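- Note:
As with the primal form, the iteration of the dual form of the perceptron learning algorithm converges on linearly separable data, and multiple solutions exist: the result depends on the order in which misclassified points are selected.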
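The existence of multiple solutions can be checked on the data of Example 2.2. In the sketch below, the function `dual_perceptron` and its `order` parameter are hypothetical names introduced for illustration: after every update the scan restarts, and the first misclassified point in `order` is selected.

```python
def dual_perceptron(xs, ys, order):
    """Run the dual form with eta = 1, scanning points in the given order.

    `order` is a hypothetical parameter: after every update the scan
    restarts and the first misclassified point in `order` is chosen.
    """
    n = len(xs)
    gram = [[sum(a * b for a, b in zip(u, v)) for v in xs] for u in xs]
    alpha, b = [0] * n, 0
    while True:
        for i in order:
            score = sum(alpha[j] * ys[j] * gram[j][i] for j in range(n)) + b
            if ys[i] * score <= 0:
                alpha[i] += 1
                b += ys[i]
                break
        else:  # no misclassified point left: recover w and stop
            dim = len(xs[0])
            w = [sum(alpha[i] * ys[i] * xs[i][d] for i in range(n)) for d in range(dim)]
            return w, b


xs, ys = [(3, 3), (4, 3), (1, 1)], [1, 1, -1]
print(dual_perceptron(xs, ys, order=[0, 1, 2]))  # ([1, 1], -3)
print(dual_perceptron(xs, ys, order=[1, 2, 0]))  # ([1, 0], -2)
```

Both results separate the training data: the first is the hyperplane $x^{(1)}+x^{(2)}-3=0$ found above, the second is $x^{(1)}-2=0$.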