4-1 Summary of the mathematical derivation of Hard Margin SVM
We start from the primal problem:
$$\min\limits_{\omega}\frac{\|\omega\|^2}{2}$$

$$s.t.\quad y^{(i)}(\omega^Tx^{(i)}+b)\geqslant1,\quad i=1,2,\dots,m$$
and transform it into the dual problem:
$$\min\limits_{\alpha}\ \frac{1}{2}\sum\limits_{i=1}^m\sum\limits_{j=1}^m\alpha_i\alpha_jy_iy_jx^{(i)}\cdot x^{(j)}-\sum\limits_{i=1}^m\alpha_i$$

$$s.t.\quad \alpha_i\geqslant0,\quad i=1,2,\dots,m$$

$$\sum\limits_{i=1}^m\alpha_iy_i=0$$
Because the primal problem satisfies Slater's condition, strong duality holds between the primal and the dual, and their optimal solutions coincide. So from here on we can devote all our effort to solving the dual problem.
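For reference, here is a brief sketch of how this dual is obtained (this is just standard Lagrangian duality, nothing specific to this post). The Lagrangian of the primal is

$$L(\omega,b,\alpha)=\frac{\|\omega\|^2}{2}-\sum_{i=1}^m\alpha_i\left[y^{(i)}(\omega^Tx^{(i)}+b)-1\right],\quad \alpha_i\geqslant0.$$

Setting the derivatives with respect to $\omega$ and $b$ to zero gives

$$\omega=\sum_{i=1}^m\alpha_iy_ix^{(i)},\qquad \sum_{i=1}^m\alpha_iy_i=0,$$

and substituting these back into $L$ yields exactly the dual objective above (written here as a minimization).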
The SMO algorithm for solving the dual problem:
- Set initial values: $\alpha_i=0,\quad i=1,2,\dots,m$.
- Outer loop:
  (1) Pick two optimization variables according to the KKT conditions.
  Variable one: take the first multiplier that violates the KKT conditions as $\alpha_i$, where the conditions are
  $$\begin{cases} \alpha_i>0 & y_iu_i=1\\ \alpha_i=0 & y_iu_i\geqslant1 \end{cases}$$
  with $u_i=\sum\limits_{j=1}^my_j\alpha_jX_j\cdot X_i+b$.
  Variable two: among all multipliers that do not violate the KKT conditions, pick the $\alpha_j$ that maximizes $|E_i-E_j|$, where $E_i=u_i-y_i$.
  (2) Solve the two-variable subproblem (see the note after this list for where the box $[L,H]$ and the $\alpha_i$ update come from).
  The new value of $\alpha_j$ is
  $$\alpha_j^{new}=\begin{cases} L &\quad\text{if }\alpha_j^{best}<L\\ \alpha_j^{best} &\quad\text{if }L\leqslant\alpha_j^{best}\leqslant H\\ H &\quad\text{if }\alpha_j^{best}>H \end{cases}$$
  where $\alpha_j^{best}=\alpha_j^*+\frac{y_j(E_i-E_j)}{\eta}$, $E_i=u_i-y_i$, $\eta=K_{ii}+K_{jj}-2K_{ij}$, and
  $$\begin{cases} L=0,&H=\alpha_i+\alpha_j&\quad y_iy_j=1\\ L=\max\{0,\alpha_i-\alpha_j\},&H=+\infty&\quad y_iy_j=-1 \end{cases}$$
  The new value of $\alpha_i$:
  $$\alpha_i^{new}=\alpha_i^*+s(\alpha_j^*-\alpha_j^{new})$$
  Update $b$:
  $$b_1^{new}=b^*-E_1+y_1K_{11}(\alpha_1^*-\alpha_1^{new})+y_2K_{21}(\alpha_2^*-\alpha_2^{new})$$
  $$b_2^{new}=b^*-E_2+y_1K_{12}(\alpha_1^*-\alpha_1^{new})+y_2K_{22}(\alpha_2^*-\alpha_2^{new})$$
  and take the final value as
  $$b^{new}=\frac{b_1^{new}+b_2^{new}}{2}$$
  (I am still not entirely sure about this way of updating $b$; for now the code will follow it.)
  Update the values of $E_k$:
  $$E_k^{new}=\sum\limits_{i=1}^my_i\alpha_iK_{ik}+b^{new}-y_k$$
- Check for convergence: if converged, exit the loop; otherwise keep looping.
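Where does the update rule for $\alpha_i$ (and the clipping box $[L,H]$) come from? Keeping every multiplier except $\alpha_i$ and $\alpha_j$ fixed, the dual constraint $\sum_k\alpha_ky_k=0$ forces

$$y_i\alpha_i+y_j\alpha_j=y_i\alpha_i^*+y_j\alpha_j^*=\text{const},$$

so once $\alpha_j^{new}$ is known we must have $\alpha_i^{new}=\alpha_i^*+y_iy_j(\alpha_j^*-\alpha_j^{new})$, i.e. $s=y_iy_j$; and intersecting this line with the feasible region $\alpha_i\geqslant0,\ \alpha_j\geqslant0$ is what produces the interval $[L,H]$ that $\alpha_j$ is clipped into (in the soft-margin case the upper bound $C$ would also enter here).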
The overall idea is now clear, so next let's try to implement a simple Hard Margin SVM in code.
4-2 Implementing Hard Margin SVM in Python
4-2-1 Building the dataset
To make the results easy to verify, we build the dataset by hand.
```python
import numpy as np
import matplotlib.pyplot as plt

X = np.array([
    [-1, 6],
    [1, 5],
    [1, 7],
    [3, 3],
    [5, 4],
    [2, 0]])
y = np.array([1, 1, 1, -1, -1, -1])

plt.scatter(X[y==1, 0], X[y==1, 1], label='+', color='r')
plt.scatter(X[y==-1, 0], X[y==-1, 1], label='-', color='b')
plt.legend()
```
From the plot it is easy to see that the support vectors are (1,5) and (3,3).
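As a quick hand check (assuming those two points really are the only support vectors): by the stationarity condition $w=\sum_i\alpha_iy_ix^{(i)}$ together with $\sum_i\alpha_iy_i=0$, their two multipliers must be equal, say $\alpha$, so

$$w=\alpha\big[(1,5)-(3,3)\big]=\alpha(-2,2),$$

and forcing both support vectors onto the margin, $y(w^Tx+b)=1$, gives $\alpha=\tfrac14$, i.e. $w=(-0.5,\ 0.5)$ and $b=-1$. This is the answer our implementation should reproduce.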
4-2-2 Initialization
The quantities we need to initialize:

- $\alpha=0$
- $b=0$
- the values of $E$
- the value of $m$
```python
def calE(i):
    # E_i = u_i - y_i
    return u(i) - y[i]

def u(i):
    # u_i = sum_j alpha_j * y_j * (x_j . x_i) + b
    return np.sum(X.dot(X[i]) * alpha * y) + b
```
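A small sanity check of these helpers (hypothetical: `alpha` and `b` are set here to the same initial values that section 4-2-5 will use):

```python
alpha = np.zeros(len(X))   # all multipliers start at 0
b = 0
print([u(i) for i in range(len(X))])     # all 0: with alpha = 0 and b = 0, u_i = 0
print([calE(i) for i in range(len(X))])  # E_i = u_i - y_i = -y_i, i.e. -1 for positives, 1 for negatives
```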
4-2-3 Set up the loop and find the first $\alpha$ that violates the KKT conditions
$$\begin{cases} \alpha_i>0 & y_iu_i=1\\ \alpha_i=0 & y_iu_i\geqslant1 \end{cases}$$
```python
def KKT(i):
    # returns True if sample i satisfies the KKT conditions
    yu = y[i] * u(i)
    if alpha[i] == 0:
        return yu >= 1
    else:
        return yu == 1

def search_alpha():
    index_list = np.where(alpha > 0)
    non_satisfy = np.where(alpha == 0)
    index_list = np.append([index_list], [non_satisfy])  # so that the alpha > 0 entries are searched first
    for i in index_list:
        if KKT(i):
            continue
        # second variable: the one that maximizes |E_i - E_j|
        E_temp = np.abs(E - E[i])
        j = np.argsort(E_temp)[-1]  # take -1 because argsort sorts in ascending order
        return i, j
```
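For example, in the freshly initialized state from the sanity check above (all $\alpha=0$, $b=0$, so $E=-y$), every sample violates its KKT condition, and the search picks the pair below:

```python
# Hypothetical state: alpha = zeros, b = 0
E = np.array([calE(i) for i in range(len(X))])   # [-1, -1, -1, 1, 1, 1]
i, j = search_alpha()
# i == 0 (the first violator); j is one of the negative samples 3-5, since they all
# tie on |E_0 - E_j| = 2 (which one wins depends on argsort's tie-breaking).
```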
4-2-4 Solving the subproblem
Determine the bounds on $\alpha_j$:
$$\begin{cases} L=0,&H=\alpha_i+\alpha_j&\quad y_iy_j=1\\ L=\max\{0,\alpha_i-\alpha_j\},&H=+\infty&\quad y_iy_j=-1 \end{cases}$$
```python
def HL(i, j):
    if y[i] * y[j] == 1:
        L = 0
        H = alpha[i] + alpha[j]
    else:
        L = np.max([0, alpha[i] - alpha[j]])
        H = 2147483647  # a large number standing in for +inf (int32 max)
    return L, H
```
Determine the unconstrained optimum of $\alpha_j$:
$$\alpha_j^{best}=\alpha_j^*+\frac{y_j(E_i-E_j)}{\eta}$$
where $E_i=u_i-y_i$ and $\eta=K_{ii}+K_{jj}-2K_{ij}$.
```python
def K(i, j):
    # linear kernel: K_ij = x_i . x_j
    return X[i].dot(X[j])

def alpha_unclip(i, j):
    # eta = K_ii + K_jj - 2*K_ij = ||x_i - x_j||^2, so it should never be negative here
    eta = K(i, i) + K(j, j) - 2 * K(i, j)
    if eta < 0:
        print(eta < 0)
    return alpha[j] + y[j] * (E[i] - E[j]) / eta
```
Determine the values of $\alpha_j^{new}$ and $\alpha_i^{new}$:
$$\alpha_j^{new}=\begin{cases} L &\quad\text{if }\alpha_j^{best}<L\\ \alpha_j^{best} &\quad\text{if }L\leqslant\alpha_j^{best}\leqslant H\\ H &\quad\text{if }\alpha_j^{best}>H \end{cases}$$
$$\alpha_i^{new}=\alpha_i^*+s(\alpha_j^*-\alpha_j^{new}),\qquad s=y_iy_j$$
```python
def alpha_new(i, j):
    L, H = HL(i, j)
    unclip = alpha_unclip(i, j)
    # clip alpha_j into [L, H]
    if unclip < L:
        alpha_new_j = L
    elif L <= unclip <= H:
        alpha_new_j = unclip
    else:
        alpha_new_j = H
    # alpha_i moves along the constraint line, with s = y_i * y_j
    alpha_new_i = alpha[i] + y[i] * y[j] * (alpha[j] - alpha_new_j)
    return alpha_new_i, alpha_new_j
```
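To see the numbers once, take the initialized state again and (purely for illustration) the pair $i=0$, $j=3$:

```python
# Hypothetical first step with alpha = 0, b = 0, E = -y, i = 0, j = 3:
#   eta       = K(0,0) + K(3,3) - 2*K(0,3) = 37 + 18 - 30 = 25
#   unclipped = alpha[3] + y[3]*(E[0]-E[3])/eta = 0 + (-1)*(-2)/25 = 0.08
#   box       = HL(0, 3) = (0, 2147483647), so no clipping is needed
print(alpha_new(0, 3))   # both new multipliers come out as 0.08
```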
Update $b$:
$$b_1^{new}=b^*-E_1+y_1K_{11}(\alpha_1^*-\alpha_1^{new})+y_2K_{21}(\alpha_2^*-\alpha_2^{new})$$
$$b_2^{new}=b^*-E_2+y_1K_{12}(\alpha_1^*-\alpha_1^{new})+y_2K_{22}(\alpha_2^*-\alpha_2^{new})$$
and the final value of $b^{new}$ is
$$b^{new}=\frac{b_1^{new}+b_2^{new}}{2}$$
I am still not entirely sure about this way of updating $b$, so for now the code simply implements it as above.
```python
def find_b_new(i, j, alpha_new_i, alpha_new_j):
    b1_new = b - E[i] + y[i]*K(i, i)*(alpha[i]-alpha_new_i) + y[j]*K(j, i)*(alpha[j]-alpha_new_j)
    b2_new = b - E[j] + y[i]*K(i, j)*(alpha[i]-alpha_new_i) + y[j]*K(j, j)*(alpha[j]-alpha_new_j)
    # prefer the b computed from a sample whose new multiplier is positive (a support vector)
    if alpha_new_i > 0:
        return b1_new
    elif alpha_new_j > 0:
        return b2_new
    else:
        return (b1_new + b2_new) / 2
```
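Continuing the illustrative first step from above ($i=0$, $j=3$, both new multipliers equal to 0.08):

```python
# b1_new = 0 - (-1) + 1*37*(0-0.08) + (-1)*15*(0-0.08) = 1 - 2.96 + 1.2  = -0.76
# b2_new = 0 - 1    + 1*15*(0-0.08) + (-1)*18*(0-0.08) = -1 - 1.2 + 1.44 = -0.76
# alpha_new_i > 0, so b1_new is returned
print(find_b_new(0, 3, 0.08, 0.08))   # -0.76 (up to floating-point rounding)
```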
Update the values of $E_k$:
$$E_k^{new}=\sum\limits_{i=1}^my_i\alpha_iK_{ik}+b^{new}-y_k$$
```python
def E_update(i, j):
    # refresh the cached errors for the two samples that were just updated
    E[i] = calE(i)
    E[j] = calE(j)
```
4-2-5 Putting the full training routine together

- Initialize
- Keep looping over the $\alpha$ that violate the KKT conditions, and update them
```python
def ifstop():
    # converged once every sample satisfies the KKT conditions
    for i in range(len(X)):
        if not KKT(i):
            return False
    return True

# initialization
m = len(X)
b = 0
alpha = np.zeros(m)
E = np.array([calE(i) for i in range(m)])

def fit(X, y):
    for k in range(10000):
        i, j = search_alpha()
        alpha_new_i, alpha_new_j = alpha_new(i, j)
        b_new = find_b_new(i, j, alpha_new_i, alpha_new_j)
        alpha[i] = alpha_new_i
        alpha[j] = alpha_new_j
        E_update(i, j)
        b = b_new   # note: this b is local to fit; the module-level b read by u() and find_b_new() is not updated
    # w = sum_i alpha_i * y_i * x_i
    w = np.sum(X*np.tile(y.reshape(-1, 1), (1, X.shape[1]))*np.tile(alpha.reshape(-1, 1), (1, X.shape[1])), axis=0)
    return w, b
```
```python
w, b = fit(X, y)
w
# array([-0.48169935,  0.73137255])
b
# -2.175163398692811
```
```python
x_plot = np.linspace(-1, 5, 1000)
y_plot = (-b - w[0]*x_plot) / w[1]   # the learned decision boundary w.x + b = 0
plt.scatter(X[y==1, 0], X[y==1, 1])
plt.scatter(X[y==-1, 0], X[y==-1, 1])
plt.plot(x_plot, y_plot)
```
At first glance it looks acceptable, but once we compare it with sklearn's SVM...
```python
from sklearn.svm import LinearSVC

clf = LinearSVC()
clf.fit(X, y)
# LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
#           intercept_scaling=1, loss='squared_hinge', max_iter=1000,
#           multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
#           verbose=0)
```
```python
x_plot = np.linspace(-1, 5, 1000)
y_plot = (-b - w[0]*x_plot) / w[1]                                      # our boundary
y_plot2 = (-clf.intercept_ - clf.coef_[0, 0]*x_plot) / clf.coef_[0, 1]  # sklearn's boundary
plt.scatter(X[y==1, 0], X[y==1, 1])
plt.scatter(X[y==-1, 0], X[y==-1, 1])
plt.plot(x_plot, y_plot)
plt.plot(x_plot, y_plot2)
```
...it becomes clear that something is definitely wrong.
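One side note on the reference used here: LinearSVC is itself a soft-margin model (squared hinge loss, default C=1.0), so it is not an exact hard-margin baseline either. A closer reference, if desired, is SVC with a linear kernel and a very large C; a minimal sketch (the variable name `ref` is mine):

```python
from sklearn.svm import SVC

# A very large C makes the soft-margin SVC behave (almost) like a hard-margin SVM.
ref = SVC(kernel='linear', C=1e6)
ref.fit(X, y)
ref.coef_, ref.intercept_   # should come out close to the hand-computed w = (-0.5, 0.5), b = -1
```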
4-3 Summary
Although on the surface I followed the path of primal problem -> dual problem -> solving the dual and finished the coding, something is clearly wrong somewhere. I have decided to re-derive the version with slack variables and a penalty factor; after that, things will probably become clearer.