# I. Deriving the SVM dual form that SMO solves

$$\begin{align}
\min_{w,b,\xi}\quad & \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{N}\xi_{i} \tag{1}\\
s.t.\quad & y_{i}\left(w\cdot x_{i}+b\right)\ge 1-\xi_{i},\quad i=1,2,\cdots,N \tag{2}\\
& \xi_{i}\ge 0,\quad i=1,2,\cdots,N \tag{3}
\end{align}$$

## 1. Construct the Lagrange function

$$\begin{align}
L\left(w,b,\xi,\alpha,\mu\right) = \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{N}\xi_{i}+\sum_{i=1}^{N}\alpha_{i}\left(-y_{i}\left(w\cdot x_{i}+b\right)+1-\xi_{i}\right)-\sum_{i=1}^{N}\mu_{i}\xi_{i} \tag{4}
\end{align}$$

## 2. Convert to the Lagrange dual problem

$$\begin{align}
\max_{\alpha,\mu}\ \min_{w,b,\xi}\ L\left(w,b,\xi,\alpha,\mu\right) \tag{5}
\end{align}$$

## 3. Solve the inner min first

$$\begin{align}
\nabla_{w}L\left(w,b,\xi,\alpha,\mu\right) &= w-\sum_{i=1}^{N}\alpha_{i}y_{i}x_{i}=0 \tag{6}\\
\nabla_{b}L\left(w,b,\xi,\alpha,\mu\right) &= -\sum_{i=1}^{N}\alpha_{i}y_{i}=0 \tag{7}\\
\nabla_{\xi_{i}}L\left(w,b,\xi,\alpha,\mu\right) &= C-\alpha_{i}-\mu_{i}=0 \tag{8}
\end{align}$$

which gives:

$$\begin{align}
& w=\sum_{i=1}^{N}\alpha_{i}y_{i}x_{i} \tag{9}\\
& \sum_{i=1}^{N}\alpha_{i}y_{i}=0 \tag{10}\\
& C-\alpha_{i}-\mu_{i}=0 \tag{11}
\end{align}$$

Substituting (9)-(11) back into (4), and writing the inner product $x_i\cdot x_j$ as a kernel $K\left(x_i,x_j\right)$:

$$\begin{align}
\min_{w,b,\xi}L\left(w,b,\xi,\alpha,\mu\right)=-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}y_{i}y_{j}K\left(x_{i},x_{j}\right)+\sum_{i=1}^{N}\alpha_{i} \tag{12}
\end{align}$$
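The step from (9)-(11) to (12) is not spelled out above. A short sketch of it, written with the plain inner product $x_i\cdot x_j$ (which (12) then denotes by $K\left(x_i,x_j\right)$), could look like this. First regroup (4), then use (10) and (11) to drop the $b$ and $\xi_i$ terms, and finally substitute (9) for $w$:

$$\begin{aligned}
L &= \frac{1}{2}\|w\|^{2}-\sum_{i=1}^{N}\alpha_{i}y_{i}\left(w\cdot x_{i}\right)-b\sum_{i=1}^{N}\alpha_{i}y_{i}+\sum_{i=1}^{N}\alpha_{i}+\sum_{i=1}^{N}\left(C-\alpha_{i}-\mu_{i}\right)\xi_{i}\\
&= \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}y_{i}y_{j}\left(x_{i}\cdot x_{j}\right)-\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}y_{i}y_{j}\left(x_{i}\cdot x_{j}\right)+\sum_{i=1}^{N}\alpha_{i}\\
&= -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}y_{i}y_{j}\left(x_{i}\cdot x_{j}\right)+\sum_{i=1}^{N}\alpha_{i}
\end{aligned}$$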

## 4. Solve the outer max

$$\begin{align}
\max_{\alpha,\mu}\quad & -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}y_{i}y_{j}K\left(x_{i},x_{j}\right)+\sum_{i=1}^{N}\alpha_{i} \tag{13}\\
s.t.\quad & \sum_{i=1}^{N}\alpha_{i}y_{i}=0 \tag{14}\\
& C-\alpha_{i}-\mu_{i}=0 \tag{15}\\
& \alpha_{i}\ge 0 \tag{16}\\
& \mu_{i}\ge 0 \tag{17}
\end{align}$$

Using (15) to eliminate $\mu_i$, constraints (16) and (17) combine into:

$$\begin{align}
0\le\alpha_{i}\le C \tag{18}
\end{align}$$

## 5. Final form: the dual of a convex quadratic program

$$\begin{align}
\min_{\alpha}\quad & \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}y_{i}y_{j}K\left(x_{i},x_{j}\right)-\sum_{i=1}^{N}\alpha_{i} \tag{19}\\
s.t.\quad & \sum_{i=1}^{N}\alpha_{i}y_{i}=0 \tag{20}\\
& 0\le\alpha_{i}\le C,\quad i=1,2,\cdots,N \tag{21}
\end{align}$$
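To make (19)-(21) concrete, here is a minimal Python sketch that evaluates the dual objective and checks feasibility for a given $\alpha$. The function names (`linear_kernel`, `dual_objective`, `is_feasible`) and the tolerance `tol` are illustrative assumptions, not part of the derivation, and the linear kernel is only one possible choice of $K$.

```python
def linear_kernel(u, v):
    """Plain dot product, standing in for K(x_i, x_j) (an assumed choice)."""
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, X, y, kernel=linear_kernel):
    """Objective (19): 0.5 * sum_ij a_i a_j y_i y_j K_ij - sum_i a_i."""
    n = len(alpha)
    quad = 0.0
    for i in range(n):
        for j in range(n):
            quad += alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
    return 0.5 * quad - sum(alpha)


def is_feasible(alpha, y, C, tol=1e-9):
    """Constraints (20) and (21), checked up to a small tolerance."""
    balanced = abs(sum(a * t for a, t in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alpha)
    return balanced and boxed
```

For two points $x_1=(1,0)$, $x_2=(-1,0)$ with labels $+1$ and $-1$, calling `dual_objective([0.5, 0.5], X, y)` evaluates (19) at $\alpha=(0.5,0.5)$, which satisfies both constraints when $C=1$.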

The optimal solution satisfies the KKT conditions:

$$\begin{align}
\nabla_{w}L\left(w^{\ast},b^{\ast},\xi^{\ast},\alpha^{\ast},\mu^{\ast}\right) &= w^{\ast}-\sum_{i=1}^{N}\alpha_{i}^{\ast}y_{i}x_{i}=0 \tag{22}\\
\nabla_{b}L\left(w^{\ast},b^{\ast},\xi^{\ast},\alpha^{\ast},\mu^{\ast}\right) &= -\sum_{i=1}^{N}\alpha_{i}^{\ast}y_{i}=0 \tag{23}\\
\nabla_{\xi_{i}}L\left(w^{\ast},b^{\ast},\xi^{\ast},\alpha^{\ast},\mu^{\ast}\right) &= C-\alpha_{i}^{\ast}-\mu_{i}^{\ast}=0 \tag{24}\\
\alpha_{i}^{\ast} &\ge 0 \tag{25}\\
1-\xi_{i}^{\ast}-y_{i}\left(w^{\ast}\cdot x_{i}+b^{\ast}\right) &\le 0 \tag{26}\\
\alpha_{i}^{\ast}\left(1-\xi_{i}^{\ast}-y_{i}\left(w^{\ast}\cdot x_{i}+b^{\ast}\right)\right) &= 0 \tag{27}\\
\mu_{i}^{\ast} &\ge 0 \tag{28}\\
\xi_{i}^{\ast} &\ge 0 \tag{29}\\
\mu_{i}^{\ast}\xi_{i}^{\ast} &= 0 \tag{30}
\end{align}$$

# II. The main topic: a complete derivation of the SMO algorithm

SMO first picks two components of $\alpha$ and solves the resulting subproblem of (19). The idea is to approach the solution of the original problem by repeatedly solving such subproblems. How exactly the two components are chosen is left for later.

## 1. Pick two components of $\alpha$ and solve the subproblem of (19) to obtain their update rule

$$\begin{align}
\min_{\alpha_{1},\alpha_{2}}\quad W\left(\alpha_{1},\alpha_{2}\right) ={}& \frac{1}{2}\alpha_{1}^{2}K_{11}+\frac{1}{2}\alpha_{2}^{2}K_{22}+\alpha_{1}\alpha_{2}y_{1}y_{2}K_{12} \notag\\
& +\alpha_{1}y_{1}\sum_{i=3}^{N}\alpha_{i}y_{i}K_{1i}+\alpha_{2}y_{2}\sum_{i=3}^{N}\alpha_{i}y_{i}K_{2i}-\alpha_{1}-\alpha_{2} \tag{31}\\
s.t.\quad & \alpha_{1}y_{1}+\alpha_{2}y_{2}=-\sum_{i=3}^{N}y_{i}\alpha_{i}=\varsigma \tag{32}\\
& 0\le\alpha_{i}\le C,\quad i=1,2 \tag{33}
\end{align}$$

### 1.1 Substitution

Since $y_1^2=1$, (32) gives $\alpha_1=\left(\varsigma-\alpha_2y_2\right)y_1$. Substituting this into (31):

$$\begin{align}
\min_{\alpha_{2}}W\left(\alpha_{2}\right) ={}& \frac{1}{2}\left(\varsigma-\alpha_{2}y_{2}\right)^{2}K_{11}+\frac{1}{2}\alpha_{2}^{2}K_{22}+\left(\varsigma-\alpha_{2}y_{2}\right)\alpha_{2}y_{2}K_{12} \notag\\
& +\left(\varsigma-\alpha_{2}y_{2}\right)\sum_{i=3}^{N}\alpha_{i}y_{i}K_{1i}+\alpha_{2}y_{2}\sum_{i=3}^{N}\alpha_{i}y_{i}K_{2i}-\left(\varsigma-\alpha_{2}y_{2}\right)y_{1}-\alpha_{2} \tag{34}
\end{align}$$

### 1.2 Finding the stationary point

Differentiating (34) with respect to $\alpha_2$ and using $y_2^2=1$:

$$\begin{align}
\frac{\partial W}{\partial\alpha_{2}} &= -\varsigma y_{2}K_{11}+K_{11}\alpha_{2}+K_{22}\alpha_{2}+\varsigma y_{2}K_{12}-2K_{12}\alpha_{2}-y_{2}\sum_{i=3}^{N}\alpha_{i}y_{i}K_{1i}+y_{2}\sum_{i=3}^{N}\alpha_{i}y_{i}K_{2i}+y_{1}y_{2}-1 \tag{35}\\
&= \left(K_{11}+K_{22}-2K_{12}\right)\alpha_{2}+y_{2}\left(-\varsigma K_{11}+\varsigma K_{12}-\sum_{i=3}^{N}\alpha_{i}y_{i}K_{1i}+\sum_{i=3}^{N}\alpha_{i}y_{i}K_{2i}+y_{1}-y_{2}\right) \tag{36}
\end{align}$$

1. The model's prediction for $x$:
$$\begin{align}
g\left(x\right)=\sum_{i=1}^{N}\alpha_{i}y_{i}K\left(x_{i},x\right)+b \tag{37}
\end{align}$$

2. The prediction minus the true label:
$$\begin{align}
E_{i}=g\left(x_{i}\right)-y_{i}=\left(\sum_{j=1}^{N}\alpha_{j}y_{j}K\left(x_{j},x_{i}\right)+b\right)-y_{i},\quad i=1,2 \tag{38}
\end{align}$$

3. The awkward lump in (36):
$$\begin{align}
v_{i}=\sum_{j=3}^{N}\alpha_{j}y_{j}K_{ij} &= g\left(x_{i}\right)-\sum_{j=1}^{2}\alpha_{j}y_{j}K_{ij}-b,\quad i=1,2 \tag{39}\\
&= E_{i}+y_{i}-\sum_{j=1}^{2}\alpha_{j}y_{j}K_{ij}-b,\quad i=1,2 \tag{40}
\end{align}$$

Setting (36) to zero and substituting $\varsigma=\alpha_1y_1+\alpha_2y_2$ from (32) together with (39)-(40), where every $\alpha$ on the right-hand side is the current (old) value:

$$\begin{align}
\left(K_{11}+K_{22}-2K_{12}\right)\alpha_{2}^{new,unc} &= y_{2}\left(\varsigma K_{11}-\varsigma K_{12}+\sum_{i=3}^{N}\alpha_{i}y_{i}K_{1i}-\sum_{i=3}^{N}\alpha_{i}y_{i}K_{2i}-y_{1}+y_{2}\right) \tag{41}\\
&= y_{2}\left(\sum_{i=1}^{2}\alpha_{i}y_{i}K_{11}-\sum_{i=1}^{2}\alpha_{i}y_{i}K_{12}+v_{1}-v_{2}-y_{1}+y_{2}\right) \tag{42}\\
&= y_{2}\left(\sum_{i=1}^{2}\alpha_{i}y_{i}K_{11}-\sum_{i=1}^{2}\alpha_{i}y_{i}K_{12}+E_{1}+y_{1}-\sum_{i=1}^{2}\alpha_{i}y_{i}K_{1i}-b-E_{2}-y_{2}+\sum_{i=1}^{2}\alpha_{i}y_{i}K_{2i}+b-y_{1}+y_{2}\right) \tag{43}\\
&= y_{2}\left(E_{1}-E_{2}+\alpha_{2}y_{2}K_{11}-2\alpha_{2}y_{2}K_{12}+\alpha_{2}y_{2}K_{22}\right) \tag{44}\\
&= \left(K_{11}-2K_{12}+K_{22}\right)\alpha_{2}+y_{2}\left(E_{1}-E_{2}\right) \tag{45}
\end{align}$$

### 1.3 The update rule for $\alpha_2^{new,unc}$

Dividing (45) by $K_{11}-2K_{12}+K_{22}$, with the $\alpha_2$ on the right-hand side being $\alpha_2^{old}$:

$$\begin{align}
\alpha_{2}^{new,unc}=\alpha_{2}^{old}+\frac{y_{2}\left(E_{1}-E_{2}\right)}{K_{11}-2K_{12}+K_{22}} \tag{46}
\end{align}$$
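As an illustration of (37), (38) and (46), the following Python sketch computes the unclipped update from a precomputed kernel matrix. The names `decision_value`, `unclipped_alpha2` and `eta` are hypothetical, and returning `None` when the denominator is not positive is an assumption added here to avoid dividing by zero; the derivation above does not cover that case.

```python
def decision_value(i, alpha, y, K, b):
    """g(x_i) from (37), using a precomputed symmetric kernel matrix K."""
    return sum(alpha[j] * y[j] * K[j][i] for j in range(len(alpha))) + b


def unclipped_alpha2(i1, i2, alpha, y, K, b):
    """alpha_2^{new,unc} from (46) for the chosen pair of indices (i1, i2).

    Returns None when K11 - 2*K12 + K22 <= 0 (a guard assumed for this
    sketch, not part of the derivation).
    """
    E1 = decision_value(i1, alpha, y, K, b) - y[i1]  # (38) with i = 1
    E2 = decision_value(i2, alpha, y, K, b) - y[i2]  # (38) with i = 2
    eta = K[i1][i1] - 2.0 * K[i1][i2] + K[i2][i2]    # denominator of (46)
    if eta <= 0.0:
        return None
    return alpha[i2] + y[i2] * (E1 - E2) / eta
```

With two points whose kernel matrix is `[[1, -1], [-1, 1]]`, labels `[1, -1]`, and all multipliers and `b` starting at zero, one gets $E_1=-1$, $E_2=1$ and a denominator of $4$, so the unclipped value of $\alpha_2$ is $0.5$.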

### 1.4 Clip to the feasible range of $\alpha_2$ to obtain $\alpha_2^{new}$

• $y_1=y_2$
By (32), let $\alpha_1+\alpha_2=k$.
By (33), we get:
$$\begin{cases}0\le\alpha_{2}\le C\\ 0\le k-\alpha_{2}\le C\end{cases}\Rightarrow\begin{cases}0\le\alpha_{2}\le C\\ k-C\le\alpha_{2}\le k\end{cases}\Rightarrow\begin{cases}0\le\alpha_{2}\le C\\ \alpha_{1}^{old}+\alpha_{2}^{old}-C\le\alpha_{2}\le\alpha_{1}^{old}+\alpha_{2}^{old}\end{cases}$$

Let $H$ be the upper bound and $L$ the lower bound of $\alpha_2$. Then:
$$\begin{align}
L &= \max\left(0,\alpha_{1}^{old}+\alpha_{2}^{old}-C\right) \tag{47}\\
H &= \min\left(C,\alpha_{1}^{old}+\alpha_{2}^{old}\right) \tag{48}
\end{align}$$
• $y_1\ne y_2$
By (32), let $\alpha_1-\alpha_2=k$.
By (33), we get:
$$\begin{cases}0\le\alpha_{2}\le C\\ 0\le\alpha_{2}+k\le C\end{cases}\Rightarrow\begin{cases}0\le\alpha_{2}\le C\\ -k\le\alpha_{2}\le C-k\end{cases}\Rightarrow\begin{cases}0\le\alpha_{2}\le C\\ \alpha_{2}^{old}-\alpha_{1}^{old}\le\alpha_{2}\le C+\alpha_{2}^{old}-\alpha_{1}^{old}\end{cases}$$

This gives the new bounds:
$$\begin{align}
L &= \max\left(0,\alpha_{2}^{old}-\alpha_{1}^{old}\right) \tag{49}\\
H &= \min\left(C,C+\alpha_{2}^{old}-\alpha_{1}^{old}\right) \tag{50}
\end{align}$$

$$\alpha_{2}^{new}=\begin{cases}H, & \alpha_{2}^{new,unc}>H\\ \alpha_{2}^{new,unc}, & L\le\alpha_{2}^{new,unc}\le H\\ L, & \alpha_{2}^{new,unc}<L\end{cases}$$
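The two cases for the bounds and the final clipping step can be sketched in a few lines of Python. The function names `clip_bounds` and `clip` are illustrative only; `low` and `high` play the roles of $L$ and $H$.

```python
def clip_bounds(alpha1_old, alpha2_old, y1, y2, C):
    """Return (L, H) for alpha_2 according to (47)-(50)."""
    if y1 == y2:
        low = max(0.0, alpha1_old + alpha2_old - C)   # (47)
        high = min(C, alpha1_old + alpha2_old)        # (48)
    else:
        low = max(0.0, alpha2_old - alpha1_old)       # (49)
        high = min(C, C + alpha2_old - alpha1_old)    # (50)
    return low, high


def clip(value, low, high):
    """Project the unclipped alpha_2 onto the interval [L, H]."""
    return max(low, min(high, value))
```

A typical use would be `low, high = clip_bounds(a1, a2, y1, y2, C)` followed by `a2_new = clip(a2_unc, low, high)`, where `a2_unc` is the value given by (46).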

### 1.5 Obtain $\alpha_1^{new}$ from the equality constraint

$$\begin{align}
\alpha_{1}^{old}y_{1}+\alpha_{2}^{old}y_{2}=\alpha_{1}^{new}y_{1}+\alpha_{2}^{new}y_{2} \tag{51}
\end{align}$$

$$\begin{align}
\alpha_{1}^{new} &= \left(\alpha_{1}^{old}y_{1}+\alpha_{2}^{old}y_{2}-\alpha_{2}^{new}y_{2}\right)y_{1} \tag{52}\\
&= \alpha_{1}^{old}+\left(\alpha_{2}^{old}-\alpha_{2}^{new}\right)y_{1}y_{2} \tag{53}
\end{align}$$
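A small Python sketch of (53), together with a helper that evaluates the left- or right-hand side of (51), shows that the update leaves the equality constraint unchanged. Both function names are assumptions made for this example.

```python
def update_alpha1(alpha1_old, alpha2_old, alpha2_new, y1, y2):
    """alpha_1^{new} from (53)."""
    return alpha1_old + y1 * y2 * (alpha2_old - alpha2_new)


def pair_sum(alpha1, alpha2, y1, y2):
    """alpha_1*y_1 + alpha_2*y_2, the quantity preserved by (51)."""
    return alpha1 * y1 + alpha2 * y2
```

For instance, with $y_1=1$, $y_2=-1$, $\alpha_1^{old}=0.2$, $\alpha_2^{old}=0.5$ and $\alpha_2^{new}=0.8$, the update moves $\alpha_1$ up by the same $0.3$ that $\alpha_2$ gained, so `pair_sum` is the same before and after.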

## 2. Use the updates of these two components of $\alpha$ to update the other variables

### 2.1 Computing the threshold $b$

For a point $x_i$ on the margin boundary ($0<\alpha_i<C$) we have $y_ig\left(x_i\right)=1$, hence:

$$\begin{align}
b=y_{i}-\sum_{j=1}^{N}\alpha_{j}y_{j}K_{ij} \tag{54}
\end{align}$$

Applied to $i=1$ with the updated multipliers:

$$\begin{align}
b_{1}^{new}=y_{1}-\sum_{i=3}^{N}\alpha_{i}y_{i}K_{1i}-\alpha_{1}^{new}y_{1}K_{11}-\alpha_{2}^{new}y_{2}K_{12} \tag{55}
\end{align}$$

By the definition of $E_1$:

$$\begin{align}
E_{1}=g\left(x_{1}\right)-y_{1}=\sum_{i=3}^{N}\alpha_{i}y_{i}K_{1i}+\alpha_{1}^{old}y_{1}K_{11}+\alpha_{2}^{old}y_{2}K_{21}+b^{old}-y_{1} \tag{56}
\end{align}$$

Using (56) to eliminate $y_1-\sum_{i=3}^{N}\alpha_iy_iK_{1i}$ from (55):

$$\begin{align}
b_{1}^{new}=-E_{1}+y_{1}K_{11}\left(\alpha_{1}^{old}-\alpha_{1}^{new}\right)+y_{2}K_{12}\left(\alpha_{2}^{old}-\alpha_{2}^{new}\right)+b^{old} \tag{57}
\end{align}$$

Similarly:

$$\begin{align}
b_{2}^{new}=-E_{2}+y_{1}K_{12}\left(\alpha_{1}^{old}-\alpha_{1}^{new}\right)+y_{2}K_{22}\left(\alpha_{2}^{old}-\alpha_{2}^{new}\right)+b^{old} \tag{58}
\end{align}$$

• $0<\alpha_1<C$, $0<\alpha_2<C$:
$b^{new}=b_1^{new}=b_2^{new}$ (in this case both $x_1$ and $x_2$ lie on the margin boundary)

• If only one $\alpha_i$ satisfies $0<\alpha_i<C$:
$b^{new}=b_i^{new}$

• $\alpha_1,\alpha_2\in\{0,C\}$:
$b^{new}=\frac{1}{2}\left(b_1^{new}+b_2^{new}\right)$ ($\alpha_i=0$ means $x_i$ is not a support vector: $y_ig\left(x_i\right)\ge 1$, so $x_i$ lies on the correctly classified side of the margin. $\alpha_i=C$ means $y_ig\left(x_i\right)\le 1$. Both follow from the KKT conditions (22)-(30) and are derived below.)
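The three cases can be folded into one function. The sketch below computes $b_1^{new}$ and $b_2^{new}$ from (57) and (58) and then picks the threshold. The name `new_threshold`, the argument order, and the tolerance `tol` used to decide whether a multiplier is strictly inside $(0,C)$ are all assumptions of this example. When both multipliers are strictly inside the box, the text above says $b_1^{new}=b_2^{new}$, so returning $b_1^{new}$ first covers that case as well.

```python
def new_threshold(E1, E2, y1, y2, K11, K12, K22,
                  a1_old, a1_new, a2_old, a2_new, b_old, C, tol=1e-12):
    """Choose b^{new} from b_1^{new} (57) and b_2^{new} (58)."""
    b1 = (-E1 + y1 * K11 * (a1_old - a1_new)
          + y2 * K12 * (a2_old - a2_new) + b_old)   # (57)
    b2 = (-E2 + y1 * K12 * (a1_old - a1_new)
          + y2 * K22 * (a2_old - a2_new) + b_old)   # (58)
    if tol < a1_new < C - tol:      # x_1 on the margin boundary
        return b1
    if tol < a2_new < C - tol:      # x_2 on the margin boundary
        return b2
    return 0.5 * (b1 + b2)          # both multipliers at 0 or C
```

Testing the new values of the multipliers (rather than the old ones) is a design choice of this sketch, since the threshold is meant to be consistent with the updated pair.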

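### 2.2 Update $E_i$ for the next computation of $b$

Consistent with the definition (38), where $S$ is the set of indices of the support vectors ($\alpha_j>0$):

$$\begin{align}
E_{i}=\sum_{j\in S}\alpha_{j}y_{j}K_{ij}+b^{new}-y_{i} \tag{59}
\end{align}$$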
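A minimal sketch of refreshing the errors after an update might look as follows. It follows the definition $E_i=g\left(x_i\right)-y_i$ from (38), so the threshold `b` is included, and it sums only over indices with $\alpha_j>0$ because the other terms vanish. The name `refresh_errors` and the choice to recompute every $E_i$ from scratch (instead of updating incrementally) are assumptions of this example.

```python
def refresh_errors(alpha, y, K, b):
    """Recompute E_i = g(x_i) - y_i for all i, summing over alpha_j > 0 only."""
    n = len(alpha)
    support = [j for j in range(n) if alpha[j] > 0.0]
    return [
        sum(alpha[j] * y[j] * K[i][j] for j in support) + b - y[i]
        for i in range(n)
    ]
```

For the two-point example with kernel matrix `[[1, -1], [-1, 1]]`, labels `[1, -1]`, $\alpha=(0.5,0.5)$ and $b=0$, both predictions equal their labels, so both errors are zero.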

## 3. Strategy for selecting the components of $\alpha$

### 3.1 Select $\alpha_1$ by checking whether the KKT conditions hold

• $\alpha_i=0$
(1) From $C-\alpha_i^{\ast}-\mu_i^{\ast}=0$ we get $\mu_i^{\ast}=C>0$
(2) From $\mu_i^{\ast}\xi_i^{\ast}=0$ we get $\xi_i^{\ast}=0$
(3) From $y_i\left(w^{\ast}\cdot x_i+b^{\ast}\right)\ge 1-\xi_i^{\ast}$ we get $y_i\left(w^{\ast}\cdot x_i+b^{\ast}\right)\ge 1$
(4) In summary: $$\alpha_i=0\Leftrightarrow y_ig\left(x_i\right)\ge 1 \tag{60}$$

• $0<\alpha_i<C$
(1) From $C-\alpha_i^{\ast}-\mu_i^{\ast}=0$ we get $\mu_i^{\ast}>0$
(2) From $\mu_i^{\ast}\xi_i^{\ast}=0$ we get $\xi_i^{\ast}=0$
(3) From $\alpha_i^{\ast}\left(y_i\left(w^{\ast}\cdot x_i+b^{\ast}\right)-1+\xi_i^{\ast}\right)=0$ and the previous step we get $y_i\left(w^{\ast}\cdot x_i+b^{\ast}\right)-1=0$
(4) In summary: $$0<\alpha_i<C\Leftrightarrow y_ig\left(x_i\right)=1 \tag{61}$$

• $\alpha_i=C$
(1) From $C-\alpha_i^{\ast}-\mu_i^{\ast}=0$ we get $\mu_i^{\ast}=0$
(2) Since $\mu_i^{\ast}=0$, the condition $\mu_i^{\ast}\xi_i^{\ast}=0$ places no restriction on $\xi_i^{\ast}$, so only $\xi_i^{\ast}\ge 0$ remains
(3) From $\alpha_i^{\ast}\left(y_i\left(w^{\ast}\cdot x_i+b^{\ast}\right)-1+\xi_i^{\ast}\right)=0$ and $\alpha_i=C>0$ we get