The main idea of robust least squares is to suppress samples with large errors, reducing their influence on the result. This post walks through the CVX code from the reference at the end, which gives three equivalent formulations of the optimization problem.
Data initialization

The test data are randomly generated:
randn('state',0);
m = 16; n = 8;
A = randn(m,n);
b = randn(m,1);
M = 2;
(a) robust least-squares problem
$$\underset{\beta}{\text{minimize}}\quad \sum_{i=1}^{m} \mathrm{huber}(\beta^T x_i - y_i)$$

$$\mathrm{huber}(u) = \begin{cases} u^2, & |u| \le M \\ M(2|u| - M), & |u| > M \end{cases}$$
The Huber penalty caps the growth of the loss so that the regression coefficients are not pulled toward large-error samples, which would cause overfitting to outliers; M is a constant threshold.
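As a quick sketch, the Huber penalty above can be written as a plain Python function (scalar input; the default `M=2` mirrors the constant in the CVX example):

```python
def huber(u, M=2.0):
    """Huber penalty: quadratic for |u| <= M, linear beyond that."""
    if abs(u) <= M:
        return u ** 2
    return M * (2 * abs(u) - M)

# Small residuals are penalized quadratically, large ones only linearly:
print(huber(1.0))  # 1.0
print(huber(5.0))  # M*(2*5 - M) = 16.0
```

Note that the two branches agree at |u| = M (both give M^2), so the penalty is continuous.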
When $|\beta^T x_i - y_i| > M$, let $|u| = |\beta^T x_i - y_i| = M + v$ with $v > 0$. Then

$$M(2|u| - M) = M(2(M+v) - M) = M^2 + 2Mv = (M+v)^2 - v^2$$
Here $v$ is the amount by which the residual exceeds $M$; to slow the growth of the penalty for large residuals, the function effectively drops the quadratic term $v^2$.
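The identity above is easy to sanity-check numerically (a sketch with arbitrary values M = 2, v = 0.7):

```python
M, v = 2.0, 0.7
u_abs = M + v                  # |u| = M + v, v > 0
lhs = M * (2 * u_abs - M)      # linear branch of the Huber penalty
mid = M ** 2 + 2 * M * v
rhs = (M + v) ** 2 - v ** 2    # squared residual minus the dropped v^2 term
print(lhs, mid, rhs)  # all three equal 6.8
```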
disp('Computing the solution of the robust least-squares problem...');
cvx_begin
variable x1(n)
minimize( sum(huber(A*x1-b,M)) )
cvx_end
(b) least-squares problem with variable weights

Weight optimization:
disp('Computing the solution of the least-squares problem with variable weights...');
cvx_begin
variable x2(n)
variable w(m)
minimize( sum(quad_over_lin(diag(A*x2-b),w'+1)) + M^2*ones(1,m)*w)
w >= 0;
cvx_end
This formulation looks less intuitive than the first one at first glance, so let us examine the per-sample error function:
$$f(w) = u^2/(w+1) + M^2 w, \qquad u = \beta^T x_i - y_i,\quad w \ge 0$$
$$f'(w) = M^2 - u^2/(w+1)^2$$
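A central-difference check of this derivative (a sketch; the values of u, M, and w are arbitrary, though w = 0.5 happens to be the stationary point |u|/M − 1 for u = 3, M = 2):

```python
M, u = 2.0, 3.0
f = lambda w: u ** 2 / (w + 1) + M ** 2 * w
fprime = lambda w: M ** 2 - u ** 2 / (w + 1) ** 2

w, h = 0.5, 1e-6
numeric = (f(w + h) - f(w - h)) / (2 * h)   # central difference
print(numeric, fprime(w))  # both ~0: this w is the stationary point
```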
When $|u| < M$ we have $f'(w) > 0$ for all $w \ge 0$, so $f(w)$ is monotonically increasing and the minimum is attained at $w = 0$. Otherwise, setting

$$f'(w) = 0 \Rightarrow w^* = \frac{|u|}{M} - 1$$
Substituting the minimizer back into $f(w)$ gives the optimal value

$$f(w^*) = \begin{cases} u^2, & |u| \le M \\ M(2|u| - M), & |u| > M \end{cases}$$
which is exactly the Huber function, so formulation (b) is equivalent to (a).
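This equivalence can be verified by brute force (a sketch: minimize f(w) over a dense grid of w ≥ 0 and compare against the Huber value):

```python
def huber(u, M=2.0):
    return u ** 2 if abs(u) <= M else M * (2 * abs(u) - M)

def f(w, u, M=2.0):
    return u ** 2 / (w + 1) + M ** 2 * w

M = 2.0
for u in [0.5, 1.9, 2.5, 10.0]:
    # grid over w in [0, 20); fine enough that the minimum is near-exact
    best = min(f(k * 1e-3, u, M) for k in range(20000))
    print(u, round(best, 4), huber(u, M))  # the two columns agree
```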
One detail worth explaining is the expression quad_over_lin(diag(A*x2-b), w'+1). Here $A x_2 - b \in \mathbb{R}^{m \times 1}$ is a vector, and diag turns it into a diagonal matrix, one residual per column. My understanding is that this is purely so that quad_over_lin(x, y), which CVX applies column by column when given a matrix and a row vector, computes

$$f_{\text{quad\_over\_lin}}(x, y) = \begin{cases} x^T x / y, & y > 0 \\ +\infty, & y \le 0 \end{cases}$$

for each residual separately, so the sum yields $\sum_i (a_i^T x_2 - b_i)^2/(w_i + 1)$.
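Numerically, the effect of diag plus column-wise quad_over_lin can be sketched in plain Python (the helper quad_over_lin_cols is a hypothetical stand-in for CVX's column-by-column behavior):

```python
def quad_over_lin_cols(X, y):
    """Column-wise x_j^T x_j / y_j, mimicking CVX's matrix/vector behavior."""
    m = len(X)
    return [sum(X[i][j] ** 2 for i in range(m)) / y[j] for j in range(len(y))]

r = [1.0, -2.0, 0.5]   # stands in for the residual A*x2 - b
w = [0.0, 1.0, 3.0]
# diag(r): column i has the single nonzero entry r_i
D = [[r[i] if i == j else 0.0 for j in range(len(r))] for i in range(len(r))]
vals = quad_over_lin_cols(D, [wi + 1.0 for wi in w])
print(vals)  # [1.0, 2.0, 0.0625] -- each entry is r_i^2 / (w_i + 1)
```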
(c) quadratic program
disp('Computing the solution of the quadratic program...');
cvx_begin
variable x3(n)
variable u(m)
variable v(m)
minimize( sum(square(u) + 2*M*v) )
A*x3 - b <= u + v;
A*x3 - b >= -u - v;
u >= 0;
u <= M;
v >= 0;
cvx_end
The objective is monotonically increasing in both $u$ and $v$, so at the optimum each should be as small as the constraints allow.
Let $t_i = |A_i^T x_3 - b_i|$. The constraints then give

$$0 \le t_i \le u_i + v_i$$
At the optimum we must have $u_i + v_i = t_i$, i.e. $v_i = t_i - u_i$, and $v_i \ge 0 \Rightarrow u_i \le t_i$.
The per-sample objective then becomes

$$f(u_i) = u_i^2 + 2Mv_i = u_i^2 + 2M(t_i - u_i) = u_i^2 - 2Mu_i + 2Mt_i$$
Taking the first derivative gives $f'(u_i) = 2u_i - 2M$. Since $0 \le u_i \le M$, we have $f'(u_i) \le 0$, so the objective is monotonically decreasing in $u_i$; the optimal $u_i$ is therefore as large as the constraints $u_i \le M$ and $u_i \le t_i$ permit:
$$u_i^* = \begin{cases} t_i, & t_i \le M \\ M, & t_i > M \end{cases}$$
and correspondingly

$$v_i^* = \begin{cases} 0, & t_i \le M \\ t_i - M, & t_i > M \end{cases}$$
Substituting the optimal solution into the objective gives

$$f(u^*, v^*) = \begin{cases} t_i^2, & t_i \le M \\ M(2t_i - M), & t_i > M \end{cases}$$

which once again matches the Huber penalty applied to the residual magnitudes.
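A brute-force check of this reduction (a sketch: for a fixed residual magnitude t, minimize u² + 2Mv over a grid of 0 ≤ u ≤ M with v = max(t − u, 0), the smallest feasible v, and compare against the Huber value):

```python
def huber(t, M=2.0):
    return t ** 2 if t <= M else M * (2 * t - M)

def qp_min(t, M=2.0, n=20000):
    """Minimize u^2 + 2*M*v over 0 <= u <= M, v >= 0, u + v >= t."""
    best = float("inf")
    for k in range(n + 1):
        u = M * k / n
        v = max(t - u, 0.0)        # smallest v satisfying the constraints
        best = min(best, u ** 2 + 2 * M * v)
    return best

for t in [0.0, 1.0, 2.0, 3.5]:
    print(t, qp_min(t), huber(t))  # the two columns agree
```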
Reference
http://web.cvxr.com/cvx/examples/cvxbook/Ch04_cvx_opt_probs/html/ex_4_5.html#source